
Chatty Devices


Will mouse and monitor soon become a thing of the past? Will we communicate with devices in an augmented reality instead? Will devices then still chat with us, or rather about us? And will they not simply know autonomously what to do? Sascha does not just discuss these questions. He also shows how modern interaction and multimodal user interfaces can be integrated into your own connected applications using technologies such as Amazon Echo, Google Home, and Microsoft Cortana.


Chatty Devices

  1. 1. Source: Dark Star, 1974. Chatty Devices: Will mouse and monitor soon become a thing of the past? Sascha Wolter | @saschawolter | wolter.biz | May 2017
  2. 2. Learned or Innate Source: Star Trek IV (20th Century Fox), 1986
  3. 3. Our Oldest Interface • The deeply instinctive nature of speech presents specific constraints and new challenges. Our brains are fundamentally wired to interpret the source of speech as human. […] Thus, a device that speaks to us is tapping into a deep river of psychological adaptations, and subject to a set of assumptions a pixel-based UI will never encounter. (Cheryl Platz, 2017, https://medium.com/microsoft-design/voice-user-interface-design-new-solutions-to-old-problems-baa36a64b3e4#.zc46diybh) • There is no consensus on the ultimate origin or age of human language (human language could be 40,000 years old or much older). Source: https://en.wikipedia.org/wiki/File:Real-time_MRI_-_Speaking_(English).ogv
  4. 4. Conversational User Experience Amazon Echo: Alexa… Google Home: Okay Google… LingLong DingDong : DingDong DingDong … 8 Million sold by the end of 2016 https://www.digitalcommerce360.com/2017/01/23/amazons-us-echo-sales-top-8-million/ Harman Kardon Invoke / Microsoft Home Hub: Hey Cortana
  5. 5. Amazon Echo: Alexa… 8 Million sold by the end of 2016 https://www.digitalcommerce360.com/2017/01/23/amazons-us-echo-sales-top-8-million/
  6. 6. Source: Mattel's Barbie Hello Dreamhouse Connected Devices (Internet of Things)
  7. 7. I know so much …about you Video: My Friend Cayla 2014
  8. 8. Internet of Uncanny Things. My Friend Cayla: Big Brother Award. The German Federal Network Agency says any toy capable of transmitting signals and recording images or sound without detection is banned. (https://t.co/R7UCmI9aj9) Source: MGM Child's Play (The Lakeshore Strangler), Vivid
  9. 9. Conversational Experience and Voice: the intersection of text-based conversational user interfaces and voice user interfaces. Diagram labels: Conversational Experience | Voice User Interface (VUI) | More than Command and Control | Speech Processing | Natural Language Understanding (NLU). Image: © Wolter ‘17
  10. 10. Conversation Experience and Voice What Researchers say and why Investors bet on bots! Conversational Experience Voice User Interface (VUI) 63% like to use Voice to control their home. 65% of Smartphone Users have used Voice Assistants. Bots are the new apps. (Satya Nadella, Microsoft CEO) Every fourth German wants to use Chatbots. Active Users: 1 billion WhatsApp, 800 million Facebook Messenger… 63 % don’t like to talk to/with machines. https://www.quora.com/Why-are-people-saying-Bots-are-the-new-apps https://www.bitkom.org/Presse/Presseinformation/Jeder-Vierte-will-Chatbots-nutzen.html http://www.fittkaumaass.de/news/chatbots-von-jedem-zweiten-online-kaeufer-abgelehnt 50 % doubt the reliability.
  11. 11. Google Voice Search, Google Now, and Google Assistant. Google Voice Search / Google Now: • Voice Search (2002) • Voice Search merged into Google Now (2012) • Google Android and iOS • Mobile usage scenarios • Looks up the Internet but doesn't know you • Natural-sounding voice commands (https://www.cnet.com/how-to/complete-list-of-ok-google-commands/) Google Assistant: • Google's Allo chat app* • Google's Pixel phone* • Google Home • Probably "next-gen Google Now" • Conversational • Deeper artificial intelligence *Multimodal human-computer interaction involving several of the five human senses (e.g. vision and voice).
  12. 12. Commands/Conversation and Devices: Alexa's built-in voice capabilities for your connected products: • Works the same way it would with an Amazon Echo • Access to third-party skills developed using the Alexa Skills Kit (ASK) • Developer kits • A wake word engine is still needed (e.g. Sensory's Alexa wake word suite) Source: https://developer.amazon.com/alexa-voice-service
  13. 13. Conversational User Experience Source: https://developer.amazon.com/alexa-voice-service Alexa in the Car: Ford, Amazon to Provide Access to Shop, Search and Control Smart Home Features on the Road. The world's first Amazon Alexa-enabled smartwatch: iMCO CoWatch. LG puts Amazon Alexa on a fridge.
  14. 14. Headless Devices: …operate without a graphical user interface and are typically controlled via a network connection. Source: Discovery Channel 2013
  15. 15. Interaction isn't one-dimensional. Multimodal interaction provides multiple modes of input and output. Source: Room E demo by Jared Ficklin, http://www.youtube.com/watch?v=BGaAyBBur3I
     speechRecognizer = new SpeechRecognitionEngine();
     var grammar = new Grammar(new FileStream("commands.grxml...
     speechRecognizer.LoadGrammar(grammar);
     speechRecognizer.SpeechRecognized += new EventHandler...
     speechRecognizer.SpeechHypothesized += new EventHandler...
     speechRecognizer.SpeechRecognitionRejected += new EventHandler...
     speechRecognizer.SetInputToAudioStream(stream, new SpeechAudioFormatInfo(EncodingFormat...
     speechRecognizer.RecognizeAsync(RecognizeMode...
  16. 16. https://github.com/Uberi/speech_recognition
  17. 17. Gulf between Human and Machine. Diagram labels: User and Goals | Physical System (World). Source: Norman, D. (1986). "User Centered System Design: New Perspectives on Human-Computer Interaction". CRC. ISBN 978-0-89859-872-8
  18. 18. Voice Input Changes Lives: Inclusion • The biggest and most impactful benefit voice user experiences provide is vastly improved accessibility. Looking for inspiration? Go read the reviews of the Amazon Echo. […] Voice UIs allow us to remain fully human in our interactions. (Cheryl Platz, 2017, https://medium.com/microsoft-design/voice-user-interface-design-new-solutions-to-old-problems-baa36a64b3e4#.zc46diybh)
  19. 19. Voice User Interface (VUI) • Grice’s Maxims (1975) (https://plato.stanford.edu/entries/grice/) • Quality: Only say things that are true • Quantity: Don’t be more or less informative than needed • Relevance: Only say things relevant to the topic • Manner: Be brief, get to the point, and avoid ambiguity and obscurity • Cooperative Principle • Turn-taking • Context • Threading Herbert Paul Grice (March 13, 1913 – August 28, 1988)
  20. 20. Speech Processing: Speech Recognition / Speech-to-Text and Speech Synthesis / Text-to-Speech. Source: Echo/Google Home infinite loop, https://youtu.be/ZfCfTYZJWtI
  21. 21. Physiological voice modelling (1791) Speech Synthesis Source: Kempelen's speaking machine (1791), http://www.dailymotion.com/video/x363xkr
  22. 22. Source: Hatsune Miku - World is mine– 2011, https://youtu.be/YSyWtESoeOc Vocaloid (2003) Hatsune Miku: “First sound from the Future.” Speech Synthesis
  23. 23. Speech Synthesis • Text-to-phoneme • Known/unknown words • Text normalization • Henry VIII vs. Chapter VIII • Prosody and emotional content • How to sound "natural"? • … • Usually based on samples (versus physiological modelling) • Discrete symbols to continuous waveforms • Stochastic process (Hidden Markov model) • Machine Learning / Deep Learning (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41539.pdf) Source: https://en.wikipedia.org/wiki/Speech_synthesis
  24. 24. Speech Synthesis Markup Language (SSML) • http://www.w3.org/TR/speech-synthesis/ • XML-based • Some elements and attributes: • break • phoneme • prosody • say-as (interpret-as: currency, digits, number, date, time, …) • audio • … Example: <speak> Welcome! Today is <say-as interpret-as="date"> 20121213 </say-as> </speak>
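  A minimal sketch of how such SSML might be returned by a skill: the outputSpeech object with type "SSML" follows the Alexa response format referenced later in the deck; the greeting text and date are placeholders.
     // Returning SSML instead of plain text in an Alexa-style response (illustrative)
     var response = {
       version: "1.0",
       response: {
         outputSpeech: {
           type: "SSML",
           ssml: '<speak>Welcome! Today is <say-as interpret-as="date">20121213</say-as>.</speak>'
         },
         shouldEndSession: true
       }
     };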
  25. 25. Speech Recognition History • 1950's: Bell Laboratories designed "Audrey", which could understand digits • 1960's: IBM demonstrated "Shoebox", which could understand 16 words • 1970's: Carnegie Mellon's "Harpy" speech-understanding system could understand 1011 words (approximately the vocabulary of an average three-year-old) • … Video: Massive Attack Tour 2008 http://www.uva.co.uk/archives/84
  26. 26. Speech Recognition Moving from word templates and sound patterns to probability. • 1980’s: Worlds of Wonder's Julie doll (1987), which children could train to respond to their voice. • 1990’s: In the early 90’s Dragon Dictate (9000 USD) and in the late 90’s Dragon NaturallySpeaking arrived to recognize continuous speech • 2000’s: It’s still guessing with around 80 percent accuracy. • 2010’s: Google's English Voice Search system now incorporates 230 billion words from actual user queries. • … Video: https://youtu.be/UkU9SbIictc
  27. 27. Speech Recognition Techniques • Voice recognition (biometric) versus speech recognition (content) • Speaker dependence vs. independence • Detection algorithms • Fourier transformation (decorrelate the spectrum) • Dynamic time warping (DTW)-based speech recognition • Hidden Markov models ("stochastic state model") • Grammar and vocabulary • … • Front-end vs. back-end • Natural Voice Control • Automatic Speech Recognition (ASR) • Natural Language Understanding (NLU) • Nonverbal communication • E.g. lip-reading, McGurk effect • Context and grammar: • Simple grammar: e.g. just digits or numbers • Advanced grammar: Speech Recognition Grammar Specification (SRGS), W3C Standard • SRGS can take a variety of forms, with the most popular being Grammar XML (GRXML). (http://www.w3.org/TR/speech-grammar/) • Subject area (healthcare, military etc.) Image: © Wolter ‘17
  28. 28. Source: https://youtu.be/tDFfZlQRCwM 2016 Speaker Recognition and Reliability. Conversational UX
  29. 29. Codified and strict vs. Conversational • CLI: Command Line Interface • Input of Commands via Keyboard • Eliza by Joseph Weizenbaum (Psychotherapist), 1966 • Already 1220 chatbots according to the chatbots directory (https://www.chatbots.org/) [4]
  30. 30. Source: Subservient Chicken 2011 (http://web.archive.org/web/20110426194400/http://www.bk.com/en/us/campaigns/subservient-chicken.html)
  31. 31. Microsoft Xiaoice, Rinna, and Tay i.e. Xiaoice has 20 million registered users Sources: https://en.wikipedia.org/wiki/Xiaoice, https://en.wikipedia.org/wiki/Tay_(bot)
  32. 32. Persona for an Avatar with Personality Source: http://genieblog.ch/cortana-vs-siri-1-emotionen/
  33. 33. Source: Project Yorick, https://youtu.be/3Nss_2_rwdE Creepy (Ro)bot
  34. 34. Uncanny Valley Source: http://www.androidscience.com/theuncannyvalley/proceedings2005/uncannyvalley.html BB-8, Star Wars VII Source: Disney
  35. 35. Conversational UX turns real Source: https://youtu.be/jSVRrJJ2nl4, SNL Julie the Operator 2006
  36. 36. Human? • Chinese room: Does a machine literally "understand" Chinese? Or is it merely simulating the ability to understand Chinese? Searle calls the first position "strong AI" and the latter "weak AI". (https://en.wikipedia.org/wiki/Chinese_room) • Turing Test: A player C is given the task of trying to determine which player – A or B – is a computer and which is a human. C is limited to using the responses to written questions to make the determination. (https://en.wikipedia.org/wiki/Turing_test) • The Alexa Prize: A social bot that can converse coherently and engagingly with humans on popular topics for 20 minutes (similar to Loebner Prize with 25 minutes). (https://developer.amazon.com/alexaprize)
  37. 37. Source: Boris Adryan, 2015-10-20, http://iot.ghost.io/is-it-all-machine-learning/
  38. 38. Commonsense Knowledge and Intuition
  39. 39. Source: The Simpsons, 2001 Anticipation and Empathy
  40. 40. What is Machine Ethics? Source: http://moralmachine.mit.edu/
  41. 41. Source: http://www.youtube.com/Fzo_5q_dhIM, Green Tricycle Studios 2012 Respect Privacy. Conversational UX
  42. 42. Conversational UX, Topics and Guides • Microsoft • Kinect for Windows | Human Interface Guidelines v2.0 (https://developer.microsoft.com/en-us/windows/kinect/tools) • Interaction primer (https://docs.microsoft.com/en-us/windows/uwp/input-and- devices/input-primer) • Experience Principles and Best Practices (http://docs.botframework.com/en-us/directory/best-practices/) • Inclusive Design (https://www.microsoft.com/en-us/design/inclusive) • Google • Conversation Design (https://developers.google.com/actions/design/) • Amazon • Alexa Skills Kit Voice Design (https://developer.amazon.com/public/solutions/alexa/alexa- skills-kit/docs/alexa-skills-kit-voice-design-handbook)
  43. 43. g.co/dev/ActionsChecklist
  44. 44. Overview: Wake Word, Invocation, Action/Skill Types, Prompt. Wake Word • The device's "name" • Alexa, Echo, Amazon, or Computer • OK Google, or Hey Google • Hey Cortana • Keyword or trigger vs. always on, active listening • Indicate that the device is listening • Reduces false activations • Provide alternative input. Invocation Name • Activating the agent for your skill (think of starting your app) • Usually two words (without articles), no trademarks. Source: Kinect for Windows | Human Interface Guidelines v1.7. Image: © Wolter ‘17
  45. 45. Action Types. Direct integration (for home automation, media etc.): • Direct Actions* (Google) • Smart Home Skill (Amazon) • Flash Briefing Skill (Amazon) • Cortana Skill* (Microsoft) (some restrictions for publishing!) Indirect integration (invocation trigger/name): • Conversation Actions (Google) • Custom Skill (Amazon) • Cortana Skill* (Microsoft) (needs an invocation trigger!) *not yet available
  46. 46. Invocation Types
  Invocation Type | Conversation
  Full Intent | User: Alexa, ask Astrology Zone for the horoscope for Leo. Astrology Zone: Today's outlook for Leo: An opportunity presents itself at work.
  Partial Intent | User: Alexa, ask Astrology Daily for my horoscope. Astrology Daily: Horoscope for what sign?
  No Intent | User: Alexa, talk to Astrology Daily. Astrology Daily: You can ask for your horoscope. Which is your sign?
  Launch phrase patterns: Ask <invocation name> <connecting word> <some action> | <some action> <connecting word> <invocation name> | Tell <invocation name> <connecting word> <some action> | Search <invocation name> for <some action> | Open <invocation name> for <some action> | Talk to <invocation name> and <some action> | Launch <invocation name> and <some action> | Start <invocation name> and <some action> | Resume <invocation name> and <some action> | Run <invocation name> and <some action> | Load <invocation name> and <some action> | Begin <invocation name> and <some action> | Use <invocation name> <connecting word> <some action>
  47. 47. Prompt Types
  Prompt Type | Conversation
  Question (interaction remains open, waiting for a response) | Astrology Daily: Horoscope for which sign?
  Statement (interaction will terminate) | Astrology Daily: Today's outlook for Pisces: You could be questioning your current path…
  Wizard of Oz experiment. Image: http://www.kristamcgeebooks.com/
  48. 48. The dirty secrets: JSON behind Amazon Alexa | Request Response
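  Since the original slide showed these payloads only as screenshots, here is a rough sketch (as JavaScript object literals) of what an Alexa IntentRequest and a simple response look like on the wire; IDs, timestamps, and the SwitchOnIntent/item names are placeholders borrowed from the handler code later in the deck.
     // Excerpt of an Alexa IntentRequest as delivered to the skill endpoint (illustrative values)
     var alexaRequest = {
       version: "1.0",
       session: { new: false, sessionId: "...", application: { applicationId: "..." }, user: { userId: "..." } },
       request: {
         type: "IntentRequest",
         requestId: "...",
         locale: "en-US",
         timestamp: "...",
         intent: { name: "SwitchOnIntent", slots: { item: { name: "item", value: "light" } } }
       }
     };
     // Minimal response the skill sends back
     var alexaResponse = {
       version: "1.0",
       response: {
         outputSpeech: { type: "PlainText", text: "Switch light" },
         shouldEndSession: true
       }
     };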
  49. 49. The dirty secrets: JSON behind Google Assistant | Request Response
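  For comparison, a rough sketch of the API.AI v1 webhook exchange used by the api.ai examples later in the deck; field names are reproduced from memory and the values are placeholders, so treat this as illustrative only.
     // Excerpt of an API.AI (v1) webhook request forwarded to the fulfillment endpoint (illustrative)
     var apiAiRequest = {
       result: {
         resolvedQuery: "turn the light on",
         action: "switchOnIntent",
         parameters: { item: "light" }
       }
     };
     // Minimal webhook response: spoken answer plus the text shown on chat surfaces
     var apiAiResponse = {
       speech: "Turned light on!",
       displayText: "Turned light on!"
     };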
  50. 50. How-to Alexa Skills https://developer.amazon.com/edw/home.html#/skills/list General Settings and Invocation Name Intents, Content, and Utterances
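  To make "Intents, Content, and Utterances" concrete: a sketch of the intent schema and sample utterances that could back the SwitchOnIntent/SwitchOffIntent handlers shown later in the deck; the LIST_OF_ITEMS slot type and the utterance wording are hypothetical.
     // Intent schema (entered as JSON in the Alexa developer console; shown here as a JS literal)
     var intentSchema = {
       intents: [
         { intent: "SwitchOnIntent",  slots: [{ name: "item", type: "LIST_OF_ITEMS" }] },
         { intent: "SwitchOffIntent", slots: [{ name: "item", type: "LIST_OF_ITEMS" }] }
       ]
     };
     // Custom slot type LIST_OF_ITEMS (one value per line in the console): light, heating, ...
     // Sample utterances (one per line: intent name followed by the phrase):
     //   SwitchOnIntent turn the {item} on
     //   SwitchOnIntent switch {item} on
     //   SwitchOffIntent turn the {item} off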
  51. 51. How-to Alexa Skills https://developer.amazon.com/edw/home.html#/skills/list Fulfillment via Endpoint/Webhook Testing
  52. 52. Tip: Development and Debugging • Prepare Node.js • https://nodejs.org/ • https://expressjs.com/
     var express = require('express');
     var bodyParser = require('body-parser');
     var app = express();
     app.use(bodyParser.json()); // parse JSON webhook payloads (needed by the handlers on the following slides)
     app.get('/', function (req, res) {
       res.send('Hello World!');
     });
     app.listen(3000, function () {
       console.log('Example app listening on port 3000!');
     });
  • Use an editor or IDE (e.g. Visual Studio Code) • https://code.visualstudio.com/Docs/runtimes/nodejs • Connect to your local server via a tunnel • https://ngrok.com/ • Generates a URL like https://b22ec890.ngrok.io/
     ngrok http 3000
  • Eclipse SmartHome/QIVICON • REST API http://127.0.0.1:8080/doc/index.html • Paper UI http://127.0.0.1:8080/ui/index.html
  53. 53. How-to Alexa Skills https://developer.amazon.com/edw/home.html#/skills/list
     var Alexa = require('alexa-sdk');

     app.post('/', function (req, res) {
       // Bridge the Express request into the alexa-sdk handler (which expects a Lambda-style context).
       var context = {
         succeed: function (result) {
           console.log(result);
           res.json(result);
         },
         fail: function (error) {
           console.log(error);
         }
       };
       var alexa = Alexa.handler(req.body, context);
       alexa.registerHandlers(handlers);
       alexa.execute();
     });

     var handlers = {
       'SwitchOnIntent': function () {
         var item = this.event.request.intent.slots.item.value;
         doRequest("ON"); // doRequest: helper (defined elsewhere) that switches the connected device, e.g. via the Eclipse SmartHome REST API from the previous tip
         this.emit(':tell', 'Switch ' + item);
       },
       'SwitchOffIntent': function () {
         var item = this.event.request.intent.slots.item.value;
         doRequest("OFF");
         this.emit(':tell', 'Switch ' + item);
       }
     };
  54. 54. How-to Google Assistant Skills … https://developers.google.com/actions/
  55. 55. How-to Google Assistant Skills: Actions SDK https://developers.google.com/actions/
  56. 56. How-to Google Assistant Skills: Actions SDK • Conversation API • gactions CLI • Specifying Action Package (JSON) • Testing • Deployment • Actions SDK / ActionsSdkAssistant • npm install express body-parser actions-on-google --save • require('actions-on-google') .ActionsSdkAssistant; • Web Simulator https://developers.google.com/actions/
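  A minimal sketch of what an Actions SDK fulfillment looked like with the 2017 actions-on-google package: the MAIN intent opens the conversation and the TEXT intent handles follow-up utterances. Method and intent names are given as remembered from that library version, so treat the details as illustrative.
     var ActionsSdkAssistant = require('actions-on-google').ActionsSdkAssistant;

     app.post('/', function (req, res) {
       var assistant = new ActionsSdkAssistant({ request: req, response: res });

       function mainIntent(assistant) {
         // Keep the conversation open and wait for the user's answer.
         assistant.ask('Welcome! Which device should I switch?');
       }

       function textIntent(assistant) {
         // Echo the raw transcription and end the conversation.
         assistant.tell('You said ' + assistant.getRawInput());
       }

       var actionMap = new Map();
       actionMap.set(assistant.StandardIntents.MAIN, mainIntent);
       actionMap.set(assistant.StandardIntents.TEXT, textIntent);
       assistant.handleRequest(actionMap);
     });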
  57. 57. How-to Google Assistant Skills: api.ai • API.AI webhook protocol • Web frontend • Specifying Action Package • Testing • Deployment • Actions SDK / ApiAiAssistant • npm install express body-parser actions-on-google --save • require('actions-on-google').ApiAiAssistant; • Web Simulator https://developers.google.com/actions/ | https://console.api.ai/api-client/#/newAgent
  58. 58. How-to Google Assistant Skills: api.ai https://developers.google.com/actions/ | https://console.api.ai/api-client/#/agents Create Agent General Settings
  59. 59. How-to Google Assistant Skills: api.ai https://developers.google.com/actions/ | https://console.api.ai/api-client/#/agents Intents, Content, and Utterances Intents, Content, and Utterances
  60. 60. How-to Google Assistant Skills: api.ai https://developers.google.com/actions/ | https://console.api.ai/api-client/#/agents Content Training
  61. 61. How-to Google Assistant Skills: api.ai https://developers.google.com/actions/ | https://console.api.ai/api-client/#/agents Integration Invocation Name
  62. 62. How-to Google Assistant Skills: api.ai https://developers.google.com/actions/ | https://console.api.ai/api-client/#/agents Testing Fulfillment via Endpoint/Webhook
  63. 63. How-to Google Assistant Skills: api.ai • GUI interface • NLU (Natural Language Understanding) • Conversation-building features (e.g. Domains, Entities, State, Context) https://developers.google.com/actions/ | https://console.api.ai/api-client/#/newAgent
  64. 64. How-to Google Assistant Skills: api.ai
     var ApiAiAssistant = require('actions-on-google').ApiAiAssistant;

     app.post('/', function (req, res) {
       var assistant = new ApiAiAssistant({ request: req, response: res });
       var actionMap = new Map();
       actionMap.set("switchOnIntent", switchOnIntent);
       actionMap.set("switchOffIntent", switchOffIntent);
       assistant.handleRequest(actionMap);
     });

     // handleRequest passes the assistant instance to each mapped handler.
     var switchOnIntent = function (assistant) {
       var item = assistant.getArgument("item");
       doRequest("ON");
       assistant.ask('Turned ' + item + ' on!');
     };

     var switchOffIntent = function (assistant) {
       var item = assistant.getArgument("item");
       doRequest("OFF");
       assistant.ask('Turned ' + item + ' off!');
     };
  https://developers.google.com/actions/tools/ngrok
  65. 65. @saschawolter | http://wolter.biz | https://github.com/wolter
  66. 66. Next: Microsoft Cortana Skills …and some more like Samsung Bixby etc. • Harman Kardon speaker announced for 2017 • Cortana • Fictional AI character in the Halo video games. • Builds on Tellme, acquired in 2007. • Intelligent personal assistant and knowledge navigator. • Competes with Siri and Google Now since 2014 (Windows Phone 8.1). • Available on Windows 10, Windows IoT, on Android and iOS, on Xbox, etc. • Cortana Skills Kit announced for early 2017 • Cortana Devices SDK announced for 2017 • Allows OEMs (original equipment manufacturers) and ODMs to create smart and personal devices. • Microsoft Bot Framework • Build and connect intelligent bots that interact with your users naturally. • Microsoft LUIS - Language Understanding Intelligent Service (part of Cognitive Services) • Understands language contextually
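  As a preview of the Bot Framework route mentioned above, a minimal echo bot with the Node.js botbuilder SDK (v3 era); the app ID/password placeholders and the endpoint path are illustrative, and the bot does nothing beyond echoing.
     var builder = require('botbuilder');

     // Connector credentials come from the bot registration (placeholders here).
     var connector = new builder.ChatConnector({
       appId: process.env.MICROSOFT_APP_ID,
       appPassword: process.env.MICROSOFT_APP_PASSWORD
     });

     // Messages arrive on an Express endpoint, like the other webhook examples.
     app.post('/api/messages', connector.listen());

     // A minimal bot that simply echoes what the user said.
     var bot = new builder.UniversalBot(connector, function (session) {
       session.send('You said: %s', session.message.text);
     });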
  67. 67. Source: Dark Star, 1974, Bryanston Pictures I think, therefore I am. […] But how do you know that anything else exists? My sensory apparatus reveals it to me.
  68. 68. Chatty Devices: Will mouse and monitor soon become a thing of the past? Sascha Wolter | @saschawolter | wolter.biz | May 2017. Source: Dark Star, 1974, Bryanston Pictures
