2. AI-driven UI: conversational interfaces and more
May 22nd, 2018
by Dmitriy Semashkov & Eirik Stavelin, Making Waves
How to build a chatbot for the
Norwegian market
http://www.meshnorway.com/events/ai-driven-ui-conversational-interfaces-and-more
3. HOW TO?
There are plenty of guides for creating conversational UIs, personas and fulfilment engines out there.
You can figure it out!
Step 0
Do .say("Hello")
8.
Speech-to-text && text-to-speech
TLDR: the densest part of the black box; for us it's just in/out from an API anyway.
• Computers deal in code
• Developers in text
• Users in voice
• A mic is opened for the user; the developer receives a text transcript
• The developer returns a text string; TTS makes sound
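The developer's slice of that loop can be sketched in a few lines. This is an illustrative sketch, not a Dialogflow API: `handleTranscript` is an invented name, and the point is only the contract the slide describes — text in, text out, with STT and TTS handled by the platform on either side.

```javascript
// Sketch of the developer's slice of the voice loop: the platform
// handles STT and TTS, so our code only ever sees and returns text.
// handleTranscript is a hypothetical name, not a platform API.
function handleTranscript(transcript) {
  // Accents, noise and punctuation are already resolved by the
  // STT engine before we receive the string.
  if (/hello|hei/i.test(transcript)) {
    return 'Hei! Hva kan jeg hjelpe deg med?';
  }
  return 'Beklager, det forsto jeg ikke.';
}

// The returned string is handed straight to TTS on the way out.
handleTranscript('hei mesh');
```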
9.
Intentions & goals
A classification problem
(TLDR; they did it for you, just bring your own data)
• Goals: things users want done
• Intents: things users want to do to fulfil goals
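The speaker notes describe intent matching as classification with a threshold and a default fallback. Here is a deliberately naive sketch of that shape — keyword scoring stands in for whatever model Dialogflow actually uses (a black box, per the deck), and the intent names and word lists are invented for the demo:

```javascript
// Toy intent classifier: score each intent against the utterance,
// apply a threshold, fall back to a default when nothing scores
// well enough. The intents and keywords are invented examples;
// Dialogflow's real classifier is far more capable.
const intents = {
  order_food: ['pils', 'pizza', 'biff', 'bestille'],
  opening_hours: ['åpent', 'åpningstider', 'stenger'],
};

function classifyIntent(utterance, threshold = 1) {
  const words = utterance.toLowerCase().split(/\s+/);
  let best = { name: 'fallback', score: 0 };
  for (const [name, keywords] of Object.entries(intents)) {
    const score = words.filter((w) => keywords.includes(w)).length;
    if (score > best.score) best = { name, score };
  }
  return best.score >= threshold ? best.name : 'fallback';
}
```

The fallback branch is what produces the agent's "sorry, I didn't get that" behaviour when no candidate clears the threshold.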
11.
Entities
Named or unnamed: the bits of text that distinguish one order from the next
("I want extra pepperoni on mine")
• Identify “things” in the world
• NLP
• (N)ER
• regex
• Word lists
• magic
• …
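Two of the bullets above — regex and word lists — are simple enough to sketch by hand. The product list and number words below are assumptions for the demo, not anything shipped with Dialogflow; the output shape loosely mirrors the entity JSON shown on the Dialogflow slide:

```javascript
// Minimal entity extractor in the "regex + word lists" spirit:
// spot known products, and an optional amount just before them.
// PRODUCTS and NUMBER_WORDS are invented for illustration.
const PRODUCTS = ['pils', 'pizza', 'vin', 'biff', 'champagne'];
const NUMBER_WORDS = { to: '2', tre: '3', fire: '4', fem: '5' };

function extractEntities(utterance) {
  const words = utterance.toLowerCase().split(/\s+/);
  const entities = [];
  words.forEach((word, i) => {
    if (!PRODUCTS.includes(word)) return;
    const prev = words[i - 1];
    // A number word or a digit string right before the product
    // becomes its amount; otherwise the product stands alone.
    const amount =
      NUMBER_WORDS[prev] || (/^\d+$/.test(prev) ? prev : null);
    entities.push(amount ? { product: word, amount } : { product: word });
  });
  return entities;
}
```

Real NER handles inflection, synonyms and word order; this sketch only shows why word lists get you surprisingly far for a narrow ordering domain.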
15.
DIALOGFLOW
console.dialogflow.com
Fire pils og en pizza. Ei flaske vin i ny og ne'. Lite biff og dyr champagne. Men ka gjør no' det?
("Four beers and a pizza. A bottle of wine every now and then. A little steak and expensive champagne. But what does it matter?")
the_entity
[{"product":"pils","amount":"4"},{"product":"pizza"},{"product":"biff"}]
17.
Contexts
Provide short-term memory to the agent: ensure that prereqs are done, or keep track of what has been talked about already.
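Contexts can be sketched as a tiny store where each entry carries a lifespan measured in conversational turns, roughly how Dialogflow's context lifespans behave. The class and method names here are ours, not Dialogflow's:

```javascript
// Sketch of contexts as short-term memory. Each active context has a
// lifespan in user turns and expires when it reaches zero — roughly
// the behaviour of Dialogflow context lifespans. Names are invented.
class ContextStore {
  constructor() {
    this.active = new Map();
  }
  set(name, lifespan = 5) {
    this.active.set(name, lifespan);
  }
  has(name) {
    return this.active.has(name);
  }
  // Call once per user turn: every context ages, expired ones drop out.
  tick() {
    for (const [name, life] of this.active) {
      if (life <= 1) this.active.delete(name);
      else this.active.set(name, life - 1);
    }
  }
}
```

An intent that needs a prerequisite (say, a hypothetical `placeOrder`) would simply check `ctx.has('has_address')` before being allowed to trigger.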
18.
Back-end aka fulfilment
TLDR: you can use whatever you want. The Node.js lib works OK.
• A simple webhook
• Computations are done back-end
• Log-in (account linking)
• Con: splits the sentences the bot says between Dialogflow and the back-end system
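The "simple webhook" can be sketched as a single function: Dialogflow POSTs a JSON body containing the matched intent and its parameters, and expects a `fulfillmentText` back. The handler below is framework-free and the intent name and price list are invented; a real deployment would sit behind an HTTP server with auth and error handling:

```javascript
// Minimal shape of a fulfilment webhook handler. The request body
// layout (queryResult.intent.displayName, queryResult.parameters)
// follows Dialogflow's webhook format; the 'order.price' intent and
// PRICES table are invented for this sketch.
const PRICES = { pils: 89, pizza: 179 };

function handleWebhook(body) {
  const intent = body.queryResult.intent.displayName;
  const params = body.queryResult.parameters;
  if (intent === 'order.price') {
    const price = PRICES[params.product];
    return {
      fulfillmentText: price
        ? `En ${params.product} koster ${price} kroner.`
        : `Beklager, vi har ikke ${params.product}.`,
    };
  }
  // Unknown intents get a generic reply rather than an error.
  return { fulfillmentText: 'Det kan jeg ikke hjelpe med ennå.' };
}
```

This also makes the slide's "con" concrete: the Norwegian sentences the bot speaks now live in back-end code, outside the Dialogflow console.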
25.
For what tasks are conversational UIs the perfect match?
Sales?
Support?
Self-service?
Any dictation/stenographer situation?
Knowledge (e.g. tourist info, wiki lookups, etc.)
Entertainment (the guide through history at Folkemuseet?)
User guides (how do I assemble the Peter Opsvik chair "Tripp Trapp"?)
26.
In what situations is a computer/tablet a hindrance or distraction?
Alarm! I'm stuck.
Language barriers
Blind / visual impairment
Vehicles (car / boat / bike / etc.)
My hands are occupied / I have tools in my hands
I'm actively moving around/under/over/through my {work}space
27. Just like in any other language (no Norwegian language support out of the box in Dialogflow).
The critical parts (STT & TTS, intent classification & entity extraction) are better now.
The art of conversation is still hard.
The tech is here; where to apply it for max effect is our problem to figure out.
Editor's Notes
Hello and good evening.
My name is Eirik Stavelin and I'm here with my colleague Dmitriy Semashkov. We work as data scientists at Making Waves and are here to share some experiences in creating information systems with conversational UIs, using technology from Google.
I was given this title, how to build a chatbot for the Norwegian market. I'll get back to that at the end.
Our title is “How to build conversational interfaces for the Norwegian market”.
There are plenty of guides out there on how to build all the parts these systems consist of. You typically require fewer of them as features are consolidated into bot frameworks. As these are still somewhat new and changing technologies, documentation is quickly outdated and lacking, but this gets better as stable versions are rolled out.
We roughly follow the design guidelines composed by Google.
So this is not what we are going to talk about…
…what we are going to talk about is our experiences, as developing and designing data scientists, in creating conversational UIs with the Google technologies just presented (hopefully).
We could take a data-science perspective on this, or a design perspective, or a programming perspective. Or a commercial one. There are many, and time is short, so what we'll do is take our normal tech-y data-science view and zoom out a little, talk about the pieces from a certain distance, and quickly get to the point where we hope you have a rough idea and can ask us and each other about the best ways forward.
Google's "conversational UI" is a black box; not all details are public, but the bird's-eye view of these systems is known. They are also more or less the same as other such systems:
text to speech & speech to text
Intents
Entities
Short term memory
Back end processing & longer term memory
About 15 years ago I sat and read to my computer. In broken English. Alone under the stairs. The new version of MS speech recognition was out, and from then on I'd never have to write another English paper again. Ever. I'd just dictate the content, and the machine would deliver perfectly correct text. I'd ace my English grades from that point on.
That didn't work. Many of you probably also tested this tech in the late 90s and early 2000s. It did not work. It did not transcribe well and the speech synthesis was awful. Every time between then and now that a new speech system came out, I'd ignore it as fast as I could. That stuff does not work.
But now though, it sort of does work. Siri kind-of works. Alexa kind-of works. And the google assistant kind-of works. Perhaps this time around, voice as UI finally works well enough to actually be useful.
say -v nora "hei mesh, kan dere høre entusiasmen i stemmen min?" ("hi Mesh, can you hear the enthusiasm in my voice?")
say -v Alex "I'm sorry Dave, I'm afraid I can't do that" || in Norwegian: "Jeg beklager Dave, jeg er redd jeg ikke kan gjøre det."
https://deepmind.com/blog/wavenet-generative-model-raw-audio/ (how the latest wave of speech synthesis is made at Google)
Goals: what the user actually needs and wants to accomplish; these should be navigable through one or more intents.
Identifying which intent a user input expresses is a classification problem with a threshold and a default fallback if no good candidate is found. What is new-ish for me here is that, whatever algorithm Google uses, they've gotten really good at this problem even with very few training examples.
Intents are much like simple functions in programming: they can be with or without parameters and trigger some action. In Dialogflow, intents that require parameters will prompt the user with follow-up questions in order to secure all needed inputs.
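That slot-filling behaviour can be sketched by hand: an intent declares its required parameters, and the agent prompts for the first one still missing before the action fires. The intent name, parameters and prompts below are invented for illustration, not Dialogflow's API:

```javascript
// Hand-rolled slot filling: prompt for the first missing required
// parameter, fulfil once all are collected. All names are invented.
const orderIceCream = {
  required: ['flavour', 'container'],
  prompts: {
    flavour: 'Hvilken smak vil du ha?',
    container: 'I kjeks eller beger?',
  },
};

function nextStep(intent, collected) {
  const missing = intent.required.find((p) => !(p in collected));
  return missing
    ? { prompt: intent.prompts[missing] } // keep asking
    : { fulfil: collected };              // all slots filled: act
}
```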
A simple Q&A that just returns the text of a FAQ needs no back-end; it can just return the answer text as voice output. More complicated stuff needs a fulfilment engine through a webhook. This lets you connect your existing systems to the voice UI.
Mapping from input natural-language text to intent and parameters is the NLU part.
Here we have some training data, annotated with entities. I've made an ice-cream ordering system, where the intent is order_ice_cream, and we have a few entities: the flavour of the ice cream (in yellow), the container (in red) and the topping (in pink). We can only presume this is also used in the classification of the intent, but it is most visible here as training examples of what and where entities occur in sentences assumed to be used to fulfil the intent.
If several intents need to be fulfilled in order to trigger a later one, contexts can be set to make it possible to trigger new intents. These have an expiry: a number of back-and-forths between man and machine. Let's say you need both an item to purchase and an address to deliver to in order to place an order. The intents getAddress and findProduct need to be fulfilled and contexts set before the placeOrder intent can be triggered and the end goal of playing with that nice thing can be realized.
We can also set variables that carry through the conversation, in order to track how many times we misunderstand each other, remember permissions given, etc.
This is one way this generation of conversational interfaces attempts to keep up the illusion of smartness as a conversation partner.
Both the beginning and the end of a conversational interface are voice, but both ends are handled through text. What words should the assistant use? What level of speed, detail, accuracy etc. should it use?
This was a good exercise for us, as it creates a space where PR people, content people, tech people, admin people etc. all had to come together and create a persona. It is also a good opportunity to dust off those core values that were composed back in the day. These core values and this world-view function as a yardstick for evaluating the quality of speech. Our persona is a grown woman with lots of experience and a bias towards healthy food. With that we can qualitatively but easily evaluate whether wordings and tone of voice feel right. We found good value in using this persona as a common creation that includes different people, from Making Waves and our clients.
(Our designer Heidi Lisle leads this work.)
For most of us, conversational or auditory interfaces are new or uncomfortable, as we remember how clumsy and painful the journey has been, or we just never bothered to go that route. We talk about it as new because we are entering a new generation of techniques in the TTS and STT areas. But the blind and visually impaired have endured the previous generations of this tech. I believe we can learn a lot from this group about how we design our systems, and where to apply these kinds of technologies.
The Norwegian Association of the Blind / Blindeforbundet.
I have also slowly learned that people who can see and can read prefer not to, and will gladly take a 50/50 guess on a two-button confirm/deny dialog box on their computer or mobile, even if the result is that all the contacts in their address book are deleted if they choose wrong. Info in sound might remedy some of these situations.
And then there are those who cannot read the language or have no experience with computers. It should be possible for a computer-illiterate Chinese person to purchase train tickets from the ticket kiosk at Oslo S without any clicking except entering the PIN for the payment. There are probably a myriad of such problems where the machine creates friction, where voice can let users with other things to worry about than pushing buttons interact with machines hands-free.
Some of these problems are obvious and have financial incentives pushing them to be solved fast. Others might be of a social, cultural or humanitarian character and take longer to find and fix. Whatever those problems turn out to be, building a conversational user interface in Norwegian is no longer the hard part.