SlideShare a Scribd company logo
1 of 76
Download to read offline
Introduction to Conversational AI
Where Computational Linguistics (CL) meets Human-Computer Interaction (HCI)
1
Who has had a conversation with a machine?
2
Three examples:
Telephone hotline bots
Smart phone voice assistants
Social robots
3
- Speech is more than spoken text
- Only few aspects of talking can be automated
- Everyone, whatever your background, can
contribute to the field of Conversational AI
Lecture take-away:
4
What is Conversational AI?
>>the task of building machine that can verbally interact with humans<<
➢ Accelerated through advances in Automatic Speech Recognition (ASR) in
the 2010s
➢ Launch of Apple’s “Siri” voice assistant in 2011
➢ Google duplex demo in 2018
➢ Recent social robot hype: Sophia, Furhat, Pepper, Nao ..
5
Introduction: the rise of a new user interface
A new way to interact with computers
From…
Command line interface
to...
Graphical user interface (GUI)
to...
Voice user interface (VUI)
6
Introduction: the rise of a new user interface
The voice interface: the user perspective
- can be more intuitive, enabling people
to use the most natural means of
communication
- enable users to complete certain tasks
faster than by using a GUI
7
Introduction: the rise of a new user interface
The voice interface: the industry perspective
- improved user experience
- enable new products
- more personalized interaction with
technology
- reduce costs
8
Introduction: the rise of a new user interface
The voice interface: the research perspective
9
Introduction: the rise of a new user interface
- scientific challenges
basic and applied research on social interaction and communication need
- design challenges
new design language needed to implement this new type of interface
- technical challenges
how to build new tools for more robust/advanced speech processing
- ethnical and privacy issues
new ethnical and legal frameworks needed to regulate this powerful new and potentially invasive technology
Where CL meets HCI meets Conversation Analysis (CA)
➢ Challenge of the CL/HCI meeting point: the
discrepancy between what is known about talk and
what is technically possible
➢ Challenge of the HCI/CA meeting point: how can we
design and build voice technology informed by
Conversation Analysis - the science of talk.
10
Introduction: the rise of a new user interface
How does voice technology work?
11
Current state of the art
Source: https://www.theguardian.com/technology/2020/jan/11/why-do-we-gender-ai-voice-tech-firms-move-to-be-more-inclusive
A speech processing pipeline: ASR, NLU and TTS modules
12
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Automatic speech recognition (ASR), end-to-end
From hello … … … … … to “hello”
the word the string
13
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Automatic speech recognition (ASR), end-to-end
From soundwaves to spectrograms - from audio signal to graphical representation
14
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
“Printing” sound: the Fourier / Wavelet transform
Automatic speech recognition (ASR)
The Tchaikovsky problem:
Saying “Tchaikovsky” often ends up being transcribed as “chai coffee” or “shall I cough sky”
15
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Grapheme matching: H H E E E L L L O O O O O O
String matching: “hello”
Compare:
Grapheme matching: Y Y H H E E L L O O O O
String matching: ???
Text-to-speech (TTS)
“Reverse ASR”: from strings to soundwave
Some current issues: natural intonation, latency, no reflexive online attention to user behaviour
16
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Natural language understanding (NLU)
ASR input ---> process “meaning” ---> TTS output
17
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Natural language understanding (NLU)
ASR input ---> process “meaning” ---> TTS output
18
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Example:
“I want a pizza
with ham and
extra pineapple”
Natural language understanding (NLU)
ASR input ---> process “meaning” ---> TTS output
19
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Example:
“I want a pizza
with ham and
extra pineapple”
TakeOrder{
Type= pizza,
Topping: [ham],
Extra: [pineapple]
}
Natural language understanding (NLU)
ASR input ---> process “meaning” ---> TTS output
20
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Example:
“I want a pizza
with ham and
extra pineapple”
TakeOrder{
Type= pizza,
Topping: [ham],
Extra: [pineapple]
}
if TakeOrder=True:
say response[RepeatOrder]
else:
start_seq[TakeOrderFallback]
“Got it, your order is one
pizza with ham and extra
pineapple.”
Natural language understanding (NLU)
21
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Example: ...but what if the user instead says:
“I want a pizza “One pizza with that tasty ham and also pineapple, please”
with ham and
extra pineapple”
ASR input ---> process “meaning” ---> TTS output
Natural language understanding (NLU)
Rule based: define a set of rules to determine pizza type
Keyword spotting: “I want a pizza with ham and extra pineapple”
“One pizza with that tasty ham and also pineapple, please”
NLU processor: [intent][type][topping][topping type1][topping type 2]
22
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Natural language understanding (NLU)
Machine learning based: compute the rules based on data
Training dataset:
1 “I want a pizza with ham and extra pineapple”
2 “I want pizza and loads of gammon and pineapples”
3 “Give me one with ham and also pineapples”
4 “Pineapple ham pizza please”
5 “Ehh pineapple pen pizza please”
6 “I.. uhm.. I want a pizza with… ham and olives, no pineapples”
23
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Natural language understanding (NLU)
Machine learning based: compute the rules based on data
Training dataset:
1 “I want a pizza with ham and extra pineapple”
2 “I want pizza and loads of gammon and pineapples”
3 “Give me one with ham and also pineapples”
4 “Pineapple ham pizza please”
5 “Ehh pineapple pen pizza please”
6 “I.. uhm.. I want a pizza with… ham and olives, no pineapples”
24
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
“One pizza with
ham and
pineapple,
thanks”
User says:
Natural language understanding (NLU)
Machine learning based: compute the rules based on data
Training dataset:
1 “I want a pizza with ham and extra pineapple”
2 “I want pizza and loads of gammon and pineapples”
3 “Give me one with ham and also pineapples”
4 “Pineapple ham pizza please”
5 “Ehh pineapple pen pizza please”
6 “I.. uhm.. I want a pizza with… ham and olives, no pineapples”
25
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
Pizza-Order-Type classifier:
Probability Pizza types:
0.8 Ham+Pineapple
0.0 Anchovies
0.0 Seafood
0.2 Ham+Olives
“One pizza with
ham and
pineapple,
thanks”
User says:
Natural language understanding (NLU)
26
Current state of the art
Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
...but what if the user instead says:
“One pizza Hawaii please”
Natural language understanding (NLU)
...whether you define the rules for sorting or
compute them using machine learning,
in practice, NLU is..
- a sorting problem
- utterance to (intent) label mapping
- a state-transition computation
27
Current state of the art
Credit: https://twitter.com/nwilliams030/
Natural language understanding (NLU)
28
Current state of the art
Credit: https://towardsdatascience.com/architecture-overview-of-a-conversational-ai-chat-bot-4ef3dfefd52e?gi=165126e5cf47
Example architecture of a chatbot:
Chatbots as state machines:
Limits of Natural language understanding (NLU)
➢ A tool to build useful tools rather than understanding language in any
human sense
29
Current state of the art
Limits of Natural language understanding (NLU)
➢ A tool to build useful tools rather than understanding language in any
human sense
Image building a concierge bot…
30
Current state of the art
Limits of Natural language understanding (NLU)
➢ A tool to build useful tools rather than understanding language in any
human sense
Image building a concierge bot… then someone comes and puts a hat on it!
31
Current state of the art
Limits of Natural language understanding (NLU)
➢ A tool to build useful tools rather than understanding language in any
human sense
Image building a concierge bot… then someone comes and puts a hat on it!
➢ NLU tools are limited to specific use cases, don’t “scale up” well, imagine…
○ a generic model for politeness/rudeness?
○ a generic model of ‘common sense’?
○ a generic model of building rapport?
32
Current state of the art
Limits of Natural language understanding (NLU)
➢ A tool to build useful tools rather than understanding language in any
human sense
Image building a concierge bot… then someone comes and puts a hat on it!
➢ NLU tools are limited to specific use cases, don’t “scale up” well, imagine…
○ a generic model for politeness/rudeness?
○ a generic model of ‘common sense’?
○ a generic model of building rapport?
➢ Problem of situated interaction 33
Current state of the art
Limits of Natural language understanding (NLU)
➢ Problem of situated interaction: (see also ‘symbol grounding problem’)
Example: Image building a Pizza bot here...
...and in Italy
34
Current state of the art
Limits of Natural language understanding (NLU)
➢ Problem of situated interaction: (see also ‘symbol grounding problem’)
Example: Image building a Pizza bot here...
...and in Italy
Requires grounding the concept of “pizza” in local context;
Bots don’t generalize well, need localization to environment 35
Current state of the art
Some conceptual issues:
- Searle’s Chinese room argument
- the antrophomorphization of technology
36
Current state of the art
Some conceptual issues:
- Searle’s Chinese room argument
- the antrophomorphization of technology
There are plenty of engineering contests on mimicking human verbal behaviour:
Turing test, Winograd schema challenge, Loebner prize, Alexa prize etc.
37
Current state of the art
Some conceptual issues:
- Searle’s Chinese room argument
- the antrophomorphization of technology
There are plenty of engineering contests on mimicking human verbal behaviour:
Turing test, Winograd schema challenge, Loebner prize, Alexa prize etc.
Can we even “compute” talk?
- Turing vs Wittgenstein/Ryle debates (Wittgenstein 1953, Button et al. 1995,
Shanker 2002)
38
Current state of the art
Some conceptual issues:
- Searle’s Chinese room argument
- the antrophomorphization of technology
There are plenty of engineering contests on mimicking human verbal behaviour:
Turing test, Winograd schema challenge, Loebner prize, Alexa prize etc.
Can we even “compute” talk?
- Turing vs Wittgenstein/Ryle debates (Wittgenstein 1953, Button et al. 1995,
Shanker 2002)
Bottom line: No machine can currently participate in human social life in a
convincingly human-like way :(
Wittgenstein, L., 1953. Philosophische Untersuchungen. (Philosophical investigations).
Button, G., Lee, J.R., Coulter, J. and Sharrock, W., 1995. Computers, minds and conduct. Blackwell Publishers, Inc.
Shanker, S.G., 2002. Wittgenstein's Remarks on the Foundations of AI. Routledge. 39
Current state of the art
Some conceptual issues:
- Searle’s Chinese room argument
- the antrophomorphization of technology
There are plenty of engineering contests on mimicking human verbal behaviour:
Turing test, Winograd schema challenge, Loebner prize, Alexa prize etc.
Can we even “compute” talk?
- Turing vs Wittgenstein/Ryle debates (Wittgenstein 1953, Button et al. 1995,
Shanker 2002)
Bottom line: No machine can currently participate in human social life in a
convincingly human-like way :( … or wait… maybe it’s better this way
Wittgenstein, L., 1953. Philosophische Untersuchungen. (Philosophical investigations).
Button, G., Lee, J.R., Coulter, J. and Sharrock, W., 1995. Computers, minds and conduct. Blackwell Publishers, Inc.
Shanker, S.G., 2002. Wittgenstein's Remarks on the Foundations of AI. Routledge. 40
Current state of the art
Remember Eliza?
It is easy to build interactive software that “deceives” humans into believing it
pocesses human qualities or intelligence..
41
Current state of the art
Remember Eliza?
It is easy to build interactive software that “deceives” humans into believing it
pocesses human qualities or intelligence..
Weizenbaum himself on ELIZA: a PARODY of a therapist, a fake, a charade, aim
was to illustrate the LIMITS of computers, not their potential (Weizenbaum
1966)
Weizenbaum, J., 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Communications
of the ACM, 9(1), pp.36-45.
42
Current state of the art
Two fundamental problems:
How to formally represent units in talk?
How to formally represent “meaning” in talk?
43
Current state of the art
Formal representations of talk?
44
Current state of the art
How to represent talk in computer-readable formats:
Formal representations of talk?
How to represent talk in computer-readable formats:
45
Current state of the art
Movie subtitles
Formal representations of talk?
How to represent talk in computer-readable formats:
46
Current state of the art
Movie subtitles
Phonetic transcripts
(IPA, ARPAbet)
Formal representations of talk?
How to represent talk in computer-readable formats:
47
Current state of the art
"{'input':{'ssml':'<speak>The <say-as
interpret-as="characters">Welcome</say-as> standard
<break time="1s"/>to the
<sub alias="matrix">emph=1</sub> .</speak>',
'voice':{'languageCode':'en-us',
'name':'en-US-Standard-B',
'ssmlGender':'MALE'},'audioConfig':{'audioEncoding':'MP3'}
}"
Movie subtitles
Phonetic transcripts
(IPA, ARPAbet)
Transcripts+Mark-up language
(CHAT,XML, SSML)
Formal representations of talk?
Challenges: How to systematically represent…
48
Current state of the art
Formal representations of talk?
Challenges: How to systematically represent…
Non-lexical vocalizations?
49
Current state of the art
Formal representations of talk?
Challenges: How to systematically represent…
Non-lexical vocalizations?
Phonetic reductions?
50
Current state of the art
Formal representations of talk?
Challenges: How to systematically represent…
Non-lexical vocalizations?
Phonetic reductions?
What is noise and what is signal? Stretches, stress, pitch, intensity? Which
phonetic properties need to be included?
51
Current state of the art
Formal representations of talk?
Challenges: How to systematically represent…
Non-lexical vocalizations?
Phonetic reductions?
What is noise and what is signal? Stretches, stress, pitch, intensity? Which
phonetic properties need to be included?
Many subtle cues in social interaction are often not processed at all..
52
Current state of the art
Reducing talk to strings and tags:
Limits of representing talk are most visible on the
micro and macro ends of the unit spectrum
in linguistics. Consider...
53
Current state of the art
Reducing talk to strings and tags:
Limits of representing talk are most visible on the
micro and macro ends of the unit spectrum
in linguistics. Consider...
Phonetic/phonological units in talk:
“uhmm”, “err”, haha, <lip smack>(e.g. Keevallik and Ogden 2020)
54
Current state of the art
Reducing talk to strings and tags:
Limits of representing talk are most visible on the
micro and macro ends of the unit spectrum
in linguistics. Consider...
Phonetic/phonological units in talk:
“uhmm”, “err”, haha, <lip smack>(e.g. Keevallik and Ogden 2020)
Pragmatic units in talk:
55
Current state of the art
Reducing talk to strings and tags:
Limits of representing talk are most visible on the
micro and macro ends of the unit spectrum
in linguistics. Consider...
Phonetic/phonological units in talk:
“uhmm”, “err”, haha, <lip smack>(e.g. Keevallik and Ogden 2020)
Pragmatic units in talk:
???
56
Current state of the art
Reducing talk to strings and tags:
Limits of representing talk are most visible on the
micro and macro ends of the unit spectrum
in linguistics. Consider...
Phonetic/phonological units in talk:
“uhmm”, “err”, haha, <lip smack> (e.g. Keevallik and Ogden 2020)
Pragmatic units in talk:
Proposed units: turns, turn-constructional unit (TCU),
maybe clause
(The idea of ‘speech acts’ or what is an ‘intent’ is still hugely controversial
in the literature, see e.g. Button 1994)
Keevallik, L. and Ogden, R., 2020. Sounds on the Margins of Language at the Heart of Interaction. ROLSI, 53(1)
Button, G., 1994. What's wrong with speech-act theory. Computer Supported Cooperative Work (CSCW), 3(1), pp.39-42.
57
Current state of the art
Formal representations of “meaning”: some examples
58
Current state of the art
Formal representations of “meaning”: some examples
90s: Formal cognitive/“deep semantic” models of language
Lexical databases of formal representations of cognitive schemas, e.g. FrameNet (Baker er al. 1998)
59
Current state of the art
Formal representations of “meaning”: some examples
90s: Formal cognitive/“deep semantic” models of language
Lexical databases of formal representations of cognitive schemas, e.g. FrameNet (Baker er al. 1998)
00s: Formal speech act taxonomies
Formal models of what people are doing in talk, such as accusing, asserting, apologizing, building arguments and narratives (cf. DAMSL, Dialog Act ISO ( Core and
Allen 1997, Bunt et al. 2010))
60
Current state of the art
Formal representations of “meaning”: some examples
90s: Formal cognitive/“deep semantic” models of language
Lexical databases of formal representations of cognitive schemas, e.g. FrameNet (Baker er al. 1998)
00s: Formal speech act taxonomies
Formal models of what people are doing in talk, such as accusing, asserting, apologizing, building arguments and narratives (cf. DAMSL, Dialog Act ISO ( Core and
Allen 1997, Bunt et al. 2010))
10s: Statistical Language Models
Neural language models for dialog (”stochastic parrots” ): DialoGPT, ConveRT (Zhang 2020, Henderson et al. 2020, Bender et al 2021, Bender and Koller 2020)
61
Current state of the art
Formal representations of “meaning”: some examples
90s: Formal cognitive/“deep semantic” models of language
Lexical databases of formal representations of cognitive schemas, e.g. FrameNet (Baker er al. 1998)
00s: Formal speech act taxonomies
Formal models of what people are doing in talk, such as accusing, asserting, apologizing, building arguments and narratives (cf. DAMSL, Dialog Act ISO ( Core and
Allen 1997, Bunt et al. 2010))
10s: Statistical Language Models
Neural language models for dialog (”stochastic parrots” ): DialoGPT, ConveRT (Zhang 2020, Henderson et al. 2020, Bender et al 2021, Bender and Koller 2020)
No formal framework can represent meaning in natural conversation. The flexibility of language-in-interaction
and in-situ meaning-making practices continues to pose a critical challenge in computational lingustics and NLP.
62
Current state of the art
Baker, C.F., Fillmore, C.J. and Lowe, J.B., 1998, August. The berkeley framenet project. In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1 (pp. 86-90).
Bender, E.M. and Koller, A., 2020, July. Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5185-5198).
Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S., 2021. On the dangers of stochastic parrots: Can language models be too big. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency; Association for Computing
Machinery: New York, NY, USA.
Bunt, H., Alexandersson, J., Carletta, J., Choe, J.W., Fang, A.C., Hasida, K., Lee, K., Petukhova, V., Popescu-Belis, A., Romary, L. and Soria, C., 2010, May. Towards an ISO standard for dialogue act annotation. In Seventh conference on International
Language Resources and Evaluation (LREC'10).
Core, M.G. and Allen, J., 1997, November. Coding dialogs with the DAMSL annotation scheme. In AAAI fall symposium on communicative action in humans and machines (Vol. 56, pp. 28-35).
Henderson, M., Casanueva, I., Mrkšić, N., Su, P.H., Wen, T.H. and Vulić, I., 2020, November. ConveRT: Efficient and Accurate Conversational Representations from Transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural
Language Processing: Findings (pp. 2161-2174).
Zhang, Y., Sun, S., Galley, M., Chen, Y.C., Brockett, C., Gao, X., Gao, J., Liu, J. and Dolan, W.B., 2020, July. DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation. In Proceedings of the 58th Annual Meeting of the
Association for Computational Linguistics: System Demonstrations (pp. 270-278).
The limits of automation: from NLP to HCI
What cannot be formalized, cannot be automated...
...and has to be built by-hand.
63
Human-computer interaction
The limits of automation: from NLP to HCI
What cannot be formalized, cannot be automated...
...and has to be built by-hand.
64
Human-computer interaction
This is where conversation design / interaction design
comes in.
Conversation design (CD):
CD is a design discipline derived from UI/UX design and copywriting
- towards better (more useful / more human-like) conversations with bots, make
human-bot interaction more successful
CD is a design language based on human conversation (similar to how material
design is a design language based on pen and paper)
65
Human-computer interaction
Conversation design (CD):
CD is a design discipline derived from UI/UX design and copywriting
- towards better (more useful / more human-like) conversations with bots, make
human-bot interaction more successful
CD is a design language based on human conversation (similar to how material
design is a design language based on pen and paper)
CD is the practical solution for the many “scaling problems” that beset automation
in Conversational AI
66
Human-computer interaction
Build dialogs Which voice? Which bot persona:
67
Human-computer interaction
Conversation design (CD) for voice bot development
Design for voice user interfaces (VUI) in voice bots:
Range of NLU software: Google dialogflow, AmazonAlexa skills, wit.ai, Rasa...
Part of chatbot development teams
68
Human-computer interaction
Summary: Talking machines?
69
Summary
Summary: Talking machines?
Not happening.. but
While technical, conceptual and practical challenges remain...
...current voice interfaces are a significant technological achievements and
enable useful products
70
Summary
Summary: The Conversational AI industry
Conversational AI engineer: develop and implement tools related to
speech processing / NLU pipelines, push the limits of what’s technically
possible
Conversation designer: design and prototype human-bot interaction,
improve user experience, examine dialog flows and error sources, voice
branding, persona design...
71
Summary
Ongoing research at PolyU
★ Empirical and experimental studies: representations of conversations,
coordination and prediction for turn-taking (using Furhat, CoBra network, forthcoming)
★ Technical studies: moving away from intent-based NLU (cf. Sankar et al 2019)
★ Bot skill development for Chinese Conversational AI (Liesenfeld and Huang 2020)
★ Design studies: Conversation design for Chinese (collab. with School of Design)
If working with Social robots and Conversational AI has peaked your interest,
there are opportunities for students to join such current projects at PolyU :)
72
Summary
Sankar, C., Subramanian, S., Pal, C., Chandar, S. and Bengio, Y., 2019. Do Neural Dialog Systems Use the Conversation History Effectively? An
Empirical Study. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 32-37).
Liesenfeld, A. and Huang, C.R., 2020, NameSpec asks: What's Your Name in Chinese? A Voice Bot to Specify Chinese Personal Names through
Dialog. In Proceedings of the 2nd Conference on Conversational User Interfaces, July 20-24..
Now lets get building!
73
Practicum: build your own bot
Build an interactive bot:
Use “Rasa playground” to build a useful bot
Put your knowledge of NLU to use:
- define training data, intents, entities, responses and dialog states
Play the role of both a conversation engineer and designer:
- Design, build and test your bot
- Play around with different conversational styles and different dialog flows
74
Practicum: build your own bot
Details: build a bot that automates pizza orders
➔ the bot should understand requests for 3 different types of pizza
➔ The bot should have at least three states:
◆ “return greeting”
◆ “take and confirm pizza order”
◆ “say goodbye”
➔ Define training data, intents and entities for each state
➔ Train and test your bot
75
Practicum: build your own bot
Today’s lecture aimed to provide you with:
➔ a sense of design and implementation techniques for
conversational agents
➔ an awareness of current research and emerging issues in
the field of Conversational AI
➔ a introduction to the practical skills involved in building
working conversational interfaces
76

More Related Content

Similar to Intro-lecture.pdf

Open source software for startups
Open source software for startupsOpen source software for startups
Open source software for startups
victorneo
 
Open Source in Higher Education 2007
Open Source in Higher Education 2007Open Source in Higher Education 2007
Open Source in Higher Education 2007
ssorden
 
Building an Open Source iOS app: lessons learned
Building an Open Source iOS app: lessons learnedBuilding an Open Source iOS app: lessons learned
Building an Open Source iOS app: lessons learned
Wojciech Koszek
 
Open Design @ Tec Guadalajara - Mexico - 23/08/2011
Open Design @ Tec Guadalajara - Mexico - 23/08/2011Open Design @ Tec Guadalajara - Mexico - 23/08/2011
Open Design @ Tec Guadalajara - Mexico - 23/08/2011
Massimo Menichinelli
 
Open Source Software For Education
Open Source Software For EducationOpen Source Software For Education
Open Source Software For Education
Videoguy
 

Similar to Intro-lecture.pdf (20)

Open Design Communities - MAKlab Glasgow (UK) 16/09/2011
Open Design Communities - MAKlab Glasgow (UK) 16/09/2011Open Design Communities - MAKlab Glasgow (UK) 16/09/2011
Open Design Communities - MAKlab Glasgow (UK) 16/09/2011
 
Open source software for startups
Open source software for startupsOpen source software for startups
Open source software for startups
 
Open Source for Women / Girl Geeks
Open Source for Women / Girl GeeksOpen Source for Women / Girl Geeks
Open Source for Women / Girl Geeks
 
Open Source in Higher Education 2007
Open Source in Higher Education 2007Open Source in Higher Education 2007
Open Source in Higher Education 2007
 
Accelerating DevOps with ChatOps
Accelerating DevOps with ChatOpsAccelerating DevOps with ChatOps
Accelerating DevOps with ChatOps
 
NetBeans 6.5
NetBeans 6.5NetBeans 6.5
NetBeans 6.5
 
Containers, Serverless, Polyglot Development World, And Others…10 trends resh...
Containers, Serverless, Polyglot Development World, And Others…10 trends resh...Containers, Serverless, Polyglot Development World, And Others…10 trends resh...
Containers, Serverless, Polyglot Development World, And Others…10 trends resh...
 
Web2.0 2012 - lesson 7 - technologies and mashups
Web2.0 2012 - lesson 7 - technologies and mashups Web2.0 2012 - lesson 7 - technologies and mashups
Web2.0 2012 - lesson 7 - technologies and mashups
 
Building an Open Source iOS app: lessons learned
Building an Open Source iOS app: lessons learnedBuilding an Open Source iOS app: lessons learned
Building an Open Source iOS app: lessons learned
 
Open Design @ Tec Guadalajara - Mexico - 23/08/2011
Open Design @ Tec Guadalajara - Mexico - 23/08/2011Open Design @ Tec Guadalajara - Mexico - 23/08/2011
Open Design @ Tec Guadalajara - Mexico - 23/08/2011
 
Open Source Software For Education
Open Source Software For EducationOpen Source Software For Education
Open Source Software For Education
 
Codemotion Berlin 2015 recap
Codemotion Berlin 2015   recapCodemotion Berlin 2015   recap
Codemotion Berlin 2015 recap
 
Reproducible Science and Deep Software Variability
Reproducible Science and Deep Software VariabilityReproducible Science and Deep Software Variability
Reproducible Science and Deep Software Variability
 
Puppet for SysAdmins
Puppet for SysAdminsPuppet for SysAdmins
Puppet for SysAdmins
 
NTU Workshop: 01 What Is Open Design
NTU Workshop: 01 What Is Open DesignNTU Workshop: 01 What Is Open Design
NTU Workshop: 01 What Is Open Design
 
EdTechJoker Spring 2020 - Lecture 10 HAXTheWeb
EdTechJoker Spring 2020 - Lecture 10 HAXTheWebEdTechJoker Spring 2020 - Lecture 10 HAXTheWeb
EdTechJoker Spring 2020 - Lecture 10 HAXTheWeb
 
Enjoying the full stack - Frontend 2010
Enjoying the full stack - Frontend 2010Enjoying the full stack - Frontend 2010
Enjoying the full stack - Frontend 2010
 
Open Content Library Guadec 2007
Open Content Library Guadec 2007Open Content Library Guadec 2007
Open Content Library Guadec 2007
 
Py4 inf 01-intro
Py4 inf 01-introPy4 inf 01-intro
Py4 inf 01-intro
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 

Recently uploaded

VIP Call Girls Dharwad 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Dharwad 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Dharwad 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Dharwad 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Escorts Service Sanjay Nagar ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Escorts Service Sanjay Nagar ☎ 7737669865☎ Book Your One night Stand (Bangalore)Escorts Service Sanjay Nagar ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Escorts Service Sanjay Nagar ☎ 7737669865☎ Book Your One night Stand (Bangalore)
amitlee9823
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In Yusuf Sarai ≼🔝 Delhi door step delevry≼🔝
Call Now ≽ 9953056974 ≼🔝 Call Girls In Yusuf Sarai ≼🔝 Delhi door step delevry≼🔝Call Now ≽ 9953056974 ≼🔝 Call Girls In Yusuf Sarai ≼🔝 Delhi door step delevry≼🔝
Call Now ≽ 9953056974 ≼🔝 Call Girls In Yusuf Sarai ≼🔝 Delhi door step delevry≼🔝
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Escorts Service Arekere ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Escorts Service Arekere ☎ 7737669865☎ Book Your One night Stand (Bangalore)Escorts Service Arekere ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Escorts Service Arekere ☎ 7737669865☎ Book Your One night Stand (Bangalore)
amitlee9823
 
CHEAP Call Girls in Hauz Quazi (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Hauz Quazi  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Hauz Quazi  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Hauz Quazi (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CHEAP Call Girls in Mayapuri (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Mayapuri  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Mayapuri  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Mayapuri (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CHEAP Call Girls in Ashok Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Ashok Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Ashok Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Ashok Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men 🔝Vijayawada🔝 E...
➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men  🔝Vijayawada🔝   E...➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men  🔝Vijayawada🔝   E...
➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men 🔝Vijayawada🔝 E...
amitlee9823
 
Call Girls Banashankari Just Call 👗 7737669865 👗 Top Class Call Girl Service ...
Call Girls Banashankari Just Call 👗 7737669865 👗 Top Class Call Girl Service ...Call Girls Banashankari Just Call 👗 7737669865 👗 Top Class Call Girl Service ...
Call Girls Banashankari Just Call 👗 7737669865 👗 Top Class Call Girl Service ...
amitlee9823
 
Vip Mumbai Call Girls Kalyan Call On 9920725232 With Body to body massage wit...
Vip Mumbai Call Girls Kalyan Call On 9920725232 With Body to body massage wit...Vip Mumbai Call Girls Kalyan Call On 9920725232 With Body to body massage wit...
Vip Mumbai Call Girls Kalyan Call On 9920725232 With Body to body massage wit...
amitlee9823
 

Recently uploaded (20)

Book Paid Lohegaon Call Girls Pune 8250192130Low Budget Full Independent High...
Book Paid Lohegaon Call Girls Pune 8250192130Low Budget Full Independent High...Book Paid Lohegaon Call Girls Pune 8250192130Low Budget Full Independent High...
Book Paid Lohegaon Call Girls Pune 8250192130Low Budget Full Independent High...
 
VIP Call Girls Dharwad 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Dharwad 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Dharwad 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Dharwad 7001035870 Whatsapp Number, 24/07 Booking
 
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
 
Call Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In RT Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Escorts Service Sanjay Nagar ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Escorts Service Sanjay Nagar ☎ 7737669865☎ Book Your One night Stand (Bangalore)Escorts Service Sanjay Nagar ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Escorts Service Sanjay Nagar ☎ 7737669865☎ Book Your One night Stand (Bangalore)
 
Call Girls Kothrud Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Kothrud Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Kothrud Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Kothrud Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In Yusuf Sarai ≼🔝 Delhi door step delevry≼🔝
Call Now ≽ 9953056974 ≼🔝 Call Girls In Yusuf Sarai ≼🔝 Delhi door step delevry≼🔝Call Now ≽ 9953056974 ≼🔝 Call Girls In Yusuf Sarai ≼🔝 Delhi door step delevry≼🔝
Call Now ≽ 9953056974 ≼🔝 Call Girls In Yusuf Sarai ≼🔝 Delhi door step delevry≼🔝
 
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
 
Escorts Service Arekere ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Escorts Service Arekere ☎ 7737669865☎ Book Your One night Stand (Bangalore)Escorts Service Arekere ☎ 7737669865☎ Book Your One night Stand (Bangalore)
Escorts Service Arekere ☎ 7737669865☎ Book Your One night Stand (Bangalore)
 
CHEAP Call Girls in Hauz Quazi (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Hauz Quazi  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Hauz Quazi  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Hauz Quazi (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
CHEAP Call Girls in Mayapuri (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Mayapuri  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Mayapuri  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Mayapuri (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Girls Pimple Saudagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Pimple Saudagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Pimple Saudagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Pimple Saudagar Call Me 7737669865 Budget Friendly No Advance Booking
 
CHEAP Call Girls in Ashok Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Ashok Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Ashok Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Ashok Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men 🔝Vijayawada🔝 E...
➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men  🔝Vijayawada🔝   E...➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men  🔝Vijayawada🔝   E...
➥🔝 7737669865 🔝▻ Vijayawada Call-girls in Women Seeking Men 🔝Vijayawada🔝 E...
 
Call Girls Banashankari Just Call 👗 7737669865 👗 Top Class Call Girl Service ...
Call Girls Banashankari Just Call 👗 7737669865 👗 Top Class Call Girl Service ...Call Girls Banashankari Just Call 👗 7737669865 👗 Top Class Call Girl Service ...
Call Girls Banashankari Just Call 👗 7737669865 👗 Top Class Call Girl Service ...
 
Introduction-to-4x4-SRAM-Memory-Block.pptx
Introduction-to-4x4-SRAM-Memory-Block.pptxIntroduction-to-4x4-SRAM-Memory-Block.pptx
Introduction-to-4x4-SRAM-Memory-Block.pptx
 
(INDIRA) Call Girl Napur Call Now 8617697112 Napur Escorts 24x7
(INDIRA) Call Girl Napur Call Now 8617697112 Napur Escorts 24x7(INDIRA) Call Girl Napur Call Now 8617697112 Napur Escorts 24x7
(INDIRA) Call Girl Napur Call Now 8617697112 Napur Escorts 24x7
 
Vip Mumbai Call Girls Kalyan Call On 9920725232 With Body to body massage wit...
Vip Mumbai Call Girls Kalyan Call On 9920725232 With Body to body massage wit...Vip Mumbai Call Girls Kalyan Call On 9920725232 With Body to body massage wit...
Vip Mumbai Call Girls Kalyan Call On 9920725232 With Body to body massage wit...
 
HLH PPT.ppt very important topic to discuss
HLH PPT.ppt very important topic to discussHLH PPT.ppt very important topic to discuss
HLH PPT.ppt very important topic to discuss
 

Intro-lecture.pdf

  • 1. Introduction to Conversational AI Where Computational Linguistics (CL) meets Human-Computer Interaction (HCI) 1
  • 2. Who has had a conversation with a machine? 2
  • 3. Three examples: Telephone hotline bots Smart phone voice assistants Social robots 3
  • 4. - Speech is more than spoken text - Only few aspects of talking can be automated - Everyone, whatever your background, can contribute to the field of Conversational AI Lecture take-away: 4
  • 5. What is Conversational AI? >>the task of building machine that can verbally interact with humans<< ➢ Accelerated through advances in Automatic Speech Recognition (ASR) in the 2010s ➢ Launch of Apple’s “Siri” voice assistant in 2011 ➢ Google duplex demo in 2018 ➢ Recent social robot hype: Sophia, Furhat, Pepper, Nao .. 5 Introduction: the rise of a new user interface
  • 6. A new way to interact with computers From… Command line interface to... Graphical user interface (GUI) to... Voice user interface (VUI) 6 Introduction: the rise of a new user interface
  • 7. The voice interface: the user perspective - can be more intuitive, enabling people to use the most natural means of communication - enable users to complete certain tasks faster than by using a GUI 7 Introduction: the rise of a new user interface
  • 8. The voice interface: the industry perspective - improved user experience - enable new products - more personalized interaction with technology - reduce costs 8 Introduction: the rise of a new user interface
  • 9. The voice interface: the research perspective 9 Introduction: the rise of a new user interface - scientific challenges basic and applied research on social interaction and communication need - design challenges new design language needed to implement this new type of interface - technical challenges how to build new tools for more robust/advanced speech processing - ethnical and privacy issues new ethnical and legal frameworks needed to regulate this powerful new and potentially invasive technology
  • 10. Where CL meets HCI meets Conversation Analysis (CA) ➢ Challenge of the CL/HCI meeting point: the discrepancy between what is known about talk and what is technically possible ➢ Challenge of the HCI/CA meeting point: how can we design and build voice technology informed by Conversation Analysis - the science of talk. 10 Introduction: the rise of a new user interface
  • 11. How does voice technology work? 11 Current state of the art Source: https://www.theguardian.com/technology/2020/jan/11/why-do-we-gender-ai-voice-tech-firms-move-to-be-more-inclusive
  • 12. A speech processing pipeline: ASR, NLU and TTS modules 12 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
  • 13. Automatic speech recognition (ASR), end-to-end From hello … … … … … to “hello” the word the string 13 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
  • 14. Automatic speech recognition (ASR), end-to-end From soundwaves to spectrograms - from audio signal to graphical representation 14 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog “Printing” sound: the Fourier / Wavelet transform
  • 15. Automatic speech recognition (ASR) The Tchaikovsky problem: Saying “Tchaikovsky” often ends up being transcribed as “chai coffee” or “shall I cough sky” 15 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog Grapheme matching: H H E E E L L L O O O O O O String matching: “hello” Compare: Grapheme matching: Y Y H H E E L L O O O O String matching: ???
  • 16. Text-to-speech (TTS) “Reverse ASR”: from strings to soundwave Some current issues: natural intonation, latency, no reflexive online attention to user behaviour 16 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
  • 17. Natural language understanding (NLU) ASR input ---> process “meaning” ---> TTS output 17 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
  • 18. Natural language understanding (NLU) ASR input ---> process “meaning” ---> TTS output 18 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog Example: “I want a pizza with ham and extra pineapple”
  • 19. Natural language understanding (NLU) ASR input ---> process “meaning” ---> TTS output 19 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog Example: “I want a pizza with ham and extra pineapple” TakeOrder{ Type= pizza, Topping: [ham], Extra: [pineapple] }
  • 20. Natural language understanding (NLU) ASR input ---> process “meaning” ---> TTS output 20 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog Example: “I want a pizza with ham and extra pineapple” TakeOrder{ Type= pizza, Topping: [ham], Extra: [pineapple] } if TakeOrder=True: say response[RepeatOrder] else: start_seq[TakeOrderFallback] “Got it, your order is one pizza with ham and extra pineapple.”
  • 21. Natural language understanding (NLU) 21 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog Example: ...but what if the user instead says: “I want a pizza “One pizza with that tasty ham and also pineapple, please” with ham and extra pineapple” ASR input ---> process “meaning” ---> TTS output
  • 22. Natural language understanding (NLU) Rule based: define a set of rules to determine pizza type Keyword spotting: “I want a pizza with ham and extra pineapple” “One pizza with that tasty ham and also pineapple, please” NLU processor: [intent][type][topping][topping type1][topping type 2] 22 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
  • 23. Natural language understanding (NLU) Machine learning based: compute the rules based on data Training dataset: 1 “I want a pizza with ham and extra pineapple” 2 “I want pizza and loads of gammon and pineapples” 3 “Give me one with ham and also pineapples” 4 “Pineapple ham pizza please” 5 “Ehh pineapple pen pizza please” 6 “I.. uhm.. I want a pizza with… ham and olives, no pineapples” 23 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog
  • 24. Natural language understanding (NLU) Machine learning based: compute the rules based on data Training dataset: 1 “I want a pizza with ham and extra pineapple” 2 “I want pizza and loads of gammon and pineapples” 3 “Give me one with ham and also pineapples” 4 “Pineapple ham pizza please” 5 “Ehh pineapple pen pizza please” 6 “I.. uhm.. I want a pizza with… ham and olives, no pineapples” 24 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog “One pizza with ham and pineapple, thanks” User says:
  • 25. Natural language understanding (NLU) Machine learning based: compute the rules based on data Training dataset: 1 “I want a pizza with ham and extra pineapple” 2 “I want pizza and loads of gammon and pineapples” 3 “Give me one with ham and also pineapples” 4 “Pineapple ham pizza please” 5 “Ehh pineapple pen pizza please” 6 “I.. uhm.. I want a pizza with… ham and olives, no pineapples” 25 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog Pizza-Order-Type classifier: Probability Pizza types: 0.8 Ham+Pineapple 0.0 Anchovies 0.0 Seafood 0.2 Ham+Olives “One pizza with ham and pineapple, thanks” User says:
  • 26. Natural language understanding (NLU) 26 Current state of the art Credits: https://liesenf.github.io/action-formation-models; https://developer.nvidia.com/conversational-ai; https://furhatrobotics.com/blog ...but what if the user instead says: “One pizza Hawaii please”
  • 27. Natural language understanding (NLU) ...whether you define the rules for sorting or compute them using machine learning, in practice, NLU is.. - a sorting problem - utterance to (intent) label mapping - a state-transition computation 27 Current state of the art Credit: https://twitter.com/nwilliams030/
  • 28. Natural language understanding (NLU) 28 Current state of the art Credit: https://towardsdatascience.com/architecture-overview-of-a-conversational-ai-chat-bot-4ef3dfefd52e?gi=165126e5cf47 Example architecture of a chatbot: Chatbots as state machines:
  • 29. Limits of Natural language understanding (NLU) ➢ A tool to build useful tools rather than understanding language in any human sense 29 Current state of the art
  • 30. Limits of Natural language understanding (NLU) ➢ A tool to build useful tools rather than understanding language in any human sense Image building a concierge bot… 30 Current state of the art
  • 31. Limits of Natural language understanding (NLU) ➢ A tool to build useful tools rather than understanding language in any human sense Image building a concierge bot… then someone comes and puts a hat on it! 31 Current state of the art
  • 32. Limits of Natural language understanding (NLU) ➢ A tool to build useful tools rather than understanding language in any human sense Image building a concierge bot… then someone comes and puts a hat on it! ➢ NLU tools are limited to specific use cases, don’t “scale up” well, imagine… ○ a generic model for politeness/rudeness? ○ a generic model of ‘common sense’? ○ a generic model of building rapport? 32 Current state of the art
  • 33. Limits of Natural language understanding (NLU) ➢ A tool to build useful tools rather than understanding language in any human sense Image building a concierge bot… then someone comes and puts a hat on it! ➢ NLU tools are limited to specific use cases, don’t “scale up” well, imagine… ○ a generic model for politeness/rudeness? ○ a generic model of ‘common sense’? ○ a generic model of building rapport? ➢ Problem of situated interaction 33 Current state of the art
  • 34. Limits of Natural language understanding (NLU) ➢ Problem of situated interaction: (see also ‘symbol grounding problem’) Example: Image building a Pizza bot here... ...and in Italy 34 Current state of the art
  • 35. Limits of Natural language understanding (NLU) ➢ Problem of situated interaction: (see also ‘symbol grounding problem’) Example: Image building a Pizza bot here... ...and in Italy Requires grounding the concept of “pizza” in local context; Bots don’t generalize well, need localization to environment 35 Current state of the art
  • 36. Some conceptual issues: - Searle’s Chinese room argument - the antrophomorphization of technology 36 Current state of the art
  • 37. Some conceptual issues: - Searle’s Chinese room argument - the antrophomorphization of technology There are plenty of engineering contests on mimicking human verbal behaviour: Turing test, Winograd schema challenge, Loebner prize, Alexa prize etc. 37 Current state of the art
  • 38. Some conceptual issues: - Searle’s Chinese room argument - the antrophomorphization of technology There are plenty of engineering contests on mimicking human verbal behaviour: Turing test, Winograd schema challenge, Loebner prize, Alexa prize etc. Can we even “compute” talk? - Turing vs Wittgenstein/Ryle debates (Wittgenstein 1953, Button et al. 1995, Shanker 2002) 38 Current state of the art
  • 39. Some conceptual issues: - Searle’s Chinese room argument - the antrophomorphization of technology There are plenty of engineering contests on mimicking human verbal behaviour: Turing test, Winograd schema challenge, Loebner prize, Alexa prize etc. Can we even “compute” talk? - Turing vs Wittgenstein/Ryle debates (Wittgenstein 1953, Button et al. 1995, Shanker 2002) Bottom line: No machine can currently participate in human social life in a convincingly human-like way :( Wittgenstein, L., 1953. Philosophische Untersuchungen. (Philosophical investigations). Button, G., Lee, J.R., Coulter, J. and Sharrock, W., 1995. Computers, minds and conduct. Blackwell Publishers, Inc. Shanker, S.G., 2002. Wittgenstein's Remarks on the Foundations of AI. Routledge. 39 Current state of the art
  • 40. Some conceptual issues: - Searle’s Chinese room argument - the antrophomorphization of technology There are plenty of engineering contests on mimicking human verbal behaviour: Turing test, Winograd schema challenge, Loebner prize, Alexa prize etc. Can we even “compute” talk? - Turing vs Wittgenstein/Ryle debates (Wittgenstein 1953, Button et al. 1995, Shanker 2002) Bottom line: No machine can currently participate in human social life in a convincingly human-like way :( … or wait… maybe it’s better this way Wittgenstein, L., 1953. Philosophische Untersuchungen. (Philosophical investigations). Button, G., Lee, J.R., Coulter, J. and Sharrock, W., 1995. Computers, minds and conduct. Blackwell Publishers, Inc. Shanker, S.G., 2002. Wittgenstein's Remarks on the Foundations of AI. Routledge. 40 Current state of the art
  • 41. Remember Eliza? It is easy to build interactive software that “deceives” humans into believing it pocesses human qualities or intelligence.. 41 Current state of the art
  • 42. Remember Eliza? It is easy to build interactive software that “deceives” humans into believing it pocesses human qualities or intelligence.. Weizenbaum himself on ELIZA: a PARODY of a therapist, a fake, a charade, aim was to illustrate the LIMITS of computers, not their potential (Weizenbaum 1966) Weizenbaum, J., 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), pp.36-45. 42 Current state of the art
  • 43. Two fundamental problems: How to formally represent units in talk? How to formally represent “meaning” in talk? 43 Current state of the art
  • 44. Formal representations of talk? 44 Current state of the art How to represent talk in computer-readable formats:
  • 45. Formal representations of talk? How to represent talk in computer-readable formats: 45 Current state of the art Movie subtitles
  • 46. Formal representations of talk? How to represent talk in computer-readable formats: 46 Current state of the art Movie subtitles Phonetic transcripts (IPA, ARPAbet)
  • 47. Formal representations of talk? How to represent talk in computer-readable formats: 47 Current state of the art "{'input':{'ssml':'<speak>The <say-as interpret-as="characters">Welcome</say-as> standard <break time="1s"/>to the <sub alias="matrix">emph=1</sub> .</speak>', 'voice':{'languageCode':'en-us', 'name':'en-US-Standard-B', 'ssmlGender':'MALE'},'audioConfig':{'audioEncoding':'MP3'} }" Movie subtitles Phonetic transcripts (IPA, ARPAbet) Transcripts+Mark-up language (CHAT,XML, SSML)
  • 48. Formal representations of talk? Challenges: How to systematically represent… 48 Current state of the art
  • 49. Formal representations of talk? Challenges: How to systematically represent… Non-lexical vocalizations? 49 Current state of the art
  • 50. Formal representations of talk? Challenges: How to systematically represent… Non-lexical vocalizations? Phonetic reductions? 50 Current state of the art
  • 51. Formal representations of talk? Challenges: How to systematically represent… Non-lexical vocalizations? Phonetic reductions? What is noise and what is signal? Stretches, stress, pitch, intensity? Which phonetic properties need to be included? 51 Current state of the art
  • 52. Formal representations of talk? Challenges: How to systematically represent… Non-lexical vocalizations? Phonetic reductions? What is noise and what is signal? Stretches, stress, pitch, intensity? Which phonetic properties need to be included? Many subtle cues in social interaction are often not processed at all.. 52 Current state of the art
  • 53. Reducing talk to strings and tags: Limits of representing talk are most visible on the micro and macro ends of the unit spectrum in linguistics. Consider... 53 Current state of the art
  • 54. Reducing talk to strings and tags: Limits of representing talk are most visible on the micro and macro ends of the unit spectrum in linguistics. Consider... Phonetic/phonological units in talk: “uhmm”, “err”, haha, <lip smack>(e.g. Keevallik and Ogden 2020) 54 Current state of the art
  • 55. Reducing talk to strings and tags: Limits of representing talk are most visible on the micro and macro ends of the unit spectrum in linguistics. Consider... Phonetic/phonological units in talk: “uhmm”, “err”, haha, <lip smack>(e.g. Keevallik and Ogden 2020) Pragmatic units in talk: 55 Current state of the art
  • 56. Reducing talk to strings and tags: Limits of representing talk are most visible on the micro and macro ends of the unit spectrum in linguistics. Consider... Phonetic/phonological units in talk: “uhmm”, “err”, haha, <lip smack>(e.g. Keevallik and Ogden 2020) Pragmatic units in talk: ??? 56 Current state of the art
  • 57. Reducing talk to strings and tags: Limits of representing talk are most visible on the micro and macro ends of the unit spectrum in linguistics. Consider... Phonetic/phonological units in talk: “uhmm”, “err”, haha, <lip smack> (e.g. Keevallik and Ogden 2020) Pragmatic units in talk: Proposed units: turns, turn-constructional unit (TCU), maybe clause (The idea of ‘speech acts’ or what is an ‘intent’ is still hugely controversial in the literature, see e.g. Button 1994) Keevallik, L. and Ogden, R., 2020. Sounds on the Margins of Language at the Heart of Interaction. ROLSI, 53(1) Button, G., 1994. What's wrong with speech-act theory. Computer Supported Cooperative Work (CSCW), 3(1), pp.39-42. 57 Current state of the art
  • 58. Formal representations of “meaning”: some examples 58 Current state of the art
  • 59. Formal representations of “meaning”: some examples 90s: Formal cognitive/“deep semantic” models of language Lexical databases of formal representations of cognitive schemas, e.g. FrameNet (Baker er al. 1998) 59 Current state of the art
  • 60. Formal representations of “meaning”: some examples 90s: Formal cognitive/“deep semantic” models of language Lexical databases of formal representations of cognitive schemas, e.g. FrameNet (Baker er al. 1998) 00s: Formal speech act taxonomies Formal models of what people are doing in talk, such as accusing, asserting, apologizing, building arguments and narratives (cf. DAMSL, Dialog Act ISO ( Core and Allen 1997, Bunt et al. 2010)) 60 Current state of the art
  • 61. Formal representations of “meaning”: some examples 90s: Formal cognitive/“deep semantic” models of language Lexical databases of formal representations of cognitive schemas, e.g. FrameNet (Baker er al. 1998) 00s: Formal speech act taxonomies Formal models of what people are doing in talk, such as accusing, asserting, apologizing, building arguments and narratives (cf. DAMSL, Dialog Act ISO ( Core and Allen 1997, Bunt et al. 2010)) 10s: Statistical Language Models Neural language models for dialog (”stochastic parrots” ): DialoGPT, ConveRT (Zhang 2020, Henderson et al. 2020, Bender et al 2021, Bender and Koller 2020) 61 Current state of the art
  • 62. Formal representations of “meaning”: some examples 90s: Formal cognitive/“deep semantic” models of language Lexical databases of formal representations of cognitive schemas, e.g. FrameNet (Baker er al. 1998) 00s: Formal speech act taxonomies Formal models of what people are doing in talk, such as accusing, asserting, apologizing, building arguments and narratives (cf. DAMSL, Dialog Act ISO ( Core and Allen 1997, Bunt et al. 2010)) 10s: Statistical Language Models Neural language models for dialog (”stochastic parrots” ): DialoGPT, ConveRT (Zhang 2020, Henderson et al. 2020, Bender et al 2021, Bender and Koller 2020) No formal framework can represent meaning in natural conversation. The flexibility of language-in-interaction and in-situ meaning-making practices continues to pose a critical challenge in computational lingustics and NLP. 62 Current state of the art Baker, C.F., Fillmore, C.J. and Lowe, J.B., 1998, August. The berkeley framenet project. In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1 (pp. 86-90). Bender, E.M. and Koller, A., 2020, July. Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5185-5198). Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S., 2021. On the dangers of stochastic parrots: Can language models be too big. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency; Association for Computing Machinery: New York, NY, USA. Bunt, H., Alexandersson, J., Carletta, J., Choe, J.W., Fang, A.C., Hasida, K., Lee, K., Petukhova, V., Popescu-Belis, A., Romary, L. and Soria, C., 2010, May. Towards an ISO standard for dialogue act annotation. In Seventh conference on International Language Resources and Evaluation (LREC'10). Core, M.G. and Allen, J., 1997, November. Coding dialogs with the DAMSL annotation scheme. In AAAI fall symposium on communicative action in humans and machines (Vol. 56, pp. 28-35). Henderson, M., Casanueva, I., Mrkšić, N., Su, P.H., Wen, T.H. and Vulić, I., 2020, November. ConveRT: Efficient and Accurate Conversational Representations from Transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings (pp. 2161-2174). Zhang, Y., Sun, S., Galley, M., Chen, Y.C., Brockett, C., Gao, X., Gao, J., Liu, J. and Dolan, W.B., 2020, July. DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 270-278).
  • 63. The limits of automation: from NLP to HCI What cannot be formalized, cannot be automated... ...and has to be built by-hand. 63 Human-computer interaction
  • 64. The limits of automation: from NLP to HCI What cannot be formalized, cannot be automated... ...and has to be built by-hand. 64 Human-computer interaction This is where conversation design / interaction design comes in.
  • 65. Conversation design (CD): CD is a design discipline derived from UI/UX design and copywriting - towards better (more useful / more human-like) conversations with bots, make human-bot interaction more successful CD is a design language based on human conversation (similar to how material design is a design language based on pen and paper) 65 Human-computer interaction
  • 66. Conversation design (CD): CD is a design discipline derived from UI/UX design and copywriting - towards better (more useful / more human-like) conversations with bots, make human-bot interaction more successful CD is a design language based on human conversation (similar to how material design is a design language based on pen and paper) CD is the practical solution for the many “scaling problems” that beset automation in Conversational AI 66 Human-computer interaction
  • 67. Build dialogs Which voice? Which bot persona: 67 Human-computer interaction
  • 68. Conversation design (CD) for voice bot development Design for voice user interfaces (VUI) in voice bots: Range of NLU software: Google dialogflow, AmazonAlexa skills, wit.ai, Rasa... Part of chatbot development teams 68 Human-computer interaction
  • 70. Summary: Talking machines? Not happening.. but While technical, conceptual and practical challenges remain... ...current voice interfaces are a significant technological achievements and enable useful products 70 Summary
  • 71. Summary: The Conversational AI industry Conversational AI engineer: develop and implement tools related to speech processing / NLU pipelines, push the limits of what’s technically possible Conversation designer: design and prototype human-bot interaction, improve user experience, examine dialog flows and error sources, voice branding, persona design... 71 Summary
  • 72. Ongoing research at PolyU ★ Empirical and experimental studies: representations of conversations, coordination and prediction for turn-taking (using Furhat, CoBra network, forthcoming) ★ Technical studies: moving away from intent-based NLU (cf. Sankar et al 2019) ★ Bot skill development for Chinese Conversational AI (Liesenfeld and Huang 2020) ★ Design studies: Conversation design for Chinese (collab. with School of Design) If working with Social robots and Conversational AI has peaked your interest, there are opportunities for students to join such current projects at PolyU :) 72 Summary Sankar, C., Subramanian, S., Pal, C., Chandar, S. and Bengio, Y., 2019. Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 32-37). Liesenfeld, A. and Huang, C.R., 2020, NameSpec asks: What's Your Name in Chinese? A Voice Bot to Specify Chinese Personal Names through Dialog. In Proceedings of the 2nd Conference on Conversational User Interfaces, July 20-24..
  • 73. Now lets get building! 73 Practicum: build your own bot
  • 74. Build an interactive bot: Use “Rasa playground” to build a useful bot Put your knowledge of NLU to use: - define training data, intents, entities, responses and dialog states Play the role of both a conversation engineer and designer: - Design, build and test your bot - Play around with different conversational styles and different dialog flows 74 Practicum: build your own bot
  • 75. Details: build a bot that automates pizza orders ➔ the bot should understand requests for 3 different types of pizza ➔ The bot should have at least three states: ◆ “return greeting” ◆ “take and confirm pizza order” ◆ “say goodbye” ➔ Define training data, intents and entities for each state ➔ Train and test your bot 75 Practicum: build your own bot
  • 76. Today’s lecture aimed to provide you with: ➔ a sense of design and implementation techniques for conversational agents ➔ an awareness of current research and emerging issues in the field of Conversational AI ➔ a introduction to the practical skills involved in building working conversational interfaces 76