INTERACTION WITH AI AND AUTONOMOUS SYSTEMS – MODULE 2
Session 3, 2023
Asbjørn Følstad, SINTEF
Interaction with AI – module 2
Five sessions
2.1: Human-AI interaction and design
2.2: Social relations with AI-based
chatbots (Marita Skjuve)
2.3: Chatbot interaction and design
2.4: Interacting with generative AI
2.5: Trustworthy interaction with AI
Conventional systems vs. AI-infused systems:
Learning | Improving | Black box | Fuelled by large data sets
Dynamic | Mistakes inevitable | Opaque | Data gathering through interaction
Individual assignment – task 2:
Human-AI interaction design
• Amershi et al. (2019) and Kocielnik et al. (2019) discuss interaction design for AI-infused systems. Summarize the main takeaways from the two papers.
• Select two of the design guidelines in Amershi et al. (2019). Discuss how the AI-infused system you used as an example in the previous task adheres to, or deviates from, these two design guidelines. Briefly discuss whether/how these two design guidelines could inspire improvements in the example system.
• Bender et al. (2021) conduct a critical discussion of a specific type of AI-infused system – those based on large language models. Summarize their argument concerning problematic aspects of textual content and solutions based on large language models.
https://www.microsoft.com/en-us/haxtoolkit/ai-guidelines/
Google Maps - Timeline
Learning system – design for change
• M1: Make clear what the system can do
• M2: Make clear how well the system can do what it can do
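To make M1 and M2 concrete, here is a minimal illustrative sketch (not from the lecture or the HAX toolkit) of a chatbot welcome message that states up front what the bot can do and how well it does it. The task list, accuracy figure, and wording are all hypothetical.

```python
# Minimal sketch of Amershi et al.'s M1 ("make clear what the system can do")
# and M2 ("make clear how well the system can do what it can do").
# All task names, thresholds, and wording are hypothetical.

SUPPORTED_TASKS = ["check your balance", "block a card", "apply for a loan"]

def welcome_message(estimated_accuracy: float) -> str:
    # M1: state the system's capabilities explicitly up front.
    capabilities = ", ".join(SUPPORTED_TASKS)
    # M2: communicate expected quality in plain language instead of hiding it.
    if estimated_accuracy >= 0.95:
        quality_note = "I am usually accurate, but do double-check important details."
    else:
        quality_note = ("I answer most questions correctly, but I sometimes "
                        "misunderstand; please rephrase if my reply seems off.")
    return f"Hi! I can help you {capabilities}. {quality_note}"

print(welcome_message(estimated_accuracy=0.87))
```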
Mistakes inevitable – design for uncertainty
• M9: Support efficient correction
• M10: Scope services when in doubt
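A minimal hypothetical sketch of how M9 and M10 could play out in an intent-based chatbot: when intent confidence is low, the bot scopes its reply to a set of candidate intents instead of guessing, and each listed candidate doubles as a quick correction. The intent names and the confidence threshold are illustrative.

```python
# Hypothetical sketch of M10 ("scope services when in doubt") and
# M9 ("support efficient correction") in an intent-based chatbot.
# Intent names, replies, and the confidence threshold are illustrative.

INTENT_REPLIES = {
    "apply_loan": "Here is how to apply for a loan ...",
    "block_card": "To block your card, do the following ...",
}

def respond(ranked_intents, threshold=0.7):
    """ranked_intents: list of (intent, confidence) pairs, best first."""
    best_intent, confidence = ranked_intents[0]
    if confidence >= threshold:
        return INTENT_REPLIES.get(best_intent, "Sorry, I did not understand.")
    # M10: scope the service - offer candidates rather than risk a wrong answer.
    # M9: listing candidates lets the user correct the bot's guess in one step.
    options = " / ".join(intent for intent, _ in ranked_intents[:3])
    return f"I am not sure what you mean. Did you want one of these? {options}"

print(respond([("apply_loan", 0.55), ("block_card", 0.40)]))
```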
Difficult to understand and validate output – design for explainability
• M11: Make clear why the system did what it did
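As a minimal illustration of M11 (hypothetical, not from the lecture), a recommendation can be shown together with the top signals that produced it, for instance for a route recommendation. The signals and their weights are made up.

```python
# Hypothetical sketch of M11 ("make clear why the system did what it did"):
# show a recommendation together with the top signals behind it.
# The signals and their weights are made up for illustration.

def explain_recommendation(item: str, signals: dict, top_k: int = 2) -> str:
    # Pick the signals assumed to contribute most to the recommendation.
    top = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    reasons = " and ".join(name for name, _ in top)
    return f"Recommended: {item}, because of {reasons}."

print(explain_recommendation(
    "route via E6",
    {"typical traffic at this hour": 0.6,
     "roadworks on the alternative route": 0.3,
     "your past route choices": 0.1},
))
```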
Data wanted – design for data capture
• Accommodate gathering of data from users
• … but with concern for the risk of being gamed
• Make users benefit from data
• Design for privacy
https://www.technologyreview.com/s/610634/microsofts-neo-nazi-sexbot-was-a-great-lesson-for-makers-of-ai-assistants/
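A minimal hypothetical sketch of two of these points, consent-based data gathering and design for privacy: feedback is stored only with explicit consent, and the user identifier is pseudonymized before storage. The hashing scheme and field names are illustrative, not a complete privacy solution.

```python
# Hypothetical sketch of privacy-aware data capture: store user feedback
# only with explicit consent, and pseudonymize the identifier first.
# Field names are illustrative; hashing alone is not full anonymization.

import hashlib

def log_feedback(store: list, user_id: str, rating: int, consent: bool) -> None:
    if not consent:  # design for privacy: no consent, no data
        return
    pseudonym = hashlib.sha256(user_id.encode()).hexdigest()[:12]
    store.append({"user": pseudonym, "rating": rating})

events = []
log_feedback(events, user_id="alice@example.com", rating=4, consent=True)
log_feedback(events, user_id="bob@example.com", rating=1, consent=False)
print(events)  # only the consenting user's pseudonymized feedback is kept
```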
Practical application of the Amershi et al. (2019) design guidelines for human-AI interaction
https://www.microsoft.com/en-us/haxtoolkit/ai-guidelines/
Consider a specific AI-based service: ChatGPT. Which guidelines are, in your view, relevant for this service?
Discuss the service from the perspective of one of the relevant guidelines. How does it comply with the guideline? Could it be improved?
Go to design guidelines – online version.
Plenary reflections based on the group discussions
Høiland, C. (2019). "Hi, can I help?" Exploring how to design a mental health chatbot for youths. Human Technology, 16(2), 139-169.
Implications of chatbots
Conversation as design object
Necessary to move from UI design to service design
Necessary to design for networks of humans and bots
Example: SR-Bank chatbot with escalation to a human operator (sketched below)
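As a minimal illustration of such a human-bot service network, a hypothetical hand-over rule: when the bot fails to resolve the request within a couple of turns, the conversation escalates to a human operator. The threshold and names are made up, not SR-Bank's actual implementation.

```python
# Hypothetical sketch of chatbot-to-human escalation: after repeated
# low-confidence turns, hand the conversation to a human operator.
# The confidence threshold and turn limit are illustrative.

MAX_FAILED_TURNS = 2

def handle_turn(confidence: float, failed_turns: int):
    """Return the next action and the updated failed-turn count."""
    if confidence >= 0.7:
        return "bot_reply", failed_turns
    failed_turns += 1
    if failed_turns >= MAX_FAILED_TURNS:
        # Design for networks of humans and bots: route to a human operator,
        # ideally together with the conversation transcript.
        return "escalate_to_human", failed_turns
    return "ask_to_rephrase", failed_turns

action, fails = handle_turn(confidence=0.4, failed_turns=1)
print(action)  # -> escalate_to_human
```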
Different types of chatbots
• Scripted: mainly flows, useful e.g. for coaching or training
• Intents and actions: identifies the user's objective through AI, gives predefined replies
• Generative: question in free text, answer generated by a large language model such as OpenAI's GPT-3
Rule-based / scripted
+ Convey content in dialogue format
+ Quality assured dialogue
– General answers / limited personalization
– Limited set of answers
– Resource demanding to make good content

Intention-based
+ Easy to assess answer quality
+ Quality assured answers from knowledge base
– General answers / limited personalization
– Limited set of answers
– Adaptation resource demanding

Generative
+ Personal / unique answers
+ Answers (nearly) any question
– Difficult to assess answer quality
– Answers not quality assured / hallucinations
– New technology, lack of experience in resource demands for adaptation
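To make the contrast between the two AI-based types concrete, a hypothetical sketch: the intent-based bot maps free text to a predefined, quality-assured reply, while the generative bot hands the text to a language model and returns an unvetted, composed answer. The keyword "classifier" and the generation stub are placeholders, not real implementations.

```python
# Hypothetical sketch contrasting intent-based and generative chatbots.
# The keyword "classifier" and the LLM stub are illustrative placeholders.

PREDEFINED_REPLIES = {"apply_loan": "You can apply for a loan here: ..."}

def intent_based(text: str) -> str:
    # AI identifies the user's objective; the reply itself is predefined
    # and quality assured, but limited to the covered intents.
    intent = "apply_loan" if "loan" in text.lower() else "unknown"
    return PREDEFINED_REPLIES.get(intent, "Sorry, I did not understand.")

def generative(text: str) -> str:
    # A large language model composes a unique answer: personal and able to
    # answer (nearly) anything, but not quality assured (hallucination risk).
    return f"[LLM-generated answer to: {text!r}]"

print(intent_based("How do I apply for a loan?"))
print(generative("How do I apply for a loan?"))
```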
Chatbot human likeness
The Turing test as benchmark?
• Human likeness a telltale of artificial intelligence?
• Human likeness an appreciated characteristic of chatbots?
Dall-E: “Alan Turing looking puzzled, digital art”
Chatbot human likeness
The Turing test as benchmark?
Human likeness with impact on user experience and behaviour
Go, E., & Sundar, S. S. (2019). Humanizing chatbots: The effects of visual, identity and conversational cues on humanness perceptions. Computers in Human Behavior, 97, 304-316.
A humanlike chatbot is beneficial for attitude towards the service and intention to use – provided the chatbot can deliver on the higher user expectations that follow from human likeness.
Chatbot human likeness
The Turing test as benchmark?
Human likeness with impact on user experience and behaviour
… however, pragmatic motivation is key for chatbot use
Effect of humanlikeness
and conversational
performance on trust in
chatbot?
1. Are chatbots’ humanlike features as
important for trust as their ability to
reliably provide support?
2. Is trust in chatbots for customer service
dominated by their conversational
performance rather than their
humanlikeness?
Approach – online experiment
• Participants from Prolific
• Landing page with instructions
• Interaction with chatbot
(different conditions)
• Questionnaire
Method details
2×3 factorial design
Dependent variables:
• Trust
• Anthropomorphism
• Social presence ++
251 participants. ~40 for each
condition (random assignment)
All participants asked to use
chatbot for three tasks, then
answer questionnaire
All participants with English as
first language
All participants on desktop /
laptop computer
Experimental conditions: chatbot human likeness (yes / no) × conversational breakdown (no / yes, with repair / yes, without repair). Sample dialogues per condition:

Humanlike, no breakdown:
User: How to apply for loan?
Chatbot: Here at Boost Bank, we make loan applications fast and easy …

Humanlike, breakdown with repair:
User: How to apply for loan?
Chatbot: I am sorry …
User: Apply loan!
Chatbot: Here at Boost Bank, we make loan application …

Humanlike, breakdown without repair:
User: How to apply for loan?
Chatbot: I am sorry that I was not …
User: Apply loan!
Chatbot: I am sorry, but it seems …

Not humanlike, no breakdown:
User: How to apply for loan?
Chatbot: Loan application is fast and easy …

Not humanlike, breakdown with repair:
User: How to apply for loan?
Chatbot: Request not identified …
User: Apply loan!
Chatbot: Loan application is …

Not humanlike, breakdown without repair:
User: How to apply for loan?
Chatbot: Request not identified …
User: Apply loan!
Chatbot: Unable to respond …
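To clarify what "main effect" and "interaction effect" mean for this 2×3 design, here is a hypothetical analysis sketch using simulated data (not the study's data) and a standard two-way ANOVA in statsmodels; the effect sizes in the simulation are made up.

```python
# Hypothetical analysis sketch for the 2x3 between-subjects design:
# a two-way ANOVA testing main effects of human likeness and breakdown
# on trust, plus their interaction. Data are simulated for illustration;
# the effect sizes below are made up, not the study's results.

import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
rows = []
for humanlike in ["yes", "no"]:
    for breakdown in ["none", "with_repair", "without_repair"]:
        mean = 4.0 + (0.3 if humanlike == "yes" else 0.0)  # assumed small effect
        mean -= {"none": 0.0, "with_repair": 0.5, "without_repair": 1.2}[breakdown]
        for _ in range(40):  # roughly 40 participants per cell, as in the study
            rows.append({"humanlike": humanlike, "breakdown": breakdown,
                         "trust": mean + rng.normal(0, 0.8)})
df = pd.DataFrame(rows)

model = ols("trust ~ C(humanlike) * C(breakdown)", data=df).fit()
print(anova_lm(model, typ=2))  # rows: the two main effects and the interaction
```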
Findings (I)
Human likeness with main effect on trust
Conversational performance with main effect on trust
No interaction effect
Implications
Trust may be impacted by variation in human
likeness
… but much more so by variation in conversational
performance.
-> To build a trusted bot, focus on avoiding or
repairing conversational breakdown
No evidence found for trust resilience resulting from human likeness in chatbot design
… suggesting that users may be neither particularly forgiving of a failing humanlike chatbot nor particularly enraged by it.
Findings (II)
Anthropomorphism (perceived human likeness) affected both by human likeness and by breakdown
Implications
While we as interaction designers may naïvely see
human likeness as something residing in the look
and feel of the chatbot (avatar, name, conversational
style)
… users' perceptions of human likeness are as strongly impacted by conversational performance.
For a chatbot to be perceived as human-like, it needs
to avoid breakdowns and handle repair.
Research questions – revisited
1. Are chatbots' humanlike features as
important for trust as their ability to
reliably provide support?
2. Is trust in chatbots for customer service
dominated by their conversational
performance rather than their
humanlikeness?
Both humanlikeness and
conversational performance
impact general trust.
Research questions – revisited
1. Are chatbots' humanlike features as
important for trust as their ability to
reliably provide support?
2. Is trust in chatbots for customer service
dominated by their conversational
performance rather than their
humanlikeness?
While human likeness significantly impacts trust, the effect of conversational performance is more pervasive.