INTERACTION WITH AI AND AUTONOMOUS SYSTEMS – MODULE 2
Session 3, 2023
Asbjørn Følstad, SINTEF
Interaction with AI – module 2
Five sessions
2.1: Human-AI interaction and design
2.2: Social relations with AI-based
chatbots (Marita Skjuve)
2.3: Chatbot interaction and design
2.4: Interacting with generative AI
2.5: Trustworthy interaction with AI
Human-AI interaction and design – continued
AI-infused systems vs. traditional systems:
• Learning → dynamic
• Improving → mistakes inevitable
• Black box → opaque
• Fuelled by large data sets → data gathering through interaction
Individual assignment – task 2:
Human-AI interaction design
• Amershi et al. (2019) and Kocielnik et al.
(2019) discuss interaction design for AI-
infused systems. Summarize main take-
aways from the two papers.
• Select two of the design guidelines in
Amershi et al. (2019). Discuss how the
AI-infused system you used as example
in the previous task adheres to, or
deviates from these two design
guidelines. Briefly discuss whether/how
these two design guidelines could
inspire improvements in the example
system.
• Bender et al. (2021) conduct a critical
discussion of a specific type of AI-
infused systems – those based on large
language models. Summarize their
argument concerning problematic
aspects of textual content and solutions
based on large language models. (Later lecture.)
https://www.microsoft.com/en-us/haxtoolkit/ai-guidelines/
https://www.microsoft.com/en-us/haxtoolkit/library/
Google Maps - Timeline
Learning system – design for change
• M1: Make clear what the system can do
• M2: Make clear how well the system can do what it can do
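A minimal sketch of how these two guidelines might look in code, assuming a hypothetical destination-suggestion feature (the names and thresholds are illustrative, not from Amershi et al.):

CAPABILITY_NOTE = "I suggest likely destinations based on your past trips."

def present_suggestion(destination: str, confidence: float) -> str:
    # M2: translate a raw model confidence into plain wording.
    if confidence >= 0.8:
        qualifier = "Probably"
    elif confidence >= 0.5:
        qualifier = "Possibly"
    else:
        qualifier = "Just a guess:"
    # M1: ship the capability statement together with the output.
    return f"{CAPABILITY_NOTE}\n{qualifier} heading to {destination}? ({confidence:.0%} sure)"

print(present_suggestion("the office", 0.92))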
Mistakes inevitable – design for uncertainty
• M9: Support efficient correction
• M10: Scope services when in doubt
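Read as control flow, these two guidelines amount to: when confidence is low, narrow the service instead of guessing; when an answer is given, keep correction one step away. A sketch, with a toy classifier standing in for a real system:

from typing import List, Tuple

class ToyIntentModel:
    # Toy stand-in for a real intent classifier (illustration only).
    def classify(self, query: str) -> Tuple[str, float]:
        if "loan" in query.lower():
            return ("Here is how to apply for a loan ...", 0.9)
        return ("Here is our product overview ...", 0.4)

    def top_candidates(self, query: str, k: int = 3) -> List[str]:
        return ["Apply for a loan", "Open an account", "Contact support"][:k]

def answer(query: str, model: ToyIntentModel) -> str:
    reply, confidence = model.classify(query)
    if confidence < 0.6:
        # M10: when in doubt, scope the service -- offer options rather than guess.
        options = ", ".join(model.top_candidates(query))
        return f"I'm not sure I understood. Did you mean: {options}?"
    # M9: attach a lightweight correction affordance to every answer.
    return f"{reply} (Not what you meant? Tap to correct me.)"

print(answer("How do I apply for a loan?", ToyIntentModel()))
print(answer("Hmm, something else", ToyIntentModel()))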
Difficult to understand and validate output – design for explainability
• M11: Make clear why the system did what it did
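One common way to follow M11 is to generate the explanation together with the output, so every recommendation carries its own "why". A toy content-based recommender sketch, with made-up data:

def recommend(history, catalogue):
    # Collect the tags of everything the user has already watched.
    seen_tags = {tag for item in history for tag in catalogue[item]}
    # Pick the unseen item with the largest tag overlap.
    best = max(
        (item for item in catalogue if item not in history),
        key=lambda item: len(catalogue[item] & seen_tags),
    )
    overlap = sorted(catalogue[best] & seen_tags)
    # M11: return the reason alongside the recommendation itself.
    return best, f"Recommended because you watched items tagged: {', '.join(overlap)}"

catalogue = {"A": {"sci-fi", "space"}, "B": {"sci-fi", "noir"}, "C": {"romance"}}
print(recommend(["A"], catalogue))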
Data wanted – design for data capture
• Accommodate gathering of data from users
• … but with concern for the risk of being gamed
• Make users benefit from data
• Design for privacy
https://www.technologyreview.com/s/610634/microsofts-neo-nazi-sexbot-was-a-great-lesson-for-makers-of-ai-assistants/
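These points can be sketched as a capture pipeline: no consent means no logging, and identifiers are pseudonymized before anything is stored. All names here are hypothetical:

import hashlib
import time

def log_interaction(user_id: str, query: str, consented: bool, store: list) -> None:
    if not consented:
        return  # design for privacy: no consent, no capture
    store.append({
        # Pseudonymize the identifier before storage. (A salted hash or
        # stronger anonymization would be needed in practice.)
        "user": hashlib.sha256(user_id.encode()).hexdigest()[:12],
        "query": query,
        "timestamp": int(time.time()),
    })

store = []
log_interaction("alice@example.com", "How do I apply for a loan?", True, store)
log_interaction("bob@example.com", "Opening hours?", False, store)  # not logged
print(store)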
Practical application of the Amershi et
al. (2019) design guidelines for
Human-AI interaction
https://www.microsoft.com/en-us/haxtoolkit/ai-guidelines/
Consider a specific AI-based
service: ChatGPT. Which guidelines
are, in your view, relevant for this
service?
Discuss the service from the perspective of one of the relevant guidelines. How does it comply with the guideline? Could it be improved?
Go to design guidelines – online
version.
Plenary reflections based on the
group discussions
Chatbot interaction and design
Implications of chatbots
• Conversation as design object
• Necessary to move from UI design to service design
• Necessary to design for networks of humans and bots
Høiland, C. (2019). “Hi, can I help?” Exploring how to design a mental health chatbot for youths. Human Technology, 16(2), 139-169.
https://www.wired.com/2016/03/fault-microsofts-teen-ai-turned-jerk/
Example: SR-Bank chatbot with escalation to a human operator
Different types of chatbots
• Scripted
• Intents and actions
• Generative
[Figure: scripted dialogue flow – chains of predefined messages with branch points (Alt. 1 / Alt. 2)]
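A scripted chatbot is essentially a graph of predefined messages, where user choices select which branch to follow. A minimal sketch with an invented banking flow:

FLOW = {
    "start": ("Hi! What can I help with? (1: loans, 2: savings)", {"1": "loan", "2": "savings"}),
    "loan": ("Loans: you can apply online in minutes. More help? (1: yes, 2: no)", {"1": "start", "2": "end"}),
    "savings": ("Savings: interest paid on all deposits. More help? (1: yes, 2: no)", {"1": "start", "2": "end"}),
    "end": ("Thanks for chatting!", {}),
}

def run(flow):
    node = "start"
    while True:
        message, options = flow[node]
        print("Bot:", message)
        if not options:
            break
        choice = input("You: ").strip()
        node = options.get(choice, node)  # unknown input: repeat the node

# run(FLOW)  # uncomment for an interactive session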
[Figure: intent-based interaction – question in free text, predefined response]
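In an intent-based chatbot, the free-text question is mapped to an intent, and each intent triggers a predefined response. The keyword matcher below is a deliberately crude stand-in for a trained classifier:

from typing import Optional

INTENT_KEYWORDS = {
    "apply_loan": {"loan", "apply", "borrow"},
    "opening_hours": {"open", "hours", "when"},
}
RESPONSES = {
    "apply_loan": "Here at Boost Bank, we make loan applications fast and easy ...",
    "opening_hours": "We are open weekdays 9-16.",
    None: "Sorry, I did not understand. Could you rephrase?",
}

def classify(text: str) -> Optional[str]:
    words = set(text.lower().replace("?", " ").replace("!", " ").split())
    scores = {intent: len(words & kw) for intent, kw in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(RESPONSES[classify("How to apply for loan?")])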
[Figure: generative interaction – question in free text, answer generated by a language model]
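In a generative chatbot, the reply is produced by a large language model rather than selected from predefined content. The sketch below carries the conversation history in the prompt; call_llm is a hypothetical stand-in for whichever hosted model API is used:

def call_llm(prompt: str) -> str:
    # Placeholder: in practice, send the prompt to a hosted LLM endpoint
    # (e.g., a GPT model) and return the generated text.
    return "(model-generated answer would appear here)"

def chat(history: list, user_message: str) -> str:
    history.append(("user", user_message))
    # The whole conversation goes into the prompt, so the model can
    # generate a context-aware reply.
    prompt = "\n".join(f"{role}: {text}" for role, text in history) + "\nassistant:"
    reply = call_llm(prompt)
    history.append(("assistant", reply))
    return reply

history = []
print(chat(history, "How do I apply for a loan?"))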
Different types of chatbots
• Scripted: mainly flows; useful e.g. for coaching or training
• Intents and actions: identifies the user's objective through AI; predefined replies
• Generative: based on large language models such as OpenAI's GPT-3
Rule-based / scripted
+ Conveys content in dialogue format
+ Quality-assured dialogue
– General answers / limited personalization
– Limited set of answers
– Resource demanding to make good content

Intention-based
+ Easy to assess answer quality
+ Quality-assured answers from knowledge base
– General answers / limited personalization
– Limited set of answers
– Adaptation resource demanding

Generative
+ Personal / unique answers
+ Answers (nearly) any question
– Difficult to assess answer quality
– Answers not quality assured / hallucinations
– New technology; little experience with resource demands for adaptation
Chatbot human likeness
Why are chatbots often designed to resemble interaction with humans?
Chatbot human likeness
The Turing test as benchmark?
• Human likeness a telltale of artificial intelligence?
• Human likeness an appreciated characteristic of chatbots?
[Image: Dall-E, “Alan Turing looking puzzled, digital art”]
Chatbot human likeness
The Turing test as benchmark?
Human likeness with impact on user experience and behaviour:
A humanlike chatbot is beneficial for attitude towards the service and intention to use – provided the chatbot is able to deliver on the higher user expectations that follow from human likeness.
Go, E., & Sundar, S. S. (2019). Humanizing chatbots: The effects of visual, identity and conversational cues on humanness perceptions. Computers in Human Behavior, 97, 304-316.
… however, pragmatic motivation is key for chatbot use.
Chatbot interaction design study 1: Effect of human likeness and conversational performance

What is the effect of human likeness and conversational performance on trust in a chatbot?
1. Are chatbots’ humanlike features as
important for trust as their ability to
reliably provide support?
2. Is trust in chatbots for customer service
dominated by their conversational
performance rather than their
humanlikeness?
Approach – online
experiment
• Participants from Prolific
• Landing page with instructions
• Interaction with chatbot
(different conditions)
• Questionnaire
Method details
2x3 factorial design
Dependent variables:
• Trust
• Anthropomorphism
• Social presence ++
251 participants. ~40 for each
condition (random assignment)
All participants asked to use
chatbot for three tasks, then
answer questionnaire
All participants with English as
first language
All participants on desktop /
laptop computer
Conditions (chatbot human likeness × conversational breakdown):

Humanlike, no breakdown:
User: How to apply for loan?
Chatbot: Here at Boost Bank, we make loan applications fast and easy …

Humanlike, breakdown with repair:
User: How to apply for loan?
Chatbot: I am sorry …
User: Apply loan!
Chatbot: Here at Boost Bank, we make loan application …

Humanlike, breakdown without repair:
User: How to apply for loan?
Chatbot: I am sorry that I was not …
User: Apply loan!
Chatbot: I am sorry, but it seems …

Non-humanlike, no breakdown:
User: How to apply for loan?
Chatbot: Loan application is fast and easy …

Non-humanlike, breakdown with repair:
User: How to apply for loan?
Chatbot: Request not identified …
User: Apply loan!
Chatbot: Loan application is …

Non-humanlike, breakdown without repair:
User: How to apply for loan?
Chatbot: Request not identified …
User: Apply loan!
Chatbot: Unable to respond …
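For a 2×3 factorial design like this, the standard analysis is a two-way ANOVA with main effects and an interaction term. A sketch using statsmodels, with a hypothetical data file and column names:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assumed columns: trust (scale score), humanlike (yes/no),
# breakdown (none / with repair / without repair).
df = pd.read_csv("chatbot_study.csv")

model = ols("trust ~ C(humanlike) * C(breakdown)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects + interaction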
Findings (I)
• Human likeness: main effect on trust
• Conversational performance: main effect on trust
• No interaction effect
Implications
Trust may be impacted by variation in human likeness … but much more so by variation in conversational performance.
-> To build a trusted bot, focus on avoiding or repairing conversational breakdown.
No evidence was found for trust resilience resulting from human likeness in chatbot design … suggesting that users may not be particularly forgiving of a failing humanlike chatbot, but not particularly enraged either.
Findings (II)
Anthropomorphism (perceived human likeness) affected both by designed human likeness and by breakdown
Implications
While we as interaction designers may naïvely see human likeness as something residing in the look and feel of the chatbot (avatar, name, conversational style) … users' perceptions of human likeness are just as strongly impacted by conversational performance.
For a chatbot to be perceived as humanlike, it needs to avoid breakdowns and handle repair.
Research questions -
revisited
1. Are chatbots’ humanlike features as
important for trust as their ability to
reliably provide support?
2. Is trust in chatbots for customer service
dominated by their conversational
performance rather than their
humanlikeness?
1. Both humanlikeness and conversational performance impact general trust.
2. While human likeness significantly impacts trust, the effect of conversational performance is more pervasive.