The Effect of Audiences on User Experience of Chatbots
1. The Effect of Audiences
On the User Experience of Conversational Interfaces in Physical Spaces
Heloisa Candello, Claudio Pinhanez, Mauro Pichiliani,
Paulo Cavalin, Flavio Figueiredo, Marisa
Vasconcellos, Haylla Conde
6. Experimental Procedure
• 92 semi-structured interviews
• Observation studies
• Conversation logs
Study 1
Age: 16-25: 12%; 27-48: 50%; 49-72: 38%
Gender: Female: 46%; Male: 54%
Previous knowledge of the plot: More Familiar / Less Familiar / Not Familiar (chart values: 17%, 22%, 61%)
7. Semi-structured interviews
Understanding users’ perceptions
Q1 Are you familiar with the plot?
Q2 Describe your experience to a friend who will not be
able to visit it.
Q3 How do you think the exhibit works?
Self-reported questions to evaluate social interaction
Q5 I felt part of the conversation.
Q6 The characters only talked to each other.
Q7 The characters talked to me.
Q8 The characters answered my questions.
Q9 The characters answered about any subject.
Q10 I asked everything I wanted.
Study 1
Participant signing the consent form
8. Statistical Analysis
• Q5 to Q10 of the self-reported questions
• Fisher exact test, p<0.05
• n=92
Study 1
Disagree Neutral Agree
Q5 I felt part of the conversation. 20% 10% 70%
Q6 The characters only talked to each other. 69% 5% 26%
Q7 The characters talked to me. 18% 5% 77%
Q8 The characters answered my questions. 32% 4% 64%
Q9 The characters answered about any subject. 39% 25% 36%
Q10 I asked everything I wanted. 20% 4% 76%
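As a rough illustration of the kind of test named on this slide, the sketch below computes a two-sided Fisher exact p-value for a single 2x2 table using only the Python standard library. The counts are hypothetical and are not the study's data. The function name `fisher_exact_two_sided` and the table layout (rows as two audience conditions, columns as agree versus do not agree) are assumptions made for this example.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    p_value = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts, NOT the study's data:
# rows = two audience conditions, columns = (agree, do not agree).
hypothetical_table = [[8, 2], [1, 5]]
(a, b), (c, d) = hypothetical_table
p = fisher_exact_two_sided(a, b, c, d)
print(f"p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

This version only shows the mechanics of the test; for larger tables or many comparisons a statistics library would be the more practical choice.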
9. Four main themes
1- Curiosity and Novelty
2- Interest in the plot
3- Expected chatbot answers
4- Audience effects
Do you believe in free love? (P35)
Did she betray him? (P15)
The soundtrack was flawless […] I also noticed
the animation of the words, I found this to be
very cool? (P21)
[…] If I had known that I was not disturbing
I would ask more questions (P23)
I love you, Tatiana (P29)
Study 1
10. Audience effects
N. B. Cottrell, D. L. Wack, G. J. Sekerak, and R. H. Rittle. 1968. Social facilitation of dominant responses by the presence of an audience and the mere presence of others. Journal of Personality and Social Psychology 9, 3 (1968), 245–250.
J. W. Michaels, J. M. Blommel, R. M. Brocato, R. A. Linkous, and J. S. Rowe. 1982. Social facilitation and inhibition in a natural setting. Replications in Social Psychology 2 (1982), 21–24.
11. 3 types of audiences were identified
(B) Observed by acquaintances
(C) Observed by strangers in the queue
(D) Observed by strangers around the table
12. RQ 1
What are the effects of AUDIENCES
on visitors’ perceptions of social
interaction with chatbots?
13. Q5 I felt part of the conversation
Study 1
[Chart panels: (B) Observed by acquaintances, (C) Observed by strangers in the queue, (D) Observed by strangers around the table; *p<0.05]
Overall (n=51)
With knowledge of the plot (n=44)
77% with knowledge of the plot agree
14. Q7 The characters talked to me
Study 1
[Chart panels: (B) Observed by acquaintances, (C) Observed by strangers in the queue, (D) Observed by strangers around the table; *p<0.05]
Overall (n=51), With knowledge of the plot (n=44)
Females (n=32), Males (n=19)
93% with knowledge of the plot agree
88% of females agree
88% agree
15. Q8 The characters answered my questions
Study 1
[Chart panels: (B) Observed by acquaintances, (C) Observed by strangers in the queue, (D) Observed by strangers around the table; *p<0.05]
Overall (n=51), With knowledge of the plot (n=44)
Females (n=32), Males (n=19)
Overall (n=65)
59% disagree, more than in (C) and (D) (78%)
80% with knowledge of the plot agree
16. Q9 The characters answered about any subject
[Chart panels: (B) Observed by acquaintances, (C) Observed by strangers in the queue, (D) Observed by strangers around the table; *p<0.05]
Females (n=41), Males (n=32)
44% of females disagree
22% of males disagree
Am I handsome? (P41, male)
Who is the father of your son? (P41, female)
Study 1
17. RQ 2
Does the presence of AUDIENCES
influence the type and content of
visitors’ questions to chatbots?
1,542 user questions
18. Experimental Procedure
Study 2
• User questions clustered into topics by a clustering algorithm: 32 clusters
• 54 hours of video recording: manual coding of audience conditions (A, B, D)
• 1,542 user questions
• Clusters manually validated and merged into 4 main topics:
(S1) questions out of scope – 271 questions
(S2) questions about characters of the book – 978 questions
(S3) greetings – 101 greetings
(S4) reaction to failure (More coffee?) – 165 questions
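The topic sizes above can be turned into shares of the full question set with a few lines of Python. This is only an arithmetic sketch: the dictionary keys and the helper name `topic_shares` are made up for the example, while the counts and the total of 1,542 are the ones listed on the slide.

```python
# Topic sizes as listed on the slide (after manual validation and merging).
topic_counts = {
    "S1 out of scope": 271,
    "S2 characters of the book": 978,
    "S3 greetings": 101,
    "S4 reaction to failure": 165,
}
total_questions = 1542  # total number of user questions reported on the slide


def topic_shares(counts: dict, total: int) -> dict:
    """Return each topic's share of the total as a fraction between 0 and 1."""
    return {topic: n / total for topic, n in counts.items()}


shares = topic_shares(topic_counts, total_questions)
for topic, share in shares.items():
    print(f"{topic}: {topic_counts[topic]} questions ({share:.1%})")
```

Note that the four listed counts add up to 1,515, so shares computed against 1,542 do not sum to exactly 100%.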
Clustering Methodology
“More Coffee, Cláudio?”
Example of failure and direct address
19. Experimental Procedure
Study 2
Statistical Analysis
• Logistic regression models
• Response variables: topics (S1, S2, S3, S4)
• Explanatory variables: direct address, gender
and audience conditions (A, B, D)
20. Findings
Male users, when observed by
strangers around the table (D),
asked fewer out-of-scope questions.
Positive values indicate that the predictor tends to
increase the chance of the category; negative ones
show the opposite effect.
Study 2
21. Findings
Users observed by acquaintances
(B) tended to remain
engaged after failures.
Users who were addressed by name
tended to remain
engaged after failures.
Study 2
22. Findings
BUT users observed by acquaintances
AND addressed by name
tended to DISENGAGE after
failures.
Study 2
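To make the sign reversal in these findings concrete, here is a minimal sketch of how a logistic model with an interaction term can produce it. The coefficient values are hypothetical and are not the study's estimates; the function name `p_reaction_to_failure` is also an assumption. Each coefficient is on the log-odds scale, so a positive value raises the chance of the category and a negative one lowers it.

```python
from math import exp


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-z))


# Hypothetical coefficients on the log-odds scale (NOT the study's estimates).
INTERCEPT = -2.0
COEF_ACQUAINTANCES = 0.8    # observed by acquaintances (B)
COEF_DIRECT_ADDRESS = 0.6   # chatbot addressed the user by name
COEF_INTERACTION = -2.0     # both conditions at once


def p_reaction_to_failure(acquaintances: bool, direct_address: bool) -> float:
    """Probability that a question falls in the reaction-to-failure topic."""
    acq, da = int(acquaintances), int(direct_address)
    log_odds = (
        INTERCEPT
        + COEF_ACQUAINTANCES * acq
        + COEF_DIRECT_ADDRESS * da
        + COEF_INTERACTION * acq * da
    )
    return sigmoid(log_odds)


for acq in (False, True):
    for da in (False, True):
        prob = p_reaction_to_failure(acq, da)
        print(f"acquaintances={acq}, direct_address={da}: {prob:.3f}")
```

With these made-up values, each condition alone raises the probability above the baseline, while the two together push it below the baseline, which is the pattern the slides describe.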
24. DR1: Designers should consider the
users’ previous knowledge of content as
it tends to affect the social interaction with machines, in
particular when users have audiences.
25. DR2: Designers should consider
the presence of strangers in a queue
waiting to interact with a physical conversational system,
since it may affect how users will experience the system.
26. DR3: Designers should consider gender
effects when crafting public interactions
with conversational systems, including how to handle
answers to out of scope questions.
27. DR4: Designers should consider tailoring
and using direct address in some cases
of chatbot utterances according to the
presence of an audience.
In general, chatbots should either use direct address, such as
vocatives or pronouns, to acknowledge all the
participants in the audience, or not use it at all.
28. Future work
• We have a paper in review focused on Direct Address effects.
• We want to investigate gender effects further.
• We want to try to apply our findings in other public spaces (hospitality, retail).
• We hope designers and researchers find our methodological approach
useful to apply in similar projects.
30. Statistical Analysis
Percentage of participants according to the audience conditions. The
categories were not exclusive.
Study 1
(B) Observed by acquaintances
(C) Observed by strangers in the queue
(D) Observed by strangers around the table
31. Discussion
(RQ1) What are the effects of audiences
on visitors’ perceptions of social
interaction with chatbots?
Visitors accompanied by friends and family (B) felt less
connected to the chatbots than visitors observed by
strangers (C) (D).
Knowing the plot increased visitors' sense of belonging with
the chatbots in (B) and (C).
Female users felt the characters talked to them when
observed by strangers (C).
Males and females ask out-of-scope questions
differently.
32. Discussion
(RQ2) Does the presence of audiences
influence the type and content of
visitors’ questions to chatbots?
Reactions to failures increased when visitors were observed by
acquaintances (B).
Males have a higher tendency to ask in-scope questions when
observed by strangers (D).
Reactions to failures increased when visitors experienced direct
address (DA) sentences.
Shared experience combined with DA tended to reduce the reaction
to failure.
Editor's Notes
Hi, I’m Mauro Pichiliani from IBM Research. I did this work with a group of fabulous people.
More and more, we see conversational systems in public spaces such as museums, hotels, and airports. As designers, we should be aware of the social cues that might affect the user experience with those machines in public spaces. Our research started with this aim.
We wanted to understand the implications of interacting with chatbots in public spaces, so we looked at how visitors interacted with an art exhibit. The exhibit has three chatbots that represent the characters of a famous Brazilian book, Dom Casmurro, by Machado de Assis. The book revolves around a love triangle: the husband, the wife, and the best friend. The husband is very jealous of his wife, and the author never reveals whether the wife betrayed him with the best friend. The exhibit was displayed for 2 months at a paramount art event at Itau Cultural and received 10,000 visitors. We presented a demo of this work last year at CHI.
For those who did not see our demo, this video shows how visitors interacted with the exhibit.
We ran a first study during the last three weeks of the exhibition to understand the overall user experience and the perceived engagement with the chatbots.
In the field, we conducted 92 semi-structured interviews with visitors after observing their interaction. Half of the participants were between 27 and 48 years old. Gender was balanced, and most of the participants had previous knowledge of the plot.
Participants were asked to describe their experience and answer self-reported questions.
We used Fisher's exact test to analyse the self-reported questions. Overall, participants were satisfied with the experience. Seventy percent felt part of the conversation with the bots. 69% disagreed that the characters only talked to each other. 77% agreed that the characters talked to them. 64% found that the characters answered their questions. Question 9 was balanced: participants neither clearly agreed nor disagreed that the characters answered about any subject. Finally, 76% said they asked everything they wanted. We also asked them to explain aloud the rationale behind their answers.
We performed a thematic network analysis and found four organizing themes that illustrate participants' perceptions of their engagement and connection with the chatbots: Curiosity for Novelty, Interest in the main question of the plot, Expected chatbot answers, and Audience effects. We noticed different kinds of interactions depending on the type of audience. For example, in this participant quote: "If I had known that I was not disturbing I would ask more questions." This visitor had several other visitors waiting in a queue, observing him and expecting to interact with the chatbots. As another example, visitors with family and friends used the exhibit as a channel to communicate their thoughts and feelings, like P29, who declared their love for Tatiana, who was also in the exhibit space.
Previous literature on social facilitation shows that the performance of good players improved 14% in front of an audience, while bad players had a dramatic decrease of 30%. This is known as the dominance effect. We also know from the literature that the proximity of an audience can affect human behaviour.
We identified 3 types of audiences that seem to produce different types of interactions: visitors observed by acquaintances, visitors observed by other visitors standing in a queue, and visitors observed by strangers standing around the table. Those categories were not exclusive.
We analysed the self-reported questions to better understand the effects of those types of audiences on visitors' perceptions of social interaction with chatbots.
As we can see, 77% of visitors who were observed by strangers in a queue and had previous knowledge of the plot felt part of the conversation. Their audience was not in their visual field, which might have helped them feel more connected with the chatbots.
In this case, people in the same condition (C) usually agreed that the chatbots talked to them. Females agreed more often in this condition that the characters talked to them.
59% of people observed by family and friends disagreed that the characters answered their questions, more often than in the other two conditions. 80% of people observed by others in a queue agreed with this statement.
In this case we saw a gender effect. 44% of female participants observed by strangers around the table disagreed that the characters answered about any subject, compared with 22% of males. We examined the conversation logs and saw that males and females ask questions differently: females asked about the plot, while males asked about other subjects not related to the story.
We also investigated how audience effects would play out with more visitors by analysing the conversation logs. In the second study we aimed to answer this question: does the presence of audiences influence the type of visitors' questions to chatbots?
We manually coded the audience conditions using 54 hours of video recording and collected one thousand five hundred forty-two questions. We applied clustering algorithms, and the resulting clusters were reviewed by 2 human coders. We found 4 main final clusters, which we call topics:
(S1) questions out of scope – 271 questions
(S2) questions about characters of the book – 978 questions
(S3) greetings – 101 greetings
(S4) reaction to failure (More coffee?) – 165 questions
An example of failure is in this picture: when the chatbots did not know what to answer, they asked "More coffee?"
We ran a logistic regression in which the response variables are the 4 topics and the explanatory variables are direct address, gender, and audience conditions.
We found that male users, when observed by strangers around the table (D), asked fewer out-of-scope questions.
We found that users observed by acquaintances (B) tended to remain engaged after failures.
Users who were addressed by name also tended to remain engaged after failures.
What was surprising is that this effect reverses when both conditions, direct address and acquaintances, are present.
Users observed by acquaintances AND addressed by name tended to DISENGAGE after failures.
We have some design recommendations.
DR1: Designers should consider the users’ previous knowledge of content as it tends to affect the social interaction with machines, in particular when users have audiences.
DR2: Designers should consider the presence of strangers in a queue waiting to interact with a physical conversational system, since it may affect how users will experience the system.
DR3: Designers should consider gender effects when crafting public interactions with conversational systems, including how to handle answers to out of scope questions.
DR4: Designers should consider tailoring and using direct address in some cases of chatbot utterances according to the presence of an audience.
In general, chatbots should either use direct address, such as vocatives or pronouns, to acknowledge all the participants in the audience, or not use it at all.
We have a paper in review focused on Direct Address effects.
We want to investigate gender effects further.
We want to try to apply our findings in other public spaces (hospitality, retail).
We hope designers and researchers find our methodological approach useful to apply in similar projects.
Thanks for listening!
We used Fisher's exact test to analyse the self-reported questions. The conditions were not exclusive.