Persona-Based Conversational Agents

Modeling Dialogue
Phenomena:
guerini@fbk.eu @m_guerini

Introduction
•  When we look at recent advancements in NLP we
might be tempted to think that we are on the right
path to solve the problem of CAs.

•  Naturalness is key to creating conversational
agents that are truly smart and can integrate
seamlessly into the lives of human beings.
•  The current generation of conversational interfaces is
far from replicating human-like dialogues.

Where are we heading to?
Recent advances in Conversational Agents, where
naturalness is always a key element:

1.  learn to memorize
2.  learn to personalize
3.  learn to empathize
4.  learn to express emotions
In all aspects of this area, we encounter the challenge
of a lack of well-balanced and well-labeled data.

Pascale Fung - invited talk at ACL 2019

Structure
We must build agents with a Persona, i.e. with a
specific, persistent, consistent interaction style.
It is NOT just a technical problem.

What is Persona* about?
•  Three concepts, not to be confused:
– Personality: the way we behave/react to the world
– Persona: our personal story and preferences
– Personalization: the ability to respond to others’
preferences

From Persona* to dialogues
•  All these elements have an impact on specific
communication aspects.
•  The impact is not limited to chit-chat agents but
also to task-oriented ones.
•  Some examples follow: a task-oriented CA with
and without naturalness/human-like skills.
Guerini, M., Falcone, S., & Magnini, B. (2018). A Methodology for Evaluating Interaction Strategies of
Task-Oriented Conversational Agents. In Proceedings of EMNLP SCAI workshop, (pp. 24-32).

CH1: Hello Andrea! What did you eat for
your last meal?
User: I ate a plate of spaghetti with
tomato
CH1: How much spaghetti with tomato did
you eat?
User: Unfortunately I messed up, it was a
generous helping...
CH1: Your meal consisting of a generous
plate (200g) of spaghetti with tomato
corresponds to 30.85 g of carbohydrates. I
hope I have been helpful! See you soon!

CH1: Hello Andrea! What did you eat for
your last meal?
User: I ate a plate of spaghetti with
tomato
CH1: How much spaghetti with tomato did
you eat?
User: Unfortunately I messed up, it was a
generous helping...
CH1: Don’t worry about it, everyone
messes up sometimes! Anyway, your meal
consisting of a generous plate (200g) of
spaghetti with tomato corresponds to
30.85 g of carbohydrates. I hope I have
been helpful! See you soon!
Empathy

CH1: Hello Sara! What did you eat in your
last meal?
User: Actually, I would like to ask you for
advice on what to eat to make a balanced
meal.
CH1: Sorry, I didn’t recognize any food in
your answer. Try to reformulate it. What
did you eat in your last meal?
User: For now I have to go, we’ll catch
up later.
CH1: Fine, see you soon!

CH1: Hello Sara! What did you eat in your
last meal?
User: Actually, I would like to ask you for
advice on what to eat to make a balanced
meal.
CH1: Unfortunately, I can’t provide food
tips; for this I suggest you contact your
dietician. In the meantime, what did you
eat in your last meal?
User: For now I have to go, we’ll catch
up later.
CH1: Fine, see you soon!

Facing

your last meal?
User: I ate a portion of seitan
CH1: I can’t recognize any food in your
answer. What did you eat?
User: I ate a portion of wheat gluten
CH1: Your meal consisting of a portion
(100g) of wheat gluten corresponds to
1.9 g of carbohydrates. I hope I have been
helpful! See you soon!
your last meal?
User: I ate a portion of seitan
CH1: I have never heard about seitan. Can
you tell me what it is?
User: Sure! It is a dough of wheat gluten
CH1: Great! Your meal consisting of a
portion (100g) of wheat gluten
corresponds to 1.9 g of carbohydrates. I
hope I have been helpful! See you soon!
Curiosity

CH1: Good morning Andrea. What did you
eat for your last meal?
User: I ate an awful chicken salad
CH1: I am sorry to hear that, how much
chicken salad did you eat?
User: 100g
CH1: Your meal consisting of 100g of
chicken salad corresponds to 2.85 g of
carbohydrates. Thank you and have a nice
day.
CH1: whassup Andrea? Shoot me your last
meal!
User: I ate an awful chicken salad
CH1: that sucks man! How much chicken
salad did you eat?
User: 100g
CH1: uh-huh, 100g of chicken salad
corresponds to 2.85 g of carbohydrates.
Catch ya’ later!
Formality Level

User: Top o' the mornin’ to ya CH1!
your last meal?
User: I pigged out on spaghetti with
tomato and basil
CH1: How much spaghetti with tomato
and basil did you eat?
User: It was a generous helping
CH1: Your meal consisting of a huge plate
(200g) of spaghetti with tomato and basil
corresponds to 30.85 g of carbohydrates.
User: Top o' the mornin’ to ya CH1!
CH1: Top o' the mornin’ to ya Andrea!
What did you eat for your last meal?
User: I pigged out on spaghetti with
tomato and basil
CH1: How much spaghetti with tomato
and basil did you pig out on?
User: It was a generous helping
CH1: Your meal consisting of a generous
helping (200g) of spaghetti with tomato
and basil corresponds to 30.85 g of
carbohydrates.
Linguistic Alignment

Personality
•  There is no generally agreed-upon definition; most
definitions focus on interaction with the environment.
•  Personality is defined as the characteristic set of
behaviors, cognitions, and emotional patterns.
•  Most theories view personality as relatively stable.

BIG FIVE – Personality model
•  An integration of personality research that
represents the various personality descriptions in
one common framework.
•  Individual differences in social and emotional life
organized into a five-factor model of personality

BIG FIVE – Personality model
Idea: we can measure personality using specific questionnaires.

OCEAN traits
•  Openness
inventive/curious vs. consistent/cautious
•  Conscientiousness
efficient/organized vs. easy-going/careless
•  Extraversion
outgoing/energetic vs. solitary/reserved
•  Agreeableness
friendly/compassionate vs. challenging/detached
•  Neuroticism
sensitive/nervous vs. secure/confident

A personality Model
•  In principle we can express a
personality with 5 simple
numbers (or labels if we
binarize the data)…
•  How to connect these numbers
with linguistic production?
•  How to connect these numbers
with linguistic interaction?
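
A minimal sketch of the "5 simple numbers" idea. The class name, the [0, 1] score range and the 0.5 threshold are illustrative assumptions, not part of any standard questionnaire.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class BigFive:
    """Big Five (OCEAN) scores; assumed here to be normalized to [0, 1]."""
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

    def binarize(self, threshold: float = 0.5) -> dict:
        """Turn each score into a 'high'/'low' label (the threshold is an assumption)."""
        names = ("O", "C", "E", "A", "N")
        return {name: ("high" if value >= threshold else "low")
                for name, value in zip(names, astuple(self))}


# Hypothetical profile: very extraverted, emotionally stable.
profile = BigFive(0.8, 0.4, 0.9, 0.6, 0.2)
labels = profile.binarize()
print(labels)
```

The open questions on the slide remain: the vector says nothing yet about how language production should change.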

Rule Based Approaches
•  Rely on psychological findings that correlate
personality traits with language production.
Parametrize these findings and vary utterance
characteristics according to the parameters, e.g.
Extraversion ≈ Verbosity

•  Personage (PERSONAlity GEnerator) feeds these
parameters to a standard NLG pipeline, affecting every
stage from content planning to surface realization.
Mairesse, F., & Walker, M. A. (2009). Can Conversational Agents Express Big Five Personality
Traits through Language?: Evaluating a Psychologically-Informed Language Generator.
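
A toy illustration of the "Extraversion ≈ Verbosity" rule. This is not the PERSONAGE implementation: the function name, the two rules and the 0.7 cut-off are assumptions made only to show how one trait parameter can drive content selection and surface form.

```python
from typing import List


def realize(content: List[str], extraversion: float) -> str:
    """Verbalize more or fewer content items depending on extraversion in [0, 1]."""
    # Assumed rule 1: higher extraversion -> more content items are mentioned.
    n_items = max(1, round(extraversion * len(content)))
    text = ". ".join(content[:n_items]) + "."
    # Assumed rule 2: highly extraverted speakers end with an exclamation mark.
    if extraversion > 0.7:
        text = text[:-1] + "!"
    return text


facts = ["The food is great", "The staff is friendly", "It is close to the station"]
print(realize(facts, 0.2))  # short, reserved output
print(realize(facts, 0.9))  # long, emphatic output
```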

The Open Vocabulary Approach
700 million words, phrases, and topic instances
collected from the Facebook messages of 75,000
volunteers, who took standard personality tests.

1.  Users on FB take a personality questionnaire
2.  Download all their posts
3.  Correlate words with personality

Schwartz, H. Andrew, et al. "Personality, gender, and age in the language of social media:
The open-vocabulary approach." PloS one 8.9 (2013): e73791.
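
Step 3 above can be sketched as a plain Pearson correlation between each word's relative frequency and a trait score. The four "users" below are made-up toy data, and the analysis in the cited paper is more elaborate than this sketch.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Toy, invented data: (concatenated posts, extraversion score from a questionnaire).
users = [
    ("party tonight with friends party", 0.9),
    ("great party with everyone", 0.8),
    ("reading a book alone", 0.2),
    ("quiet evening reading alone", 0.1),
]


def word_trait_correlations(users):
    """Correlate each word's relative frequency per user with the trait score."""
    scores = [score for _, score in users]
    counts = [Counter(text.split()) for text, _ in users]
    totals = [sum(c.values()) for c in counts]
    vocab = set().union(*counts)
    return {w: pearson([c[w] / t for c, t in zip(counts, totals)], scores)
            for w in vocab}


corr = word_trait_correlations(users)
print(sorted(corr, key=corr.get, reverse=True)[:3])  # words most tied to high scores
```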

The Open Vocabulary Approach
•  Some words are more
associated with a
personality trait than
others.

GlobalTrait Approach
•  A multilingual model to recognize Big Five Personality
traits from text data in four different languages.
•  Words having a similar semantic meaning in different
languages do not necessarily correspond to the same
personality traits.
•  GlobalTrait: alignment method such that words that
correlate positively to a trait are close together in the
multilingual vector space.

Siddique, F. B., Bertero, D., & Fung, P. (2019). GlobalTrait: Personality Alignment of Multilingual
Word Embeddings. Proceedings of AAAI Conference on Artificial Intelligence 33 - pp. 7015-7022.

GlobalTrait Approach
Words corresponding to a personality trait: left shows the ‘traditional’ multilingual embeddings,
being clustered per language, and right shows the GlobalTrait Personality aligned embeddings.
We see that the alignment brings the multilingual embeddings to a common vector space (rather
than being clustered per language), so we can get the personality trait based mapping between
languages.

What is a Persona?
•  The character that a CA plays during dialogues.
•  A persona is a composite of elements of identity:
– background facts or user profile
– language behavior
– interaction style
•  A persona is also adaptive: it presents different facets
to different interlocutors.

Li, J., Galley, M., Brockett, C., Spithourakis, G. P., Gao, J., & Dolan, B. (2016). A persona-based
neural conversation model. In proceedings of ACL 2016.

Persona and Neural Models
•  How to integrate a Persona into a neural model?
•  What data do we need?
•  Where can we take the data to train such models?

Easiest Generative model
•  Simple seq-2-seq model.

Issues with simple models
Common issues with chit-chat models include:

•  Lack of a consistent personality - typically trained
over many dialogs each with different speakers.
•  Lack of an explicit long-term memory - typically
trained to produce an utterance given only the
recent dialogue history
•  Tendency to produce non-specific answers like “I
don’t know”

Persona and Consistency
A dialogue agent should
be consistent with the
facts it asserts, and
ultimately with its own
persona.

This is not trivial to
obtain.

Possible approaches
•  Learn from annotated data (identity of speaker)

then

•  Add a speaker label/ID to your simple end-to-end
model when learning
•  Explicitly add the personality facts to the system
using Memory Networks.

Data annotated with Speaker ID.
•  Use an open-domain corpus of Twitter
conversations
•  dialog datasets comprising TV series scripts show
that leveraging persona

•  Each user/character is a persona or
•  Cluster characters by demographics

Script Example
Jack: Hi Steve. What are you doing here?
Steve: I'm meeting a friend here for dinner. How about you?
Jack: I'm on my way home but I needed to stop by the book store to buy a
text book.
Steve: Didn't you finish school yet?
Jack: I have one more year, and then I'm done.
Steve: What are you majoring in?
Jack: I'm majoring in Sociology
Steve: How do you like your major?
Jack: I really find the subject very interesting. I'm enjoying all my classes.
Steve: That's great.
Jack: I'd better go now. I don't want to miss the bus.
Steve: Alright, I'll talk to you later.
Jack: Okay. See you later.

Flat use of the data (-)

Character based annotation (+)
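
The difference between the two uses of the script can be sketched as follows: flat use keeps only (context, response) pairs, while character-based annotation also keeps who produces the response. Function names are illustrative, and the script is the first three turns of the example above.

```python
script = """Jack: Hi Steve. What are you doing here?
Steve: I'm meeting a friend here for dinner. How about you?
Jack: I'm on my way home but I needed to stop by the book store to buy a text book."""


def parse(script):
    """Split 'Speaker: utterance' lines into (speaker, utterance) tuples."""
    turns = []
    for line in script.splitlines():
        speaker, _, utterance = line.partition(": ")
        turns.append((speaker, utterance))
    return turns


def flat_pairs(turns):
    """Flat use (-): (context, response) pairs, speaker identity discarded."""
    return [(prev[1], cur[1]) for prev, cur in zip(turns, turns[1:])]


def speaker_pairs(turns):
    """Character-based annotation (+): (context, responding speaker, response)."""
    return [(prev[1], cur[0], cur[1]) for prev, cur in zip(turns, turns[1:])]


turns = parse(script)
print(flat_pairs(turns)[0])
print(speaker_pairs(turns)[0])
```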

Learn one Model for each Persona

Steve: I'm meeting a friend here for dinner. How
about you?
Jack: I'm on my way home but I needed to stop by
the book store to buy a text book.

Learn all Persona in one Model
Steve: I'm meeting a friend here for dinner. How
about you?
Jack: I'm on my way home but I needed to stop
by the book store to buy a text book.
STEVE | JACK
(i) Personae encoded in a one-hot vector

Learn all Persona in one Model
Steve: I'm meeting a friend here for dinner. How
about you?
Jack: I'm on my way home but I needed to stop
by the book store to buy a text book.
< 0.5 , 0.3 , -0.7 >
(ii) Use a Personality Embedding
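
Options (i) and (ii) can be sketched with plain lists. The embedding values are hypothetical (in a real model they are learned jointly with the network), and concatenating the persona vector to the word vector is one common way to condition a decoder, shown here only as an assumption.

```python
speakers = ["STEVE", "JACK"]


def one_hot(speaker):
    """(i) One dimension per speaker; grows with the number of personae."""
    return [1.0 if s == speaker else 0.0 for s in speakers]


# (ii) Hypothetical dense persona embeddings; fixed size regardless of speaker count.
persona_embedding = {
    "STEVE": [0.1, -0.4, 0.2],
    "JACK": [0.5, 0.3, -0.7],
}


def decoder_input(word_vector, speaker, use_embedding=True):
    """Concatenate the persona representation to the word vector at a decoding step."""
    persona = persona_embedding[speaker] if use_embedding else one_hot(speaker)
    return word_vector + persona  # list concatenation, not element-wise addition


print(decoder_input([1.0, 2.0], "JACK"))
print(decoder_input([1.0, 2.0], "STEVE", use_embedding=False))
```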

Memory Networks
Memory Networks are learning models that combine inference
components with a long-term memory component and learn
how to use the two jointly. The long-term memory can be
read and written to, with the goal of using it for prediction.

Components
I: (input feature map) – converts the incoming input to the internal feature
representation.
G: (generalization) – updates old memories given the new input. We
call this generalization as there is an opportunity for the network to
compress and generalize its memories at this stage for some intended
future use.
O: (output feature map) – produces a new output (in the feature
representation space), given the new input and the current memory state.
R: (response) – converts the output into the response format desired.
For example, a textual response or an action
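
A toy sketch of the four components, assuming bag-of-words features and word overlap as the scoring function. In a real Memory Network these components are learned; the class and its helper methods here are purely illustrative.

```python
class ToyMemoryNetwork:
    """I, G, O, R components with hand-written (not learned) behavior."""

    def __init__(self):
        self.memory = []  # list of (features, original text) slots

    def I(self, text):
        """Input feature map: a lowercase bag of words without '?' and '.'."""
        cleaned = text.lower().replace("?", "").replace(".", "")
        return frozenset(cleaned.split())

    def G(self, features, text):
        """Generalization: simplest form, store the input in the next free slot."""
        self.memory.append((features, text))

    def O(self, features):
        """Output feature map: pick the memory with the largest word overlap."""
        return max(self.memory, key=lambda slot: len(slot[0] & features))

    def R(self, output):
        """Response: return the stored text of the selected memory."""
        return output[1]

    def store(self, text):
        self.G(self.I(text), text)

    def answer(self, question):
        return self.R(self.O(self.I(question)))


net = ToyMemoryNetwork()
net.store("Jack is majoring in Sociology")
net.store("Steve is meeting a friend for dinner")
print(net.answer("What is Jack majoring in?"))
```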

MN for storing dialogue
To correctly predict the next move in a dialogue,
the system must look up its memory and select the
relevant turn (not necessarily the last one).

MN for Modeling Persona
Represent persona as a set of statements. Store them in MM. Collect
crowdsourced dialogues providing persona profiles.
Persona 1                                   Persona 2
I like to ski                               I am an artist
My wife does not like me anymore            I have four children
I have went to Mexico 4 times this year     I recently got a cat
I hate Mexican food                         I enjoy walking for exercise
I like to eat cheetos                       I love watching Game of Thrones
[PERSON 1:] Hi
[PERSON 2:] Hello ! How are you today ?
[PERSON 1:] I am good thank you , how are you.
[PERSON 2:] Great, thanks ! My children and I were just about to watch Game of Thrones.
[PERSON 1:] Nice ! How old are your children?
[PERSON 2:] I have four that range in age from 10 to 21. You?
[PERSON 1:] I do not have children at the moment.
[PERSON 2:] That just means you get to keep all the popcorn for yourself.
[PERSON 1:] And Cheetos at the moment!
[PERSON 2:] Good choice. Do you watch Game of Thrones?
[PERSON 1:] No, I do not have much time for TV.
[PERSON 2:] I usually spend my time painting: but, I love the show.
Table 2: Example dialog from the PERSONA-CHAT dataset. Person 1 is given their own persona (top
left) at the beginning of the chat, but does not know the persona of Person 2, and vice-versa. They
have to get to know each other during the conversation.
Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., & Weston, J. (2018). Personalizing Dialogue
Agents: I have a dog, do you have pets too? Proceedings of ACL.

Synthetic Persona-Based Dialogues
•  Generate synthetic dialogues using a simulator
based on a dialogue self-play model.

Produce an outline, i.e. an abstract
representation of a possible dialogue.
U: Greet
B: Greet, Ask_Info()
U: Search(people=6, price='expensive', food='british')
B: Ask_Location()
U: Inform(location='London')
B: Inform(searching)

Each abstract turn is instantiated (paraphrased) with
templates that are chosen according to user’s
persona/demographic…
User Profile (1)                    User Profile (2)
Gender = Male                       Gender = Male
Age = Elderly                       Age = Young
Favorite Food = Shepherd’s Pie      Favorite Food = Fish and Chips
For example, a young person can be addressed with informal templates.

Actual simulation of a dialogue with an elderly and a
young customer starting from the same outline.
Elderly customer (User Profile 1):
U: Hi!
B: Greeting sir, what may I assist you with?
U: I’d like to book a table for six in an expensive restaurant that serves british food
B: I’ll start the request now. May I know where the restaurant should be located?
U: London
B: Excellent sir, please give me a moment to provide you with options

Young customer (User Profile 2):
U: Hi!
B: Hey dude, what’s up?
U: I’d like to book a table for six in an expensive restaurant that serves british food
B: I’m on it! Where should it be?
U: London
B: Ok, looking for options!
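
The template-selection step can be sketched as below. The act names, the dictionary layout and the "Young means informal" rule are assumptions made for illustration; the template strings are the bot turns from the two simulated dialogues.

```python
# Bot acts from the outline, in order (names are illustrative).
BOT_OUTLINE = ["greet_ask_info", "ask_location", "inform_searching"]

TEMPLATES = {
    "formal": {
        "greet_ask_info": "Greeting sir, what may I assist you with?",
        "ask_location": "I'll start the request now. May I know where the "
                        "restaurant should be located?",
        "inform_searching": "Excellent sir, please give me a moment to "
                            "provide you with options",
    },
    "informal": {
        "greet_ask_info": "Hey dude, what's up?",
        "ask_location": "I'm on it! Where should it be?",
        "inform_searching": "Ok, looking for options!",
    },
}


def style_for(profile):
    """Assumed rule: young users get informal templates, everyone else formal."""
    return "informal" if profile.get("age") == "Young" else "formal"


def bot_turns(profile):
    """Instantiate every abstract bot act with the template matching the profile."""
    style = style_for(profile)
    return [TEMPLATES[style][act] for act in BOT_OUTLINE]


elderly = {"gender": "Male", "age": "Elderly", "favorite_food": "Shepherd's Pie"}
young = {"gender": "Male", "age": "Young", "favorite_food": "Fish and Chips"}
for turn in bot_turns(young):
    print("B:", turn)
```

The same outline thus yields two surface dialogues, which is what makes the synthetic data persona-dependent.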

Evolution of Dialogue Architectures to fit Persona

From single memory
network to split
memory network.

Conversation History
and persona facts are
stored in different
memories.
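
A rough sketch of the split-memory idea, assuming simple word overlap as the retrieval score (a real model would use learned attention). The class name and method names are hypothetical; the persona facts are those of Persona 2 in the PERSONA-CHAT example.

```python
import re


def tokens(text):
    """Lowercase word set, punctuation ignored."""
    return set(re.findall(r"[a-z']+", text.lower()))


class SplitMemory:
    """Keeps persona facts and conversation history in two separate memories."""

    def __init__(self, persona_facts):
        self.persona = list(persona_facts)  # static: written once
        self.history = []                   # dynamic: grows turn by turn

    def add_turn(self, utterance):
        self.history.append(utterance)

    def read(self, query):
        """Read each memory independently and return the best match from both."""
        q = tokens(query)
        best_fact = max(self.persona, key=lambda f: len(tokens(f) & q))
        best_turn = max(self.history, key=lambda t: len(tokens(t) & q), default=None)
        return best_fact, best_turn


memory = SplitMemory([
    "I am an artist",
    "I have four children",
    "I recently got a cat",
    "I enjoy walking for exercise",
    "I love watching Game of Thrones",
])
memory.add_turn("Hi")
memory.add_turn("Hello! How are you today?")
print(memory.read("Do you have children?"))
```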

Section 3
Evaluating CAs’ Persona

Motivations
In task-oriented CAs, more attention is paid to assessing task
effectiveness (Quality of Service) than to how the task is
achieved (Quality of Experience).
CAs are moving towards more human-like interaction
capabilities, for which standard evaluation
methodologies may not suffice.
We propose a novel methodology to assess, in a completely
controlled way, the impact of an agent’s interaction
strategies on the quality of experience.

Standardized evaluation - still an open issue
             Single Components                         Whole Interaction
SYSTEMS      Grammaticality; Slots correctly           Success rate
             realized; Slots correctly recognized;
             etc.
SIMULATION - Wizard of Oz
additional possible dimensions: manual vs automatic, qualitative vs quantitative, ...

Evaluating human-like interaction capabilities

Desiderata
The methodology should be:
-  Fast
-  Cheap
-  Controlled
WoZ (Wizard of Oz) does not fully meet these stringent desiderata.

Wizard of Oz
•  Time-consuming interactions to get the evaluation;
•  Might be biased by confounding variables;
•  Significant training: providing 'behavior instructions' to the wizard and
possibly to users;
•  These 'behavior instructions' cannot describe every single reaction;
•  Difficult for wizards to provide consistent responses across sessions;
•  ...

Our Methodology
Two transcripts of the interaction with a conversational agent
are presented to the user (crowdsourcing), to see if one
version is preferred over the other.
-  Subjects read the transcripts of the interaction rather
than directly interacting with the agent: complete control
over the experiment (transcripts are manually curated so as to
meet stringent control criteria).
-  The two versions must keep all aspects and wording of
the interaction the same (apart from those affected by the
modality of interaction)
-  Drastic time reduction for judgment collection as compared
to WoZ.
WoZ = 30 min x 2 participants
Ours = 3 min x 1 participant
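
The time comparison can be worked out as participant-minutes per collected judgment session. Reading "30 min x 2 participants" as a product of minutes and people is an interpretation of the slide, not something it states explicitly.

```python
def participant_minutes(minutes_per_session, participants):
    """Total human time consumed by one session."""
    return minutes_per_session * participants


woz = participant_minutes(30, 2)   # wizard + user
ours = participant_minutes(3, 1)   # one crowd worker reading two transcripts
speedup = woz / ours
print(woz, ours, speedup)
```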

CH1: Hello Andrea! What did you eat for your last
meal?
User: I ate a plate of spaghetti with tomato
CH1: How much spaghetti with tomato did you
eat?
User: Unfortunately I messed up, it was a
generous helping...
CH1: Your meal consisting of a generous plate
(200g) of spaghetti with tomato corresponds to
30.85 g of carbohydrates. I hope I have been
helpful! See you soon!
CH1: Hello Andrea! What did you eat for your last
meal?
User: I ate a plate of spaghetti with tomato
CH1: How much spaghetti with tomato did you
eat?
User: Unfortunately I messed up, it was a
generous helping...
CH1: Don’t worry about it, everyone messes up
sometimes! Anyway, your meal consisting of a
generous plate (200g) of spaghetti with tomato
corresponds to 30.85 g of carbohydrates. I hope I
have been helpful! See you soon!
Stimulus Material
Adding Empathy in stimulus material

Experimental Setting
We presented the experimental and
control conditions together (within-
subject design) with short
instructions on the task.

Forced decision between
alternatives.
CrowdFlower task layout: Instructions, followed by Questions 1-5, each with check boxes for interactions A and B.

The purpose of this experiment is to evaluate two versions of CH1, a virtual
assistant designed to calculate the carbohydrates of the user’s meal. We will
show you some transcripts of the interaction between a human user and the
two versions of CH1, where the two interactions are displayed side by side. In
order to evaluate them:

1) Read both the interactions of the two versions of CH1;
2) You will find 5 questions below each couple of transcripts;
3) For each question, you have to select one of the two check boxes that are
under the column of the corresponding interaction;
4) Choose the interaction that better answers the questions according to you.

Interaction Modalities
●  Empathy
●  Facing
●  Linguistic Alignment
●  Vocabulary Extension
●  Formality

QoS
Which interaction with CH1 do you think
satisfied the user more?

Which version of CH1 seems easier to use?
QoE
Which version of CH1 would you
recommend?

Which version of CH1 creates a more
natural dialogue?

Results: Magnitude Effects
                       Ease of Use  Satisfaction  Utility  Interaction  Margin
Linguistic Alignment   0.60         0.61          0.67     0.65         0.63
Empathy                0.73         0.78          0.73     0.76         0.75
Facing                 0.64         0.71          0.70     0.66         0.68
Formal                 0.74         0.80          0.73     0.66         0.73
Vocabulary Extension   0.74         0.71          0.73     0.77         0.74
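
The Margin column appears to be the mean of the four question scores in each row. The slides do not state this; the check below only shows that the reported values are consistent with that reading, up to two-decimal rounding.

```python
# (four question scores, reported Margin) copied from the table above.
results = {
    "Linguistic Alignment": ([0.60, 0.61, 0.67, 0.65], 0.63),
    "Empathy": ([0.73, 0.78, 0.73, 0.76], 0.75),
    "Facing": ([0.64, 0.71, 0.70, 0.66], 0.68),
    "Formal": ([0.74, 0.80, 0.73, 0.66], 0.73),
    "Vocabulary Extension": ([0.74, 0.71, 0.73, 0.77], 0.74),
}


def row_mean(scores):
    return sum(scores) / len(scores)


def consistent_with_mean(scores, margin, tolerance=0.005):
    """True if the reported margin equals the row mean up to rounding."""
    return abs(row_mean(scores) - margin) <= tolerance + 1e-9


checks = {name: consistent_with_mean(scores, margin)
          for name, (scores, margin) in results.items()}
print(checks)
```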

Experiment Description
2860 Judgments collected: 143 subjects answered
four questions for each of the 5 independent
variables.

$51.48 overall cost: roughly $10 for evaluating each
variable.

12 h experiment duration.
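
The figures above can be cross-checked with simple arithmetic. The cost per judgment is a derived quantity, not one reported on the slide.

```python
subjects = 143
questions = 4
variables = 5
total_cost = 51.48  # USD, as reported

judgments = subjects * questions * variables
cost_per_variable = total_cost / variables
cost_per_judgment = total_cost / judgments  # derived, not reported on the slide

print(judgments)                      # matches the 2860 judgments reported
print(round(cost_per_variable, 2))    # roughly 10 USD per variable
print(round(cost_per_judgment, 3))
```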

Results
•  More than 50% of the contributors expressed a
preference for the experimental condition;
•  The independent variables have different
magnitude effects;
•  Each of the independent variables had a main
effect on one QoS and one QoE dimension.

Take-away result: Different users prefer different
CA personalities
An analysis at the gender level revealed that
some personalities are preferred more depending
on user gender.
                       Male   Female
Linguistic Alignment   0.61   0.68
Facing                 0.76   0.64

Thank you!
Marco Guerini
Fondazione Bruno Kessler
Via Sommarive 18,
38123 Trento (Italy)
guerini@fbk.eu
@m_guerini
Contacts
