1. Five NLP Challenges in Data-Driven Personas
Dr. Joni Salminen
October 20, 2021
Nanyang Technological University, Singapore
2. Meet the APG Team!
Professor Jim Jansen
The Leader (Principal Scientist)
• Inventor of APG
• Leads the project
• Customer relationships &
management
MSc. Soon-gyo Jung
The Genius (Software Engineer)
• Creator of APG
• Front-End / Back-End
• Implements like a genius, hence
the nickname
Dr. Joni Salminen
The Handyman (Scientist)
• Helps with user studies,
system development, etc.
• Strategic guy, likes to think about the big picture
?
YOU?
3.
4. Why personas?
• Summarize relevant user information so that decision makers can do their jobs better (e.g., by creating products that actually serve people’s needs)
• Are an alternative (or complement) to numbers
• Provide a different, more approachable and memorable way of doing user/customer analytics
• Give faces to user data
…and are not just about visualization, but empathetic representations of users! [1]
[1] Nielsen, L. (2019). Personas—User Focused Design (2nd ed.). Springer.
5. Why automate persona generation?
Personas are usually created with manual methods (e.g., interviews and ethnography) that are expensive and slow to implement, and the resulting personas can quickly become outdated. Because of these limitations, personas risk being inaccurate representations of the true user base.
Better personas ➔ Better decisions ➔ Better results.
In contrast, APG provides personas that are fast to
create and updated automatically. This means the cost of
persona creation is dramatically reduced, making them
available for organizations with limited means (e.g.,
startups, small businesses). Depending on the underlying
dataset, APG can cover a wide range of behaviors and
demographics.
Manual methods
Automation
An, J., Kwak, H., Salminen, J., Jung, S., & Jansen, B. J. (2018). Imaginary People
Representing Real Numbers: Generating Personas from Online Social Media Data.
ACM Transactions on the Web (TWEB), 12(4), 27. https://doi.org/10.1145/3265986
7. Literally, giving faces to user data!
Personification = nameless, faceless
segments are turned into personas that
describe a behavioral and demographic
pattern in the data [1]
Enrichment = enriching the persona profiles with
additional information such as sentiment, loyalty, quotes,
most viewed content, and topics of interest [1]
[1] An, J., Kwak, H., Salminen, J., Jung, S., & Jansen, B. J. (2018).
Imaginary People Representing Real Numbers: Generating Personas
from Online Social Media Data. ACM Transactions on the Web
(TWEB), 12(4), 27. https://doi.org/10.1145/3265986
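Personification can be sketched as a matrix decomposition over segment-level behavior. The slide does not name a method, so this is only a minimal sketch under stated assumptions: the input is a hypothetical non-negative matrix of demographic segments × content items (e.g., view counts), and non-negative matrix factorization (NMF) with Lee and Seung's multiplicative updates serves as the pattern finder. Each latent pattern becomes a nameless persona skeleton: its heaviest content items describe the behavior, and its most strongly associated segment supplies the demographics. Enrichment (sentiment, quotes, topics) would be attached afterwards.

```python
import numpy as np

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factorize a non-negative matrix V (segments x items) as W @ H, with
    W (segments x k) and H (k x items), using multiplicative updates that
    reduce the squared Frobenius error ||V - W @ H||^2."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def personify(V, segments, items, k, top_n=2):
    """Turn each latent behavioral pattern into a persona skeleton."""
    W, H = nmf(V, k)
    personas = []
    for p in range(k):
        # Segment that loads most strongly on this pattern -> demographics.
        demographic = segments[int(np.argmax(W[:, p]))]
        # Content items that define this pattern -> behavior.
        top_items = [items[j] for j in np.argsort(H[p])[::-1][:top_n]]
        personas.append({"demographic": demographic, "top_content": top_items})
    return personas

# Hypothetical view counts: rows are segments, columns are content items.
segments = ["female 18-24", "female 25-34", "male 35-44", "male 45-54"]
items = ["makeup tutorial", "fashion haul", "car review", "stock market news"]
V = np.array([[90, 80, 2, 1],
              [85, 70, 3, 2],
              [2, 1, 95, 60],
              [1, 3, 70, 90]], dtype=float)

for persona in personify(V, segments, items, k=2):
    print(persona)
```

The number of personas k is a free parameter here; in practice it would be chosen by the analyst or by comparing reconstruction error across several values.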
8.
9. Requirements:
• Enough data (e.g., >100,000
viewers/visitors/users/customers)
• Enough content (e.g., >1000
products/pages/videos/posts)
• Large and heterogeneous
audience
…so probably not a good fit for most SMEs, startups, and micro-organizations (traditional personas work best for such organizations!)
Choose the tool based on the problem!
Our “Client Persona”
10. Research Roadmap for Automatic Persona Generation [1]
APG is about finding better ways to process and choose useful user information from vast amounts of online data.
“Personas are about giving faces to data.”
• Information architecture: How to determine the relevant persona information for a given user, use case, and industry (e.g., e-health, e-commerce, politics, gaming…)?
• Quotes: How to find demographically matching, non-toxic comments that describe the persona’s attitudes and are relevant for end users?
• Temporal analysis: How to analyze how personas change over time?
• Image: How to automatically generate, tag, and choose appropriate persona profile pictures?
• Evaluation: (1) How to ensure personas are of high quality (complete, clear, consistent, and credible)? (2) How to measure the value of personas for individuals and organizations?
• Attributes & Topics of Interest: How to automatically infer user attributes, such as interests, needs, wants, goals, political orientation, and brand affinity, from social media?
• Interactivity: How to design interactive features that help users cope with more personas?
[1] Salminen, J., Jansen, B. J., An, J., Kwak, H., & Jung, S. (2019). Automatic Persona Generation for Online Content Creators: Conceptual Rationale and a Research Agenda. In L. Nielsen (Ed.), Personas—User Focused Design (2nd ed., pp. 135–160). Springer London. https://doi.org/10.1007/978-1-4471-7427-1_8
11. Current NLP techniques in APG
• Topic classification:
• Current: Zero-shot classification (à la Hugging Face RoBERTa) for small organizations and supervised ML (TF-IDF features with XGBoost) for large clients
• Past: LDA (crap!)
• Sentiment analysis:
• Current: EmoLex (multiple languages, dictionary-based)
• Future: SenticNet?
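The supervised route can be illustrated in a few lines. This is a minimal, dependency-free sketch, so it swaps XGBoost for a simple nearest-centroid classifier over L2-normalized TF-IDF vectors. The training snippets and topic labels are hypothetical, and the sketch only shows TF-IDF featurization plus a supervised decision, not APG's actual pipeline.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def fit_idf(docs):
    """Smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def unit(vec):
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def tfidf(text, idf):
    """L2-normalized TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return unit({t: c * idf[t] for t, c in counts.items()})

def dot(a, b):
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def train(texts, labels):
    """Learn the IDF table and one normalized centroid per topic."""
    idf = fit_idf(texts)
    sums = defaultdict(lambda: defaultdict(float))
    for text, label in zip(texts, labels):
        for t, v in tfidf(text, idf).items():
            sums[label][t] += v
    return idf, {label: unit(vec) for label, vec in sums.items()}

def classify(text, idf, centroids):
    """Pick the topic whose centroid has the highest cosine similarity."""
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))

# Hypothetical labeled training data.
texts = [
    "the new phone has a great camera and battery",
    "laptop review: fast processor, bright screen",
    "the election results surprised the parliament",
    "voters debated the new tax policy",
    "the team won the football match",
    "a late goal decided the championship game",
]
labels = ["technology", "technology", "politics", "politics", "sports", "sports"]

idf, centroids = train(texts, labels)
print(classify("which phone has the best battery", idf, centroids))
```

For the zero-shot route, the Hugging Face `transformers` library provides a `zero-shot-classification` pipeline that scores arbitrary candidate labels without task-specific training data; it requires downloading an NLI model and is not shown here.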
12. CHA1: Generate Persona Quotes
• Objective: Generate artificial quotes that reflect the persona’s
(a) attitudes and (b) demographics.
• NLP field: Conditional text generation
• Requirements:
• Demographically accurate
• Attitudinally accurate
• Topically accurate (enables searching)
The key word here is “conditional”: mere grammaticality is not enough; the quotes need to capture the persona’s “self”.
“Quotes reflect the persona’s attitudes about given topics and about life in general.”
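One common way to make generation conditional is to prepend attribute control codes to each text, in the style of Salesforce's CTRL model, so that a language model fine-tuned on such strings learns to produce text given those attributes. The sketch below only prepares the conditioned strings for the three requirements above; the attribute names, the `<key=value>` code format, the `||` separator, and the example comment are all hypothetical, and the language model itself is out of scope.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Conditions:
    """Hypothetical attribute schema mirroring the three requirements."""
    gender: str     # demographic
    age_group: str  # demographic
    sentiment: str  # attitudinal, e.g. "negative"
    topic: str      # topical; also makes quotes searchable by topic

SEPARATOR = "||"

def control_prefix(c: Conditions) -> str:
    """Serialize the conditions as control codes placed before the text."""
    return (f"<gender={c.gender}> <age={c.age_group}> "
            f"<sentiment={c.sentiment}> <topic={c.topic}> {SEPARATOR}")

def training_example(c: Conditions, comment: str) -> str:
    """Fine-tuning string: control codes followed by a user comment
    whose author and content match those conditions."""
    return f"{control_prefix(c)} {comment.strip()}"

def generation_prompt(c: Conditions) -> str:
    """At generation time the model sees only the codes and continues after them."""
    return control_prefix(c)

def parse_conditions(text: str) -> dict:
    """Recover the codes from a string, e.g. to audit generated quotes."""
    head = text.split(SEPARATOR, 1)[0]
    return dict(re.findall(r"<(\w+)=([^>]+)>", head))

c = Conditions("female", "25-34", "negative", "elections")
print(training_example(c, " I don't trust any of the candidates. "))
print(generation_prompt(c))
```

Keeping the codes in the text itself means generated quotes can later be checked against the attributes they were conditioned on.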
13. CHA2: Chat with Personas
• Objective: Make it possible for users to ask a persona questions and have the persona give answers that, again, reflect who the persona is in terms of (a) attitudes, (b) demographics, and (c) topics.
• NLP field: Dialogue systems
type to ask Ahmed a question…
You: Hi Ahmed! What
do you think about the
elections in Pakistan?
Ahmed: I don’t like it
[negative sentiment, click
to learn more]
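Before a full generative dialogue system exists, a retrieval-based baseline can approximate the mock-up: answer a question with the persona quote that best overlaps it, plus that quote's sentiment label. This is a minimal sketch, assuming each persona already has quotes tagged with sentiment; the quotes and stopword list are hypothetical, and Jaccard token overlap stands in for a proper retrieval model.

```python
import re
from dataclasses import dataclass

@dataclass
class Quote:
    text: str
    sentiment: str  # assumed to come from an upstream sentiment analyzer

# Small, hypothetical stopword list so overlap focuses on content words.
STOPWORDS = {"a", "about", "an", "do", "hi", "i", "in", "is", "it",
             "of", "the", "think", "to", "what", "you"}

def content_words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def answer(question, quotes, min_score=0.05):
    """Reply with the persona quote that overlaps the question most,
    tagged with its sentiment; fall back when nothing is relevant."""
    q = content_words(question)
    best = max(quotes, key=lambda quote: jaccard(q, content_words(quote.text)))
    if jaccard(q, content_words(best.text)) < min_score:
        return "I don't really have an opinion on that."
    return f"{best.text} [{best.sentiment} sentiment]"

# Hypothetical quotes for the "Ahmed" persona from the mock-up.
ahmed = [
    Quote("These elections in Pakistan feel rigged, I don't like it.", "negative"),
    Quote("Cricket season is the best time of the year!", "positive"),
]
print(answer("Hi Ahmed! What do you think about the elections in Pakistan?", ahmed))
```

A generative dialogue model would eventually replace `answer`, but a retrieval baseline is useful for comparison because every reply is traceable to an existing quote.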
14. CHA3: Frankenstein’s Personas
• Objective: Solve Bødker’s [1] “Frankenstein problem”: inconsistency of persona information
• Example cases: conflicting attributes such as man/woman or Indian/Pakistani within the same persona (cf. cultural sensibilities, Häkkilä et al. [2])
• NLP field: supervised ML (language modeling)
• How to match the quotes with the personas’ demographics and actual attitudes? (And how to reflect as many aspects of the persona’s attitudes as possible?)
[1] Bødker, S., Christiansen, E., Nyvang, T., & Zander, P.-O. (2012).
Personas, people and participation: Challenges from the trenches of
local government. Proceedings of the 12th Participatory Design
Conference: Research Papers-Volume 1, 91–100.
[2] Häkkilä, J., Wiberg, M., Eira, N. J., Seppänen, T., Juuso, I., Mäkikalli, M., &
Wolf, K. (2020). Design Sensibilities-Designing for Cultural Sensitivity.
Proceedings of the 11th Nordic Conference on Human-Computer Interaction:
Shaping Experiences, Shaping Society, 1–3.
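Part of the Frankenstein problem can be caught before any model is trained: compare the persona's declared demographics with attributes that its quotes reveal. This is a minimal sketch that uses hand-written self-identification rules instead of the supervised language model named above; the rules, persona fields, and quotes are hypothetical, and a real system would replace `infer_from_text` with a trained attribute classifier.

```python
import re

# Hypothetical self-identification rules: regex -> (attribute, implied value).
RULES = [
    (r"\bas a woman\b", ("gender", "female")),
    (r"\bas a man\b", ("gender", "male")),
    (r"\bas an indian\b", ("nationality", "Indian")),
    (r"\bas a pakistani\b", ("nationality", "Pakistani")),
]

def infer_from_text(text):
    """Attributes a quote reveals about its speaker (rule-based stand-in
    for a trained attribute classifier)."""
    found = []
    for pattern, (attribute, value) in RULES:
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append((attribute, value))
    return found

def find_conflicts(persona, quotes):
    """List (attribute, declared, implied, quote) tuples where a quote
    contradicts the persona's declared demographics."""
    conflicts = []
    for quote in quotes:
        for attribute, implied in infer_from_text(quote):
            declared = persona.get(attribute)
            if declared is not None and declared != implied:
                conflicts.append((attribute, declared, implied, quote))
    return conflicts

persona = {"name": "Ahmed", "gender": "male", "nationality": "Pakistani"}
quotes = [
    "As a Pakistani, I follow every cricket match.",
    "As an Indian, I think the new tax is unfair.",
]
for conflict in find_conflicts(persona, quotes):
    print(conflict)
```

Such a check only flags candidate conflicts; whether to drop the quote or revisit the persona remains a design decision.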
15. CHA4: Drifting Personas
• Objective: Identify topical changes in personas and notify
decision makers of these changes.
• NLP field: Concept drift / topic drift / model drift… (common
issues in ML [1])
• All refer to CHANGE in the underlying user behavior (basically,
the data: new categories appear, old ones change, distributions
change, etc.)
• How often should personas be updated? How should the change be measured / detected? [2]
[1] Widmer, G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine
Learning, 23(1), 69–101.
[2] Jansen, B. J., Jung, S., & Salminen, J. (2019). Capturing the change in topical interests of personas over
time. Proceedings of the Association for Information Science and Technology, 56(1), 127–136.
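One simple way to measure and detect topical change is to compare a persona's topic distribution in two time windows and notify decision makers when the divergence crosses a threshold. This is a minimal sketch using Jensen-Shannon divergence (base 2, so it lies between 0 and 1); the topic counts and the 0.1 threshold are hypothetical and would need calibration, for example against decision makers' feedback.

```python
import math

def distribution(counts, topics):
    """Topic counts -> probabilities over a shared topic list (counts must not all be zero)."""
    total = sum(counts.get(t, 0) for t in topics)
    return [counts.get(t, 0) / total for t in topics]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(counts_a, counts_b):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint topics."""
    topics = sorted(set(counts_a) | set(counts_b))
    p = distribution(counts_a, topics)
    q = distribution(counts_b, topics)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def drift_report(previous, current, threshold=0.1):
    """Flag a persona for review when its topic mix shifts past the threshold."""
    score = js_divergence(previous, current)
    return {
        "js_divergence": round(score, 3),
        "notify": score > threshold,
        "new_topics": sorted(set(current) - set(previous)),
        "vanished_topics": sorted(set(previous) - set(current)),
    }

# Hypothetical topic counts for one persona in two time windows.
january = {"gaming": 50, "music": 30, "news": 20}
june = {"gaming": 10, "music": 30, "news": 20, "crypto": 40}
print(drift_report(january, june))
```

Listing new and vanished topics alongside the score addresses the "new categories appear, old ones change" cases directly, while the divergence captures shifts in proportions.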
16. CHA5: Personas from Text Only
• User segmentation / text analytics / pattern mining
• Either for a specific use case (e.g., toxic personas, fake news personas, fandom personas…) or general representations of humanity that can be queried at will (i.e., “stacking” different user models on top of each other to create truly multifaceted human representations)
• Needs data, help from psychologists, etc. Open questions include how to validate such personas.
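The segmentation step can be sketched as clustering users by what they write. This is a minimal sketch, assuming each user's posts are concatenated into one document, represented as L2-normalized TF-IDF vectors, and grouped with spherical k-means (cosine similarity, deterministic farthest-point initialization). The users, texts, and k are hypothetical; whether the resulting clusters are meaningful personas is exactly the validation question above.

```python
import re
from collections import Counter

import numpy as np

def tfidf_matrix(docs):
    """Rows = users, columns = vocabulary terms; rows are L2-normalized.
    Assumes every document contains at least one word."""
    tokens = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for ts in tokens for t in ts})
    index = {t: i for i, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for row, ts in enumerate(tokens):
        for term, count in Counter(ts).items():
            tf[row, index[term]] = count
    df = (tf > 0).sum(axis=0)
    X = tf * (np.log((1 + len(docs)) / (1 + df)) + 1)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def spherical_kmeans(X, k, iters=20):
    """k-means with cosine similarity on unit vectors."""
    # Farthest-point initialization: start from the first user, then keep
    # adding the user least similar to any chosen center.
    centers = [X[0]]
    for _ in range(1, k):
        closest = np.max(X @ np.array(centers).T, axis=1)
        centers.append(X[int(np.argmin(closest))])
    C = np.array(centers)
    for _ in range(iters):
        labels = np.argmax(X @ C.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                total = members.sum(axis=0)
                C[j] = total / np.linalg.norm(total)
    return np.argmax(X @ C.T, axis=1)

# Hypothetical users, each represented by their concatenated posts.
posts = {
    "u1": "new game trailer looks amazing, cannot wait to play this game",
    "u2": "finished that game last night, boss fight in this game was epic",
    "u3": "vaccine rollout and hospital capacity worry me",
    "u4": "hospital staff deserve better pay during this vaccine rollout",
}
labels = spherical_kmeans(tfidf_matrix(list(posts.values())), k=2)
for user, label in zip(posts, labels):
    print(user, "-> segment", label)
```

Each cluster would still need the personification and enrichment steps (names, quotes, attributes) before it reads as a persona.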
17. Common challenges:
• Modeling people based on what they write.
• Lack of resources:
• Datasets (need demographically labeled data)
• Baselines
• Evaluation metrics (have to consider UX / HCI / user feedback; these are socio-technical, not only technical, problems)
• Most importantly, not enough PEOPLE working on these issues
18. Data is available, but what about information?
• People’s attitudes, fears, doubts, hopes, needs, wants… can
these be inferred from unstructured (micro-)texts?
• A Rosetta Stone for data-driven personas: user modeling / attribute inference from smartly sampled tweets?
• Dictionaries (LIWC, AFINN, EmoLex) vs. deep learning?
…VITALLY important because persona users’ information needs are unique: they need flexible tools to query persona attitudes in real time
➔ static data-driven personas won’t do!
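Dictionary approaches such as EmoLex map words to emotion and sentiment labels and count hits, which makes them transparent and easy to aggregate per persona. This is a minimal sketch of that scoring scheme; the tiny lexicon is a hypothetical stand-in (the real NRC Emotion Lexicon is distributed separately and is far larger), and plain word counting ignores negation and sarcasm, which is where deep learning models may help.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon in the spirit of EmoLex: word -> labels.
LEXICON = {
    "afraid": {"fear", "negative"},
    "worried": {"fear", "negative"},
    "unfair": {"anger", "negative"},
    "angry": {"anger", "negative"},
    "hope": {"anticipation", "positive"},
    "love": {"joy", "positive"},
}

def words(text):
    return re.findall(r"[a-z']+", text.lower())

def emotion_profile(text):
    """Share of words in the text that carry each emotion/sentiment label."""
    tokens = words(text)
    counts = Counter()
    for token in tokens:
        counts.update(LEXICON.get(token, ()))
    return {label: n / len(tokens) for label, n in counts.items()} if tokens else {}

def persona_attitudes(texts, top_n=3):
    """Aggregate label counts over all texts attributed to one persona."""
    counts = Counter()
    for text in texts:
        for token in words(text):
            counts.update(LEXICON.get(token, ()))
    return counts.most_common(top_n)

persona_texts = ["I hope prices come down", "I hope we win", "So worried about rent"]
print(emotion_profile("I am worried and angry"))
print(persona_attitudes(persona_texts))
```

Because the lexicon lookup is cheap, profiles like these can be recomputed on demand, which fits the need to query persona attitudes in real time.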
19. Thank you! Questions?
Dr. Joni Salminen
jsalminen@hbku.edu.qa
The APG family (Davao, 2019)
Get the book from Amazon!
(or your university library)