Five NLP Challenges in
Data-Driven Personas
Dr. Joni Salminen
October 20, 2021
Nanyang Technological University, Singapore
Meet the APG Team!
Professor Jim Jansen
The Leader (Principal Scientist)
• Inventor of APG
• Leads the project
• Customer relationships &
management
MSc. Soon-gyo Jung
The Genius (Software Engineer)
• Creator of APG
• Front-End / Back-End
• Implements like a genius, hence
the nickname
Dr. Joni Salminen
The Handyman (Scientist)
• Helps with user studies,
system development, etc.
• Strategic guy, likes to think about
the big picture
?
YOU?
Why personas?
• Summarize relevant user information so that decision
makers can do their jobs better (e.g., creating
products that actually serve people’s needs)
• Are an alternative (or complement) to numbers
• Provide a different way of doing user/customer analytics
(more approachable & memorable)
• Give faces to user data
…are not just about visualization, but empathetic
representations of users! [1]
[1] Nielsen, L. (2019). Personas—User Focused Design (2nd ed.).
Springer.
Why automate persona generation?
Personas are usually created with manual methods (e.g.,
interviews & ethnography) that are expensive and slow to
implement, and the resulting personas can quickly become
outdated. Because of these limitations, personas risk being
inaccurate representations of the true user base.
Better personas, better decisions, better results.
In contrast, APG provides personas that are fast to
create and updated automatically. This means the cost of
persona creation is dramatically reduced, making personas
available to organizations with limited means (e.g.,
startups, small businesses). Depending on the underlying
dataset, APG can cover a wide range of behaviors and
demographics.
Manual methods
Automation
An, J., Kwak, H., Salminen, J., Jung, S., & Jansen, B. J. (2018). Imaginary People
Representing Real Numbers: Generating Personas from Online Social Media Data.
ACM Transactions on the Web (TWEB), 12(4), 27. https://doi.org/10.1145/3265986
v. 2.0 (2021)
Literally, giving faces to user data!
Personification = nameless, faceless
segments are turned into personas that
describe a behavioral and demographic
pattern in the data [1]
Enrichment = enriching the persona profiles with
additional information such as sentiment, loyalty, quotes,
most viewed content, and topics of interest [1]
[1] An, J., Kwak, H., Salminen, J., Jung, S., & Jansen, B. J. (2018).
Imaginary People Representing Real Numbers: Generating Personas
from Online Social Media Data. ACM Transactions on the Web
(TWEB), 12(4), 27. https://doi.org/10.1145/3265986
Requirements:
• Enough data (e.g., >100,000
viewers/visitors/users/customers)
• Enough content (e.g., >1000
products/pages/videos/posts)
• Large and heterogeneous
audience
…so, probably not good for most
SMEs, startups, micro-organizations
(traditional personas work the best
for such organizations!)
You choose the tool based
on the problem!
Our ”Client Persona”
Research Roadmap for
Automatic Persona Generation [1]
Information architecture:
How to determine the
relevant persona
information for a given user,
use case, and industry?
(e.g., e-health, e-commerce,
politics, gaming…)
Quotes:
How to find demographically
matching, non-toxic comments
that describe the persona’s
attitudes and are relevant for
end users?
Temporal analysis:
How to analyze change
of personas over time?
APG is about finding better ways to process and choose
useful user information from vast amounts of online data.
”Personas are about giving faces to data.”
Image: How to
automatically generate, tag,
and choose appropriate
persona profile pictures?
Evaluation: (1) How to ensure
personas are of high quality
(complete, clear, consistent and
credible)? (2) How to measure
value of personas for individuals
and organizations?
Attributes & Topics of
Interest: How to automatically
infer user attributes, such as
interests, needs, wants, goals,
political orientation, and brand
affinity from social media?
[1] Salminen, J., Jansen, B. J., An, J., Kwak, H.,
& Jung, S. (2019). Automatic Persona
Generation for Online Content Creators:
Conceptual Rationale and a Research Agenda.
In L. Nielsen (Ed.), Personas—User Focused
Design (2nd ed., pp. 135–160). Springer London.
https://doi.org/10.1007/978-1-4471-7427-1_8
Interactivity: How to design
interactive features that help
users cope with more
personas?
Current NLP techniques in APG
• Topic classification:
• Current: Zero-shot classification (à la HuggingFace RoBERTa) for small
organizations and supervised ML (XGBoost and TF-IDF) for large
clients
• Past: LDA (crap!)
• Sentiment analysis:
• Current: EmoLex (multiple languages, dictionary-based)
• Future: SenticNet?
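To make the dictionary-based approach above concrete, here is a minimal sketch of lexicon lookup for emotion scoring. The tiny lexicon is a made-up stand-in (not actual EmoLex entries), and the function names `score_emotions` and `polarity` are hypothetical; APG's real pipeline is not shown in these slides.

```python
import re
from collections import Counter

# Toy lexicon for illustration only; these are NOT real EmoLex entries.
TOY_LEXICON = {
    "love": {"joy", "positive"},
    "great": {"joy", "positive"},
    "hate": {"anger", "negative"},
    "afraid": {"fear", "negative"},
    "awful": {"disgust", "negative"},
}


def score_emotions(text, lexicon=TOY_LEXICON):
    """Count how many tokens in `text` fall into each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        # Words missing from the lexicon contribute nothing.
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts


def polarity(counts):
    """Return a score in [-1, 1]; 0.0 when no polar words were found."""
    total = counts["positive"] + counts["negative"]
    if total == 0:
        return 0.0
    return (counts["positive"] - counts["negative"]) / total
```

A lookup like this is cheap and works in any language that has a lexicon, but it ignores negation and context, which is one reason to weigh it against learned models.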
CHA1: Generate Persona Quotes
• Objective: Generate artificial quotes that reflect the persona’s
(a) attitudes and (b) demographics.
• NLP field: Conditional text generation
• Requirements:
• Demographically accurate
• Attitudinally accurate
• Topically accurate (enables searching)
The key word here is conditional: mere grammaticality is not enough;
the generated quotes need to capture the persona’s ”self”.
”Quotes reflect the
persona’s attitudes
about given topics and
about life in general.”
CHA2: Chat with Personas
• Objective: Make it possible for users to ask a persona
questions and get answers that, again, reflect who the
persona is in terms of (a) attitudes, (b) demographics, and
(c) topics.
• NLP field: Dialogue systems
type to ask Ahmed a question…
You: Hi Ahmed! What
do you think about the
elections in Pakistan?
Ahmed: I don’t like it
[negative sentiment, click
to learn more]
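The mock-up above can be approximated, very roughly, with keyword lookup and templated replies. The sketch below is only that: it is not a dialogue system, and the persona structure, the sentiment scores, and the ±0.2 cut-offs are all assumptions made for illustration.

```python
def answer(persona, question):
    """Reply from a template chosen by the persona's stored topic sentiment.

    `persona` is a hypothetical dict with a "name" and a "topic_sentiment"
    mapping from topic keyword to a score in [-1, 1].
    """
    q = question.lower()
    for topic, sentiment in persona["topic_sentiment"].items():
        if topic in q:
            # The +/-0.2 thresholds are arbitrary illustration values.
            if sentiment < -0.2:
                reply, label = "I don't like it", "negative"
            elif sentiment > 0.2:
                reply, label = "I like it", "positive"
            else:
                reply, label = "I have mixed feelings about it", "neutral"
            return f"{persona['name']}: {reply} [{label} sentiment]"
    # No known topic keyword appears in the question.
    return f"{persona['name']}: I have not said much about that."


# Example persona with invented scores.
ahmed = {"name": "Ahmed", "topic_sentiment": {"elections": -0.6, "cricket": 0.8}}
```

Calling `answer(ahmed, "What do you think about the elections in Pakistan?")` selects the negative template. A real solution would have to generate the reply text conditionally rather than pick from fixed strings.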
CHA3: Frankenstein’s Personas
• Objective: solve Bødker’s [1] ”Frankenstein problem”:
inconsistency of persona information
• Example cases: man vs. woman, Indian vs. Pakistani, etc.
(cultural sensibilities (Häkkilä et al. [2]))
• NLP field: supervised ML (language modeling)
• How to match the quotes with the personas’ demographics and
actual attitudes? (And maximize reflecting all aspects of the
persona’s attitudes?)
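One simple baseline for the matching question above is a rule-based filter that rejects quotes whose author metadata contradicts the persona. This is a sketch under assumptions: the field names (`gender`, `country`) and the data layout are hypothetical, and it presumes demographically labeled authors, which is rarely the case in practice.

```python
def is_consistent(persona, quote, fields=("gender", "country")):
    """Return True unless a known author field contradicts the persona.

    Author fields that are missing (None or absent) are treated as
    'unknown' and do not cause a rejection; that is a design choice.
    """
    for field in fields:
        author_value = quote["author"].get(field)
        if author_value is not None and author_value != persona.get(field):
            return False
    return True


def consistent_quotes(persona, quotes):
    """Keep only the quotes that pass the consistency check."""
    return [q for q in quotes if is_consistent(persona, q)]
```

A filter like this only catches explicit metadata conflicts. It cannot tell whether the wording of a quote sounds like the persona, which is where a supervised model would come in.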
[1] Bødker, S., Christiansen, E., Nyvang, T., & Zander, P.-O. (2012).
Personas, people and participation: Challenges from the trenches of
local government. Proceedings of the 12th Participatory Design
Conference: Research Papers-Volume 1, 91–100.
[2] Häkkilä, J., Wiberg, M., Eira, N. J., Seppänen, T., Juuso, I., Mäkikalli, M., &
Wolf, K. (2020). Design Sensibilities-Designing for Cultural Sensitivity.
Proceedings of the 11th Nordic Conference on Human-Computer Interaction:
Shaping Experiences, Shaping Society, 1–3.
CHA4: Drifting Personas
• Objective: Identify topical changes in personas and notify
decision makers of these changes.
• NLP field: Concept drift / topic drift / model drift… (common
issues in ML [1])
• All refer to CHANGE in the underlying user behavior (basically,
the data: new categories appear, old ones change, distributions
change, etc.)
• How often should personas be changed? How should the change
be measured / detected? [2]
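One possible way to measure the change is to compare a persona's topic distribution across two time windows. The sketch below uses Jensen-Shannon divergence for this; the metric choice, the function names, and the 0.1 alert threshold are assumptions for illustration, not the method of [2].

```python
import math


def normalize(counts, topics):
    """Turn raw topic counts into a probability list over `topics`."""
    total = sum(counts.get(t, 0) for t in topics)
    if total == 0:
        raise ValueError("no observations for the given topics")
    return [counts.get(t, 0) / total for t in topics]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint."""
    def kl(a, b):
        # Terms with a zero numerator contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_drift(old_counts, new_counts, threshold=0.1):
    """Return (score, drifted) for two topic-count dicts.

    The threshold is an arbitrary illustration value and would need tuning.
    """
    topics = sorted(set(old_counts) | set(new_counts))
    score = js_divergence(normalize(old_counts, topics),
                          normalize(new_counts, topics))
    return score, score > threshold
```

Because the union of topics is used, a category that appears only in the newer window still contributes to the score, which covers the "new categories appear" case named above.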
[1] Widmer, G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine
Learning, 23(1), 69–101.
[2] Jansen, B. J., Jung, S., & Salminen, J. (2019). Capturing the change in topical interests of personas over
time. Proceedings of the Association for Information Science and Technology, 56(1), 127–136.
CHA5: Personas from Text Only
• User segmentation / text analytics / pattern mining
• Either for a specific use case (e.g., toxic personas, fake news
personas, fandom personas…) or general representations of
humanity that can be queried at will (i.e., ”stacking” different
user models on top of each other to create truly multifaceted
human representations)
• Needs data, help from psychologists, etc. Open question: how
to validate such personas?
Common challenges:
• Modeling people based on what they write.
• Lack of resources:
• Datasets (need demographically labeled data)
• Baselines
• Evaluation metrics (have to consider UX / HCI / user feedback; not only
technical, but socio-technical problems)
• Most importantly, not enough PEOPLE working on these issues
Data is available but what about
information?
• People’s attitudes, fears, doubts, hopes, needs, wants… can
these be inferred from unstructured (micro-)texts?
• Rosetta Stone for data-driven personas: user modeling /
attribute inference from smartly sampled tweets?
• Dictionaries (LIWC, AFINN, EmoLex) vs. deep learning?
…VITALLY important because persona users’ information needs
are unique: they need flexible tools to query
persona attitudes in real time
➔ static data-driven personas won’t do!
Thank you! Questions?
Dr. Joni Salminen
jsalminen@hbku.edu.qa
The APG family (Davao, 2019)
Get the book from Amazon!
(or your university library)
