Presentation at QCRI's Science Monday of the Social Computing group. January 14, 2019. Doha, Qatar. Access the Automatic Persona Generation system: https://persona.qcri.org
User Studies for APG: How to support system development with user feedback?
1. User Studies for APG
How to support system development with user feedback?
2. Purpose of presentation
1. To share some useful tips and lessons from
user studies we’ve done
2. To explain how to incorporate user studies
into system development flow
3. To encourage more user studies in Social
Computing
3. Outline
• The APG research roadmap
• Overview of our user studies
• Framework for Value-Driven Development™
• Examples of putting research into practice
• Tips for planning and executing user studies
4. What is a “user study”?
• A practically oriented research study that gathers real
feedback from real users, based on real scenarios,
testing the usability and functionality of a real system.
• User study is meant to produce information that
helps improve the system!
5. Why do we need user studies?
1. Break the researcher echo chamber – learn what matters for
people working in the industry
2. Find new use cases for the technologies you’re developing
3. Disseminate our research and systems (get downloads, citations,
real users signing up… ➔ promotion/raise !!!)
4. Maintain relationships to stakeholder organizations and their
key people
5. Get real feedback from real users to improve real systems
➔ the only path to real impact
6. There is an infinitely gigantic leap from
having “users” to having real users
…meaning: most registered people never use the
system in reality. Real user = active user
7. Getting your first active user is like achieving
the first moon landing.
…meaning: a small leap for mankind, a giant leap for
your system.
8. The same problem is faced by startups and QCRI projects
(Download Joni's dissertation "Startup Dilemmas" for free: http://www.utupub.fi/handle/10024/99349)
10. Research Roadmap for Automatic Persona Generation
Finding better ways to process and choose useful user information from vast amounts of online data.
"Personas are about giving faces to data." – Jim Jansen
• Information architecture: How to choose the correct information elements and layouts for a given user, use case, or industry?
• Quotes: How to find representative, contextually relevant, and non-distracting comments describing the persona?
• Attributes: How to infer user attributes, such as psychographics, needs and wants, political orientation, and brand affinities?
• Image: How to retrieve, generate, and choose persona profile pictures?
• Temporal analysis: How to analyze the change and stability of personas over time?
• Applicability: How to create personas for different contexts (e.g., e-commerce personas, political personas, gamer personas…)?
• Evaluation: 1) How to ensure personas are complete, clear, consistent, and credible? 2) How to measure the usefulness of personas for individuals and organizations?
11. In our research roadmap, user studies are
relevant in most if not all sections…
14. A list of user studies we have done
Study | Publications | System feedback items | Participants (N) | Real users?
User Study 1 (2017) | 4 (+1 in review) | ~15 | 29 | Yes
Persona Perception Scale: pilot (2017) | 1 | N/A | 19 | No
QR Workshop (2017) | N/A | 10 | 12 | Yes
QF Workshop (2018) | N/A | 9 | 12 | Yes
User Study 2 (2018) | 2 (in review) | ~15 | 36 | Yes
Persona Perception Scale: validation (2018) | 2 (1 in review + 1 in the works) | N/A | 412 | No
Persona Crowd Experiment 1 (Transparent personas) (2018) | 1 (in review) | 1 (MAJOR) | 412 | No
Persona Crowd Experiment 2 (Smile Study) (2018) | 1 (in review) | N/A | 2,400 | No
User Study 3 (2018) | GOAL: 4 | TBA (~20) | 34 | Yes
18. Observations:
• The number of participants and the number of system development items are not correlated (when including crowd experiments)
• However, when excluding crowd experiments, there is a correlation (the more people participate, the more ideas you get)
…both the quality and magnitude of system development items vary!
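As a quick, illustrative check of this observation, the correlation can be computed directly from the rough numbers in the study table above. This is a minimal Python sketch; the pairs below are approximations read from the table (slide 14), not exact data.

from scipy.stats import pearsonr

# (participants, system feedback items) for the studies where both are known,
# read approximately from the study table
lab_and_field = [(29, 15), (12, 10), (12, 9), (36, 15), (34, 20)]
crowd = [(412, 1)]  # Persona Crowd Experiment 1 (Transparent personas)

def corr(pairs):
    xs, ys = zip(*pairs)
    r, _p = pearsonr(xs, ys)
    return round(r, 2)

print("excluding crowd experiments:", corr(lab_and_field))          # clearly positive r
print("including crowd experiments:", corr(lab_and_field + crowd))  # r drops sharply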
19. As the number of user studies increases,
the number of changes actually grows!
(but the magnitude of changes decreases,
i.e., you start fixing smaller things)
20. Zeroing in on the value… (US1 → US2 → US3)
Moment of value = the state and point in time where the system provides real value to real users and is therefore actively used by them.
21. Value-Driven System Development (VDSD)
A repeating cycle:
• Design the user study, recruit participants, and conduct the study
• Filter useful feedback from non-useful: How common is the feedback? Who gives the feedback?
• Implement the recommended changes
• Conduct another user study (validate progress towards value; find new things to improve)
22. User-driven system development over time
(Figure: system versions v1–v4 along a timeline spanning User Study 1 → User Study 2 → User Study 3 → User Study 4; the number of improvements increases over time while the magnitude of the problems decreases.)
23. Challenge: you need to make real
improvements in between the user studies
…otherwise, you keep hearing the same
complaints again and again!
24. Remember: developing a useful system is a
journey, not a project (there is no “end”).
…that’s why you need years of dedication,
funding and committed people.
26. How to decide the user study focus?
• Behavior = objective information about what the user looks at and clicks on. Considered the most reliable kind of data.
• Perceptions = help us understand the user's impressions and why he or she does things. However, susceptible to several biases from both researcher and respondent.
…ideally, you’d measure both!
(mixed method studies are the best user studies)
27. To-do list for a user study
1. Decide research problem
2. Decide manipulations
3. Create user task
4. Create experimental treatments & flows
5. Create consent form (and get IRB approval if needed)
6. Recruit users
7. Pilot test (devices, software, flows…)
28. How to design a user study?
• Combine (a) an interesting theoretical problem and (b) an opportunity to get valuable user feedback
• The REAL criteria:
• When possible, use the real system (not mockups or screenshots)
• Use real users (otherwise they won't get engaged or provide valid feedback)
• Use a real or realistic task scenario (to get as close to the users' reality as possible)
30. An example of user study flow (eye-tracking)
1. Welcome the participant
2. Explain what the study is about and how we collect data
3. Get their consent
4. Calibrate the devices
5. Start the recording
6. Make notes
7. When participant finishes, ask some additional questions
8. Thank them (preferably with a gift card ☺)
31. Tips on running user studies
(based on experience ☺)
• Run 4-6 pilots with different people (include people from outside the
team)
• Be realistic with timing (leave empty time!)
• Always test the machines beforehand, on the spot (Internet…)
• Use two stations
• Have a Plan B (for machines, Internet, people…)
32. Tips for writing questionnaire introduction
• Be positive (engage people)
• Tell what the study is about (in plain language)
• Tell how long answering will take
• Tell who they can contact if they have questions
34. What type of data to collect?
• Real engagement data, e.g. eye-tracking, mouse tracking
→ tells what’s going on
• Think aloud data, e.g. notes, voice recordings, transcriptions
→ tells why it is going on
35. What metrics to analyze?
Metric | Type | Definition
Success rate | Objective | How many were able to carry out the task successfully?
Time to success | Objective | How long did it take to complete the task?
Ratio of time spent in AOI | Objective | How long was the participant hovering over/looking at an area of the screen?
Frequency of mentions | Objective | How many times did the participant mention a specific term (e.g., confusion vocabulary, positive/negative words)?
Explicit ratings | Subjective | Participants' judgments on the system or content (e.g., "How easy was it to…?")
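To make the objective metrics concrete, here is a minimal Python sketch of computing them from per-participant session logs. The Session fields and the confusion vocabulary are hypothetical, not APG's actual logging format.

from dataclasses import dataclass

@dataclass
class Session:
    completed: bool       # did the participant finish the task successfully?
    task_seconds: float   # time from task start to completion (or abandonment)
    aoi_seconds: float    # gaze/cursor time inside the area of interest (AOI)
    total_seconds: float  # total recording time
    transcript: str       # think-aloud transcription

CONFUSION_TERMS = ("confusing", "no idea", "what does this mean")  # illustrative vocabulary

def metrics(sessions):
    done = [s for s in sessions if s.completed]
    return {
        "success_rate": len(done) / len(sessions),
        "mean_time_to_success": sum(s.task_seconds for s in done) / len(done) if done else None,
        "mean_aoi_ratio": sum(s.aoi_seconds / s.total_seconds for s in sessions) / len(sessions),
        "confusion_mentions": sum(s.transcript.lower().count(t) for s in sessions for t in CONFUSION_TERMS),
    }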
36. What to ask?
• Open feedback (“How did you find the system?” “Does the system
solve a practical problem for you? If so, what?”)
• Background information:
• Age
• Gender
• Profession
• Experience with the system / in the domain
• (other things that might be confounding factors: country, native
language…)
37. When interviewing, ask about the users’
problems, not about the system:
• What is your typical workday like?
• What types of tasks do you have in your job?
• What problems do you face in those tasks?
• Interesting…can you give a real example?
38. “How do I know if participants are paying
attention during studies?”
“Attention checks, or Instructional Manipulation Checks (IMCs), are a
straightforward and simple way to determine who does or doesn't pay
attention to your study instructions (Oppenheimer, Meyvis, &
Davidenko, 2009).
We recommend that you always have at least one attention check in
any given study.”
Source: www.prolific.ac (a great survey platform)
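A minimal sketch of applying such a check when cleaning the data, assuming a pandas export with one row per respondent and a hypothetical "attention_check" column holding the answer to the instructed item:

import pandas as pd

responses = pd.read_csv("survey_responses.csv")  # hypothetical export file
EXPECTED = "Slightly agree"                      # the answer the instructions asked for

passed = responses["attention_check"] == EXPECTED
print(f"Rejecting {(~passed).sum()} of {len(responses)} respondents")
clean = responses[passed]  # analyze only the attentive respondents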
40. Rejecting!
This respondent failed the attention test. He or
she was supposed to select “slightly agree” but
selected “strongly agree”. It is unlikely that he or she paid attention to the survey, so we cannot use his or her answers.
😢
41. Challenges with crowd
• Need to communicate super simply (pilot, pilot, pilot!) →
instructions, task description
• Quality control (give example from hate interpretation paper, the list)
• Lack of validity → they’re NOT real users!
• Therefore, you can test UI aspects and general information
processing/usability stuff, but not validate sources of value
42. Should I use the crowd?
Pros:
+ Get a large number of participants easily and cheaply (whereas recruiting real users is costly and difficult)
+ Fast to collect data and iterate on study designs (whereas real user studies are "hit or miss")
Cons:
- Not real user feedback! (validity problem)
- Quality concerns (though these can also arise in real user studies)
43. Managing the crowd (Pitkänen & Salminen, 2013)
Pitkänen, L., & Salminen, J. (2013). Managing the Crowd: A Study on
Videography Application. In Proceedings of Applied Business and
Entrepreneurship Association International (ABEAI 2013) Conference,
Honolulu, Hawaii, USA, 14–20 November.
44. How to analyze the results?
• For quantitative: R/Python/SPSS/Excel
• For qualitative analysis: Word/Excel is fine 90% of the time
• You only need NVivo/ATLAS.ti when there are multiple layers of complexity
• (e.g., you need to label notes with multiple codes, such as “RESEARCH,CHI” or “RESEARCH,UMAP”, and assign a priority (e.g., 1–5))
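For instance, the multi-code labelling mentioned above can live in a plain spreadsheet or, equally well, in a few lines of Python. This is only a sketch: the second note, the codes, and the priority values are illustrative examples.

from collections import Counter

notes = [
    {"text": "Find match… no idea what this means", "codes": ["UI", "CONFUSION"], "priority": 5},
    {"text": "Could become a CHI paper on persona quotes", "codes": ["RESEARCH", "CHI"], "priority": 3},
]

code_counts = Counter(c for n in notes for c in n["codes"])            # how often each code occurs
top_notes = sorted(notes, key=lambda n: n["priority"], reverse=True)   # highest-priority notes first
print(code_counts)
print([n["text"] for n in top_notes])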
48. Surprising findings from user study
“Find match…no idea what this means…” → and we thought it was
DEAD SIMPLE! *not so* for users.
• (always record think-alouds → transcribe to analyze again)
• Remember: you are not your user. (always test your designs and
features against *real* users)
49. “To discover real user needs, we’ve been carrying
out several user studies. However, there are many
issues in conducting user studies. The feedback we
get is not always relevant or valid.”
https://www.linkedin.com/pulse/how-identify-useful-user-feedback-three-tips-value-joni-salminen/
50. The issues in interpreting user feedback
• Some participants might not be truly engaged or interested in
the system and just participate out of duty or because they
were “forced to”.
• Similarly, users may just brainstorm features that they would not really use but that "sound cool" (leading to feature creep).
• Moreover, when compiling the feedback, we find that there
are a lot of requests for new features. Say, the users want 10
new features, but we have time and resources for two and
thus need to prioritize.
51. How to filter useful user feedback? 3 tips:
1. Who does the feedback come from? → Not all people are engaged, motivated, or knowledgeable enough to give useful feedback. Therefore, we have to consider whether a person is just "shooting ideas" or actually wants to provide useful feedback. We then prioritize the comments from the people whose feedback indicates they are taking the commenting more seriously.
2. How repetitive is the feedback? → If the request comes from many organizations and many people within an organization, it is more likely to be a real problem to solve. If it's a rare request, the problem is probably also very rare and not worth focusing on.
3. Is the feedback traceable to a real problem the user has? → We want to know whether the request is a nice-to-have or a painkiller. We want to solve real problems with the system, so nice-to-haves must be minimized. Even if many motivated people suggest a new feature, it could still be a nice-to-have if we cannot logically connect it to a real problem.
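A minimal sketch of turning these three filters into a rough priority score for feedback items; the fields, weights, and example items below are assumptions for illustration, not an official rubric.

from dataclasses import dataclass

@dataclass
class Feedback:
    text: str
    engaged_source: bool     # tip 1: does it come from an engaged, knowledgeable person?
    n_orgs: int              # tip 2: how many organizations raised this?
    n_people: int            # tip 2: how many individuals raised this?
    traces_to_problem: bool  # tip 3: can we connect it to a real problem?

def score(fb: Feedback) -> float:
    s = 2.0 if fb.engaged_source else 0.0
    s += min(fb.n_orgs, 3) + 0.5 * min(fb.n_people, 6)
    s += 3.0 if fb.traces_to_problem else -2.0  # nice-to-haves sink in priority
    return s

items = [
    Feedback("Export personas to PowerPoint", True, 3, 5, True),   # hypothetical request
    Feedback("Add a dark mode", False, 1, 1, False),               # hypothetical request
]
for fb in sorted(items, key=score, reverse=True):
    print(round(score(fb), 1), fb.text)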
54. Other things to be cautious about
• Social desirability bias (getting *too high* scores)
• Not all adoption problems can be solved with tech. You also need
marketing, time, product champions, etc. (barriers for adoption)
56. …yet, 0 active users from QR!
(who would log in and use the system)
57. Lessons:
• What people say and do are two different things
• System ratings are poorly correlated with actual behavior
• If you want the most reliable data, go for behavioral data
• If you want to ask people, ask about their problems, not about
solutions (when you understand the problem fully, you can create a
solution for it)
58. How to communicate results to the engineer?
• Difficult (sometimes they don't care ☺)
• Best to have the engineer run test sessions him/herself
• Best to have a good relationship and communicate with them (I love Soon ❤)
59. How to communicate results to the engineer?
• Use quotes, stories, drawings…
• Use numbers (e.g., how many people experienced the issue)
• Convince your engineer that the problem really matters for users (buy-in)
• …again, best if they experience user feedback themselves (the internalization problem)
65. Example of communicating results
1. …Joni has an idea
2. …runs to Soon’s office
3. …Soon puts it into the source code
4. Done!
5. (repeat 100 times ☺)
67. Solves a real problem (usability)
experienced by many users.
68. Next step: validate the solution in a new user study!
(…and make sure the solution didn’t create a new
problem)
…remember, iteration: user study → system
development
69. Usability problems are not (that)
important!
…you can solve all the problems,
but if there is no value, your
system fails.
71. Challenges for design guidelines
• How to prioritize one change over another?
• How to imagine the effect of change in user behavior?
• How to distinguish between small UI problems and really big problems (i.e., the system being "useless")?
73. How to read the table
• Frequency mentioned = How often participants mentioned this issue
• Difficulty to fix = Assessment of how much time it takes
• Priority = Assessment of how much this would make us close to “moment of
value”
• Four types:
• Definition (text)
• UI change
• System change
• New feature
• Three levels of difficulty (they correlate quite well with the types):
• Low (takes ~5 minutes from Soon, or at most a day)
• Medium (takes 1–3 weeks from Soon)
• High (requires 3–6 months of research)
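A sketch of how such a table can be turned into a work order: sort by priority first, then prefer easier fixes (anticipating the "quick wins first" rule a few slides below). The rows other than the "Find match" issue are made-up examples.

DIFFICULTY = {"Low": 0, "Medium": 1, "High": 2}

items = [
    {"issue": "Term 'Find match' is unclear", "type": "Definition", "freq": 12, "difficulty": "Low", "priority": 5},
    {"issue": "Add a persona export feature", "type": "New feature", "freq": 4, "difficulty": "Medium", "priority": 3},
    {"issue": "Improve attribute inference", "type": "System change", "freq": 6, "difficulty": "High", "priority": 4},
]

# Quick wins first: highest priority, then lowest difficulty
plan = sorted(items, key=lambda i: (-i["priority"], DIFFICULTY[i["difficulty"]]))
for i in plan:
    print(i["priority"], i["difficulty"], i["issue"])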
75. NOTE: It’s important to be able to accurately
estimate the level of difficulty! (…you need to have
a certain level of understanding about software
development)
…and, you need to know how quick / skilled your developer is! (again, I ❤ Soon)
76. Prioritization: Take the quick wins (low hanging
fruits) first.
Then, work systematically to solve the more
challenging problems.
77. Bottom line: User studies have many benefits
• Build relationship with users / client organizations
• Learn what’s going on in the real world (real people’s problems)
• Solve research problems that matter (and make some good
publications ☺)
• Develop better systems that actually get used
78. Thanks for listening!
(….and thanks A LOT for participating in pilot studies ☺)
(ps. Full slides at SlideShare: www.slideshare.com/jonis12)
(pps. Visit https://persona.qcri.org for demo!)