This document discusses issue-based metrics and self-reported metrics for measuring user experience. It describes issue-based metrics as involving qualitative data about usability issues identified during user studies, including severity ratings of issues. Self-reported metrics involve subjective data collected through questionnaires and interviews using rating scales, the System Usability Scale, and other methods. Key considerations for both include identifying and analyzing patterns in issues and responses to focus design improvements.
4. Measuring the User Experience
• The next slides are based on the core textbook for this module, "Measuring the User Experience"
5. Issue-based metrics
• Reports of usability issues typically include qualitative data:
– The identification and description of a problem one or
more participants experienced
– An assessment of the underlying cause of the
problem
– Specific recommendations for remedying the problem
– Positive findings (what went well); many reports include these as well
6. Usability issues
• Usability issues are based on observed behaviour when using a product
• Common issues include:
– Task is not completed
– User goes “off course” or doesn't see
something that should be noticed
– User is frustrated
– User misinterprets some piece of content
7. What do you do with usability
issues?
• Use them to drive iterative design!
8. How do you identify issues?
• In-person studies (observing participants)
• Automated (or semi-automated) studies
(analysing behaviour, e.g. through logs)
9. Severity ratings
• Severity ratings help focus attention on what
really matters
– Low: any issue that annoys or frustrates participants
but does not play a role in task failure
– Medium: any issue that contributes to significant task
difficulty but does not cause task failure
– High: any issue that leads directly to task failure; encountering this issue will stop the user from completing the task
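As an illustration only (not from the textbook), the three-level rule above can be written as a small decision function; the parameter names are hypothetical:

```python
# A minimal sketch of the three-level severity rule above.
# The boolean parameters are hypothetical names for what was observed.

def severity(task_failed: bool, significant_difficulty: bool) -> str:
    """Map an issue's observed impact to a Low/Medium/High rating."""
    if task_failed:
        return "High"    # issue leads directly to task failure
    if significant_difficulty:
        return "Medium"  # significant difficulty, but the task is completed
    return "Low"         # annoying or frustrating only

print(severity(task_failed=False, significant_difficulty=True))  # Medium
```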
10. Severity ratings: 2 factors
• Severity rating can also use a combination
of 2 factors – typically frequency and
impact
11. Severity ratings: 4 factors
• You can also use four three-point scales (low, medium, high):
– Impact on the user experience
– Predicted frequency of occurrence
– Impact on the business goals
– Technical/implementation costs
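As a sketch, one simple way to combine such factor ratings into a single priority score is to map low/medium/high to 1-3 and sum them; the scheme below is an assumption for illustration, and a team may treat technical cost differently (e.g., subtract it so that cheap fixes rank higher):

```python
# One possible combination scheme: map each three-point scale to 1-3
# and sum, giving an overall score from 4 to 12. This is an illustrative
# assumption, not a prescribed formula.

SCALE = {"low": 1, "medium": 2, "high": 3}

def priority(ux_impact: str, frequency: str, business_impact: str,
             tech_cost: str) -> int:
    return (SCALE[ux_impact] + SCALE[frequency]
            + SCALE[business_impact] + SCALE[tech_cost])

print(priority("high", "medium", "high", "low"))  # 9 out of a possible 12
```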
12. Using a severity rating system
• What does each level mean? Is it clear to the team?
• Have more than one usability specialist assign severity ratings to each issue!
– How do you establish the final rating? How do
you address differences in the evaluation?
• Track the usability issues!
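A sketch of tracking ratings from several specialists and flagging disagreements for discussion; the issues, rater names, and data layout are made up:

```python
# Collect each specialist's severity rating per issue and flag the issues
# where the raters disagree, so the team can discuss and agree a final
# rating. The issues and rater names here are hypothetical.

ratings = {
    "search box hard to find": {"rater_a": "high", "rater_b": "medium"},
    "error message unclear":   {"rater_a": "low",  "rater_b": "low"},
}

for issue, by_rater in ratings.items():
    levels = set(by_rater.values())
    if len(levels) > 1:
        print(f"DISCUSS: {issue} -> {by_rater}")   # needs a final decision
    else:
        print(f"AGREED:  {issue} -> {levels.pop()}")
```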
13. Analysing usability issues
• What is the overall usability of the product?
• Is the usability improving with each design
iteration?
• Where should you focus your efforts to
improve the design?
14. Analysing usability issues (2)
• Analysing usability issues typically focuses
on identifying
– Unique issues
– Issues per participant
– Frequency per participant
– Issues by category
– Issues by task
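A sketch of these tallies over a hypothetical issue log, where each record names the issue, the participant, the task, and a category:

```python
# Tally usability issues by participant, category, and task, and count
# unique issues. The log entries are made-up examples.
from collections import Counter

issue_log = [
    {"issue": "search box hard to find", "participant": "P1",
     "task": "find product", "category": "navigation"},
    {"issue": "error message unclear", "participant": "P2",
     "task": "checkout", "category": "terminology"},
    {"issue": "search box hard to find", "participant": "P2",
     "task": "find product", "category": "navigation"},
]

print("Unique issues:", len({r["issue"] for r in issue_log}))
print("Issues per participant:", Counter(r["participant"] for r in issue_log))
print("Issues by category:", Counter(r["category"] for r in issue_log))
print("Issues by task:", Counter(r["task"] for r in issue_log))
```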
15. Consistency in identifying
usability issues
• Research shows very little agreement on what a
usability issue is or how severe it is
• A set of studies coordinated by Molich, with
different teams of usability experts evaluating
the same design, showed that there is very little
overlap in the findings of the teams
– Molich & Dumas (2008) showed that 60% of all the
issues were identified by only 1 of the 17 teams
participating in the study
16. Number of participants: five
users is enough
• About 80% of usability issues will be
observed with the first five participants
(Nielsen & Landauer, 1993)
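This figure comes from the problem-discovery model reported by Nielsen and Landauer: the proportion of issues found with n participants is 1 - (1 - p)^n, where p is the probability that a single participant encounters a given issue (about 0.31 in their data). A quick check:

```python
# Problem-discovery curve: proportion of issues found after n participants,
# using p ~= 0.31 from Nielsen & Landauer (1993).
p = 0.31
for n in (1, 3, 5, 10):
    print(n, round(1 - (1 - p) ** n, 2))
# With 5 participants this gives ~0.84, in line with the ~80% figure above.
```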
17. Number of participants: five
participants is not enough
• Lindgaard and Chattratichart (2007) tested
a web site with a known number of issues
– 2 teams (with 6 and 12 participants)
– The teams found 42% and 43% of the site's usability issues respectively – but only 28% of the issues were common to both!
19. What are self-reported metrics?
• Self-reported metrics capture the user's perception of their interaction with an interface
– They focus on subjective data
20. Collecting self-reported metrics
• Answer questions or provide ratings orally
– This is typically done through interviews
• Record responses on a paper form, or with
some type of online tool (questionnaires)
21. Interviews
• Unstructured - not directed by a script.
Rich but not replicable.
• Structured - tightly scripted, often like a
questionnaire. Replicable but may lack
richness.
• Semi-structured - guided by a script but
interesting issues can be explored in more
depth. Can provide a good balance
between richness and replicability.
22. Closed vs. open questions
• ‘Closed questions’ have a predetermined
answer format, e.g., ‘yes’ or ‘no’
– Easier to analyse
• ‘Open questions’ do not have a
predetermined format
– Allow research topics to be explored in more depth
23. Questions to avoid
• Long questions
• Compound sentences - split them into two
• Jargon and language that the interviewee may
not understand
• Leading questions that make assumptions
– e.g., "Why do you like …?"
• Questions that the respondent is not qualified to answer
• Unconscious biases, e.g. gender stereotypes
24. Running the interview
• Introduction – introduce yourself, explain the goals of the interview, reassure the interviewee about ethical issues, ask for permission to record, and present any informed consent form.
• Warm-up – make the first questions easy and non-threatening.
• Main body – present questions in a logical order
• A cool-off period – include a few easy questions to
defuse tension at the end
• Closure – thank interviewee and signal the end,
e.g. switch recorder off.
25. Enriching the interview process
• Use props - devices for prompting the interviewee, e.g. a prototype or a scenario
26. Questionnaires
• Questions can be closed or open
– Closed questions are easier to analyse, and
may be done by computer
• Can be administered to large populations
– Paper, email, and the web are used for dissemination
• Sampling can be a problem when the size of the population is unknown, as is common online
27. Questionnaire design
• Provide clear instructions on how to
complete the questionnaire
• Decide whether items will all be phrased positively, all negatively, or mixed
• Different versions of the questionnaire
might be needed for different populations
• The impact of a question can be
influenced by question order
28. Question and response format
• Questionnaires can include:
– Binary choices
– Checkboxes that offer many options
– Rating scales
• Likert scales
• Semantic differential scales
– Open-ended questions
29. Encouraging a good response
• Make sure the purpose of the study is clear
• Ensure questionnaire is well designed
– Consider offering a short version for those who do not
have time to complete a long questionnaire
• Promise anonymity
• Follow up with emails, phone calls, or letters
• Provide an incentive
• 40% response rate is high, 20% is often
acceptable
30. On-line questionnaires
• Responses are usually received quickly
• No copying and/or postage costs
• Data can be easily collected in a database for analysis
• Time required for data analysis is reduced
• Errors can be corrected easily
32. Problems with online
questionnaires
• Sampling is problematic if population size
is unknown
• It is hard to prevent individuals from responding more than once
33. Analysing data
• When analysing data from rating scales, use the frequency distribution of the responses (rather than the average or median)
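A sketch of summarising rating-scale data as a frequency distribution; the responses are hypothetical 5-point ratings:

```python
# Report how many respondents chose each point on the scale, rather than
# collapsing the ratings into a single average.
from collections import Counter

responses = [4, 5, 2, 4, 3, 5, 4, 1, 4, 5]  # made-up 5-point ratings
counts = Counter(responses)

for rating in range(1, 6):
    n = counts.get(rating, 0)
    print(f"{rating}: {n:2d} ({n / len(responses):.0%})")
```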
34. System usability scale
• One of the most widely used tools for
assessing the perceived usability of a
system (Brooke, 1996)
• 10 statements to which users rate their
level of agreement
– Half the statements are worded positively and
half are worded negatively.
– A five-point scale of agreement is used for
each
35. System usability scale (2)
• A technique for combining the 10 ratings into an
overall score (on a scale of 0 to 100) is also
given
36. System usability scale
(questions 1-5)
• I think that I would like to use this system
frequently
• I found the system unnecessarily complex
• I thought the system was easy to use
• I think that I would need the support of a
technical person to be able to use this
system
• I found the various functions in this
system were well integrated
37. System usability scale
(questions 6-10)
• I thought there was too much
inconsistency in this system
• I would imagine that most people would
learn to use this system very quickly
• I found the system very cumbersome to
use
• I felt very confident using the system
• I needed to learn a lot of things before I
could get going with this system
39. System usability scale: score
• Sum the score contributions from each item
– For items 1, 3, 5, 7, and 9, the score contribution is
the scale position minus 1
– For items 2, 4, 6, 8, and 10, the contribution is 5
minus the scale position
• Multiply the sum of the scores by 2.5 to obtain
the overall SUS score:
– <50: Not acceptable
– 50–70: Marginal
– >70: Acceptable
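The scoring rule above is mechanical enough to express directly; the example ratings are made up:

```python
# SUS scoring as described above: items are rated 1-5; odd items contribute
# (rating - 1), even items contribute (5 - rating); the sum of the ten
# contributions is multiplied by 2.5 to give a 0-100 score.

def sus_score(ratings):
    assert len(ratings) == 10, "SUS has exactly 10 items"
    total = 0
    for i, r in enumerate(ratings, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0 -> acceptable
```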
40. Usability scales/questionnaires
• There are also other scales:
– Post-Study System Usability Questionnaire
and Computer System Usability
Questionnaire (Lewis, 1995)
– Questionnaire for User Interface Satisfaction
(Chin, Diehl, & Norman, 1988)
– Product Reaction Cards (Benedek and Miner,
2002)
– More here:
http://oldwww.acm.org/perlman/question.html
41. Assessing attributes
• The techniques described in the previous slides are typically used to assess interfaces or tasks as a whole
• You can also look at specific attributes of an interface:
– Visual appeal
– Perceived efficiency
– Confidence
– Usefulness
– Enjoyment
– Credibility
– Appropriateness of terminology
– Ease of navigation
– Responsiveness
42. Biases in self-reported data
• Answers provided in person or over the
phone tend to be more positive than
through an anonymous survey (Dillman
et al., 2008)