4. March 10, 2011 IAT 334 4
Evaluation
Earlier:
– Interpretive and Predictive
• Heuristic evaluation, walkthroughs, ethnography…
Now:
– User involved
• Usage observations, experiments, interviews...
Evaluation Forms
Summative
– After a system has been finished. Make
judgments about the final product.
Formative
– While the project is forming. Throughout the
lifecycle: early and continuous.
Evaluation Data Gathering
Design the experiment to collect the data
to test the hypothesis to evaluate the
interface to refine the design
Information we gather about an interface can
be subjective or objective
Information also can be qualitative or
quantitative
– Which are tougher to measure?
Subjective Data
Satisfaction is an important factor in
performance over time
Learning what people prefer is valuable
data to gather
Methods
Ways of gathering subjective data
– Questionnaires
– Interviews
– Booths (e.g., at a trade show)
– Call-in product hot-line
– Field support workers
Questionnaires
Preparation is expensive, but
administration is cheap
Oral vs. written
– Oral advantages: Can ask follow-up questions
– Oral disadvantages: Costly, time-consuming
Written forms can provide better quantitative
data
Questionnaires
Issues
– Only as good as the questions you ask
– Establish the purpose of the questionnaire
– Don’t ask things that you will not use
– Who is your audience?
– How do you deliver and collect the questionnaire?
Questionnaire Topic
Can gather demographic data and data
about the interface being studied
Demographic data:
– Age, gender
– Task expertise
– Motivation
– Frequency of use
– Education/literacy
Interface Data
Can gather data about
– screen
– graphic design
– terminology
– capabilities
– learning
– overall impression
– ...
Question Format
Closed format
– Answer restricted to a set of choices
Example: Characters on screen
hard to read 1 2 3 4 5 6 7 easy to read
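A closed-format question can be sketched as a small data type that accepts only answers from its fixed set of choices. This is a minimal Python illustration, assuming an integer 1..7 scale like the "characters on screen" example; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosedQuestion:
    """Hypothetical closed-format item: answers restricted to a fixed set."""
    prompt: str
    low_label: str
    high_label: str
    points: int = 7

    def choices(self) -> range:
        # Allowed answers are the integers 1..points (1..7 by default)
        return range(1, self.points + 1)

    def validate(self, answer: int) -> int:
        # Closed format admits no free text: reject anything outside the set
        if answer not in self.choices():
            raise ValueError(f"answer {answer!r} not in {list(self.choices())}")
        return answer


q = ClosedQuestion("Characters on screen", "hard to read", "easy to read")
print(q.validate(6))  # prints 6
```

Restricting input at entry time is what makes closed-format answers easy to quantify later.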
Closed Format
Likert Scale
– Typical scale uses 5, 7 or 9 choices
– Above that is hard to discern
– An odd number of choices gives a neutral
choice in the middle
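Likert responses are ordinal, so one common, conservative way to summarize them is the median plus the full distribution, along with how many answers fall above, at, or below the midpoint. The sketch below assumes integer answers on a 1..points scale; the function name and sample answers are hypothetical.

```python
from collections import Counter
from statistics import median


def summarize_likert(responses, points=7):
    """Summarize ordinal answers on a 1..points Likert scale (hypothetical helper)."""
    if not responses:
        raise ValueError("no responses")
    if any(r not in range(1, points + 1) for r in responses):
        raise ValueError("response outside the scale")
    # The midpoint is an actual choice (e.g. 4 of 7) only when points is odd
    midpoint = (points + 1) / 2
    counts = Counter(responses)
    n = len(responses)
    return {
        "n": n,
        "median": median(responses),
        "distribution": {c: counts[c] for c in range(1, points + 1)},
        "pct_above_mid": 100 * sum(r > midpoint for r in responses) / n,
        "pct_at_mid": 100 * sum(r == midpoint for r in responses) / n,
        "pct_below_mid": 100 * sum(r < midpoint for r in responses) / n,
    }


summary = summarize_likert([5, 6, 4, 7, 3, 6, 5])
print(summary["median"], summary["distribution"])
# 5 {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2, 7: 1}
```

With an even number of points, `pct_at_mid` is always 0, which reflects the missing neutral choice.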
Closed Format
Advantages
– Clarify alternatives
– Easily quantifiable
– Eliminate useless answers
Disadvantages
– Must cover the whole range
– All choices should be equally likely
– Don’t capture interesting, “different” reactions
Issues
Question specificity
– “Do you have a computer?”
Language
– Beware terminology, jargon
Clarity
Leading questions
– Can be phrased either positively or negatively
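When a questionnaire mixes positively and negatively phrased items, the negative ones are typically reverse-scored before answers are combined, so a higher number always means a more favorable view. A minimal sketch, assuming a 1..7 scale; the item names and answers are hypothetical.

```python
def reverse_score(answer: int, points: int = 7) -> int:
    # On a 1..points scale: 1 <-> points, 2 <-> points - 1, and so on
    return points + 1 - answer


def score_items(answers, negative_items, points=7):
    """Align item scores so higher always means more favorable."""
    return {
        item: reverse_score(a, points) if item in negative_items else a
        for item, a in answers.items()
    }


# Hypothetical items: one positively phrased, one negatively phrased
answers = {"easy_to_learn": 6, "confusing_menus": 2}
scored = score_items(answers, negative_items={"confusing_menus"})
print(scored)  # {'easy_to_learn': 6, 'confusing_menus': 6}
```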
Issues
Prestige bias
– People answer a certain way because they
want you to think that way about them
Embarrassing questions
Hypothetical questions
“Halo effect”
– When the estimate of one feature affects the
estimate of another (e.g., looks and intelligence)
Deployment
Steps
– Discuss questions among team
– Administer verbally or in writing to a few people
(pilot). Ask them verbally what they thought of
the questions
– Administer final test
Open-ended Questions
Asks for unprompted opinions
Good for general, subjective information,
but difficult to analyze rigorously
May help with design ideas
– “Can you suggest improvements to this
interface?”
Ethics
People can be sensitive about this process and
issues
Make sure they know you are testing
software, not them
Attribution theory
– Studies whether people attribute their success or
failure to themselves or to outside factors (e.g., gender
and age differences)
Participants can quit at any time
Objective Data
Users interact with interface
– You observe, monitor, calculate, examine,
measure, …
Objective, scientific data gathering
Comparison to interpretive/predictive
evaluation
Observing Users
Not as easy as you think
One of the best ways to gather feedback
about your interface
Watch, listen and learn as a person
interacts with your system
Observation
Direct
– In the same room
– Can be intrusive
– Users are aware of your presence
– You only see it one time
– May use a semitransparent mirror to reduce intrusiveness
Indirect
– Video recording
– Reduces intrusiveness, but doesn’t eliminate it
– Cameras focused on screen, face & keyboard
– Gives an archival record, but you can spend a lot of time reviewing it
Location
Observations may be
– In the lab (perhaps a specially built usability lab)
• Easier to control
• Can have user complete set of tasks
– In field
• Watch their everyday actions
• More realistic
• Harder to control other factors
Challenge
In simple observation, you observe
actions but don’t know what’s going on in
the user’s head
Often use some form of verbal protocol
where users describe their thoughts
Verbal Protocol
One technique: Think-aloud
– User describes verbally what s/he is thinking
and doing
• What they believe is happening
• Why they take an action
• What they are trying to do
Think Aloud
Very widely used, useful technique
Allows you to understand user’s thought
processes better
Potential problems:
– Can be awkward for participant
– Thinking aloud can modify way user performs
task
Teams
Another technique: Co-discovery learning
– Join pairs of participants to work together
– Use think aloud
– Perhaps have one person be semi-expert
(coach) and one be novice
– More natural (like conversation), so it removes
some of the awkwardness of individual think-aloud
Alternative
What if thinking aloud during the session would
be too disruptive?
Can use post-event protocol
– User performs session, then watches video
afterwards and describes what s/he was
thinking
– Sometimes difficult to recall
Historical Record
In observing users, how do you capture
events in the session for later analysis?
Capturing a Session
1. Paper & pencil
– Can be slow
– May miss things
– Is definitely cheap and easy
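(Figure: example paper log sheet with a Time column (10:00, 10:03, 10:08, 10:22), columns for Task 1, Task 2, Task 3, …, and observer marks such as “S” and “e” in the cells)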
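Once a paper log is transcribed, task durations can be computed from the recorded clock times. This sketch assumes, hypothetically, that the observer noted when each task started and ended; the times reuse the ones on the example sheet, but which time is a start and which is an end is an assumption for illustration.

```python
from datetime import datetime


def task_durations(log):
    """log: (clock 'HH:MM', task, 'start' | 'end') tuples from a paper sheet."""
    starts, durations = {}, {}
    for clock, task, event in log:
        t = datetime.strptime(clock, "%H:%M")
        if event == "start":
            starts[task] = t
        elif event == "end":
            # Duration in minutes; paper logs rarely give finer resolution
            durations[task] = (t - starts.pop(task)).total_seconds() / 60
        else:
            raise ValueError(f"unknown event {event!r}")
    return durations


# Hypothetical transcription: which times are starts/ends is assumed
log = [("10:00", "Task 1", "start"), ("10:03", "Task 1", "end"),
       ("10:08", "Task 2", "start"), ("10:22", "Task 2", "end")]
print(task_durations(log))  # {'Task 1': 3.0, 'Task 2': 14.0}
```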
Capturing a Session
2. Audio tape
– Good for think-aloud
– Hard to tie to interface
3. Video tape
– Multiple cameras probably needed
– Good record
– Can be intrusive
Capturing a Session
4. Software logging
– Modify software to log user actions
– Can give time-stamped key-press or mouse
events
– Two problems:
• Too low-level; you usually want higher-level events
• Massive amount of data; you need analysis tools
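Both problems can be eased by logging raw events and then collapsing them into higher-level events before analysis. A minimal sketch, assuming a hypothetical in-app logger; here consecutive key presses are merged into one "typed" event, which is just one illustrative abstraction rule.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventLog:
    """Hypothetical logger: stores (timestamp, kind, detail) tuples."""
    events: list = field(default_factory=list)

    def log(self, kind: str, detail: str, t: Optional[float] = None) -> None:
        # Use a monotonic clock unless a timestamp is supplied explicitly
        self.events.append((time.monotonic() if t is None else t, kind, detail))


def to_higher_level(events):
    """Merge runs of 'key' events into one 'typed' event; keep others as-is."""
    out = []
    for t, kind, detail in events:
        if kind == "key" and out and out[-1][1] == "typed":
            t0, _, text = out[-1]
            out[-1] = (t0, "typed", text + detail)  # extend the current run
        elif kind == "key":
            out.append((t, "typed", detail))         # start a new run
        else:
            out.append((t, kind, detail))
    return out


log = EventLog()
for i, ch in enumerate("Smith"):
    log.log("key", ch, t=1.0 + 0.2 * i)
log.log("click", "Save button", t=3.0)
print(to_higher_level(log.events))
# [(1.0, 'typed', 'Smith'), (3.0, 'click', 'Save button')]
```

Real logs usually need task-specific rules (e.g. menu selections, dialog completions) to reach the level of events an analyst cares about.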
Assessing Usability
Usability Specifications
– Quantitative usability goals, used as a guide for
knowing when the interface is “good enough”
– Should be established as early as possible in
development process
Measurement Process
“If you can’t measure it, you can’t
manage it”
Need to keep gathering data on each
iterative refinement
What to Measure?
Usability attributes
– Initial performance
– Long-term performance
– Learnability
– Retainability
– Advanced feature usage
– First impression
– Long-term user satisfaction
How to Measure?
Benchmark Task
– Specific, clearly stated task for users to carry
out
Example: Calendar manager
– “Schedule an appointment with Prof. Smith
for next Thursday at 3pm.”
Users perform these under a variety of
conditions and you measure performance
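Benchmark results are usually summarized per condition. This sketch assumes a hypothetical record format (participant, condition, seconds, succeeded) and made-up data for the calendar task; it reports attempts, success rate, and median time on successful attempts.

```python
from collections import defaultdict
from statistics import median


def compare_conditions(trials):
    """trials: (participant, condition, seconds, succeeded) tuples (hypothetical)."""
    groups = defaultdict(list)
    for _pid, condition, seconds, ok in trials:
        groups[condition].append((seconds, ok))
    result = {}
    for condition, rows in groups.items():
        # Time only the successful attempts; failures are counted separately
        ok_times = [s for s, ok in rows if ok]
        result[condition] = {
            "attempts": len(rows),
            "success_rate": sum(ok for _, ok in rows) / len(rows),
            "median_time_s": median(ok_times) if ok_times else None,
        }
    return result


# Made-up data for "Schedule an appointment with Prof. Smith..."
trials = [
    ("P1", "first use", 28.0, True),
    ("P2", "first use", 41.0, False),
    ("P3", "first use", 22.0, True),
    ("P1", "after practice", 12.0, True),
    ("P2", "after practice", 15.0, True),
]
print(compare_conditions(trials))
```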
Assessment Technique
Usability attribute | Measuring instrument | Value to be measured | Current level | Worst acceptable level | Planned target level | Best possible level | Observed results
Initial performance | Benchmark task | Length of time to successfully add an appointment on the first trial | 15 secs (manual) | 30 secs | 20 secs | 10 secs |
First impression | Questionnaire | Rating on a -2..2 scale | ?? | 0 | 0.75 | 1.5 |
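An observed result can be checked against a specification row's worst acceptable and planned target levels. A minimal sketch using the two rows of the table; the class name and the observed values are hypothetical, and the sign flip handles measures where lower is better (times) versus higher is better (ratings).

```python
from dataclasses import dataclass


@dataclass
class SpecRow:
    """One usability specification row (hypothetical representation)."""
    attribute: str
    worst_acceptable: float
    planned_target: float
    higher_is_better: bool  # True for ratings, False for task times

    def assess(self, observed: float) -> str:
        # Flip signs so that "bigger is better" holds for both kinds of measure
        sign = 1 if self.higher_is_better else -1
        obs = sign * observed
        if obs < sign * self.worst_acceptable:
            return "unacceptable"
        if obs < sign * self.planned_target:
            return "acceptable, below target"
        return "meets target"


initial_perf = SpecRow("Initial performance", 30, 20, higher_is_better=False)
first_impression = SpecRow("First impression", 0, 0.75, higher_is_better=True)
print(initial_perf.assess(25))       # acceptable, below target
print(first_impression.assess(0.9))  # meets target
```

Re-running this check after each iterative refinement shows whether the design is converging on its targets.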
Summary
Value to be measured
– Time to complete task
– Number or percentage of errors
– Percent of task completed in given time
– Ratio of successes to failures
– Number of commands used
– Frequency of help usage
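The values listed above can be computed from per-attempt records. This sketch assumes a hypothetical record format; "percent of task completed in given time" is modeled as steps finished within the time limit divided by the task's total steps, which is one possible interpretation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One attempt at a benchmark task (hypothetical record format)."""
    seconds: float
    completed: bool
    errors: int
    commands: int
    help_uses: int
    steps_done: int  # steps finished within the time limit


def usability_measures(trials, total_steps):
    n = len(trials)
    successes = sum(t.completed for t in trials)
    failures = n - successes
    return {
        "mean_time_s": mean(t.seconds for t in trials),
        "total_errors": sum(t.errors for t in trials),
        "pct_trials_with_errors": 100 * sum(t.errors > 0 for t in trials) / n,
        "pct_task_completed": 100 * mean(t.steps_done / total_steps for t in trials),
        "success_failure_ratio": successes / failures if failures else float("inf"),
        "mean_commands": mean(t.commands for t in trials),
        "help_uses_per_trial": sum(t.help_uses for t in trials) / n,
    }


trials = [Trial(20, True, 0, 12, 0, 5),
          Trial(35, False, 2, 20, 1, 3),
          Trial(25, True, 1, 14, 0, 5)]
print(usability_measures(trials, total_steps=5))
```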
Summary
Target level
– Often established by comparison with
competing system or non-computer based
task
Ethics
Testing can be arduous
Each participant should consent to be in the
experiment (informally or formally)
– Know what the experiment involves, what to
expect, what the potential risks are
Participants must be able to stop without danger or
penalty
All participants must be treated with respect
Nov 2, 2009 IAT 334 42
Consent
Why important?
– People can be sensitive about this process and issues
– Errors will likely be made, participant may feel
inadequate
– May be mentally or physically strenuous
What are the potential risks (there are always
risks)?
– Examples?
“Vulnerable” populations need special care &
consideration (& IRB review)
– Children, people with disabilities, pregnant women, students (why?)
Before Study
Be well prepared so participant’s time is not
wasted
Make sure they know you are testing software,
not them
– (Usability testing, not User testing)
Maintain privacy
Explain procedures without compromising
results
Participants can quit at any time
Obtain a signed consent form
During Study
Make sure participant is comfortable
Session should not be too long
Maintain relaxed atmosphere
Never indicate displeasure or anger
After Study
Explain how the session will help you improve the system
Show the participant how to perform the tasks they failed
Don’t compromise privacy (never identify
people; only show videos with explicit
permission)
Store data anonymously and securely,
and/or destroy it
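One way to store data anonymously is to replace identifying fields with random participant IDs and keep the name-to-ID table separate, secured, and destroyable. A minimal sketch using Python's `secrets` module; the class name and record field names are hypothetical.

```python
import secrets


class Pseudonymizer:
    """Map participant names to random IDs (hypothetical helper)."""

    IDENTIFYING_FIELDS = ("name", "email")

    def __init__(self):
        self._ids = {}  # the key table: store separately and securely

    def id_for(self, name: str) -> str:
        # Random, not derived from the name, so IDs cannot be reversed
        if name not in self._ids:
            self._ids[name] = "P" + secrets.token_hex(4)
        return self._ids[name]

    def strip(self, record: dict) -> dict:
        clean = {k: v for k, v in record.items() if k not in self.IDENTIFYING_FIELDS}
        clean["participant"] = self.id_for(record["name"])
        return clean

    def destroy_key_table(self) -> None:
        # After this, stored records can no longer be linked to people
        self._ids.clear()


p = Pseudonymizer()
row = p.strip({"name": "Ann", "email": "ann@example.com", "task_time_s": 20})
print(row)  # e.g. {'task_time_s': 20, 'participant': 'P3f9a1c2e'}
```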