Experiences with Remote Usability Testing, by Jan Stage, AAU

Experiences with Remote Usability Testing?

Jan Stage

Professor, PhD
Research leader in Information Systems (IS)/Human-Computer Interaction (HCI)
Aalborg University, Department of Computer Science, HCI-Lab

jans@cs.aau.dk

Overview
• Study 1
• Study 2


Overview
• Study 1: synchronous or asynchronous
• Method
• Results
• Conclusion
• Study 2


Empirical Study 1
Four methods: LAB – RS – AE – AU
Test subjects: 6 in each condition (18 users and 6 with usability expertise), all
students at Aalborg University
System: Email client (Mozilla Thunderbird 1.5)
9 defined tasks (typical email functions)
Setting, procedure and data collection in accordance with method
Data analysis: 24 outputs were analysed by three persons in random and
different order
Each of the three generated an individual list of usability problems with
their own categorizations (also for the AE and AU conditions)
These lists were merged into an overall problem list through negotiation

Results: Task Completion
No significant difference in task completion
Significant difference in task completion
time
The users in the two asynchronous
conditions spent considerably more
time
We do not know the reason


Results: Usability Problems Identified

A total of 46 usability problems
No significant difference between LAB and RS
AE/AU identified significantly fewer problems, also critical problems
No significant difference between AE and AU in terms of problems identified


Conclusion
RS is the most widely described and used remote method. The performance
is virtually equivalent to LAB (or slightly better)
AE and AU perform surprisingly well
Experts do not perform significantly better than users
Video analysis (LAB and RS) required considerably more evaluator effort than
the user-based reporting (AU and AE)
Users can actually contribute to usability evaluation – not with the same
quality, but reasonably well, and there are plenty of them


Overview
• Study 1
• Study 2: which asynchronous method
• Method
• Results
• Conclusion


Empirical Study 2
Purpose: examine and compare remote asynchronous methods
Focus on usability problems identified
Comparable with the previous study
Selection of asynchronous methods based on literature survey


The 3 Remote Asynchronous Methods
User-reported critical incident (UCI)
• Well-defined method (Castillo et al. CHI 1998)
Forum-based online reporting and discussion (Forum)
• Assumption: through collaboration participants may give input which increases data
quality and richness (Thompson, 1999)
• A source for collecting qualitative data in a study of auto logging (Millen, 1999): the
participants turned out to report detailed usability feedback
Diary-based longitudinal user reporting (Diary)
• Used on a longitudinal basis for participants in a study of auto logging to provide
qualitative information (Steves et al. CSCW 2001)
• First day: same tasks as the other conditions (first part of diary delivered)
• Four more days: new tasks (same type) sent daily (complete diary delivered)
Conventional user-based laboratory test (Lab)
• Included as benchmark


Empirical Study (1)
Participants:
• 40 test subjects, 10 for each condition
• Students, age 20 to 30
• Distributed evenly: gender and tech/non-tech education
Setting:
• LAB: in our usability lab
• Remote asynchronous: in the participants’ homes
Participants in the remote asynchronous conditions received the software and
installed it on their computer
Training material for the remote asynchronous conditions
• Identification and categorisation of usability problems
• A minimalist approach that was strictly remote and asynchronous (via email)


Empirical Study (2)
Tasks:
• Nine fixed tasks
• The same across the four conditions to ensure that all participants used the
same parts of the system
• Typical email tasks (same as previous study)
Data collection in accordance with the method
• LAB: video recordings
• UCI: web-based system for generating problem descriptions while solving tasks
• Forum: after solving tasks, one week for posting and discussing problems
• Diary: a diary with no imposed structure; first part after the first day


Data Analysis
All data collected before the data analysis started
3 evaluators did the whole data analysis
The 40 data sets were analysed by the 3 evaluators
• In random order: by a draw
• In different order between them
The user input from the three remote conditions was transformed into
usability problem descriptions
Each evaluator generated his/her own individual lists of usability problems with
their own severity ratings
• A problem list for each condition
• A complete problem list (joined)
These were merged into an overall problem list through negotiation
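
The draw of analysis orders described above can be sketched as follows. This is a minimal illustration, not the procedure the evaluators actually used: the function name, the data set labels and the evaluator labels E1 to E3 are hypothetical, and a seeded pseudo-random shuffle stands in for the physical draw.

```python
import random


def draw_analysis_orders(dataset_ids, evaluators, seed=None):
    """Give each evaluator a random order that differs from the others'."""
    rng = random.Random(seed)
    orders = {}
    for evaluator in evaluators:
        order = list(dataset_ids)
        rng.shuffle(order)
        # Redraw in the (very unlikely) case that two evaluators get the same order.
        while order in orders.values():
            rng.shuffle(order)
        orders[evaluator] = order
    return orders


# 40 data sets (10 per condition) and 3 evaluators, as in the study.
conditions = ["Lab", "UCI", "Forum", "Diary"]
dataset_ids = [f"{c}-{i:02d}" for c in conditions for i in range(1, 11)]
orders = draw_analysis_orders(dataset_ids, ["E1", "E2", "E3"], seed=1)

for evaluator, order in orders.items():
    print(evaluator, "starts with", order[:5])
```

Mixing the conditions in one shuffled sequence means no evaluator works through one method's data in a block, which is one way to read "in random order".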


Results: Task Completion Time
Considerable variation in task completion times

Participants in the remote conditions worked in their home at a time they
selected
For each task there was a hint that allowed them to check if they had solved
the task correctly
As we have no data on the task solving process in the remote conditions, we
cannot explain this variation

Results: Usability Problems Identified
                                  Lab          UCI            Forum         Diary
                                  N=10         N=10           N=10          N=10
Task completion time, Tasks 1-9,
minutes: average (SD)             24.24 (6.3)  34.45 (14.33)  15.45 (5.83)  32.57 (28.34)

Usability problems                #    %       #    %         #    %        #    %
Critical (21)                     20   95      10   48        9    43       11   52
Serious (17)                      14   82      2    12        1    6        6    35
Cosmetic (24)                     12   50      1    4         5    21       12   50
Total (62)                        46   74      13   21        15   24       29   47
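
Each percentage in the table is the number of problems a condition found at a given severity, divided by the number of problems at that severity in the merged list (the figures in parentheses). A short sketch that recomputes them from the counts; the function and variable names are illustrative only.

```python
# Problems per severity in the merged list, and counts found per condition.
totals = {"Critical": 21, "Serious": 17, "Cosmetic": 24}
found = {
    "Lab":   {"Critical": 20, "Serious": 14, "Cosmetic": 12},
    "UCI":   {"Critical": 10, "Serious": 2,  "Cosmetic": 1},
    "Forum": {"Critical": 9,  "Serious": 1,  "Cosmetic": 5},
    "Diary": {"Critical": 11, "Serious": 6,  "Cosmetic": 12},
}


def coverage(found_by_severity, totals):
    """Return {severity: (count, percent)} plus a 'Total' row."""
    rows = {}
    for severity, total in totals.items():
        n = found_by_severity[severity]
        rows[severity] = (n, round(100 * n / total))
    n_all = sum(found_by_severity.values())
    rows["Total"] = (n_all, round(100 * n_all / sum(totals.values())))
    return rows


for condition, counts in found.items():
    print(condition, coverage(counts, totals))
```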

Lab: significantly better than the 3 remote conditions
UCI vs. Forum: no significant difference
UCI vs. Diary: significant difference overall in favour of Diary; also significant for cosmetic problems
Forum vs. Diary: significant difference overall in favour of Diary; not significant at any single severity level
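
The slides do not say which statistical test lies behind these comparisons. As a sketch only, assuming a two-sided Fisher's exact test on "found / not found" counts out of the 62 problems in the merged list, a pairwise comparison could look like this. Treating the two conditions as independent samples of 62 is a simplification, and the function names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "found" in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


TOTAL_PROBLEMS = 62  # size of the merged problem list
found = {"Lab": 46, "UCI": 13, "Forum": 15, "Diary": 29}


def compare(cond_a, cond_b):
    """p-value for the total number of problems found by two conditions."""
    a, c = found[cond_a], found[cond_b]
    return fisher_exact_two_sided(a, TOTAL_PROBLEMS - a, c, TOTAL_PROBLEMS - c)


for pair in [("Lab", "UCI"), ("UCI", "Forum"), ("UCI", "Diary"), ("Forum", "Diary")]:
    print(f"{pair[0]} vs {pair[1]}: p = {compare(*pair):.4f}")
```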


Results: Evaluator Effort
Time spent (hours:minutes)  Lab     UCI     Forum   Diary
Problems identified         (46)    (13)    (15)    (29)
Preparation                 6:00    2:40    2:40    2:40
Conducting test             10:00   1:00    1:00    1:30
Analysis                    33:18   2:52    3:56    9:38
Merging problem lists       11:45   1:41    1:42    4:58
Total time spent            61:03   8:13    9:18    18:46
Avg. time per problem       1:20    0:38    0:37    0:39

The sum for all evaluators involved in each activity
Time for finding test subjects is not included (8h, common for all)
Task specifications from an earlier study. Preparation in the remote
conditions: work out written instructions
Considerable differences between the remote conditions for analysis and
merging of problem lists
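
The totals and per-problem averages in the effort table can be recomputed from the four activity rows. The sketch below assumes the figures are hours:minutes, which is consistent with the 8h mentioned for finding test subjects; the helper names are illustrative.

```python
def to_minutes(hhmm):
    """Convert an 'h:mm' string to minutes (assumes hours:minutes)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(minutes):
    """Format a number of minutes as 'h:mm', rounded to the nearest minute."""
    minutes = round(minutes)
    return f"{minutes // 60}:{minutes % 60:02d}"


# Preparation, conducting test, analysis, merging; then problems identified.
effort = {
    "Lab":   (["6:00", "10:00", "33:18", "11:45"], 46),
    "UCI":   (["2:40", "1:00", "2:52", "1:41"], 13),
    "Forum": (["2:40", "1:00", "3:56", "1:42"], 15),
    "Diary": (["2:40", "1:30", "9:38", "4:58"], 29),
}

total_minutes = {c: sum(to_minutes(t) for t in times) for c, (times, _) in effort.items()}

for condition, (_, problems) in effort.items():
    total = total_minutes[condition]
    share_of_lab = 100 * total / total_minutes["Lab"]
    print(
        f"{condition}: total {to_hhmm(total)}, "
        f"per problem {to_hhmm(total / problems)}, "
        f"{share_of_lab:.0f}% of Lab effort"
    )
```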

Conclusion
The three remote methods performed significantly below the classical lab test
in terms of the number of usability problems identified
The Diary was the best remote method – it identified half of the problems
found in the Lab condition
UCI and Forum performed similarly for critical problems but worse for
serious problems
UCI and Forum took roughly 13% of the evaluator time spent on the lab test. Diary took roughly 30%
The productivity of the remote methods was considerably higher


Interaction Design and Usability Evaluation
Master in IT
Continuing education under IT-Vest
Course package in Interaction Design and Usability Evaluation starts 1 February 2012
Admits bachelor's degree holders, with an entry route for datamatiker graduates as well
Information: http://www.master-it-vest.dk/

