A short presentation given in September 2009 to finish my master's degree. The project's experiment found that evaluators working collaboratively could identify more usability problems and reach a significantly higher level of inter-evaluator agreement than with the traditional method of heuristic evaluation.
10. Still very popular
• Usability scores for supermarket websites (Chen 2005)
• Generating user protocols for medical equipment (Zhang 2003)
• Comparing library websites (Peng 2004)
15. Standard Heuristic Evaluation (SHE)
“each individual evaluator inspect the interface alone ... to ensure independent and unbiased evaluations from each evaluator” (Nielsen 1994)
16. Collaborative Heuristic Evaluation (CHE)
• Collaborative inspection to find potential problems
• Individual evaluation with secret voting
17. Experimental design

        Group 1                         Group 2
CHE     National Rail, Visit Britain    Easy Jet, British Towns
SHE     Easy Jet, British Towns         National Rail, Visit Britain
18. Research questions
• Do they find the same problems?
• Does CHE find more severe problems?
• Does CHE find problems more reliably?
• Does CHE use the heuristics better?
19. Research questions
• Do they find the same problems? - NO
• Does CHE find more severe problems? - maybe
• Does CHE find problems more reliably? - YES!
• Does CHE use the heuristics better? - NO
27. Future research
• Explore social processes
  • e.g. remote voting
• Explore how experts perform their role
  • e.g. compare heuristic use & severity ratings between experts
• Use it to train & evaluate novices
Editor's Notes
Scenarios + think-aloud user testing + heuristic evaluation
Nielsen's 10 heuristics - available on his website, useit.com
So how does it work? Take John...
...then Alesha.
She finds 25 problems; merge the two sets together...
...and that makes 56. Add Carolyn...
She finds 29; merge them together...
...and, fab, we have 78.
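Merging problem sets is, at heart, a set union: a problem found by several evaluators is counted only once. A minimal sketch with made-up problem IDs (real evaluations need careful problem matching before sets can be merged):

```python
from collections import Counter

# Illustrative problem IDs only - in practice, deciding that two reported
# problems are "the same" is the hard part (see Hvannberg & Law below).
john    = {"P01", "P02", "P03", "P05"}
alesha  = {"P02", "P04", "P05", "P06"}
carolyn = {"P03", "P06", "P07"}

merged = john | alesha | carolyn           # set union: duplicates count once
print(len(john | alesha), len(merged))     # 6 7

# How many evaluators found each problem? In the study, most problems
# were found by just one evaluator.
counts = Counter(p for s in (john, alesha, carolyn) for p in s)
print(sorted(p for p, c in counts.items() if c == 1))   # ['P01', 'P04', 'P07']
```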
Lots of studies have found that 5 evaluators/users uncover 50-75% of the problems.
These studies had the advantage of knowing how many problems there were in total.
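The 50-75% figure comes from problem-discovery curves of the kind Nielsen and Landauer fitted; a sketch of that model, where the lambda value is a commonly cited average rather than anything from this study:

```python
# Nielsen & Landauer's problem-discovery model: the proportion of problems
# found by i independent evaluators is 1 - (1 - L)**i, where L is the
# probability that a single evaluator finds any given problem.
def proportion_found(i: int, L: float = 0.31) -> float:
    return 1 - (1 - L) ** i

for i in (1, 3, 5, 10):
    print(i, round(proportion_found(i), 2))
# With L = 0.31, 5 evaluators find ~84%; smaller L values (harder-to-find
# problems) give the 50-75% range reported across studies.
```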
Add the severity ratings of all problems under each heuristic, weight them, then sum to find a global score (Sainsbury's won!)
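A minimal sketch of that weighted global-score idea, with illustrative heuristics, ratings, and weights (not the study's values):

```python
# Severity ratings of the problems filed under each heuristic (illustrative).
severities = {
    "visibility of system status": [3, 2, 4],
    "error prevention": [4, 4],
    "consistency and standards": [1, 2],
}
# Assumed per-heuristic importance weights (illustrative).
weights = {
    "visibility of system status": 1.0,
    "error prevention": 1.5,
    "consistency and standards": 0.8,
}
# Sum severities within each heuristic, weight, then total across heuristics.
global_score = sum(weights[h] * sum(ratings) for h, ratings in severities.items())
print(global_score)   # 23.4 - higher means more, or more severe, problems
```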
Infusion pumps can't be changed, but you can spot the problems and help ensure people don't make them again.
The method has been going so long - what is left to ask?
Back to the 78 usability problems...
...they are made up of three individual sets.
Most were found by just one evaluator.
If you were making a change based on a problem found by only one evaluator, would you feel confident?
Was it overlooked by the other 4 (assuming we have a quorum), or was it rejected as a problem by 4?
Confirmed in the literature - Gilbert Cockton and Alan Woolrych (Sunderland).
Problem matching - Ebba Hvannberg and Effie Law.
Why? Evaluators work independently, so they see different pages, or the same pages in different ways.
In CHE they are together in the same room and see the same interface at the same time,
and it keeps the unbiased approach: no discussion and secret voting.
2x2 design
individual differences - superstar evaluators
Compare problems between the two conditions on the same website.
Imagine a 6th evaluator - we don't expect them to overlap much, as they viewed different pages.
Two-way ANOVA, CHE vs SHE - but most problems were found by only one evaluator, so that questions the premise.
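For illustration, a minimal two-way ANOVA sketch with evaluation method and website as factors; the response variable and all data here are hypothetical, not the thesis's:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical severity ratings, crossed by method (CHE/SHE) and website.
df = pd.DataFrame({
    "severity": [3, 2, 4, 3, 1, 2, 3, 2, 4, 3, 2, 1],
    "method":   ["CHE"] * 6 + ["SHE"] * 6,
    "website":  ["NationalRail", "EasyJet"] * 6,
})

# Fit the factorial model and print the two-way ANOVA table.
model = ols("severity ~ C(method) * C(website)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```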
The vague-heuristics question - no pattern of usage and very little agreement.
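Agreement on which heuristic a problem belongs to can be quantified with, for example, Fleiss' kappa for categorical ratings; a sketch under that assumption (not necessarily the measure used in the study), with made-up heuristic assignments:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows are problems, columns are evaluators; values are the index (0-9) of
# the Nielsen heuristic each evaluator assigned. Illustrative data only.
ratings = np.array([
    [0, 0, 1],   # problem 1: two evaluators chose heuristic 0, one chose 1
    [2, 2, 2],
    [5, 3, 5],
    [9, 9, 8],
])
table, _ = aggregate_raters(ratings)   # per-problem counts for each category
print(fleiss_kappa(table))             # near 0 = chance-level agreement
```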
National Rail
Most problems were found by only 1 evaluator, none by all.
Steven Karau - a good round-up of studies.
Irving Janis - groupthink: group discussion comes to the wrong conclusions, but we banned discussion.
Social loafing - "he said it was, so I'm just gonna agree".
Evaluators could not loaf on their severity-rating decisions, and could not know how the other evaluators were voting.