Reliability (task adequacy): satisfaction scale
Adaptation to trained users: satisfaction scale for advanced functionalities
Learnability: perceived ease-of-learning scale
Robustness (error tolerance): satisfaction scale for error management
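Such satisfaction scales are typically scored by averaging the Likert items attached to each criterion; a minimal sketch, with hypothetical item groupings and responses (none of the numbers come from the text):

```python
from statistics import mean

# Hypothetical Likert responses (1 = very dissatisfied ... 5 = very satisfied),
# grouped by the usability criterion each questionnaire item measures.
responses = {
    "reliability (task adequacy)":     [4, 5, 4],
    "adaptation (advanced functions)": [3, 3, 4],
    "learnability (ease of learning)": [5, 4, 4],
    "robustness (error management)":   [2, 3, 3],
}

# One satisfaction score per criterion: the mean of its items.
scores = {criterion: round(mean(items), 2)
          for criterion, items in responses.items()}

for criterion, score in scores.items():
    print(f"{criterion}: {score}")
```

Keeping one score per criterion (rather than one global score) is what lets the evaluation distinguish, say, a learnable but error-intolerant interface.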
Give the subject a list of tasks to execute at the start of the experiment
Choose the proposed tasks carefully, based on what is to be evaluated
Tasks should lead the user to concentrate on the parts of the interface under evaluation
Carefully calibrate the time allotted to each task (objectives derived from the requirements analysis, comparison with other existing software, …)
Estimate the average time needed and define an acceptable average overrun (cf. metrics): inter-individual variability
Ensure that the task statement is clear enough to be understood by a novice or first-time user
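The timing guideline above can be sketched as a simple acceptability check; the target time and the 20% overrun tolerance below are hypothetical values, not figures from the text:

```python
from statistics import mean

# Measured completion times (seconds) for one task across subjects -- illustrative data.
times = [95, 110, 80, 130, 160, 105, 90]

target = 100              # target time from the requirements analysis (hypothetical)
overrun_tolerance = 0.20  # accepted average overrun: 20% above target (hypothetical)

avg = mean(times)
limit = target * (1 + overrun_tolerance)

print(f"average time: {avg:.1f}s, acceptable up to {limit:.0f}s")
print("acceptable" if avg <= limit else "too slow: revise task or interface")
```

The tolerance term is what absorbs the inter-individual variability mentioned above: a single slow subject should not fail an otherwise adequate task.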
EVALUATION PLAN An evaluation will not provide any results unless it is well prepared
[Basili et al. 1994]
What are the general objectives of the evaluation?
What are the specific questions for which we want to obtain an answer?
What evaluation paradigm and test techniques are needed to achieve these objectives?
How to organize the evaluation in practice: user recruitment, user preparation, collection tools/devices, …
Ensure compliance with the ethical (deontology) rules in force
How to examine, interpret and present collected information?
EVALUATION PLAN What evaluation paradigm to use?
Objective (observation): detects problems; broad range; users may modify their behavior
Subjective (opinions): less expensive; captures usage opinions; limited precision and response rate
Predictive (model): no running system needed; less expensive; limited range
Predictive (expert): relies on expertise; may miss problems
EVALUATION PLAN When do we use a particular evaluation paradigm?
Field studies
Predictive
Laboratory usability
Quick and dirty
Remark: experimental studies vs. “quick and dirty” evaluation
Analyze the tests
Multi-criteria analysis: distribute the results according to the different characteristics
Statistical relevance of the results
A separate discipline: statistics (protocols and tests)
EVALUATION AND DIVERSITY OF USERS Analyze the statistics of the results

Subject            1    2    3    4    5    6    7   Average
Age               37   41   43   54   46   44   21   40.9
Sex                F    F    M    M    F    F    M   4F, 3M
Education level    4    2    4    4    4    1    2   3.0
Years of PC use    5    2    0    2    6    4    9   4.0
Ease of use        1    2    2    1    2    3    1   1.7
Help quality       1    3    3    1    3    2    2   2.1
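The per-column averages in the table above can be reproduced directly; the breakdown by sex at the end is a hypothetical illustration of the multi-criteria analysis mentioned earlier (distributing results according to a user characteristic):

```python
from statistics import mean

# Data copied from the table above: one record per subject (1..7).
subjects = [
    {"age": 37, "sex": "F", "education": 4, "pc_years": 5, "ease": 1, "help": 1},
    {"age": 41, "sex": "F", "education": 2, "pc_years": 2, "ease": 2, "help": 3},
    {"age": 43, "sex": "M", "education": 4, "pc_years": 0, "ease": 2, "help": 3},
    {"age": 54, "sex": "M", "education": 4, "pc_years": 2, "ease": 1, "help": 1},
    {"age": 46, "sex": "F", "education": 4, "pc_years": 6, "ease": 2, "help": 3},
    {"age": 44, "sex": "F", "education": 1, "pc_years": 4, "ease": 3, "help": 2},
    {"age": 21, "sex": "M", "education": 2, "pc_years": 9, "ease": 1, "help": 2},
]

# Averages over all subjects, matching the last column of the table.
for key in ("age", "education", "pc_years", "ease", "help"):
    print(f"{key}: {mean(s[key] for s in subjects):.1f}")

# Multi-criteria breakdown: mean ease-of-use score per sex.
by_sex = {sex: mean(s["ease"] for s in subjects if s["sex"] == sex)
          for sex in ("F", "M")}
print(by_sex)
```

With only seven subjects such group means are descriptive at best, which is exactly why the slides defer statistical relevance to proper protocols and tests.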
Nielsen J. (1993) Usability engineering. Academic Press.
Chin J., Diehl V., Norman K. (1988) Development of an instrument measuring user satisfaction of the human-computer interface. Proceedings of ACM CHI'88 Human Factors in Computing Systems, 213-218.
Dumas J., Redish J. (1999) A practical guide to usability testing. Intellect, Exeter, UK.
Lewis J. (1995) IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction, 7(1), 57-78.
Kirakowski J., Corbett M. (1993) SUMI: the Software Usability Measurement Inventory. British Journal of Educational Technology, 24(3), 210-212.
Nielsen J., Mack R., Bergendorff K., Grischkowsky N. (1986) Integrated software usage in the professional work environment: evidence from questionnaires and interviews. Proceedings of CHI'86, ACM Press, New York, NY, 162-167.
Nielsen J., Mack R. (Eds.) (1994) Usability inspection methods. John Wiley & Sons, New York, NY.
Whiteside J., Bennett J., Holtzblatt K. (1988) Usability engineering: our experience and evolution. In Helander M. (Ed.) Handbook of Human-Computer Interaction. North-Holland, Amsterdam.