7. Evaluation of interactive systems


Published in: Education, Technology
  • A priori: based on theory, logic, fixed rules or forms, etc.
  • Paradigm: pattern, example, model. Cognitive walkthrough: focuses on how easy it is for new users to accomplish tasks with the system. Heuristic: helping to discover or learn.
  • Evident: easy to see or perceive
  • Flagrant: evident. Metric: relating to the metre, i.e. to measurement.
  • Retention: retaining or being retained
  • Deontology: the theory of the moral obligations that govern a profession. Vigor: strength to act (physical or moral).
  • 7. Evaluation of interactive systems

    2. CHAPTER 7: EVALUATION OF INTERACTIVE SYSTEMS
       The software life cycle also concerns user interaction.
    3. EVALUATION OF INTERACTION
       • Why evaluate?
       • The system designer's intuition is not sufficient.
       • Formal modelling of the system and of the interaction does not cover all the design choices.
       • Recommendations (guidelines) are only safeguards, and best practices are too general to cover every aspect of a specific interaction.
       • The software life cycle concerns the interaction as well.
       • Spiral life cycle with prototyping
       • Evaluation at every step of development
    4. EVALUATION OF INTERACTION
       • How to evaluate?
       • With users: experiments
       • Without users: a priori
    5. EVALUATION OF INTERACTION
       • Evaluation paradigms
       • A priori / heuristic evaluation
           • A priori evaluation: expert review, cognitive walkthrough, …
           • Predictive models: Fitts's law, Keystroke model, etc.
       • Experimental evaluation
           • Subjective evaluation
           • Usability testing with potential users
           • Acceptability testing with a sample population
           • Post-release (or test-version) evaluation
           • Cognitive experiments
    6. A PRIORI EVALUATION
       • Predictive models
       • GOMS, Keystroke, Fitts, … (cf. chapter VI)
       • Heuristic evaluation (Nielsen & Mack, 1994)
       • One or more experts review the system by simulating its use.
       • Validation against ergonomic heuristics (cf. Nielsen's heuristics, chapter II)
       • Carried out on screen and interaction specifications (a priori evaluation), or on an existing system or prototype
       • Cognitive walkthrough
       • A usability inspection method used to identify usability issues
       • Focuses on how easy it is for new users to accomplish tasks with the system
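Predictive models such as Fitts's law can be evaluated without any user present. The following is a minimal sketch using the Shannon formulation, MT = a + b · log2(D/W + 1). The function name and the constants `a` and `b` are illustrative assumptions (in practice they are fitted to measurements for a given pointing device); they are not values taken from this chapter.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time in seconds (Shannon formulation of Fitts's law).

    a and b are hypothetical device constants chosen for illustration only.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty


# Compare two candidate button widths at the same distance (in pixels).
small = fitts_movement_time(distance=700, width=20)
large = fitts_movement_time(distance=700, width=100)
print(f"small target: {small:.3f} s, large target: {large:.3f} s")
```

With the same distance, the wider target always yields the shorter predicted time, which is the kind of comparison an a priori evaluation can make before a prototype exists.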
    7. A PRIORI EVALUATION
       • Cognitive walkthrough
       • Specify the intended users and the system to be developed, using the screen flow (business flow).
       • A priori evaluation by experts, in the presence of the designer
       • The evaluators walk through the screens, simulating the execution of tasks according to a credible scenario. The walkthrough evaluates:
           • whether the action to perform is evident to the user
           • whether users can easily perceive that the action to perform is available
           • whether users can see the result of the action and interpret it correctly
       • Critical review of the evaluation with the designer
       • Summary document
       (Nielsen & Mack, 1994) (Spencer, 2000)
    8. EXPERIMENTAL EVALUATION
       • Usability laboratory
       • A room with all the equipment needed to observe a user working or interacting with the system
       • Observers close to the subject, or hidden (in an adjoining room)
       • Video, sound and log-file recording
       • The subject either describes the experience live (think-aloud, or cooperative evaluation with the observers) or simply concentrates on carrying out the task.
       • Examples: IBM (Boca Raton, Florida), Microsoft, Sun, …
       • Field studies
       • More ecologically valid conditions than in a usability laboratory
       • Limitations
       • Evaluation usually takes place early in deployment: learning over time is not tracked.
       • Does not allow broad coverage of functionality
    9. SUBJECTIVE EVALUATION
       • Principle: opinions gathered after use
       • A session in which a subject uses the system, following a clearly defined task or scenario
       • The subjects are then questioned about their opinions.
       • Different techniques
       • Open or directed interview (answers to predefined questions)
       • Questionnaire: rating scales on specific points/issues
    10. SUBJECTIVE EVALUATION
       • Open interview
       • The subject raises points that the designer has not noticed or has not yet paid attention to.
       • Opinions lack homogeneity and vary in precision: difficult to synthesize
       • Directed interview
       • Open or closed questions probing precise opinions
       • Structures the evaluation: easier analysis
       • Have you ever reserved a hotel online? □ yes □ no
       • Does this functionality seem interesting to you? □ yes □ no
       • Can you easily complete the hotel reservation? □ yes □ no
       • Does this take you much time? □ yes □ no
    11. SUBJECTIVE EVALUATION
       Semi-structured interview (Nielsen et al., 1986)
       • Why do you do this?
       • to learn the user's objective
       • How do you do it?
       • to retrieve the sub-tasks, to which the questions are then applied recursively
       • Why do you do it this way?
       • to learn the user's choices
       • What are the preconditions for doing this?
       • to evaluate whether the user understands the conditions for starting the action
       • What are the results of doing this?
       • Do errors ever occur when doing this?
       • How do you discover and correct these errors?
    12. SUBJECTIVE EVALUATION
       • Questionnaire
       • Users sometimes have difficulty giving a clear-cut opinion.
       • Subjective rating on a multi-valued scale: a Likert scale or a preference scale
       • Examples
       • QUIS (Chin et al. 1988)
       • IBM Post-Study System Usability Questionnaire (Lewis 1995)
       • Software Usability Measurement Inventory (Kirakowski and Corbett 1993)
       • Rate your agreement with each of the following statements from 1 (poor) to 4 (excellent)
       • This functionality is interesting □ 1 □ 2 □ 3 □ 4
       • It is easy to make a reservation with the system □ 1 □ 2 □ 3 □ 4
       • The time needed to make a reservation is acceptable □ 1 □ 2 □ 3 □ 4
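Answers on a rating scale like the one above are usually summarized per statement. The sketch below is one possible way to do it; the response data, the shortened statement labels and the `summarize` helper are hypothetical and serve only to illustrate the 1-to-4 scale of the example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical answers from six subjects on the 1 (poor) to 4 (excellent) scale.
responses = {
    "This functionality is interesting": [4, 3, 4, 2, 3, 4],
    "It is easy to make a reservation": [2, 2, 3, 1, 2, 3],
    "The reservation time is acceptable": [3, 3, 4, 3, 2, 3],
}


def summarize(scores, scale=(1, 2, 3, 4)):
    """Return the mean, the median and the per-value counts for one statement."""
    if any(score not in scale for score in scores):
        raise ValueError("score outside the rating scale")
    counts = Counter(scores)
    return {
        "mean": round(mean(scores), 2),
        "median": median(scores),
        "counts": {value: counts.get(value, 0) for value in scale},
    }


summary = {statement: summarize(scores) for statement, scores in responses.items()}
for statement, stats in summary.items():
    print(statement, stats)
```

Reporting the counts next to the mean keeps a split opinion visible, since a scale of this kind is ordinal and an average alone can hide it.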
    13. SUBJECTIVE EVALUATION
       QUIS (Questionnaire for User Interaction Satisfaction) www.lap.umd.edu/QUIS/
       • Past experience with the tested system
       • Past experience with other systems
       • Users' general opinion of the system
       • Display
       • Terminology and information provided by the system
       • Learnability
       • Paper documentation and online help
       • Online documentation
       • Multimedia
       • Teleconferencing and collaborative work
       • System installation
    14. SUBJECTIVE EVALUATION
       • Subjective evaluation: which quality criteria?
       • Example: ISO 9241 standard
       Criterion                              Subjective measure
       Reliability (adequacy for the task)    Satisfaction scale
       Adaptation to trained users            Satisfaction scale for advanced functionality
       Learnability                           Scale of perceived ease of learning
       Robustness (error tolerance)           Satisfaction scale for error management
    15. OBJECTIVE EVALUATION
       • Principle: observation, analyzed after use
       • A session in which a subject uses the system, following a clearly defined task or scenario
       • Observation and/or recording of the session, then examination of the data
       • Data analysis
       • Different approaches
       • Qualitative evaluation
       • Quantitative evaluation
    16. EXAMINING THE OBSERVATIONS
       • Qualitative evaluation
       • Search for the most flagrant usage problems: sample cases
       • Quantitative evaluation
       • Compute metrics (e.g. % of errors, …) from the observed data
       • Video analysis
       • Transcription and analysis of the subjects' verbalizations
       • Analysis of the observer's notes
       • Examination of logged data: key-press counts, data from the WWW server's log file
    17. OBJECTIVE EVALUATION: USABILITY TEST
       • Quantitative metrics characterizing the quality of interaction
       • Examples (Whiteside, Bennett and Holtzblatt 1988)
       • Execution time of a task
       • % of tasks fully completed
       • Ratio of successful to failed sessions
       • Number of errors
       • Distribution of the number of errors across subjects
       • Time lost on errors
       • Number of commands used to accomplish the task
       • Frequency of use of help and documentation
       • % of positive / negative comments (think-aloud)
       • Number of repetitions of an erroneous command
       • Number of commands invoked but not used
       • Number of times the subject was distracted from the task at hand
       • Number of times the subject lost control of the system
       • Number of times the subject expressed frustration
       • …
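Several of the metrics listed above can be derived directly from per-session records. The following sketch assumes a hypothetical record layout (`seconds`, `completed`, `errors`, `help_calls`) and invented data; a real study would fill these fields from the logs and observer notes.

```python
from statistics import mean

# Hypothetical session records, one per subject.
sessions = [
    {"subject": 1, "seconds": 140, "completed": True, "errors": 1, "help_calls": 0},
    {"subject": 2, "seconds": 210, "completed": True, "errors": 4, "help_calls": 2},
    {"subject": 3, "seconds": 300, "completed": False, "errors": 6, "help_calls": 3},
    {"subject": 4, "seconds": 120, "completed": True, "errors": 0, "help_calls": 0},
]


def usability_metrics(records):
    """Compute a few of the quantitative metrics from the list above."""
    successes = sum(1 for r in records if r["completed"])
    failures = len(records) - successes
    return {
        "mean_task_time_s": mean(r["seconds"] for r in records),
        "pct_completed": 100 * successes / len(records),
        # None when no session failed, to avoid dividing by zero.
        "success_failure_ratio": successes / failures if failures else None,
        "total_errors": sum(r["errors"] for r in records),
        "errors_per_subject": {r["subject"]: r["errors"] for r in records},
        "help_calls_per_session": mean(r["help_calls"] for r in records),
    }


metrics = usability_metrics(sessions)
print(metrics)
```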
    18. OBJECTIVE EVALUATION: USABILITY TEST
       Usability testing according to Nielsen (1993)
       • Effectiveness: verify whether the objectives set by users are achieved
       • Efficiency: evaluate the resources used to achieve these objectives (e.g. time to complete a task)
       • Satisfaction: quantify the level of user satisfaction
       • Effectiveness: OK if 90% of users pass the test
       • Efficiency: OK if 90% of users take less than 3 minutes to accomplish a task
       • Satisfaction: OK if fewer than 10% of users report a problem with the function
       ISO 9241-11 standard
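The three example thresholds above can be checked mechanically once the test results are collected. This is a sketch under stated assumptions: "90% of users" is read as "at least 90%", and the record fields (`passed`, `seconds`, `reported_problem`) and the data are hypothetical.

```python
def meets_usability_targets(results, time_limit_s=180):
    """Check the example thresholds for effectiveness, efficiency and satisfaction."""
    n = len(results)
    pct_passed = 100 * sum(1 for r in results if r["passed"]) / n
    pct_fast = 100 * sum(1 for r in results if r["seconds"] < time_limit_s) / n
    pct_problem = 100 * sum(1 for r in results if r["reported_problem"]) / n
    return {
        "effectiveness": pct_passed >= 90,  # at least 90% pass the test
        "efficiency": pct_fast >= 90,       # at least 90% finish in under 3 minutes
        "satisfaction": pct_problem < 10,   # fewer than 10% report a problem
    }


# Hypothetical results for ten users: nine ordinary sessions and one difficult one.
results = [{"passed": True, "seconds": 150, "reported_problem": False} for _ in range(9)]
results.append({"passed": False, "seconds": 240, "reported_problem": True})

verdict = meets_usability_targets(results)
print(verdict)
```

In this invented sample, effectiveness and efficiency sit exactly on the 90% threshold and pass, while satisfaction fails because 10% of users reported a problem and the criterion requires strictly fewer than 10%.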
    19. OBJECTIVE EVALUATION: USABILITY TEST
       Example: ISO 9241-11 standard
       • Reliability (adequacy for the task)
           • % of goals achieved
           • Time to complete the task
       • Adaptation to trained users
           • Number of advanced features used
           • Efficiency relative to an expert
       • Learnability
           • % of functions learned after practice
           • Time to learn a function
       • Robustness (error tolerance)
           • % of errors corrected
           • Time lost on error recovery
    20. OBJECTIVE EVALUATION: ACCEPTANCE TEST
       • Principles
       • Same principle as the usability test, but the metrics are fixed with intervals of expected success (acceptability).
       • Used more often with the final system: requirement specification
       • Example metrics
       • Time (or number of sessions) needed to learn a specific function
       • Execution time of a task
       • Error rate while carrying out a task
       • Proportion of subjects who succeed within a given time
       • Retention time of a learned command
       • Results of the subjective evaluation
       • …
       Example: after 5 hours of use by novices and 15 days of waiting (learning), 50% of the test population must be able to accomplish 75% of the test tasks correctly.
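The acceptance criterion in the example above combines two proportions: a share of the tasks per subject and a share of the population. A minimal sketch of that check follows; it reads both figures as "at least", and the function name and the sample counts are assumptions made for illustration.

```python
def acceptance_test(correct_tasks_per_subject, total_tasks,
                    task_share=0.75, population_share=0.50):
    """Return True if enough subjects completed enough tasks correctly.

    A subject qualifies when they completed at least task_share of the tasks;
    the test passes when at least population_share of the subjects qualify.
    """
    needed = task_share * total_tasks
    qualifying = sum(1 for correct in correct_tasks_per_subject if correct >= needed)
    return qualifying / len(correct_tasks_per_subject) >= population_share


# Hypothetical counts of correctly completed tasks, out of 8 test tasks.
accepted = acceptance_test([8, 6, 5, 7, 3, 6], total_tasks=8)
rejected = acceptance_test([5, 5, 6, 2], total_tasks=8)
print(accepted, rejected)
```

Fixing the two thresholds before the sessions take place is what distinguishes this test from an exploratory usability test.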
    21. OBJECTIVE EVALUATION: ACCEPTANCE TEST
       • Define the tasks for the test
       • Give the subject a list of tasks to execute at the beginning of the experiment.
       • Choose the proposed tasks carefully, based on what is to be evaluated.
       • The tasks lead the user to concentrate on the parts of the interface under evaluation.
       • Size the time allotted to each task carefully (objectives derived from the requirements analysis, comparison with existing software, …).
       • Estimate the average time needed and define an acceptable proportion of overruns (cf. metrics), to allow for inter-individual variability.
       • Make sure the task statement is clear enough to be understood by a novice or first-time user.
    22. EVALUATION PLAN
       An evaluation yields no results unless it is well prepared.
       • [Basili et al. 1994]
       • What are the general objectives of the evaluation?
       • What are the specific questions we want answered?
       • Which paradigm and test techniques are needed to achieve these objectives?
       • How should the evaluation be organized in practice: user recruitment, user preparation, collection tools/devices, …
       • Ensure compliance with the ethics (deontology) rules in force.
       • How should the collected information be examined, interpreted and presented?
    23. EVALUATION PLAN
       Which evaluation paradigm to use?
       • Objective (observation): problem detection; broad range; modifies behavior
       • Subjective: less expensive; usage opinions; precision; response rate
       • Predictive (model): no system required; less expensive; limited range
       • Predictive (expert): expertise; may miss problems
    24. EVALUATION PLAN
       When do we use a particular evaluation paradigm?
       [Figure labels: field studies, predictive, laboratory usability, quick and dirty]
    25. EVALUATION AND DIVERSITY OF USERS
       • Sampling the population
       • Important for both objective and subjective evaluation
       • Characterize the communities of intended users
       • Sample the population according to criteria that reflect this characterization (men/women, expert/novice, familiarity with computers, age, socio-professional category, …)
       • Sample size: 5, 12, 20, 100? (Dumas & Redish, 1999)
       • Remark: experimental studies vs. "quick and dirty" evaluation
       • Analyzing the tests
       • Multi-criteria analysis: break down the results by the different characteristics
       • Statistical relevance of the results
       • A discipline in its own right: statistics (protocols and tests)
    26. EVALUATION AND DIVERSITY OF USERS
       Statistical analysis of the results
       Subject            1    2    3    4    5    6    7    Average
       Age                37   41   43   54   46   44   21   40.9
       Sex                F    F    M    M    F    F    M    4F, 3M
       Education level    4    2    4    4    4    1    2    3.0
       PC years           5    2    0    2    6    4    9    4.0
       Usage facility     1    2    2    1    2    3    1    1.7
       Help quality       1    3    3    1    3    2    2    2.1
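The averages in the table above can be recomputed from the seven subjects' values, and the same data can be broken down by a characteristic such as sex, in the spirit of a multi-criteria analysis. The sketch below copies the table values; the variable names and the choice of breakdown are illustrative assumptions.

```python
from statistics import mean

# Values copied from the table above, in subject order 1 to 7.
sex = ["F", "F", "M", "M", "F", "F", "M"]
numeric = {
    "Age": [37, 41, 43, 54, 46, 44, 21],
    "Education level": [4, 2, 4, 4, 4, 1, 2],
    "PC years": [5, 2, 0, 2, 6, 4, 9],
    "Usage facility": [1, 2, 2, 1, 2, 3, 1],
    "Help quality": [1, 3, 3, 1, 3, 2, 2],
}

# Overall average per row, rounded to one decimal as in the table.
averages = {name: round(mean(values), 1) for name, values in numeric.items()}

# Breakdown of one score by one characteristic (here: usage facility by sex).
by_sex = {
    group: round(mean(score for score, s in zip(numeric["Usage facility"], sex)
                      if s == group), 1)
    for group in sorted(set(sex))
}

print(averages)
print(by_sex)
```

With only seven subjects, differences between groups of this size should be read with caution, which is why the statistical tests mentioned earlier matter.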
    27. EVALUATION AND DIVERSITY OF USERS
       • Example: measuring the quality of an interface
       • Learning time and retention of what was learned
       • Speed of task execution (benchmark)
       • Error rate and error types
       • (Subjective) user satisfaction
       • HCI design involves weighing different factors
       • Experts: speed of execution takes precedence over learning time
       • Novices: learning time and error-rate reduction take precedence over speed of execution
       • Critical systems: reducing errors matters most
       • Industrial systems: learning and execution costs …
       • …
    28. EVALUATION: BEYOND USERS
       • The user is not everything … and is often not the buyer
       • Typology of interests in the choice of software
       • But the user is still the alpha and omega!
       • [SESL: Ramage, 1997]
       • Users of the software
       • Their colleagues and superiors (managers)
       • Developers and software resellers
       • The organization's IT/information service (if any)
       • The organization's clients
       • Trade unions and employee associations
       • The parent company
       • Shareholders
       • The government
    29. EVALUATION PLAN: ETHICS (DEONTOLOGY)
       • Consent: consent form
       • Problem: evaluation on the WWW
       • Before the session, explain to the subjects:
       • the objective of the evaluation and what is expected of the subject
       • what personal information will be requested and recorded: promise anonymity
       • that they can stop whenever they want during the session
       • the financial terms of the evaluation, if any (whether or not the subject is paid)
       • At the end (and only then), confirm agreement by having the user sign a consent form.
    30. BIBLIOGRAPHY
       • References
       • Nielsen J. (1993) Usability engineering. Academic Press.
       • Publications
       • Chin J., Diehl V., Norman K. (1988) Development of an instrument measuring user satisfaction of the human-computer interface. Proceedings of ACM CHI'88 Human Factors in Computing Systems. 213-218.
       • Dumas J., Redish J. (1999) A practical guide to usability testing. Intellect, Exeter, UK.
       • Lewis J. (1995) IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction, 7(1), 57-78.
       • Kirakowski J., Corbett M. (1993) SUMI: the Software Usability Measurement Inventory. British Journal of Educational Technology, 24(3), 210-212.
       • Nielsen J., Mack R., Bergendorff K., Grischkowsky N. (1986) Integrated software usage in the professional work environment: evidence from questionnaires and interviews. Proceedings of CHI'86, New York, NY, ACM Press. 162-167.
       • Nielsen J., Mack R. (Eds.) (1994) Usability inspection methods. John Wiley & Sons, New York, NY.
       • Whiteside J., Bennett J., Holtzblatt K. (1988) Usability engineering: our experience and evolution. In Helander M. (Ed.) Handbook of Human-Computer Interaction. North-Holland, Amsterdam.