1.
ASSESSING INTERACTIVE SYSTEM EFFECTIVENESS WITH USABILITY DESIGN HEURISTICS AND MARKOV MODELS OF USER BEHAVIOR<br />Presented by: Lashanda Lee<br />
2.
Motivation<br />For HCI to be successful, interfaces must be designed to:<br />Effectively translate the intentions, actions and inputs of the operator for the computer<br />Effectively translate machine outputs for human comprehension<br />Available HCI frameworks can aid in evaluating interface designs and generating subjective data.<br />Quantitative objective data is also needed as a basis for cost justification and determination of ROI.<br />Modeling human behavior may reduce the need for experimentation, saving:<br />Time<br />Expense<br />Combining data from operations research (OR) techniques with subjective data can generate a score for overall system effectiveness<br />Allows for comparison of alternative interface designs.<br />
3.
Literature Review: HCI frameworks<br />Norman’s model of HCI<br />Two stages<br />Does not focus on continuous cycle of communication<br />Pipe-line model<br />Inputs and outputs of system operate in parallel<br />Complex model with many states<br />Does not show cognitive process of human<br />Explains the computer processing<br />Dix et al. model<br />Focuses on distances between user and system<br />Focuses on continuous cycle of communication<br />Chosen as basis for evaluation in present research<br />
4.
Literature Review: Usability paradigms and principles<br />Paradigms: how humans interact with computers<br />Ubiquitous computing, intelligent systems, virtual reality and WIMP<br />Principles: how paradigms work<br />Flexibility, consistency, robustness, recoverability and learnability<br />Each paradigm focuses on different usability principles.<br />Specific usability measures can be used to assess certain paradigms<br />Paradigms address the figurative distance of articulation in the Dix et al. model in different ways<br />Examples:<br />Intelligent interfaces using NLP:<br />Greatest reduction in articulation distance for users but furthest from system language<br />Command line interface:<br />Farthest from user in the Dix et al. framework but easy for the computer to understand<br />WIMP:<br />Easy for both the system and the user to interpret; equally distant from user and system in the Dix et al. interaction framework<br />
5.
Literature Review: Measures of usability: Qualitative measures<br />Low cost but low discovery<br />Comparisons of designs based on interface qualities<br />Data hard to analyze<br />May not lead to design changes because management considers data unreliable<br />Subjective data<br />Inspection methods<br />Low cost and quick discovery of problems using low-skill evaluators<br />Often fail to find many serious problems and do not provide enough evidence to create design recommendations<br />Types include:<br />Heuristic methods, guidelines, style and rule inspections<br />Verbal reports<br />Hard to find an appropriate way to use the data<br />Gain insight into cognition<br />Surveys<br />Inexpensive and help find trouble spots<br />Some information lost due to short-term memory (STM) limitations<br />
6.
Literature Review: Measures of usability: Quantitative measures<br />Used to make comparisons of designs based on quantities associated with certain interface features<br />Useful in presenting information to management<br />Goals may be too ambitious or there may be too many goals<br />Cannot cover entire systems<br />Subjective responses<br />Rankings, ratings or fuzzy set ratings<br />Considered quantitative because they involve manipulation and analysis of data as a basis for comparing interface alternatives.<br />Objective responses<br />Measures of effectiveness: binary task completion, number of correct tasks completed and task performance accuracy<br />Measures of efficiency: task completion time, time in mode, usage patterns and degree of variation from an optimal solution<br />Fuzzy sets<br />User modeling<br />Count of concrete occurrences and not based on the opinions of users<br />
7.
Literature Review: Quantitative objective measures: Fuzzy sets and user modeling<br />Fuzzy sets<br />Used to compare interface alternatives<br />Aggregate score produced based on count of interface inadequacies<br />Fuzzy set logic used to determine membership for aggregate score<br />Method uses both subjective and objective measures<br />Requires multiple cycles of user testing to compare scores<br />Does not use variable weights for dimensions; treats them all as equal<br />User modeling<br />Used to predict interface action sequences based on prior use data.<br />Limited in revealing actual human performance; not exact<br />Can be used to help guide users while performing a task with an interface<br />Activist<br />GOMS<br />Estimates task performance times<br />Produces accurate predictions of user actions<br />Takes a long time to create<br />Benefits include:<br />Model one or more types of users<br />Analyze without additional user testing<br />
8.
Literature Review: Usability measures: Summary<br />Qualitative<br />Used iteratively<br />Low discovery<br />Hard to analyze<br />Usually do not effect change in a display because management considers data unreliable<br />Quantitative<br />Appear to be better for detailed usability problem analysis and design recommendations<br />User modeling can decrease cost<br />Necessary to gain management support<br />Combining an objective quantitative user modeling approach with subjective usability measures may provide:<br />An approach effective in finding problems<br />Basis for interface redesign<br />
9.
Literature Review: Operations Research methods of usability evaluation<br />Use of techniques such as mathematical modeling to analyze complex situations<br />Used to optimize systems<br />Limited use in usability evaluation or interface improvements<br />Methods used:<br />Markov models<br />Stochastic processes<br />Used for website customization<br />Predict user behavior<br />Research by: Kitajima et al., Thimbleby et al., Jenamani et al.<br />Probabilistic finite state models<br />Include time distributions and transitional probabilities<br />Generate user behavior predictions<br />Research by: Sholl et al.<br />Critical path models<br />Algorithm determines longest time<br />Can also incorporate stochastic process predictions<br />Research by: Baber and Mellor (2001)<br />
10.
Literature Review: Operations Research methods of usability evaluation: Markov models (Kitajima et al. and Thimbleby et al.)<br />Kitajima et al.<br />Markov models used to predict user behavior<br />Determined number of clicks to find relevant articles<br />After interface improvements, used model to predict number of clicks<br />Number of clicks was reduced<br />Used equation $u(i) = 1 + \sum_k P_{ik}\,u(k)$<br />Thimbleby et al.<br />Applied Markov chains to several applications: microwave oven and cell phone<br />Used Markov chains to predict number of steps<br />Used Mathematica simulation of microwave to gather information<br />Used a mixture involving a perfect, error-free transition matrix:<br />Used knowledge factors from 0 to 1 (1 was a fully knowledgeable user)<br />Simulated user behavior<br />Original design took 120 steps (for random user)<br />Improved design took fewer steps (for the random user)<br />Fewer steps considered “easier”<br />
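The equation $u(i) = 1 + \sum_k P_{ik}\,u(k)$ can be illustrated with a small sketch. It assumes the standard absorbing-chain reading, in which u(i) is the expected number of clicks from state i to a goal state and u(goal) = 0. The three-state site, its probabilities and the function name `expected_clicks` are hypothetical and are not taken from either study.

```python
def expected_clicks(P, goal, tol=1e-12, max_iter=100_000):
    """Expected number of clicks to reach `goal` from every state.

    P is a row-stochastic transition matrix (list of lists).
    Solves u(i) = 1 + sum_k P[i][k] * u(k), with u(goal) = 0,
    by repeated substitution (fixed-point iteration).
    """
    n = len(P)
    u = [0.0] * n
    for _ in range(max_iter):
        new_u = [
            0.0 if i == goal
            else 1.0 + sum(P[i][k] * u[k] for k in range(n) if k != goal)
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_u, u)) < tol:
            return new_u
        u = new_u
    raise RuntimeError("no convergence; is the goal reachable from every state?")


# Hypothetical site: 0 = home, 1 = product page, 2 = checkout (goal).
P = [
    [0.0, 1.0, 0.0],  # home always leads to the product page
    [0.5, 0.0, 0.5],  # product page: half go back, half check out
    [0.0, 0.0, 1.0],  # checkout is absorbing
]
u = expected_clicks(P, goal=2)
print([round(x, 3) for x in u])  # [4.0, 3.0, 0.0]
```

In this toy chain a user starting at the home page needs four clicks on average, even though the shortest path is two, because half of the visits to the product page lead back to the home page.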
11.
Literature Review: Operations Research methods of usability evaluation: Summary<br />Appears to be a viable and useful approach to evaluate interface usability<br />Provides objective quantitative data without need for several iterations of testing<br />Used repeatedly to predict behavior, such as number of clicks and task times<br />Accurately predicts user behavior<br />
12.
Summary and Problem Statement<br />Need to use a framework describing communication between humans and computers to guide design improvements (Dix et al. was chosen for its simplicity and cyclic structure).<br />Usability paradigms help identify types of technology that can be used to improve systems and provide direction in how to evaluate systems.<br />WIMP paradigm chosen for its simplicity and accommodation of both user and system<br />Many subjective measures exist but are not adequate for assessing performance and supporting design changes<br />Objective, quantitative measures often gain the support of management for design changes but are expensive<br />OR methods: Markov models accurately predict human behavior<br />Need to define an approach that uses both types of measures to evaluate usability and requires minimal user testing.<br />Combined use of subjective system evaluations based on the Dix et al. model and OR modeling techniques to predict user behavior with an interface<br />Both methods used to produce overall system effectiveness score to compare alternative designs.<br />
13.
Method: Overview of system effectiveness score<br />Dix et al. framework<br />Survey for designers: captures the perceived importance of each link in the HCI framework<br />Survey for users, with Markov model prediction of average number of interface actions (clicks): users rated interfaces with respect to links in the framework<br />Novelty: the measure reflects both the designer’s intent for the application and the user’s perception of the system<br />Designer weights and user ratings are multiplied and summed across links<br />Weighted sum is divided by Markov model prediction of average number of clicks<br />Score represents perceived usability per action<br />
14.
Method: Weighting factor determination<br />Designers expected to be most concerned with cognitive load.<br />Four designers surveyed using the Dix et al. framework:<br />Based on paradigm for application (WIMP), how important is each link to system effectiveness?<br />Pair-wise comparisons of links<br />Values ranged between 0 and 0.5<br />Weighting factors averaged across designers to determine weight for each dimension<br />Weights were used in calculating overall system subjective score (designer weights × user ratings)<br />
15.
Method: Experimental task<br />Used a version of Lenovo.com prototype to find and order ThinkPad R60<br />Twenty participants:<br />11 males, 9 females<br />Age range: 17-25<br />Half of the participants used old version of Lenovo.com website:<br />Required 11 clicks to buy (optimal path)<br />Tabs that separated the features information and the ability to purchase<br />Half of the participants used a new prototype:<br />Required 9 clicks to buy (optimal path)<br />All information about type of computer contained on 1 page<br />Multi-level navigation structure<br />More salient buttons<br />
16.
Method: Developing Markov chain models<br />JavaScript recorded user actions<br />Old online ordering system used to identify states: links, tabs, menu options (radio buttons and popups not included)<br />Used action sequences to create transitional probability matrices<br />Based on actual number of users going from state i to state k.<br />Assumptions of Markov model include:<br />Sum of each row must equal 1<br />Probability of next interface state depends only on current state<br />To determine average number of clicks to task completion, used Kitajima et al. (2005):<br />$u(i) = 1 + \sum_k P_{ik}\,u(k)$<br />Need state probability matrix based on action sequences<br />Need average number of steps from one state to another (based on designer analysis)<br />
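A minimal sketch of how logged action sequences might be turned into a transitional probability matrix, counting moves from state i to state k and normalizing each row so it sums to 1. The state names, the two logged sessions and the choice to treat never-left states as absorbing are illustrative assumptions, not details from the study.

```python
def transition_matrix(sequences, states):
    """Estimate P[i][k] as the share of observed moves from state i to state k."""
    index = {name: i for i, name in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    for seq in sequences:
        # Each consecutive pair of logged states is one observed transition.
        for current, nxt in zip(seq, seq[1:]):
            counts[index[current]][index[nxt]] += 1

    P = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a state never left in the logs is treated as absorbing.
            P.append([1.0 if k == i else 0.0 for k in range(n)])
        else:
            P.append([c / total for c in row])
    return P


# Hypothetical states and click logs from two sessions.
states = ["home", "product", "customize", "checkout"]
logs = [
    ["home", "product", "customize", "checkout"],
    ["home", "product", "home", "product", "customize", "checkout"],
]
P = transition_matrix(logs, states)
for name, row in zip(states, P):
    print(name, [round(p, 2) for p in row])
```

Here the product page was left three times, once back to the home page and twice to the customize page, so its row becomes 1/3 and 2/3.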
17.
Method: Rating system effectiveness (based on Dix et al. framework)<br />Used Dix et al. framework<br />End users rated links<br />On a scale from 1 to 10<br />Presented framework at end of the task<br />Determined average ratings for each link and used in overall system effectiveness score<br />
18.
Method: Overall system effectiveness score and Markov model validation<br />Overall score<br />Used to compare alternative interface designs<br />Average designer weights for each dimension<br />Average rating by end users<br />Sum of products of the two across links is the partial score<br />Partial score divided by predicted average number of clicks is overall score<br />Higher ratio considered to indicate higher overall system effectiveness<br />Validation<br />T-test used to determine if actual observed number of clicks was significantly different from number of clicks predicted with Markov model.<br />SystemEffectiveness = (sum over links of designer weight × user rating) / (predicted average number of clicks)<br />
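The scoring steps above can be sketched as follows. The four link names follow the dimensions named elsewhere in the slides (articulation, performance, presentation, observation). The weights, ratings and click count are made-up illustrative values, not the study's data, and `system_effectiveness` is a hypothetical helper name.

```python
def system_effectiveness(weights, ratings, predicted_clicks):
    """Return (partial score, overall score) for one interface.

    partial = sum over links of designer weight * user rating
    overall = partial / predicted average number of clicks
    """
    if weights.keys() != ratings.keys():
        raise ValueError("weights and ratings must cover the same links")
    if predicted_clicks <= 0:
        raise ValueError("predicted_clicks must be positive")
    partial = sum(weights[link] * ratings[link] for link in weights)
    return partial, partial / predicted_clicks


# Illustrative values only (weights in the 0 to 0.5 range, ratings on a 1 to 10 scale).
weights = {"articulation": 0.3, "performance": 0.2,
           "presentation": 0.2, "observation": 0.3}
ratings = {"articulation": 8, "performance": 7,
           "presentation": 6, "observation": 9}

partial, overall = system_effectiveness(weights, ratings, predicted_clicks=10)
print(round(partial, 2), round(overall, 3))  # 7.7 0.77
```

Dividing by the predicted click count means that an interface with the same ratings but a longer expected path scores lower, which matches the reading of the score as perceived usability per action.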
19.
Results: Assessment of Markov model assumption<br />Transition from one state must depend only on the current state<br />Durbin-Watson test used to assess autocorrelation among user steps in interaction<br />Test statistics were: 1.2879 (old) and 2.0815 (new)<br />Normalization procedure applied to original transitional probability matrices.<br />Durbin-Watson test conducted on normalized data<br />Test statistics were: 1.3920 (old) and 2.27 (new)<br />Test revealed mixed evidence<br />Model was accepted and applied to predict average number of clicks<br />
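For reference, the Durbin-Watson statistic is the sum of squared successive differences of a series divided by its sum of squares. It ranges from 0 to 4, and values near 2 suggest little first-order autocorrelation. The sketch below computes it for hypothetical residual series. How the study derived its series from user steps is not stated in the slides, so the inputs here are purely illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Hypothetical series showing the two extremes.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strong negative autocorrelation
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])       # strong positive autocorrelation
print(alternating, constant)  # 3.0 0.0
```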
20.
Results: Computation of average number of steps<br />The average number of steps it takes to get from any one state to the other<br />Represents individual u(k) in the Kitajima et al. equation<br />Matrix created by designers of the interface<br />
21.
Results: Computation of average number of clicks<br />Use $u(i) = 1 + \sum_k P_{ik}\,u(k)$<br />Consider paths to absorbing state to determine average number of clicks<br />Markov model predicted number of clicks for each interface:<br />11.5 for old (actual 12.9)<br />9 for new (actual 9.2)<br />T-test used to compare the difference between actual clicks across interfaces<br />T-value: -4.30 with p-value: 0.0004<br />Actual number of clicks differed across interfaces; new was significantly lower<br />T-test used to compare actual click count to predicted click count for all subjects:<br />P-value: 0.439 for new<br />P-value: 0.0605 for old<br />No significant difference between actual and predicted on either interface<br />T-test used to compare predicted clicks across interfaces:<br />P-value: 0.0033<br />New interface reduced number of clicks<br />
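One common way to compare observed click counts with a single predicted value is a one-sample t statistic. The slides do not say which form of t-test was used, so this is an assumption. The sketch uses the standard library only, and the click counts and predicted value are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, predicted):
    """t statistic for H0: the mean of `sample` equals `predicted`.

    t = (sample mean - predicted) / (sample standard deviation / sqrt(n)),
    with n - 1 degrees of freedom.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - predicted) / (sd / math.sqrt(n))


# Hypothetical observed click counts for five participants.
observed_clicks = [8, 10, 9, 11, 12]
t_value = one_sample_t(observed_clicks, predicted=9.0)
print(round(t_value, 3))  # 1.414, with 4 degrees of freedom
```

The p-value then comes from the t distribution with n - 1 degrees of freedom, using a statistics table or package.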
22.
Results: Partial system effectiveness score<br />Each participant rated interfaces on each dimension using scale of 1 to 10<br />Designers completed pair-wise comparisons<br />Designers expected to rate articulation and observation higher<br />T-test used to compare designer ratings of articulation and observation with performance and presentation<br />Rated articulation and observation higher<br />Average designer weights were multiplied by average user ratings<br />T-test used to compare partial score of new against old for all subjects<br />T-value: 5.08; p-value: < .0001<br />Partial score for new interface is significantly higher<br />
23.
Results: Overall system effectiveness score<br />Partial score was divided by predicted average number of clicks to yield perceived usability per click<br />New: 0.939<br />Old: 0.475<br />T-test used to compare overall score for new and old interfaces for all subjects<br />T-value: 5.62; p-value: < .0001<br />Overall system effectiveness score for new was significantly higher than old<br />
24.
Results: Reducing experimentation<br />Purpose of Markov model was to predict number of clicks and to reduce the need for additional user testing.<br />Designers can estimate an average number of steps to transition among states in the new interface and multiply by probabilities determined for original interface (through user testing)<br />Predicted number of clicks for new interface was 9.35 (actual 9.2)<br />T-test used to determine whether actual number of clicks was different from the predicted number of clicks<br />T-value: 1.15; p-value: 0.270<br />Markov model was accurate in predicting the average number of clicks<br />In order to obtain user ratings, focus groups would be necessary<br />Approach significantly reduces time and money necessary for user testing<br />
25.
Discussion: Designer ratings<br />Hypothesis: Average designer weighting factors for articulation and observation will be higher than performance and presentation<br />Designers were concerned with cognitive load, as represented by articulation and observation<br />If customer cannot find what (s)he is looking for, may lead to:<br />Frustration<br />Lost customers<br />Lost revenue<br />Designers realize that effectively reducing cognitive load is important<br />
26.
Discussion: Improved usability<br />Hypothesis: New interface will improve perceived usability<br />Multi-level navigation was used to reduce cognitive load:<br />Easier to find and view all options<br />Users could reach many states with 1 click<br />Identified by users of new interface as one of the most usable features<br />More prominent buttons:<br />Aided in easily identifying next steps<br />In original interface, users had a difficult time finding the customize button<br />Often scrolled up and down page or backtracked to determine what to do next<br />Partial system effectiveness score was higher for new interface (8.6) than the old (5.2)<br />
27.
Discussion: Higher system effectiveness score<br />Hypothesis: New interface will produce higher score because of perceived higher usability<br />Old interface degraded performance:<br />From features tab, some found it difficult to identify what to do next<br />Once users found product tab, some scrolled up and down trying to determine what to do next (new interface alleviated both these problems: all information on 1 page)<br />Higher perceived usability and fewer clicks led to higher ratio<br />
28.
Discussion: Markov model accurately predicted average number of clicks<br />Hypothesis: Markov model will accurately predict average number of clicks using the equation detailed by Kitajima et al.<br />Because Markov models are used to represent stochastic behavior, they proved valid in present work<br />Models revealed the variability among participants but did not show exact magnitude of the error<br />
29.
Conclusion<br />Objective was to create new measure of usability<br />Based on:<br />Few quantitative objective measures<br />Many subjective measures insufficient to justify design changes<br />Research supports a subjective measure using the Dix et al. framework and an objective measure based on Markov models<br />Method is:<br />Effective in objectively selecting among alternative designs and reducing the amount of experimentation necessary<br />Easy to implement<br />Can be used with several alternatives without the need for testing<br />Cannot apply to interfaces where selection of next state depends on previous states and not only current state<br />Future research:<br />Use Markov models to predict the next steps a user will take and make relevant interface options more salient to improve usability<br />Find a way to incorporate time-on-task in overall effectiveness score:<br />Perceived time-on-task will impact customer retention<br />Research a method to accurately predict<br />