This document describes a study that combined usability heuristics with Markov models of user behavior to assess interactive system effectiveness. Researchers developed a method to calculate an overall system effectiveness score by combining subjective user ratings based on a usability framework with an objective measure of average clicks predicted by a Markov model. They applied this method to compare an old and new version of an e-commerce website. Results showed the new site received significantly higher effectiveness scores, and its average clicks were accurately predicted by the Markov model, supporting the combined quantitative/qualitative approach.
Software Quality Engineering is a broad area concerned with various approaches to improving software quality. A quality model proves successful when it satisfies the requirements of both developers and consumers. This research focuses on establishing semantics among the existing techniques of software quality engineering and thereby designing a framework for rating software quality.
HCI 3e - Ch 6: HCI in the software process (Alan Dix)
Chapter 6: HCI in the software process, from Dix, Finlay, Abowd and Beale (2004). Human-Computer Interaction, third edition. Prentice Hall. ISBN 0-13-239864-8.
http://www.hcibook.com/e3/
An interactive approach to requirements prioritization using quality factors (ijfcstjournal)
As the prevalence of software increases, so do the complexity and the number of requirements associated with a software project. This presents a dilemma for developers, who must clearly identify and prioritize the most important requirements in order to deliver the project within a given amount of resources and time. A number of prioritization methods have been proposed that provide consistent results, but they are difficult and complex to implement in practical scenarios and lack a proper structure for analyzing the requirements. In this study, users can provide their requirements in two forms: text-based story form and use case form. Moreover, the existing prioritization techniques involve little or no interaction with the users. This paper therefore attempts to make the prioritization process interactive by adding a second level of prioritization: after the developer has analyzed and ranked the requirements on the basis of quality attributes in the first level, the opinions of distinct users about the requirements priority sequence are collected. The developer then calculates the disagreement value associated with each user sequence in order to find the final priority sequence.
The Google File System (GFS) was innovatively created by Google engineers and was ready for production in record time. Google's success is attributed not only to its efficient search algorithm but also to the underlying commodity hardware. As Google ran a growing number of applications, its goal became to build a vast storage network out of inexpensive commodity hardware, so Google created its own file system, the Google File System (GFS), one of the largest file systems in operation. GFS is a scalable distributed file system for large, distributed, data-intensive applications. Its design assumes that component failures are common, that files are huge, and that files are mutated mostly by appending data. The file system is organized hierarchically in directories, with files identified by pathnames. The architecture comprises multiple chunkservers, multiple clients and a single master. Files are divided into chunks, which is a key design parameter. GFS also uses leases and a mutation order in its design to achieve atomicity and consistency. As for fault tolerance, GFS is highly available: replicas of the chunkservers and the master exist.
We looked at the data. Here’s a breakdown of some key statistics about the nation’s incoming presidents’ addresses, how long they spoke, how well, and more.
Artificial intelligence (AI) is everywhere, promising self-driving cars, medical breakthroughs, and new ways of working. But how do you separate hype from reality? How can your company apply AI to solve real business problems?
Here are the AI lessons your business should keep in mind for 2017.
Study: The Future of VR, AR and Self-Driving Cars (LinkedIn)
We asked LinkedIn members worldwide about their levels of interest in the latest wave of technology: whether they’re using wearables, and whether they intend to buy self-driving cars and VR headsets as they become available. We asked them too about their attitudes to technology and to the growing role of Artificial Intelligence (AI) in the devices that they use. The answers were fascinating – and in many cases, surprising.
This SlideShare explores the full results of this study, including detailed market-by-market breakdowns of intention levels for each technology – and how attitudes change with age, location and seniority level. If you’re marketing a tech brand – or planning to use VR and wearables to reach a professional audience – then these are insights you won’t want to miss.
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man... (Dr. Cornelius Ludmann)
Talk at the Data Streams and Event Processing Workshop at the 16. Fachtagung »Datenbanksysteme für Business, Technologie und Web« (BTW) of the Gesellschaft für Informatik (GI) in Hamburg, Germany. March 3, 2015
In today's software industry, there is no perfect framework available for analysis and software development. There are currently an enormous number of software development processes that can be implemented to stabilize the process of developing a software system, but no perfect system has yet been recognized that can help software developers choose the best development process. This paper presents the framework of a skillful system combined with a Likert scale. With the help of the Likert scale, we define a rule-based model, delegate a mass score to every process, and develop a tool named MuxSet that helps software developers select an appropriate development process, which may enhance the probability of system success.
A METHOD FOR WEBSITE USABILITY EVALUATION: A COMPARATIVE ANALYSIS (IJwest)
ABSTRACT
Graphical user interface design in the software development process focuses on maximizing usability and the user's experience, in order to make interaction easy, flexible and efficient for users. In this paper, we propose an approach for evaluating the usability satisfaction degree of a web-based system. The proposed method is accomplished in two phases and was implemented on an airline website as a case study. In the first phase, a website usability test is carried out by a number of users; the results obtained are then translated into charts for a final evaluation of the web-based system in the second phase. The results achieved were satisfactory, since the weaknesses and gaps in the website were identified and recommended solutions to avoid them were drawn up. The authenticity of the results was confirmed by comparing them with user opinions acquired from a questionnaire, which demonstrates the precision with which the website is rated.
THE USABILITY METRICS FOR USER EXPERIENCE (vivatechijri)
1. ASSESSING INTERACTIVE SYSTEM EFFECTIVENESS WITH USABILITY DESIGN HEURISTICS AND MARKOV MODELS OF USER BEHAVIOR
Presented by: Lashanda Lee
2. Motivation
- For HCI to be successful, interfaces must be designed to effectively translate the intentions, actions and inputs of the operator for the computer, and to effectively translate machine outputs for human comprehension.
- Available HCI frameworks can aid in evaluating interface designs and generating subjective data.
- Quantitative objective data is also needed as a basis for cost justification and determination of ROI.
- Modeling human behavior may reduce the need for experimentation, saving time and expense.
- Combining data from OR techniques with subjective data can generate a score for overall system effectiveness, allowing comparison of alternative interface designs.
3. Literature Review: HCI frameworks
- Norman's model of HCI: two stages; does not focus on the continuous cycle of communication.
- Pipe-line model: inputs and outputs of the system operate in parallel; a complex model with many states; does not show the cognitive process of the human, but explains the computer processing.
- Dix et al. model: focuses on the distances between user and system and on the continuous cycle of communication; chosen as the basis for evaluation in the present research.
4. Literature Review: Usability paradigms and principles
- Paradigms describe how humans interact with computers: ubiquitous computing, intelligent systems, virtual reality and WIMP.
- Principles describe how paradigms work: flexibility, consistency, robustness, recoverability and learnability.
- Each paradigm focuses on different usability principles; specific usability measures can be used to assess certain paradigms.
- Paradigms address the figurative distance of articulation in Dix's model in different ways. Examples:
  - Intelligent interfaces using NLP: greatest reduction in articulation distance for users, but furthest from the system language.
  - Command line interface: farthest from the user in the Dix framework, but easy for the computer to understand.
  - WIMP: easy for both the system and the user to interpret, and equal in distance between user and system in the Dix interaction framework.
5. Literature Review: Measures of usability - Qualitative measures
- Low cost but low discovery; used for comparisons of designs based on interface qualities; data is hard to analyze and may not lead to design changes because management considers it unreliable.
- Subjective data:
  - Inspection methods: low cost and quick discovery of problems using low-skill evaluators, but often fail to find many serious problems and do not provide enough evidence to create design recommendations. Types include heuristic methods, guidelines, and style and rule inspections.
  - Verbal reports: give insight into cognition, but it is hard to find an appropriate way to use the data.
  - Surveys: inexpensive and help find trouble spots; some information is lost due to short-term memory limitations.
6. Literature Review: Measures of usability - Quantitative measures
- Used to compare designs based on quantities associated with certain interface features; useful in presenting information to management.
- Drawbacks: goals may be too ambitious, or there may be too many goals; such measures cannot cover entire systems.
- Subjective responses: rankings, ratings or fuzzy set ratings; considered quantitative because they involve manipulation and analysis of data as a basis for comparing interface alternatives.
- Objective responses:
  - Measures of effectiveness: binary task completion, number of correct tasks completed, and task performance accuracy.
  - Measures of efficiency: task completion time, time in mode, usage patterns, and degree of variation from an optimal solution.
  - Fuzzy sets and user modeling: counts of concrete occurrences, not based on the opinions of users.
7. Literature Review: Quantitative objective measures - Fuzzy sets and user modeling
- Fuzzy sets:
  - Used to compare interface alternatives; an aggregate score is produced based on a count of interface inadequacies, with fuzzy set logic used to determine membership for the aggregate score.
  - The method uses both subjective and objective measures, but requires multiple cycles of user testing to compare scores and does not use variable weights for the dimensions, considering them all equal.
- User modeling:
  - Used to predict interface action sequences based on prior use data; limited in revealing actual human performance (not exact); can be used to help guide users while performing a task with an interface.
  - GOMS: estimates task performance times and produces accurate predictions of user actions, but takes a long time to create.
  - Benefits include modeling one or more types of users and analyzing without additional user testing.
8. Literature Review: Usability measures - Summary
- Qualitative measures: used iteratively; low discovery; hard to analyze; usually do not effect change in a display because management considers the data unreliable.
- Quantitative measures: appear better for detailed usability problem analysis and design recommendations; user modeling can decrease cost; necessary to gain management support.
- Combining an objective quantitative user modeling approach with subjective usability measures may provide an approach that is effective in finding problems and a basis for interface redesign.
9. Literature Review: Operations Research methods of usability evaluation
- Use of techniques such as mathematical modeling to analyze complex situations and optimize systems; so far of limited use in usability evaluation or interface improvement.
- Methods used:
  - Markov models: stochastic processes used for website customization and to predict user behavior (Kitajima et al., Thimbleby et al., Jenamani et al.).
  - Probabilistic finite state models: include time distributions and transitional probabilities; generate user behavior predictions (Sholl et al.).
  - Critical path models: an algorithm determines the longest time; can also incorporate stochastic process predictions (Baber and Mellor, 2001).
10. Literature Review: OR methods - Markov models: Kitajima et al. and Thimbleby et al.
- Kitajima et al.: used Markov models to predict user behavior and determine the number of clicks needed to find relevant articles. After interface improvements, the model was used to predict the number of clicks, which was reduced. Used the equation u(i) = 1 + Σ_k P_ik u(k).
- Thimbleby et al.: applied Markov chains to several applications (a microwave oven and a cell phone) to predict the number of steps. Used a Mathematica simulation of the microwave to gather information, mixing a perfect error-free matrix with knowledge factors from 0 to 1 (1 being a fully knowledgeable user) to simulate user behavior. The original design took 120 steps for a random user; the improved design took fewer steps for the random user, and fewer steps were considered "easier".
11. Literature Review: OR methods - Summary
- Appears to be a viable and useful approach to evaluating interface usability.
- Provides objective quantitative data without the need for several iterations of testing.
- Has been used repeatedly to predict behavior, such as number of clicks and task times, and accurately predicts user behavior.
12. Summary and Problem Statement
- A framework describing communication between humans and computers is needed to guide design improvements; Dix et al. was chosen for its simplicity and cyclic structure.
- Usability paradigms help identify types of technology that can improve systems and provide direction in how to evaluate them; the WIMP paradigm was chosen for its simplicity and its accommodation of both user and system.
- Many subjective measures exist but are not adequate for assessing performance and supporting design changes; objective, quantitative measures often gain management support for design changes but are expensive.
- OR methods: Markov models accurately predict human behavior.
- An approach using both types of measures is needed to evaluate usability with minimal user testing: combined use of subjective system evaluations based on the Dix et al. model and OR modeling techniques to predict user behavior with an interface. Both methods are used to produce an overall system effectiveness score for comparing alternative designs.
13. Method: Overview of the system effectiveness score
- Based on the Dix et al. framework.
- Survey for designers: captures the perceived importance of each link in the HCI framework.
- Survey for users, paired with a Markov model prediction of the average number of interface actions (clicks): users rated the interfaces with respect to the links in the framework.
- The novelty is a measure that reflects both the designer's intent for the application and the user's perception of the system.
- Designer weights and user ratings are multiplied and summed across links; the weighted sum is divided by the Markov model prediction of the average number of clicks. The score represents perceived usability per action.
14. Method: Weighting factor determination
- Designers were expected to be most concerned with cognitive load.
- Four designers were surveyed using the Dix et al. framework: based on the paradigm for the application (WIMP), how important is each link to system effectiveness?
- Pair-wise comparisons of links; values ranged between 0 and 0.5.
- Weighting factors were averaged across designers to determine the weight for each dimension; the weights were used in calculating the overall subjective system score (designer rankings x user ratings).
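The averaging step can be sketched as follows. All pair-wise comparison values here are invented for illustration (the study's actual designer scores are not reproduced); per the deck, each value lies between 0 and 0.5.

```python
# Hypothetical pair-wise comparison weights from four designers for the
# four links of the Dix et al. framework (illustrative numbers only).
designer_scores = [
    {"articulation": 0.40, "performance": 0.20, "presentation": 0.15, "observation": 0.35},
    {"articulation": 0.45, "performance": 0.15, "presentation": 0.10, "observation": 0.40},
    {"articulation": 0.35, "performance": 0.25, "presentation": 0.20, "observation": 0.30},
    {"articulation": 0.50, "performance": 0.10, "presentation": 0.15, "observation": 0.45},
]

# Average each link's weight across designers to obtain dimension weights.
weights = {link: sum(d[link] for d in designer_scores) / len(designer_scores)
           for link in designer_scores[0]}
```

With these invented scores, articulation and observation end up weighted highest, matching the expectation that designers emphasize cognitive load.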
15. Method: Experimental task
- A version of the Lenovo.com prototype was used to find and order a ThinkPad R60.
- Twenty participants: 11 males, 9 females; age range 17-25.
- Half of the participants used the old version of the Lenovo.com website: required 11 clicks to buy (optimal path); tabs separated the feature information from the ability to purchase.
- Half of the participants used a new prototype: required 9 clicks to buy (optimal path); all information about the type of computer was contained on one page; multi-level navigation structure; more salient buttons.
16. Method: Developing the Markov chain models
- JavaScript recorded user actions.
- The old online ordering system was used to identify states: links, tabs and menu options (radio buttons and popups not included).
- Action sequences were used to create transitional probability matrices, based on the actual number of users going from state i to state k.
- Assumptions of the Markov model: the sum of each row must equal 1, and the probability of the next interface state depends only on the current state.
- To determine the average number of clicks to task completion, the Kitajima et al. (2005) equation u(i) = 1 + Σ_k P_ik u(k) was used. This requires a state probability matrix based on the action sequences and the average number of steps from one state to another (based on designer analysis).
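The matrix-building step above can be sketched as follows; the state names and click sequences are invented for illustration (in the study they came from JavaScript logs of user actions on the old ordering system):

```python
from collections import defaultdict

# Invented click sequences; each inner list is one user's recorded path.
sequences = [
    ["home", "products", "thinkpad", "customize", "buy"],
    ["home", "products", "features", "products", "thinkpad", "customize", "buy"],
    ["home", "thinkpad", "customize", "buy"],
]

# Count observed transitions from state i to state k across all users.
counts = defaultdict(lambda: defaultdict(int))
for seq in sequences:
    for i, k in zip(seq, seq[1:]):
        counts[i][k] += 1

# Normalise each row into probabilities so every row sums to 1,
# as the Markov model assumption requires.
P = {i: {k: c / sum(row.values()) for k, c in row.items()}
     for i, row in counts.items()}
```

Each row of `P` is the empirical transition distribution out of one interface state, the P_ik used in the Kitajima et al. equation.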
17. Method: Rating system effectiveness (based on the Dix framework)
- End users rated the links of the Dix et al. framework on a scale from 1 to 10.
- The framework was presented at the end of the task.
- Average ratings for each link were determined and used in the overall system effectiveness score.
18. Method: Overall system effectiveness score and Markov model validation
- Overall score, used to compare alternative interface designs:
  - The average designer weight for each dimension is multiplied by the average rating by end users; the products summed across dimensions form the partial score.
  - The partial score divided by the predicted average number of clicks is the overall score; the highest ratio is considered to indicate higher overall system effectiveness.
  - System effectiveness = Σ (designer weight x user rating) / predicted average number of clicks.
- Validation: a t-test was used to determine whether the actual observed number of clicks differed significantly from the number of clicks predicted by the Markov model.
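The score computation is simple arithmetic; here is a minimal sketch with invented weights, ratings and click prediction (none of these are the study's figures):

```python
# Illustrative numbers only: designer weights (0-0.5) and average user
# ratings (1-10) for each Dix et al. link, plus a Markov-model prediction.
designer_weights = {"articulation": 0.42, "performance": 0.18,
                    "presentation": 0.15, "observation": 0.40}
user_ratings = {"articulation": 8.1, "performance": 7.5,
                "presentation": 7.0, "observation": 8.4}
predicted_clicks = 9.0  # hypothetical Markov-model prediction

# Partial score: weighted sum of user ratings across the framework links.
partial = sum(designer_weights[l] * user_ratings[l] for l in designer_weights)

# Overall score: perceived usability per predicted click.
overall = partial / predicted_clicks
```

Comparing two designs then reduces to comparing their `overall` ratios: the higher ratio indicates higher overall system effectiveness.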
19. Results: Assessment of the Markov model assumption
- The transition to the next state must depend only on the current state.
- The Durbin-Watson test was used to assess autocorrelation among user steps in the interaction; the test statistics were 1.2879 (old) and 2.0815 (new).
- A normalization procedure was applied to the original transitional probability matrices, and the Durbin-Watson test was conducted on the normalized data; the test statistics were 1.3920 (old) and 2.27 (new).
- The test revealed mixed evidence; the model was accepted and applied to predict the average number of clicks.
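For reference, the Durbin-Watson statistic used here compares successive differences to the overall magnitude of a series; values near 2 suggest little first-order autocorrelation, while values near 0 or 4 suggest strong positive or negative autocorrelation. A minimal sketch (the series below is invented, not the study's data):

```python
def durbin_watson(residuals):
    # DW = sum of squared successive differences / sum of squares.
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Invented residual series for illustration.
series = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.3, -0.3]
dw = durbin_watson(series)
```

A library implementation with the same definition exists as `statsmodels.stats.stattools.durbin_watson` for larger analyses.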
20. Results: Computation of the average number of steps
- The average number of steps it takes to get from any one state to another represents the individual u(k) in the Kitajima et al. equation.
- The matrix was created by the designers of the interface.
21. Results: Computation of the average number of clicks
- Using u(i) = 1 + Σ_k P_ik u(k), paths to the absorbing state are considered to determine the average number of clicks.
- The Markov model predicted the number of clicks for each interface: 11.5 for the old (actual 12.9) and 9 for the new (actual 9.2).
- A t-test comparing actual clicks across interfaces (t = -4.30, p = 0.0004) showed the actual number of clicks differed across interfaces: the new interface required significantly fewer.
- T-tests comparing actual to predicted click counts for all subjects (p = 0.439 for the new, p = 0.0605 for the old) showed no significant difference between actual and predicted clicks on either interface.
- A t-test comparing predicted clicks across interfaces (p = 0.0033) confirmed the new interface reduced the number of clicks.
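As a minimal sketch of the expected-clicks computation, the Kitajima et al. equation u(i) = 1 + Σ_k P_ik u(k) can be solved by fixed-point iteration (the study does not say which solver was used). The 4-state transition matrix is invented; state 3 plays the role of the absorbing "purchase complete" state, where u is defined to be 0.

```python
# Invented transition probabilities; row i gives P[i][k]. Rows sum to 1.
P = [
    [0.0, 0.8, 0.2, 0.0],
    [0.1, 0.0, 0.7, 0.2],
    [0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0, 1.0],  # absorbing state
]
absorbing = {3}

# Iterate u(i) = 1 + sum_k P[i][k] * u(k); converges because every
# transient state can reach the absorbing state.
u = [0.0] * len(P)
for _ in range(1000):
    u = [0.0 if i in absorbing
         else 1.0 + sum(P[i][k] * u[k] for k in range(len(P)))
         for i in range(len(P))]

expected_clicks_from_home = u[0]
```

`u[0]` is the model's predicted average number of clicks from the start state to task completion, the quantity compared against actual click counts above.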
22. Results: Partial system effectiveness score
- Each participant rated the interfaces on each dimension using a scale of 1 to 10; designers completed pair-wise comparisons.
- Designers were expected to rate articulation and observation higher; a t-test comparing designer ratings of articulation and observation with those of performance and presentation confirmed that articulation and observation were rated higher.
- Average designer weights were multiplied by average user ratings; a t-test comparing the partial score of the new interface against the old for all subjects (t = 5.08, p < .0001) showed the partial score for the new interface was significantly higher.
23. Results: Overall system effectiveness score
- The partial score was divided by the predicted average number of clicks to yield perceived usability per click: 0.939 for the new interface and 0.475 for the old.
- A t-test comparing the overall scores of the new and old interfaces for all subjects (t = 5.62, p < .0001) showed the overall system effectiveness score for the new interface was significantly higher.
24. Results: Reducing experimentation
- The purpose of the Markov model was to predict the number of clicks and reduce the need for additional user testing.
- Designers can estimate the average number of steps to transition among states in the new interface and multiply by the probabilities determined for the original interface (through user testing).
- The predicted number of clicks for the new interface was 9.35 (actual 9.2); a t-test comparing the actual number of clicks against the predicted number (t = 1.15, p = 0.270) showed the Markov model was accurate in predicting the average number of clicks.
- Focus groups would still be necessary to obtain user ratings, but the approach significantly reduces the time and money needed for user testing.
25. Discussion: Designer ratings
- Hypothesis: average designer weighting factors for articulation and observation will be higher than for performance and presentation.
- Designers were concerned with cognitive load, as represented by articulation and observation: if a customer cannot find what he or she is looking for, this may lead to frustration, lost customers and lost revenue.
- Designers realize that effectively reducing cognitive load is important.
26. Discussion: Improved usability
- Hypothesis: the new interface will improve perceived usability.
- Multi-level navigation was used to reduce cognitive load: it made it easier to find and view all options, and users could reach many states with one click. Users of the new interface identified it as one of the most usable features.
- More prominent buttons aided in easily identifying the next steps. In the original interface, users had a difficult time finding the customize button and often scrolled up and down the page or backtracked to determine what to do next.
- The partial system effectiveness score was higher for the new interface (8.6) than for the old (5.2).
27. Discussion: Higher system effectiveness score
- Hypothesis: the new interface will produce a higher score because of its perceived higher usability.
- The old interface degraded performance: from the features tab, some users found it difficult to identify what to do next, and once users found the product tab, some scrolled up and down trying to determine what to do next. The new interface alleviated both problems by putting all information on one page.
- Higher perceived usability and fewer clicks led to a higher ratio.
28. Discussion: Markov model accurately predicted the average number of clicks
- Hypothesis: the Markov model will accurately predict the average number of clicks, using the equation detailed by Kitajima.
- Because Markov models represent stochastic behavior, they proved valid in the present work.
- The model revealed the variability among participants but does not show the exact magnitude of the error.
29. Conclusion
- The objective was to create a new measure of usability, motivated by the scarcity of quantitative objective measures and by the fact that many subjective measures are insufficient to justify design changes.
- The research supports a subjective measure using the Dix et al. framework combined with an objective measure based on Markov models.
- The method is effective in objectively selecting among alternative designs and reducing the amount of experimentation necessary; it is easy to implement and can be used with several alternatives without the need for testing.
- It cannot be applied to interfaces where selection of the next state depends on previous states and not only on the current state.
- Future research:
  - Use Markov models to predict the next steps a user will take and make the relevant interface options more salient to improve usability.
  - Find a way to incorporate time-on-task in the overall effectiveness score, since perceived time-on-task will impact customer retention.
  - Research a method to accurately predict