ICS3211 - Intelligent Interfaces II
Combining design with technology for effective human-computer interaction
Week 9
Department of AI,
University of Malta,
20201
Testing & Evaluation
Week 9 overview:
• The What, Why and When of Evaluation & Testing
• Testing: expert review and lab testing
• Evaluation: formative/summative
• Evaluation: heuristic, cognitive walkthrough, usability testing
• Case study: evaluating different interfaces
2
Learning Outcomes
At the end of this session you should be able to:
• describe different forms of evaluation for different interfaces;
• compare and contrast the different evaluation methods with the
different contexts and identify the best one to use;
• list various rules for heuristic evaluation (Shneiderman & Nielsen);
• list the various types of usability testing involved in evaluation;
• combine the various evaluation methods to come up with a
method that is most suitable to the project chosen.
3
Introduction
• Why evaluate?
• Designers become too entranced with their own designs
• "What I like" is not evidence of usability
• Sunk cost fallacy
• Experienced designers know extensive testing is required
• How do you test?
• A web site?
• Air traffic control system?
• When do you test?
4
What to Evaluate?
• What to evaluate may range from screen functions and aesthetic design to workflows;
• Users of an ambient display may want to know if it
changes people’s behaviour;
• Class Activity: What aspects would you want to
evaluate in a VR system designed to change users’
behaviour (you can choose which behaviour you
would want to see modified). Log in to Moodle VLE.
5
Ways of Categorising Evaluation
• Evaluation stages depend on the product being designed;
• Formative evaluations - evaluations carried out during design to check that a product continues to meet users' needs;
• Summative evaluations - evaluations carried out to assess the success of the finished product.
6
Evaluation Categories
• Cognitive Psychological Approaches
• Social Psychology Methods - Interviews and
Questionnaires
• Social Science Methods
• Engineering Approaches
7
Expert Review
• Colleagues or Customers
• Ask for opinions
• Considerations:
• What is an expert? User or designer?
• Takes from half a day to a week
8
Formal Usability Inspection
• Experts hold courtroom-style meeting
• Each side gives arguments (in an adversarial
format)
• There is a judge or moderator
• Extensive and expensive
• Good for novice designers and managers
9
Expert Reviews
• Can be conducted at any time in the design process
• Focus on being comprehensive rather than being specific on
improvements
• Example review recommendations
• Changing the log-in procedure (from 3 to 5 minutes, because users were busy)
• Reordering sequence of displays, removing nonessential
actions, providing feedback.
• Also come up with features for future releases
10
Expert Review
• Placed in situation similar to user
• Take training courses
• Read documentation
• Take tutorials
• Try the interface in a realistic work environment (complete with noise and
distractions)
• Bird’s eye view
• Studying a full set of printed screens laid on the floor or pinned to the walls
• See topics such as consistency
11
Heuristic Evaluation
• Give experts a set of heuristics and ask them to evaluate the interface against it (a scoring sketch follows the Nielsen list below)
• Shneiderman's "Eight Golden Rules of Interface
Design"
• Nielsen’s Heuristics
12
Shneiderman's "Eight Golden Rules of Interface Design"
• Strive for consistency
• Enable frequent users to use shortcuts
• Offer informative feedback
• Design dialog to yield closure
• Offer simple error handling
• Permit easy reversal of actions
• Support internal locus of control
• Reduce short-term memory load
13
Nielsen’s Heuristics
• Visibility of system status
• Match between system and the real world
• User control and freedom
• Consistency and standards
• Error prevention
• Recognition rather than recall
• Flexibility and efficiency of use
• Aesthetic and minimalist design
• Help users recognize, diagnose, and recover from errors
• Help and documentation
14
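Heuristic-evaluation findings are usually consolidated across several independent evaluators before being reported to the design team. Below is a minimal sketch of that consolidation step, assuming each evaluator rates every reported problem on Nielsen's 0-4 severity scale; the example findings, data structure and function names are illustrative, not part of any standard tool.

```python
# Minimal sketch: aggregating independent heuristic-evaluation findings.
# Assumes each evaluator rates problems on Nielsen's 0 (not a problem)
# .. 4 (usability catastrophe) severity scale.
from collections import defaultdict
from statistics import mean

# (evaluator, heuristic violated, problem description, severity 0-4)
findings = [
    ("E1", "Visibility of system status", "No progress bar on upload", 3),
    ("E2", "Visibility of system status", "No progress bar on upload", 4),
    ("E1", "Error prevention", "Delete has no confirmation", 4),
    ("E3", "Consistency and standards", "Two different icons for 'save'", 2),
]

def rank_problems(findings):
    """Group duplicate reports and rank problems by mean severity."""
    by_problem = defaultdict(list)
    for _evaluator, heuristic, problem, severity in findings:
        by_problem[(heuristic, problem)].append(severity)
    ranked = [
        (mean(sev), len(sev), heuristic, problem)
        for (heuristic, problem), sev in by_problem.items()
    ]
    return sorted(ranked, reverse=True)  # worst problems first

for avg, n, heuristic, problem in rank_problems(findings):
    print(f"severity {avg:.1f} ({n} evaluators) [{heuristic}] {problem}")
```

The output is an ordered list of problems, which is the usual deliverable of a heuristic evaluation session.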
Consistency Inspection
• Verify consistency across family of interfaces
• Check terminology, fonts, color, layout, i/o formats
• Look at documentation and online help
• Also can be used in conjunction with software tools
15
Cognitive Walkthrough
• Experts “simulate” being users going through the interface
• Tasks are ordered by frequency
• Good for interfaces that can be learned by “exploratory
browsing”
• Evaluators usually walk through the interface by themselves, then report their experiences (in writing or on video) at a designers' meeting
• Useful if the application is geared towards a group the designers might not be familiar with:
16
Metaphors of Human Thinking (MOT)
• Experts consider metaphors for five aspects of
human thinking
• Habit
• Stream of thought
• Awareness and Associations
• Relation between utterances and thought
• Knowing
• Appears better than cognitive walkthrough and
heuristic evaluation
17
Types of Evaluation
• Controlled settings involving users
• usability testing
• living labs
• Natural settings involving users
• field studies
• Any settings not involving users
18
Usability Testing and Labs
• In the 1980s, testing was a luxury (but deadlines crept up)
• Usability testing provided an incentive to meet deadlines
• Fewer project overruns
• Sped up projects
• Cost savings
• Usability labs are different from academic labs
• Less general theory
• More practical studies
19
Staff
• Expertise in testing (psychology, HCI, computer science)
• 10 to 15 projects per year
• Meet with UI architect to plan testing (Figure 4.2)
• Participate in early task analysis and design reviews
• T minus 2-6 weeks: create study design and test plan
• E.g. who are the participants? Beta testers, current customers, in-company staff, recruited via advertising
• T minus 1 week: pilot test (1-3 participants)
20
Participants
• Labs categorize users based on:
• Computing background
• Experience with task
• Motivation
• Education
• Ability with the language used in the interface
• Controls for
• Physical concerns (e.g. eyesight, handedness, age)
• Experimental conditions (e.g. time of day, physical surroundings, noise,
temperature, distractions)
21
Recording Participants
• Logging is important, yet tedious
• Software to help
• Powerful to see people use your interface
• New approaches: eye tracking
• IRB (Institutional Review Board) items
• Focus users on interface
• Tell them the task, duration
22
Thinking Aloud
• Concurrent think aloud
• Invite users to think aloud
• Nothing they say is wrong
• Don’t interrupt, let the user talk
• Spontaneous, encourages positive suggestions
• Can be done in teams of participants
• Retrospective think aloud
• Ask people afterwards what they were thinking
• Issues with accuracy
• Does not interrupt users (timings are more accurate)
23
Types of Usability Testing
• Paper mockups and prototyping
• Inexpensive, rapid, very
productive
• Low fidelity is sometimes
better
24
http://expressionflow.com/wp-content/uploads/2007/05/paper-mock-up.png
http://user.meduni-graz.at/andreas.holzinger/holzinger/papers%20en/
Types of Usability Testing
• Discount usability testing
• Test early and often (with 3 to 6 testers)
• Pros: most serious problems can be found with about 6 testers (see the sketch after this slide). Good for formative evaluation (early)
• Cons: complex systems can't be tested this way. Not good for summative evaluation (late)
• Competitive usability testing
• Compare against prior versions or competitors' versions
• Beware experimenter bias; be careful not to "prime the user"
• Within-subjects designs are preferred
25
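The "about 6 testers find most serious problems" rule of thumb on the slide above is usually justified with the Nielsen-Landauer problem-discovery model, where the proportion of problems found after n testers is 1 − (1 − λ)^n and λ ≈ 0.31 is the average probability that one tester exposes a given problem. A quick illustrative calculation follows; the λ value is the commonly cited default, not a figure from this lecture.

```python
# Proportion of usability problems found after n testers, using the
# commonly cited Nielsen-Landauer model with lambda ~ 0.31 per tester.
LAMBDA = 0.31  # average probability one tester exposes a given problem

def proportion_found(n, lam=LAMBDA):
    return 1 - (1 - lam) ** n

for n in (1, 3, 5, 6, 10, 15):
    print(f"{n:2d} testers -> {proportion_found(n):.0%} of problems found")
# With lambda = 0.31, 5-6 testers already expose roughly 84-89% of problems,
# which is why discount testing favours several small, early rounds.
```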
Types of Usability Testing
• Universal usability testing
• Test with highly diverse
• Users (experience levels, ability, etc.)
• Platforms (mac, pc, linux)
• Hardware (old (how old is old?) -> latest)
• Networks (dial-up -> broadband)
• Field tests and portable labs
• Tests UI in realistic environments
• Beta tests
26
Types of Usability Testing
• Remote usability testing (via web)
• Recruited via online communities, email
• Large n
• Difficulty in logging, validating data
• Software can help
• "Can you break this?" tests
• Challenge testers to break a system
• Games, security, public displays
27
Limitations
• Focuses on first-time users
• Limited coverage of interface features
• Emergency situations (military, medical, mission-critical)
• Rarely used features
• Difficult to simulate realistic conditions
• Testing mobile devices
• Signal strength
• Batteries
• User focus
• Yet formal studies of usability testing have identified
• Cost savings
• Return on investment (Sherman 2006, Bias and Mayhew 2005)
28
Survey Instruments
• Questionnaires
• Paper or online (e.g. surveymonkey.com)
• Easy to grasp for many people
• The power of many can be shown
• 80% of the 500 users who tried the system liked Option A
• 3 out of the 4 experts liked Option B
• Success depends on
• Clear goals in advance
• Focused items
29
Designing survey questions
• Ideally
• Based on existing questions
• Reviewed by colleagues
• Pilot tested
• Direct activities are better than gathering statistics
• Fosters unexpected discoveries
• Important to pre-test questions
• Understandability
• Bias
30
Likert Scales
• Most common methodology
• Strongly Agree, Agree, Neutral, Disagree, Strongly
Disagree
• 5-, 7- or 9-point scales
• Examples
• Improves my performance in book searching and
buying
• Enables me to search and buy books faster
• Makes it easier to search for and purchase books
31
Most Used Likert Scales
• Questionnaire for User Interaction Satisfaction (QUIS)
• E.g. "How long have you worked on this system?"
• System Usability Scale (SUS) – Brooke 1996 (a scoring sketch follows this slide)
• Post-Study System Usability Questionnaire
• Computer System Usability Questionnaire
• Software Usability Measurement Inventory
• Website Analysis and MeasureMent Inventory
• Mobile Phone Usability Questionnaire
• Consider validity and reliability of the instrument
32
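Of the questionnaires listed above, the System Usability Scale (SUS, Brooke 1996) has a simple published scoring rule: ten items answered on a 1-5 agreement scale, odd items contribute (score − 1), even items contribute (5 − score), and the total is multiplied by 2.5 to give a score from 0 to 100. A minimal sketch, with invented example responses:

```python
# Minimal SUS scoring sketch (Brooke 1996): ten items answered on a
# 1-5 scale; odd items contribute (score - 1), even items (5 - score),
# and the sum is scaled by 2.5 to give a 0-100 usability score.
def sus_score(responses):
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Example: one participant's answers to items 1..10 (invented data)
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # -> 80.0
```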
Bipolar Semantically
Anchored
• Coleman and Williges (1985)
• Pleasant versus Irritating
• Hostile 1 2 3 4 5 6 7 Friendly
• If needed, take existing questionnaires and alter
them slightly for your application
33
Acceptance Tests
• Set goals for performance
• Objective
• Measurable
• Examples
• Mean time between failures (e.g. MOSI)
• Test cases
• Response time requirements (see the sketch after this slide)
• Readability (including documentation and help)
• Satisfaction
• Comprehensibility
34
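Measurable acceptance criteria such as response-time requirements translate directly into automated checks. A hedged sketch follows: the 2-second / 95th-percentile budget, the measure_search_latency helper and the stand-in workload are assumptions for illustration, not requirements stated in the lecture.

```python
# Sketch of an automated acceptance test for a response-time requirement,
# e.g. "95% of search queries complete within 2 seconds".
# The measure_search_latency() helper and workload are hypothetical.
import time
import statistics

def measure_search_latency(run_query, n_trials=50):
    """Time n_trials calls to the system under test and return latencies."""
    latencies = []
    for _ in range(n_trials):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)
    return latencies

def test_search_meets_response_time_requirement():
    latencies = measure_search_latency(lambda: time.sleep(0.05))  # stand-in system
    p95 = statistics.quantiles(latencies, n=100)[94]  # 95th percentile
    assert p95 <= 2.0, f"95th percentile latency {p95:.2f}s exceeds 2s budget"

test_search_meets_response_time_requirement()
```

Because the criterion is objective and measurable, a neutral party can run the same test as part of contractual acceptance.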
Let’s discuss
You want your project to be user friendly.
• Choose Shneiderman's or Nielsen's heuristics to provide an evaluation methodology:
• What kind of setting would you use?
• How much control would you want to exert?
• Which methods are recorded and when will they
be recorded?
35
Acceptance Tests
• By completing the acceptance tests
• Can be part of contractual fulfillment
• Demonstrate objectivity
• Different from usability tests
• More adversarial
• A neutral party should conduct them
• E.g. video game and smartphone platforms
• App Store, Microsoft, Nintendo, Sony
36
Evaluation during use
• Evaluation methods after a product has been released
• Interviews with individual users
• Get very detailed on specific concerns
• Costly and time-consuming
• Focus group discussions
• Patterns of usage
• Certain people can dominate or sway opinion
• Targeted focus groups
37
Continuous Logging
• The system itself logs usage data (see the sketch after this slide)
• Video game example
• Other examples
• Track frequency of errors (gives an ordered list of what to address via tutorials,
training, text changes, etc.)
• Speed of performance
• Track which features are used and which are not
• Web Analytics
• Privacy? What gets logged? Opt-in/out?
• What about companies?
38
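A minimal sketch of the kind of instrumentation the Continuous Logging slide describes: counting how often each feature is used and how often it fails, so the team can rank what to fix, document or tutor first. The feature name and decorator are invented for illustration, and a real deployment would also need the consent/opt-in handling raised above.

```python
# Minimal continuous-logging sketch: count feature usage and errors so the
# team can rank what to address first. Feature names are invented.
import functools
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO)
usage = Counter()
errors = Counter()

def logged_feature(name):
    """Decorator that records every invocation and every failure of a feature."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            usage[name] += 1
            try:
                return fn(*args, **kwargs)
            except Exception:
                errors[name] += 1
                logging.exception("feature %s failed", name)
                raise
        return inner
    return wrap

@logged_feature("export_pdf")
def export_pdf(doc):
    if doc is None:
        raise ValueError("nothing to export")
    return f"{doc}.pdf"

export_pdf("report")
try:
    export_pdf(None)
except ValueError:
    pass
print("usage:", dict(usage), "errors:", dict(errors))
```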
Online and Telephone Help
• Users enjoy having people ready to help (real-time
chat online or via telephone)
• E.g. Netflix has 8.4 million customers, how many
telephone customer service reps?
• 375
• Expensive, but higher customer satisfaction
• Cheaper alternatives use bug-report systems
39
Automated Evaluation
• Software for evaluation
• Low level: Spelling, term concordance
• Metrics: number of displays, tabs, widgets, links (see the sketch after this slide)
• World Wide Web Consortium (W3C) Markup Validation Service
• US NIST Web Metrics Testbed
• New research areas: Evaluation of mobile platforms
40
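The low-level metrics mentioned above (number of links, widgets and so on) can be gathered with very little code. Below is a sketch using only the Python standard library, assuming the interface under evaluation is available as a saved HTML page; it is a simple structural count in the spirit of the W3C validator and NIST Web Metrics tools, not a replacement for them.

```python
# Sketch of low-level automated evaluation: count links, form widgets and
# images in a saved HTML page. Purely structural metrics.
from html.parser import HTMLParser
from collections import Counter

class WidgetCounter(HTMLParser):
    TRACKED = {"a", "input", "button", "select", "textarea", "img", "table"}

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.counts[tag] += 1

# Invented example page; in practice, read the saved HTML of the interface.
page = "<html><body><a href='#'>home</a><input type='text'><img src='x.png'></body></html>"
counter = WidgetCounter()
counter.feed(page)
print(counter.counts)  # e.g. Counter({'a': 1, 'input': 1, 'img': 1})
```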
Case Study
• Computer Game:
• Physiological responses used to evaluate users’
experiences;
• Video of participants playing - observation;
• User satisfaction questionnaire;
• Possibilities of applying crowdsourcing for online
performance evaluations
41
