Chapter 4
Performance Metrics


          Presenter: 00335011 魏傳諺
Agenda


• Preface
• Task Success
• Time-on-Task
• Errors
• Efficiency
• Learnability
Preface of Performance Metrics

•   Based on specific user behaviors
     – Observed user behaviors, not opinions
     – The use of scenarios or tasks
•   How well users are actually using a product
•   Useful to estimate the magnitude of a specific usability issue
     – How many people are likely to encounter the same issue after the product is
       released?
     – How many users are able to successfully complete a core set of tasks using
       a product?
•   Not the magical elixir for every situation
     – Requires an adequate sample size
     – Takes time and money to collect
     – Tells the “what” very effectively, but not the “why”
Five Basic Types

Task Success   • The most widely used performance metric
               • How effectively users are able to complete a given set of tasks

Time-on-Task   • How much time is required to complete a task

Errors         • Reflect the mistakes made during a task

Efficiency     • The amount of effort a user expends to complete a task

Learnability   • How performance changes over time
TASK SUCCESS
Task Success

• The most common usability metric
• As long as the user has a well-defined task, you can measure
  success
Collecting Any Type of Success Metric

• Each task must have a clear end-state
    – Define the success criteria → data collection
        • Find the current price for a share of Google stock (clear end-state)
        • Research ways to save for your retirement (not a clear end-state)

• Way to collect success data
    – Verbally articulate the answer after completing the task
    – Provide their answers in a more structured way
        • Try to avoid write-in answers if possible

• In some cases the correct solution to a task may not be verifiable
    – It depends on the user's specific situation
    – testing is not being performed in person
Binary Success

•   Either participants complete a task successfully or they don't
•   How to Collect and Measure
     – Score each task as 1 (success) or 0 (failure)
•   How to Analyze and Present
     – By individual task
     – By user or type of user
         • Frequency of use
         • Previous experience using the product
         • Domain expertise
         • Age group
         • Can calculate the percentage of tasks that each user successfully completed
              – Binary data → continuous data

•   Calculating Confidence Intervals (see the sketch below)
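Since a binary success rate is a simple proportion, a confidence interval can be computed directly. Below is a minimal Python sketch using the adjusted Wald interval, which is often recommended for the small samples typical of usability tests; the data and the helper name adjusted_wald_ci are hypothetical.

    import math

    def adjusted_wald_ci(successes, trials, z=1.96):
        """Adjusted Wald confidence interval for a binary success rate."""
        # Adjust the counts before computing the proportion.
        n_adj = trials + z**2
        p_adj = (successes + z**2 / 2) / n_adj
        margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
        return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

    # Example: 8 of 10 participants completed the task.
    low, high = adjusted_wald_ci(8, 10)
    print(f"Observed 80%; 95% CI roughly {low:.0%} to {high:.0%}")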
Levels of Success

• Partially completing a task?
   – coming close to fully completing a task may provide value to the
     participant
   – Helpful for you to know
       • Why some participants failed to complete a task
       • With which particular tasks they needed help
Levels of Success (cont’d)

• How to Collect and Measure
   – Must define the various levels
   – Based on the extent or degree to which a participant completed the task
       • Complete Success, Partial Success, and Failure
       • What constitutes “giving assistance” to the participant
       • Assign a numeric value for each level
       • Does not differentiate between different types of failure
   – Based on the experience in completing a task
       • No Problem, Minor Problem, Major Problem, and Failure/Gave up
       • Ordinal data → an average score is not meaningful; report frequencies
   – Based on the participant accomplishing the task in different ways
       • Depends on the quality of the answer (a numeric score is not needed)
Levels of Success (cont’d)

• How to Analyze and Present
   – Create a stacked bar chart showing the percentage at each level
   – Report a “usability score” (see the sketch below)
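A stacked bar chart is just the per-task percentage of participants at each level. A minimal sketch of tallying those percentages, with hypothetical level labels and data:

    from collections import Counter

    # Hypothetical results: one outcome per participant, per task.
    results = {
        "Task 1": ["complete", "complete", "partial", "failure", "complete"],
        "Task 2": ["partial", "failure", "failure", "complete", "partial"],
    }
    LEVELS = ["complete", "partial", "failure"]

    for task, outcomes in results.items():
        counts = Counter(outcomes)
        n = len(outcomes)
        # Percentages per level -- the segments of one stacked bar.
        print(task, {lvl: f"{counts.get(lvl, 0) / n:.0%}" for lvl in LEVELS})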
Issues in Measuring Success

• How to define whether a task was successful?
   – When unexpected situations arise
       • Make note of them
       • Afterward try to reach a consensus

• How or when to end a task
   – Stopping rule
       • Complete task / Reach the point at which they would give up or seek
         assistance
       • “Three strikes and you're out”
       • Set a time limit
   – If the participant is becoming particularly frustrated or agitated
TIME-ON-TASK
Time-on-Task

• A way to measure the efficiency of any product
    – The faster a participant can complete a task, the better the experience
• Exceptions to the assumption that faster is better
    – Games (engagement may be the goal)
    – Learning applications (time spent may be the point)
Importance of Measuring Time-on-Task

• Particularly important for products
    – where tasks are performed repeatedly by the user
• The side benefits of measuring time-on-task
    – Increased efficiency → cost savings → actual ROI
How to Collect and Measure Time-on-Task

•   The time elapsed between the start of a task and the end of a task
     – In minutes
     – In seconds
•   Measure by any time-keeping device
     – Start time & End time
     – Two people record the times
•   Automated Tools for Measuring Time-on-Task
     – less error-prone
     – Much less obtrusive
•   Turning on and off the Clock (see the timing sketch below)
     – Rules about how to measure time
          • Start the clock as soon as the participant finishes reading the task
          • End the timing when the participant hits the “answer” button
          • Stop timing when the participant has stopped interacting with the product
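When no automated logging tool is available, a simple script can stand in for a stopwatch. A minimal sketch, assuming the moderator presses Enter at the start and end of each task; the helper name time_task is hypothetical:

    import time

    def time_task(task_name):
        """Record elapsed seconds between a start and an end keypress."""
        input(f"[{task_name}] Press Enter when the participant starts...")
        start = time.monotonic()
        input(f"[{task_name}] Press Enter when the participant finishes...")
        return time.monotonic() - start

    elapsed = time_task("Find the current price of Google stock")
    print(f"Time-on-task: {elapsed:.1f} s")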
How to Collect and Measure Time-on-Task (cont’d)

• Tabulating Time Data
Analyzing and Presenting Time-on-Task Data

•   Ways to present
     – Mean
     – Median
     – Geometric mean
•   Ranges
     – Time interval
•   Thresholds
     – Whether users can complete certain tasks within an acceptable amount of
       time
•   Distributions and Outliers (see the sketch below)
     – Exclude outliers (e.g., times more than 3 SD above the mean)
     – Set up thresholds
     – Determine the fastest possible time, to flag implausibly short times
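A minimal sketch of the analysis above, with hypothetical times in seconds; it drops outliers beyond 3 SD above the mean and reports the median and geometric mean, which are less sensitive to skew than the arithmetic mean:

    import statistics

    # Hypothetical time-on-task values (seconds) for one task.
    times = [34, 42, 29, 38, 33, 46, 31, 39, 36, 44, 30, 600]

    mean = statistics.mean(times)
    sd = statistics.stdev(times)

    # Exclude outliers more than 3 SD above the mean.
    kept = [t for t in times if t <= mean + 3 * sd]

    print(f"kept {len(kept)}/{len(times)} times")
    print(f"median = {statistics.median(kept):.1f} s")
    print(f"geometric mean = {statistics.geometric_mean(kept):.1f} s")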
Issues to Consider When Using Time Data

•   Only Successful Tasks or All Tasks?
     – Advantage of only including successful tasks
          • A cleaner measure of efficiency
     – Advantage of including all tasks
          • A more accurate reflection of the overall user experience
          • An independent measure in relation to the task success data
     – If the moderator always decided when to end a task → include all times
     – If participants sometimes decided when to give up → include only successful tasks
•   Using a Think-Aloud Protocol?
     – Thinking aloud yields important insight, but it inflates time-on-task data
     – Alternative: a retrospective probing technique (ask questions after the task)
•   Should You Tell the Participants about the Time Measurement?
     – If so, ask them to perform the tasks as quickly and accurately as possible
ERRORS
Errors

• Usability issue vs. Error
    – A usability issue is the underlying cause of a problem
    – One or more errors are a possible outcome
• Errors
    – incorrect actions that may lead to task failure
When to Measure Errors

• When you want to understand the specific action or set of actions
  that may result in task failure
• Errors can tell
    – How many mistakes were made
    – Where they were made within the product
    – How various designs produce different frequencies and types of errors
    – How usable something really is
• Three general situations where measuring errors might be useful
    – When an error will result in a significant loss in efficiency
    – When an error will result in significant costs
    – When an error will result in task failure
What Constitutes an Error?

• No widely accepted definition of what constitutes an error
• Based on many different types of incorrect actions by the user
    – Entering incorrect data into a form field
    – Making the wrong choice in a menu or drop-down list
    – Taking an incorrect sequence of actions
    – Failing to take a key action
• Determine what constitutes an error
    – Make a list of all the possible actions
    – Define many of the different types of errors that can be made
Collecting and Measuring Errors

• Not always easy
   – Need to know what the correct (set of) action(s) should be
• Consideration
   – Only a single error opportunity
   – Multiple error opportunities
• Way of organizing error data
   – Record the number of errors for each task and each user
    – Each count ranges from 0 to the number of error opportunities
Analyzing and Presenting Errors

•   Tasks with a Single Error Opportunity
     – Look at the frequency of the error for each task
          • Frequency of errors
          • Percentage of participants who made an error for each task
     – From an aggregate perspective
          • Average the error rates for each task into a single error rate
          • Take an average of all the tasks that had a certain number of errors
          • Establish maximum acceptable error rates for each task
•   Tasks with Multiple Error Opportunities (see the sketch below)
     – Look at the frequency of errors for each task → an error rate
     – The average number of errors made by each participant for each task
     – Which tasks fall above or below a threshold
     – Weight each type of error with a different value and then calculate an “error score”
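A minimal sketch of the multiple-opportunity case, with hypothetical counts; the error rate divides total errors observed by total error opportunities:

    # Hypothetical data: errors per participant for one task
    # that has 5 identified error opportunities.
    errors = [0, 2, 1, 0, 3, 1]
    opportunities = 5

    avg_errors = sum(errors) / len(errors)
    error_rate = sum(errors) / (opportunities * len(errors))
    print(f"avg errors/participant = {avg_errors:.2f}")
    print(f"error rate = {error_rate:.1%}")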
Issues to Consider When Using Error Metrics

• Make sure you are not double-counting errors
• Need to know
    – An error rate, and
    – Why different errors are occurring
• When an error is the same as failing to complete a task
    – Report it simply as task failure (avoid reporting it twice)
EFFICIENCY
Efficiency

• Time-on-task is one common measure of efficiency
• Another is to look at the amount of effort required to complete a task
   – In most products, the goal is to minimize the amount of effort
   – two types of effort
       • Cognitive
           – Finding the right place to perform an action
           – Deciding what action is necessary
           – Interpreting the results of the action
       • Physical
           – The physical activity required to take action
Collecting and Measuring Efficiency

• Identify the action(s) to be measured
• Define the start and end of an action
• Count the actions
• Actions must be meaningful
    – Incremental increase in cognitive effort
    – Incremental increase in physical effort
• Look only at successful tasks (see the sketch below)
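A minimal sketch of counting meaningful actions on successful tasks only; the data are hypothetical, and comparing against a minimal-path action count is one optional way to normalize, not the only one:

    # Hypothetical data: (succeeded?, number of meaningful actions)
    # for one task; assume the task can be done in as few as 6 actions.
    attempts = [(True, 9), (True, 6), (False, 14), (True, 11), (True, 7)]
    min_actions = 6

    successful = [n for ok, n in attempts if ok]
    mean_actions = sum(successful) / len(successful)
    # Optional: efficiency relative to the minimal path.
    relative = min_actions / mean_actions
    print(f"mean actions on success = {mean_actions:.1f}")
    print(f"relative efficiency vs. minimal path = {relative:.0%}")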
Analyzing and Presenting Efficiency Data
Efficiency as a Combination of Task Success and Time


• Task Success + Time-on-Task
• Core measure of efficiency (see the sketch below)
   – The ratio of the task completion rate to the mean time per task
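A minimal sketch of that ratio, with hypothetical per-participant results; the result reads as "percent successful per minute":

    # Hypothetical results for one task: (completed?, time in minutes)
    results = [(True, 2.4), (True, 3.1), (False, 5.0), (True, 2.8), (True, 3.5)]

    completion_rate = sum(ok for ok, _ in results) / len(results)
    mean_time = sum(t for _, t in results) / len(results)

    # Ratio of completion rate to mean time per task.
    efficiency = (completion_rate * 100) / mean_time
    print(f"{efficiency:.1f}% successful per minute")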
LEARNABILITY
Learnability

•   Most products, especially new ones, require some amount of learning
•   Experience
     – Based on the amount of time spent using a product
     – Based on the variety of tasks performed
•   Learning
     – Sometimes quick and painless
     – At other times quite arduous and time consuming
•   Learnability
     – The extent to which something can be learned
     – How much time and effort are required to become proficient
     – When learning happens over a short period → aim to maximize efficiency
     – When learning happens over a longer period → memory plays a much greater role
Collecting and Measuring Learnability Data

• Basically the same as they are for the other performance metrics
• Collect the data at multiple times
    – Based on expected frequency of use
• Decide which metrics to use → then decide how much time to allow
  between trials
• Alternatives
    – Trials within the same session
    – Trials within the same session but with breaks between tasks
    – Trials between sessions
Analyzing and Presenting Learnability Data

• By examining a specific performance metric
• Interpret the chart
    – Notice the slope of the line(s)
    – Notice the point of asymptote, or essentially where the line starts to
      flatten out
    – Look at the difference between the highest and lowest values on the y-axis
• Compare learnability across different conditions (see the sketch below)
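A minimal sketch of summarizing a learning curve numerically, with hypothetical per-trial mean times for two design conditions; the slope and range correspond to the chart features noted above:

    # Hypothetical mean time-on-task (seconds) across three trials.
    curves = {"Design A": [95, 60, 52], "Design B": [80, 72, 70]}

    for name, curve in curves.items():
        slope = (curve[-1] - curve[0]) / (len(curve) - 1)  # avg change per trial
        span = max(curve) - min(curve)                     # highest minus lowest
        print(f"{name}: slope {slope:+.1f} s/trial, range {span} s")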
Issues to Consider When Measuring Learnability


• What Is a Trial?
   – When learning is continuous, without breaks in time:
       • Memory is much less a factor in this situation
       • Learning is more about developing and modifying strategies to complete a set
         of tasks
       • Take measurements at specified time intervals

• Number of Trials
   – There must be at least two
   – In most cases there should be at least three or four
   – You should err on the side of more trials than you think you might need
     to reach stable performance.
Thanks for listening!
