The document discusses usability testing and how to properly conduct it. It notes that many games fail due to a lack of usability testing during development. Usability testing helps ensure interfaces are usable by comparing metrics like completion rates and task times to benchmarks. The document provides formulas and examples for analyzing completion rates and task times from small sample usability tests to determine if a task meets a benchmark with a certain level of statistical confidence. It emphasizes that usability testing, not just bug fixing, is needed to create usable products.
2. WHY ?
"Given the limitations of data and the lack of a theoretical foundation in game design, most games have been developed based solely on the designer's own experience and intuition. As a result, about 80% of games fail on the market every year."
(Game Software Industry Report in AlienBrain product catalog. NxN Software, 2001)
3. WHY ? (2)
"However, it is necessary to point out that, too often, video game interfaces are an afterthought. The reason is that too many project managers assume the most important part of a software development project is the programming, and that the interface can come later. As a result, insufficient time is assigned to interface design, which may lead to a poor-quality interface." (Fox 2005)
4. MORE INFORMATION ...
"Human Computer Interaction in Game Design"
- Nguyen Hung -
http://www.theseus.fi/bitstream/handle/10024/43234/Nguyen_Hung.pdf?sequence=1
5. MORE INFORMATION ... (2)
"Quantifying The User Experience"
- Jeff Sauro / James R. Lewis -
10. HOW DO WE DO IT ?
• Compare it to a specific benchmark or goal.
• Use statistical methods to get more precise answers.
• Get statistically significant evidence from small samples.
11. HOW DO WE SET A
BENCHMARK ?
• Based on historical data obtained from previous tests that included the task.
• Based on findings reported in published
scientific or marketing research.
• Negotiate criteria with the stakeholders who
are responsible for the product.
12. HOW DO WE SET A
BENCHMARK ? (2)
Some suggestions :
• The best objective basis is data from previous usability studies of predecessor or competitive products.
• The source of historical data should be studies of similar types of participants, completing the same tasks, under the same conditions.
• Negotiate with other stakeholders for the final set of
shared goals.
13. HOW DO WE SET A
BENCHMARK ? (3)
Some other suggestions :
• Establish some specific objectives
immediately, so you can measure
improvements.
• Revise your product in the early stages.
• Do not change reasonable goals to accommodate an unusable product.
17. Use the exact probabilities from the binomial distribution:

p(x) = (n! / (x! (n − x)!)) · p^x · (1 − p)^(n − x)

where:
x = the number of users who successfully completed the task
n = sample size
p = the hypothesized (benchmark) completion rate
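This formula can be evaluated directly in a few lines; a minimal sketch in Python (the function name is ours):

```python
from math import comb

def binom_pmf(x, n, p):
    """Exact probability of exactly x successes in n trials
    when the true success rate is p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# With n = 9 attempts and a hypothesized 70% completion rate:
print(round(binom_pmf(8, 9, 0.70), 4))  # 0.1556
```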
19. EXAMPLE 1
Eight of nine users successfully
completed a task.
Is there sufficient evidence to conclude
that at least 70% of all users would
be able to complete the same task ?
21. CONCLUSION
0.1556 + 0.04035 = 0.1960
The probability of 8 or 9 successes out of nine attempts, if the true completion rate were 70%, is 0.1960.
So we can be (1 − 0.1960) × 100 = 80.4% confident that the completion rate exceeds 70%.
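The arithmetic above can be checked by summing the binomial probabilities of the observed and more extreme outcomes; a small sketch (function name ours):

```python
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of 8 or 9 successes out of 9 if the true rate is 70%
p_tail = binom_pmf(8, 9, 0.70) + binom_pmf(9, 9, 0.70)
confidence = 1 - p_tail
print(round(p_tail, 4), round(confidence, 3))  # 0.196 0.804
```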
22. MID - PROBABILITY
0.5 × (0.1556) + 0.04035 = 0.1182
The mid-probability of 8 or 9 successes out of nine attempts is 0.1182: half the probability of the observed result plus the full probability of the more extreme one.
So we can be (1 − 0.1182) × 100 = 88.2% confident that the completion rate exceeds 70%.
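The mid-probability adjustment is a one-line change on top of the same exact formula (function name ours):

```python
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Half the probability of the observed result (x = 8)
# plus the full probability of the more extreme result (x = 9)
mid_p = 0.5 * binom_pmf(8, 9, 0.70) + binom_pmf(9, 9, 0.70)
print(round(mid_p, 4), round(1 - mid_p, 3))  # 0.1182 0.882
```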
24. IMPORTANT NOTES
• Not suitable for production, but sufficient to show that effort is better spent on improving other functions.
• The probability we computed is called an "exact" probability, not because it is exactly correct, but because the probabilities are calculated exactly rather than approximated.
• These results tend to be conservative.
25. LARGE SAMPLE TEST
• success / fail
• "large" sample size = at least 15 failures
and 15 successes.
28. EXAMPLE 2
85 out of 100 users were able to
successfully locate a specific product
and add it to their shopping cart.
Is there enough evidence to conclude
that at least 75% of all users can
complete this task successfully ?
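The slides between the large-sample criterion and this example's conclusion are not included here; the standard large-sample method is a one-proportion z-test against the benchmark, sketched below under that assumption:

```python
from math import sqrt, erfc

x, n, p0 = 85, 100, 0.75            # successes, sample size, benchmark rate
p_hat = x / n
# Normal approximation: reasonable here because there are at least
# 15 successes (85) and 15 failures (15) in the sample.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 0.5 * erfc(z / sqrt(2))   # one-sided tail probability
print(round(z, 2), round(p_value, 3))
```

With z ≈ 2.31 the one-sided p-value is about 0.010, i.e. roughly 99% confidence that at least 75% of users can complete the task.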
32. HERE'S THE FORMULA

t = (ln(x) − mean_ln) / (s_ln / √n)

where:
x = the benchmark task time
mean_ln = mean of the log values
s_ln = standard deviation of the log values
n = sample size
33. EXAMPLE 3
11 users completed a task in a financial
application.
Task times : 90, 59, 54, 55, 171, 86, 107,
53, 79, 72, 157
Is there enough evidence that the average
task time is less than 100 seconds?
34. ANSWER
• Task times =
90, 59, 54, 55, 171, 86, 107, 53, 79, 72, 157
• Log-transformed times =
4.5, 4.08, 3.99, 4.01, 5.14, 4.45, 4.67, 3.97, 4.37, 4.28, 5.06
• Mean of log times = 4.41
• Geometric mean of task times = EXP(4.41) = 82.3 seconds
• Standard deviation of log times = 0.411
• Log of benchmark (100 s) = 4.61
35. ANSWER (2)
Find the t-statistic:

t = (4.61 − 4.41) / (0.411 / √11) = 0.19 / 0.124 = 1.53

Use the probability on 10 degrees of freedom (n − 1):
TDIST(1.53, 10, 1) = 0.0785
36. CONCLUSION
The probability of seeing an average time of 82.3 seconds, if the actual population time is greater than 100 seconds, is around 7.85%.
OR
We can be 92.15% confident that users can complete this task in less than 100 seconds.
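The whole calculation from slides 33 to 36 can be reproduced in a few lines. Note that carrying full precision in the logs gives t ≈ 1.57 rather than 1.53; the slides' value comes from rounding the intermediates (4.61, 4.41, 0.411) before dividing:

```python
from math import log, exp, sqrt

times = [90, 59, 54, 55, 171, 86, 107, 53, 79, 72, 157]
logs = [log(v) for v in times]
n = len(logs)

mean_log = sum(logs) / n
# Sample standard deviation (n - 1 in the denominator)
sd_log = sqrt(sum((v - mean_log) ** 2 for v in logs) / (n - 1))

geo_mean = exp(mean_log)                        # best middle-time estimate
t = (log(100) - mean_log) / (sd_log / sqrt(n))  # benchmark = 100 seconds
print(round(geo_mean, 1), round(sd_log, 3), round(t, 2))
```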
37. IMPORTANT NOTES
• What is the geometric mean?
The best estimate of the middle task time for small-sample usability data (fewer than 25 users).
• What about large-sample usability data?
Use the sample median method (not explained here).