Figure 1. Methodological steps followed for the usability testing review.
MATERIAL AND METHODS
Usability and ergonomics are often misunderstood and
considered synonymous when designing and evaluating
products. Ergonomics (EN ISO 6385-2004) represents
the scientific field dealing with the general optimization of
a system (e.g. interactions between human and non-
human elements of a system and/or principles, data and
methods to design systems aimed at human well-being),
while usability (and usability engineering) focuses on
safety issues with reference to users, tasks and context of use.
As shown in Figure 1, in order to provide technical
guidance on usability testing, a systematic review of
literature, international regulations and experts' opinions
(users' and manufacturers') was carried out by
addressing the following key questions:
Have any measurable usability indicators been
defined? If so, which ones?
Is any mathematical formula available to
calculate each specific usability index and the general
usability index?
What setting/location would be appropriate for
carrying out usability tests? Why?
INTERNATIONAL STANDARDS ON USABILITY IN HEALTHCARE
The historic development of regulations about usability is
shown in Figure 2.
The standard upon which the concept of usability was
built is ISO 9241-11, in which usability and the
procedures to measure it were defined for the first time.
The American technical regulations applied the concept
of usability to healthcare in 2001. ANSI/AAMI HE 74
represents one of the main foundations of the EN 62366
European Standard (Italian CEI EN 62366), which was
published in October 2010 and deals with usability testing of
medical devices, and of the three editions of the EN
60601-1-6 European Standard, which were
acknowledged by three editions of the Italian Standard
CEI EN 60601-1-6 in 2006, 2008 and 2011 respectively.
Finally, the need to communicate the results of usability
tests in an effective and standardized way led to the
development of the ISO/IEC 25062 International
Standard, which states how a Usability Report is to be
organized and which features it has to have.
The International Standard on “Ergonomic requirements
for office work with visual display terminals (VDTs) –
Guidance on Usability” defines usability as “the extent to
which a product can be used by specified users to
achieve specified goals with effectiveness, efficiency and
satisfaction in a specified context of use.”
Indeed, usability is based on three basic elements:
Effectiveness: level of accuracy and completeness in
carrying out the functions the device is meant to perform;
Efficiency: effectiveness in relation to the resources used.
Time efficiency is related to the time needed to carry out
the functions of the device. Other types of efficiency exist,
such as economic efficiency (in relation to costs) and
human efficiency (in relation to human resources).
Figure 2. Historic development of national and international regulations about the concept of Usability.
Figure 3. Usability Framework according to ISO 9241-11:1998 - Guidance on usability.
User Satisfaction: synergy of information obtained from
the user through behavioural analyses, interviews
and questionnaires administered before, during and after
a usability test.
As reported in the framework shown in Figure 3, a
usability test makes sense only if it is set in a specific
context of use, which consists of users, tasks, equipment
(hardware, software and materials) and of the physical
and social environment in which the test is carried out. All
of the above-mentioned elements can influence the
usability of a product in a work system (ISO 9241-11:1998).
Here follow the four main application fields of usability testing:
Comparative Analysis of different products or of
different versions of the same product;
Support in designing a product;
Diagnostic Evaluation applied to identify specific
elements responsible for usability problems of a device;
Planning specific and appropriate training about
how to use a device.
The ISO 9241-11:1998 Standard also defines the most
appropriate tools to measure the elements to be
evaluated through a usability test and states the need to
develop and apply clear and measurable quantitative and
semi-quantitative indexes. Moreover, this standard
underlines the importance of having a simulation
laboratory available, where one can arrange different
scenarios, simulate several contexts of use and have
better control over the variables present while using a
device (Daniels et al., 2007).
CEI EN 62366:2008
Application of usability engineering to medical devices
The IEC 62366:2007 standard, acknowledged by Italian
regulations in the CEI EN 62366:2008 standard, identifies
poor usability of medical devices as one of the major
causes of use errors, because it is closely linked to poor
ease of use and learnability. Moreover, the application
field of medical devices proves to be highly critical
because of the increased technical complexity of devices
and of their availability to users who lack any healthcare
training, such as patients themselves.
With respect to the ISO 9241-11:1998 Standard, two major
innovations have been introduced, first by the American
Standard ANSI/AAMI HE 74:2001 and later by the
European Standard. The first innovation consists in
the introduction of the concept of learnability within the
definition of usability. The second one concerns the
definition of “primary operating functions”, which are the
functions to be taken into consideration when one carries
out a usability test. The definition of primary operating
functions includes both the functions which are frequently
used and those which are critical in relation to safety.
The concept of usability is considered of primary
importance in relation to safety, because of the link
between usability testing and the process of risk analysis
applied to medical devices, as described in the EN
ISO 14971 Standard, which bears the title “Application of
risk management to medical devices”. The connections
concern the identification of risk elements in using the
device according to the manufacturer’s intended use, the
identification of risks and the implementation and
validation of procedures and actions aimed at reducing them.
ANSI/AAMI HE 74:2001
Human factors design process for medical devices
Attachment D of the EN 62366 Standard clearly draws on the
ANSI/AAMI HE 74:2001 American Standard, which
defines the fields of application of usability testing in the
steps involved in designing a device:
Conceptual design: definition of aims and of the user’s needs;
Definition of technical requirements and technical specifications;
Complete and detailed implementation of the technical specifications;
Test on the prototype.
CEI EN 60601-1-6:2011
Medical electrical equipment – Part 1: General
requirements for basic safety and essential
performance – Collateral standard: Usability
The EN 60601-1-6:2010-04 European standard,
acknowledged by Italian regulations in the CEI EN
60601-1-6:2011-05 standard, replaces the second edition
of IEC 60601-1-6 and aligns with the Usability
Engineering Process described in the IEC 62366
standard. This is due to the fact that, as healthcare
evolves, less skilled operators, including patients
themselves, are now using medical electrical equipment,
while the medical electrical equipment itself is becoming
more complicated (IEC 60601-1-6).
The above-mentioned standard provides some important
innovations with respect to previous editions concerning
the following themes:
The so-called “primary operating functions” consider
the correct and safe performance of the device by including
the frequently used functions and those functions related
to the safety of a medical device;
In order to carry out an exhaustive usability test to assess
the basic safety and essential performance of a device,
the EN 60601-1-6 standard suggests taking into
consideration the user’s reasonably foreseeable misuse,
besides the correct use intended by the manufacturer.
Manufacturers are not, however, required to consider every
form of incorrect use. Moreover, the usability engineering
process concerns risk identification, but not risk mitigation,
associated with abnormal use.
The term “patient” shall include animals as well.
Accompanying documents (instructions for use and
technical description) have to be included in the usability
engineering process as part of the operator-equipment
interface.
Worst case and frequent use scenarios shall be included
in the usability engineering process.
The standard underlines once again the basic concept
that usability is to be assessed within the context of use
of a device (Figure 4) taking into consideration the type of
user, the specific device and its intended use.
Finally, the standard contains a list of the main
applications of usability tests:
Support to product design;
Support to product prototyping;
Planning of workload;
Comparative analysis between different devices
or different versions of the same device.
ISO/IEC 25062:2006
Software engineering – Software product Quality
Requirements and Evaluation (SQuaRE) – Common
Industry Format (CIF) for usability test reports
The significance of this standard lies in the definition of
one standard format to present in a clear and effective
way the results of a usability test carried out on a device
and/or on a method to evaluate the usability of a device.
The way such results are communicated is of primary
importance, because significant decisions and choices
can depend on this. Both the IEC 62366:2007 and IEC
60601-1-6:2010 standards refer to the ISO/IEC 25062
standard to report on the measures obtained in a usability
test and include them in a usability engineering file. Here
follow the elements which are to be included in a usability report:
Participants in the test must represent the real user
population for which the device is intended;
Primary functions are to be defined;
Measures related to effectiveness, efficiency and user
satisfaction are to be defined and included in the report.
On the basis of the instructions contained in the set of
standards analysed above, the proposed new metrics
developed to measure the characteristics of usability are
described below: effectiveness, efficiency, user
satisfaction and learnability.
The effectiveness index of a device is the result of the
combination of three components: the completeness of
the tasks carried out with success by the participants, the
number of errors made and the number of assists needed
by the participants to carry out the tasks. The
effectiveness index can be calculated both in relation to
the overall performance of the device and to a specific
primary function. In detail, the effectiveness index per
task is composed of the following indicators:
The Completeness Index per Specific Task-i (ICSi)
results from the number of participants who have
completed the specific task-i with success divided by the
number of participants, while the General Completeness
Index (ICS) is the level of completeness which
characterizes a device in all of the tasks tested by the
participants and it is defined as the sum of the
completeness indexes ICSi divided by the number of
tasks, as shown in equation (1).
[ICS=(∑ICSi)/number of tasks] (1)
The Error Index per specific task-i (IEGi) measures the
percentage of mistakes which occur while participants
use the device in each specific task and it is defined as
the total number of errors per task-i divided by the
number of participants, while the General Error Index
(IEG) provides information about mistakes with relation to
the general use of the device and it is defined as the sum
of the error indexes per task-i divided by the number of
tasks, as shown in equation (2).
[IEG = (∑IEGi)/number of tasks] (2)
The Assist Index per specific task-i (IAGi) results from the
total number of assists per task-i divided by the number
of participants, while the General Assist Index (IAG) is
the sum of the assist indexes per task-i divided by the
number of tasks, as shown in equation (3).
[IAG = (∑IAGi)/number of tasks] (3)
Figure 4. Usability engineering process.
Figure 5. Check-list used for data collection by the evaluators during a usability test.
The General Effectiveness Index per task-i (IGEi) is the
synthetic index which represents the general
effectiveness of the device for a specific task and it is
defined as the linear combination of ICSi, IEGi and IAGi,
as shown in equation (4), while the General Effectiveness
Index (IGE) results from the linear combination of ICS,
IEG and IAG, as shown in equation (5).
[IGEi = (3*ICSi - 2*IEGi - IAGi)/6] (4)
[IGE = (3*ICS - 2*IEG - IAG)/6] (5)
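Equations (1) to (5) can be illustrated with a short sketch. The tallies below (number of participants, per-task completions, errors and assists) are hypothetical values invented for the example, not data from an actual test:

```python
# Hypothetical tallies for a usability test with 10 participants.
# Each task entry: (participants who completed it, total errors, total assists).
participants = 10
tasks = [(9, 3, 1), (7, 6, 4), (10, 1, 0)]

ICS_i = [done / participants for done, _, _ in tasks]       # completeness per task-i
IEG_i = [errors / participants for _, errors, _ in tasks]   # error index per task-i
IAG_i = [assists / participants for _, _, assists in tasks] # assist index per task-i

n = len(tasks)
ICS = sum(ICS_i) / n  # equation (1)
IEG = sum(IEG_i) / n  # equation (2)
IAG = sum(IAG_i) / n  # equation (3)

# Equation (4): per-task effectiveness; equation (5): general effectiveness.
IGE_i = [(3 * c - 2 * e - a) / 6 for c, e, a in zip(ICS_i, IEG_i, IAG_i)]
IGE = (3 * ICS - 2 * IEG - IAG) / 6

print(round(ICS, 3), round(IEG, 3), round(IAG, 3), round(IGE, 3))
# 0.867 0.333 0.167 0.294
```

Note that IGE weights completeness positively and errors and assists negatively, so a device completed by everyone with no errors or assists scores 0.5, the maximum under this linear combination.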
Finally, figure 5 shows the specific check-list used to
collect data during the test. Thanks to this check-list, it is
possible to know if each task was completed, how many
mistakes were made and how many assists were given to
the participants. Such information is available for each task.
The Efficiency Index can be evaluated as the result of the
combination between the effectiveness and the
completion times achieved by the participants in the test.
The efficiency index can be calculated both in relation to
the overall performance of the device and for a specific
primary function or a single task.
In detail, the efficiency index per task is composed of the
following indicators:
The Time Efficiency Index per specific task-i (IETi) results
from the number of participants who obtained a
completion time higher than the value expected by
experts, divided by the number of participants who
completed the task. The Completion Time can be
calculated per specific task (TCi) and results from the
sum of the completion times per participant divided by the
number of participants, while the General Completion
Time (TC) of a device is the sum of the TCi divided by the
number of tasks, as shown in equation (6).
[TC= (∑TCi)/number of tasks] (6)
It is important to underline that in the case of a usability
test carried out on one device only, involving one
homogeneous group composed of several participants,
the Expected Completion Time per specific task-i is
provided by field experts. In the case of an analysis
carried out on two devices, or on two versions of the same
device, or involving two groups of participants, the
significance range is calculated through a Student's t-test.
Finally, in the case of a usability test carried out on one
device with more than two groups of participants involved,
or on more than two devices or versions of a device with
one group of participants involved, the significance range
is calculated through an ANOVA test. The check-list
used for time collection is shown in figure 5, in the
areas “starting time” and “ending time.”
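Equation (6) can likewise be sketched in a few lines. The completion times below are hypothetical values invented for the example; the comparison against expert-expected times and the choice between a Student's t-test and ANOVA, discussed above, are left out for brevity:

```python
from statistics import mean

# Hypothetical completion times (seconds), one value per participant.
times_per_task = {
    "task-1": [42, 55, 48, 60],
    "task-2": [95, 110, 102, 99],
}

# TCi: mean completion time for each specific task.
TC_i = {task: mean(times) for task, times in times_per_task.items()}

# TC: general completion time of the device, equation (6).
TC = sum(TC_i.values()) / len(TC_i)

print(TC_i)  # {'task-1': 51.25, 'task-2': 101.5}
print(TC)    # 76.375
```

Because TC averages per-task means rather than pooling all raw times, each task contributes equally to the general completion time regardless of how many participants completed it.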
User Satisfaction and Learnability
User satisfaction and learnability can be assessed by
administering self-report tools, such as interviews and
questionnaires, during various phases of a usability test.
Literature about the usability of websites and information
systems includes several validated questionnaires which
aim at evaluating subjective features of the operator’s
use experience. The purpose of some questionnaires is
to gather information about the participants’ general
opinion about the usability of a system, while other
questionnaires aim at evaluating user satisfaction and
learnability. Here follow some of the issues surveyed by
the above-mentioned tools: user’s previous experience
(Shneiderman, 1987), ease of use of the device (Lund,
2001), learnability (Kirakowski et al. 1992, Shneiderman
1987 and Lund 2001), perceived efficiency (Kirakowski et
al. 1992), perceived control on the device (Kirakowski et
al. 1992), use satisfaction (Lewis 1991 and Lund 2001),
emotional reactions (Kirakowski et al. 1992, Isomursu et
al. 2007) and user’s expectations (Albert and Dixon 2003,
Thayer and Dugan 2009). Unlike what happens in the
field of usability of websites and information systems, no
validated questionnaires are known which aim at
assessing subjective features of users’ experience with
medical devices. A few authors (Chiu et al. 2004,
Follmann et al. 2010, Garmer et al. 2002, Hersch et al.
2009) used ad hoc questionnaires, which were
administered at the end of usability tests with the purpose
of assessing both the participants’ subjective perception
about the usability of the electromedical devices tested
and, more specifically, the users’ satisfaction and
learnability of such devices. In order to assess the user’s
general satisfaction with electromedical information
devices (electronic medical record systems), other authors
(Jaspers et al. 2008, Sittig et al. 1999) used the
Questionnaire for User Interaction Satisfaction (QUIS)
(Chin et al. 1988), which was originally intended to be
applied to information systems.
For what concerns the timing and phases to administer
the questionnaires, there are three possibilities:
Pre-test phase: participants are administered
questionnaires before the usability test begins. Thayer
and Dugan (2009) defined the questionnaires
administered in this phase as ”pre-context
questionnaires”, while Rubin and Chisnell (2008) refer to
this kind of tool as “background questionnaires”. Such
questionnaires aim at gathering
information about the participants’ past experience, which
is helpful to better understand their behaviour and
performance during the test. This kind of questionnaire
is composed of a series of items which survey the
subjects’ experiences, attitudes and preferences in the
fields which could affect their performance. The above
mentioned authors state that these questionnaires are
useful to check whether the subjects recruited are
appropriate for the test.
Test phase: participants are administered questionnaires
before and/or after each task. Dumas and Redish (1999)
suggested administering short interviews or
questionnaires in this phase and having participants
express their answers using Likert scales. Lewis (1991)
put together a questionnaire
(After Scenario Questionnaire – ASQ) composed of
three questions to be asked at the end of each scenario.
Participants are required to answer using a 1-5 Likert
scale, with a range which varies from strongly disagree to
strongly agree, which aims at assessing the participants’
self-referred satisfaction about the tasks they carried out.
Sauro and Dumas (2009) stated that
carrying out an evaluation at the end of each scenario
has the advantage of providing more diagnostic
information about usability and more valid measures.
Albert and Dixon (2003) applied a
different procedure, which implies asking two questions,
that is, one before and one just after each task is carried
out, in order to assess both the participants’ expectations
and experience as for the ease of use of the device. By
comparing such data, useful information can be gathered
and, if needed, actions can be planned to improve or
correct certain features of the device.
Figure 6. Conceptual connection of usability to safety and risk management for medical devices (EN 62366, EN 14971).
Finally, information about subjective experience can be
gathered by using the thinking-aloud technique, which
consists in asking participants to express their thoughts in
words while they are carrying out a task. Such thoughts
can reveal whether the interaction between the
participants and the device is positive or negative and
can help observers to identify possible causes of errors.
Although this procedure is complex, it has the advantage
of better representing the user’s experience because,
unlike questionnaires, it is less likely to be subject to
participants’ falsifications and distortions.
Post-test phase: participants are administered
questionnaires at the end of the usability test.
Questionnaires administered in this phase, whether
validated or ad hoc built, enable observers to gather both
data about the participants’ evaluations of the device on
the whole and, if needed, about their opinion about
specific features of the device.
The procedures described so far to evaluate user
satisfaction and learnability study subjective experience
as referred by the participants themselves, starting from a
series of standard questions. As far as learnability is
concerned, more objective procedures can be applied.
For instance, Karahoca et al. (2010) asked participants to
carry out tasks again twelve hours after performing the
first test. This way, researchers can analyze the learning
curve and gather information about learnability over time.
DISCUSSION AND CONCLUSIONS
In conclusion, by analysing international regulations, it
is clear that the concept of usability has acquired
more and more importance over time.
Originally, the issue of usability only concerned the field
of software and computer science, but it later spread to
the field of electro-medical technology and of medical devices.
This paper provides a guideline to users, professionals in
usability analysis and manufacturers to carry out usability
tests based on indicators and methods which objectively
measure usability using a scientific approach both on the
level of tasks and on the level of the general device.
Efficiency and effectiveness can be assessed
quantitatively, while user satisfaction and learnability can
be estimated in a semi-quantitative way.
The new metrics can be applied to any usability
evaluation setting by taking into consideration real
experimental data deriving from usability tests and/or
estimated performance data gathered from experts' opinions.
Regarding the context of use, usability is essential to
identify hazards and characteristics related to safety
(Figure 6) which are difficult to detect at a manufacturing
level by applying a heuristic approach only.
Although a real medical ward represents the ideal context
of use to evaluate medical devices, the high cost and
organizational complexity, caused by the interruption of
regular activity and/or the high number of different areas
where the device is regularly used, makes the use of a
testing laboratory the best solution to combine real
environmental aspects with a controlled and multi-
configuration area (e.g. offering the possibility to carry out
simulations of worst case scenarios). Moreover, a
laboratory based approach makes it possible to take
more exact measurements (ISO 9241-11:1998).
Albert W, Dixon E (2003). Is this what you expected? The use
of expectation measures in usability testing. Proceedings of
the Usability Professionals Association 2003 Conference.
American Standard ANSI/AAMI HE74:2001.Human factors
design process for medical devices.
Chiu CC, Vicente KJ, Buffo-Sequeira I, Hamilton RM, McCrindle
B W (2004). Usability assessment of pacemaker
programmers. PACE, 27: 1388-1398.
Chin JP, Diehl VA, Norman KL (1988). Development of an
instrument measuring user satisfaction of the human-
computer interface. Proceedings of SIGCHI '88. New
York: ACM/SIGCHI, 213-218.
Dumas J, Redish J (1999). A Practical Guide to Usability
Testing. Chicago, IL.
European Norm EN 62366:2008-01. Medical devices -
Application of usability engineering to medical devices.
European Norm EN 60601-1-6:2010-04. Medical Electrical
Equipment - Part 1: General Requirements For Basic Safety
And Essential Performance - Collateral Standard: Usability.
European Norm EN 14971. Medical devices. Application of risk
management to medical devices.
European Norm EN ISO 6385-2004. Ergonomic principles in the
design of work systems.
Follmann A, Korff A, Furtjes T, Lauer W, Kunze SC, Schmieder
K, Radermacher K (2010). Evaluation of a synergistically
controlled semiautomatic trepanation system for
neurosurgery. Conference Proceedings IEEE Engineering in
Medicine and Biology Society, 2010: 2304-2307.
Garmer K, Liljegren E, Osvalder AL, Dahlman S (2002).
Application of usability testing to the development of medical
equipment. Usability testing of a frequently used infusion
pump and a new user interface for an infusion pump
developed with a Human Factors approach. International
Journal of Industrial Ergonomics, 29: 145-159.
Hersch M, Einav S, Izbicki G (2009). Accuracy and ease of use
of a novel electronic urine output monitoring device compared
with standard manual urinometer in the intensive care unit.
Journal of Critical Care, 24: 629.e13-629.e17
International Standard ISO 9241-11:1998. Ergonomic
requirements for office work with visual display terminals
(VDTs) - Part 11: Guidance on usability.
International Standard ISO/IEC 25062:2006. Software
engineering — Software product Quality Requirements and
Evaluation (Square) — Common Industry Format (CIF) for
usability test reports.
Isomursu M, Tähti M, Väinämö S, Kuutti K (2007). Experimental
evaluation of five methods for collecting emotions in field
setting with mobile applications. International Journal of
Human-Computer Studies, 65: 404-418.
Jaspers MWM, Peute LWP, Lauteslager A, Piet JM, Bakker
PJM (2008). Pre-post evaluation of physicians’ satisfaction
with a redesigned electronic medical record system. Studies
in health technology and informatics, 136: 303-308.
Daniels J (2007). A Framework for Evaluating Usability. Journal
of Clinical Monitoring and Computing, 21: 323-330.
Karahoca A, Bayraktar E, Karahoca D (2010). Information
system design for a hospital emergency department: a
usability analysis of software prototypes. Journal of
Biomedical Informatics, 43: 224-232.
Kirakowski J, Porteous M, Corbett M (1992). How to use the
software usability measurement inventory: the users' view of
software quality. In: Proceedings European Conference on
Software Quality, Madrid.
Lewis JR (1991). Psychometric evaluation of an after-scenario
questionnaire for computer usability studies: the ASQ.
SIGCHI Bulletin, 23(1): 78-81.
Lund A (2001). Measuring usability with the USE questionnaire.
Usability and User Experience Newsletter of the STC Usability SIG.
Rubin J, Chisnell D (2008). Handbook of usability testing. How
to plan, design, and conduct effective tests. Wiley Publishing.
Sauro J, Dumas JS (2009). Comparison of Three One-
Question, Post-Task Usability Questionnaires.
Shneiderman B (1987). Designing the User Interface: Strategies
for Effective Human-Computer Interaction. Addison-Wesley
Publishing Co., Reading, MA.
Sittig DF, Kuperman GJ, Fiskio J (1999). Evaluating physician
satisfaction regarding user interactions with an electronic
medical record system. Proceedings of the AMIA Annual Symposium.
Thayer A, Dugan TE (2009). Achieving Design Enlightenment:
Defining a New User Experience Measurement Framework.
IEEE International Professional Communication Conference.