This document describes a summative usability study of an electronic health record (EHR) system. It notes the challenges of testing EHR usability at scale, given the variety of users and tasks. The case study outlines how athenahealth scoped its test to common clinician tasks and representative users, developed a standardized training approach, used de-identified but realistic test data, and applied best practices such as multiple pilot tests and moderator preparation.
1. Conducting a Summative Study of EHR Usability: Case Study from athenahealth
Kris Engdahl
May 7, 2012
3. How do we define EHR usability?
What is an electronic health record (EHR)?
• Electronic version of paper charting, plus capacity for electronic data
exchange; managed by healthcare professionals
• Used by single-physician practices, large healthcare networks, and everything
in between
• Different from a personal health record, which is managed by the patient
What is usability? (Choose your definition – here is the NIST definition)
• ISO 9241-11: “The extent to which a product can be used by specified users to
achieve specified goals with effectiveness, efficiency and satisfaction in a
specified context of use.”
4. Why is EHR Usability such a hot topic?
Healthcare is a large industry
• $2.5 trillion, 14.3 million jobs, 17.3% of GDP
It affects all of us
• We are all once and future patients
• We pay for everyone’s healthcare, either through insurance or taxes
Recent healthcare reform legislation encourages the adoption of EHRs
• Health Information Technology for Economic and Clinical Health (HITECH) Act
includes incentives for “meaningful use” of EHRs
• Office of the National Coordinator (ONC) is paying attention to the field of
usability as it evaluates EHRs for certification
Poor EHR usability can put patients at risk
• Increasing attention is being paid to patient safety with regard to EHR usability
Healthcare providers are demanding better EHR usability
5. Challenges in EHR Usability Testing
Who do we test with?
• Who are “representative” users?
• How do we get access to them?
What tasks do we test?
• Different kinds of users have widely varying tasks
What product “version” do we test?
• Most EHRs are highly customized for individual practices
• How useful is a test of a “generic” version?
What data do we use for testing?
• Participants are always distracted by unrealistic data – so it has to be realistic
• Real data is protected by HIPAA
It is challenging, BUT…
7. Meeting the challenge
Know why you’re testing
• Why do a summative test?
• What will you do with the data?
Manage the scope
• How much time do you have?
• What kinds of resources do you have?
• How much can you reasonably do?
Prepare thoroughly
• Consider all the pieces
• Plan for logistics and timing
• Prepare the testing team
Do it!
9. Scope: You could do this forever
Testing an EHR could mean testing everything all these people do
And then there are the other dimensions that make more user groups
• Age, specialty, tech-savviness, domain experience, etc.
http://www.bls.gov/oes/current/oes290000.htm
10. Knowing why helped us scope our test
Why we conducted a summative study
• Planning was underway for 2012, with UX changes in it
• We wanted to be able to quantify the improvements we were embarking on
• This would be the “before” measurement
This determined the tasks and users we selected
• We focused on areas that we intended to re-examine in upcoming releases
We had completed a heuristic review and a patient safety review
We had identified areas for UX work in 2012
• We focused on clinicians’ tasks in an ambulatory setting
• We focused on tasks that are common to a number of different specialties
• We presented the list, with time limits, to key stakeholders for prioritization
11. Here’s a rough view of an internist’s work in an office visit
Source: http://www.usabilityprofessionals.org/upa_publications/jus/2009february/JUS_smelcer_Feb2009.pdf
12. Scoping the recruit
Challenge: Recruit “representative” users
• Clinicians vary by specialty, practice size, age, experience, gender, EHR
knowledge
• Just how many do you need to recruit, anyway?
How we narrowed the list
• We screened for clinicians – MD, DO, NP, PA
• We asked for a mix of specialty, gender, years of experience, and practice size
• We asked for people who used EHRs but had not used ours
• We screened for people who commonly did tasks we were testing
Recruiting itself
• Hired a professional recruiting firm
• We paid participants through the recruiter
• We recruited 22 hoping to get 20 participants
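The screening rules above can be expressed as a simple filter. This is an illustrative sketch only: the field names, task labels, and the exact combination logic are assumptions, not athenahealth’s actual screener.

```python
# Hypothetical screener filter. Field names and the TESTED_TASKS list are
# illustrative; the criteria mirror the ones described in the slide above.

TARGET_CREDENTIALS = {"MD", "DO", "NP", "PA"}
TESTED_TASKS = {"order_prescription", "document_exam", "review_results"}  # assumed

def passes_screener(candidate: dict) -> bool:
    """Return True if a candidate meets the basic screening criteria:
    a clinician credential, EHR experience, no experience with our EHR,
    and routine performance of at least one task we plan to test."""
    return (
        candidate.get("credential") in TARGET_CREDENTIALS
        and candidate.get("uses_ehr", False)
        and not candidate.get("uses_our_ehr", False)
        and bool(TESTED_TASKS & set(candidate.get("common_tasks", [])))
    )

candidates = [
    {"credential": "MD", "uses_ehr": True, "uses_our_ehr": False,
     "common_tasks": ["order_prescription"]},
    {"credential": "RN", "uses_ehr": True, "uses_our_ehr": False,
     "common_tasks": ["document_exam"]},  # screened out: not MD/DO/NP/PA
]
qualified = [c for c in candidates if passes_screener(c)]
print(len(qualified))  # → 1
```

A real screener would also capture the quota variables (specialty, gender, years of experience, practice size) so the recruiter can balance the mix.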
13. Determining the environment to test
Challenge: Most EHRs are highly customized by clients
• Installed EHRs have lots of custom programming
• Our cloud-based EHR is highly configurable
• And it changes every month
Challenge: Data has to be realistic, but not real
• Participants will be distracted by incomplete or incorrect data
• It is wrong (and illegal) to use actual medical data
What we did
• We modeled the environment we tested with on an actual client environment
• We chose a practice that had very little configuration (as “vanilla” as possible)
• We scrambled the data from the practice so all records were deidentified
• We arranged to be able to copy our set-up test environment for each
participant
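One common way to “scramble” records, sketched below, is to swap names across charts and shift every date in a record by a random per-patient offset, so the data stays clinically plausible while the link to real people is broken. The talk does not describe athenahealth’s actual pipeline; the field names and technique here are assumptions for illustration.

```python
import random
from datetime import date, timedelta

# Illustrative de-identification sketch only; not the actual scrambling
# process used in the study. Field names are assumptions.

def scramble_records(records, seed=42):
    """Swap names across records and shift all dates in each record by one
    random offset, preserving the intervals between a patient's events."""
    rng = random.Random(seed)
    names = [r["name"] for r in records]
    rng.shuffle(names)  # break the link between a name and its chart
    out = []
    for rec, fake_name in zip(records, names):
        offset = timedelta(days=rng.randint(-364, 364))
        out.append({
            "name": fake_name,
            "dob": rec["dob"] + offset,              # shifted, still plausible
            "visits": [d + offset for d in rec["visits"]],
            "notes": rec["notes"],                   # clinical content kept realistic
        })
    return out

records = [
    {"name": "Pat A", "dob": date(1970, 1, 1),
     "visits": [date(2011, 3, 1), date(2011, 9, 1)], "notes": "f/u hypertension"},
    {"name": "Pat B", "dob": date(1985, 6, 15),
     "visits": [date(2011, 5, 2)], "notes": "annual physical"},
]
scrambled = scramble_records(records)
```

For production use you would go further (synthetic names rather than shuffled real ones, and removal of all other HIPAA identifiers), but the interval-preserving date shift is what keeps the chart readable to a clinician during a test.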
15. Managing participant training
Challenge: EHRs are not walk-up-and-use applications
• Most implementations include 2-5 days of training
Challenge: We had 5 moderators
• With varying expertise with the application
• Any 5 people will say things differently
What we did
• We worked with Training professionals to develop a training script
• ~8 minutes long
• Covered key concepts / areas of the interface
• Walked through the tasks that we tested
• Each moderator followed the script, so each participant had the same training
• We printed out the screen shots from the training walkthrough as “Help”
16. Challenge: What do you measure?
Note: This data is for illustration only
18. Start with a reasonable scope
Know how your data will be used
• What are you comparing?
• Who will use the data, and how?
Prioritize tasks and user groups
• Most common tasks
• Critical tasks
• Tasks that carry patient safety risks
• “Disparity-oriented use cases” (NISTIR 7769)
• NIST has some user scenarios in NISTIR 7804
Determine a reasonable sample size
• How much money do you have?
• How much time do you have?
• (see “who will use the data, and how” above)
19. Determine how realistic you can get
Balance customization with comparability
• Least configuration / customization?
• “Typical” configuration / customization?
Be sure to get realistic (but not real) data
• Creating realistic data from scratch will be time- and knowledge-intensive
• Patients’ real data is covered by HIPAA
• Would be nice if NIST had importable patient charts for their scenarios
Decide how to handle training
• How much training do users normally get with what you’re testing?
• How much time can you get with participants?
• Can you develop customized training for the tasks you will test?
20. Plan for a specialized recruit
Prepare the screener(s)
• NISTIR 7804 has a sample screener
• You may need more questions, depending on tasks and users
Allow time for recruiting
• The more specific your screener, the more time you need
• Book your professional recruiter in advance
Don’t skimp
• On recruiting
• On incentives
Be flexible to accommodate participants
• Medical people are wicked busy
• Be prepared to test early in the morning and in the evening
21. Prepare your moderators
Get any equipment you need ahead of time
• Technical problems tend to happen when you least expect them
• Plan to have backup plans for everything
Schedule time to train moderators
• Product: Effective paths, ineffective paths (and ways back), accelerators
• Test script: Starting points, time limits, what to look for
Plan for multiple pilot sessions
• Ideally at least one for each moderator, with everyone watching
• Discussions ahead of time about success and errors
• Identify likely errors
• Get consensus on definitions of success
• Discussions on how to handle possible “situations”
22. Use good test hygiene
Schedule sessions reasonably
• Time between sessions, for rest and reset
• Do not overwork moderators
• Make sure moderators eat and sleep
Normalize observations and analysis
• Encourage multiple observers
• Document decisions about success and errors
• Do a consistency pass of all recordings
• Double-check data storage, analysis, and statistics
24. References
General info on healthcare industry:
• http://www.bls.gov/oco/cg/cgs035.htm
• http://www.bnet.com/blog/healthcare-business/health-spending-hits-173-percent-of-gdp-in-largest-annual-jump/1117
Forecasts on health care spending
• http://facts.kff.org/chart.aspx?ch=944
• http://gmj.gallup.com/content/111778/other-700-billion-question.aspx
What’s an EHR?
• http://www.ama-assn.org/ama/pub/physician-resources/health-information-technology/health-it-basics/emrs-ehrs.page
Articles on EHR Usability
• http://www.usabilityprofessionals.org/upa_publications/jus/2009february/JUS_smelcer_Feb2009.pdf
Health Insurance Portability and Accountability Act of 1996 (HIPAA) privacy:
• http://www.hhs.gov/ocr/privacy/
25. References
Government documents on EHR usability
• NIST site on the usability of Healthcare IT: http://www.nist.gov/healthcare/usability/
NISTIR 7769 – Human Factors Guidance to Prevent Healthcare Disparities with the Adoption of EHRs
NISTIR 7741 - NIST Guide to the Processes Approach for Improving the Usability of Electronic Health
Records
NISTIR 7742 – Customized CIF Format Template for Electronic Health Record Testing
NISTIR 7743 – Usability in Health IT: Technical Strategy, Research, and Implementation
NISTIR 7804 – Technical Evaluation, Testing, and Validation of the Usability of Electronic Health
Records
• AHRQ articles:
http://healthit.ahrq.gov/portal/server.pt/gateway/PTARGS_0_1248_907505_0_0_18/09(10)-0091-2-EF.pdf
http://healthit.ahrq.gov/portal/server.pt/gateway/PTARGS_0_1248_907504_0_0_18/09(10)-0091-1-EF.pdf
Information on EHR usability and patient safety
• http://www.bmj.com/content/330/7491/581.full
• http://www.useit.com/alertbox/20050411.html
• http://www.jointcommission.org/sentinel_event_alert_issue_42_safely_implementing_health_information_and_converging_technologies/
Editor's Notes
Welcome, thank you, introductions
What is athenahealth? athenahealth is a Watertown-based company delivering cloud-based solutions for practice management, electronic health records, patient communication, and care coordination. User Experience was introduced to athenahealth in 2008: backed by the CEO, embedded in R&D processes, tackling really interesting UX problems. It has grown to a team of 22, providing user experience design, user research, patient safety, and product copy. The company and the team are still growing (come talk to us!)
Here’s where we left off from last year’s conference. In the next slides, we’ll talk about how athenahealth’s usability team approached these challenges.
It is possible to conduct usability studies on EHRs. We routinely conduct formative studies, along with field studies and other user research activities, as part of our user-centered design process. In addition, in December 2011, we conducted a baseline summative study. So, while it is challenging to conduct a summative usability study, it is possible. We’ve done it.
What does it take to meet the challenges?
(This really should be a much bigger bag, or a much smaller box)
Imagine all the tasks that all of these health care workers do, and imagine testing them. Then imagine testing all those tasks with multiple user groups, based on age, specialty, tech-savviness, etc.
The number of tasks that are supported by EHRs is huge – choosing which ones to cover is a big challenge. The first thing to ask is how the data will be used – why are you doing this in the first place? A summative test is really for comparative measurement: comparing two versions of the same thing, or two similar things. We conducted our study as a baseline measurement against which we could measure future versions of our EHR. This allowed us to focus on tasks that mattered to us – tasks that our users do, tasks we intended to reexamine during 2012. Most of our clients are in an ambulatory setting, so we tested tasks that often happen in a doctor’s office. We support a variety of specialties, so we picked tasks common to many specialties. As with any test, we proposed some tasks, assigned maximum times for each task, and then presented the list to stakeholders for prioritization.
(I recreated the slide, removing circles that were part of the original author’s point, but might have implied that these were the tasks we tested.) Everyone who’s ever had a visit to the doctor’s office, raise your hand. You have some idea of what the doctor does. This slide, from the Journal of Usability Studies, has a rough task analysis of an office visit for an IM doctor – in case you aren’t familiar with what a doctor does.
How many? Basically, “you need an estimate of the variance of the dependent measure(s) of interest (typically obtained from previous, similar studies or pilot data) and an idea of how precise the measurement must be (which is a function of the magnitude of the desired minimum critical difference and statistical confidence level); once you have that, the rest is arithmetic. There are numerous sources for information on standard sample-size estimation [6, 23]. For this reason, I’m not going to describe them in any additional detail here (but for a detailed discussion of this type of sample size estimation in the context of usability testing, see Lewis [14]).” (From James Lewis, “Sample Sizes for Usability Tests, Mostly Math, Not Magic,” Interactions, November–December 2006.)
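The arithmetic Lewis refers to can be sketched in a few lines: given a standard deviation estimate from a pilot and a desired margin of error on the mean, compute the sample size from the confidence-interval half-width. The numbers below are made up for illustration, and the sketch uses the normal approximation rather than the t distribution (a t correction would add a few participants at small n).

```python
from math import ceil
from statistics import NormalDist

def sample_size(sd, margin, confidence=0.95):
    """Participants needed so the confidence-interval half-width on a mean
    (e.g. mean task time) is at most `margin`. Normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return ceil((z * sd / margin) ** 2)

# Illustrative: pilot SD of 60 s on task time, mean wanted to within +/- 30 s
n = sample_size(sd=60, margin=30)
print(n)  # → 16
```

Note how halving the margin quadruples the sample size, which is why the “how much money and time do you have?” questions above effectively set the precision you can afford.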
AND THEN… because we are cloud-based, the environment changes every month – including our testing environment. Our EHR is released every month, and because it’s cloud-based, everyone gets a new version once a month. This is a summative test, so all participants need the same environment and data: same fake patient, same fake problem. We needed to replicate the environment, so we arranged to replicate our scrambled, set-up environment. We borrowed some servers from another group, and our Development resource arranged for eight copies that were refreshed every night, plus a way for us to refresh a copy during the day if need be. Each participant in a day used a different instance of the environment. Environment setup and the release schedule set boundaries on testing dates: we scheduled test sessions for the last two weeks before the next release.
This is not our real data.
Task success
• Definition: Reached goal within the time limit without committing a critical error
• Some tasks had partial success
• Counted number and percentage
Time on task
• Definition: From when they started the task to when they declared themselves done
• Averaged over successful participants only
Error-free rate
• Error definitions were determined during analysis; critical errors were defined as errors that carried patient safety risk
• Number of participants who completed the task without error
• Some had partial success but no errors – guess how
Post-task ease-of-use ratings, and SUS at the end of all tasks
• Average ratings on tasks; SUS
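The measures defined in that note can be computed mechanically from per-session records. The sketch below uses made-up data and assumed field names; it mirrors the definitions (success within the time limit with no critical error, time on task averaged over successful participants only, error-free rate) plus standard SUS scoring, not athenahealth’s actual analysis code.

```python
# Made-up session records; field names are assumptions for illustration.
sessions = [  # one record per participant for a single task
    {"completed": True,  "time_s": 95,  "limit_s": 180, "critical_error": False, "errors": 0},
    {"completed": True,  "time_s": 200, "limit_s": 180, "critical_error": False, "errors": 1},
    {"completed": True,  "time_s": 120, "limit_s": 180, "critical_error": True,  "errors": 2},
    {"completed": False, "time_s": 180, "limit_s": 180, "critical_error": False, "errors": 1},
]

# Success = reached the goal within the time limit with no critical error.
successes = [s for s in sessions
             if s["completed"] and s["time_s"] <= s["limit_s"]
             and not s["critical_error"]]
success_rate = len(successes) / len(sessions)

# Time on task is averaged over successful participants only.
mean_time = sum(s["time_s"] for s in successes) / len(successes)

error_free_rate = sum(1 for s in sessions if s["errors"] == 0) / len(sessions)

def sus_score(responses):
    """Standard SUS scoring: 10 items rated 1-5; odd items contribute (r - 1),
    even items contribute (5 - r); the sum is scaled by 2.5 to 0-100."""
    odd = sum(r - 1 for r in responses[0::2])
    even = sum(5 - r for r in responses[1::2])
    return (odd + even) * 2.5

print(success_rate, mean_time, error_free_rate,
      sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))
```

Partial-success rules (mentioned in the note) would add a third outcome category between success and failure; how to score it is one of the “definitions of success” moderators should agree on before the pilots.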
Markers for the next travelers… / Lessons learned
Johan says that cleansed patient data is available from the ONC – but I’m not finding it easily. When we asked him, he sent this link: http://www.va.gov/BLUEBUTTON/docs/VA_My_HealtheVet_Blue_Button_Sample_Version_12_All_Data.txt
Your turn. Write and tell us how it goes. Use the EHR CIF format.