Presented by Jeremy Wyatt DM FRCP, ACMI Fellow, Leadership Chair in eHealth Research, University of Leeds; Clinical Adviser on New Technologies, Royal College of Physicians, London
At: What Works in Digital Health? University of Glasgow, 23-24 July 2015.
http://www.sicsa.ac.uk/events/sicsa-ux-mhealth-works-digital-health/
How to evaluate and improve the quality of mHealth behaviour change tools
1. Yorkshire Centre for Health Informatics
How to evaluate & improve the quality of mHealth behaviour change tools?
Jeremy Wyatt DM FRCP
ACMI Fellow
Leadership Chair in eHealth Research, University of Leeds; Clinical Adviser on New Technologies, Royal College of Physicians, London
j.c.wyatt@leeds.ac.uk
2. Agenda
• Why mHealth BC tools?
• Is there a quality problem?
• What does “quality” mean:
– In general?
– For mHealth BC tools?
• What methods might improve quality?
• How to evaluate the quality of these tools?
• Some example studies
• Conclusions
3. Why mHealth for behaviour change?
1. Face-to-face contacts do not scale
2. Smartphone hardware is used by > 75% of adults:
• Cheap, convenient, fashionable
• Inbuilt sensors / wearables allow easy measurements
• Multiple channels: SMS, MMS, voice, video, apps…
3. mHealth software enables:
• Unobtrusive alerts to record data or take action
• Incorporation of BCTs (e.g. BCTs are present in 96% of adherence apps, with a median of 2 BCTs per app)
• Tailoring, which makes BC more effective (corrected d = 0.16 in a systematic review of 21 studies of BC websites; Lustria, J Health Comm 2013)
4. Why digital channels?
[Bar chart: mean public sector cost per completed encounter across 120 councils – face-to-face £8.60, letter £5.00, telephone £2.83, digital £0.15]
Source: Cabinet Office Digital Efficiency Report, 2013
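A back-of-envelope sketch (mine, not from the slides) of the channel-shift arithmetic behind the chart, using the figures above:

```python
# Mean public sector cost per completed encounter, in pounds
# (Cabinet Office Digital Efficiency Report, 2013).
COST_PER_ENCOUNTER = {
    "face_to_face": 8.60,
    "letter": 5.00,
    "telephone": 2.83,
    "digital": 0.15,
}

def saving(encounters: int, from_channel: str, to_channel: str = "digital") -> float:
    """Pounds saved by shifting a number of encounters from one channel to another."""
    return encounters * (COST_PER_ENCOUNTER[from_channel] - COST_PER_ENCOUNTER[to_channel])

# e.g. shifting one million face-to-face encounters to digital:
print(f"£{saving(1_000_000, 'face_to_face'):,.0f}")  # £8,450,000
```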
5. A broken market?
[Scatter plot: price (US$, y axis) of 47 smoking cessation apps versus evidence score (x axis; a high score means the app adheres to US Preventive Services Task Force guidelines). Fitted line: y = 3.7 − 0.1x, R² = 0.016]
(data from Abroms et al 2013)
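For readers who want to reproduce a fit like this, a minimal sketch of how the regression line and R² are computed; the six data pairs are illustrative placeholders, not the 47 real values from Abroms et al:

```python
import numpy as np
from scipy import stats

# Illustrative placeholders: the real study has 47 (evidence score, price) pairs.
evidence_scores = np.array([2, 5, 8, 12, 20, 30])         # x: adherence to USPSTF guidelines
prices = np.array([0.99, 4.99, 0.00, 2.99, 1.99, 0.99])    # y: app price in US$

result = stats.linregress(evidence_scores, prices)
print(f"y = {result.intercept:.1f} {result.slope:+.1f}x, R² = {result.rvalue**2:.3f}")
# On the real data this gives y = 3.7 - 0.1x, R² = 0.016: price tells you
# almost nothing about evidence quality - hence "a broken market".
```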
6. Privacy and mHealth apps
• Permissions requested: use accounts, modify USB, read phone ID, find files, full net access, view connections…
• Our study of 80 apps found an average of 4 clear privacy breaches per health app, but only 1 per medical app
• We know that because we read the Terms & Conditions! (this one was only 1,200 words, but many are much longer…)
[Image: First Folio, As You Like It; public domain photo by CowardlyLion, Folio Society edition of 1996]
With Hannah Panayiotou & Anam Noel, Leeds medical students
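A sketch of the kind of permission audit such a study implies; the permission names are real Android permissions, but the “sensitive” list and the counting rule are my illustrative assumptions, not the study’s actual coding scheme:

```python
# Android permissions we might treat as privacy-sensitive (illustrative list).
SENSITIVE_PERMISSIONS = {
    "android.permission.GET_ACCOUNTS",            # use accounts on the device
    "android.permission.READ_PHONE_STATE",        # read phone ID
    "android.permission.READ_EXTERNAL_STORAGE",   # find files
    "android.permission.INTERNET",                # full net access
    "android.permission.ACCESS_NETWORK_STATE",    # view connections
}

def count_privacy_flags(requested: set) -> int:
    """Number of requested permissions that look privacy-sensitive."""
    return len(requested & SENSITIVE_PERMISSIONS)

# Example: a hypothetical health app's manifest permissions.
app_permissions = {
    "android.permission.INTERNET",
    "android.permission.READ_PHONE_STATE",
    "android.permission.GET_ACCOUNTS",
    "android.permission.CAMERA",
}
print(count_privacy_flags(app_permissions))  # 3
```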
7. What is “Quality”?
“The totality of features and characteristics of a
product or service that bear on its ability to satisfy
stated or implied needs“
(ISO 9000:2005 Quality management systems --
Fundamentals and vocabulary)
i.e. quality is about fitness for purpose – a good quality product complies with client requirements.
[That 30-page ISO document costs £96.50!]
8. Possible processes to improve quality of mHealth tools

| Method | Advantages | Disadvantages | Examples |
| --- | --- | --- | --- |
| Wisdom of the crowd | Simple user ranking | Hard for users to assess quality; click-factory bias | Current app stores, MyHealthApps |
| Users apply quality criteria | Explicit | Requires widespread dissemination; can everyone apply them? | RCP checklist |
| Classic peer-reviewed article | Rigorous (?) | Slow, resource-intensive, doesn’t fit the app model | 47 PubMed articles |
| Physician peer review | Timely, dynamic | Not as rigorous; scalable? | iMedicalApps, MedicalAppJournal |
| Developer self-certification | Dynamic | Requires developers to understand & comply; checklist must fit apps | HON Code?, RCP checklist |
| Developer support | Resource-light | Technical knowledge needed; multitude of developers | BSI PAS 277 |
| CE marking, external regulation | Credible | Slow, expensive; apps don’t fit the national model | NHS App Store, FDA, MHRA |
9. BSI Publicly Available Specification (PAS) 277 on health & medical apps
• A PAS is not a full BSI standard, but a supportive framework for developers
• Steering group includes DHACA, BUPA, NHS AHSN, app developers, notified bodies, HSCIC, RCP, the NIHR mental health technology centre, a patient representative and a medical device manufacturer
• Now published on the BSI web site – free access
10. Regulation of medical apps by the FDA and FTC
If classified as a medical device by the FDA, a product must demonstrate efficacy, but:
• Only 100 apps have so far been classified as medical devices
• The FDA has decided to exercise “enforcement discretion” over most medical apps
• So the FDA has not actually banned any apps, yet
However, the Federal Trade Commission has acted against some apps with misleading claims, e.g. “Acne Cure” (no evidence for the claimed benefit of the iPhone screen backlight)
Sharpe, New England Center for Investigative Reporting. “Many health apps are based on flimsy science at best, and often do not work.” Washington Post, November 12th 2012
11. We need to think differently…
| Old think | New think |
| --- | --- |
| Paternalism: we know & determine what is best for users | Self-determination: users decide what is best for them |
| Regulation will eliminate harmful apps after release | Prevent bad apps – help app developers understand safety & quality |
| The NHS must control apps, apply rules and safety checks | Self-regulation by the developer community; consumer choice informed by truth in labelling |
| App developers are in control | Aristotle’s civil society* is in control |
| Quality is best achieved by laws and regulations | Quality is best achieved by consensus and culture change |
| The aim of apps is innovation (sometimes above other considerations) | App innovation must balance benefits and risks |
| An apps market driven by viral campaigns, unfounded claims of benefit | An apps market driven by fitness for purpose (ISO) & evidence of benefit |

* The elements that make up a democratic society, such as freedom of speech, an independent judiciary, and collaboration for the common wellbeing.
12. User ratings: app display rank versus adherence to evidence
[Chart: study of 47 smoking cessation apps (Abroms et al, 2013)]
13. What went wrong with user rankings and reviews?
Wall Street Journal, Nov. 24, 2013, 6:25 p.m.
“Inside a Twitter Robot Factory: Fake Activity, Often Bought for Publicity Purposes, Influences Trending Topics”, by Jeff Elder
“One day earlier this month, Jim Vidmar bought 1,000 fake Twitter accounts for $58 from an online vendor in Pakistan. Mr. Vidmar programs accounts like these to ‘follow’ other Twitter accounts, and to rebroadcast tweets…”
http://on.wsj.com/18A2hr9
14. Promoting quality in the marketplace
Quality is defined by ISO as “fitness for purpose”.
However, app users and purposes vary, so a simple good/bad quality mark is insufficient.
We need a checklist of optional criteria to empower users, clinicians, commissioners and app developers.
Select the criteria you need according to the complexity of the app and the contextual risk (Lewis & Wyatt, JMIR 2014).
Labelling apps will make quality explicit, adding a new Darwinian selection pressure to the apps marketplace.
15. Risk framework for mHealth apps
[Figure: risk framework – context in which the app is used]
Lewis T, Wyatt JC. JMIR 2014
16. Our draft quality criteria for apps, based on Donabedian 1966
Structure = the app development team, the evidence base, use of an appropriate behaviour change model, etc.
Processes = app functions: usability, accuracy, etc.
Outcomes = app impacts on user knowledge & self-efficacy, user behaviours, resource usage
Wyatt JC, Curtis K, Brown K, Michie S. Submitted to Lancet
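The Donabedian split lends itself to a simple checklist data structure; a minimal sketch (mine, not the submitted paper’s instrument), with criterion names paraphrased from this and the following slides:

```python
# Draft quality criteria grouped by Donabedian's (1966) triad.
QUALITY_CRITERIA = {
    "structure": [
        "credible sponsor and skilled development team",
        "evidence-based content",
        "appropriate behaviour change model",
    ],
    "process": [
        "usability",
        "accuracy of calculations and data",
        "appropriateness of advice",
    ],
    "outcome": [
        "user knowledge and self-efficacy",
        "user behaviours",
        "resource usage",
    ],
}

def unmet(ratings: dict) -> list:
    """Criteria an app fails, given reviewer ratings keyed by criterion name."""
    return [c for group in QUALITY_CRITERIA.values() for c in group
            if not ratings.get(c, False)]
```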
17. Structure: what should a high quality app include?
Appropriate sponsor & developer skills
Appropriate components:
• An appropriate underlying health promotion theory
• A sensible choice of behaviour change techniques
• Good user interface
• Accurately programmed algorithms
• Adequate security
• Accurate knowledge from a reliable source – NICE?
Study method: independent expert review
18. Process: does the app work well?
Suggested measures:
• App download rates [surrogate for user recognition]
• App usage rates – immediately after download & later
• Time taken to enter data, receive report
• Acceptability, usability
• Accuracy of calculations, data
• Appropriateness of any advice given
Study method: lab tests and surveys
19. Case study of CVD risk apps: methods
• Assembled search terms: heart attack, cardiovascular disease, etc.
• Searched for & downloaded all iPhone / iPad free & paid apps that the public might use to assess personal risk
• Assembled 15 scenarios varying in risk from 1% to 98%
• Assessed the risk figure, format and advice given by each app
• Definition of error: app above 20% & gold standard (GS) below 20%, or vice versa (old NICE guidance – or 10%, new guidance); see the sketch after this list
With Hannah Cullumbine & Sophie Moriarty, Leeds medical students
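A minimal sketch of the error definition above, with QRISK2 output as the gold standard; the function and variable names are mine, not the study’s:

```python
def misclassified(app_risk: float, gold_risk: float, threshold: float = 20.0) -> bool:
    """An app misclassifies a scenario when it and the gold standard fall on
    opposite sides of the treatment threshold (20% under the old NICE
    guidance, 10% under the new)."""
    return (app_risk >= threshold) != (gold_risk >= threshold)

def misclassification_rate(app_risks, gold_risks, threshold: float = 20.0) -> float:
    """Fraction of scenarios an app misclassifies against the gold standard."""
    pairs = list(zip(app_risks, gold_risks))
    errors = sum(misclassified(a, g, threshold) for a, g in pairs)
    return errors / len(pairs)

# e.g. 15 scenarios scored by one app versus QRISK2:
# misclassification_rate(app_outputs, qrisk2_outputs, threshold=20.0)
```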
20. Overall results
• Located 21 apps; only 19 (7 paid) gave figures
• All 19 communicated risk using percentages (cf. Gigerenzer’s advice, BMJ 2004, to use natural frequencies such as “10 out of 100 people like you” instead)
• One app said “see your GP” every time; none of the rest gave advice
• Some apps refused to accept key data, e.g. age > 74, diabetes
[Screenshot: Heart Health App]
21. Misclassification rates at the 20% threshold
• Rates varied from 7% to 33%
• 5 (26%) of the 19 apps misclassified 25% or more of the scenarios, and 3 (16%) misclassified 20-24%; so 8 (42%) misclassified at least a fifth of the scenarios
• Median error rate: free apps 13%, paid apps 27%
• The error rate for free apps was significantly lower than for paid apps (p = 0.026, Mann-Whitney U = 13.5); a sketch of this test follows the chart below
[Bar chart: per-app misclassification rate at the 20% threshold, ranging 0-35%; paid apps shown in dark blue]
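The free-versus-paid comparison above is a Mann-Whitney U test; a sketch of how it could be run, using illustrative per-app error rates rather than the study’s raw data:

```python
from scipy import stats

# Illustrative per-app misclassification rates (12 free + 7 paid = 19 apps;
# chosen so the medians match the slide, 13% free vs 27% paid).
free_rates = [0.07, 0.07, 0.07, 0.13, 0.13, 0.13, 0.13, 0.20, 0.20, 0.27, 0.27, 0.33]
paid_rates = [0.07, 0.20, 0.27, 0.27, 0.27, 0.33, 0.33]

u, p = stats.mannwhitneyu(free_rates, paid_rates, alternative="two-sided")
print(f"U = {u}, p = {p:.3f}")
# The study itself reported U = 13.5, p = 0.026: free apps were, if anything,
# more accurate than paid ones.
```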
22. Assessing app accuracy
• This only applies to apps that give advice, calculate a risk / drug dose, etc.
• Need a gold standard for the correct advice / risk [QRISK2 in our case]
• Need a representative case series, or plausible simulated cases
• Ideally, users should enter the case data – or their own data
• How accurate is “accurate enough”:
– Accurate enough to get used?
– Accurate enough to encourage the user to take action?
23. Intervention modelling experiments
Aim: to check the intervention before an expensive large scale study (MRC framework: Campbell, BMJ 2007)
What to measure:
• acceptability, usability
• accuracy of data input by users, accuracy of output
• whether users correctly interpret the output
• stated impact of the output on decisions, self-efficacy, action
• users’ emotional response to the output
• user impressions & suggested improvements
24. Example IME: how to make prescribing alerts more acceptable to doctors?
Background: interruptive alerts annoy doctors.
Randomised IME in 24 junior doctors, each viewing 30 prescribing scenarios, with prescribing alerts presented in two different ways:
the same alert text presented either as a modal dialogue box (interruptive) or on the ePrescribing interface (non-interruptive).
Funded by Connecting for Health; carried out by an academic F2 doctor.
Published as Scott G et al, JAMIA 2011
27. Outcomes – the bottom line
What is the impact of a health promotion app on:
– User knowledge and attitudes
– Self-efficacy
– Short term behaviour change
– Long term behaviour change
– Individual health-related outcomes
– Population health
(the earlier items being surrogate outcomes for the later ones)
Study method: randomised controlled trials (e.g. the My Meal Mate app – Carter et al, JMIR 2013)
28. Online RCT on Fogg’s persuasive technology theory & NHS organ donation register sign-up
Persuasive features:
1. URL includes https, dundee.ac.uk
2. University logo
3. No advertising
4. References
5. Address & contact details
6. Privacy statement
7. Articles all dated
8. Site certified (W3C / Health On the Net)
Nind, Sniehotta et al 2010
29. Why bother with impact evaluation – can’t we just predict the results?
No – the real world of people and organisations is too messy / complex to predict whether a technology will work:
• Diagnostic decision support (Wyatt, MedInfo ’89)
• Integrated medicines management for a children’s hospital (Koppel, JAMA 2005)
• MSN Messenger for nurse triage (Eminovic, JTT 2006)
30. A proposed evaluation cascade for mHealth apps

| Area | Topics | Methods |
| --- | --- | --- |
| Source | Purpose, sponsor; user, cost | Inspection |
| Safety | Data protection; usability | Inspection; HCI lab / user tests |
| Content | Based on sound evidence; proven behaviour change methods | Inspection |
| Accuracy | Calculations; advice | Scenarios with a gold standard |
| Potential impact | Ease of use in the field; understanding of output | Intervention modelling experiments |
| Impact | Knowledge, attitudes, self-efficacy; health behaviours, outcomes | Within-subject experiments; field trials |
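The cascade reads naturally as an ordered pipeline in which an app earns costlier evaluation only by passing the cheaper checks first; a sketch, with stage names taken from the table and everything else assumed:

```python
# Evaluation cascade as an ordered list of (area, methods) stages.
CASCADE = [
    ("source", ["inspection"]),
    ("safety", ["inspection", "HCI lab / user tests"]),
    ("content", ["inspection"]),
    ("accuracy", ["scenarios with gold standard"]),
    ("potential impact", ["intervention modelling experiments"]),
    ("impact", ["within-subject experiments", "field trials"]),
]

def next_stage(passed: set):
    """First stage the app has not yet passed; None once fully evaluated."""
    for area, methods in CASCADE:
        if area not in passed:
            return area, methods
    return None

print(next_stage({"source", "safety"}))  # ('content', ['inspection'])
```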
31. Conclusions
1. The quality of mHealth tools varies too much
2. User & professional reviews, developer self-certification and regulation are not enough
3. To help reduce “apptimism” and strengthen other strategies, we need to agree quality criteria, evaluate apps against them, & label apps with the results
4. We have the evaluation methods (e.g. rating the quality of evidence, usability / accuracy studies, RCTs)
5. This will support patients, health professionals and app developers in maximising the benefits of mHealth