I’d like to talk to you today about the idea of evidence farming, with thanks to xxx and xxx for originating this concept.
When we ask “does it work?” we’re really asking about the strength of association between a given exposure, or combination of exposures, and a given outcome. For example, is Text4Baby associated with increased breastfeeding? Usually, we’re asking the question at the population level: is Text4Baby associated with increased breastfeeding for the 100,000-plus women who use it? At other times, we’re interested in the question at the individual level: is Text4Baby associated with increased breastfeeding for this particular woman?
Right now, the most common evaluation approach is the randomized controlled trial, in which a group of subjects is randomized to prespecified interventions and followed for prespecified outcomes. The goal is to confirm a hypothesis at the population level, for example that use of an asthma app will reduce ER visits in the following year. Although they have strong internal validity, RCTs are slow and difficult to fund and set up, and they are expensive, which means trials are often small and of short duration, as we heard yesterday. And because of their highly structured and restricted nature, RCTs also often lack relevance to the real world.
Another common approach is data mining. Throw all the data from the electronic health record, or mHealth apps, together, and “mine” for associations between exposures and outcomes. The goal of data mining is to generate hypotheses at the population level, but if the exposures or outcomes you’re interested in weren’t already collected by the source systems, you’re out of luck. Most importantly, though, this kind of data is observational data from the care process, which is not complete or systematic, so whatever associations you find are going to have weak internal validity.
If you want to find out whether something works for a particular individual, the only formal method is the N-of-1 study. How many of you have heard of N-of-1 studies? Used them? This is a within-subject multiple crossover design that randomizes the individual to an alternating sequence of interventions, and measures outcomes along the way. So here, a user is randomized to using either the app, then not, then the app, or the other way around. And peak flow is measured to see whether the app works for this patient or not. This method is complicated to set up, the analysis is difficult, and it is little known and not widely used. But there are examples of this method leading to improved outcomes and cost savings.
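As a rough sketch of the design just described, the Python below randomizes the order of app versus usual care within each pair of periods, then compares peak flow within each pair. The function names, the three pairs of periods, and the peak-flow numbers are all made up for illustration. A real N-of-1 analysis would also have to deal with washout, carryover, and time trends, which this sketch ignores.

```python
import random
from statistics import mean

TREATMENTS = ("app", "usual_care")  # hypothetical labels for the two conditions


def randomize_sequence(n_pairs, seed=None):
    """Build a schedule of 2 * n_pairs periods.

    Within each consecutive pair of periods, the order of the two
    treatments is randomized, giving a multiple crossover sequence.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        pair = list(TREATMENTS)
        rng.shuffle(pair)
        schedule.extend(pair)
    return schedule


def paired_differences(schedule, outcomes):
    """Return, for each pair of periods, outcome on the app minus outcome on usual care."""
    diffs = []
    for i in range(0, len(schedule), 2):
        by_treatment = {
            schedule[i]: outcomes[i],
            schedule[i + 1]: outcomes[i + 1],
        }
        diffs.append(by_treatment["app"] - by_treatment["usual_care"])
    return diffs


schedule = randomize_sequence(n_pairs=3, seed=1)

# Made-up peak flow readings (L/min), one per period, for illustration only.
peak_flow = [430 if treatment == "app" else 400 for treatment in schedule]

diffs = paired_differences(schedule, peak_flow)
print(schedule)
print("mean within-pair difference:", mean(diffs))
```

Randomizing within pairs keeps the two conditions balanced over time, so a slow drift in the patient's asthma is less likely to be mistaken for an effect of the app.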
As different as these methods are, they all share a common attitude towards evidence, which is that evidence is something to be extracted from the care process, either by mining it from the data or directly manipulating the care process with rigid and pre-defined protocols. If this evidence extraction is done without much regard for the realities of real clinical practice, it can feel like…
evidence strip mining, not terribly friendly to patient or clinician.
An alternative frame is evidence farming, in which we think of evidence not as something to be extracted from the care process, but as something to be cultivated in a sustainable way as part of the care process. We should involve patients and clinicians in the cultivation of evidence (using “patient” to include both the kingdom of the well and the kingdom of the sick), by testing interventions and outcomes that matter to us, so we have a personal stake in the findings.
In this framing, data mining for interesting associations is like taking your pig to the forest to look for truffles.
RCTs are industrial evidence farming, evidence by and for the masses. But it should be organic industrial farming, so to speak, with RCTs that are what we call pragmatic: that have less stringent inclusion criteria and that test common interventions used in common ways. We also need more RCTs that test adaptive treatment strategies, and that are able to evaluate interventions like apps that change over the course of the study. And patients should have more input into the study design.
But one of the things we really want to know in mHealth is “does this app work for me??!” and for that we need to cultivate our own personal evidence gardens, where apps should have a built-in function for guiding me to systematically find out if something is working for me or not. And it’s a personal garden because I should choose which exposures I test, and which outcomes I track.
I have a patient whose passion is competitive ballroom dancing. She has moderate asthma and wants to minimize her use of inhaled steroids. Her N-of-1 study might look like this: testing standing use of a Flovent steroid inhaler versus use on an as-needed basis, and seeing how that affects her dancing.
Crowdsourcing can help users find out what exposures and outcomes others are concerned about, and can also help researchers and developers discover more patient-centered outcomes.
In fact, just like we have a food macrosystem, we have an evidence macrosystem that, like food, should be balanced, sustainable, and good for us. Right now, the mHealth evidence macrosystem is weak throughout. We need data liquidity so we can mine across larger, more comprehensive data sets. We need more pragmatic and relevant RCTs, but perhaps most importantly, we need to develop and support new methods of evaluation that can generate more personal evidence more quickly. There are also methods for aggregating N-of-1 studies to fill in the gap between personal evidence and RCTs, which Eric Hekler mentioned this morning.
And if we do develop these new methods, how can we do evaluation at scale? I would say, by building support for a robust evidence macrosystem right into a common open infrastructure for mHealth.
Currently, mHealth is built in a stovepipe manner with little data sharing and interoperability, each app sitting in its own silo. Yesterday, we heard about some very successful approaches, but each of us will have to reimplement them in our own apps. This limits the efficiency and the impact of quality mHealth. Of course, traditional enterprise health IT is all about silos.
But let’s do something different in mHealth! You know, the Internet took off after it adopted what came to be called the hourglass model, where TCP/IP was the narrow waist that was standardized and made open, and that reduced duplication, spurred innovation both above and below that narrow waist, and spawned many commercial and non-profit ventures.
I’m working with Deborah Estrin, a computer scientist from UCLA, on openmHealth.org, a project to catalyze a phase transition of mHealth from stovepipe to a narrow-waist open architecture.
These shared modules would include modules for usage analytics, and a data sharing platform so we can aggregate and mine data across apps. We need shared modules for supporting major components of RCTs, and modules for scripting and analyzing individualized N-of-1 protocols and other novel analysis approaches.
We need support for using social media approaches for discovery of exposures and outcomes that matter, and shared libraries of validated measures and instruments, like the PROMIS instruments developed by the NIH. There’s also huge opportunity for developing and sharing measures to get at finer-grained mechanisms based on theoretical models of behavior change, etc.
Evaluation is an important part of our openmHealth architecture but not the only part. Other parts include data collection and presentation modules, other analysis and data management services, authoring tools and interfaces to external applications and devices, etc. Our approach is to define a component architecture with well-defined APIs, and to build a reference implementation so that we can start to “tip” mHealth away from silos into an open ecosystem. We are particularly interested in open architecture for apps targeting underserved populations.
Our goal for mHealth evidence is to establish a learning community coupled with an open technical architecture so that we can broadly, rapidly, and iteratively disseminate both evaluation methods and findings that matter. It is well within our power as a community to do this.
Please visit us at openmhealth.org.
Evidence Farming [1]: Implications for Open Architecture. Ida Sim, MD, PhD, Director, Center for Clinical and Translational Informatics, University of California San Francisco. May 5, 2011. [1] With thanks to Rich Kravitz, MD, UC Davis, and Naihua Duan, Columbia.
Rephrasing “Does it Work?” [Diagram: (complexes of) exposures, e.g., Text4Baby, linked to an outcome, e.g., increased breastfeeding; strength of association? At the individual or population level.]
Current Approaches: RCT <ul><li>Tests prespecified interventions and outcomes </li></ul><ul><li>To confirm a hypothesis at the population level </li></ul><ul><li>Strong internal validity </li></ul><ul><li>Problems: slow to set up, expensive, short-term, lack relevance to the real world </li></ul>[Diagram: a population of 100 people randomized to Asthma App (50 people) or Usual Care (50 people); outcome: ER visits at 1 year.]
Current Approaches: Data Mining <ul><li>Exposures and outcomes from care process systems </li></ul><ul><li>To generate hypotheses at the population level </li></ul><ul><li>Problems: limited to data collected, weak internal validity (data not complete or systematic) </li></ul>[Diagram: population-level data from EHR and apps; association between exposures and outcomes?]
Current Approaches: N-of-1 Studies <ul><li>Within-subject multiple crossover </li></ul><ul><li>Only formal method for determining individual treatment effectiveness </li></ul><ul><li>Problems: complicated to set up, analysis is difficult, little known, not widely used </li></ul>[Diagram: an individual randomized to alternating sequences of Asthma app and Usual Care periods; outcome: peak flow.]
<ul><li>Evidence is something to be extracted from the care process </li></ul><ul><ul><li>mining it from the data </li></ul></ul><ul><ul><li>directly manipulating the care process with rigid and pre-defined protocols </li></ul></ul>Evidence Extraction
Stovepiped mHealth <ul><li>Health apps built independently </li></ul><ul><ul><li>little data sharing and interoperability </li></ul></ul><ul><li>Limits efficiency and impact of quality mHealth </li></ul>
Internet Hourglass Model <ul><li>Standardize and make open the “narrow waist” </li></ul><ul><li>Reduces duplication, spurs community innovation, supports commercial and non-profit uses </li></ul>
OpenmHealth.org Estrin DE, Sim I. Science. 2010;330:759-60.
<ul><li>The waist should support the evidence macrosystem </li></ul>OpenmHealth.org
Open Architecture for an Evidence Macrosystem <ul><li>Modules for usage analytics </li></ul><ul><ul><li># of text messages, # of sessions, etc. </li></ul></ul><ul><li>Rooting for (glocal) evidence </li></ul><ul><ul><li>data sharing with shared syntax and semantics </li></ul></ul><ul><li>Industrial farming, e.g., with RCTs </li></ul><ul><ul><li>modules for informed consent, randomization, adaptive treatment strategy, mixed methods, etc. </li></ul></ul><ul><li>Personal evidence gardening, e.g., N-of-1 </li></ul><ul><ul><li>modules for scripting and analyzing individualized N-of-1 protocols, etc. </li></ul></ul>
Open Architecture for an Evidence Macrosystem <ul><li>Social media for discovery of exposures and outcomes that matter </li></ul><ul><li>Shared libraries of validated measures and instruments (e.g., PROMIS) </li></ul><ul><ul><li>measures that get at finer-grained mechanisms based on theoretical models of change, etc. </li></ul></ul>
Goal for mHealth Evidence <ul><li>A learning community coupled with an open architecture for broad, rapid, and iterative dissemination of evaluation methods and findings that matter </li></ul>