OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM
April 25-26, 2013
Workshop Participants
Ji...
Current as of 12pm, April 24
Steven N. Goodman, MD, PhD
Associate Dean for Clinical and
Translational Research
Stanford Un...
Richard Platt, MD, MS
Chair, Ambulatory Care and Prevention
Chair, Population Medicine
Harvar...
IOM Staff
Claudia Grossmann, PhD
Senior Program Officer
Diedtra Henderson
Program Officer
Eli...
----------------------------------
Engaging the Issue of Bias
CLINICAL TRIALS
Clinical Trials 2012; 9: 48–55
ARTICLE
Beyond the intention-to-treat in comparative
effectiveness research
...
those shortcomings. Let us start by defining two
types of causal effects that can be estimated in RCTs.
The effect of assi...
treatment Z is biased toward the null. That is, if an
ITT analysis fails to find a toxic effect, there is no
guarantee tha...
equated with efficacy. There is, however, no guaran-
tee that the effect of assigned treatment Z matches
the treatment’s e...
RCT and rather treats them as coming from an
observational study. As a result, an ‘as treated’
comparison will be confound...
can only be interpreted as the effect of treatment A if
the analysis is appropriately adjusted for the con-
founders L. If...
Discussion
An ITT analysis of RCTs is appealing for the same
reason it may be appalling: simplicity. As described
above, I...
Principles for Clinical Trials. Federal Register 1998; 63:
49583–98.
3. Rosenberger WF, Lachin JM. Randomization in Clinic...
VIEWPOINT
Prespecified Falsification End Points
Can They Validate True Observational Associations?
Vinay Prasad, MD
Anupam...
would suggest that an association between PPI use and pneu-
monia initially suspected to be causal is perhaps con-
founded...
PERSPECTIVE
Orthogonal predictions: follow-up questions for suggestive data
Alexander M. Walker MD, DrPH1,2*
1
World Heal...
cases that did not present as fully recognized instances
of the disease, but which nonetheless represented the
same pathol...
sincerely believing in the importance of independent
replication, they find that they too examine different
dimensions of o...
answers raise new questions, different ones, and it makes
sense to pick out (from the same fire hose spurting facts)
new data that ...
Special Issue Paper
Received 4 November 2011, Accepted 28 August 2012 Published online in Wiley Online Library
(wileyonlin...
P. B. RYAN ET AL.
and reproducible process to efficiently generate evidence to support the characterization of the potentia...
1. OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM
An Institute of Medicine Workshop sponsored by the Patient-Centered Outcomes Research Institute
April 25-26, 2013
The National Academies
2101 Constitution Avenue, NW
Washington, DC 20418
2. OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM

Table of Contents

SECTION 1: WORKSHOP FRAMING MATERIALS
• Agenda
• Planning Committee Roster
• Participant List

SECTION 2: ENGAGING THE ISSUE OF BIAS
• Hernan, Miguel A. and Hernandez-Diaz, Sonia. Beyond the intention-to-treat in comparative effectiveness research. Clinical Trials. 9:48–55. 2012.
• Prasad, Vinay and Jena, Anupam. Prespecified falsification end points: Can they validate true observational associations? JAMA. 309(3). 2013.
• Walker, Alexander M. Orthogonal predictions: Follow-up questions for suggestive data. Pharmacoepidemiology and Drug Safety. 19:529–532. 2010.
• Ryan, PB, et al. Empirical assessment of methods for risk identification in healthcare data: Results from the experiments of the Observational Medical Outcomes Partnership. Statistics in Medicine. 2012.
• Lorch, SA, et al. The differential impact of delivery hospital on the outcomes of premature infants. Pediatrics. 130(2). 2012.
• Small, Dylan S. and Rosenbaum, Paul R. War and wages: The strength of instrumental variables and their sensitivity to unobserved biases. Journal of the American Statistical Association. 103(483). 2008.
• Brookhart, MA, et al. Comparative mortality risk of anemia management practices in incident hemodialysis patients. JAMA. 303(9). 2010.
• Cornfield, J. Principles of research. Statistics in Medicine. 31:2760-2768. 2012.

SECTION 3: GENERALIZING RCT RESULTS TO BROADER POPULATIONS
• Kaizar, Eloise E. Estimating treatment effect via simple cross design synthesis. Statistics in Medicine. 30:2986–3009. 2011.
• Go, AS, et al. Anticoagulation therapy for stroke prevention in atrial fibrillation: How well do randomized trials translate into clinical practice? JAMA. 290(20). 2003.
• Hernan, MA, et al. Observational studies analyzed like randomized experiments: An application to postmenopausal hormone therapy and coronary heart disease. Epidemiology. 19(6). 2008.
• Weintraub, WS, et al. Comparative effectiveness of revascularization strategies. The New England Journal of Medicine. 366(16). 2012.
3. SECTION 4: DETECTING TREATMENT EFFECT HETEROGENEITY
• Hlatky, MA, et al. Coronary artery bypass surgery compared with percutaneous coronary interventions for multivessel disease: A collaborative analysis of individual patient data from ten randomised trials. The Lancet. 373:1190–97. 2009.
• Kent, DM, et al. Assessing and reporting heterogeneity in treatment effects in clinical trials: A proposal. Trials. 11:85. 2010.
• Kent, David M. and Hayward, Rodney A. Limitations of applying summary results of clinical trials to individual patients: The need for risk stratification. JAMA. 298(10):1209-1212. 2007.
• Basu, A, et al. Heterogeneity in action: The role of passive personalization in comparative effectiveness research. 2012.
• Basu, Anirban. Estimating person-centered treatment (PeT) effects using instrumental variables: An application to evaluating prostate cancer treatments. 2013.

SECTION 5: PREDICTING INDIVIDUAL RESPONSES
• Byar, David P. Why databases should not replace randomized clinical trials. Biometrics. 36(2):337-342. 1980.
• Lee, KL, et al. Clinical judgment and statistics: Lessons from a simulated randomized trial in coronary artery disease. Circulation. 61:508-515. 1980.
• Pencina, Michael J. and D'Agostino, Ralph B. Thoroughly modern risk prediction? Science Translational Medicine. 4(131). 2012.
• Tatonetti, NP, et al. Data-driven prediction of drug effects and interactions. Science Translational Medicine. 4(125). 2012.

SECTION 6: ORGANIZATIONAL BACKGROUND
• IOM Roundtable on Value & Science-Driven Health Care (VSRT)
1. VSRT Background Information and Roster
2. VSRT Charter and Vision
3. Clinical Effectiveness Research Innovation Collaborative Background Information
• Patient-Centered Outcomes Research Institute (PCORI)
1. National Priorities for Research and Research Agenda

SECTION 7: BIOGRAPHIES AND MEETING LOGISTICS
• Planning Committee Biographies
• Speaker Biographies
• Location, Hotel, and Travel
4. ----------------------------------
Workshop Framing Materials
5. OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM
An Institute of Medicine Workshop
Sponsored by the Patient-Centered Outcomes Research Institute
A LEARNING HEALTH SYSTEM ACTIVITY
IOM ROUNDTABLE ON VALUE & SCIENCE-DRIVEN HEALTH CARE
APRIL 25-26, 2013
THE NATIONAL ACADEMY OF SCIENCES
2101 CONSTITUTION AVENUE, NW
WASHINGTON, DC

Day 1: Thursday, April 25th

8:00 am Coffee and light breakfast available

8:30 am Welcome, introductions and overview
Welcome, framing of the meeting and agenda overview
Welcome from the IOM
Michael McGinnis, Institute of Medicine
Opening remarks and meeting overview
Joe Selby, Patient-Centered Outcomes Research Institute
Ralph Horwitz, GlaxoSmithKline

Meeting objectives
1. Explore the role of observational studies (OS) in the generation of evidence to guide clinical and health policy decisions, with a focus on individual patient care, in a learning health system;
2. Consider concepts of OS design and analysis, emerging statistical methods, use of OS to supplement evidence from experimental methods, identifying treatment heterogeneity, and providing effectiveness estimates tailored for individual patients;
3. Engage colleagues from disciplines typically underrepresented in clinical evidence discussions;
4. Identify strategies for accelerating progress in the appropriate use of OS for evidence generation.
6. 9:00 am Workshop stage-setting
• Session format
o Workshop overview and stage-setting
Steve Goodman, Stanford University
Q&A and open discussion
• Session questions:
o How do OS contribute to building valid evidence to support effective decision making by patients and clinicians? When are their findings useful, when are they not?
o What are the major challenges (study design, methodological, data collection/management/analysis, cultural, etc.) facing the field in the use of OS data for decision making? Please include consideration of the following issues: bias, methodological standards, publishing requirements.
o What can workshop participants expect from the following sessions?

9:45 am Engaging the issue of bias
Moderator: Michael Lauer, National Heart Lung and Blood Institute
• Session format
o Introduction to issue
Sebastian Schneeweiss, Harvard University
o Presentations:
- Instrumental variables and their sensitivity to unobserved biases
Dylan Small, University of Pennsylvania
- An empirical approach to measuring and calibrating for error in observational analyses
Patrick Ryan, Johnson & Johnson
o Respondents and panel discussion:
- John Wong, Tufts University
- Joel Greenhouse, Carnegie Mellon University
Q&A and open discussion
• Session questions:
o What are the major bias-related concerns with the use of observational study methods? What are the sources of bias?
o How many of these concerns relate to methods and how much to the quality and availability of suitable data? What barriers have these concerns created for the use of the results of observational studies to drive decision making?
7. o What are the most promising approaches to reduction of bias through the use of statistical methods? Through study design (e.g., dealing with issues of multiplicity)?
o What are the circumstances under which administrative (claims) data can be used to assess treatment benefits? What data are needed from EHRs to strengthen the value of administrative data?
o What methods are best to adjust for the changes in treatment and clinical conditions among patients followed longitudinally?
o What are the implications of these promising approaches for the use of observational study methods moving forward?

11:30 am Lunch
Participants will be asked to identify, at their lunch tables, what they think the most critical questions are for PCOR in the topics covered by the workshop. These topics will then be circulated to the moderators of the subsequent sessions.

12:30 pm Generalizing RCT results to broader populations
Moderator: Harold Sox, Dartmouth
• Session format
o Introduction to issue
Robert Califf, Duke University
o Presentations:
- Generalizing the right question
Miguel Hernan, Harvard University
- Using observational studies to determine RCT generalizability
Eloise Kaizar, Ohio State University
o Respondents and panel discussion:
- William Weintraub, Christiana Medical Center
- Constantine Frangakis, Johns Hopkins University
Q&A and open discussion
• Session questions:
o What are the most cogent methodological and clinical considerations in using observational study methods to test the external validity of findings from RCTs?
o How do data collection, management, and analysis approaches impact generalizability?
o What are the generalizability questions of greatest interest? Or, where does the greatest doubt arise (age, concomitant illness, concomitant treatment)? What examples represent well-established differences?
o What statistical methods are needed to generalize RCT results?
8. o Are the standards for causal inference from OS different when prior RCTs have been performed? How does statistical methodology vary in this case?
o What are the implications when treatment results for patients not included in the RCT differ from the overall results reported in the original RCT?
o What makes an observed difference in outcome credible? Finding the RCT-shown effect in the narrower population? Replication in more than one environment? The confidence interval of the result? The size of the effect in the RCT?
o Can subset analyses in the RCT, even if underpowered, be used to support or rebut the OS finding?

2:15 pm Break

2:30 pm Detecting treatment-effect heterogeneity
Moderator: Richard Platt, Harvard Pilgrim Health Care Institute
• Session format
o Introduction to issue
David Kent, Tufts University
o Presentations:
- Comparative effectiveness of coronary artery bypass grafting and percutaneous coronary intervention
Mark Hlatky, Stanford University
- Identification of effect heterogeneity using instrumental variables
Anirban Basu, University of Washington
o Respondents and panel discussion:
- Mary Charlson, Cornell University
- Mark Cullen, Stanford University
Q&A and open discussion
• Session questions:
o What is the potential for OS in assessing treatment response heterogeneity and individual patient decision making?
o What clinical and other data can be collected routinely to affect this potential?
o How can longitudinal information on change in treatment categories and clinical condition be used to assess variation in treatment response and individual patient decision making?
- What are the statistical methods for time-varying changes in treatment (including co-therapies) and clinical condition?
9. o What are the best methods to form distinctive patient subgroups in which to examine for heterogeneity of treatment response?
- What data elements are necessary to define these distinctive patient subgroups?
o What are the best methods to assess heterogeneity in multi-dimensional outcomes?
o How could further implementation of best practices in data collection, management, and analysis impact treatment response heterogeneity?
o What is needed in order for information about treatment response heterogeneity to be validated and used in practice?

4:15 pm Summary and preview of next day
4:45 pm Reception
5:45 pm Adjourn

*********************************************

Day 2: Friday, April 26th

8:00 am Coffee and light breakfast available

8:30 am Welcome, brief agenda overview, summary of previous day
Welcome, framing of the meeting and agenda overview

9:00 am Predicting individual responses
Moderator: Ralph Horwitz, GSK
• Session format
o Introduction to issue
Burton Singer, University of Florida
o Presentations:
- Data-driven prediction models
Nicholas Tatonetti, Columbia University
- Individual prediction
Michael Kattan, Cleveland Clinic
o Respondents and panel discussion:
- Peter Bach, Sloan Kettering
- Mitchell Gail, National Cancer Institute
10. Q&A and open discussion
• Session questions:
o How can patient-level observational data be used to create predictive models of treatment response in individual patients? What statistical methodologies are needed?
o How can predictive analytic methods be used to study the interactions of treatment with multiple patient characteristics?
o How should the clinical history (longitudinal information) for a given patient be utilized in the creation of prediction rules for that patient's responses to one or more candidate treatment regimens?
o What are effective methodologies for producing prediction rules to guide the management of an individual patient based on their comparability to results of RCTs, OS, and archived patient records?
o How can we blend predictive models, which can predict the impact of treatment choices, with causal models, which compare predictions under different treatments?

10:45 am Break

11:00 am Conclusions and strategies going forward
Panel members will be charged with highlighting very specific next steps laid out in the course of workshop presentations and discussions and/or suggesting some of their own.
• Panel:
o Rob Califf, Duke University
o Cynthia Mulrow, University of Texas
o Jean Slutsky, Agency for Healthcare Research and Quality
o Steve Goodman, Stanford University
• Session questions:
o What are the major themes and conclusions from the workshop's presentations and discussions?
o How can these themes be translated into actionable strategies with designated stakeholders?
o What are the critical next steps in terms of advancing analytic methods?
o What are the critical next steps in developing databases that will generate evidence to guide clinical decision making?
o What are critical next steps in disseminating information on new methods to increase their appropriate use?
12:15 pm Summary and next steps
Comments from the Chairs
Joe Selby, Patient-Centered Outcomes Research Institute
Ralph Horwitz, GlaxoSmithKline
Comments and thanks from the IOM
Michael McGinnis, Institute of Medicine

12:45 pm Adjourn

*******************************************

Planning Committee

Co–Chairs
Ralph Horwitz, GlaxoSmithKline
Joe Selby, Patient-Centered Outcomes Research Institute

Members
Anirban Basu, University of Washington
Troy Brennan, CVS/Caremark
Louis Jacques, Centers for Medicare & Medicaid Services
Steve Goodman, Stanford University
Jerry Kassirer, Tufts University
Michael Lauer, National Heart, Lung, and Blood Institute
David Madigan, Columbia University
Sharon-Lise Normand, Harvard University
Richard Platt, Harvard Pilgrim Health Care Institute
Robert Temple, Food and Drug Administration
Burton Singer, University of Florida
Jean Slutsky, Agency for Healthcare Research and Quality

Staff officer: Claudia Grossmann
cgrossmann@nas.edu
202.334.3867
OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM
Workshop Planning Committee

Co–Chairs

Ralph I. Horwitz, MD
Senior Vice President, Clinical Sciences Evaluation
GlaxoSmithKline

Joe V. Selby, MD, MPH
Executive Director
PCORI

Members

Anirban Basu, MS, PhD
Associate Professor and Director, Health Economics and Outcomes Methodology
University of Washington

Troyen A. Brennan, MD, JD, MPH
Executive Vice President and Chief Medical Officer
CVS Caremark

Steven N. Goodman, MD, PhD
Associate Dean for Clinical & Translational Research
Stanford University School of Medicine

Louis B. Jacques, MD
Director, Coverage and Analysis Group
Centers for Medicare & Medicaid Services

Jerome P. Kassirer, MD
Distinguished Professor
Tufts University School of Medicine

Michael S. Lauer, MD, FACC, FAHA
Director, Division of Prevention and Population Sciences
National Heart, Lung, and Blood Institute

David Madigan, PhD
Chair of Statistics
Columbia University

Sharon-Lise T. Normand, PhD, MSc
Professor, Department of Biostatistics and Health Care Policy
Harvard Medical School

Richard Platt, MD, MS
Chair, Ambulatory Care and Prevention; Chair, Population Medicine
Harvard University

Burton H. Singer, PhD, MS
Professor, Emerging Pathogens Institute
University of Florida

Jean Slutsky, PA, MS
Director, Center for Outcomes and Evidence
Agency for Healthcare Research and Quality

Robert Temple, MD
Deputy Director for Clinical Science
Center for Drug Evaluation and Research
Food and Drug Administration
Current as of 12pm, April 24
OBSERVATIONAL STUDIES IN A LEARNING HEALTH SYSTEM
April 25-26, 2013

Workshop Participants

Jill Abell, PhD, MPH
Senior Director, Clinical Effectiveness and Safety
GlaxoSmithKline

Joseph Alper
Writer and Technology Analyst
LSN Consulting

Naomi Aronson
Executive Director
Blue Cross, Blue Shield

Peter Bach, MD, MAPP
Attending Physician, Department of Epidemiology & Biostatistics
Memorial Sloan-Kettering Cancer Center

Anirban Basu, MS, PhD
Associate Professor and Director, Program in Health Economics and Outcomes Methodology
University of Washington

Lawrence Becker
Director, Benefits
Xerox Corporation

Marc L. Berger, MD
Vice President, Real World Data and Analytics
Pfizer Inc.

Robert M. Califf, MD
Vice Chancellor for Clinical Research
Duke University Medical Center

Mary E. Charlson, MD
Chief, Clinical Epidemiology and Evaluative Sciences Research
Weill Cornell Medical College

Jennifer B. Christian, PharmD, MPH, PhD
Senior Director, Clinical Effectiveness and Safety
GlaxoSmithKline

Michael L. Cohen, PhD
Senior Program Officer
Committee on National Statistics

Mark R. Cullen, MD
Professor of Medicine
Stanford School of Medicine

Steven R. Cummings, MD, FACP
Professor Emeritus, Department of Medicine
University of California, San Francisco

Robert W. Dubois, MD, PhD
Chief Science Officer
National Pharmaceutical Council

Rachael L. Fleurence, PhD
Acting Director, Accelerating PCOR Methods Program
PCORI

Dean Follmann, PhD
Branch Chief-Associate Director for Biostatistics
National Institutes of Health

Constantine Frangakis, PhD
Professor, Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health

Mitchell H. Gail, MD, PhD
Senior Investigator
National Cancer Institute

Kathleen R. Gans-Brangs, PhD
Senior Director, Medical Affairs
AstraZeneca
Steven N. Goodman, MD, PhD
Associate Dean for Clinical and Translational Research
Stanford University School of Medicine

Sheldon Greenfield, MD
Executive Co-Director, Health Policy Research Institute
University of California, Irvine

Joel B. Greenhouse, PhD
Professor of Statistics
Carnegie Mellon University

Sean Hennessy, PharmD, PhD
Associate Professor of Epidemiology
University of Pennsylvania

Miguel Hernán, MD, DrPH, ScM, MPH
Professor of Epidemiology
Harvard University

Mark A. Hlatky, MD
Professor of Health Research & Policy, Professor of Medicine
Stanford University

Ralph I. Horwitz, MD
Senior Vice President, Clinical Science Evaluation
GlaxoSmithKline

Gail Hunt
President and CEO
National Alliance for Caregiving

Robert Jesse, MD, PhD
Principal Deputy Under Secretary for Health
Department of Veterans Affairs

Eloise E. Kaizar, PhD
Associate Professor, Department of Statistics
The Ohio State University

Jerome P. Kassirer, MD
Distinguished Professor
Tufts University School of Medicine

Michael Kattan, PhD
Chair, Department of Quantitative Health Sciences
Cleveland Clinic

David M. Kent, MD, MSc
Director, Clinical and Translational Science Program
Tufts University Sackler School of Graduate Biomedical Sciences

Michael S. Lauer, MD, FACC, FAHA
Director, Division of Prevention and Population Sciences
National Heart, Lung, and Blood Institute

J. Michael McGinnis, MD, MPP, MA
Senior Scholar
Institute of Medicine

David O. Meltzer, PhD
Associate Professor
University of Chicago

Nancy E. Miller, PhD
Senior Science Policy Analyst, Office of Science Policy
National Institutes of Health

Sally Morton, PhD
Professor and Chair, Department of Biostatistics
Graduate School of Public Health, University of Pittsburgh

Cynthia D. Mulrow, MD, MSc
Senior Deputy Editor
Annals of Internal Medicine

Robin Newhouse
Chair and Professor
University of Maryland School of Nursing

Perry D. Nisen, MD, PhD
SVP, Science and Innovation
GlaxoSmithKline

Michael Pencina, PhD
Associate Professor
Boston University
Richard Platt, MD, MS
Chair, Ambulatory Care and Prevention; Chair, Population Medicine
Harvard University

James Robins, MD
Mitchell L. and Robin LaFoley Dong Professor of Epidemiology
Harvard University

Patrick Ryan, PhD
Head of Epidemiology Analytics
Janssen Research and Development

Nancy Santanello, MD, MS
Vice President, Epidemiology
Merck

Richard L. Schilsky, MD, FASCO
Chief Medical Officer
American Society of Clinical Oncology

Sebastian Schneeweiss, MD
Associate Professor, Epidemiology
Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital

Michelle K. Schwalbe, PhD
Program Officer, Board on Mathematical Sciences and Their Applications
National Research Council

Jodi Segal, MD, MPH
Director, Pharmacoepidemiology Program
The Johns Hopkins Medical Institutions

Joe V. Selby, MD, MPH
Executive Director
PCORI

Burton H. Singer, PhD, MS
Professor, Emerging Pathogens Institute
University of Florida

Jean Slutsky, PA, MS
Director, Center for Outcomes and Evidence
Agency for Healthcare Research and Quality

Dylan Small, PhD
Associate Professor of Statistics
University of Pennsylvania

Harold C. Sox, MD
Professor of Medicine (emeritus, active)
The Dartmouth Institute for Health Policy and Clinical Practice, Dartmouth Geisel School of Medicine

Elizabeth A. Stuart
Associate Professor, Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health

Nicholas Tatonetti, PhD
Assistant Professor of Biomedical Informatics
Columbia University

Robert Temple, MD
Deputy Center Director for Clinical Science
Food and Drug Administration

Scott T. Weidman, PhD
Director, Board on Mathematical Sciences and Their Applications
National Research Council

William S. Weintraub, MD, FACC
John H. Ammon Chair of Cardiology
Christiana Care Health Services

Harlan Weisman
Managing Director
And-One Consulting, LLC

Ashley E. Wivel, MD, MSc
Senior Director, Clinical Effectiveness and Safety
GlaxoSmithKline

John B. Wong, MD
Professor of Medicine
Tufts University Sackler School of Graduate Biomedical Sciences
IOM Staff

Claudia Grossmann, PhD
Senior Program Officer

Diedtra Henderson
Program Officer

Elizabeth Johnston
Program Assistant

Valerie Rohrbach
Senior Program Assistant

Julia Sanders
Senior Program Assistant

Robert Saunders, PhD
Senior Program Officer

Barret Zimmermann
Program Assistant
----------------------------------

Engaging the Issue of Bias
CLINICAL TRIALS
Clinical Trials 2012; 9: 48–55

ARTICLE

Beyond the intention-to-treat in comparative effectiveness research

Miguel A Hernán and Sonia Hernández-Díaz

Background The intention-to-treat comparison is the primary, if not the only, analytic approach of many randomized clinical trials.

Purpose To review the shortcomings of intention-to-treat analyses, and of ‘as treated’ and ‘per protocol’ analyses as commonly implemented, with an emphasis on problems that are especially relevant for comparative effectiveness research.

Methods and Results In placebo-controlled randomized clinical trials, intention-to-treat analyses underestimate the treatment effect and are therefore nonconservative for both safety trials and noninferiority trials. In randomized clinical trials with an active comparator, intention-to-treat estimates can overestimate a treatment’s effect in the presence of differential adherence. In either case, there is no guarantee that an intention-to-treat analysis estimates the clinical effectiveness of treatment. Inverse probability weighting, g-estimation, and instrumental variable estimation can reduce the bias introduced by nonadherence and loss to follow-up in ‘as treated’ and ‘per protocol’ analyses.

Limitations These analyses require untestable assumptions, a dose-response model, and time-varying data on confounders and adherence.

Conclusions We recommend that all randomized clinical trials with substantial lack of adherence or loss to follow-up be analyzed using different methods. These include an intention-to-treat analysis to estimate the effect of assigned treatment, and ‘as treated’ and ‘per protocol’ analyses to estimate the effect of treatment after appropriate adjustment via inverse probability weighting or g-estimation. Clinical Trials 2012; 9: 48–55.
http://ctj.sagepub.com

Introduction

Randomized clinical trials (RCTs) are widely viewed as a key tool for comparative effectiveness research [1], and the intention-to-treat (ITT) comparison has long been regarded as the preferred analytic approach for many RCTs [2]. Indeed, the ITT, or ‘as randomized,’ analysis has two crucial advantages over other common alternatives – for example, an ‘as treated’ analysis. First, in double-blind RCTs, an ITT comparison provides a valid statistical test of the hypothesis of null effect of treatment [3,4]. Second, in placebo-controlled trials, an ITT comparison is regarded as conservative because it underestimates the treatment effect when participants do not fully adhere to their assigned treatment. Yet excessive reliance on the ITT approach is problematic, as has been argued by others before us [5]. In this paper, we review the problems of ITT comparisons with an emphasis on those that are especially relevant for comparative effectiveness research. We also review the shortcomings of ‘as treated’ and ‘per protocol’ analyses as commonly implemented in RCTs and recommend the routine use of analytic approaches that address some of those shortcomings.

Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; Harvard-MIT Division of Health Sciences and Technology, Boston, MA, USA
Author for correspondence: Miguel Hernán, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA. E-mail: miguel_hernan@post.harvard.edu
© The Author(s), 2011. Reprints and permissions: http://www.sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/1740774511420743

Let us start by defining two types of causal effects that can be estimated in RCTs.

The effect of assigned treatment versus the effect of treatment

Consider a double-blind clinical trial in which participants are randomly assigned to either active treatment (Z = 1) or placebo (Z = 0) and are then followed for 5 years or until they die (Y = 1 if they die within 5 years, Y = 0 otherwise). An ITT analysis would compare the 5-year risk of death in those assigned to treatment with the 5-year risk of death in those assigned to placebo. An ITT comparison unbiasedly estimates the average causal effect of treatment assignment Z on the outcome Y. For brevity, we will refer to this as the effect of assigned treatment.

Trial participants may not adhere to, or comply with, the assigned treatment Z. Some of those assigned to placebo may decide to take treatment, and some of those assigned to active treatment may decide not to take it. We use A to refer to the treatment actually received. Thus, regardless of their assigned treatment Z, some subjects will take treatment (A = 1) and others will not take it (A = 0).

The use of ITT comparisons is sometimes criticized when not all trial participants adhere to their assigned treatment Z, that is, when Z is not equal to A for every trial participant. For example, consider two RCTs: in the first trial, half of the participants in the Z = 1 group decide not to take treatment; in the second trial, all participants assigned to Z = 1 decide to take the treatment. An ITT comparison will correctly estimate the effect of assigned treatment Z in both trials, but the effects will be different even if the two trials are otherwise identical. The direction and magnitude of the effect of assigned treatment depend on the adherence pattern.
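The two-trial example can be made concrete with a minimal simulation. The numbers below are hypothetical (a 20% five-year mortality risk untreated, 10% treated, and nonadherers in the active arm taking nothing), chosen only to make the dependence on adherence visible:

```python
import random

random.seed(0)

def itt_risk_difference(adherence_in_treated, n_per_arm=100_000):
    """Simulate a placebo-controlled trial and return the ITT risk
    difference, i.e., the contrast by assigned treatment Z rather than
    received treatment A. Hypothetical risks: 20% mortality untreated,
    10% treated; nonadherers in the Z=1 arm take no treatment."""
    deaths = {0: 0, 1: 0}
    for z in (0, 1):
        for _ in range(n_per_arm):
            a = z == 1 and random.random() < adherence_in_treated
            risk = 0.10 if a else 0.20
            deaths[z] += random.random() < risk
    return (deaths[1] - deaths[0]) / n_per_arm

# Two trials identical except for adherence: the effect of treatment A
# is -0.10 in both, but the effect of assigned treatment Z differs.
print(itt_risk_difference(1.0))  # full adherence: about -0.10
print(itt_risk_difference(0.5))  # half adherence: about -0.05
```

With full adherence the ITT contrast equals the effect of treatment; with 50% adherence it is attenuated halfway toward the null, exactly as described above.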
Now suppose that, in each of the two trials with different adherence, we could estimate the effect that would have been observed if all participants had fully adhered to the value of treatment A (1 or 0) originally assigned to them. We will refer to such an effect as the average causal effect of treatment A on the outcome Y or, for brevity, the effect of treatment. The effect of treatment A appears to be an attractive choice to summarize the findings of RCTs with substantial nonadherence because it will be the same in two trials that differ only in their adherence pattern. However, estimating the magnitude of the effect of treatment A without bias requires assumptions grounded on expert knowledge (see below). No matter how sophisticated the statistical analysis, the estimate of the effect of A will be biased if one makes incorrect assumptions.

The effect of assigned treatment may be misleading

An ITT comparison is simple and therefore very attractive [4]. It bypasses the need for assumptions regarding adherence and dose-response by focusing on estimating the effect of assigned treatment Z rather than the effect of treatment A. However, there is a price to pay for this simplicity, as reviewed in this section.

We start by considering placebo-controlled double-blind RCTs. It is well known that if treatment A has a null effect on the outcome, then both the effect of assigned treatment Z and the effect of treatment A will be null. This is a key advantage of the ITT analysis: it correctly estimates the effect of treatment A under the null, regardless of the adherence pattern. It is also well known that if treatment A has a non-null effect (that is, either increases or decreases the risk of the outcome) and some participants do not adhere to their assigned treatment, then the effect of assigned treatment Z will be closer to the null than the actual effect of treatment A [3].
This bias toward the null is due to contamination of the treatment groups: some subjects assigned to treatment (Z = 1) may not take it (A = 0), whereas some subjects assigned to placebo (Z = 0) may find a way to take treatment (A = 1). As long as the proportion of patients who end up taking treatment (A = 1) is greater in the group assigned to treatment (Z = 1) than in the group assigned to placebo (Z = 0), the effect of assigned treatment Z will be in between the effect of treatment A and the null value.

The practical effect of this bias varies depending on the goal of the trial. Some placebo-controlled RCTs are designed to quantify a treatment’s beneficial effects – for example, a trial to determine whether sildenafil reduces the risk of erectile dysfunction. An ITT analysis of these trials is said to be ‘conservative’ because the effect of assigned treatment Z is biased toward the null. That is, if an ITT analysis finds a beneficial effect for treatment assignment Z, then the true beneficial effect of treatment A must be even greater. The makers of treatment A have a great incentive to design a high-quality study with high levels of adherence. Otherwise, a small beneficial effect of treatment might be missed by the ITT analysis.

Other trials are designed to quantify a treatment’s harmful effects – for example, a trial to determine whether sildenafil increases the risk of cardiovascular disease. An ITT analysis of these trials is anticonservative precisely because the effect of assigned
treatment Z is biased toward the null. That is, if an ITT analysis fails to find a toxic effect, there is no guarantee that treatment A is safe. A trial designed to quantify harm and whose protocol foresees only an ITT analysis could be referred to as a ‘randomized cynical trial.’

Now let us consider double-blind RCTs that compare two active treatments. These trials are often designed to show that a new treatment (A = 1) is not inferior to a reference treatment (A = 0) in terms of either benefits or harms. An example of a noninferiority trial would be one that compares the reduction in blood glucose between a new inhaled insulin and regular injectable insulin. The protocol of the trial would specify a noninferiority margin, that is, the maximum average difference in blood glucose that is considered equivalent (e.g., 10 mg/dL). Using an ITT comparison, the new insulin (A = 1) will be declared not inferior to classical insulin (A = 0) if the average reduction in blood glucose in the group assigned to the new treatment (Z = 1) is within 10 mg/dL of the average reduction in blood glucose in the group assigned to the reference treatment (Z = 0), plus/minus random variability. Such an ITT analysis may be misleading in the presence of imperfect adherence. To see this, consider the following scenario.

Scenario 1

The new treatment A = 1 is actually inferior to the reference treatment A = 0; for example, the average reduction in blood glucose is 10 mg/dL under treatment A = 1 and 22 mg/dL under treatment A = 0. The type and magnitude of adherence is equal in the two groups; for example, 30% of subjects in each group decided not to take insulin. As a result, the average reduction is, say, 7 mg/dL in the group assigned to the new treatment (Z = 1) and 15 mg/dL in the group assigned to the reference treatment (Z = 0).
An ITT analysis, which is biased toward the null in this scenario, may incorrectly suggest that the new treatment A = 1 is not inferior to the reference treatment A = 0.

Other double-blind RCTs with an active comparator are designed to show that a new treatment (A = 1) is superior to the reference treatment (A = 0) in terms of either benefits or harms. An example of a superiority trial would be one that compares the risk of heart disease between two antiretroviral regimens. Using an ITT comparison, the new regimen (A = 1) will be declared superior to the reference regimen (A = 0) if the heart disease risk is lower in the group assigned to the new regimen (Z = 1) than in the group assigned to the reference regimen (Z = 0), plus/minus random variability. Again, such an ITT analysis may be misleading in the presence of imperfect adherence. Consider the following scenario.

Scenario 2

The new treatment A = 1 is actually equivalent to the reference treatment A = 0; for example, the 5-year risk of heart disease is 3% under either treatment A = 1 or treatment A = 0, and the risk in the absence of either treatment is 1%. The type or magnitude of adherence differs between the two groups; for example, 50% of subjects assigned to the new regimen and 10% of those assigned to the reference regimen decided not to take their treatment because of minor side effects. As a result, the risk is, say, 2% in the group assigned to the new regimen (Z = 1) and 2.8% in the group assigned to the reference regimen (Z = 0). An ITT analysis, which is biased away from the null in this scenario, may incorrectly suggest that treatment A = 1 is superior to treatment A = 0.

An ITT analysis of RCTs with an active comparator may result in effect estimates that are biased toward (Scenario 1) or away from (Scenario 2) the null. In other words, the magnitude of the effect of assigned treatment Z may be greater than or less than the effect of treatment A.
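The arithmetic behind both scenarios is a simple mixture of adherers and nonadherers. A short sketch, using the scenario numbers above and the simplifying assumption that nonadherers get the no-treatment outcome, reproduces the arm-level figures:

```python
def itt_arm_outcome(adherence, outcome_treated, outcome_untreated):
    """Expected ITT outcome in one arm: a weighted average of adherers
    (who take the assigned treatment) and nonadherers (assumed here to
    take no treatment at all)."""
    return adherence * outcome_treated + (1 - adherence) * outcome_untreated

# Scenario 1: true glucose reductions are 10 vs 22 mg/dL, with 30%
# nonadherence in both arms. Both ITT means shrink toward zero
# (7 vs about 15 mg/dL), masking the new treatment's inferiority.
new_arm = itt_arm_outcome(0.70, 10, 0)
ref_arm = itt_arm_outcome(0.70, 22, 0)

# Scenario 2: 3% heart-disease risk under either treatment (the two
# treatments are equivalent; untreated risk is 1%), but adherence
# differs (50% vs 90%). The ITT risks falsely favor the new regimen.
new_risk = itt_arm_outcome(0.50, 0.03, 0.01)   # 0.020
ref_risk = itt_arm_outcome(0.90, 0.03, 0.01)   # 0.028

print(new_arm, ref_arm, new_risk, ref_risk)
```

Equal nonadherence shrinks both arms proportionally (bias toward the null); unequal nonadherence shifts the arms by different amounts (bias in either direction).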
The direction of the bias depends on the proportion of subjects who do not adhere to treatment in each group, and on the reasons for nonadherence.

Yet a common justification for ITT comparisons is the following: Adherence is not perfect in clinical practice. Therefore, clinicians may be more interested in consistently estimating the effect of assigned treatment Z, which already incorporates the impact of nonadherence, than the effect of treatment A in the absence of nonadherence. That is, the effect of assigned treatment Z reflects a treatment’s clinical effectiveness and therefore should be privileged over the effect of treatment A. In the next section, we summarize the reasons why this is not necessarily true.

The effect of assigned treatment is not the same as the effectiveness of treatment

Effectiveness is usually defined as ‘how well a treatment works in everyday practice,’ and efficacy as ‘how well a treatment works under perfect adherence and highly controlled conditions.’ Thus, the effect of assigned treatment Z in postapproval settings is often equated with effectiveness, whereas the effect of treatment Z in preapproval settings (which is close to the effect of A when adherence is high) is often
equated with efficacy. There is, however, no guarantee that the effect of assigned treatment Z matches the treatment’s effectiveness in routine medical practice. A discrepancy may arise for multiple reasons, including differences in patient characteristics, monitoring, or blinding, as we now briefly review.

The eligibility criteria for participants in RCTs are shaped by methodologic and ethical considerations. To maximize adherence to the protocol, many RCTs exclude individuals with severe disease, comorbidities, or polypharmacy. To minimize risks to vulnerable populations, many RCTs exclude pregnant women, children, or institutionalized populations. As a consequence, the characteristics of participants in an RCT may be, on average, different from those of the individuals who will receive the treatment in clinical practice. If the effect of the treatment under study varies by those characteristics (e.g., treatment is more effective for those using certain concomitant treatments), then the effect of assigned treatment Z in the trial will differ from the treatment’s effectiveness in clinical practice.

Patients in RCTs are often more intensely monitored than patients in clinical practice. This greater intensity of monitoring may lead to earlier detection of problems (i.e., toxicity, inadequate dosing) in RCTs compared with clinical practice. Thus, a treatment’s effectiveness may be greater in RCTs because the earlier detection of problems results in more timely therapeutic modifications, including modifications in treatment dosing, switching to less toxic treatments, or addition of concomitant treatments.

Blinding is a useful approach to prevent bias from differential ascertainment of the outcome [6]. There is, however, an inherent contradiction in conducting a double-blind study while arguing that the goal of the study is estimating the effectiveness in routine medical practice.
In real life, both patients and doctors are aware of the assigned treatment. A true effectiveness measure should incorporate the effects of assignment awareness (e.g., behavioral changes) that are eliminated in ITT comparisons of double-blind RCTs.

Some RCTs, commonly referred to as pragmatic trials [7–9], are specifically designed to guide decisions in clinical practice. Compared with highly controlled trials, pragmatic trials include less selected participants and are conducted under more realistic conditions, which may result in lower adherence to the assigned treatment. It is often argued that an ITT analysis of pragmatic trials is particularly appropriate to measure the treatment’s effectiveness, and thus that pragmatic trials are the best design for comparative effectiveness research. However, this argument raises at least two concerns.

First, the effect of assigned treatment Z is influenced by the adherence patterns observed in the trial, regardless of whether the trial is a pragmatic one. Compared with clinical practice, trial participants may have a greater adherence because they are closely monitored (see above), or simply because they are the selected group who received informed consent and accepted to participate. Patients outside the trial may have a greater adherence after they learn, perhaps based on the trial’s findings, that treatment is beneficial. Therefore, the effect of assigned treatment estimated by an ITT analysis may under- or overestimate the effectiveness of the treatment.

Second, the effect of assigned treatment Z is inadequate for patients who are interested in initiating and fully adhering to a treatment A that has been shown to be efficacious in previous RCTs. In order to make the best informed decision, these patients would like to know the effect of treatment A rather than an effect of assigned treatment Z, which is contaminated by other patients’ nonadherence [5].
For example, to decide whether to use a certain contraception method, a couple may want to know the failure rate if they use the method as indicated, rather than the failure rate in a population that included a substantial proportion of nonadherers. Therefore, the effect of assigned treatment Z may be an insufficient summary measure of the trial data, even if it actually measures the treatment’s effectiveness.

In summary, the effect of assigned treatment Z – estimated via an ITT comparison – may not be a valid measure of the effectiveness of treatment A in clinical practice. And even if it were, effectiveness is not always the most interesting effect measure. These considerations, together with the inappropriateness of ITT comparisons for safety and noninferiority trials, make it necessary to expand the reporting of results from RCTs beyond ITT analyses. The next section reviews other analytic approaches for data from RCTs.

Conventional ‘as treated’ and ‘per protocol’ analyses

Two common attempts to estimate the effect of treatment A are ‘as treated’ and ‘per protocol’ comparisons. Neither is generally valid.

An ‘as treated’ analysis classifies RCT participants according to the treatment that they took (either A = 1 or A = 0) rather than according to the treatment that they were assigned to (either Z = 1 or Z = 0). Then an ‘as treated’ analysis compares the risk (or the mean) of the outcome Y among those who took treatment (A = 1) with that among those who did not take treatment (A = 0), regardless of their treatment assignment Z. That is, an ‘as treated’ comparison ignores that the data come from an
RCT and rather treats them as coming from an observational study. As a result, an ‘as treated’ comparison will be confounded if the reasons that moved participants to take treatment were associated with prognostic factors. The causal diagram in Figure 1 represents the confounding as a noncausal association between A and Y when there exist prognostic factors L that also affect the decision to take treatment A (U is an unmeasured common cause of L and Y). Confounding arises in an ‘as treated’ analysis when not all prognostic factors L are appropriately measured and adjusted for.

A ‘per protocol’ analysis – also referred to as an ‘on treatment’ analysis – only includes individuals who adhered to the clinical trial instructions as specified in the study protocol. The subset of trial participants included in a ‘per protocol’ analysis, referred to as the per protocol population, includes only participants with A equal to Z: those who were assigned to treatment (Z = 1) and took it (A = 1), and those who were not assigned to treatment (Z = 0) and did not take it (A = 0). A ‘per protocol’ analysis compares the risk (or the mean) of the outcome Y among those who were assigned to treatment (Z = 1) with that among those who were not assigned to treatment (Z = 0) in the per protocol population. That is, a ‘per protocol’ analysis is an ITT analysis in the per protocol population. This contrast will be affected by selection bias [10] if the reasons that moved participants to adhere to their assigned treatment were associated with prognostic factors L. The causal diagram in Figure 2 includes S as an indicator of selection into the ‘per protocol’ population. The selection indicator S is fully determined by the values of Z and A, that is, S = 1 when A = Z, and S = 0 otherwise.
The selection bias is a noncausal association between Z and Y that arises when the analysis is restricted to the ‘per protocol’ population (S = 1) and not all prognostic factors L are appropriately measured and adjusted for.

As an example of biased ‘as treated’ and ‘per protocol’ estimates of the effect of treatment A, consider the following scenario.

Scenario 3

An RCT assigns men to either colonoscopy (Z = 1) or no colonoscopy (Z = 0). Suppose that undergoing a colonoscopy (A = 1) does not affect the 10-year risk of death from colon cancer (Y) compared with not undergoing a colonoscopy (A = 0), that is, the effect of treatment A is null. Further suppose that, among men assigned to Z = 1, those with a family history of colon cancer (L = 1) are more likely to adhere to their assigned treatment and undergo the colonoscopy (A = 1). Even though A has a null effect, an ‘as treated’ analysis will find that men undergoing colonoscopy (A = 1) are more likely to die from colon cancer because they include a greater proportion of men with a predisposition to colon cancer than the others (A = 0). This is the situation depicted in Figure 1. Similarly, a ‘per protocol’ analysis will find a greater risk of death from colon cancer in the group Z = 1 than in the group Z = 0 because the per protocol restriction A = Z overloads the group assigned to colonoscopy with men with a family history of colon cancer. This is the situation depicted in Figure 2.

The confounding bias in the ‘as treated’ analysis and the selection bias in the ‘per protocol’ analysis can go in either direction – for example, suppose that L represents healthy diet rather than family history of colon cancer. In general, the direction of the bias is hard to predict because it is possible that the proportions of people with a family history, healthy diet, and any other prognostic factor will vary between the groups A = 1 and A = 0 conditional on Z.

[Figure 1. Simplified causal diagram for a randomized clinical trial with assigned treatment Z, received treatment A, and outcome Y. U represents the unmeasured common causes of A and Y. An ‘as treated’ analysis of the A-Y association will be confounded unless all prognostic factors L are adjusted for.]

[Figure 2. Simplified causal diagram for a randomized clinical trial with assigned treatment Z, received treatment A, and outcome Y. U represents the unmeasured common causes of A and Y, and S an indicator for selection into the ‘per protocol’ population. The Z-Y association in the ‘per protocol’ population (a restriction represented by the box around S) will be affected by selection bias unless all prognostic factors L are adjusted for.]

In summary, ‘as treated’ and ‘per protocol’ analyses transform RCTs into observational studies for all practical purposes. The estimates from these analyses
In summary, 'as treated' and 'per protocol' analyses transform RCTs into observational studies for all practical purposes.

Figure 1. Simplified causal diagram for a randomized clinical trial with assigned treatment Z, received treatment A, and outcome Y. U represents the unmeasured common causes of A and Y. An 'as treated' analysis of the A-Y association will be confounded unless all prognostic factors L are adjusted for.

Figure 2. Simplified causal diagram for a randomized clinical trial with assigned treatment Z, received treatment A, and outcome Y. U represents the unmeasured common causes of A and Y, and S an indicator for selection into the 'per protocol' population. The Z-Y association in the 'per protocol' population (a restriction represented by the box around S) will be affected by selection bias unless all prognostic factors L are adjusted for.

MA Hernán and S Hernández-Díaz. Clinical Trials 2012; 9: 48–55. http://ctj.sagepub.com

The estimates from these analyses
can only be interpreted as the effect of treatment A if the analysis is appropriately adjusted for the confounders L. If the intended analysis of the RCT is 'as treated' or 'per protocol,' then the protocol of the trial should describe the potential confounders and how they will be measured, just like the protocol of an observational study would do.

More general 'as treated' and 'per protocol' analyses to estimate the effect of treatment

So far we have made the simplifying assumption that adherence is all or nothing. But in reality, RCT participants may adhere to their assigned treatment intermittently. For example, they may take their assigned treatment for 2 months, discontinue it for the next 3 months, and then resume it until the end of the study. Or subjects may take treatment constantly but at a lower dose than assigned. For example, they may take only one pill per day when they should take two. Treatment A is generally a time-varying variable – each day you may take it or not take it – rather than a time-fixed variable – you either always take it or never take it during the follow-up.

An 'as treated' analysis with a time-varying treatment A usually involves some sort of dose-response model. A 'per protocol' analysis with a time-varying treatment A includes all RCT participants but censors them if/when they deviate from their assigned treatment. The censoring usually occurs at a fixed time after nonadherence, say, 6 months. The per protocol population in this variation refers to the adherent person-time rather than to the adherent persons. Because previous sections were only concerned with introducing some basic problems of ITT, 'as treated,' and 'per protocol' analyses, we considered A as a time-fixed variable. However, this simplification may be unrealistic and misleading in practice.
When treatment A is truly time-varying, (i) the effect of treatment needs to be redefined and (ii) appropriate adjustment for the measured confounders L cannot generally be achieved by using conventional methods such as stratification, regression, or matching.

The definition of the average causal effect of a time-fixed treatment involves the contrast between two clinical regimes. For example, we defined the causal effect of a time-fixed treatment as a contrast between the average outcome that would be observed if all participants took treatment A = 1 versus treatment A = 0. The two regimes are 'taking treatment A = 1' and 'taking treatment A = 0'. The definition of the causal effect of a time-varying treatment also involves a contrast between two clinical regimes. For example, we can define the causal effect of a time-varying treatment as a contrast between the average outcome that would be observed if all participants had continuous treatment with A = 1 versus continuous treatment with A = 0 during the entire follow-up. We sometimes refer to this causal effect as the effect of continuous treatment.

When the treatment is time-varying, so are the confounders. For example, the probability of taking antiretroviral therapy increases in the presence of symptoms of HIV disease. Both therapy and confounders evolve together during the follow-up. When the time-varying confounders are affected by previous treatment – for example, antiretroviral therapy use reduces the frequency of symptoms – conventional methods cannot appropriately adjust for the measured confounders [10]. Rather, inverse probability (IP) weighting or g-estimation are generally needed for confounding adjustment in 'as treated' and 'per protocol' analyses involving time-varying treatments [11–13]. Both IP weighting and g-estimation require that time-varying confounders and time-varying treatments are measured during the entire follow-up.
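The core idea of IP weighting can be illustrated in the simplest (point-treatment) setting: each subject is weighted by the inverse of the probability of the treatment they actually received, given their confounders, which creates a pseudo-population in which treatment is independent of L. The sketch below is a toy illustration with hypothetical probabilities and assumes the true treatment probabilities given L are known; in practice they must be estimated, and in the time-varying case the weight is a product of such inverse probabilities over follow-up.

```python
import random

random.seed(1)


def ip_weighted_risk_difference(n=200_000):
    # Hypothetical data-generating process: L confounds A and Y; A itself is null.
    num = {0: 0.0, 1: 0.0}   # weighted event counts by treatment
    den = {0: 0.0, 1: 0.0}   # sum of weights by treatment
    for _ in range(n):
        l = random.random() < 0.2
        p_a = 0.8 if l else 0.3                       # P(A = 1 | L), assumed known
        a = random.random() < p_a
        y = random.random() < (0.10 if l else 0.02)   # outcome depends only on L
        w = 1.0 / (p_a if a else 1.0 - p_a)           # inverse probability weight
        num[a] += w * y
        den[a] += w
    return num[1] / den[1] - num[0] / den[0]


print(ip_weighted_risk_difference())  # ≈ 0: confounding by L is removed
```

The unweighted treated-vs-untreated contrast in these data would be biased upward (treated subjects are enriched for L = 1), while the IP weighted contrast recovers the null effect.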
Thus, if planning to use these adjustment methods, the protocol of the trial should describe the potential confounders and how they will be measured. Unfortunately, as in any observational study, there is no guarantee that all confounders will be identified and correctly measured, which may result in biased estimates of the effect of continuous treatment in 'as treated' and 'per protocol' analyses involving time-varying treatments.

An alternative adjustment method is instrumental variable (IV) estimation, a particular form of g-estimation that does not require measurement of any confounders [14–17]. In double-blind RCTs, IV estimation eliminates confounding for the effect of continuous treatment A by exploiting the fact that the initial treatment assignment Z was random. Thus, if the time-varying treatment A is measured and a correctly specified structural model is used, IV estimation adjusts for confounding without measuring, or even knowing, the confounders.

A detailed description of IP weighting, g-estimation, and IV estimation is beyond the scope of this paper. Toh and Hernán review these methods for RCTs [18]. IP weighting and g-estimation can also be used to estimate the effect of treatment regimes that may be more clinically relevant than the effect of continuous treatment [19,20]. For example, it may be more interesting to estimate the effect of treatment taken continuously unless toxic effects or contraindications arise.
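The simplest instance of IV estimation in an RCT is the Wald estimator for a time-fixed treatment: the ITT contrast divided by the difference in treatment uptake between arms, which exploits randomization of Z exactly as described above. The sketch below is a toy illustration under a constant additive treatment effect (a very simple structural model); the effect sizes, adherence probabilities, and the unmeasured factor U are all hypothetical.

```python
import random

random.seed(2)


def wald_iv(n=500_000):
    sum_y = {0: 0.0, 1: 0.0}
    sum_a = {0: 0.0, 1: 0.0}
    cnt = {0: 0, 1: 0}
    for _ in range(n):
        z = random.random() < 0.5              # randomized assignment (the instrument)
        u = random.random() < 0.3              # unmeasured prognostic factor
        # adherence depends on U, confounding the A-Y relation; no access when Z = 0
        p_a = (0.5 if u else 0.9) if z else 0.0
        a = random.random() < p_a
        # constant additive effect of A on risk: +0.05; U independently adds +0.10
        y = random.random() < (0.02 + 0.05 * a + 0.10 * u)
        cnt[z] += 1
        sum_y[z] += y
        sum_a[z] += a
    itt = sum_y[1] / cnt[1] - sum_y[0] / cnt[0]          # effect of assignment
    uptake = sum_a[1] / cnt[1] - sum_a[0] / cnt[0]       # difference in adherence
    return itt / uptake                                  # Wald (IV) estimate


print(wald_iv())  # ≈ 0.05, the true effect of received treatment
```

A naive 'as treated' comparison in these data would be biased by U, and the ITT estimate is diluted toward zero by nonadherence; the IV estimate recovers the treatment effect without ever measuring U.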
Discussion

An ITT analysis of RCTs is appealing for the same reason it may be appalling: simplicity. As described above, ITT estimates may be inadequate for the assessment of comparative effectiveness or safety. In the presence of nonadherence, the ITT effect is a biased estimate of treatment effects such as the effect of continuous treatment. This bias can be corrected in an appropriately adjusted 'as treated' analysis via IP weighting, g-estimation, or IV estimation. However, IP weighting and g-estimation require untestable assumptions similar to those made for causal inference from observational data. IV estimation generally requires a dose-response model, and its validity is questionable for nonblinded RCTs.

The ITT approach is also problematic if a large proportion of participants drop out or are otherwise lost to follow-up, or if the outcomes are incompletely ascertained among those completing the study. In these studies, an ITT comparison cannot be conducted because the value of the outcome is missing for some individuals. To circumvent this problem, the ITT analysis is often replaced by a pseudo-ITT analysis that is restricted to subjects with complete data or in which the last observation is carried forward. These pseudo-ITT analyses may be affected by selection bias in either direction. Adjusting for this bias is possible via IP weighting if information on the time-varying determinants of loss to follow-up is available, but again, the validity of the adjustment relies on untestable assumptions about the unmeasured variables [18].

RCTs with long follow-up periods, as expected in many comparative effectiveness research settings, are especially susceptible to bias due to nonadherence and loss to follow-up. As these problems accumulate over time, the RCT starts to resemble a prospective observational study, and the ITT analysis yields an increasingly biased estimate of the effect of continuous treatment.
Consider, for example, a Women's Health Initiative randomized trial that assigned postmenopausal women to either estrogen plus progestin hormone therapy or placebo [21]. About 40% of women had stopped taking at least 80% of their assigned treatment by the 6th year of follow-up. The ITT hazard ratio of breast cancer was 1.25 (95% CI: 1.01, 1.54) for hormone therapy versus placebo. The IP weighted hazard ratio of breast cancer was 1.68 (1.24, 2.28) for 8 years of continuous hormone therapy versus no hormone therapy [22]. These findings suggest that the effect of continuous treatment was more than twofold greater than the effect of assigned treatment. Of course, neither of these estimates reflects the long-term effect of hormone therapy in clinical practice (e.g., the adherence to hormone therapy was much higher in the trial than in the real world).

When analyzing data from RCTs, the question is not whether assumptions are made but rather which assumptions are made. In an RCT with incomplete follow-up or outcome ascertainment, a pseudo-ITT analysis assumes that the loss to follow-up occurs completely at random, whereas an IP weighted ITT analysis makes less strong assumptions (e.g., loss to follow-up occurs at random conditional on the measured covariates). In an RCT with incomplete adherence, an ITT analysis shifts the burden of assessing the actual magnitude of the effect from the data analysts to the clinicians and other decision makers, who will need to make assumptions about the potential bias introduced by lack of adherence. Supplementing the ITT effects with 'as treated' or 'per protocol' effects can help decision makers [23], but only if a reasonable attempt is made to appropriately adjust for confounding and selection bias.
In summary, we recommend that all RCTs with substantial lack of adherence or loss to follow-up be analyzed using different methods, including an ITT analysis to estimate the effect of assigned treatment, and appropriately adjusted 'per protocol' and 'as treated' analyses (i.e., via IP weighting or g-estimation) to estimate the effect of received treatment. Each approach has relative advantages and disadvantages, and depends on a different combination of assumptions [18]. To implement this recommendation, RCT protocols should include a more sophisticated statistical analysis plan, as well as plans to measure adherence and other postrandomization variables. This added complexity is necessary to take full advantage of the substantial societal resources that are invested in RCTs.

Acknowledgement

We thank Goodarz Danaei for his comments on an earlier version of this manuscript.

Funding

This study was funded by National Institutes of Health grants R01 HL080644-01 and R01 HD056940.

References

1. Luce BR, Kramer JM, Goodman SN, et al. Rethinking randomized clinical trials for comparative effectiveness research: the need for transformational change. Ann Intern Med 2009; 151: 206–09.
2. Food and Drug Administration. International Conference on Harmonisation; Guidance on Statistical
Principles for Clinical Trials. Federal Register 1998; 63: 49583–98.
3. Rosenberger WF, Lachin JM. Randomization in Clinical Trials: Theory and Practice. Wiley-Interscience, New York, NY, 2002.
4. Piantadosi S. Clinical Trials: A Methodologic Perspective (2nd edn). Wiley-Interscience, Hoboken, NJ, 2005.
5. Sheiner LB, Rubin DB. Intention-to-treat analysis and the goals of clinical trials. Clin Pharmacol Ther 1995; 57: 6–15.
6. Psaty BM, Prentice RL. Minimizing bias in randomized trials: the importance of blinding. JAMA 2010; 304: 793–94.
7. McMahon AD. Study control, violators, inclusion criteria and defining explanatory and pragmatic trials. Stat Med 2002; 21: 1365–76.
8. Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in therapeutical trials. J Chronic Dis 1967; 20: 637–48.
9. Tunis SR, Stryer DB, Clancy CM. Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA 2003; 290: 1624–32.
10. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004; 15: 615–25.
11. Robins JM. Correcting for non-compliance in randomized trials using structural nested mean models. Commun Stat 1994; 23: 2379–412.
12. Robins JM. Correction for non-compliance in equivalence trials. Stat Med 1998; 17: 269–302.
13. Robins JM, Finkelstein D. Correcting for non-compliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics 2000; 56: 779–88.
14. Hernán MA, Robins JM. Instruments for causal inference: an epidemiologist's dream? Epidemiology 2006; 17: 360–72.
15. Ten Have TR, Normand SL, Marcus SM, et al. Intent-to-treat vs. non-intent-to-treat analyses under treatment non-adherence in mental health randomized trials. Psychiatr Ann 2008; 38: 772–83.
16. Cole SR, Chu H. Effect of acyclovir on herpetic ocular recurrence using a structural nested model.
Contemp Clin Trials 2005; 26: 300–10.
17. Mark SD, Robins JM. A method for the analysis of randomized trials with compliance information: an application to the Multiple Risk Factor Intervention Trial. Contr Clin Trials 1993; 14: 79–97.
18. Toh S, Hernán MA. Causal inference from longitudinal studies with baseline randomization. Int J Biostat 2008; 4: Article 22.
19. Hernán MA, Lanoy E, Costagliola D, Robins JM. Comparison of dynamic treatment regimes via inverse probability weighting. Basic Clin Pharmacol Toxicol 2006; 98: 237–42.
20. Cain LE, Robins JM, Lanoy E, et al. When to start treatment? A systematic approach to the comparison of dynamic regimes using observational data. Int J Biostat 2010; 6: Article 18.
21. Writing Group for the Women's Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women's Health Initiative randomized controlled trial. JAMA 2002; 288: 321–33.
22. Toh S, Hernández-Díaz S, Logan R, et al. Estimating absolute risks in the presence of nonadherence: an application to a follow-up study with baseline randomization. Epidemiology 2010; 21: 528–39.
23. Thorpe KE, Zwarenstein M, Oxman AD, et al. A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. J Clin Epidemiol 2009; 62: 464–75.
VIEWPOINT

Prespecified Falsification End Points: Can They Validate True Observational Associations?

Vinay Prasad, MD; Anupam B. Jena, MD, PhD

As observational studies have increased in number—fueled by a boom in electronic recordkeeping and the ease with which observational analyses of large databases can be performed—so too have failures to confirm initial research findings.1 Several solutions to the problem of incorrect observational results have been suggested,1,2 emphasizing the importance of a record not only of significant findings but of all analyses conducted.2

An important and increasingly familiar type of observational study is the identification of rare adverse effects (defined by organizations such as the Council for International Organizations of Medical Sciences as occurring among fewer than 1 per 1000 individuals) from population data. Examples of these studies include whether macrolide antibiotics such as azithromycin are associated with higher rates of sudden cardiac death3; whether proton pump inhibitors (PPIs) are associated with higher rates of pneumonia4; or whether bisphosphonates are associated with an increased risk of atypical (subtrochanteric) femur fractures.5 Rare adverse events, such as these examples, occur so infrequently that almost by definition they may not be identified in randomized controlled trials (RCTs). Postmarketing data from thousands of patients are required to identify such low-frequency events. In fact, the ability to conduct postmarketing surveillance of large databases has been heralded as a vital step in ensuring the safe dissemination of medical treatments after clinical trials (phase 4) for precisely this reason. Few dispute the importance of observational studies for capturing rare adverse events.
For instance, in early studies of whether bisphosphonate use increases the rate of atypical femur fractures, pooled analysis of RCTs demonstrated no elevated risk.6 However, these data were based on a limited sample of 14 000 patients with only 284 hip or femur fractures and only 12 atypical fracture events over just more than 3.5 years of follow-up. In contrast, later observational studies addressing the same question were able to leverage much larger and more comprehensive data. One analysis that examined 205 466 women who took bisphosphonates for an average of 4 years identified more than 10 000 hip or femur fractures and 716 atypical fractures.5 This analysis demonstrated an increased risk of atypical fractures associated with bisphosphonate use and was validated by another large population-based study.

However, analyses in large data sets are not necessarily correct simply because they are larger. Control groups might not eliminate potential confounders, or many varying definitions of exposure to the agent may be tested (alternative thresholds for dose or duration of a drug)—a form of multiple-hypothesis testing.2 Just as small, true signals can be identified by these analyses, so too can small, erroneous associations. For instance, several observational studies have found an association between use of PPIs and development of pneumonia, and it is biologically plausible that elevated gastric pH may engender bacterial colonization.4 However, it is also possible that even after statistical adjustment for known comorbid conditions, PPI users may have other unobserved health characteristics (such as poor health literacy or adherence) that could increase their rates of pneumonia, apart from use of the drug. Alternatively, physicians who are more likely to prescribe PPIs to their patients also may be more likely to diagnose their patients with pneumonia in the appropriate clinical setting.
Both mechanisms would suggest that the observational association between PPI use and pneumonia is confounded. In light of the increasing prevalence of such studies and their importance in shaping clinical decisions, it is important to know that the associations identified are true rather than spurious correlations. Prespecified falsification hypotheses may provide an intuitive and useful safeguard when observational data are used to find rare harms.

Author Affiliations: Medical Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, Maryland (Dr Prasad); Department of Health Care Policy, Harvard Medical School, and Massachusetts General Hospital, Boston (Dr Jena); and National Bureau of Economic Research, Cambridge, Massachusetts (Dr Jena).
Corresponding Author: Anupam B. Jena, MD, PhD, Department of Health Care Policy, Harvard Medical School, 180 Longwood Ave, Boston, MA 02115 (jena@hcp.med.harvard.edu).
©2013 American Medical Association. All rights reserved. JAMA, January 16, 2013—Vol 309, No. 3

A falsification hypothesis is a claim, distinct from the one being tested, that researchers believe is highly unlikely to be causally related to the intervention in question.7 For instance, a falsification hypothesis may be that PPI use increases the rate of soft tissue infection or myocardial infarction. A confirmed falsification test—in this case, a positive association between PPI use and risks of these conditions—
would suggest that an association between PPI use and pneumonia initially suspected to be causal is perhaps confounded by unobserved patient or physician characteristics. Ideally, several prespecified falsification hypotheses can be tested and, if found not to hold, can support the main study association of interest. In the case of PPIs, falsification analyses have shown that many improbable conditions—chest pain, urinary tract infections, osteoarthritis, rheumatoid arthritis flares, and deep venous thrombosis—are also linked to PPI use,4 making the claim of an increased risk of pneumonia related to use of the drug unlikely.

Another example of falsification analysis applied to observational associations involves the reported relationship of social networks with the spread of complex phenomena such as smoking, obesity, and depression. In social network studies, persons with social ties are shown to be more likely to gain or lose weight, or to start or stop smoking, at similar time points than 2 random persons in the same group. Several studies supported these claims; however, other studies have shown that even implausible factors—acne, height, and headaches—may also exhibit 'network effects.'8

Falsification analysis can be operationalized by asking investigators to specify implausible hypotheses up front and then testing those claims using statistical methods similar to those used in the primary analysis. Falsification could be required both for studies that aim to show a rare harm of a particular medical intervention as well as for studies that aim to show deleterious interactions between medications. For instance, in evaluating whether concomitant use of clopidogrel and PPIs is associated with decreased effectiveness of the former drug and worsens cardiovascular outcomes, does the use of PPIs also implausibly diminish the effect of antihypertensive agents or metformin?
Prespecifying falsification end points and choosing them appropriately is important for avoiding the problem of multiple hypothesis testing. For instance, if many falsification hypotheses are tested to support a particular observational association, a few falsification outcomes will pass the falsification test—ie, will not be associated with the drug or intervention of interest—whereas other falsification tests may fail. If the former are selectively reported, some associations may be mistakenly validated. This issue cannot be addressed by statistical testing for multiple hypotheses alone because selective reporting may still occur. Instead, prespecifying falsification outcomes and choosing outcomes that are common may mitigate concerns about post hoc data mining. In the case of PPIs and risk of pneumonia, falsification analyses used prevalent ambulatory complaints such as chest pain, urinary tract infections, and osteoarthritis.4

Observational studies of rare effects of a drug may be further validated by verification analyses that demonstrate the presence of known adverse effects of a drug in the data set being studied. For instance, an observational study suggesting an unknown adverse effect of clopidogrel (for example, seizures) should also be able to demonstrate the presence of known adverse effects such as gastrointestinal hemorrhage associated with clopidogrel use. The inability of a study to verify known adverse effects should raise questions about selection in the study population.

Although no published recommendations exist, standardized falsification analyses with 3 to 4 prespecified or highly prevalent disease outcomes may help to strengthen the validity of observational studies, as could inclusion of verification analyses.
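The logic of a falsification analysis can be sketched numerically. In the toy scenario below (all names, outcomes, and probabilities are hypothetical, not taken from the studies cited above), an unmeasured factor, 'frailty', makes patients both more likely to receive the drug and more likely to develop every outcome. The same confounding mechanism that inflates the primary association therefore also inflates the risk ratios for the prespecified implausible end points, which is exactly the signal a falsification test is meant to catch.

```python
import random

random.seed(3)


def risk_ratio(expo_events, expo_n, ctrl_events, ctrl_n):
    return (expo_events / expo_n) / (ctrl_events / ctrl_n)


def falsification_analysis(n=100_000):
    outcomes = ["pneumonia",                               # primary association of interest
                "chest pain", "urinary tract infection",   # prespecified falsification
                "osteoarthritis"]                          # end points (implausible harms)
    events = {o: {True: 0, False: 0} for o in outcomes}
    n_group = {True: 0, False: 0}
    for _ in range(n):
        frail = random.random() < 0.3
        exposed = random.random() < (0.6 if frail else 0.2)  # frail patients get the drug more
        n_group[exposed] += 1
        for o in outcomes:
            p = 0.02 * (3.0 if frail else 1.0)               # frailty triples every outcome
            if random.random() < p:
                events[o][exposed] += 1
    return {o: risk_ratio(events[o][True], n_group[True],
                          events[o][False], n_group[False]) for o in outcomes}


for outcome, rr in falsification_analysis().items():
    print(f"{outcome}: RR = {rr:.2f}")
# Elevated risk ratios for the implausible end points suggest the primary
# association is confounded rather than causal.
```

Because the drug has no effect on any outcome here, an elevated risk ratio for pneumonia alone would be misread as a harm; the equally elevated ratios for the falsification end points reveal the shared confounding instead.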
Information on whether falsification and verification end points were used in a study should be included in a registry for observational studies that others have suggested.2

Prespecified falsification hypotheses can improve the validity of studies finding rare harms when researchers cannot determine answers to these questions from RCTs, either because of limited sample sizes or limited follow-up. However, falsification analysis is not a perfect tool for validating the associations in observational studies, nor is it intended to be. The absence of implausible falsification hypotheses does not imply that the primary association of interest is causal, nor does their presence guarantee that real relations do not exist. However, when many false relationships are present, caution is warranted in the interpretation of study findings.

Conflict of Interest Disclosures: The authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.

REFERENCES

1. Thomas L, Peterson ED. The value of statistical analysis plans in observational research: defining high-quality research from the start. JAMA. 2012;308(8):773-774.
2. Ioannidis JP. The importance of potential studies that have not existed and registration of observational data sets. JAMA. 2012;308(6):575-576.
3. Ray WA, Murray KT, Hall K, Arbogast PG, Stein CM. Azithromycin and the risk of cardiovascular death. N Engl J Med. 2012;366(20):1881-1890.
4. Jena AB, Sun E, Goldman DP. Confounding in the association of proton pump inhibitor use with risk of community-acquired pneumonia [published online September 7, 2012]. J Gen Intern Med. doi:10.1007/s11606-012-2211-5.
5. Park-Wyllie LY, Mamdani MM, Juurlink DN, et al. Bisphosphonate use and the risk of subtrochanteric or femoral shaft fractures in older women. JAMA. 2011;305(8):783-789.
6.
Black DM, Kelly MP, Genant HK, et al; Fracture Intervention Trial Steering Committee; HORIZON Pivotal Fracture Trial Steering Committee. Bisphosphonates and fractures of the subtrochanteric or diaphyseal femur. N Engl J Med. 2010;362(19):1761-1771.
7. Bertrand M, Duflo E, Mullainathan S. How much should we trust differences-in-differences estimates? Q J Econ. 2004;119:249-275.
8. Cohen-Cole E, Fletcher JM. Detecting implausible social network effects in acne, height, and headaches: longitudinal analysis. BMJ. 2008;337:a2533.
PERSPECTIVE

Orthogonal predictions: follow-up questions for suggestive data†

Alexander M. Walker, MD, DrPH1,2
1 World Health Information Science Consultants, LLC, Newton, MA 02466, USA
2 Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA

SUMMARY

When a biological hypothesis of causal effect can be inferred, the hypothesis can sometimes be tested in the selfsame database that gave rise to the study data from which the hypothesis grew. Valid testing happens when the inferred biological hypothesis has scientific implications that predict new relations between observations already recorded. Testing for the existence of the new relations is a valid assessment of the biological hypothesis, so long as the newly predicted relations are not a logical correlate of the observations that stimulated the hypothesis in the first place. These predictions that lead to valid tests might be called 'orthogonal' predictions in the data, and stand in marked contrast to 'scrawny' hypotheses with no biological content, which predict simply that the same data relations will be seen in a new database. The Universal Data Warehouse will shortly render moot searches for new databases in which to test. Copyright © 2010 John Wiley & Sons, Ltd.

key words — databases; hypothesis testing; induction; inference

Received 2 October 2009; Accepted 13 January 2010

INTRODUCTION

In 2000, the Food and Drug Administration's (FDA) Manette Niu and her colleagues had found something that might have been predicted by medicine, but not by statistics.1 They were looking for infants who had gotten into trouble after a dose of Wyeth's RotaShield vaccine in the Centers for Disease Control's Vaccine Adverse Event Reporting System. A vaccine against rotavirus infection in infants, RotaShield was already off the market.2 In the United States, rotavirus causes diarrhea so severe that it can lead to hospitalization. By contrast, the infection is deadly in poor countries.
The 1999 withdrawal had arguably cost hundreds of thousands of lives of children whose deaths from rotavirus-induced diarrhea could have been avoided through widespread vaccination with RotaShield.3,4 The enormity of the consequences of the withdrawal made it important that the decision had been based at least on sound biology.

Wyeth suspended sales of RotaShield because the vaccine appeared to cause intussusception, an infant bowel disorder in which a portion of the colon slips inside of itself. The range of manifestations of intussusception varies enormously. It can resolve on its own, with little more by way of signs than the baby's fussiness from abdominal pain. Sometimes tissue damage causes bloody diarrhea. Sometimes the bowel infarcts and must be removed, or the baby dies.

Dr Niu had used a powerful data-mining tool, Bill DuMouchel's Multi-Item Gamma Poisson Shrinker, to sift through the Vaccine Adverse Event Reporting System (VAERS) data, and she found that intussusception was not alone in its association with RotaShield.5 So too were gastrointestinal hemorrhage, intestinal obstruction, gastroenteritis, and abdominal pain. My argument here is that those correlations represented independent tests of the biological hypothesis that had already killed the vaccine. The observations were sufficient to discriminate hypotheses of biological causation from those of chance, though competing (and testable) hypotheses of artifact may have remained.

Pharmacoepidemiology and Drug Safety 2010; 19: 529–532. Published online 22 March 2010 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/pds.1929
* Correspondence to: A. M. Walker, World Health Information Science Consultants, LLC, 275 Grove St., Suite 2-400, Newton, MA 02466, USA. E-mail: Alec.Walker@WHISCON.com
† The author declared no conflict of interest.

The biologic hypothesis was that if RotaShield had caused intussusception, it was likely to have caused
cases that did not present as fully recognized instances of the disease, but which nonetheless represented the same pathology. Looking for these other conditions was a test of the biologic hypothesis raised by the occurrence of the severest cases. Like the original observations, the test data resided in VAERS, but were nonetheless independent, in that different physicians in different places, acting more or less concurrently, reported them about different patients.

INDUCTION AND TESTING

The key step in Niu's activity was induction of a biological hypothesis of cause from an observation of association. Testing the biological hypothesis differs fundamentally from testing the data-restatement that 'There is an association between RotaShield and intussusception.' The latter, by itself a scrawny hypothesis if you could call it a hypothesis at all, might be examined in other environments, though probably not in real time, since VAERS is a national system and RotaShield had been marketed only in the United States. Scrawny hypotheses have no meat to them; that is, they do no more than predict more of the same, and even then only when the circumstances of observation are identical. The biological hypothesis, by contrast, was immediately testable through its implications in VAERS, and could produce a host of other empiric tests. From the perspective of Wyeth, the FDA, and the Centers for Disease Control and Prevention (CDC), the parties who had to act in that summer of crisis, only biologic causation really mattered.

Biologic causation was not the only theory that predicted reports of multiple related diseases in association with RotaShield. Most of the reports came in after the CDC had announced the association and Wyeth had suspended distribution of RotaShield. Physicians who did not know one another might have been similarly sensitized to the idea that symptom complexes compatible with intussusception should be reported.
Stimulated reporting is therefore another theory that competes with biological causation to account for the findings. For the present discussion, the key point is not how well the competing hypotheses (biological causation, stimulated reporting, and chance) explain the newly found data. The key is whether one can rationally look at the non-intussusception diagnoses in VAERS to test theories about the RotaShield–intussusception association, and whether such looks 'into the same data' are logically suspect. Trudy Murphy and collaborators offered another example of testing implications of the biological hypothesis of causation in a subsequent case-control study of RotaShield and intussusception.6 Looking at any prior receipt of RotaShield, they found an adjusted odds ratio of 2.2 (95% CI 1.5–3.3). Murphy's data also provided a test of the theory of biological causation, no form of which would predict a uniform distribution of cases over time after vaccination. Indeed there were pronounced aggregations of cases 3–7 days following the first and second immunizations. Interestingly, a theory of stimulated reporting would not have produced time clustering, at least not without secondary theories added on top, and so the Murphy data weighed against the leading non-biologic theory for the Niu observations.

ORTHOGONAL PREDICTIONS

Niu's and Murphy's findings share a common element. In neither case did the original observation (case reports of intussusception for Niu, or an association between ever-immunization with RotaShield and intussusception for Murphy) imply the follow-up observations (other diagnoses and time-clusters) as a matter of logic, on the null hypothesis. That is, neither set of follow-up observations was predicted by the corresponding scrawny hypothesis, since neither was simply a restatement of the initiating finding. In this sense, I propose that we call the predictions that Niu and Murphy tested 'orthogonal' to the original observation.
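For readers who want the arithmetic behind a figure like Murphy's 2.2 (95% CI 1.5–3.3), the standard Wald calculation from a 2×2 table is sketched below. The counts are hypothetical, chosen only to produce an odds ratio near 2.2; they are not Murphy's data, and the function name is an invention for this example.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% confidence interval from a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    The standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Illustrative counts only (not from any actual study):
# odds ratio is exactly 2.2, with CI limits near 1.5 and 3.2.
or_, lo, hi = odds_ratio_ci(50, 100, 200, 880)
```

A published adjusted odds ratio would come from a regression model rather than a single table, but the crude calculation shows where the point estimate and interval come from.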
In the very high-dimensional space of medical observations, the predicted data are not simply a rotation of the original findings. Where did the orthogonal predictions come from? The investigators stepped out of the data and into the physical world. We do not know about the world directly, but we can have theories about how it works, and we can test those theories against what we see. Reasoning about the nature of the relations that gave rise to observed data, we can look for opportunities to test the theories. With discipline, we can restrict our 'predictions' to relations that are genuinely new, and yet implied by our theories.

SHOCKED, SHOCKED

'I'm shocked, shocked to find that gambling is going on in here!' says Captain Renault in Casablanca, just before he discreetly accepts his winnings and closes down Rick's Café Américain to appease his Nazi minders. Advocates for finding new data sources to test hypotheses might feel kinship with the captain. While
sincerely believing in the importance of independent replication, they find that they too examine different dimensions of outcomes in suggestive data to evaluate important hypotheses, particularly those hypotheses that would require immediate action if true. This is already the core of regulatory epidemiology, which concerns itself with the best decision on the available data. The necessity to act sometimes plays havoc with prescriptions that cannot be implemented quickly. Exploration of data in hand is not limited to public health epidemiologists, regulators among them. In fact most epidemiologists check causal hypotheses in the data that generated them. Whenever observational researchers see an important effect, they worry (or should) whether they have missed some confounding factor. Confounding is a causal alternative hypothesis for an observed association, and the hypothesis of confounding often has testable implications in the data at hand. Will the crude effect disappear when we control for age? It would be hard to describe the search for confounders as anything other than testing alternative causal hypotheses in the data that gave rise to them. Far from public health, sciences in which there is little opportunity for experiment, such as geology, regularly test hypotheses in existing data. Ebel and Grossman, for example, could 'predict for the first time' (their words) events 65 million years ago, in a headline-grabbing theory that explained a world-wide layer of iridium at just the geological stratum that coincided with the disappearance of the dinosaurs.7 There is nothing illegitimate in the exercise.

THE UNIVERSAL DATA WAREHOUSE

The question that motivated this Symposium, 'One Database or Two?', was whether it is necessary to seek out a new database to test theories derived from a database at hand.
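Before taking up that question, the confounding check described above ('Will the crude effect disappear when we control for age?') can be made concrete as a comparison of a crude odds ratio with a stratum-adjusted summary. The sketch below uses constructed counts and invented helper names, not data from any study, in an example where a strong crude association vanishes once the analysis is stratified by the confounder.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata (e.g., age bands).

    Each stratum is a tuple (a, b, c, d): exposed cases, unexposed cases,
    exposed controls, unexposed controls.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def crude_or(strata):
    """Odds ratio from the collapsed 2x2 table, ignoring the stratifier."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return (a * d) / (b * c)

# Constructed example: within each stratum the odds ratio is exactly 1.0,
# yet collapsing over strata yields a crude odds ratio near 4.8,
# because exposure and outcome are both more common in the second stratum.
strata = [(1, 9, 10, 90), (9, 1, 9, 1)]
```

When the adjusted summary sits at the null while the crude estimate does not, the confounding hypothesis survives its test; when the two agree, it weakens.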
Above I have argued that the issue is not the separation of the databases, but rather the independence of the test and hypothesis-generating data. Clearly, two physically separate databases whose information was independently derived by different investigators working in different sources meet the criterion of independence, but so do independently derived domains of a single database. Fortunately, the question may shortly be moot, because there will be in the future only one database. Let me explain. In 1993, Philip Cole, a professor at the University of Alabama at Birmingham, provided a radical solution to the repeated critique that epidemiologists were finding unanticipated relations in data, and that the researchers were presuming to make statements about hypotheses that had not been specified in advance. In 'The Hypothesis Generating Machine', Cole announced the creation of the HGM, a machine that had integrated data on every agent, every means of exposure, and every time relation, together with every disease. From these, the HGM had formed every possible hypothesis about every possible relationship.8 Never again would a hypothesis be denigrated for having been newly inferred from data. In the same elegant paper, Cole also likened the idea that studies generate hypotheses to the once widely held view that piles of rags generate mouse pups. People generate hypotheses; inanimate studies do not. With acknowledgment to Cole, couldn't we imagine a Universal Data Warehouse consisting of all data ever recorded? Some twists of relativity theory might even get us to postulate that the UDW could contain all future data as well.9 Henceforward, all tests of all hypotheses would occur by necessity in the UDW, whether or not the investigator was aware that his or her data were simply a view into the warehouse. Researchers would evermore test and measure the impact of hypotheses in the data that suggested them.
The new procedure of resorting to the UDW will not constitute a departure from current practice, and may result in more efficient discussion. The reluctance of statisticians and philosophers to test a hypothesis in the data that generated it makes rigorous sense. I think that our disagreement, if there was one, on the enjoyable morning of our Symposium was definitional rather than scientific. In an earlier era, when dedicated, expensive collection was the only source of data, the sensible analysis plan extracted everything to be learned the first time through. Overwhelmed by information from public and private data streams, researchers now select out the pieces that seem right to answer the questions they pose.

KEY POINTS
• When examination of complex data leads to a biological hypothesis, that hypothesis may have implications that are testable in the original data.
• The test data need to be independent of the hypothesis-generating data.
• The 'Universal Data Warehouse' reminds us of the futility of substituting data location for data independence.

The
answers raise new questions, different ones, and it makes sense to pick out (from the same fire hose spurting facts) new data that will help us make sense of what we think we may have learned the first time through. Recorded experience is the database through which we observe, theorize, test, theorize, observe again, test again, and so on for as long as we have stamina and means. We certainly should have standards as to when data test a theory, but the standard does not need to be that the originating databases are different.

ACKNOWLEDGEMENTS

This paper owes much to many people, none of whom should be held accountable for its shortcomings, as the author did not always agree with his friends' good advice. The author is indebted to his co-participants in the Symposium, Larry Gould particularly, Patrick Ryan and Sebastian Schneeweiss, and the deft organizers, Susan Sacks and Nancy Santanello, for their valuable advice. He also thanks Phil Cole, Ken Rothman and Paul Stang for their careful reading and to-the-point commentary. There are no relevant financial considerations to disclose.

REFERENCES
1. Niu MT, Erwin DE, Braun MM. Data mining in the US Vaccine Adverse Event Reporting System (VAERS): early detection of intussusception and other events after rotavirus vaccination. Vaccine 2001; 19: 4627–4634.
2. Centers for Disease Control and Prevention (CDC). Suspension of rotavirus vaccine after reports of intussusception—United States, 1999. MMWR Morb Mortal Wkly Rep 2004; 53(34): 786–789. Erratum in: MMWR Morb Mortal Wkly Rep 2004; 53(37): 879.
3. World Health Organization. Report of the Meeting on Future Directions for Rotavirus Vaccine Research in Developing Countries, Geneva, 9–11 February 2000. Geneva (Publication WHO/VB/00.23).
4. Linhares AC, Bresee JS. Rotavirus vaccines and vaccination in Latin America. Rev Panam Salud Publica 2000; 8(5): 305–331.
5. DuMouchel W.
Bayesian data mining in large frequency tables, with an application to the FDA Spontaneous Reporting System. Am Stat 1999; 53: 177–190.
6. Murphy TV, Gargiullo PM, Massoudi MS, et al. Intussusception among infants given an oral rotavirus vaccine. N Engl J Med 2001; 344: 564–572.
7. Ebel DS, Grossman L. Spinel-bearing spherules condensed from the Chicxulub impact-vapor plume. Geology 2005; 33(4): 293–296.
8. Cole P. The hypothesis generating machine. Epidemiology 1993; 4(3): 271–273.
9. Rindler W. Essential Relativity (rev. 2nd edn). Springer Verlag: Berlin, 1977. See Section 2.4, 'The Relativity of Simultaneity', for a particularly lucid presentation of this phenomenon. The warehouse does not of course contain all future data, as we will restrict it to information generated by and about humans.
Special Issue Paper. Received 4 November 2011, Accepted 28 August 2012. Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/sim.5620

Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership‡

Patrick B. Ryan,a,b,c*† David Madigan,b,d Paul E. Stang,a,b J. Marc Overhage,b,e Judith A. Racoosinb,f and Abraham G. Hartzemab,g§

Background: Expanded availability of observational healthcare data (both administrative claims and electronic health records) has prompted the development of statistical methods for identifying adverse events associated with medical products, but the operating characteristics of these methods when applied to real-world data are unknown.

Methods: We studied the performance of eight analytic methods for estimating the strength of association (relative risk, RR) and associated standard error for 53 drug–adverse event outcome pairs, both positive and negative controls. The methods were applied to a network of ten observational healthcare databases, comprising over 130 million lives. Performance measures included sensitivity, specificity, and positive predictive value of methods at RR thresholds achieving statistical significance of p < 0.05 or p < 0.001 and with absolute threshold RR > 1.5, as well as threshold-free measures such as area under the receiver operating characteristic curve (AUC).

Results: Although no specific method demonstrated superior performance, the aggregate results provide a benchmark and baseline expectation for risk identification method performance. At traditional levels of statistical significance (RR > 1, p < 0.05), all methods have a false positive rate > 18%, with positive predictive value < 38%. The best predictive model, high-dimensional propensity score, achieved an AUC = 0.77. At 50% sensitivity, the false positive rate ranged from 16% to 30%.
At a 10% false positive rate, sensitivity of the methods ranged from 9% to 33%.

Conclusions: Systematic processes for risk identification can provide useful information to supplement an overall safety assessment, but assessment of method performance suggests a substantial chance of identifying false positive associations.

Keywords: product surveillance, postmarketing; pharmacoepidemiology; epidemiologic methods; causality; electronic health records; adverse drug reactions

1. Introduction

The U.S. Food and Drug Administration Amendments Act of 2007 required the establishment of an 'active postmarket risk identification and analysis system' with access to patient-level observational data from 100 million lives by 2012 [1]. In this context, we define 'risk identification' as a systematic

a Johnson & Johnson Pharmaceutical Research and Development LLC, Titusville, NJ, U.S.A. b Observational Medical Outcomes Partnership, Foundation for the National Institutes of Health, Bethesda, MD, U.S.A. c UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, U.S.A. d Department of Statistics, Columbia University, New York, NY, U.S.A. e Regenstrief Institute and Indiana University School of Medicine, Indianapolis, IN, U.S.A. f Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, U.S.A. g College of Pharmacy, University of Florida, Gainesville, FL, U.S.A.
* Correspondence to: Patrick B. Ryan, Johnson & Johnson, 1125 Trenton-Harbourton Road, PO Box 200, MS K304, Titusville, NJ 08560, U.S.A. † E-mail: ryan@omop.org ‡ This article expresses the views of the authors and does not necessarily represent those of their affiliated organizations. § At the time of this work, Dr. Hartzema was on sabbatical at the U.S. Food and Drug Administration.
Copyright © 2012 John Wiley & Sons, Ltd. Statist. Med. 2012
and reproducible process to efficiently generate evidence to support the characterization of the potential effects of medical products. This system, applied to a network of observational healthcare databases, would provide another source of evidence to complement existing safety information contributed by preclinical data, clinical trials, spontaneous adverse event reports, registries, and pharmacoepidemiology evaluation studies. When used in conjunction with evidence of the benefits of the product and alternative treatments, a more comprehensive understanding of the effects of medical products promises to inform medical decision making. The practicing clinician has a critical role in both the generation of quality data that can be used for these efforts and the integration of the findings from safety assessments into routine practice, both of which become increasingly important in the evolution of the electronic health record and the creation of a 'learning healthcare system' [2]. The secondary use of observational healthcare databases (e.g., administrative claims and electronic health records) has become the predominant resource in pharmacoepidemiology, health outcomes, and health services research because it reflects 'real-world' experience. Unlike well-designed and well-performed randomized clinical trials, the use of observational data requires special consideration of potential biases that can distort the measurement of the true effect size. Researchers can choose from a variety of analytic methods that attempt to control for these biases; however, the operating characteristics of these methods and their potential utility within a risk identification system have not been systematically studied. The Observational Medical Outcomes Partnership (OMOP; http://omop.fnih.org) conducts methodological research to support the development of a national risk identification and analysis system; the details have been previously published [3].
The OMOP research plan consists of a series of empirical assessments of the performance characteristics of a number of analysis methods conducted across a network of observational data sources. This paper reports findings from a series of assessments of risk identification methods to determine their ability to correctly identify 'true' drug–adverse event outcome associations and drug–adverse outcome negative controls as 'not associated'.

2. Methods

The OMOP established a network of ten data sources capturing the healthcare experience of 130 million patients. The data network included administrative claims data (SDI Health, Humana Inc., and four Thomson Reuters MarketScan® Research Databases reflecting commercial claims with and without laboratory records, Medicare supplemental, and multistate Medicaid populations) and electronic health records (Regenstrief Institute, Partners Healthcare System, GE Centricity, and Department of Veterans Affairs Center for Medication Safety/Outcomes Research). Table I depicts the characteristics and population sizes of each data source. The data sources in the OMOP were selected to reflect the diversity of U.S. observational data [4]. This research program was approved or granted exemption by the Institutional Review Boards at each participating organization. All of these datasets were transformed to a common data model, where data about drug exposure and condition occurrence were structured in a consistent fashion and defined using the same controlled terminologies, to facilitate subsequent analysis [5]. A total of 13 different analytic methods were implemented during the OMOP experiment. Complete descriptions, references, and source code for each method are available at http://omop.fnih.org/MethodsLibrary; of those, eight report estimates of relative risk (RR) and its standard error. In this paper, we examine these eight methods. Results for the remaining five methods are available upon request.
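Evaluating such methods against labeled positive and negative controls is, in effect, a classifier evaluation: each method emits an RR estimate per drug–outcome pair, and the pair's control status supplies the truth label. The sketch below is a generic illustration of how the reported performance measures are computed (invented helper name, toy data; not OMOP's actual analysis code), giving sensitivity, specificity, and positive predictive value at an RR threshold plus a rank-based AUC.

```python
def classifier_metrics(scores, labels, threshold):
    """Sensitivity, specificity, and PPV at a score threshold, plus AUC.

    `scores` are, e.g., RR estimates; `labels` are 1 for positive
    controls and 0 for negative controls. AUC is computed by the
    Mann-Whitney identity: the probability that a randomly chosen
    positive outscores a randomly chosen negative (ties count half).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return sensitivity, specificity, ppv, auc

# Toy example: six drug-outcome pairs, flag any pair with RR >= 1.5.
scores = [3.0, 2.0, 1.8, 1.2, 0.9, 1.6]
labels = [1, 1, 0, 0, 0, 1]
sens, spec, ppv, auc = classifier_metrics(scores, labels, threshold=1.5)
```

The threshold-free AUC rewards a method for ranking positive controls above negative ones regardless of where a decision cutoff is eventually placed, which is why it appears alongside the threshold-bound measures in the results.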
Each method had multiple parameter settings corresponding to various study design decisions, including definition of time-at-risk, identification of outcomes based on first occurrence or all occurrences of diagnosis codes, choice of comparator group, and specific confounding adjustment strategy. The specific parameters for each method and the number of parameter combinations studied for each method are shown in Table II. The performance of the analytical methods was assessed on the basis of their ability to correctly identify nine drug–outcome pairs that were classified as 'positive controls' and 44 drug–outcome pairs classified as 'negative controls'. Positive controls were true associations as determined by the listing of the corresponding outcome as an adverse event in the drug product label, along with prior published observational database research suggesting an association; subsequently, these positive controls were endorsed by expert panel consensus. Negative controls lacked such evidence in their labeling and published literature and were ruled out as having a positive association by the expert panel. Members of the OMOP's advisory boards and other participants [3] and literature references for the test cases [6]
