The folly of believing positive findings from underpowered intervention studies

Presented at the European Health Psychology Conference, July 13, 2013. This slideshow shows the folly of accepting positive findings from underpowered studies. Much of the "evidence" in health psychology comes from such unreliable studies.


    1. Too Good to Be True: Health Psychology’s Dependence on Underpowered Positive Studies. James C. Coyne, Ph.D., University of Groningen, University Medical Center Groningen, The Netherlands. Twitter: @CoyneoftheRealm
    2. Long a pervasive problem… • Lack of sufficient resources to conduct well-designed, amply powered studies. • Confusion about pilot studies: cannot be the basis for evaluating efficacy or estimating effect sizes!
    3. “We are grateful to the Society of Behavioral Medicine (SBM) for selecting the authorship group. This article is one of three meta-analyses that have been undertaken under the aegis of the SBM Evidence-Based Behavioral Medicine Committee; the other two meta-analyses examine the effects of psychosocial interventions on depression and fatigue among patients with cancer.”
    4. SBM Initiative: Meta-analyses generated by professional organizations should receive special critical scrutiny because of a tendency to gloss over the limits of the literature in order to promote the services of their membership.
    5. Small Studies • Suffer strong publication bias. • Negative findings go unpublished because the studies are too small. • Positive findings are celebrated because they were obtained despite the small samples.
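
How strong is this selection effect? A minimal simulation (my illustration, not part of the deck, assuming a modest true effect of d = 0.2 and 15 patients per arm) runs many small trials and "publishes" only the positive, significant ones:

```python
# A minimal sketch, not from the deck: many small two-arm trials of a
# modest true effect (assumed d = 0.2, n = 15 per arm), with only the
# positive, significant results "published".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n, n_trials = 0.2, 15, 10_000
published = []
for _ in range(n_trials):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_d, 1.0, n)
    t, p = stats.ttest_ind(treated, control)
    if p < 0.05 and t > 0:  # only positive, significant findings survive
        pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
        published.append((treated.mean() - control.mean()) / pooled_sd)

print(f"true effect: d = {true_d}")
print(f"trials 'published': {len(published) / n_trials:.1%}")
print(f"mean published effect: d = {np.mean(published):.2f}")
# Typically only ~8% of trials get "published", with a mean effect
# around d = 0.9 -- more than four times the true effect.
```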
    6. Small Studies • Require a larger effect size for statistical significance. • Published results tend to be exaggerated and not to be replicated in larger and better-quality later studies.
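
How much larger an effect is required? A back-of-the-envelope check (my addition, assuming a standard two-sample t-test at two-tailed α = .05) computes the smallest Cohen's d a trial of a given size can ever declare significant:

```python
# A back-of-the-envelope check (my addition): the smallest Cohen's d a
# two-arm trial with n per group can declare significant at two-tailed
# alpha = .05 is d_crit = t_crit * sqrt(2 / n).
from scipy import stats

for n in (10, 20, 35, 100, 400):
    t_crit = stats.t.ppf(0.975, df=2 * n - 2)  # two-tailed critical t
    d_crit = t_crit * (2 / n) ** 0.5           # minimum significant d
    print(f"n = {n:3d} per group: significant results imply d >= {d_crit:.2f}")
# n = 10 -> d >= 0.94; n = 20 -> d >= 0.64; n = 35 -> d >= 0.48;
# n = 100 -> d >= 0.28; n = 400 -> d >= 0.14. Small trials can only
# ever report large effects, so their significant findings are
# exaggerated almost by construction.
```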
    7. Small Trials Likely to Have Outliers, and With Publication Bias, Yield Results That Won’t Replicate. Hospital A has 10 births per month on average. Hospital B has 100 births per month on average. In January, one of the hospitals reported 70% of the births were girls. Is it more likely in A, B, or equally likely to be in either?
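
The answer is Hospital A: extreme proportions turn up far more often in small samples. A quick simulation (my addition) confirms it:

```python
# A quick simulation (my addition): monthly births at the two
# hypothetical hospitals, counting months with at least 70% girls.
import numpy as np

rng = np.random.default_rng(0)
months = 100_000
girls_a = rng.binomial(10, 0.5, months)   # Hospital A: 10 births/month
girls_b = rng.binomial(100, 0.5, months)  # Hospital B: 100 births/month

print(f"Hospital A: P(>=70% girls) = {(girls_a >= 7).mean():.3f}")
print(f"Hospital B: P(>=70% girls) = {(girls_b >= 70).mean():.5f}")
# Roughly 0.17 for the small hospital versus about 0.00005 for the
# large one: small samples routinely produce extreme results.
```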
    8. Small Studies • Are particularly vulnerable to selective loss of patients to follow-up, and to investigators and outcome raters knowing which condition patients are assigned to. • Investigators can naïvely or deliberately monitor incoming data and stop the trial when a positive finding has been obtained, even when it is a chance finding that would be undone with continued accumulation of patients.
    9. Sample Size • Sample size is the best proxy for other sources of bias in trials. • Sample size negatively predicts overall effect size. • In the presence of small-study effects, restricting analyses to large trials, or to treatment benefits observed in large trials, may provide more valid estimates than overall analyses of all trials irrespective of sample size.
    10. Sheinfeld Gorin S, et al. “Meta-analysis of psychosocial interventions to reduce pain in patients with cancer.” Journal of Clinical Oncology 30(5) (2012): 539-547.
    11. Forest plot of effect sizes (g) for studies measuring pain severity (k = 38). Sheinfeld Gorin S et al. JCO 2012;30:539-547. ©2012 by American Society of Clinical Oncology.
    12. What the SBM Authors Claimed about Psychosocial Interventions for Cancer Pain • “Robust findings” of “substantial rigor” and “strong evidence for psychosocial pain management approaches.” • Claimed the findings supported the “systematic implementation” of these techniques. • Estimated it would take 812 unpublished studies lurking in file drawers to change their assessment.
    13. 19 of 38 studies had fewer than 35 patients in the intervention or control group. Two of the other largest trials should have been excluded for other reasons. Of the 13 studies individually having significant effects on pain severity, 8 would have been excluded because they were too small and 1 because it should not have been included in the first place.
    14. Of the 4 studies with the largest effect sizes: 1 had only 20 patients receiving relaxation; the next largest had 10 patients who were hypnotized; the next, 20 patients listening to a relaxation tape versus 20 patients getting live instructions, but these numbers were obtained by replacing patients who dropped out. The study with the fourth-largest effect size had 15 patients receiving training in self-hypnosis.
    15. Some of the studies were quite small: • 7 patients receiving pain education • 10 patients receiving hypnosis • 16 patients getting pain education • 16 patients getting self-hypnosis • 8 patients getting relaxation plus 8 patients getting CBT plus relaxation
    16. What is Left

        Study              g      95% CI low   95% CI high   Intervention
        Montgomery         0.90    0.61         2.67         Hypnosis
        Lang               0.42    0.08         0.78         Hypnosis
        Allard             0.32   -0.05         0.68         Nursing/Patient Ed
        Rimer              0.31   -0.02         0.63         Nursing/Patient Ed
        Yates              0.30   -0.03         0.63         Nursing/Patient Ed
        DeWit              0.14   -0.08         0.36         Nursing/Patient Ed
        DeWit Van Dam     -0.19   -0.61         0.24         Nursing/Patient Ed
        Gaston-Johansson  -0.28   -0.65         0.46         Comprehensive Coping
    17. Synthesis: pooling the results
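
As a rough check on what pooling the surviving studies would yield, here is a fixed-effect inverse-variance sketch (my illustration, not the authors' analysis), back-calculating each standard error from its reported 95% CI; the Montgomery interval is asymmetric, so treat this strictly as a back-of-the-envelope estimate:

```python
# A fixed-effect inverse-variance pooling sketch (my illustration) of
# the studies in the table above. Each SE is crudely back-calculated
# from the reported 95% CI as (upper - lower) / (2 * 1.96).
studies = {
    "Montgomery":       (0.90,  0.61, 2.67),
    "Lang":             (0.42,  0.08, 0.78),
    "Allard":           (0.32, -0.05, 0.68),
    "Rimer":            (0.31, -0.02, 0.63),
    "Yates":            (0.30, -0.03, 0.63),
    "DeWit":            (0.14, -0.08, 0.36),
    "DeWit Van Dam":    (-0.19, -0.61, 0.24),
    "Gaston-Johansson": (-0.28, -0.65, 0.46),
}

sum_w = sum_wg = 0.0
for g, lo, hi in studies.values():
    se = (hi - lo) / (2 * 1.96)  # crude SE from the CI width
    sum_w += 1 / se**2           # inverse-variance weight
    sum_wg += g / se**2

pooled, pooled_se = sum_wg / sum_w, (1 / sum_w) ** 0.5
print(f"pooled g = {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * pooled_se:.2f} to {pooled + 1.96 * pooled_se:.2f})")
# Comes out around g = 0.2 -- a far cry from "robust" and "substantial".
```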
    18. Hart S L, et al. “Meta-analysis of efficacy of interventions for elevated depressive symptoms in adults diagnosed with cancer.” Journal of the National Cancer Institute 104(13) (2012): 990-1004.
    19. Forest plot of effect sizes (Hedges’ g) with 95% CIs for trials included in the meta-analysis; effect sizes for trials containing two intervention groups are displayed separately. CBT = cognitive behavioral therapy; D = desipramine; P = paroxetine; SS = social support. Hart S L et al. JNCI J Natl Cancer Inst 2012;104:990-1004. © The Author 2012. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
    20. 3 studies classified as “psychotherapeutic” were complex collaborative care interventions for depression emphasizing medication management. These studies provided the bulk of the patients (527) in the authors’ calculation of the effect size for psychotherapeutic intervention.
    21. Of the 2 remaining studies, 1 randomly assigned 45 patients to either problem-solving or a waitlist control and retained only 37 patients for analyses. The final study contributed 2 effect sizes based on comparisons of 29 patients receiving CBT and 23 receiving supportive therapy to the same 26-patient no-treatment control group, thus violating the assumption of independence of effect sizes.
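
Why the shared control group matters can be shown with a small simulation (my addition, using the slide's group sizes, raw mean differences, and an assumed null effect): two effect estimates computed against the same 26 controls are strongly correlated.

```python
# A small simulation (my addition): two "independent" effect estimates
# that share the same 26-patient control group, under a null effect
# and unit-variance outcomes, using raw mean differences.
import numpy as np

rng = np.random.default_rng(0)
eff_cbt, eff_support = [], []
for _ in range(5_000):
    control = rng.normal(0.0, 1.0, 26)  # one control group, used twice
    cbt = rng.normal(0.0, 1.0, 29)
    support = rng.normal(0.0, 1.0, 23)
    eff_cbt.append(cbt.mean() - control.mean())
    eff_support.append(support.mean() - control.mean())

r = np.corrcoef(eff_cbt, eff_support)[0, 1]
print(f"correlation between the two effect estimates: r = {r:.2f}")
# r comes out around 0.5: a chance dip in the shared controls inflates
# both effects at once, so pooling them overstates the precision.
```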
    22. With Removal of Small and Inappropriately Classified Studies, No Eligible Studies Were Left
    23. Fail-safe N of 106 confirms the relative stability of the observed effect size. “Our findings advance this literature by demonstrating that psychological and pharmacologic approaches, evaluated in RCTs, can be targeted productively toward cancer patients in need of intervention by virtue of clinical depression or elevated depressive symptoms.”
    24. Fail-Safe N is Pseudo-Precise Nonsense. Don’t Be Intimidated by Exaggerated Estimates of the Number of Unpublished Studies Needed to Unseat Conclusions Based on Meta-Analysis of Underpowered Studies.
    25. Deficiencies of Fail-safe N • Combining Z scores does not directly account for the sample sizes of the studies. • The choice of zero for the average effect of the unpublished studies is arbitrary and almost certainly biased. • Allowing for unpublished negative studies substantially reduces fail-safe N.
    26. Deficiencies of Fail-safe N • Estimates of fail-safe N are not influenced by evidence of bias in the data. • Guesswork to estimate the magnitude of unpublished studies in the area. • Heterogeneity among the studies is ignored. • The method is not influenced by the shape of the funnel plot.
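
To make these deficiencies concrete, here is a sketch (my illustration) of fail-safe N via Stouffer's combined z. Rosenthal's classic formula is the special case where unpublished studies are assumed to average exactly z = 0; letting them lean even slightly negative collapses the count. The example values (10 published studies, each at z = 2.0) are hypothetical:

```python
# A sketch (my illustration) of fail-safe N via Stouffer's combined z.
# Rosenthal's formula assumes the file drawer averages exactly z = 0.
def failsafe_n(z_scores, alpha_z=1.645, unpublished_mean_z=0.0):
    """Smallest number of unpublished studies, averaging
    unpublished_mean_z, that drags the combined z below alpha_z."""
    k, s = len(z_scores), sum(z_scores)
    n = 0
    while (s + n * unpublished_mean_z) / (k + n) ** 0.5 >= alpha_z:
        n += 1
    return n

# Hypothetical example: 10 published studies, each just significant.
zs = [2.0] * 10
print(failsafe_n(zs))                           # 138 assumed-null studies
print(failsafe_n(zs, unpublished_mean_z=-0.5))  # only 22 mildly negative ones
# The impressive-looking 138 rests entirely on the arbitrary z = 0
# assumption; note too that study sample sizes never enter the formula.
```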
    27. Are Small, Unpowered Studies Good for Anything? Leon, Andrew C., Lori L. Davis, and Helena C. Kraemer. “The role and interpretation of pilot studies in clinical research.” Journal of Psychiatric Research 45(5) (2011): 626-629.
    28. A pilot study is not a hypothesis-testing study. Efficacy and effectiveness are not evaluated in a pilot.
    29. A pilot study does not provide a meaningful effect size estimate for planning subsequent studies due to the imprecision inherent in data from small samples. Feasibility results do not necessarily generalize beyond the inclusion and exclusion criteria of the pilot design.
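
A closing sketch (my addition, using the standard large-sample approximation to the variance of Cohen's d and a hypothetical pilot with 15 patients per arm) shows how little a pilot effect size constrains planning:

```python
# A sketch (my addition): approximate 95% CI for Cohen's d from a
# hypothetical pilot with 15 patients per arm and an observed d = 0.5,
# using the standard large-sample variance of d.
def d_confidence_interval(d, n1, n2):
    """Approximate 95% CI for Cohen's d (normal approximation)."""
    se = ((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))) ** 0.5
    return d - 1.96 * se, d + 1.96 * se

lo, hi = d_confidence_interval(0.5, 15, 15)
print(f"pilot d = 0.50, 95% CI [{lo:.2f}, {hi:.2f}]")  # about [-0.23, 1.23]
# By the n ~= 16 / d^2 rule of thumb, powering a follow-up trial on d
# anywhere in that interval means anything from ~11 patients per arm
# (d = 1.23) to an unbounded sample size as the interval crosses zero.
```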
