ANESTH ANALG EDITORIAL 14552005;101:1454 –6issue has been addressed in multiple recent review Clinicians should be cautioned to not interpret mag-articles and editorials in the general medical and psy- nitude of change (effect size) as an indication of clin-chological literature (4 – 8). ical significance. The clinical significance of a treat- In an attempt to address some of the limitations of ment should be based on external standards providedthe P value, the use of the confidence intervals (CI) has by patients and clinicians. That is, a small effect sizebeen advocated by some clinicians (9). One should may still be clinically significant and, likewise, a largerealize, however, that these two definitions of statisti- effect size may not be clinically significant, dependingcal significance are essentially reciprocal (10). That is, on what is being studied. Indeed, there is a growinggetting a P 0.05 is the same as having a 95% CI that recognition that traditional methods used, such asdoes not overlap zero. CIs can also, however, be used statistical significance tests and effect sizes, should beto estimate the size of difference between groups in supplemented with methods for determining clini-addition to merely indicating the existence or absence cally significant changes. Although there is little con-of statistical significance (11). This later approach, sensus about the criteria for these efficacy standards,however, is not widely used in the medical and psy- the most prominent definitions of clinically significantchological literature, and today CIs are mostly used as change include: 1) treated patients make a statisticallysurrogates for the hypothesis test rather than consid- reliable improvement in the change scores; 2) treatedering the full range of likely effect size. patients are empirically indistinguishable from a nor- The group of statistics called “effect sizes” designate mal population after treatment, or 3) changes of atindices that measure the magnitude of difference be- least one sd. The most frequently used method fortween groups, controlling for variation within the evaluating the reliability of change scores is thegroups; effect sizes can be thought of as a standard- Jacobson-Truax method in combination with clinicalized difference. In other words, although a P value cutoff points (15). Using this method, change is con-denotes whether the difference between two groups in sidered reliable, or unlikely to be the product of meas-a particular study is likely to occur solely by chance, urement error, if the reliable change index (RCI) isthe effect size quantifies the amount of difference be- more than 1.96. That is, when the individual has atween the two groups. Quantification of effect size change score more than 1.96, one can reasonably as-does not rely on sample size but instead relies on the sume that the individual has improved.strength of the intervention. There are a number of Unfortunately, most of the methods above are dif-different types of effect sizes and a description of these ficult to adopt in the perioperative arena, as compar-various types and formulae is beyond the scope of this ison with a normal population is not an option in mosteditorial. We refer the interested reader to review trials, and the RCI, which controls for statistical issuesarticles that describe the various types of effect sizes involving the assessment tool, is a somewhat compli-and their calculation methodology (12,13). Effect sizes cated and controversial technique. Thus, clinical sig-of the d type are the most commonly used in the nificance in the perioperative arena may be best as-medical literature, as they are primarily used to com- sessed by posing a particular question such as “is apare two treatment groups. D type effect size is de- change of 8.5% reduction in intraoperative bleed clin-fined as the magnitude of difference between two ically significant?” or “how many sd does this changemeans, divided by the sd [(Mean of control group represent?” Obviously, both of these questions have aMean of treatment group)/sd of the control group]. subjective component in them and although it is tra-Thus, the d effect size is dependent on variation ditionally agreed that at least a 1-sd change is gener-within the control group and the differences between ally needed for clinical significance, this boundary hasthe control and intervention groups. Values of the d no scientific underpinning. The validity of a clinicaltype effect sizes range from to , where zero cutoff for these last two methods can be improved bydenotes no effect and values less than or more than establishing external validity (e.g., patient perspec-zero are treated as absolute values when interpreting tive) for the decision. For example, Flor et al. (16) havemagnitude. Conventionally, d type effect sizes that are conducted a large meta-analysis that was aimed atnear 0.20 are interpreted as small, effect sizes near 0.50 evaluating the effectiveness of multidisciplinary reha-are considered “medium,” and effect sizes in the range bilitation for chronic pain. The investigators foundof 0.80 are considered “large” (14). However, interpre- that pain among the patients who received the inter-tation of the magnitude of an effect size depends on vention was indeed reduced by 25%. This reductionthe type of data gathered and the discipline involved. was certainly statistically significant and had an effectEffect sizes of another type—the risk potency type— size of 0.7. Colvin et al. (17), however, reported earlierinclude likelihood ratios such as odds ratio, risk ratio, that patients would consider only a 50% improvementrisk difference, and relative risk reduction. Clinicians in their pain levels as a treatment “success.” Thus, inare probably more familiar with these less abstract this example, a reduction of 25% in pain scores may bestatistics and it may be helpful to realize that likeli- statistically, but not clinically, significant. Clearly thishood statistics are a type of effect size. is a developing area that warrants further discussion.
1456 EDITORIAL ANESTH ANALG 2005;101:1454 –6 In conclusion, we suggest that reporting of periop- 7. Greenstein G. Clinical versus statistical significance as they relate to the efficacy of periodontal therapy. J Am Dent Assocerative medical research should continue beyond re- 2003;134:1168 –70.porting results consisting primarily of descriptive and 8. Sterne JAC, Smith GD, Cox DR. Sifting the evidence: what’sstatistically significant or nonsignificant findings. The wrong with significance tests? Another comment on the role ofinterpretation of findings should occur in the context statistical methods. BMJ 2001;322:226 –31. 9. Simon R. Confidence intervals for reporting results of clinicalof the magnitude of change that occurred and the trials. Ann Intern Med 1986;105:429 –35.clinical significance of the findings. 10. Feinstein AR. P-values and confidence intervals: two sides of the same unsatisfactory coin. J Clin Epidemiol 1998;51:355– 60. 11. Gardner MG, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. BMJ 1986;292:References 746 –50.1. Fisher RA. Statistical methods for research workers, 1st ed. 12. Kirk R. Practical significance: A concept whose time has come. Edinburgh: Oliver and Boyd, 1925. Reprinted by Oxford Uni- Educ Psychol Meas 1996;56:746 –59. versity Press. 13. Snyder P, Lawson S. Evaluating results using corrected and2. Fisher RA. Design of experiments. 1st ed. Edinburgh: Oliver and uncorrected effect size estimates. J Exper Educ 1993;61:334 –349. Boyd, 1935. Reprinted by Oxford University Press. 14. Cohen J. Statistical power analysis for the behavioral sciences,3. Fisher RA. Statistical methods for research workers. London: 2nd ed. Mahwah, New Jersey: Lawrence Erlbaum, 1988. Oliver and Boyd, 1950:80. 15. Jacobson NS, Truax P. Clinical significance: A statistical ap-4. Borenstein M. Hypothesis testing and effect size estimation in proach to defining meaningful change in psychotherapy re- clinical trials. Ann Allergy Asthma Immunol 1997;78:5–11. search. J Consult Clinic Psych 1991;59:12–9.5. Matthey S. P 0.05: but is it clinically significant? Practical 16. Flor H, Fydrich T, Turk DC. Efficacy of multidisciplinary pain examples for clinicians. Behav Change 1998;15:140 – 6. treatment centers: a meta-analytic review. Clin J Pain 1992;49:6. Cummings P, Rivara FP. Reporting statistical information in 221–30. medical journal articles. Arch Pediatr Adolesc Med 2003;157: 17. Colvin DF, Bettinger R, Knapp R, et al. Characteristics of pa- 321– 4. tients with chronic pain. South Med J 1980;73:1020 –3.