Acs0004 Evidence Based Surgery



© 2006 WebMD, Inc. All rights reserved.
ACS Surgery: Principles and Practice
ELEMENTS OF CONTEMPORARY PRACTICE

4 EVIDENCE-BASED SURGERY

Samuel R. G. Finlayson, M.D., M.P.H., F.A.C.S.

Evidence-based surgery may be defined as the consistent and judicious use of the best available scientific evidence in making decisions about the care of surgical patients. It is not an isolated phenomenon; rather, it is one part of a broad movement, evidence-based medicine, whose aim is to apply the scientific method to all of medical practice. The historical roots of this broad movement lie in the pioneering work of the Scottish epidemiologist Archibald Cochrane (1909–1988), for whom the preeminent international organization for research in evidence-based medicine (the Cochrane Collaboration) is named. The term evidence-based medicine itself was popularized by a landmark article that appeared in the Journal of the American Medical Association in 1992 [1]. This article advocated a new approach to medical education, urging physicians and educators to deemphasize "intuition, unsystematic clinical experience, and pathophysiologic rationale as sufficient grounds for clinical decision making." In essence, advocates of evidence-based medicine seek to demote so-called expert opinion from its previous relatively high standing, regarding it as being, in fact, the least valid basis for making clinical decisions. As a consequence of the growth of this movement, the discipline of surgery, once driven more by the eminence of tradition than by the evidence of science, now increasingly requires its students to adopt evidence-based scientific standards of practice.

The imperative that surgical care be delivered in accordance with the best available scientific evidence is only one facet of evidence-based surgery. Other facets include systematic efforts to establish standards of care supported by science and the movement to popularize evidence-based practice. Systematic reviews of the literature are often generated by independent researchers or collaborative study groups (e.g., Cochrane collaborations) and published as review articles in journals or disseminated as practice guidelines. The movement to popularize evidence-based surgical practice is a relatively recent phenomenon that is exemplified by the activities of the Surgical Care Improvement Project (SCIP). Although researchers are charged with generating and disseminating scientific evidence, the greatest responsibility for the success of evidence-based surgery ultimately lies with individual surgeons, who must not only practice evidence-based surgery but also understand and appropriately interpret an immense surgical literature.

In this chapter, I provide a framework for evaluating the strength of evidence for surgical practices, examining the validity of scientific studies in surgery, and assessing the role of evidence-based surgery in measuring and improving the quality of surgical care. These are the basic conceptual and analytic tools that a modern evidence-based surgeon needs to navigate the surgical literature and implement practices that are based on sound science.

Evaluation of Strength of Evidence for Surgical Practices

GUIDELINES AND SECONDARY SOURCES OF SCIENTIFIC EVIDENCE

To meet the growing demand for evidence-based practice information, a market has developed around the process of pooling and interpreting the best scientific evidence. Scientific reviews serve as secondary sources for evidence-based practice and are increasingly found in journals, in books, and on the Internet. Prominent examples include Clinical Evidence (published semiannually by the British Medical Journal and continually updated online) and the Cochrane Database of Systematic Reviews. SCIP serves as a clearinghouse for evidence-based guidelines that specifically address surgical practices.

These efforts to summarize and disseminate information about evidence-based surgery provide a convenient means of access to the surgical literature that can be very helpful to practicing surgeons. Such aids, however, are far from complete, and new evidence emerges continually. Accordingly, modern evidence-based surgeons cannot afford to rely entirely on these sources: they must also learn to assess the quality of individual scientific studies for themselves, as well as to interpret the implications of these studies for their own practices.

LEVELS OF EVIDENCE

Evidence for surgical practices comes in many forms, with varying degrees of reliability. At one end of the spectrum is an empirical impression that a practice makes physiologic sense and seems to work well; much of what surgeons actually do falls into this category and has not been formally tested. At the other end of the spectrum is evidence accumulated from multiple carefully conducted scientific experiments that yield consistent and reproducible results. The ultimate task of the evidence-based surgeon is to select practices that conform to the best evidence available; to that end, it is essential to be able to judge the reliability of scientific evidence.

In an effort to help clinicians judge the strength of scientific evidence, researchers have attempted to create hierarchies of evidence, in which the highest places are occupied by those sources that are most reliable and the lowest places by those that are least reliable. With the understanding that not all practices have been subjected to the highest levels of scientific scrutiny, clinicians are advised to base practices on evidence gleaned from studies as high on the evidence hierarchy as possible.

An oft-cited example of such an evidence hierarchy is the levels-of-evidence system popularized by the United States Preventive Services Task Force (USPSTF) [see Table 1] [2]. Since the inception of this system, the terms and concepts it employs have become common parlance among clinicians; hence, for example, the frequently heard references to level I evidence (i.e., evidence from well-conducted, randomized, controlled trials). However, almost as soon as the USPSTF levels-of-evidence system was released, debate about its adequacy began [3]. The predominant criticism has been that the system is too simple and inflexible to provide a precise description of the strength of evidence for clinical practices. Although the system identifies the design of the study from which the evidence is drawn, it does not consider certain other important factors that influence the quality of the study. For example, in the USPSTF system, the same grade is awarded to a
randomized, double-blind, placebo-controlled trial with 50,000 subjects as to a randomized, unblinded trial with 30 subjects. Furthermore, a higher grade is awarded to the latter trial than to a well-designed, well-conducted, multi-institution, prospective cohort study with 10,000 subjects.

Table 1 Levels of Evidence According to USPSTF Schema

Level of Evidence    Source of Evidence
I                    At least one properly randomized, controlled trial
II-1                 Well-designed controlled trials without randomization
II-2                 Well-designed cohort or case-control analytic study, preferably from more than one center or research group
II-3                 Multiple time-series with or without intervention or, possibly, dramatic results from uncontrolled trials (e.g., penicillin treatment in 1940s)
III                  Opinions from respected authorities based on clinical experience, descriptive studies, and case reports or opinions from committees of experts

Category of Recommendation    Basis of Recommendation
Level A                       Good and consistent scientific evidence
Level B                       Limited or inconsistent scientific evidence
Level C                       Consensus, expert opinion, or both

USPSTF = United States Preventive Services Task Force

In response to the deficiencies of the USPSTF system, numerous alternative grading systems have been developed that take into account factors other than study design, such as quality, consistency, and completeness. Nevertheless, it is widely recognized that no grading system yet developed is perfect [4]. Consequently, surgeons are often required to judge the quality and applicability of scientific evidence for themselves.

APPRAISING SCIENTIFIC EVIDENCE

Specific study designs are associated with different levels of confidence about cause and effect. The clinical study design that is considered to have the greatest potential for determining causation is the randomized, controlled clinical trial. However, even studies with this design can lead to erroneous conclusions if they are not performed properly. Therefore, in evaluating the quality of clinical evidence, it is not sufficient simply to ascertain the design of the study that produced the evidence: one must also take a close look at how the study was conceived, implemented, analyzed, and interpreted.

Scientific evidence from studies of clinical practice relies on two important inferences. The first inference is that the observed outcome is the result of the practice and cannot be attributed to some alternative explanation. When this inference is deemed to be true, the study is considered to have internal validity. The second inference is that what was observed in the clinical study is relevant to scenarios outside the study in which the surgeon seeks to implement the practice. The extent to which this inference is true is referred to as external validity or generalizability. Whereas internal validity is determined by how well the study is conducted and how accurately the results are analyzed, external validity is determined by how well the study plan reflects the real-world clinical question that inspired it and how well the study's conclusions apply to real-world scenarios outside the study [see Figure 1]. External validity can also refer to the difference between an intervention's efficacy (how well it works when applied perfectly) and its effectiveness (how well it works when applied generally in an uncontrolled environment); when this difference is substantial, the study's external validity is poor.

Assessment of Validity of Scientific Studies in Surgery

INTERNAL VALIDITY: EVALUATING STUDY QUALITY

Assessment of the internal validity of a study requires an understanding of the potential influence of three key factors: chance, bias, and confounding. Chance refers to unpredictable randomness of events that might mislead researchers. Bias refers to systematic errors in how study subjects are selected or assessed. Confounding refers to differences in the comparison groups (other than the intended difference that is the subject of the comparison) that lead to differences in outcomes.

Chance

In clinical studies that compare outcomes between two or more groups, the assumption that there is no difference in outcomes is called the null hypothesis. Erroneous conclusions with regard to the null hypothesis can sometimes occur by chance alone. There are two types of chance-related errors: type I and type II. Type I error (also called α error) occurs when researchers erroneously reject the null hypothesis; that is, they infer that there is a difference in outcomes when no difference really exists. Type II error (also called β error) occurs when researchers erroneously accept the null hypothesis; that is, they infer that there is no difference in outcomes when a difference really does exist.

Type I errors

Statistical testing is used to quantify the likelihood of a type I error. A variable commonly employed for this purpose is the P value, which is the probability that differences at least as large as those observed would arise by chance alone if the null hypothesis were true. The threshold for statistical significance is conventionally set at a P value of 0.05, which signifies that, if there were truly no difference between groups, differences this large would be expected to occur by chance in 5% of studies. Although a P value below 0.05 falls short of absolute certainty, it is conventionally accepted as sufficient grounds for rejecting the null hypothesis.

An alternative expression of statistical uncertainty is the confidence interval, which is a range of values within which the true difference is likely to lie. For example, a 95% confidence interval is constructed in such a way that, if the same study were repeated many times, about 95% of the intervals so calculated would contain the true difference.

There are many statistical tests that can be used to calculate P values and confidence intervals. Which statistical test is most appropriate for a particular situation depends on several factors, including the number of observations in the comparison groups, the number of groups being compared, whether two or more groups are being compared to each other or one group is being compared to itself after some time interval, what kind of numerical data are being analyzed (e.g., continuous or categorical), and whether risk adjustment is required. It is likely that only a minority of surgeons will have a firm grasp of all the nuances of the more complex statistical analyses. Fortunately, however, most clinical surgical studies are designed simply enough to employ statistical tests that are within the reach of the nonstatistician.

Type II errors

Type II errors often occur when the sample size is simply too small to permit detection of small but clinically
important differences in outcomes between comparison groups. When a study's sample size is insufficient for identification of outcome differences, the study is said to lack sufficient statistical power. Once a study is complete, no amount of analysis can correct for insufficient statistical power. Therefore, before starting a study, researchers should perform what is known as a power calculation, which involves determining the minimum size a difference must have to be meaningful, then calculating the minimum number of observations that would be required to demonstrate such a difference statistically. Surgeons should be particularly cautious when evaluating studies with null findings, particularly when no power calculation is explicitly reported. It is wise to remember that, as the adage has it, no evidence of effect is not necessarily evidence of no effect.

Bias

The term bias refers to a systematic problem with a clinical study that results in an incorrect estimate of the differences in outcomes between comparison groups. There are two general types of bias: selection bias and information bias. The former results from errors in how study subjects are chosen, whereas the latter results from errors in how information about exposures or outcomes (or other pertinent information) is obtained.

Selection bias

The term selection bias applies to any imperfection in the selection process that results in a study population containing either the wrong types of subjects (i.e., persons who are not typical of the target population) or subjects who, for some reason unrelated to the intervention being evaluated, are more likely to have the outcome of interest. As an example, paid volunteer subjects may be more motivated to comply with treatment regimens and report favorable results than unpaid subjects are, and this difference may result in overestimation of the effect of an intervention. Such overestimation may involve both the internal validity and the external validity of the study. As another example, in a trial of medical versus surgical treatment of gastroesophageal reflux disease, selecting study subjects from a group of diners at a Szechuan Chinese restaurant might lead to results favoring medical treatment (in that consumers of Szechuan Chinese food may be more likely to have well-controlled reflux). When assessing the validity of scientific evidence, surgeons must carefully consider the characteristics of the subjects selected for study.

Information bias

The term information bias applies to any problem caused by the way in which outcome data or other pertinent data are obtained. As an example, in a study of sexual function after surgical treatment of rectal cancer, subjects may report symptoms differently in an in-person interview from how they would report them in an anonymous mailed survey. As another example, in a study of hernia repair outcomes, rates of chronic postoperative pain might be incorrectly reported if surgeons assess outcomes in their own patients. Also, recall bias (i.e., selective memory of past events) is a type of information bias to which retrospective clinical studies are particularly vulnerable.

Information bias is often more subtle than selection bias; accordingly, particular attention to the reported study methods is required to control this problem. Measures employed to control information bias include blinding and prospective study design.

Confounding

The term confounding refers to differences in outcomes that occur because of differences in the respective baseline risks of the comparison groups. Confounding is often the result of selection bias. For example, a comparison of mortality after open colectomy with that after laparoscopic colectomy might be skewed because of the greater likelihood that open colectomy will be performed on an emergency basis in a critically ill patient. In other words, severity of illness might confound the observed association between mortality and surgical approach.

Confounding can be effectively addressed by means of randomization. When subjects are randomized, potentially confounding variables (both recognized and unrecognized) are likely to be evenly distributed across comparison groups. Thus, even if these variables influence the baseline rates of certain outcomes in the cohort as a whole, they are unlikely to have a significant effect on differences observed across comparison groups. When randomization is not practical, confounding can be minimized by tightly controlling the study entry criteria. For instance, in the aforementioned comparison of open and laparoscopic colectomy (see above), one might opt to include only elective colectomies. It should be kept in mind, however, that restrictive entry criteria can sometimes limit generalizability. Another way to combat confounding is to use statistical risk-adjustment techniques; the downside to these is that they can reduce statistical power.

Figure 1 Schematically depicted are processes that affect the internal and external validity of a clinical study. [Diagram labels: REAL WORLD: Ask Questions and Design Study; Apply Evidence to Practice (processes related to external validity). STUDY: Conduct Study; Measure Results; Analyze and Interpret Results (processes related to internal validity).]

EXTERNAL VALIDITY: INTERPRETING AND APPLYING EVIDENCE TO PRACTICE

Once one is convinced that a clinical study is internally valid (i.e., that the observed outcome is the result of the exposure or intervention and cannot be attributed to some alternative explanation), the challenge is to assess the study's external validity (i.e., to determine whether the findings are applicable to a particular clinical scenario). To make an assessment of the external validity of a clinical study, it is necessary to examine several components of the study, including the patient population, the intervention, and the outcome measure. This process can be illustrated by briefly considering the example of a large, prospective, randomized clinical
trial of laparoscopic versus open inguinal hernia repair performed in Veterans Affairs (VA) medical centers [5].

The VA trial concluded that the outcomes of open hernia repair were superior to those of laparoscopic repair. The trial was well designed and well conducted, but it generated substantial discussion about the generalizability of the results. As noted [see Internal Validity: Evaluating Study Quality, Bias, above], subject selection bias can adversely affect the external validity (generalizability) of a study's results: if the study population is in some important respect different from a particular population for which a surgeon is making clinical decisions, the results may not be entirely generalizable to the latter population. In the VA hernia trial, the subjects were military veterans, who tend to be, on average, older than the nonveteran general population. If older persons are more vulnerable to the risks of laparoscopic hernia repair (e.g., complications associated with general anesthesia), one might expect that any differences between open and laparoscopic herniorrhaphy with respect to morbidity outcomes would be exaggerated in a trial that included a higher number of older subjects, as the VA trial did. Accordingly, a surgeon attempting to assess the external validity of the VA trial might consider the evidence provided by the trial to be applicable to older patients but might reserve his or her judgment on the use of laparoscopy to repair hernias in younger, healthier patients.

A striking example of the potential effect of selection bias on generalizability comes from the Asymptomatic Carotid Atherosclerosis Study (ACAS) trial [6]. In this large, prospective, randomized study, volunteers for the trial were substantially younger and healthier than the average patient who undergoes carotid endarterectomy. As a result, the observed perioperative mortality in the ACAS trial was considerably lower than that observed in the general population and, for that matter, lower even than the overall perioperative mortality in the very hospitals where the trial was conducted [7]. Although the results of the ACAS trial significantly changed practice, an argument could be made that the evidence provided by this trial, strictly speaking, was generalizable only to younger populations.

The external validity of a clinical study can also be affected by who provides the intervention. For example, in the VA hernia trial, surgeons had varying degrees of experience with the laparoscopic approach, and there were twofold differences in hernia recurrence rates between surgeons who had done more than 250 cases and surgeons who had less experience. Surgeons considering whether the evidence supports the use of laparoscopic repair will therefore have to examine their own experience with this approach before they can determine to what extent the results of the VA trial are generalizable to their own practices.

Furthermore, external validity can be influenced by what type of intervention is provided. For example, some have argued that one of the laparoscopic techniques commonly used in the VA trial, transabdominal preperitoneal repair (TAPP), is outmoded and more hazardous than the competing approach, totally extraperitoneal repair (TEP) [8]. Surgeons who avoid using TAPP might reasonably question the generalizability of the VA study to their practices.

Finally, the type of outcome measured can affect the generalizability of clinical studies. The outcomes chosen for clinical studies may be those that are most convenient or most easily quantified rather than those that are of greatest interest to patients. In the VA hernia trial, several outcomes were studied, including operative complications, hernia recurrence, pain, and length of convalescence. Some of the outcome differences favored open repair, whereas others favored laparoscopic repair. To interpret the evidence as favoring one type of repair or the other involves making implicit value judgments regarding which outcomes are most important to patients. Surgeons interested in applying the evidence from the VA hernia trial to their own decisions about hernia repair will have to examine the specific outcomes measured before they can determine to what extent this study is generalizable to specific patients with specific values and interests.

Role of Evidence-Based Surgery in Measuring and Improving Quality of Care

In clinical studies, the efficacy of a surgical practice is measured in terms of the resulting patient outcomes. Until relatively recently, efforts to assess the quality of surgical care have focused almost exclusively on clinical outcomes. In the past few years, however, the evidence-based surgery movement has begun to promote an alternative measure of surgical quality: adherence to processes of care supported by the best available scientific evidence.

The question of whether efforts to assess quality should focus on evidence-based processes of care or clinical outcomes is as much practical as philosophical. The practical argument against outcome measures is largely driven by a growing recognition that when hospitals and surgeons are considered on an individual basis, adverse outcomes generally are not numerous enough to allow identification of meaningful differences between providers [9]. In other words, outcome-based studies of the quality of care supplied by individual providers tend to have insufficient statistical power. The practical argument against evidence-based process-of-care measures is driven by the paucity of high-leverage, procedure-specific processes for which sound evidence is available, as well as by the logistical challenge of measuring such processes. The issues surrounding assessment of quality of care are discussed in greater detail elsewhere [see ECP:2 Performance Measures in Surgical Practice].

Given its current momentum, the evidence-based surgery movement is likely to play a progressively larger role in efforts to assess and improve quality of surgical care. Furthermore, as payers increasingly turn to pay-for-performance strategies to improve quality and control costs, the demand for evidence-based practice guidelines is likely to grow. Ultimately, it is certain that identification and implementation of evidence-based surgical practices will provide patients with safer, better care.

References

1. Evidence-Based Medicine Working Group: Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA 268:2420, 1992
2. Harris RP, Helfand M, Woolf SH, et al: Current methods of the US Preventive Services Task Force: a review of the process. Am J Prev Med 20(suppl 3):21, 2001
3. Woloshin S: Arguing about grades. Eff Clin Pract 3:94, 2000
4. Atkins D, Eccles M, Flottorp S, et al: Systems for grading the quality of evidence and the strength of recommendations I: critical appraisal of existing approaches (the GRADE Working Group). BMC Health Services Research 4:38, 2004
5. Neumayer L, Giobbe-Hurder O, Johansson O, et al: Open mesh versus laparoscopic mesh repair of inguinal hernia. N Engl J Med 350:1819, 2004
6. Executive Committee for the Asymptomatic Carotid Atherosclerosis Study: Endarterectomy for asymptomatic carotid artery stenosis. JAMA 273:1421, 1995
7. Wennberg DE, Lucas FL, Birkmeyer JD, et al: Variation in carotid endarterectomy mortality in the Medicare population: trial hospitals, volume, and patient characteristics. JAMA 279:1278, 1998
8. Grunwaldt LJ, Schwaitzberg SD, Rattner DW, et al: Is laparoscopic inguinal hernia repair an operation of the past? J Am Coll Surg 200:616, 2005
9. Dimick JB, Welch HG, Birkmeyer JD: Surgical mortality as an indicator of hospital quality: the problem with small sample size. JAMA 292:847, 2004
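
Worked illustration: power calculation

The power calculation described under Type II errors (choose the smallest difference that would be clinically meaningful, then compute the number of observations needed to detect it) can be sketched roughly as follows. This is a minimal sketch, not a method taken from the chapter: it assumes a binary outcome, two equal-sized groups, a two-sided test, and the standard normal-approximation formula for comparing two proportions. The function name and the complication rates (10% versus 5%) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects needed PER GROUP to detect event rates p1 vs p2.

    Assumptions (illustrative only): binary outcome, equal group sizes,
    two-sided test, normal approximation without continuity correction.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ; a zero difference cannot be detected")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error rate
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical example: complication rate of 10% with one technique vs 5% with another.
n_per_group = sample_size_two_proportions(0.10, 0.05)
print(f"Subjects needed per group: {n_per_group}")

# A smaller meaningful difference (10% vs 8%) requires many more subjects.
print(f"Subjects needed per group: {sample_size_two_proportions(0.10, 0.08)}")
```

Under these assumptions, halving a complication rate from 10% to 5% already calls for several hundred subjects per group, which illustrates why small single-center studies with null findings, and outcome comparisons between individual providers, so often lack statistical power.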