Acs0003 Benchmarking Surgical Outcomes



© 2008 BC Decker Inc ACS Surgery: Principles and Practice ELEMENTS OF CONTEMPORARY PRACTICE 3 BENCHMARKING SURGICAL OUTCOMES
DOI 10.2310/7800.SECPC03 06/08

BENCHMARKING SURGICAL OUTCOMES

Emily V. A. Finlayson, MD, MS, and John D. Birkmeyer, MD, FACS

Interest in information about surgical outcomes is growing. Patients and their families are looking for, and finding, hospital- and surgeon-specific information about quality as they try to make informed decisions about where and from whom to receive their surgical care.1,2 Payers, both private and public, are also seeking information about surgical performance for their value-based purchasing initiatives. For example, the Leapfrog Group, a large coalition of health care purchasers, is collating data on hospital volume, process, and outcome measures in an effort to steer patients to centers likely to have the best results. As part of its Surgical Care Improvement Project and Centers of Excellence projects, the Centers for Medicare and Medicaid Services (CMS) is requiring that hospitals submit outcomes data for selected procedures and other performance measures.

Surgeons should be just as interested in surgical outcomes data. First, it is essential that they provide patients with accurate, realistic information about the risks and benefits they can expect with specific procedures. Unfortunately, the medical literature is not always reliable for this purpose. It is limited by publication bias and tends to be skewed by case series from large, nonrepresentative referral centers, which may not reflect outcomes in the “real world.” Second, as patients increasingly turn to the Internet for information, surgeons should be aware of what data their patients are seeing and be prepared to address their questions. Third, and most importantly, surgeons need information about surgical outcomes to benchmark their performance against both national norms and their peers and to help guide their improvement efforts.

In this chapter, we review alternative data sources for benchmarking surgical outcomes. We describe ongoing public reporting programs, public use administrative databases that can be analyzed for benchmarking purposes, and improvement-oriented clinical outcomes registries, such as the National Surgical Quality Improvement Program (NSQIP). We review the strengths and weaknesses of these sources and provide representative surgical mortality data from some of them.

Public Reporting Programs

The most readily available source of surgical outcomes data is Internet-based public reporting programs. At the present time, those based on clinical data are limited to cardiac surgery. Following the lead of New York State, which first initiated public reporting in 1989, state agencies in New Jersey, Pennsylvania, California, and Massachusetts all administer longitudinal clinical registries and regularly release (on the Internet and elsewhere) information on risk-adjusted mortality rates for coronary artery bypass surgery. All of these states release hospital-specific performance data, but only some report surgeon-specific information.

Public reporting programs related to other surgical procedures generally rely on administrative data. A small number of states use data from their discharge abstract databases to determine and report volume and risk-adjusted mortality rates with selected procedures, including major cancer resections. The most widely available source of surgical outcomes data comes from proprietary rating firms, most notably Healthgrades (, which rely primarily on public use Medicare files. At the present time, Healthgrades allows users to select from 31 different procedures or conditions and obtain data on hospitals in any specified geographic region. For each procedure, hospitals are ranked from 5 stars (best) to 1 star (worst) based on risk-adjusted mortality and, in some cases, morbidity. Hospital-specific information is provided free of charge, but information about specific surgeons requires a small fee.

Although the clinical outcomes data from state cardiac surgery registries are generally considered robust, the other sources of publicly reported outcomes data have several important limitations. Some limitations pertain to the use of administrative data as the underlying data source, which we discuss later. Others are specific to the vendor. For example, Healthgrades is often criticized for the lack of transparency of its methods for calculating rates and risk adjustment.3 Its reliance on categorical rankings and its failure to report actual rates (along with numerators and denominators) are additional criticisms.

Public Use Administrative Databases

Rather than relying on outside analysis, surgeons can obtain administrative data and analyze the data themselves. Although this strategy requires data skills, it may be practical for surgeons with analytic skills (or access to analytic support). Public use administrative databases [see Table 1] are increasingly available, are relatively inexpensive, and no longer require special equipment for data transmission or storage.

Most administrative databases useful for benchmarking surgical outcomes consist of hospital discharge abstracts. These abstracts contain information for patients admitted to acute care hospitals. Data collected include demographic information: name, age, sex, race or ethnicity, and patient residence. They also include admission and discharge dates, total charges, expected payment source, admission type (elective, urgent, emergent), and discharge disposition. In addition, Unique Physician Identification Numbers (UPINs)
can be used to identify attending physicians and surgeons. Claims for surgical admissions contain procedure-specific codes from the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). In addition, hospital discharge abstracts contain fields for at least 10 diagnosis codes, which are used to record preexisting medical conditions or medical complications of surgery for billing purposes.

Table 1 Public Use Administrative Databases and Clinical Registries

Medicare
  Patients: Medicare recipients: patients age 65 yr and older, disabled patients under 65 yr, and patients with end-stage renal disease undergoing inpatient surgery
  Participating hospitals: All US hospitals treating Medicare patients
  Strengths: Large sample size, population based
  Limitations: Limited to elderly patients, lack of specificity for some procedure codes, lack of detailed clinical information for risk adjustment

Nationwide Inpatient Sample (NIS)
  Patients: Patients undergoing inpatient surgery
  Participating hospitals: 20% sample of all US nonfederal hospitals (approx. 1,000 hospitals in 37 states)
  Strengths: All ages, large sample size
  Limitations: Only inpatient mortality available, lack of specificity for some procedure codes, lack of detailed clinical information for risk adjustment

National Surgical Quality Improvement Program (ACS-NSQIP) (current version)
  Patients: Patients undergoing general and vascular surgery
  Participating hospitals: Currently nearly 200 private sector hospitals
  Strengths: Prospectively acquired clinical data
  Limitations: High cost of participation; not designed to assess procedure-specific performance

Society of Thoracic Surgeons (STS)
  Patients: Patients undergoing cardiothoracic operations
  Participating hospitals: Registry participants: 70% of all adult cardiothoracic operations performed annually in the United States
  Strengths: Prospectively acquired clinical data
  Limitations: Historically, not externally audited

National Cancer Data Base
  Patients: Patients undergoing surgery for cancer
  Participating hospitals: 1,400 hospitals nationwide (approx. 75% of incident cancer cases in the United States)
  Strengths: Prospectively acquired clinical data, detailed cancer-specific data
  Limitations: Not externally audited

There are several administrative databases that surgeons can use for benchmarking surgical outcomes. Although their accessibility and other details vary widely, most states maintain discharge abstract databases that are available for public use. Surgeons can also obtain data from the Nationwide Inpatient Sample (NIS) for this purpose at relatively low cost and inconvenience. Developed as part of the Healthcare Cost and Utilization Project (HCUP), a federal-state-industry partnership sponsored by the Agency for Healthcare Research and Quality, the NIS is an all-payer inpatient care database containing information from approximately 8 million hospital admissions annually. It includes all patients from a 20% sample of all US nonfederal hospitals (approximately 1,000 facilities) from 37 states and is designed to provide nationally representative estimates of health care use and outcomes ( Toward this end, hospitals are selected with regard to ownership control, bed size, teaching status, rural-urban location, and geographic region.

Finally, surgeons can use public use Medicare data for benchmarking surgical outcomes, as some proprietary health rating companies do. The Medicare inpatient database (MEDPAR file) is the most accessible and widely used. It includes all fee-for-service acute care admissions for Medicare recipients, including most Americans over 65 years, disabled patients under 65 years, and patients with end-stage renal disease.

Primary analysis of administrative databases has a number of advantages for surgeons interested in benchmarking outcomes. In the absence of comparable clinical databases (outside cardiac surgery), they are currently the only source of population-based outcomes data. Surgeons can assess virtually any inpatient procedure of interest to them, not just those currently targeted by proprietary rating companies. With their large sample sizes, these databases also allow for analysis of outcomes of infrequently performed procedures.

However, administrative data have numerous limitations for benchmarking outcomes. Some of these pertain to the specific database. For example, Medicare data apply primarily to elderly patients and thus are not useful for procedures most commonly performed in younger patients. They also miss large numbers of elderly patients in regions of the United States with high penetration of Medicare managed care. All-payer databases, including state-level files and the NIS, provide in-hospital but not 30-day mortality rates. Unlike Medicare data, they usually do not contain hospital or surgeon identifiers. Thus, these files are useful for generating national- or state-level norms for specific procedures but not for assessing the outcomes of specific providers.
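As a rough illustration of the do-it-yourself analysis described in this section, the sketch below tallies unadjusted in-hospital mortality by procedure from a handful of made-up discharge abstracts. The record layout, the field names `procedure` and `disposition`, and the value `"died"` are assumptions for illustration and do not reflect the layout of the NIS, MEDPAR, or any state file. The two ICD-9-CM procedure codes are believed to denote carotid endarterectomy and pancreaticoduodenectomy but should be checked against the code set before any real use. No risk adjustment is attempted.

```python
from collections import defaultdict

# Hypothetical discharge abstracts. Field names and values are illustrative
# only; real files use their own layouts and disposition coding.
ABSTRACTS = [
    {"procedure": "38.12", "disposition": "home"},
    {"procedure": "38.12", "disposition": "home"},
    {"procedure": "38.12", "disposition": "died"},
    {"procedure": "38.12", "disposition": "skilled nursing"},
    {"procedure": "52.7", "disposition": "home"},
    {"procedure": "52.7", "disposition": "home"},
]


def in_hospital_mortality(abstracts, died_value="died"):
    """Return deaths, cases, and crude in-hospital mortality (%) per procedure code."""
    tallies = defaultdict(lambda: [0, 0])  # code -> [deaths, cases]
    for record in abstracts:
        tally = tallies[record["procedure"]]
        tally[1] += 1
        if record["disposition"] == died_value:
            tally[0] += 1
    return {
        code: {"deaths": deaths, "cases": cases, "rate_pct": 100.0 * deaths / cases}
        for code, (deaths, cases) in tallies.items()
    }


if __name__ == "__main__":
    for code, result in sorted(in_hospital_mortality(ABSTRACTS).items()):
        print(code, result["deaths"], "/", result["cases"], f"{result['rate_pct']:.1f}%")
```

Reporting the numerator and denominator next to each rate keeps the small-sample problem visible. Because the deaths counted are in-hospital deaths, the result is not comparable to a 30-day mortality rate.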
The most important limitations of administrative databases relate to problems with the accuracy, completeness, and clinical precision of coding in administrative data.4,5 ICD-9-CM diagnosis codes used to identify comorbidities are often clinically imprecise, fail to reflect disease severity, and cannot differentiate preadmission conditions from acute complications. For this reason, risk adjustment and measurement of postoperative complications with administrative databases are limited. Although much more reliable in general, ICD-9-CM procedure codes lack sufficient clinical specificity, particularly relative to Current Procedural Terminology (CPT) codes used for physician billing. For example, they often fail to distinguish between laparoscopic and open procedures or similar procedures associated with different baseline risks (e.g., laparoscopic antireflux surgery versus repair of paraesophageal hernias).

Clinical Registries

Of course, the ideal source of information for benchmarking surgical outcomes is prospective, clinical outcomes registries [see Table 1]. As described earlier, several states administer such registries for cardiac surgery as part of their quality improvement and public reporting efforts. However, a number of professional organizations, including the American College of Surgeons (ACS), have launched national outcomes registries in a number of other clinical areas. Outcomes data from these sources are not reported publicly but are intended instead to provide confidential feedback on performance to hospitals and surgeons for internal quality improvement purposes. With one prominent exception (ACS-NSQIP), currently available outcomes registries target specific specialties, conditions, or procedures.

ACS National Surgical Quality Improvement Program

Perhaps the most visible and powerful source of benchmarking information is the NSQIP. Originally developed and implemented in Department of Veterans Affairs (VA) hospitals, NSQIP was later applied in a consortium of large academic medical centers in private sector hospitals and subsequently marketed by the ACS to all types of hospitals. As of 2007, nearly 200 non-VA hospitals had enrolled. At participating hospitals, NSQIP data are collected by medical record review by dedicated nurse abstractors. Preoperative risk factors, intraoperative variables, and 30-day postoperative mortality and morbidity outcomes for patients undergoing major surgery are submitted. Risk-adjusted morbidity and mortality results for each hospital are calculated semiannually and are reported as observed versus expected ratios.

As private sector participation grows, the ACS-NSQIP has the potential to become a valuable resource for benchmarking surgical outcomes. Its prospectively collected clinical data allow for robust risk adjustment. In addition, participants can easily access their own outcomes data on a user-friendly Web interface. Users can easily navigate through different procedures and outcomes to obtain information pertaining to their own specialty. Participants can benchmark their own results against those of community centers, academic centers, or both.

Nonetheless, the NSQIP currently has several weaknesses from the perspective of outcomes benchmarking. First, it is expensive to administer. In addition to paying an annual fee, each center is required to hire and train a dedicated surgical clinical nurse reviewer to review and enter data. Second, it is not designed for assessing procedure-specific performance. Risk adjustment is based on a common set of preoperative variables for all procedures, not risk factors specific to individual procedures. In addition, the NSQIP collects data on a sample of the procedures performed at each hospital, not all procedures. Thus, procedure-specific outcome measures may be imprecise owing to small sample sizes.

To address the limitations of the NSQIP, the ACS-NSQIP leadership has set out to retool the program’s approach to data collection and measurement. Several major changes are scheduled to be implemented starting in 2009. Instead of collecting the same data on a sample of patients undergoing a wide variety of general and vascular procedures, the new ACS-NSQIP will be based on parallel, specialty-specific modules. Specialty societies will help set priorities on which procedures and variables should be examined. This approach should yield more targeted, clinically relevant data for procedures that specialty experts believe are good candidates for benchmarking and quality improvement. In addition, there will be 100% sampling of a limited number of procedures, resulting in much larger sample sizes for evaluating procedure-specific outcomes. The ascertainment of patient characteristics and outcome measures is being revised as well. Data collection will be streamlined to five to 10 core patient characteristic variables with the addition of a small number of procedure-specific risk factors. This change will allow for the maintenance of high-level risk adjustment while reducing the cost and data collection burden. The addition of procedure-specific complications as outcome variables (e.g., anastomotic leak after colectomy or stroke after carotid endarterectomy) will also allow for benchmarking of clinically relevant outcomes.

Cardiac Surgery

The Society of Thoracic Surgeons (STS) national database is the best source of national data for benchmarking outcomes with cardiac surgery.6 Launched nearly 20 years ago, the STS national database includes clinical data on more than 70% of all adult cardiothoracic operations performed annually in the United States. Participating hospitals receive regular feedback on their mortality rates after adult and congenital cardiac and general thoracic surgery. The strengths of the STS registry include robust, procedure-specific risk adjustment and high hospital participation rates, which imply generalizability of its outcomes data. Historically, a major weakness has been the lack of external auditing to ensure the accuracy and completeness of outcomes data submitted by hospitals.

Cancer Surgery

A joint effort of the Commission on Cancer (CoC) of the ACS and the American Cancer Society, the National Cancer Data Base (NCDB) is a national registry that tracks information related to the treatment and outcome of cancer patients ( About 1,400 hospitals nationwide submit data to the NCDB, which
currently captures approximately 75% of incident cancer cases in the United States. The database includes patient characteristics, tumor stage and grade, type of treatment, disease recurrence, and survival. Individuals at CoC-approved cancer centers can access benchmark reports. These reports summarize the data from the user’s own center and comparisons with state, regional, or national data, as well as other individual cancer centers. Data on patient and tumor characteristics, treatment, recurrence, and survival are collected at the participating centers. Currently, the primary outcome available to participants online is patient survival, not operative mortality. Although it is the richest source of clinical data for benchmarking outcomes after cancer surgery, the NCDB has limitations. Data are not externally audited to ensure accuracy and completeness. Moreover, unlike most clinical outcomes registries, the NCDB was not originally designed for tracking outcomes related to quality. It only recently began collecting information on comorbidities (for risk adjustment) and has no information on outcomes other than mortality.

Trauma

The ACS, along with its Committee on Trauma, also oversees the National Trauma Data Bank (NTDB) (http://www. At the present time, approximately 556 hospitals submit data to the NTDB, including 70% of Level I– and 53% of Level II–designated trauma centers. Participating hospitals submit extensive information about patient comorbidities and condition on presentation to the hospital, procedures performed, complications, and mortality. Benchmark reports are provided to each participating hospital. Hospitals also have access to the primary data for performing their own analyses. Although this database captures a large proportion of trauma admissions in the United States, data submission to the NTDB is voluntary and not externally audited.

Bariatric Surgery

Two competing programs for tracking outcomes with bariatric surgery have been launched. Clinical registries of the ACS Bariatric Surgery Center Network (ACS-BSCN) Program and the Surgical Review Corporation (SRC) are intended to support hospital accreditation and “centers of excellence” designations in bariatric surgery. With the ACS-BSCN (, NSQIP-participating hospitals submit data via their Web-based portals and can compare their results with those of other centers, as with other procedures included in the NSQIP. Hospitals not participating in the NSQIP submit only their bariatric outcomes data and receive annual summaries of their outcomes rates, which are not risk-adjusted or benchmarked against other programs. The SRC (, which is closely aligned with the American Society for Bariatric Surgery, is a nonprofit organization that assesses bariatric surgery programs, analyzes outcomes data, and formulates practice guidelines. Participating centers are required to report outcomes annually to maintain their status as an SRC-approved center. In addition to access to their own data, approved centers receive benchmark outcomes data aggregated from all participating centers. More recently, SRC launched the Bariatric Outcomes Longitudinal Database (BOLD), an Internet-based patient outcomes tracking and reporting database. Basic data for this database are required for all Bariatric Surgery Centers of Excellence (BSCOE) participants.

Additional Considerations

Although the alternative sources of surgical benchmarks have distinct strengths and weaknesses, it is worth acknowledging their common limitations. The first relates to sample size. Although the benchmarks themselves are usually based on large numbers and are thus statistically robust, the outcomes of hospitals and surgeons assessing their own performance against these benchmarks are not, particularly at the level of individual procedures. When sample sizes are too small, it may be difficult to determine whether complication rates higher than the benchmark reflect genuine problems or simply chance. In one recent study, Dimick and colleagues considered hypothetical hospitals with operative mortality rates twice the national average and estimated how many cases they would need over a 3-year period to be reasonably confident that their higher mortality was “real” and not statistical artifact.7 Minimum caseloads varied by procedure, from 77 for esophagectomy to 2,668 for hip replacement. According to their analysis of the NIS, a majority of US hospitals meet these caseload criteria for only one procedure (coronary artery bypass graft).

Another limitation pertains to generalizability. Table 2 summarizes overall mortality rates for several procedures, based on data from the Medicare Inpatient File (2006), NIS (2003), ACS-NSQIP (private sector hospitals) (2005–2006), and STS (2002–2005) databases. Owing to the individual characteristics of each database (e.g., distinct patient populations, methods used to define mortality), different data sets yield different mortality estimates. For example, mortality rates were highest across procedures in the predominantly elderly Medicare population, ranging from 0.9% for carotid endarterectomy to 11.7% for pneumonectomy. Operative mortality in the NIS is considerably lower for all procedures, with some mortality rates more than 3 percentage points lower than those observed in the Medicare population (e.g., pancreaticoduodenectomy 5.2% versus 9.1%; gastrectomy 3.5% versus 6.6%). Although none of these mortality estimates are “wrong,” surgeons need to recognize that risk estimates are dependent on the distinct composition of each database and may not be generalizable to their own practice.

Although this chapter focuses on sources of information for operative mortality, surgeons may also be interested in benchmarking other measures related to surgical quality. For example, information about hospital volumes can be obtained from Healthgrades, the Leapfrog Group Web site (, and a growing number of state agencies. Although not available at the present time, information pertaining to selected processes of care is now being collected by the CMS as part of its Surgical Care Improvement Project (SCIP). These performance measures include processes related to avoiding surgical site infection, venous thromboembolism, cardiac events, and ventilator-acquired pneumonia after surgery. The new ACS-NSQIP database will collect information about compliance with
SCIP measures and processes of care important to specific procedures being examined. Finally, surgeons interested in benchmarking patient satisfaction can turn to several vendors for this service. A large number of hospitals participate in a survey measurement program administered by Press-Ganey, a health care satisfaction survey business that has created national databases of comparative satisfaction information. In addition, HCIA Inc (formerly called Health Care Investment Analysts) and the Medical Group Management Association run a patient satisfaction comparison service.

Table 2 Operative Mortality, by Database

Medicare = Medicare (2006); NIS = Nationwide Inpatient Sample (2003); ACS-NSQIP = ACS-NSQIP: Private Sector Hospitals (2005–2006); STS = Society of Thoracic Surgeons National Database (2002–2005); % = mortality (%).

                                        Medicare             NIS       ACS-NSQIP             STS
Operation                               n      %        n      %        n      %        n      %

Cardiac surgery
Coronary bypass                    67,287    3.3   32,123    2.2       NA     NA  155,243    2.5
Aortic valve replacement           23,117    5.2    7,221    4.0       NA     NA   12,079    3.4
Mitral valve replacement            5,969    8.8    2,903    5.9       NA     NA    4,171    5.5

Vascular surgery
Lower extremity bypass             25,537    2.4   10,830    1.3    3,462    2.9       NA     NA
Elective aortic aneurysm repair    11,600    5.1    5,757    3.6    2,530    2.1       NA     NA
Carotid endarterectomy             56,019    0.9   21,441    0.4    4,017    0.8       NA     NA

Cancer surgery
Pulmonary lobectomy                15,516    3.3    5,298    1.9       NA     NA    2,544    1.5
Pneumonectomy                       1,099   10.2      563    8.7       NA     NA      248    3.6
Esophagectomy                       2,859    7.1    1,198    5.0      255    3.9    1,038    2.4
Gastrectomy                         5,722    5.5    2,776    3.5      724    4.0       NA     NA
Pancreaticoduodenectomy             1,434    8.6      629    5.2    1,169    2.6       NA     NA
Colectomy                          54,101    3.5   23,074    1.5    7,564    1.6       NA     NA
Abdominoperineal resection          3,098    2.3    1,489    1.1    1,672    2.2       NA     NA
Gastric bypass                      5,508    0.7   14,056    0.2    5,513    0.3       NA     NA

NA = not available; NSQIP = National Surgical Quality Improvement Program.

References

1. Schwartz LM, Woloshin S, Birkmeyer JD, et al. How do elderly patients decide where to go for major surgery? Telephone interview survey. BMJ 2005;331:821–4.
2. Kaiser Family Foundation and Agency for Healthcare Research and Quality. National survey on Americans as health care consumers: an update on the role of quality information. Available at: http:// (accessed December 2006).
3. Krumholz HM, Rathore SS, Chen J, et al. Evaluation of a consumer-oriented Internet health care report card: the risk of quality ratings based on mortality data. JAMA 2002;10:1277–87.
4. Hsia DC, Krushat WM, Fagan AB, et al. Accuracy of diagnostic coding for Medicare patients under the prospective-payment system. N Engl J Med 1988;318:352–5.
5. Jollis JG, Peterson ED, DeLong ER, et al. The relationship between the volume of coronary angioplasty procedures at hospitals treating Medicare beneficiaries and short-term mortality. N Engl J Med 1994;331:1625–9.
6. Grover FL, Edwards FH. Similarity between the STS and New York State databases for valvular heart disease. Ann Thorac Surg 2000;70:1143–4.
7. Dimick JB, Welch HG, Birkmeyer JD, et al. Surgical mortality as an indicator of hospital quality: the problem with small sample size. JAMA 2004;292:847–51.
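Worked example: the sample-size caveat raised under Additional Considerations can be made concrete with a small calculation. The sketch below is not the method used by Dimick and colleagues; it swaps in a plain one-sided exact binomial test. It asks how likely a hospital would be to see at least its observed number of deaths if its true mortality equaled the benchmark. The 7.1% benchmark is the Medicare esophagectomy figure listed in Table 2. Both hospitals and their case counts are hypothetical, each with a crude mortality of 15%, roughly twice the benchmark. The function name `upper_tail_probability` is likewise an illustration.

```python
from math import comb


def upper_tail_probability(deaths, cases, benchmark_rate):
    """P(X >= deaths) for X ~ Binomial(cases, benchmark_rate), summed exactly."""
    below = sum(
        comb(cases, k) * benchmark_rate**k * (1.0 - benchmark_rate) ** (cases - k)
        for k in range(deaths)
    )
    return 1.0 - below


BENCHMARK = 0.071  # Medicare esophagectomy mortality as listed in Table 2

# Two hypothetical hospitals with the same crude mortality (15%).
small_hospital = upper_tail_probability(deaths=3, cases=20, benchmark_rate=BENCHMARK)
large_hospital = upper_tail_probability(deaths=30, cases=200, benchmark_rate=BENCHMARK)

if __name__ == "__main__":
    print(f"20 cases, 3 deaths:   p = {small_hospital:.4f}")
    print(f"200 cases, 30 deaths: p = {large_hospital:.6f}")
```

Under these assumptions the 20-case hospital's excess mortality is compatible with chance at the conventional 0.05 level, whereas the same rate over 200 cases is not. This is the pattern the chapter describes for low-volume procedures.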