Rapid Identification of Pneumonias in BioSense Data Using ...



  1. Rapid Identification of Pneumonias in BioSense Data Using Radiology Text Reports
     Armen Asatryan, MD, MPH (1); Haobo Ma, MD, MS (1); Roseanne English, BS (2); Jerome Tokars, MD, MPH (2)
     (1) Science Applications International Corporation; (2) Centers for Disease Control and Prevention
     Atlanta, GA
     The findings and conclusions in this presentation are those of the author(s) and do not necessarily represent the views of the Centers for Disease Control and Prevention and/or Science Applications International Corporation.
  2. Background -- BioSense
     • BioSense started in 2003 as CDC’s electronic biosurveillance system
     • Purposes: early detection and situational awareness
     • Data sources: DoD Medical Treatment Facilities, VA Medical Centers, LabCorp, non-federal acute care hospitals
     • Currently, 371 acute care hospitals send ≥1 type of data to BioSense; 42 send radiology text reports
     • Available data: facility, patient disposition, visit dates, patient demographics, microbiology lab orders/results, radiology orders/results, hospital pharmacy orders, chief complaint, ICD-9 codes, vital signs (temp, BP) in the ER
  3. Background -- Pneumonia
     • Pneumonia is a major contributor to morbidity and mortality
       – 1.4 M pneumonia hospital discharges in the US in 2003 (American Lung Association)
       – Pneumonia and influenza are the 7th leading cause of death in the US
       – Total cost to the US economy estimated at $37.5 billion in 2004 (ibid)
     • Early detection of pneumonia is important because several Category A bioterrorism agent-related diseases, as well as influenza, can present as pneumonia
  4. Previous Research: Administrative Data for Pneumonia Detection
     • ICD-9 code- and DRG-based algorithms have Sn 48-66%, Sp >99%, and PPV 73-81% for identifying pneumonia when compared to a clinical (physician) reference (Aronsky et al., 2005)
     • Bayesian network-based algorithm for identifying pneumonia (ICD-9 code as the reference): AUC 0.93 (CI 0.907, 0.948); at fixed Sn 95%, Sp was 69%, PPV 7.3%, NPV 99.8% (Aronsky et al., 2000)
  5. Objective
     • Objective of study: find the keywords in radiology reports that are most highly associated with a pneumonia diagnosis
     • Radiology text reports may be available within 1-3 days, whereas a final diagnosis may take 1-3 weeks
       – In our data set, radiology text reports were available, on average, 9 days earlier than ICD-9 discharge codes
  6. Methods
     • Studied radiology text reports from 13 hospitals sending data to BioSense
     • Study period: 2/2006 through 1/2007
     • Included inpatient, outpatient, and ER patients
     • Keyword-based text parsing SAS® program using RegEx and accounting for negations and double negations (e.g., "pneumonia cannot be excluded")
     • Search terms: airspace, consolidation, density, infiltrate, opacity, and pneumonia/pneumonitis
     • Logistic regression to evaluate the independent association of the keywords with a final diagnosis of pneumonia (ICD-9 codes 480-486)
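The slides do not include the SAS program itself. As a rough illustration of keyword matching with negation and double-negation handling, here is a minimal Python sketch. The regex patterns, the negation cue lists, and the function name `find_keywords` are assumptions made for this example, not the authors' rules.

```python
import re

# Search terms from the Methods slide. The exact patterns are assumptions.
KEYWORDS = {
    "airspace": r"air\s?space",
    "consolidation": r"consolidation",
    "density": r"densit(?:y|ies)",
    "infiltrate": r"infiltrat(?:e|es|ion)",
    "opacity": r"opacit(?:y|ies)",
    "pneumonia": r"pneumoni(?:a|tis)",
}

# Hypothetical cue lists: a negation cue before a keyword suppresses it,
# unless the sentence also contains a double-negation phrase.
NEG_CUE = re.compile(r"\b(?:no|not|without|negative\s+for)\b", re.I)
DOUBLE_NEG = re.compile(
    r"\b(?:can\s?not|not)\s+(?:be\s+)?(?:excluded?|ruled?\s+out)\b", re.I
)


def find_keywords(report):
    """Return the set of keywords asserted (not negated) in a report."""
    found = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        double_neg = DOUBLE_NEG.search(sentence) is not None
        for name, pattern in KEYWORDS.items():
            for match in re.finditer(pattern, sentence, re.I):
                # Only cues that appear before the keyword count as negation.
                negated = NEG_CUE.search(sentence[: match.start()]) is not None
                if double_neg or not negated:
                    found.add(name)
    return found


print(sorted(find_keywords("No infiltrate. Could not exclude pneumonia.")))
```

This sketch treats a double negation as applying to every keyword in the sentence and misses negations that follow the keyword (e.g., "infiltrate is not seen"); a production parser would need finer scoping.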
  7. Example of a CXR showing pneumonia
  8. Example of a text radiology report with keywords underlined
     • AP and lateral chest film on 01/01/2007 at 12:00 AM. No comparison films available. Clinical history: 58 years old black female with cough and fever. Findings: Right lower lobe infiltrate, consistent with pneumonia. Left lung, heart, mediastinum, and pulmonary vasculature unremarkable. No pneumothorax. No pleural effusion. Impression: Right lower lobe pneumonia.
  9. Validation of Text Parsing
     • Sample dataset: 400 reports (300 with any of the keywords, including negations, and 100 without)
     • Gold standard for text parsing validation: manual review of the radiology reports
     • Manual review for the search keywords: “pneumonia” 77, others 63, none 260
     • Text parsing: sensitivity 98.5%, specificity 98.6%
  10. Results
      • 67,714 reports of chest X-rays performed for 42,510 hospital visits of 34,883 patients
      • Male : female ratio = 1 : 1
      • Age
        – Range: <1 day (newborn) to 105 years old
        – Mean: 55 years old
  11. Results
      • Among all 67,714 radiology reports
        – ICD-9 Dx of pneumonia: 8,631 (12.8%)
        – Keywords found (some reports had >1 keyword)
          • Opacity 22.1%
          • Infiltrate 15%
          • Density 8.2%
          • Pneumonia 6%
          • Consolidation 5.5%
          • Airspace disease 1.1%
  12. Logistic Regression Model to Evaluate Keywords
      • Reports may have >1 keyword in many combinations:
        – Right lower lobe infiltrate, consistent with pneumonia
        – Right lower lobe density, suggestive of airspace disease or mass
      • Model shows the independent association between each keyword and the final diagnosis
      • Model contained 6 dichotomous indicator variables (i.e., 0 = keyword not present in report, 1 = keyword present)
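To make the 0/1 coding concrete, the following Python sketch builds the six indicator variables for the two example phrases on this slide. It uses plain substring matching and ignores negation and spelling variants such as "pneumonitis", so it is only an illustration of the encoding; the function name `indicator_row` is hypothetical.

```python
# The six search terms from the Methods slide, used as indicator names.
KEYWORDS = ["airspace", "consolidation", "density",
            "infiltrate", "opacity", "pneumonia"]


def indicator_row(report):
    """Map a report to 0/1 indicators (1 = keyword present in the text)."""
    text = report.lower()
    return {kw: int(kw in text) for kw in KEYWORDS}


examples = [
    "Right lower lobe infiltrate, consistent with pneumonia",
    "Right lower lobe density, suggestive of airspace disease or mass",
]
for report in examples:
    print(indicator_row(report))
```

Each row of indicators, paired with the report's final ICD-9 outcome, would form one observation for the regression.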
  13. Logistic Regression Model: Predictors of Final Diagnosis of Pneumonia

      | Keyword       | Odds Ratio* | 95% Confidence Interval |
      |---------------|-------------|-------------------------|
      | Infiltrate    | 3.5         | 3.3-3.6                 |
      | Pneumonia     | 3.0         | 2.7-3.2                 |
      | Airspace      | 2.7         | 2.3-3.3                 |
      | Consolidation | 2.7         | 2.5-2.9                 |
      | Opacity       | 2.0         | 1.9-2.1                 |
      | Density       | 1.2         | 1.1-1.3                 |

      * Reference group is no mention of the keyword; e.g., reports mentioning “infiltrate” have 3.5 times the odds of a final ICD-9 diagnosis of pneumonia compared with reports without this keyword
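The slides do not write out the model. Assuming the standard logistic form with the six indicators as the only predictors, the relationship between the coefficients and the reported odds ratios would be as follows (the symbols $x_k$, $\beta_k$, and $p$ are notation introduced here, not taken from the slides):

```latex
\[
\log\frac{p}{1-p} = \beta_0 + \sum_{k=1}^{6} \beta_k x_k ,
\qquad
\mathrm{OR}_k = e^{\beta_k},
\]
\[
\text{e.g. for ``infiltrate'': } \mathrm{OR} = 3.5
\;\Longrightarrow\; \beta = \ln 3.5 \approx 1.25 .
\]
```

Here $p$ is the probability of a final ICD-9 diagnosis of pneumonia and $x_k \in \{0, 1\}$ indicates whether keyword $k$ appears in the report.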
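  14. Keyword Index

      |                      | Pneumonia diagnosis | No pneumonia diagnosis |
      |----------------------|---------------------|------------------------|
      | One or more keywords | 6,186               | 18,180                 |
      | No keywords          | 2,445               | 40,903                 |
      | All                  | 8,631               | 59,083                 |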
  15. Results: Keyword Index
      • One or more of the 5 keywords was found in
        – 6,186/8,631 (71.7%) of reports of patients with a pneumonia diagnosis
        – 18,180/59,083 (30.8%) of reports of patients without a pneumonia diagnosis
        – Sensitivity 72%, specificity 69%, kappa = 0.23
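The sensitivity, specificity, and kappa above can be reproduced from the 2x2 counts on the Keyword Index slide. The Python sketch below uses the standard Cohen's kappa formula; the function name `two_by_two_metrics` is introduced here for illustration.

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and Cohen's kappa from 2x2 counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Observed agreement between keyword index and ICD-9 diagnosis.
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Counts from the Keyword Index table:
# keyword+/Dx+ = 6,186; keyword+/Dx- = 18,180;
# keyword-/Dx+ = 2,445; keyword-/Dx- = 40,903.
sens, spec, kappa = two_by_two_metrics(tp=6186, fp=18180, fn=2445, tn=40903)
print(f"{sens:.1%} {spec:.1%} {kappa:.2f}")  # prints "71.7% 69.2% 0.23"
```

Observed agreement is about 70%, but roughly 60% agreement is expected by chance given the marginals, which is why kappa is low despite the moderate sensitivity and specificity.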
  16. Limitations / Strengths
      • Limitations
        – Study included a small number of facilities in a limited geographical area
        – No formal data verification performed
        – Based on ICD-9 coded diagnosis, rather than a formal clinical case definition
        – Variations in readings between radiologists (style, preferences, quality)
      • Strengths
        – Used an empirical process to determine keywords associated with pneumonia
        – Used clinically rich data
  17. Discussion
      • Computerized text parsing can successfully identify keywords in radiology text reports
      • Five keywords (infiltrate, pneumonia, airspace, consolidation, and opacity) were independently associated with a final ICD-9 coded diagnosis of pneumonia
      • Reasonable sensitivity for the 5 keywords and ICD-9 Dx, but low kappa. Reasons:
        – Inadequacy of coded diagnoses
        – Clinical Dx in the presence of a negative CXR
        – Either ICD-9 code or CXR may indicate pneumonia, but neither is definitive
        – Possible missing CXRs
  18. Radiology Text Report Processed by Natural Language Processing
      • History Section
        – HPI AP PORTABLE CHEST AT 1100 HOURS HISTORY: Shortness of breath.
      • Diagnostic Tests Section
        – There are chronic and degenerative changes. There is marked chronic lung disease. Extensive pulmonary opacity is seen in the right mid and right lower lung zone compatible with pneumonia. This is superimposed on diffuse chronic lung disease. A focal area of scarring and/or discoid atelectasis is seen at the medial left lung base. A central line enters from the right and terminates in the superior vena cava.
      • Assessment Section
        – IMPRESSION: Marked chronic lung disease. Interval development of a large right mid and right lower lung zone opacity compatible with pneumonia.
  19. Future Steps
      • Study additional facilities
      • Evaluate radiology text keywords using a formal case definition along with chart review as the gold standard
      • Evaluate the value of natural language processors in finding additional keywords and implementing more sophisticated concepts such as degrees of certainty, presence of comorbidities, and earliness of detection