This document describes a method for developing disease classification models using patient electronic medical record (EMR) data. It involves extracting features from structured and unstructured EMR data using natural language processing and concept mapping. Statistical analysis is then used to select highly correlated concepts for different diseases. The method is applied to diseases like obesity, migraine, septic arthritis, and osteoarthritis in patient EMRs. Limitations and future work are also discussed.
5. Septic arthritis, also known as infectious arthritis,
may represent a direct invasion of joint space by
various microorganisms, most commonly caused by a
variety of bacteria. However, viruses, mycobacteria,
and fungi have been implicated. Reactive arthritis is a
sterile inflammatory process that usually results from
an extra-articular infectious process. Bacteria are the
most significant pathogens because of their rapidly
destructive nature. For this reason, the current
discussion concentrates on the bacterial septic
arthritides. Failure to recognize and to appropriately
treat septic arthritis results in significant rates of
morbidity and may even lead to death …
… Streptococcal species, such as Streptococcus
viridans, S pneumoniae, and group B streptococci,
account for 20% of cases. Aerobic gram-negative rods
are involved in 20-25% of cases. Most of these
infections occur in people who are very young, who
are very old, who are diabetic, who are
immunosuppressed, and who abuse intravenous
drugs.
[Figure: Natural Language Processing and Concept Mapping against Knowledge Sources (Unified Medical Language System). The terms "reactive arthritis", "reactive arthritides", and "arthritis reactive" map to Concept Unique Identifier (CUI) C0085435, Semantic Type: Disease or Syndrome.]
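The concept-mapping step in the figure can be illustrated with a minimal sketch. It assumes a hand-written lookup table holding only the three term variants and the CUI shown on the slide; the names `TERM_TO_CUI`, `normalize`, and `map_to_cui` are hypothetical, and a real pipeline would query the UMLS rather than a hard-coded dictionary.

```python
# Toy lookup table (assumption): only the variants shown on the slide.
# A real system would look terms up in the UMLS instead.
TERM_TO_CUI = {
    "reactive arthritis": "C0085435",
    "reactive arthritides": "C0085435",
    "arthritis reactive": "C0085435",
}

# Semantic type for the one CUI shown on the slide.
SEMANTIC_TYPE = {"C0085435": "Disease or Syndrome"}


def normalize(term):
    """Lowercase the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def map_to_cui(term):
    """Return the CUI for a term variant, or None if it is not in the table."""
    return TERM_TO_CUI.get(normalize(term))


if __name__ == "__main__":
    for variant in ["Reactive arthritis", "reactive  arthritides", "arthritis reactive"]:
        cui = map_to_cui(variant)
        print(variant, "->", cui, "/", SEMANTIC_TYPE.get(cui))
```

The point of the sketch is only that several surface forms collapse to one identifier, so later counting operates on CUIs and not on raw strings.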
6. Patient 1, Patient 2, Patient 3, …, Patient 200
Note-Level CUI Screening: non-main CUI mentioned in >5% of notes that include the main CUI
Patient-Level CUI Screening: Spearman's rank correlation, r > 0
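The note-level screen can be sketched as follows. This is one reading of the slide, not a confirmed implementation: it assumes each note has already been reduced to a set of CUIs, and that the 5% is taken over the notes that mention the main CUI. The function name `note_level_screen` and the labels in the usage example are hypothetical.

```python
def note_level_screen(notes, main_cui, threshold=0.05):
    """Keep non-main CUIs found in more than `threshold` of the notes
    that mention `main_cui`.

    notes: list of sets of CUI strings, one set per clinical note (assumed input shape).
    """
    # Restrict to notes that include the main CUI.
    with_main = [note for note in notes if main_cui in note]
    if not with_main:
        return set()

    # Count, per non-main CUI, how many of those notes mention it.
    counts = {}
    for note in with_main:
        for cui in note:
            if cui != main_cui:
                counts[cui] = counts.get(cui, 0) + 1

    # Strict inequality, matching ">5%" on the slide.
    return {cui for cui, k in counts.items() if k / len(with_main) > threshold}


if __name__ == "__main__":
    # Hypothetical labels, not real CUIs: 40 notes mention MAIN.
    notes = [{"MAIN", "A"}] * 3 + [{"MAIN", "B"}] * 2 + [{"MAIN"}] * 35 + [{"B"}] * 10
    print(note_level_screen(notes, "MAIN"))
```

In the usage example, `A` appears in 3 of 40 main-CUI notes (7.5%) and is kept, while `B` appears in exactly 2 of 40 (5%) and is dropped because the threshold is strict; notes without the main CUI do not count toward either figure.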
7.
CUI        Patient 1  Patient 2  Patient 3  Patient 4  Patient 5  Spearman's r  p
Main CUI   92         16         44         368        144
C1247884   84         20         59         320        152        1             0.017
C0934556   122        89         153        0          167        -0.10         0.95
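The correlation values on slide 7 can be reproduced with a short standard-library sketch: rank each row across the five patients, then take the Pearson correlation of the ranks. The helper names `ranks`, `pearson`, and `spearman` are illustrative, and p-values are not computed here.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    main_cui = [92, 16, 44, 368, 144]
    c1247884 = [84, 20, 59, 320, 152]
    c0934556 = [122, 89, 153, 0, 167]
    print(round(spearman(main_cui, c1247884), 2))
    print(round(spearman(main_cui, c0934556), 2))
```

Under the slide's r > 0 rule, C1247884 would pass the patient-level screen and C0934556 would not.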
18. Acknowledgements
Cai Lab and Collaborators
• Dr. Tianxi Cai
• Dr. Sheng Yu
Summer Program in Biostatistics
and Computational Biology
• Dr. Rebecca Betensky
• Tonia Smith
• Heather Mattie
• Eleanor Murray
• Joshua Barback