Health care special interest-i2b2
  1. Extracting information from clinical notes
     H. Yang, I. Spasic, F. Sarafraz, John A. Keane, Goran Nenadic
     School of Computer Science, University of Manchester
  2. Motivation & aim
     • Electronic clinical notes
       - electronic medical/health records
       - hospital discharge summaries
     • Extract information on
       - individual patients and their diseases
       - clinical practice
       - treatments, drugs used, etc.
     • Aim: support data analytics
       - e.g. monitoring quality
     • Huge interest locally and internationally
  3. Clinical notes
     • Highly condensed text
       - sometimes without proper sentences
       - hospital discharge summaries are more structured
         (list of medications, symptoms, etc.)
     • Terminological variability
       - orthographic, acronyms, local conventions
     • Various sections
       - previous history, social/family background
  4. NLP challenges in clinical data
     • A series of international challenges in information extraction from clinical narratives
       - organisers: Informatics for Integrating Biology & the Bedside (i2b2)
     • 3 shared tasks so far
       - De-identification of medical records and identification of smokers from their clinical records (2007)
       - Identification of obesity & related diseases in patients from hospital discharge documents (2008)
       - Extraction of medications and related information from patients’ discharge documents (2009)
     • 2010 challenge
       - concept, assertions, relations
  5. i2b2 2008
     • Extract status of diseases in patients
       - obesity, diabetes mellitus, hypercholesterolemia, hypertriglyceridemia, hypertension, heart failure (16 in total)
       - status: yes, no, unmentioned, questionable
       - on textual and “intuitive” level
     • 28 teams worldwide
       - UoM ranked 1st in textual and 7th in intuitive
     • Our methodology
       - Term-based exact and approximate matching
       - Context-based pattern- and rule-based matching
       - Machine learning approach
     Reference: Yang, H., Spasic, I., Keane, J., Nenadic, G.: A Text Mining Approach to the Prediction of a Disease Status from Clinical Discharge Summaries, JAMIA 16(4):596-600
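The term-based exact and approximate matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term list is only the six diseases named on the slide, the `match_term` helper and the 0.85 cutoff are hypothetical, and the standard-library `difflib` stands in for whatever similarity measure was actually used.

```python
import difflib

# Hypothetical mini-lexicon: only the six diseases named on the slide.
DISEASE_TERMS = [
    "obesity", "diabetes mellitus", "hypercholesterolemia",
    "hypertriglyceridemia", "hypertension", "heart failure",
]

def match_term(candidate, terms=DISEASE_TERMS, cutoff=0.85):
    """Return (term, 'exact' | 'approximate'), or None if nothing is close enough."""
    text = candidate.lower().strip()
    if text in terms:
        return text, "exact"
    # Fall back to the single closest term above the (assumed) similarity cutoff.
    close = difflib.get_close_matches(text, terms, n=1, cutoff=cutoff)
    if close:
        return close[0], "approximate"
    return None
```

Approximate matching is one way to absorb the orthographic variability noted earlier, for instance the British spelling "hypercholesterolaemia".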
  6. Methodology
     • Linguistic pre-processing
       - section splitting, sentence splitting, chunking, POS tagging, parsing
     • Information extraction (rules, machine learning)
       - textual evidence extraction, section filtering, morphological clues (e.g. drug/disease name affixes)
     • Constructing results
       - template filling, filtering negative results, relations and heuristics:
         Organ : Symptom, Symptom : Disease, Disease : Drug, Drug : Mode of application
     • Medical resources
       - Disease names
       - Drug names
       - Body parts
       - Symptoms
       - Abbreviations
       - Synonyms
  7. Rule-based IE
     • Disease status patterns
       - context-based patterns
         [N] negative for CHF
         [Q] question of asthma
         [U] no known diagnosis of CAD
         [U] we should consider further asthma studies as an outpatient
       - semantics-based patterns
         [N] normal coronaries, a thin black man
     • Clinical resources used in sentence extraction
       - clinical inference rules, e.g., weight>90kg, LDL>160mg/dl, HDL<35mg/dl
       - medications, e.g., ‘anti-depressant’
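As a rough illustration of the context-based patterns and clinical inference rules on this slide, the following Python sketch encodes three of the example phrases as regular expressions and the three thresholds as simple checks. The regular expressions, the function names and the measurement keys (`weight_kg`, `ldl_mg_dl`, `hdl_mg_dl`) are assumptions made for illustration; the status labels are copied from the slide, and the sketch does not say which disease each threshold implies because the slide does not.

```python
import re

# Hypothetical patterns modelled on the slide's examples.
# Labels as on the slide: N = no, Q = questionable, U = unmentioned.
CONTEXT_PATTERNS = [
    (re.compile(r"\bno known diagnosis of\s+(?P<disease>\w+)", re.I), "U"),
    (re.compile(r"\bnegative for\s+(?P<disease>\w+)", re.I), "N"),
    (re.compile(r"\bquestion of\s+(?P<disease>\w+)", re.I), "Q"),
]

# Thresholds copied from the slide; the measurement keys are assumed names.
INFERENCE_RULES = [
    ("weight>90kg", "weight_kg", lambda v: v > 90),
    ("LDL>160mg/dl", "ldl_mg_dl", lambda v: v > 160),
    ("HDL<35mg/dl", "hdl_mg_dl", lambda v: v < 35),
]

def disease_status(sentence):
    """Return (disease mention, status label) for the first matching pattern."""
    for pattern, label in CONTEXT_PATTERNS:
        match = pattern.search(sentence)
        if match:
            return match.group("disease"), label
    return None

def fired_rules(measurements):
    """Return the names of the threshold rules satisfied by the given values."""
    return [name for name, key, test in INFERENCE_RULES
            if key in measurements and test(measurements[key])]
```

For example, `disease_status("Stress test was negative for CHF.")` yields `("CHF", "N")` under these assumed patterns.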
  8. Textual Annotation Results
     • Performance on Disease Status (Ranked 1st)
       Micro-average: Accuracy (0.9723)
       Macro-average: P (0.8482), R (0.7737), F-score (0.8052)

            #Eval  #Corr  #Gold  Precision  Recall  F-score
       Y     2267   2132   2192     0.9404  0.9726   0.9562
       N       56     40     65     0.7142  0.6153   0.6611
       Q       12      9     17     0.7500  0.5294   0.6206
       U     5709   5640   5770     0.9879  0.9774   0.9826
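The relationship between the counts and the reported scores can be checked with a short Python sketch. It assumes the usual definitions, which the slide does not spell out: precision = #Corr / #Eval, recall = #Corr / #Gold, F-score as their harmonic mean, macro-averages as unweighted means over the four classes, and micro-averaged accuracy as total #Corr over total #Gold. Under these assumptions the results agree with the slide's figures to within about 0.0001.

```python
# (#Eval, #Corr, #Gold) per status class, copied from the textual results above.
COUNTS = {
    "Y": (2267, 2132, 2192),
    "N": (56, 40, 65),
    "Q": (12, 9, 17),
    "U": (5709, 5640, 5770),
}

def prf(evaluated, correct, gold):
    """Precision, recall and F-score from raw counts (0.0 when undefined)."""
    precision = correct / evaluated if evaluated else 0.0
    recall = correct / gold if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

def summarise(counts):
    """Return per-class (P, R, F), macro-averaged (P, R, F) and micro accuracy."""
    per_class = {label: prf(*c) for label, c in counts.items()}
    n = len(per_class)
    macro = tuple(sum(scores[i] for scores in per_class.values()) / n
                  for i in range(3))
    micro_accuracy = (sum(c[1] for c in counts.values())
                      / sum(c[2] for c in counts.values()))
    return per_class, macro, micro_accuracy

per_class, macro, micro_accuracy = summarise(COUNTS)
```

The gap between the micro- and macro-averages comes from the rare N and Q classes, which count equally in the macro-average despite having few instances.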
  9. Intuitive Annotation Results
     • Performance on Disease Status (Ranked 7th)
       Micro-average: Accuracy (0.9572)
       Macro-average: P (0.6383), R (0.6294), F-score (0.6336)

            #Eval  #Corr  #Gold  Precision  Recall  F-score
       Y     2160   2068   2285     0.9574  0.9050   0.9304
       N     5236   5014   5100     0.9576  0.9831   0.9702
       Q        3      0     14          0       0        0
  10. i2b2 2009
      • Extract mentions of medication and related information
        - drugs the patient takes
        - dose, mode of application, frequency, duration, etc. (for each mention)
      • 19 teams worldwide
        - UoM ranked 3rd
      • Our approach was based on combining
        - extensive dictionaries
        - morphological and derivational patterns
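A minimal Python sketch of combining a dictionary with morphological patterns for medication extraction follows. Everything in it is an assumption for illustration: the three-entry lexicon, the suffix list, the regular expressions and the `extract_medication_info` helper are hypothetical stand-ins, far smaller than the extensive resources the slide refers to, and the sketch takes only the first dosage, frequency and mode found in a sentence rather than linking them to each mention.

```python
import re

# Hypothetical resources; a real system would rely on extensive dictionaries.
DRUG_LEXICON = {"aspirin", "metformin", "insulin"}
DRUG_SUFFIXES = ("cillin", "statin", "pril", "olol")  # common drug-name stems

DOSAGE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml|units?)\b", re.I)
FREQUENCY = re.compile(r"\b(?:once daily|twice daily|daily|bid|tid|qid|qd)\b", re.I)
MODE = re.compile(r"\b(?:po|iv|orally|intravenously|topically)\b", re.I)

def _first(pattern, text):
    """Return the first match of pattern in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None

def extract_medication_info(sentence):
    """Find drug mentions by lexicon lookup or suffix, plus simple related fields."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    drugs = [t for t in tokens
             if t.lower() in DRUG_LEXICON or t.lower().endswith(DRUG_SUFFIXES)]
    return {
        "medication": drugs,
        "dosage": _first(DOSAGE, sentence),
        "frequency": _first(FREQUENCY, sentence),
        "mode": _first(MODE, sentence),
    }
```

The suffix check lets the sketch pick up a drug such as "atorvastatin" that is absent from the lexicon, which is the role morphological clues play alongside dictionary lookup.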
  11. Evaluation (F-measure)

      Medication   83.59%
      Dosage       82.67%
      Frequency    83.49%
      Mode         85.33%
      Duration     51.00%
      Reason       38.81%
      All fields   78.47%

      Reference: Spasić I, Sarafraz F, Keane JA, Nenadic G: “Medication Information Extraction with Linguistic Pattern Matching and Semantic Rules”, JAMIA (to appear)
  12. Summary
      • NLP and text mining techniques are useful for extraction of clinical data
        - disease status extraction: 95-97% accuracy
        - medication information extraction: 80% F-measure
      • Construction of reliable and sufficient resources
        - clinical terms and abbreviations (e.g., disease synonyms, symptoms, drugs)
        - context patterns related to diseases, medication, etc.
      • Domain knowledge required
        - construction of domain- and task-specific resources
        - complex clinical facts and conditions for inference
        - more comprehensive knowledge representation needed
