Kavita Ganesan & Michael Subotin 
Presented at: 2014 IEEE International Conference on Big Data
All sorts of note types!
• Admit notes
◦ documenting why the patient is being admitted
◦ baseline status, etc.
• Progress notes
◦ progress during the course of hospitalization
• Discharge notes
◦ conclusion of a hospital stay or series of treatments
• Others
◦ Operative notes
◦ Procedure notes
◦ Delivery notes
◦ Emergency Department notes, etc.
PRIMARY CARE PHYSICIAN: 
Dr. XXXXX XXXXXXXX. 
CHIEF COMPLAINT: 
Injured right little toe. 
HISTORY OF PRESENT ILLNESS: 
This is a 63-year-old male with a past medical history of multiple 
myeloma who presents today after hitting his fifth toe of the right foot 
on a wood panel yesterday…… 
Review of Systems: 
CONSTITUTIONAL: No fever, chills, or weight loss. 
RESPIRATORY: No cough, shortness of breath, or wheezing. 
CARDIOVASCULAR: No chest pain, chest pressure, or palpitations. 
............... 
PAST MEDICAL HISTORY 
Multiple myeloma, peripheral neuropathy, hypertension.. 
PAST SURGICAL HISTORY:- 
Stem cell transplant. 
SOCIAL HISTORY 
The patient formerly smoked tobacco; however, quit within the last 10 
years. 
FAMILY HISTORY: 
Hypertension. 
ALLERGIES: 
ASPIRIN. 
……… 
Annotations on the note:
• Purpose of visit
• Patient’s current condition in narrative form
• Ongoing issues, issues in the past
• Information on allergies
This is how most notes look: 
• some longer, some shorter 
• different set of headers, etc 
[Figure: three example notes side by side, each with PRIMARY CARE PHYSICIAN, CHIEF COMPLAINT (spelled "CHIEF COMPLAIN" in two of them), HISTORY OF PRESENT ILLNESS, and Review of Systems sections]
• Very unstructured
◦ formatting cues → inconsistent
◦ varies across physicians, notes, hospitals
• Hard to analyze specific sections
◦ E.g. analyze allergies across a patient population
◦ Need to segment notes to extract all allergy info
◦ Information collected varies from note type to note type
  - E.g. info in progress notes vs. admit notes
◦ Contents & formatting can vary from hospital to hospital
  - Even within the same organization, e.g. Kaiser
◦ Contents & formatting vary between physicians
  - Different styles, speed of typing, etc.
• If you are looking at a single note type from a single hospital, then maybe
• Not suitable as a general segmentation approach
• Can easily break:
◦ on unseen note types and minor format variations
◦ Example:
  - regex based on all caps
  - regex based on seen headers only
• Several works have explored supervised methods for segmenting clinical notes [Cho et al. 2003, Tepper et al. 2012, Apostolova et al. 2009]
• Problem: methods not general!
◦ Cho et al. 2003: one model for each type of note
  - 20 note types → 20 models!
  - Not practical → each model must be maintained
◦ Tepper et al. 2012: model had low adaptability to unseen documents
  - features used, training data used, etc.
• General segmentation approach for clinical texts
• Requirements:
◦ Single model/approach for most note types
◦ Discount extreme non-standard formatting, e.g. tabular format
• Segment:
◦ Header
◦ Top-level sections
◦ Footer
[Figure: the example note shown earlier, annotated with one Header segment followed by Top-level section segments]
• Supervised approach using L1-regularized logistic regression with a constraint combination approach
• Idea: scan each line in a clinical document and label it as one of:
◦ BeginHeader
◦ ContHeader
◦ BeginSection
◦ ContSection
◦ Footer
• Labels are predicted with a certain confidence
• But there is a problem with using line-wise predictions as is:
◦ Label sequences may not make sense
◦ E.g. there may be a BeginHeader after a BeginSection → incorrect
• Post-processing: enforce sequence combination rules:
◦ First line of document: BeginHeader or BeginSection
◦ BeginHeader cannot come right after BeginHeader or ContHeader
◦ ContHeader must come after BeginHeader or ContHeader
◦ ContSection must come after BeginSection or ContSection
◦ Footer cannot come right after BeginHeader or ContHeader
• Rules applied after all lines in the document are labeled
◦ Applied to consecutive label pairs
◦ Computed efficiently: Viterbi algorithm
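The pairwise rules above lend themselves to constrained Viterbi decoding. The sketch below is a minimal illustration, not the authors' implementation: it assumes each line comes with a dictionary of label probabilities (the hypothetical `line_probs` input stands in for the classifier's confidences) and treats only the five rules listed on the slide as hard constraints.

```python
import math

LABELS = ["BeginHeader", "ContHeader", "BeginSection", "ContSection", "Footer"]
NEG_INF = float("-inf")


def allowed(prev, cur):
    """Pairwise rules from the slide, checked on consecutive labels."""
    if cur == "BeginHeader":
        return prev not in ("BeginHeader", "ContHeader")
    if cur == "ContHeader":
        return prev in ("BeginHeader", "ContHeader")
    if cur == "ContSection":
        return prev in ("BeginSection", "ContSection")
    if cur == "Footer":
        return prev not in ("BeginHeader", "ContHeader")
    return True  # BeginSection: the slide lists no restriction


def log_prob(p):
    return math.log(p) if p > 0 else NEG_INF


def viterbi_decode(line_probs):
    """Return the most probable label sequence that satisfies the rules.

    line_probs: one dict per line mapping label -> probability (assumed input).
    """
    if not line_probs:
        return []
    # Rule: the first line must be BeginHeader or BeginSection.
    score = {
        lab: log_prob(line_probs[0].get(lab, 0.0))
        if lab in ("BeginHeader", "BeginSection") else NEG_INF
        for lab in LABELS
    }
    backpointers = []
    for probs in line_probs[1:]:
        new_score, pointer = {}, {}
        for cur in LABELS:
            best_prev, best = None, NEG_INF
            for prev in LABELS:
                if allowed(prev, cur) and score[prev] > best:
                    best_prev, best = prev, score[prev]
            pointer[cur] = best_prev
            new_score[cur] = NEG_INF if best_prev is None else best + log_prob(probs.get(cur, 0.0))
        score = new_score
        backpointers.append(pointer)
    last = max(LABELS, key=lambda lab: score[lab])
    if score[last] == NEG_INF:
        raise ValueError("no label sequence satisfies the constraints")
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]
```

With independent per-line predictions, a line whose top label is ContHeader could follow a BeginSection line; the decoder instead picks the best sequence that the rules permit.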
Inpatient Outpatient 
• Notes from 12 different enterprises 
• Some large enterprises 
• All sorts of note types 
• Some noisy sectioning, some clean 
• 100 radiology notes 
• Fairly clean sections 
• One hospital 
• All sorts of note types 
• Fairly well sectioned 
• 35,000 notes in total
• 2000 randomly sampled notes (inpatient)
• 100 radiology notes 
• Fairly clean sections
• Emphasis on training data
• Variation in training data
◦ Use different note types for training
◦ Intuition: helps the model generalize well
• Sample training data:
◦ Instead of using all training data from 2100 notes
◦ Generated subsets of training data of varying size and cross-validated on test sets
◦ Intuition: allows us to pick the best model
• Best model used only < 700 notes (out of 2100)
• 5 test sets
◦ 4 of 5 test sets from hospitals not in the training set → true estimate of accuracy
◦ Covers both inpatient and outpatient notes
◦ Covers different note types
◦ ~12,500 test notes
• Primary evaluation metric: line-wise accuracy
◦ percentage of correctly predicted line labels
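As a small worked illustration of the metric, line-wise accuracy can be computed by comparing the gold and predicted label for each line of a note. The function name and the list-of-labels input format are assumptions made for this sketch.

```python
def linewise_accuracy(gold_labels, predicted_labels):
    """Percentage of lines whose predicted label equals the gold label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_labels:
        raise ValueError("cannot score an empty document")
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return 100.0 * correct / len(gold_labels)


gold = ["BeginHeader", "ContHeader", "BeginSection", "ContSection"]
pred = ["BeginHeader", "ContHeader", "BeginSection", "BeginSection"]
accuracy = linewise_accuracy(gold, pred)  # 3 of 4 lines correct
```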
Important to have variety in training notes when building a general segmentation model
Train set   3-fold cross-validation accuracy   Unseen test accuracy
Inp1HospB (300 - limited)   96.70%   67.00%
Inp3HospD (300 - varied)   96.58%   88.23%
• 1st model (Inp1HospB): limited variety (hp + discharge)
• 2nd model (Inp3HospD): variety (11 types - hp, ds, pn…)
• 3-fold cross-validation accuracy: high in both
• Model with variety: higher accuracy on unseen test set
Accuracy consistently > 90% across enterprises
Client/Data In/Outpatient # Test Docs Accuracy 
1. Inp1HospB In 300 92.58% 
2. Inp2HospC In 1000 93.29% 
3. Inp3HospD In 300 95.81% 
4. Rad1MixedHosps Out 9000 92.45% 
5. Rad2HospA Out 1902 93.67% 
Average 93.56% 
• Average accuracy: 93.56% 
• Covers inpatient/outpatient 
Single model, yet it performs well across enterprises
Accuracy Breakdown for Inp2HospC
Document Type Accuracy
1. History and Physical 95.70%
2. Physician Clinicals 93.10%
3. Discharge Summary 94.00%
4. Consult Note 94.60%
5. Short Stay Summary 94.60%
6. Operative Note 92.20%
7. Progress Note 87.80%
8. Cardiac Cath Report 85.40%
9. Procedure Note 83.60%
• Rows 1-6: performs very well (> 90%)
• Rows 7-9: reasonable (> 80%)
• Model performs well across note types
• Lowest performance: procedure notes (low recall on segmenting “technique” sections)
[Figure: # Notes vs. Accuracy. x-axis: # Training Notes (0 to 2000); y-axis: Accuracy (86% to 94%)]
• Avg. accuracy peaks @ 500 notes on all test sets
• No benefit with more notes
No need for big data for a general model.
We need good data from all that big data!
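The experiment behind this curve, training on subsets of different sizes and keeping the best one, can be sketched roughly as follows. The slides do not say how subsets were drawn or scored, so everything here is an assumption: `train_fn` and `eval_fn` are hypothetical callables standing in for model training and for the average accuracy measurement, and subsets are drawn by simple random sampling.

```python
import random


def pick_best_subset(notes, sizes, train_fn, eval_fn, seed=0):
    """Train on one random subset per size and keep the best-scoring model.

    notes:    list of training notes (any objects)
    sizes:    subset sizes to try, e.g. [100, 300, 500, 700]
    train_fn: hypothetical callable, list of notes -> model
    eval_fn:  hypothetical callable, model -> accuracy score
    """
    rng = random.Random(seed)  # fixed seed keeps the sampling repeatable
    results = []
    for size in sizes:
        subset = rng.sample(notes, size)  # sample without replacement
        model = train_fn(subset)
        results.append((eval_fn(model), size, model))
    best_score, best_size, best_model = max(results, key=lambda r: r[0])
    return best_size, best_score, best_model
```

Plotting the score for each size would give a curve like the one described above, where adding more notes past the peak brings no further benefit.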
• Unigrams of each line (LineUnigram)
• Relative position of line in document (PosInDoc)
◦ Top, Middle, Bottom
• Known header features (KnownHeader)
◦ Find potential headers using a repository of seen headers
◦ Seen headers can have a canonical type
E.g. Past Medical History, Previous Med History → “PAST_MEDICAL_HISTORY”
◦ If potential headers are found, we include these features:
  - Canonical type
  - Unigrams & char n-grams of the potential header
  - Caps/colon info: mixed case, all caps, lowercase
  - Length of the potential header
Feature Set   Avg. Accuracy   Improvement
LineUnigram 85.55% 
LineUnigram+PosInDoc 88.62% +3.46% 
LineUnigram+PosInDoc+KnownHeader 93.10% +4.81%
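The three feature groups in this table could be extracted per line along the following lines. This is a sketch under stated assumptions rather than the authors' feature code: the `KNOWN_HEADERS` repository, the thirds-based Top/Middle/Bottom split, the "text before the first colon" header candidate, and the character trigrams are all illustrative choices.

```python
import re

# Hypothetical repository of seen headers mapped to canonical types.
KNOWN_HEADERS = {
    "past medical history": "PAST_MEDICAL_HISTORY",
    "previous med history": "PAST_MEDICAL_HISTORY",
    "chief complaint": "CHIEF_COMPLAINT",
    "allergies": "ALLERGIES",
}


def position_bucket(index, total):
    """PosInDoc: coarse position of a line, split into thirds (assumed)."""
    fraction = index / max(total - 1, 1)
    if fraction < 1 / 3:
        return "Top"
    if fraction < 2 / 3:
        return "Middle"
    return "Bottom"


def case_style(text):
    letters = [c for c in text if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "all_caps"
    if letters and all(c.islower() for c in letters):
        return "lowercase"
    return "mixed_case"


def line_features(line, index, total):
    """Return the set of string features for one line of a note."""
    feats = {"uni=" + tok for tok in re.findall(r"[a-z0-9]+", line.lower())}
    feats.add("pos=" + position_bucket(index, total))
    # Candidate header: text before the first colon, or the whole line.
    candidate = line.split(":", 1)[0].strip()
    key = re.sub(r"\s+", " ", re.sub(r"[^a-z ]", "", candidate.lower())).strip()
    if key in KNOWN_HEADERS:
        feats.add("hdr_type=" + KNOWN_HEADERS[key])
        feats.add("hdr_case=" + case_style(candidate))
        feats.add("hdr_colon=" + str(":" in line))
        feats.add("hdr_len=" + str(len(key.split())))
        feats.update("hdr_uni=" + tok for tok in key.split())
        feats.update("hdr_tri=" + key[i:i + 3] for i in range(len(key) - 2))
    return feats
```

Feature sets like these would then be turned into a sparse vector for the line classifier; narrative lines simply get no `hdr_` features.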
• Explored:
◦ Supervised approach to building a very general segmentation model for clinical texts
• Evaluation showed:
◦ Model works well on notes across enterprises
◦ Model works across note types
• Key to effectiveness:
◦ Variation in training data: all sorts of note types
◦ Training data selection strategy: sample and cross-validate
◦ Feature set: not explored in existing works
Contact: 
Kavita Ganesan 
ganesan.kavita@gmail.com 
www.kavita-ganesan.com 
www.text-analytics101.com

 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 

Segmentation of Clinical Texts

  • 1. Kavita Ganesan & Michael Subotin Presented at: 2014 Conference on IEEE Big Data
  • 2. All sorts of note types!
      • Admit notes
        ◦ documenting why the patient is being admitted
        ◦ baseline status, etc.
      • Progress notes
        ◦ progress during the course of hospitalization
      • Discharge notes
        ◦ conclusion of a hospital stay or series of treatments
      • Others
        ◦ Operative notes
        ◦ Procedure notes
        ◦ Delivery notes
        ◦ Emergency Department notes, etc.
  • 3. Example note:
      PRIMARY CARE PHYSICIAN: Dr. XXXXX XXXXXXXX.
      CHIEF COMPLAINT: Injured right little toe.
      HISTORY OF PRESENT ILLNESS: This is a 63-year-old male with a past medical history of multiple myeloma who presents today after hitting his fifth toe of the right foot on a wood panel yesterday……
      Review of Systems:
        CONSTITUTIONAL: No fever, chills, or weight loss.
        RESPIRATORY: No cough, shortness of breath, or wheezing.
        CARDIOVASCULAR: No chest pain, chest pressure, or palpitations.
        ...............
      PAST MEDICAL HISTORY Multiple myeloma, peripheral neuropathy, hypertension..
      PAST SURGICAL HISTORY:- Stem cell transplant.
      SOCIAL HISTORY The patient formerly smoked tobacco; however, quit within the last 10 years.
      FAMILY HISTORY: Hypertension.
      ALLERGIES: ASPIRIN.
      ………
    Callouts on the slide:
      • Purpose of visit
      • Patient’s current condition in narrative form
      • Ongoing issues, issues in the past
      • Information on allergies
  • 4. (Same example note and callouts as slide 3.) This is how most notes look:
      • some longer, some shorter
      • different sets of headers, etc.
  • 5. (Several overlapping copies of the example note, each showing PRIMARY CARE PHYSICIAN, CHIEF COMPLAINT, HISTORY OF PRESENT ILLNESS and Review of Systems.)
      • Very unstructured
        ◦ formatting cues → inconsistent
        ◦ varies: across physicians, notes, hospitals
      • Hard to analyze specific sections
        ◦ E.g. analyze allergies across a patient population
        ◦ Need to segment notes to extract all allergy info.
  • 6.
      ◦ Information collected varies from note type to note type
        ▪ Ex. info on progress notes vs. admit notes
      ◦ Contents & formatting can vary from hospital to hospital
        ▪ Even within the same organization – E.g. Kaiser
      ◦ Contents & formatting vary between physicians
        ▪ Different styles, speed of typing, etc.
  • 7.
      • If you are looking at a single note type, from a single hospital, then maybe
      • Not suitable as a general segmentation approach
      • Can easily break:
        ◦ on unseen note types and minor format variations
        ◦ Example:
          ▪ regex based on all caps
          ▪ regex based on seen headers only
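
The brittleness described on slide 7 can be illustrated with a minimal sketch. The two rules below (an all-caps pattern and a small set of seen headers) are hypothetical stand-ins for the kind of regex the slide mentions, not rules taken from the talk; the test lines come from the example note on slide 3.

```python
import re

# Hypothetical rule 1: a header is an all-caps phrase followed by a colon.
ALL_CAPS_HEADER = re.compile(r"^[A-Z][A-Z ]*:")

# Hypothetical rule 2: a header is one of the headers seen so far.
SEEN_HEADERS = {"CHIEF COMPLAINT", "HISTORY OF PRESENT ILLNESS", "ALLERGIES"}


def is_header_all_caps(line: str) -> bool:
    """True if the line starts with an all-caps phrase and a colon."""
    return ALL_CAPS_HEADER.match(line.strip()) is not None


def is_header_seen(line: str) -> bool:
    """True if the text before the first colon is a previously seen header."""
    candidate = line.strip().split(":", 1)[0].strip().upper()
    return candidate in SEEN_HEADERS


lines = [
    "CHIEF COMPLAINT: Injured right little toe.",
    "Review of Systems:",       # mixed case: missed by the all-caps rule
    "PAST MEDICAL HISTORY",     # no colon: missed by the all-caps rule
    "FAMILY HISTORY: Hypertension.",  # unseen header: missed by rule 2
]

for line in lines:
    print(f"{line!r}: all_caps={is_header_all_caps(line)}, seen={is_header_seen(line)}")
```

Every line in the list is a section header in the example note, yet each rule misses at least two of them, which is the sort of minor format variation the slide warns about.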
  • 8.
      • Several works have explored supervised methods for segmenting clinical notes [Cho et al. 2003, Tepper et al. 2012, Apostolova et al. 2009]
      • Problem: methods not general!
        ◦ Cho et al. 2003: one model for each type of note
          ▪ 20 note types → 20 models!
          ▪ Not practical → must maintain each model
        ◦ Tepper et al. 2012: model had low adaptability to unseen documents
          ▪ features used, training data used, etc.
  • 9.
      • General segmentation approach for clinical texts
      • Requirements:
        ◦ Single model/approach for most note types
        ◦ Discount extreme non-standard formatting, e.g. tabular format
      • Segment:
        ◦ Header
        ◦ Top-level sections
        ◦ Footer
  • 10. (Same example note as slide 3, annotated with segment labels.) Labels shown on the slide: Header, followed by seven Top-level section labels.
  • 11.
      • Supervised approach using L1-regularized logistic regression with a constraint combination approach
      • Idea: scan each line in a clinical document and label it as:
        ◦ BeginHeader
        ◦ ContHeader
        ◦ BeginSection
        ◦ ContSection
        ◦ Footer
      • Labels are predicted with a certain confidence
      • But there is a problem with using line-wise predictions as is:
        ◦ Label sequences may not make sense
        ◦ E.g. there may be a BeginHeader after a BeginSection → incorrect
  • 12.
      • Post-processing: enforce sequence combination rules:
        ◦ First line of document: BeginHeader or BeginSection
        ◦ BeginHeader cannot come right after BeginHeader or ContHeader
        ◦ ContHeader must come after BeginHeader or ContHeader
        ◦ ContSection must come after BeginSection or ContSection
        ◦ Footer cannot come right after BeginHeader or ContHeader
      • Rules applied after all lines in the document are labeled
        ◦ Applied to consecutive label pairs
        ◦ Computed efficiently: Viterbi algorithm
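
A minimal sketch of how the pairwise rules on slide 12 could be enforced with Viterbi decoding. The five labels and the rules are taken from the slides; the scoring (summing log-confidences of the per-line predictions) and the function names `allowed` and `decode` are assumptions made for illustration, since the slides do not say how confidences are combined.

```python
import math

LABELS = ["BeginHeader", "ContHeader", "BeginSection", "ContSection", "Footer"]
FIRST_LINE_LABELS = {"BeginHeader", "BeginSection"}
NEG_INF = float("-inf")


def allowed(prev: str, cur: str) -> bool:
    """Pairwise rules from slide 12 for two consecutive line labels."""
    if cur == "BeginHeader":
        return prev not in ("BeginHeader", "ContHeader")
    if cur == "ContHeader":
        return prev in ("BeginHeader", "ContHeader")
    if cur == "ContSection":
        return prev in ("BeginSection", "ContSection")
    if cur == "Footer":
        return prev not in ("BeginHeader", "ContHeader")
    return True  # BeginSection may follow any label


def log_prob(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


def decode(line_probs):
    """Return the highest-scoring label sequence that satisfies the rules.

    line_probs: one dict per line mapping label -> confidence
    (assumed score: sum of log-confidences along the sequence).
    """
    if not line_probs:
        return []
    # best[i][label]: best score of a valid sequence ending in label at line i
    best = [{
        lab: log_prob(line_probs[0].get(lab, 0.0)) if lab in FIRST_LINE_LABELS else NEG_INF
        for lab in LABELS
    }]
    back = [{}]
    for probs in line_probs[1:]:
        scores, pointers = {}, {}
        for cur in LABELS:
            prev_score, prev_label = max(
                (best[-1][prev], prev) for prev in LABELS if allowed(prev, cur)
            )
            scores[cur] = prev_score + log_prob(probs.get(cur, 0.0))
            pointers[cur] = prev_label
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    label = max(LABELS, key=lambda lab: best[-1][lab])
    path = [label]
    for pointers in reversed(back[1:]):
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Made-up confidences for a three-line document.
line_probs = [
    {"BeginSection": 0.6, "BeginHeader": 0.4},
    {"ContHeader": 0.55, "ContSection": 0.45},
    {"ContSection": 0.9, "BeginSection": 0.1},
]
print([max(p, key=p.get) for p in line_probs])  # line-wise argmax, violates the rules
print(decode(line_probs))
```

In this toy input the line-wise argmax puts a ContHeader after a BeginSection, which the rules forbid; the decoder instead returns BeginSection, ContSection, ContSection, the best sequence that respects every rule.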
  • 13. Inpatient / Outpatient
      • Notes from 12 different enterprises
      • Some large enterprises
      • All sorts of note types
      • Some noisy sectioning, some clean
      • 100 radiology notes
      • Fairly clean sections
      • One hospital
      • All sorts of note types
      • Fairly well sectioned
      • 35,000 notes in total
      • 2000 randomly sampled notes (inpatient)
      • 100 radiology notes
      • Fairly clean sections
  • 14.
      • Emphasis on training data
      • Variation in training data
        ◦ Use different note types for training
        ◦ Intuition: helps the model generalize well
      • Sample training data:
        ◦ Instead of using all training data from 2100 notes
        ◦ Generated subsets of training data of varying size and cross-validated on test sets
        ◦ Intuition: allows us to pick the best model
      • Best model used only < 700 notes (out of 2100)
  • 15.
      • 5 test sets
        ◦ 4 of 5 test sets from hospitals not in the train set → true estimate of accuracy
        ◦ Covers both inpatient and outpatient notes
        ◦ Covers different note types
        ◦ ~12,500 test notes
      • Primary evaluation metric: line-wise accuracy
        ◦ percentage of correctly predicted line labels
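
The line-wise accuracy metric from slide 15 can be written down directly. This is a small sketch; the function name `linewise_accuracy` and the example labels are illustrative, not from the talk.

```python
def linewise_accuracy(gold, predicted):
    """Percentage of lines whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of lines")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty document")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


gold = ["BeginHeader", "BeginSection", "ContSection", "BeginSection", "Footer"]
pred = ["BeginHeader", "BeginSection", "ContSection", "ContSection", "Footer"]
print(linewise_accuracy(gold, pred))  # 4 of 5 lines correct
```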
  • 16. Important to have variety in training notes when building a general segmentation model
      Train set                    3-fold cross-validation   Unseen test accuracy
      Inp1HospB (300 - limited)    96.70%                    67.00%
      Inp3HospD (300 - varied)     96.58%                    88.23%
      • 1st model: limited variety (hp + discharge)
      • 2nd model: variety (11 types - hp, ds, pn…)
      • 3-fold cross-validation accuracy: high in both
      • Model with variety: higher accuracy on unseen test set
  • 17. Accuracy consistently > 90% across enterprises
      Client/Data          In/Outpatient   # Test Docs   Accuracy
      1. Inp1HospB         In              300           92.58%
      2. Inp2HospC         In              1000          93.29%
      3. Inp3HospD         In              300           95.81%
      4. Rad1MixedHosps    Out             9000          92.45%
      5. Rad2HospA         Out             1902          93.67%
      Average                                            93.56%
      • Average accuracy: 93.56%
      • Covers inpatient/outpatient
      • Single model, but performs well across enterprises
  • 18. Accuracy Breakdown for Inp2HospC
      Document Type              Accuracy
      1. History and Physical    95.70%
      2. Physician Clinicals     93.10%
      3. Discharge Summary       94.00%
      4. Consult Note            94.60%
      5. Short Stay Summary      94.60%
      6. Operative Note          92.20%
      7. Progress Note           87.80%
      8. Cardiac Cath Report     85.40%
      9. Procedure Note          83.60%
      • Performs very well: > 90% (types 1-6)
      • Reasonable: > 80% (types 7-9)
      • Model performs well across note types
      • Lowest performance: procedure notes, with low recall on segmenting “technique” sections
  • 19. [Chart: # Notes vs. Accuracy; x-axis: # Training Notes (0 to 2000), y-axis: Accuracy (86.00% to 94.00%)]
      • Avg. accuracy peaks at 500 notes on all test sets
      • No benefit with more notes
      • No need for big data for a general model. We need good data from all that big data!
  • 20.
      • Unigrams of each line (LineUnigram)
      • Relative position of line in document (PosInDoc)
        ◦ Top, Middle, Bottom
      • Known Header features (KnownHeader)
        ◦ Find potential headers using a repository of seen headers
        ◦ Seen headers can have a canonical type, e.g. Past Medical History, Previous Med History → “PAST_MEDICAL_HISTORY”
        ◦ If potential headers are found, we include features:
          ▪ Canonical type
          ▪ Unigram & char n-gram of potential header
          ▪ Caps/colon info: mixed case, all caps, lowercase
          ▪ Length of potential header
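
A rough sketch of how the three feature groups on slide 20 might be extracted for one line. The feature names (LineUnigram, PosInDoc, KnownHeader) come from the slide; everything else is an assumption for illustration: the tiny `KNOWN_HEADERS` repository, splitting the document into equal thirds for Top/Middle/Bottom, using character trigrams, and treating the text before the first colon as the potential header.

```python
import re

# Hypothetical repository of seen headers mapped to canonical types.
KNOWN_HEADERS = {
    "past medical history": "PAST_MEDICAL_HISTORY",
    "previous med history": "PAST_MEDICAL_HISTORY",
    "allergies": "ALLERGIES",
}


def pos_in_doc(index: int, total: int) -> str:
    """Bucket a line index into Top/Middle/Bottom (equal thirds, assumed)."""
    frac = index / max(total - 1, 1)
    if frac < 1 / 3:
        return "Top"
    if frac < 2 / 3:
        return "Middle"
    return "Bottom"


def case_type(text: str) -> str:
    letters = [c for c in text if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "all_caps"
    if letters and all(c.islower() for c in letters):
        return "lowercase"
    return "mixed_case"


def char_ngrams(text: str, n: int = 3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def line_features(line: str, index: int, total: int) -> dict:
    """Build a sparse feature dict for one line of a note."""
    feats = {}
    for tok in re.findall(r"[a-z0-9]+", line.lower()):
        feats["LineUnigram=" + tok] = 1
    feats["PosInDoc=" + pos_in_doc(index, total)] = 1
    # Potential header: text before the first colon (whole line if no colon).
    head, colon, _ = line.strip().partition(":")
    head = head.strip()
    canonical = KNOWN_HEADERS.get(head.lower())
    if canonical is not None:
        feats["KnownHeader.type=" + canonical] = 1
        for tok in head.lower().split():
            feats["KnownHeader.unigram=" + tok] = 1
        for gram in char_ngrams(head.lower()):
            feats["KnownHeader.char3=" + gram] = 1
        feats["KnownHeader.case=" + case_type(head)] = 1
        feats["KnownHeader.colon=" + str(bool(colon))] = 1
        feats["KnownHeader.length"] = len(head)
    return feats


example = line_features("Previous Med History: hypertension", index=5, total=10)
for name in sorted(example):
    print(name, example[name])
```

A dict like this could be fed to any sparse linear classifier; lines with no known header simply carry the LineUnigram and PosInDoc features.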
  • 21.
      Feature Set                         Avg. Accuracy   Improvement
      LineUnigram                         85.55%
      LineUnigram+PosInDoc                88.62%          +3.46%
      LineUnigram+PosInDoc+KnownHeader    93.10%          +4.81%
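
The Improvement column on slide 21 is not defined on the slide. As a quick check, the figures match the gain expressed as a percentage of the new accuracy; this convention is inferred from the numbers, not stated by the authors. The helper name `relative_gain` is illustrative.

```python
def relative_gain(old: float, new: float) -> float:
    """Gain as a percentage of the NEW accuracy (inferred convention)."""
    return 100.0 * (new - old) / new


steps = [
    ("LineUnigram -> +PosInDoc", 85.55, 88.62),
    ("+PosInDoc -> +KnownHeader", 88.62, 93.10),
]
for name, old, new in steps:
    print(name, "absolute:", round(new - old, 2), "relative:", round(relative_gain(old, new), 2))
```

In absolute terms the two steps add 3.07 and 4.48 accuracy points respectively.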
  • 22.
      • Explored:
        ◦ Supervised approach to building a very general segmentation model for clinical texts
      • Evaluation showed:
        ◦ Model works well on notes across enterprises
        ◦ Model works across note types
      • Key to effectiveness:
        ◦ Variation in training data: all sorts of note types
        ◦ Training data selection strategy: sample and cross-validate
        ◦ Feature set: not explored in existing works
  • 23. Contact: Kavita Ganesan ganesan.kavita@gmail.com www.kavita-ganesan.com www.text-analytics101.com