With the recent emphasis on the use of electronic health records (EHRs), the importance of leveraging the large amounts of electronic clinical data has become clearer. Efficient and effective use of this information could supplement or even replace manual chart review as a means of studying and improving the quality and safety of healthcare delivery. However, some of these clinical data are in the form of free text and require
preprocessing before use in automated systems.
There are many challenges in developing such automated decision support systems. Clinical reports include medical terms that are not commonly used in everyday language and that appear in various forms. These terms have to be coded into the standard forms defined in medical dictionaries for consistency. Furthermore, coding by itself may not be sufficient for correctly identifying clinical conditions. Reported conditions must be analyzed together with their surrounding context to validate their temporal, certainty, and negation status. Biomedical Natural Language Processing (NLP) tools map medical terms to standard dictionaries and analyze their contexts; however, their output cannot be directly used in subsequent automated processes.
Accordingly, in this research, we first investigate the best ways to extract features from the NLP output that can be used for automatic classification. While results show that classification performance is significantly improved by using the NLP features over using the raw text, this NLP-based classification is computationally expensive and requires a significant number of manual steps before the system can be used in different clinical areas. As an alternative, we developed a framework for a topic modeling-based classification system. Topic modeling provides interpretable themes (topic distributions) in reports, a representation that is more compact than the bag-of-words representation and can be processed faster than raw text in subsequent automated processes. Our topic-based classification system is shown to be competitive with existing text classification techniques while providing a more efficient and interpretable representation. A common free-text data source is radiology reports, typically dictated by radiologists; we therefore analyzed the performance of our system using computed tomography (CT) imaging reports.
1. Effective Classification of Clinical Reports:
Natural Language Processing-Based and
Topic Modeling-Based Approaches
Department of Computer Science
School of Engineering and Applied Science
The George Washington University
Efsun Sarioglu Kayi
2. Outline
¡ Introduction
¡ Research Objective
¡ Scope
¡ Proposed System
¡ Background and Related Work
¡ Natural Language Processing (NLP)
¡ Topic Modeling
¡ Text Classification
¡ Dissertation Research and Results
¡ Clinical Dataset
¡ Raw Text Classification of Clinical Reports
¡ NLP-based Classifiers
¡ Topic Modeling-based Classifiers
¡ Summary and Contributions
4. Electronic Health Records (EHR)
¡ Large amounts of clinical data have become available in
Electronic Health Records (EHR)
¡ Patient reports in free text
¡ Not directly usable in automated systems
¡ Clinical decision support systems
¡ Automated advice to providers by examining the EHRs
¡ Help medical professionals make clinical decisions faster and with more
confidence
¡ Improve the quality and efficiency of healthcare
¡ Prevent medical errors
¡ Reduce healthcare costs
¡ Increase administrative efficiencies
¡ Decrease paperwork
5. Clinical Reports
¡ Free text
¡ Not directly usable for automated processing
¡ Ambiguity in natural language
¡ Medical terms
¡ Not common in daily language
¡ Many synonyms
¡ Need for standardization
¡ Context sensitivity
¡ Patient history
¡ Ruled-out diagnosis
¡ Case study: Automatic classification of Emergency Medicine
computed tomography (CT) imaging reports into binary categories
6. Research Objective
¡ Objective: Given a list of patient reports, automatically classify them
into user-defined categories efficiently and effectively
¡ Natural Language Processing (NLP) tools transform the text into
structured data that can be used more easily by automated processes
¡ Context information in text is critical to classification performance [Garla11]
¡ Requires manual customization for each domain
¡ Discharge summaries, radiology, and mammography reports
¡ A more general and compact representation can be achieved by
topic model of patient reports
¡ Biomedical concepts are typically nouns/noun phrases [Huang05]
¡ Nouns, compared to other parts of speech, form topics [Griffiths04]
¡ Can be adapted to new applications/domains more easily
7. Proposed System
¡ Automated classification of clinical reports into categories
¡ Binary categories
¡ presence/absence of fracture
¡ Multiple categories
¡ types of fractures, e.g., facial, orbital, etc.
¡ Clinical report representation
¡ Natural Language Processing (NLP)
¡ Mapping of medical terms to standard medical dictionaries
¡ Context modifiers such as probability, negation, and time
¡ Topic Modeling
¡ A more general and compact representation of reports based on their topic
distributions
9. Scope
¡ Classification of text/discrete data
¡ Continuous data can be discretized using supervised or unsupervised
techniques before doing the classification
¡ Binary classification
¡ Multi-class datasets can be used with binary classifiers using techniques
such as All-vs-One (AVO) [Vapnik98] or All-vs-All (AVA) [Friedman96,
Hastie98] classification
11. Natural Language Processing (NLP)
¡ Techniques for understanding the syntactic and semantic relations
that exist in natural language
¡ Part-Of-Speech (POS) tagging
¡ Assigns each word its syntactic class
¡ Noun, verb, adjective, etc.
¡ Dependency parsing
¡ Finds the syntactic representation of a given sentence
¡ Dependencies between its words with labels showing grammatical relations
¡ Subject of a clause, object of a verb phrase
12. Biomedical NLP
¡ General NLP tools, such as Stanford NLP, are trained on general English
data, e.g., news
¡ Not well suited to the biomedical domain
¡ Biomedical data
¡ Medical terms
¡ Synonyms: ocular --> eye, eyes, optic, opthalmic, opthalmia, oculus, oculi
¡ Context sensitivity
¡ Temporal, negation, and certainty status of clinical terms
¡ Biomedical NLP tools
¡ Use biomedical vocabularies and translate clinical text into coded
descriptions suitable for automated processing
¡ Medical Language Extraction and Encoding (MedLEE) [Friedman00]
¡ Standard representation via Unified Medical Language System (UMLS)
¡ Modifiers for each clinical term to evaluate context
13. Unified Medical Language System (UMLS)
¡ Repository of vocabularies in biomedical sciences developed and
maintained by the National Library of Medicine (NLM)
¡ 6.4 million unique terms
¡ 1.3 million unique concepts
¡ More than 119 families of biomedical vocabularies
¡ Three knowledge sources
¡ Metathesaurus
¡ Multi-lingual vocabulary organized by concepts
¡ Links alternative names and views of the same concept from different vocabularies
¡ Each concept is given a unique ID called Concept Unique Identifier (CUI)
¡ Semantic Network
¡ Semantic types and relations to categorize and disambiguate concepts
¡ Semantic types: medical device, clinical drug, and laboratory
¡ Semantic relations: treats, diagnoses, and contains
¡ Specialist Lexicon
¡ General English and biomedical vocabulary
15. MedLEE System Overview
[Figure: MedLEE pipeline — a patient report passes through a preprocessor, parser, composer, and encoder, driven by site-specific tailored lexicon, grammar, mapping, and coding tables, to produce structured output.]
Raw Text
Impression: Right lamina papyracea fracture. No evidence of entrapment.
MedLEE Output
<sectname v = "report impression item"></sectname>
<sid idref = "s7"></sid>
<code v = "UMLS:C0016658_Fracture"></code>
<problem v = "entrapment" code = "UMLS:C1285497_Entrapment (morphologic abnormality)">
<certainty v = "no"></certainty>
</problem>
Orbital fracture: UMLS C0029184; Le Fort's fracture: UMLS C0272464
16. MedLEE Findings and Modifiers
¡ Maps its findings to UMLS CUIs
¡ Problems (e.g. fracture, pain, trauma)
¡ Procedures (e.g. image, computerized axial tomography)
¡ Device (e.g. tube)
¡ Lab tests (e.g. blood type)
¡ Assigns modifiers to each finding for more context
¡ Body locations (e.g. head, brain, face)
¡ Section names (e.g. impression, findings)
¡ Certainty (e.g. high/moderate/low certainty) and
¡ Status (e.g. acute, new, recent, chronic, past, history)
18. Topic Modeling
¡ Automatically finds out themes/topics of a document collection
¡ Low dimensional representation that captures the semantics
¡ A way to browse an unstructured collection in a structured way
¡ Topic Modeling techniques
¡ Matrix decomposition
¡ Latent Semantic Analysis (LSA) [Deerwester90]
¡ Probabilistic topic modeling
¡ Probabilistic Latent Semantic Analysis (PLSA) [Hofmann99]
¡ Latent Dirichlet Allocation (LDA) [Blei03]
19. Latent Dirichlet Allocation (LDA)
¡ A topic is defined as a probability distribution over the entire vocabulary
¡ Documents can exhibit multiple topics with different proportions
¡ Given the observed words in a set of documents and the total
number of topics
¡ Infers the topic model that is most likely to have generated the data
¡ For each topic, its probability distribution over words
¡ For each document, its probability distribution over topics
¡ The topic responsible for generating each word in each document
21. Topic Model Training
¡ Randomly assign a topic to every word in all of the documents
¡ Repeat 1000 times:
¡ For each word w in each document d:
¡ For each topic, calculate a score s
¡ Sample a new topic t_new for word w in document d based on s

$s(t \mid w, d) = \frac{N_{t|d} + \alpha}{N_d + T\alpha} \times \frac{N_{w|t} + \beta}{N_t + V\beta}$

where $N_{t|d}$ is the number of words in document d assigned to topic t, $N_d$ the number of words in d, $N_{w|t}$ the number of times word w is assigned to topic t, $N_t$ the total number of words assigned to topic t, T the number of topics, and V the vocabulary size.
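As an illustration, a minimal Python sketch of this sampling step; the count arrays, and the note that a full sampler would first subtract the current assignment of w from the counts, are assumptions spelled out in the comments:

```python
import random

def sample_new_topic(w, d, N_td, N_d, N_wt, N_t, alpha, beta, T, V):
    """One collapsed Gibbs sampling step for word w in document d.

    N_td[d][t]: words in document d currently assigned to topic t
    N_d[d]:     total words in document d
    N_wt[t][w]: times word w is assigned to topic t across the corpus
    N_t[t]:     total words assigned to topic t
    (In a full sampler, the current assignment of this occurrence of w
    would be subtracted from these counts before scoring.)
    """
    # Score each topic with the formula from the slide.
    scores = [
        (N_td[d][t] + alpha) / (N_d[d] + T * alpha)
        * (N_wt[t][w] + beta) / (N_t[t] + V * beta)
        for t in range(T)
    ]
    # Sample a topic with probability proportional to its score.
    u = random.random() * sum(scores)
    running = 0.0
    for t, s in enumerate(scores):
        running += s
        if u <= running:
            return t
    return T - 1
```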
22. Initialization

Document-topic statistics:

Document  Topic A  Topic B  Nd
1         12       0        12
2         3        6        9
3         5        10       15
4         6        9        15
5         8        3        11
6         4        9        13
7         5        3        8
8         6        4        10
9         4        6        10
10        4        5        9

Topic-word statistics (vocabulary: River, Stream, Bank, Money, Loan):

         River  Stream  Bank  Money  Loan  Nt
Topic A  4      5       20    9      18    56
Topic B  5      7       20    12     11    55
24. Topic Modeling Training
DOC 5 (current topic assignments): River:B, Stream:A, Bank:A, Bank:A, Bank:A, Money:B, Money:A, Money:B, Loan:A, Loan:A, Loan:A

For the word River in DOC 5, compute a score for each possible topic [A, B]:
$score(A \mid N_{A|5},\ N_{River|A})$
$score(B \mid N_{B|5},\ N_{River|B})$
25. Topic Modeling Training
For the word River in DOC 5 (same assignments as above), sum the scores over the possible topics [A, B] to get the normalizer:

$Z = score(A \mid N_{A|5},\ N_{River|A}) + score(B \mid N_{B|5},\ N_{River|B})$
26. Topic Modeling Training
To sample a new topic for the word, draw $u = \text{rand}() \times Z$ and pick the topic whose cumulative score interval contains u; in this example the sample returns t = A.
27. Topic Modeling Training
DOC 5 topic assignments before and after training:

Word     Initial Topic   Final Topic
River    A               A
Stream   B               A
Bank     A               A
Bank     A               B
Bank     A               B
Money    B               B
Money    A               B
Money    B               B
Loan     A               B
Loan     A               B
Loan     A               B
29. Topic-Word statistics
Initial topic-word counts (random assignment):

         River  Stream  Bank  Money  Loan
Topic A  4      5       20    9      18
Topic B  5      7       20    12     11

After 1000 iterations of sampling:

         River  Stream  Bank  Money  Loan
Topic A  9      12      16    0      0
Topic B  0      0       24    21     29
32. Bag-of-Words (BOW) Representation
¡ Document-term matrix where columns are terms and rows are
documents
¡ NxM matrix for N documents and M terms
¡ Pros: Simple, conventional
¡ Cons: Word ordering is lost
¡ Weighting
¡ Binary
¡ Term frequency: $tf_{t,d}$
¡ Inverse document frequency: $idf_t = \log \frac{N}{df_t}$
¡ Combined measure: $tf\text{-}idf_{t,d} = tf_{t,d} \times idf_t$
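A small illustration of these weights in plain Python (the toy reports below are hypothetical):

```python
import math
from collections import Counter

docs = [
    "right orbital fracture no evidence of entrapment".split(),
    "no acute fracture seen".split(),
    "nasal bone fracture with subcutaneous emphysema".split(),
]
N = len(docs)

# Document frequency: number of documents containing each term.
df = Counter(term for doc in docs for term in set(doc))

def tf_idf(doc):
    tf = Counter(doc)                        # term frequencies in this document
    return {t: tf[t] * math.log(N / df[t])   # idf_t = log(N / df_t)
            for t in tf}

print(tf_idf(docs[0]))
```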
33. Text Classification
¡ Given a set of n training vectors $\{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$ in BoW
representation with binary labels $\{y_1, y_2, \ldots, y_n\}$, a classifier is a
binary-valued function $f : \mathbb{R}^d \to \{0, 1\}$
¡ Trivial rejector/zero rule (ZR) classifier
¡ Baseline
¡ Decision Tree
¡ Popular classification algorithm due to its human-interpretable binary tree output
¡ Support Vector Machines (SVM)
¡ Shown to perform well in text classification [Joachims98, Sebastiani02]
34. Decision Tree
¡ Binary tree where nodes represent criteria based on terms and
leaves represent the class label
¡ Build the tree top-down: start with all of the training data in a single node
¡ Repeat until all leaves are pure
¡ Look at all leaves and all possible splits
¡ Choose the split that most decreases the uncertainty based on entropy, gini index, etc
[Figure: example decision tree with internal nodes testing Fracture > 0, Orbital > 0, and Facial > 0, and leaves labeled Positive or Negative.]
35. Uncertainty Measures
[Figure: 20 points (10 "+", 10 "o") in the unit square with candidate vertical splits at x = 0.25, 0.5, and 0.75. The split at x < 0.5 gives a left region with class counts [10, 2] and a right region with [0, 8]; the parent node has counts [10, 10].]

Gini index: $u = 2p(1 - p)$

Uncertainty before the split: $2 \times \frac{10}{20} \times \frac{10}{20} = \frac{1}{2} = 0.5$

Split at x < 0.5: $p_L = \frac{12}{20}$, $p_R = \frac{8}{20}$
$u_L = 2 \times \frac{10}{12} \times \frac{2}{12} = \frac{10}{36}$, $u_R = 0$

Uncertainty after the split $= p_L u_L + p_R u_R = \frac{12}{20} \times \frac{10}{36} + \frac{8}{20} \times 0 = \frac{1}{6} \approx 0.17$

The three candidate splits yield after-split uncertainties of 0.33, 0.16, and 0.38, so the x < 0.5 split decreases the uncertainty the most.
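The worked example can be checked with a few lines of Python (a sketch of the Gini computation as defined above):

```python
def gini(pos, neg):
    """Gini index u = 2 p (1 - p) for a node with the given class counts."""
    n = pos + neg
    p = pos / n
    return 2 * p * (1 - p)

def split_uncertainty(left, right):
    """Weighted Gini after splitting into (pos, neg) count pairs."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return n_left / n * gini(*left) + n_right / n * gini(*right)

print(gini(10, 10))                        # 0.5, before the split
print(split_uncertainty((10, 2), (0, 8)))  # ~0.167, after the x < 0.5 split
```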
36. Support Vector Machines (SVM)
¡ Large-margin classifier
¡ Margin: smallest distance from any example to the decision boundary,
$\text{Margin} = \frac{1}{\|\theta\|}$ for parameter vector $\theta \in \mathbb{R}^d$
¡ Objective: Minimize $\frac{1}{2}\|\theta\|^2$
¡ $\|\theta\|$ is inversely proportional to the margin, so minimizing it maximizes the margin
¡ Subject to $y_t \theta^T x_t \geq 1$ for all $t = 1, \ldots, n$
¡ Constraints guarantee that each sample is classified correctly

[Figure: examples labeled +1 and -1 separated by the decision boundary $\theta^T x = 0$, with margin hyperplanes $\theta^T x = +1$ and $\theta^T x = -1$.]
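A hedged sketch of training a linear SVM on BoW features with scikit-learn; this is only an illustration, not the toolkit or data used in the dissertation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy reports; real inputs would be the CT report texts.
reports = ["right orbital floor fracture",
           "no acute fracture identified",
           "nasal bone fracture",
           "normal study no fracture"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vec = CountVectorizer()
X = vec.fit_transform(reports)         # BoW document-term matrix
clf = LinearSVC(C=1.0).fit(X, labels)  # large-margin linear classifier

print(clf.predict(vec.transform(["minimally depressed orbital fracture"])))
```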
38. Evaluation: Classification
¡ Possible cases for binary classification
¡ Precision (P), Recall (R), and F-score measures are used for
evaluating classification performance
                  Predicted Positive     Predicted Negative
Actual Positive   True Positive (TP)     False Negative (FN)
Actual Negative   False Positive (FP)    True Negative (TN)

$P = \frac{TP}{TP + FP}$     $R = \frac{TP}{TP + FN}$     $F\text{-}score = \frac{2 \times P \times R}{P + R}$
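These measures are straightforward to compute from the confusion-matrix counts; a minimal sketch:

```python
def precision_recall_fscore(tp, fp, fn):
    """Precision, recall, and F-score from confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    return p, r, f

# Example: 90 true positives, 5 false positives, 10 false negatives.
print(precision_recall_fscore(tp=90, fp=5, fn=10))
```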
39. Training and Testing
¡ Train the classifier using only a subset of the entire dataset
¡ Optionally, use a validation set to learn the best values of the parameters
¡ Evaluate its performance on unseen test dataset
¡ Stratified: similar class distribution to the original dataset
¡ Run the algorithm several times and take the average of the performances

$TR = \{d_1, d_2, \ldots, d_{|TR|}\},\ |TR| > 0$
$VD = \{d_{|TR|+1}, \ldots, d_{|TR|+|VD|}\},\ |VD| \geq 0$
$TS = \{d_{|TR|+|VD|+1}, \ldots, d_{|D|}\},\ |TS| > 0$
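A hedged sketch of such a stratified split with scikit-learn (illustrative only; the dissertation's actual tooling is not specified here):

```python
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix and imbalanced binary labels.
X = [[i] for i in range(12)]
y = [0] * 8 + [1] * 4

# stratify=y keeps the class distribution of the train and test sets
# similar to that of the original dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

print(sum(y_train), sum(y_test))  # positives in each split
```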
41. NLP Related Work
¡ General NLP
¡ Daily English
¡ Stanford NLP, OpenNLP
¡ Biomedical NLP
¡ Use of medical dictionaries for medical terms and modifiers for
context
¡ MetaMap for UMLS mapping [Aronson01, Aronson10]
¡ Only negation modifier
¡ ConText system [Harkema09]
¡ Temporality, negation and experiencer of a given clinical term in a sentence
¡ Not a complete analysis since it expects the condition to be provided
¡ Clinical Text Analysis and Knowledge Extraction System (cTAKES) [Garla11]
¡ History, probability, and negation modifiers
¡ Medical Language Extraction and Encoding (MedLEE) [Friedman00]
¡ More types of modifiers (51) and wider range of modifier values
42. Biomedical NLP
¡ UMLS for classification
¡ Classify conditions into disease categories based on inter-concept
relations from UMLS [Bodenreider00]
¡ Expert rules
¡ MedLEE with expert rules for classification instead of machine learning
approaches [Mendonca05, Hripcsak02]
¡ NegEx (Negative Identification for Clinical Conditions) [Chapman13]
and SQL Server 2008 Free Text Search to identify fractures in radiograph
reports using expert rules [Womack10]
¡ Expert rules could be costly to construct
¡ Less generalizable to new clinical areas in contrast to machine learning
approaches
¡ Our approach: utilize MedLEE with machine learning approaches
43. Topic Modeling Related Work
¡ Applications of topic modeling
¡ Computer vision [Li05], biology [Yeh10], information retrieval [Wei06] and
text segmentation [Misra09]
¡ Structural extensions to topic modeling
¡ N-gram for phrases [Wallach06]
¡ Combined with POS tagging [Griffiths04]
¡ Combined with parse trees [Boyd-Graber08]
¡ Enhance the capability of topic modeling by incorporating NLP
techniques
¡ Computationally more expensive than standard topic modeling
¡ Clinical text
¡ Similarity measure based on topic distributions of patients for information
retrieval [Arnold10]
44. Topic Modeling for Text Classification
¡ Topic modeling for text classification
¡ Comparison of vector space model, LSA, and LDA for text classification
with SVM [Liu11]
¡ Performance of SVM with LDA surpasses the vector space model and LSA
¡ Keyword selection based on entropy of words in topics [Zhang08]
¡ Text classification with topic vectors with fixed number of topics
¡ In addition to BoW [Banerjee08] and instead of BoW [Sriurai11]
¡ Topic modeling based resampling instead of random sampling for
imbalanced classes [Chen11]
¡ Multi-label text classification using topic model [Rubin12]
¡ Our approaches can be extended to the multi-class setting using standard techniques
¡ Our approach: utilize topic modeling for clinical text classification
using LDA
46. Orbital Dataset
¡ CT imaging reports for patients suffering traumatic orbital injury
[Yadav12]
¡ Each report was dictated by a staff radiologist
¡ Outcomes were extracted by a trained data abstractor
¡ Positive for acute orbital fracture
¡ Negative for acute orbital fracture
¡ Among the 3,705 orbital CT reports, 3,242 were negative and 463 were
positive
47. Pediatric Dataset
¡ Prospectively collected patient CT report data for pediatric
traumatic brain injury [Kuppermann09]
¡ Obtained at emergency department clinicians' discretion and
interpreted by site faculty radiologists
¡ The outcome of interest was extracted by a trained data abstractor
¡ Positive for traumatic brain injury
¡ Negative for traumatic brain injury
¡ Among the 2,126 pediatric head CT reports, 1,973 were negative and
153 were positive
48. Dataset Preparation
¡ Training and testing datasets
¡ Resampled datasets

Resampled datasets:
            Original                Undersampled            Oversampled
Dataset     Pos    Neg     Total    Pos    Neg    Total     Pos      Neg      Total
Orbital     463    3,242   3,705    463    463    926       1,895    1,810    3,705
Pediatric   153    1,973   2,126    153    151    304       1,094    1,032    2,126

Proportions of the dataset (number of reports):
Dataset     75%      66%      50%      34%      25%
Orbital     2,778    2,445    1,852    1,259    926
Pediatric   1,594    1,403    1,063    722      531
55. MedLEE Lexicon Extension
Table 4.2: List of terms added to MedLEE lexicon for the orbital dataset
Term Category Target Form UMLS CUI
ramus Body location ramus of mandible C0222748
angle Body location angle of mandible C0222753
body Body location body of mandible C0222746
intraorbit Body location ocular orbit C0029180
nasal cavity Body location nasal cavity C0027423
mastoid air cell Body location pneumatic mastoid cell C0229427
pterygoid plate Body location pterygoid process C0222730
lamina papyracea Body location orbital plate of ethmoid bone C0222699
lamina paprycea Body location orbital plate of ethmoid bone C0222699
LeFort Finding Le Fort’s fracture C0272464
LeFort I Finding Le Fort’s fracture, type I C0435328
LeFort II Finding Le Fort’s fracture, type II C0435329
LeFort III Finding Le Fort’s fracture, type III C1402218
LeFort Type I Finding Le Fort’s fracture, type I C0435328
LeFort Type II Finding Le Fort’s fracture, type II C0435329
LeFort Type III Finding Le Fort’s fracture, type III C1402218
premaxilla Body location premaxillary bone C0687094
supraorbital Body location supraorbital C0230002
preorbital Body location periorbital C0230064
depressed fracture Finding depressed fracture C0332759
maxillary sinus Body location maxillary sinus C0024957
emphysema Finding subcutaneous emphysema C0038536
ramus fracture Finding Closed fracture of ramus of mandible C0272469
angle fracture Finding Mandible angle fracture C0746383
body fracture Finding Closed fracture of body of mandible C0272470
lamina papyracea fracture Finding Fracture of orbital plate of ethmoid bone C1264245
maxillary sinus fracture Finding sinus maxillaris fracture C1409796
orbital floor fracture Finding Fracture of orbital floor C0149944
nasal bone fracture Finding Fractured nasal bones C0339848
tripod fracture Finding Closed fracture of zygomatic tripod C1264249
Table 4.3: List of terms added to MedLEE lexicon for the pediatric dataset
Term Category Target Form UMLS CUI
mass effect Finding cerebral mass effect C0186894
shift Finding midline shift of brain C0576481
midline shift Finding midline shift of brain C0576481
extraaxial hemorrhage Finding enlarged extraaxial space on brain imaging C3280298
extra-axial hemorrhage Finding enlarged extraaxial space on brain imaging C3280298
extraaxial fluid collection Finding enlarged extraaxial space on brain imaging C3280298
extra-axial fluid collection Finding enlarged extraaxial space on brain imaging C3280298
extra-axial collection Finding enlarged extraaxial space on brain imaging C3280298
extraaxial collection Finding enlarged extraaxial space on brain imaging C3280298
extraaxial hematoma Finding enlarged extraaxial space on brain imaging C3280298
extra-axial hematoma Finding enlarged extraaxial space on brain imaging C3280298
sulcus Body location sulcus of brain C0228177
parenchymal hemorrhage Finding parenchymal hemorrhage C0747264
ventricle Body location cerebral ventricles C0007799
sutures Body location joint structure of suture of skull C0010272
ischemic changes Finding cerebral ischemia C0917798
depressed Finding depressed fracture C0332759
depression Finding depressed fracture C0332759
depressed fracture Finding depressed fracture C0332759
nasopharyngeal passage Body location entire nasal passage C1283892
Based on the status modifier, historical or chronic findings were filtered out: findings with status values such as active or recent were kept, while those with previous or past were excluded. The section names were also checked, and sections corresponding to patient history were excluded. Both NLP feature sets were then converted into BoW representation for classification.
56. NLP Feature Selection
¡ Feature set 1: All
¡ Only problems with body locations
¡ Excluded findings: procedure, device, technique, etc.
¡ Feature set 2: Filtered
¡ Subset of feature set 1
¡ Current and highly probable findings based on modifiers
¡ Certainty modifier
¡ Included: high certainty, moderate certainty,…
¡ Included with a preceding ‘no_’: low certainty, negative
¡ Excluded: rule out
¡ Status modifier
¡ Included: active, recent…
¡ Excluded: previous, past…
¡ Section modifier
¡ Included: Indications and findings
¡ Excluded: past history
57. NLP Feature Selection
Find all problems with body locations; for each finding:
¡ Not past history & high certainty?
¡ N → Exclude
¡ Y → Negated?
¡ Y → Include the finding with a preceding no_
¡ N → Include the finding as it is
The result is the Filtered feature set.
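A minimal sketch of this filtering logic in Python, assuming each MedLEE finding has already been parsed into a dictionary of its modifiers; the field names and modifier groupings below are illustrative, not MedLEE's actual schema:

```python
NEGATED_CERTAINTY = {"low certainty", "negative"}
EXCLUDE_CERTAINTY = {"rule out"}
EXCLUDE_STATUS = {"previous", "past"}
EXCLUDE_SECTIONS = {"past history"}

def filtered_feature(finding):
    """Return the feature string for one finding, or None to exclude it.

    `finding` is a dict such as:
    {"term": "fracture", "bodyloc": "skull", "certainty": "high certainty",
     "status": "recent", "section": "findings"}
    """
    if finding.get("section") in EXCLUDE_SECTIONS:
        return None                      # past-history sections are dropped
    if finding.get("status") in EXCLUDE_STATUS:
        return None                      # previous/past findings are dropped
    certainty = finding.get("certainty", "high certainty")
    if certainty in EXCLUDE_CERTAINTY:
        return None                      # "rule out" findings are dropped
    name = f'{finding["term"]} {finding.get("bodyloc", "")}'.strip()
    if certainty in NEGATED_CERTAINTY:
        return "no_" + name              # negated finding kept with a no_ prefix
    return name

print(filtered_feature({"term": "fracture", "bodyloc": "skull",
                        "certainty": "high certainty",
                        "status": "recent", "section": "findings"}))
```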
58. Sample MedLEE Filtered Features
Findings:
Extracranial, subcutaneous hyperdense hematoma is seen along
the right parietal region with underlying minimally
depressed right parietal skull fracture.
MedLEE structured text:
<problem v = "hematoma" code = "UMLS:C0018944_hematoma">
<bodyloc v = "subcutaneous"><region v = "extracranial">
</region></bodyloc>
<certainty v = "high certainty"></certainty>
<problemdescr v = "hyperdensity"></problemdescr>
<region v = "region"><region v = "parietal"><region v =
"right"></region></region></region>
<code v = "UMLS:C0018944_hematoma"></code>
<code v = "UMLS:C0520532_subcutaneous hematoma"></code>
</problem>
<problem v = "fracture" code = "UMLS:C0016658_fracture">
<bodyloc v = "skull" code = "UMLS:C0037303_bone structure of
cranium"> <region v = "parietal"><region v = "right">
</region></region>
<code v = "UMLS:C0037303_bone structure of cranium"></code>
</bodyloc>
<certainty v = "high certainty"></certainty>
<change v = "depressed"><degree v = "low degree"></degree>
</change>
<code v = "UMLS:C0016658_fracture"></code>
<code v = "UMLS:C0037304_skull fractures"></code>
<code v = "UMLS:C0272451_fracture of parietal bone
(disorder)"></code>
Filtered Feature Selection:
hematoma subcutaneous
C0018944 hematoma
C0520532 subcutaneous hematoma
fracture skull
C0037303 bone structure of cranium
C0016658 fracture
C0037304 skull fractures
C0272451 fracture of parietal bone (disorder)
Figure 2. Sample MedLEE and filtered feature outputs. MedLEE = Medical Language Extraction and Encoding.
61. Raw Text vs NLP Features
Orbital Dataset:
             Baseline   Decision Tree                    SVM
                        Text     NLP All  NLP Filtered   Text     NLP All  NLP Filtered
Precision    76.57      93.64    96.28    96.53          94.28    96.13    96.96
Recall       87.50      94.25    96.33    96.59          94.28    96.14    97.00
F-Score      81.67      93.94    96.30    96.56          94.28    96.14    96.98

Pediatric Dataset:
             Baseline   Decision Tree                    SVM
                        Text     NLP All  NLP Filtered   Text     NLP All  NLP Filtered
Precision    86.12      94.13    95.21    96.63          95.88    96.74    97.13
Recall       92.80      94.25    95.46    96.80          95.90    96.90    97.25
F-Score      89.34      94.19    95.34    96.55          95.88    96.73    97.10
62. Raw Text vs NLP-based Classification
[Charts comparing raw text and NLP-based classification performance on the orbital and pediatric datasets.]
63. Classification Errors
Orbital Dataset — Classification errors (combination of training and test sets)*:

Cause                                              Frequency (%)
Nonorbital fracture                                32 (31.4)
Final reading disagrees with preliminary reading   19 (18.6)
Vague certainty                                    9 (8.8)
Fracture acuity                                    9 (8.8)
Recent facial fracture surgery                     6 (5.9)
MedLEE miscoding                                   5 (4.9)
Other†                                             22 (21.6)

*Total sample of 3,710. Errors total 102 instances (2.7%).
†Includes dictation error, filtering error, fracture implied but not stated, and miscellaneous poor wording.

Pediatric Dataset — Misclassification categorization (from both test and training sets):

Misclassification Reason                           Number (%)
False negatives (from 1,829 coded negative)        7 (0.4)
    Decision tree misclassification                7 (100)
False positives (from 292 coded positive)          147 (50.3)
    Abnormal but not PECARN TBI                    53 (36.1)
    Report ambiguity                               12 (8.2)
    Report dictation error                         6 (4.1)
    Text conversion error                          3 (2.0)
    MedLEE misread                                 27 (18.4)
    Decision tree misclassification                46 (31.3)

MedLEE = Medical Language Extraction and Encoding; PECARN = Pediatric Emergency Care Applied Research Network; TBI = traumatic brain injury.
66. Topic Vectors
¡ Compact representation of documents via topics
¡ Each document is represented by a vector of its topic proportions
¡ Number of topics k
¡ Needs to be determined empirically
¡ Total number of attributes for orbital and pediatric datasets ~1300-1500
¡ Without preprocessing: 6K to 9K
¡ Number of topics: 5-150
¡ Dimension reduction achieved:
¡ Orbital dataset: 88.4% - 99.6%
¡ Pediatric dataset: 90.0% - 99.7%
Example: $k = 3$, $d_1 = (0.2,\ 0.5,\ 0.3)$, $d_2 = (0.3,\ 0.1,\ 0.6)$

$\text{DimensionReduction}(\%) = \frac{\sum \text{attributes} - \sum \text{topics}}{\sum \text{attributes}}$
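A hedged sketch of producing such topic vectors with gensim's LDA implementation; this is only an illustration (the slide does not specify the toolkit used), with a hypothetical toy corpus:

```python
from gensim import corpora, models

docs = [
    "orbital fracture of the left orbital floor".split(),
    "no acute intracranial hemorrhage or fracture".split(),
    "nasal bone fracture with soft tissue swelling".split(),
]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

k = 3  # number of topics, chosen empirically as on the slide
lda = models.LdaModel(corpus, num_topics=k, id2word=dictionary,
                      passes=10, random_state=0)

# Each document becomes a k-dimensional vector of topic proportions.
for bow in corpus:
    topics = lda.get_document_topics(bow, minimum_probability=0.0)
    print([round(p, 2) for _, p in topics])

# Dimension reduction relative to the BoW attribute count.
n_attributes = len(dictionary)
print("Dimension reduction: %.1f%%" % (100 * (n_attributes - k) / n_attributes))
```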
67. Baseline Topic Classifier (BTC)
Topic   Orbital (top words)                        Pediatric (top words)
0       acute, report, axial, facial, findings     contrast, head, report, evidence, intracranial
1       left, right, maxillary, fracture, sinus    fracture, findings, tissue, soft, impression
¡ Topic model is built using K=|C| where |C| is the total number of
classes
¡ The topic with the higher probability is
assigned as the predicted class
Example: $d_1 = (0.2,\ 0.8) \Rightarrow$ Positive, $d_2 = (0.7,\ 0.3) \Rightarrow$ Negative
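A minimal sketch of BTC on top of such a two-topic model; which topic index corresponds to the positive class is an assumption that would have to be decided, e.g., from each topic's top words:

```python
def btc_predict(topic_vector, positive_topic=1):
    """Baseline Topic Classifier with K = |C| topics: predict the class
    whose topic has the highest probability in the document's topic vector."""
    best = max(range(len(topic_vector)), key=lambda t: topic_vector[t])
    return "Positive" if best == positive_topic else "Negative"

print(btc_predict([0.2, 0.8]))  # -> Positive
print(btc_predict([0.7, 0.3]))  # -> Negative
```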
69. Topic Vector Classifier
¡ Train topic model
¡ Merge the documents in topic vector representation with their classes
¡ Build classifiers using decision tree and SVM
70. Topic Vector Classifier
Decision Tree                                     SVM
Rank   K    Test   Precision  Recall  F-score     K    Test   Precision  Recall  F-score
1      50   34     95.38      95.31   95.35       150  25     96.25      96.33   96.27
2      25   75     95.00      95.07   95.03       150  66     96.08      96.16   96.11
3      75   34     95.07      95.00   95.03       150  50     96.07      96.17   96.10

Decision Tree                                     SVM
Rank   K    Test   Precision  Recall  F-score     K    Test   Precision  Recall  F-score
1      15   25     95.79      96.05   95.87       15   75     96.11      96.16   96.13
2      15   50     95.50      95.58   95.54       15   50     96.00      96.23   96.06
3      15   66     95.53      95.51   95.52       15   66     96.00      96.23   96.06
Pediatric dataset
Orbital dataset
71. Confidence-based Topic Classifier (CTC)
TRAINING:
¡ Train topic model
¡ Merge the documents in topic vector representation with their classes
¡ For each topic T and class C, calculate the confidence conf(T ⇒ C)
¡ Select the topic t with the biggest confidence for the positive class
¡ Pick a threshold th on the selected topic t

TESTING:
¡ Infer the topic distribution of the test document
¡ Find the selected topic t's value v
¡ If v > th, predict positive; otherwise predict negative

$sup(T) = \frac{N_T}{N}$     $conf(T \Rightarrow C) = \frac{sup(T \cup C)}{sup(T)}$
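A minimal sketch of the confidence computation, under the assumption (for this sketch only) that a document "has" topic T when T is its most probable topic; the slide does not spell out how $N_T$ is counted:

```python
def confidence(topic_vectors, labels, topic, cls):
    """conf(T => C) = sup(T and C) / sup(T), where a document is counted
    for topic T if T is its most probable topic (an assumption here)."""
    has_topic = [max(range(len(v)), key=lambda t: v[t]) == topic
                 for v in topic_vectors]
    n = len(topic_vectors)
    sup_t = sum(has_topic) / n
    sup_tc = sum(h and (y == cls) for h, y in zip(has_topic, labels)) / n
    return sup_tc / sup_t if sup_t else 0.0

vectors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
labels = ["Positive", "Positive", "Negative", "Negative"]
print(confidence(vectors, labels, topic=0, cls="Positive"))  # 1.0
```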
72. Similarity-based Topic Classifier (STC)
TRAINING:
¡ Train topic model
¡ Merge the documents in topic vector representation with their classes
¡ For each class, calculate the average of the topic distributions

TESTING:
¡ Infer the topic distribution of the test document
¡ Compute its similarity to each class average
¡ If it is more similar to the positive class, predict positive; otherwise predict negative

$\cos(\theta) = \frac{x \cdot y}{\|x\|\,\|y\|}$
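A minimal sketch of STC's test-time decision (the class-average topic vectors are assumed to have been computed from the training set):

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def stc_predict(topic_vector, positive_avg, negative_avg):
    """Similarity-based Topic Classifier: compare the document's topic
    distribution with each class's average topic distribution."""
    if cosine(topic_vector, positive_avg) > cosine(topic_vector, negative_avg):
        return "Positive"
    return "Negative"

print(stc_predict([0.1, 0.9], positive_avg=[0.2, 0.8], negative_avg=[0.7, 0.3]))
```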
73. Aggregate Topic Classifier (ATC)
TRAINING:
¡ Train topic model
¡ Merge the documents in topic vector representation with their classes
¡ For each class, calculate the average of the topic distributions
¡ Select the topic t with the maximum difference between classes
¡ Pick a threshold th on the selected topic t

TESTING:
¡ Infer the topic distribution of the test document
¡ Find the selected topic t's value v
¡ If v > th, predict positive; otherwise predict negative
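A minimal sketch of ATC's topic selection and thresholding (how the threshold th is tuned is not shown on the slide; the value below is illustrative):

```python
def select_topic(positive_avg, negative_avg):
    """Pick the topic whose average proportion differs most between classes."""
    diffs = [abs(p - n) for p, n in zip(positive_avg, negative_avg)]
    return max(range(len(diffs)), key=lambda t: diffs[t])

def atc_predict(topic_vector, topic, th):
    """Aggregate Topic Classifier decision on one test document."""
    return "Positive" if topic_vector[topic] > th else "Negative"

pos_avg, neg_avg = [0.7, 0.2, 0.1], [0.2, 0.3, 0.5]
t = select_topic(pos_avg, neg_avg)               # topic with the largest class gap
print(atc_predict([0.6, 0.3, 0.1], t, th=0.45))  # -> Positive
```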
74. CTC, STC, ATC Results
       CTC                                STC                                ATC
Rank   K    Test  P      R      F        K    Test  P      R      F        K   Test  P      R      F
1      150  66    93.84  94.29  94.01    150  25    96.58  93.96  94.79    20  34    96.16  96.26  96.20
2      75   25    93.69  93.40  93.53    75   34    95.88  93.62  94.34    15  34    96.18  95.70  95.88
3      75   34    94.07  93.07  93.48    30   34    95.65  93.34  94.09    20  25    96.14  95.66  95.81

       CTC                                STC                                ATC
Rank   K    Test  P      R      F        K    Test  P      R      F        K   Test  P      R      F
1      75   50    93.65  93.56  93.61    75   34    95.72  95.39  95.51    15  66    96.07  96.03  96.05
2      50   50    93.86  92.86  93.20    75   66    95.69  95.22  95.37    20  66    96.07  95.95  95.97
3      30   34    93.86  92.77  93.14    15   66    95.93  95.09  95.33    15  75    95.47  95.43  95.45
Orbital dataset
Pediatric dataset
77. Best Classification Performances
Orbital Dataset:
Algorithm                  Precision  Recall  F-score
Baseline ZR                76.6       87.5    81.7
Raw Text SVM               94.3       94.3    94.3
NLP-based SVM-Filtered     97.0       97.0    97.0
TM-based SVM               96.3       96.3    96.3

Pediatric Dataset:
Algorithm                  Precision  Recall  F-score
Baseline ZR                86.1       92.8    89.3
Raw Text SVM               95.9       95.9    95.9
NLP-based SVM-Filtered     97.1       97.3    97.1
TM-based ATC               96.2       96.3    96.2
78. Discussion
¡ NLP-based classification approaches
¡ Best classification performance among all classifiers
¡ Needs more customizations
¡ Topic modeling-based classification approaches
¡ Provides dimension reduction
¡ Performs better than raw text classification
¡ Competitive with NLP-based classification
¡ More general framework than NLP-based classifiers
80. Summary
¡ Large amounts of electronic clinical data have become available
with the increased use of Electronic Health Records (EHR)
¡ Automated processing of these records could benefit both the
patient and the provider
¡ Speeding up the decision process and reducing costs
¡ Classifiers that can automatically predict the outcome from raw
text clinical reports
¡ Raw text classification of clinical reports
¡ NLP-based classification of clinical reports
¡ Topic modeling-based classification of clinical reports
81. Significance and Contributions
¡ Addressing issues specific to automated processing of clinical text
¡ Unstructured data
¡ Medical terms
¡ Context sensitivity
¡ Real world dataset
¡ Natural Language Processing-based classification of clinical reports
¡ Selecting and adjusting a biomedical NLP tool
¡ Best ways to extract NLP features for better classification
¡ Topic Modeling-based classification of clinical reports
¡ A more general framework than NLP-based solutions
¡ Utilizing an unsupervised technique, i.e. topic modeling, in a supervised
fashion, i.e. classification
¡ Topic modeling-based classifiers: BTC, CTC, STC, and ATC
82. Research Impacts
¡ Improve quality and efficiency of healthcare
¡ The classifiers can be used to automatically predict the conditions
in a clinical report
¡ Can replace the manual review of clinical reports, which can be time
consuming and error-prone
¡ Clinicians can have more confidence in utilizing such systems in
real life settings
¡ Increased accuracy and interpretability
83. References
1. K. Yadav et al., "Derivation of a clinical risk score for traumatic orbital fracture," 2012, in press.
2. V. Garla, V. L. R. III, Z. Dorey-Stein, F. Kidwai, M. Scotch, J. Womack, A. Justice, and C. Brandt, “The Yale cTAKES
extensions for document classification: architecture and application.” JAMIA, 2011
3. C. Friedman, “A broad-coverage natural language processing system,” Proc AMIA Symp, pp. 270–274, 2000.
4. T. Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” in
Proceedings of the 10th European Conference on Machine Learning, ser. ECML ’98., pp. 137–142.
5. F. Sebastiani, “Machine learning in automated text categorization,” ACM Comput. Surv., vol. 34, no. 1, pp. 1–47, 2002
6. Aronson, A. R. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA
Symp (2001), 17–21.
7. Aronson, A. R., and Lang, F.-M. An overview of metamap: historical perspective and recent advances. Journal of the
American Medical Informatics Association 17, (2010).
8. S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,”
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, vol. 41, no. 6, pp. 391–407, 1990.
9. T. Hofmann, “Probabilistic latent semantic analysis,” in UAI, 1999.
10. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, Mar.
2003.
84. References – Continued
11. Z. Zhang, X.-H. Phan, and S. Horiguchi, “An Efficient Feature Selection Using Hidden Topic in Text Categorization.”
AINAW ’08.
12. W. Sriurai, “Improving Text Categorization by Using a Topic Model,” Advanced Computing: An International Journal
(ACIJ), 2011.
13. E. Chen, Y. Lin, H. Xiong, Q. Luo, and H. Ma, “Exploiting probabilistic topic models to improve text categorization
under class imbalance,” Inf. Process. Manage., 2011.
14. S. Banerjee, “Improving text classification accuracy using topic modeling over an additional corpus,”, SIGIR ’08.
15. Arnold, C. W., El-Saden, S. M., Bui, A. A. T., and Taira, R. Clinical Case-based Retrieval Using Latent Topic Analysis.
AMIA Annu Symp Proc 2010 (2010), 26–30.
16. H. M. Wallach, “Topic modeling: beyond bag-of-words.”, ICML ’06.
17. Griffiths, T. L., Steyvers, M., Blei, D. M., and Tenenbaum, J. B. Integrating topics and syntax. In Advances in Neural
Information Processing Systems 17 (2005), MIT Press, pp. 537–544.
18. Boyd-Graber, J. L., and Blei, D. M. Syntactic topic models. CoRR abs/1002.4665 (2010).
19. N. Kuppermann, J. F. Holmes, P. S. Dayan, J. D. J. Hoyle, S. M. Atabaki, R. Holubkov, F. M. Nadel, D. Monroe, R. M.
Stanley, D. A. Borgialli, M. K. Badawy, J. E. Schunk, K. S. Quayle, P. Mahajan, R. Lichenstein, K. A. Lillis, M. G. Tunik, E. S.
Jacobs, J. M. Callahan, M. H. Gorelick, T. F. Glass, L. K. Lee, M. C. Bachman, A. Cooper, E. C. Powell, M. J. Gerardi, K.
A. Melville, J. P. Muizelaar, D. H. Wisner, S. J. Zuspan, J. M. Dean, and S. L. Wootton-Gorges, “Identification of children
at very low risk of clinically-important brain injuries after head trauma: a prospective cohort study.” Lancet, vol. 374,
no. 9696, pp. 1160–1170.
20. T. N. Rubin, A. Chambers, P. Smyth, and M. Steyvers, “Statistical topic models for multi-label document classification,”
Mach. Learn., vol. 88, no. 1-2, pp. 157–208
85. References – Continued
21. Z. Liu, M. Li, Y. Liu, and M. Ponraj, “Performance evaluation of Latent Dirichlet Allocation in text mining,” in Fuzzy
Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on, vol. 4, pp. 2695 –2698.
22. W. W. Chapman, D. Hillert, S. Velupillai, M. Kvist, M. Skeppstedt, B. E. Chapman, M. Conway, M. Tharp, D. L. Mowery,
and L. Deleger, “Extending the negex lexicon for multiple languages.” Stud Health Technol Inform, vol. 192, pp. 677–
681, 2013.
23. E. A. Mendonca, J. Haas, L. Shagina, E. Larson, and C. Friedman, “Extracting information on pneumonia in infants
using natural language processing of radiology reports.” J Biomed Inform, vol. 38, no. 4, pp. 314–321.
24. H. Harkema, J. N. Dowling, T. Thornblade, and W. W. Chapman, “Context: an algorithm for determining negation,
experiencer, and temporal status from clinical reports.” J Biomed Inform, vol. 42, no. 5, pp. 839–851.
25. H. Misra, F. Yvon, J. M. Jose, and O. Cappe, “Text segmentation via topic modeling: an analytical study,” in
Proceedings of the 18th ACM conference on Information and knowledge management, ser. CIKM ’09, pp. 1553–
1556.
26. J.-H. Yeh and C.-H. Chen, “Protein remote homology detection based on latent topic vector model,” in Networking
and Information Technology (ICNIT), 2010 International Conference on, pp. 456 –460.
27. J. A. Womack, M. Scotch, C. Gibert, W. Chapman, M. Yin, A. C. Justice, and C. Brandt, “A comparison of two
approaches to text processing: facilitating chart reviews of radiology reports in electronic medical records.”
Perspect Health Inf Manag,vol. 7, p. 1a, 2010.
28. O. Bodenreider, “Using UMLS semantics for classification purposes.” Proc AMIA Symp, pp. 86–90, 2000.
29. F.-F. Li and P. Perona, “A bayesian hierarchical model for learning natural scene categories,” in Proceedings of the
2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Volume 2. CVPR
’05. 2005, pp. 524–531.
30. G. Hripcsak, J. H. M. Austin, P. O. Alderson, and C. Friedman, “Use of natural language processing to translate clinical
information from a database of 889,921 chest radiographic reports.” Radiology, vol. 224, no. 1, pp. 157–163.
86. References
31. Y. Huang, H. J. Lowe, D. Klein, and R. J. Cucina, “Improved identification of noun phrases in clinical radiology reports
using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon.” J Am Med
Inform Assoc, vol. 12, no. 3, pp. 275–285, May-Jun 2005.
32. V. N. Vapnik, Statistical learning theory, 1st ed. Wiley, Sep. 1998.
33. J. H. Friedman, “Another approach to polychotomous classification,” Department of Statistics, Stanford University, Tech.
Rep., 1996.
34. T. Hastie and R. Tibshirani, “Classification by Pairwise Coupling,” 1998.
35. Stanford NLP: http://nlp.stanford.edu/software/index.shtml
36. Open NLP: http://opennlp.apache.org