Effective Classification of Clinical Reports:
Natural Language Processing-Based and
Topic Modeling-Based Approaches
Department of Computer Science
School of Engineering and Applied Science
The George Washington University
Efsun Sarioglu Kayi
Outline
¡ Introduction
¡ Research Objective
¡ Scope
¡ Proposed System
¡ Background and Related Work
¡ Natural Language Processing (NLP)
¡ Topic Modeling
¡ Text Classification
¡ Dissertation Research and Results
¡ Clinical Dataset
¡ Raw Text Classification of Clinical Reports
¡ NLP-based Classifiers
¡ Topic Modeling-based Classifiers
¡ Summary and Contributions
2
Introduction
Electronic Health Records (EHR)
¡ Large amounts of clinical data have become available in
Electronic Health Records (EHR)
¡ Patient reports in free text
¡ Not directly usable in automated systems
¡ Clinical decision support systems
¡ Automated advice to providers by examining the EHRs
¡ Help medical professionals make clinical decisions faster and with more
confidence
¡ Improve the quality and efficiency of healthcare
¡ Prevent medical errors
¡ Reduce healthcare costs
¡ Increase administrative efficiencies
¡ Decrease paperwork
4
Clinical Reports
¡ Free text
¡ Not directly usable for automated processing
¡ Ambiguity in natural language
¡ Medical terms
¡ Not common in daily language
¡ Many synonyms
¡ Need for standardization
¡ Context sensitivity
¡ Patient history
¡ Ruled-out diagnosis
¡ Case study: Automatic classification of Emergency Medicine
computed tomography (CT) imaging reports into binary categories
5
Research Objective
¡ Objective: Given a list of patient reports, automatically classify them
into user-defined categories efficiently and effectively
¡ Natural Language Processing (NLP) tools transform the text into
structured data that can be used more easily by automated processes
¡ Context information in text is critical to classification performance [Garla11]
¡ Requires manual customization for each domain
¡ Discharge summaries, radiology, and mammography reports
¡ A more general and compact representation can be achieved by a topic model of patient reports
¡ Biomedical concepts are typically nouns/noun phrases [Huang05]
¡ Nouns, more than other parts of speech, tend to form topics [Griffiths04]
¡ Can be adapted to new applications/domains more easily
6
Proposed System
¡ Automated classification of clinical reports into categories
¡ Binary categories
¡ presence/absence of fracture
¡ Multiple categories
¡ types of fractures, e.g., facial, orbital, etc.
¡ Clinical report representation
¡ Natural Language Processing (NLP)
¡ Mapping of medical terms to standard medical dictionaries
¡ Context modifiers such as probability, negation, and time
¡ Topic Modeling
¡ A more general and compact representation of reports based on their topic
distributions
7
Proposed System Overview
[Pipeline: patient reports and labels are preprocessed into a bag-of-words (BoW) representation; the NLP path produces NLP features and the topic modeling path produces topic vectors; each representation is classified with SVM and decision tree, and the topic vectors additionally feed the topic modeling-based classifiers.]
Scope
9
¡ Classification of text/discrete data
¡ Continuous data can be discretized using supervised or unsupervised
techniques before doing the classification
¡ Binary classification
¡ Multi-class datasets can be used with binary classifiers using techniques
such as All-vs-One (AVO) [Vapnik98] or All-vs-All (AVA) [Friedman96,
Hastie98] classification
Background
Natural Language Processing (NLP)
Natural Language Processing (NLP)
¡ Techniques for understanding the syntactic and semantic relations
that exist in natural language
¡ Part-Of-Speech (POS) tagging
¡ Assigns each word its syntactic class
¡ Noun, verb, adjective, etc.
¡ Dependency parsing
¡ Finds the syntactic representation of a given sentence
¡ Dependencies between its words with labels showing grammatical relations
¡ Subject of a clause, object of a verb phrase
11
Biomedical NLP
¡ General NLP tools, such as Stanford NLP, are trained on general English data, e.g., news
¡ Not well suited to the biomedical domain
¡ Biomedical data
¡ Medical terms
¡ Synonyms: ocular --> eye, eyes, optic, ophthalmic, ophthalmia, oculus, oculi
¡ Context sensitivity
¡ Temporal, negation, and certainty status of clinical terms
¡ Biomedical NLP tools
¡ Use biomedical vocabularies and translate clinical text into coded
descriptions suitable for automated processing
¡ Medical Language Extraction and Encoding (MedLEE) [Friedman00]
¡ Standard representation via Unified Medical Language System (UMLS)
¡ Modifiers for each clinical term to evaluate context
12
Unified Medical Language System (UMLS)
¡ Repository of vocabularies in biomedical sciences developed and
maintained by the National Library of Medicine (NLM)
¡ 6.4 million unique terms
¡ 1.3 million unique concepts
¡ More than 119 families of biomedical vocabularies
¡ Three knowledge sources
¡ Metathesaurus
¡ Multi-lingual vocabulary organized by concepts
¡ Links alternative names and views of the same concept from different vocabularies
¡ Each concept is given a unique ID called Concept Unique Identifier (CUI)
¡ Semantic Network
¡ Semantic types and relations to categorize and disambiguate concepts
¡ Semantic types: medical device, clinical drug, and laboratory
¡ Semantic relations: treats, diagnoses, and contains
¡ Specialist Lexicon
¡ General English and biomedical vocabulary
13
UMLS Metathesaurus
14
¡ “Orbital fractures” with CUI: C0029184
MedLEE System Overview
[Pipeline: a patient report passes through a site-specific tailored preprocessor, parser, composer, and encoder, using lexicon, grammar, mapping, and coding resources, to produce structured output.]
Raw Text:
Impression: Right lamina papyracea fracture. No evidence of entrapment.
MedLEE Output
<sectname v = "report impression item"></sectname>
<sid idref = "s7"></sid>
<code v = "UMLS:C0016658_Fracture"></code>
<problem v = "entrapment" code = "UMLS:C1285497_Entrapment (morphologic abnormality)">
<certainty v = "no"></certainty>
</problem>
MedLEE Findings and Modifiers
16
¡ Maps its findings to UMLS CUIs
¡ Problems (e.g. fracture, pain, trauma)
¡ Procedures (e.g. image, computerized axial tomography)
¡ Device (e.g. tube)
¡ Lab tests (e.g. blood type)
¡ Assigns modifiers to each finding for more context
¡ Body locations (e.g. head, brain, face)
¡ Section names (e.g. impression, findings)
¡ Certainty (e.g. high/moderate/low certainty) and
¡ Status (e.g. acute, new, recent, chronic, past, history)
Background
Topic Modeling
Topic Modeling
¡ Automatically discovers the themes/topics of a document collection
¡ Low dimensional representation that captures the semantics
¡ A way to browse an unstructured collection in a structured way
¡ Topic Modeling techniques
¡ Matrix decomposition
¡ Latent Semantic Analysis (LSA) [Deerwester90]
¡ Probabilistic topic modeling
¡ Probabilistic Latent Semantic Analysis (PLSA) [Hofmann99]
¡ Latent Dirichlet Allocation (LDA) [Blei03]
18
Latent Dirichlet Allocation (LDA)
¡ Topic is defined as a probability distribution over entire vocabulary
¡ Documents can exhibit multiple topics with different proportions
¡ Given the observed words in a set of documents and the total
number of topics
¡ Finds the topic model that is most likely to have generated the data
¡ For each topic, its probability distribution over words
¡ For each document, its probability distribution over topics
¡ The topic responsible for generating each word in each document
19
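The dissertation does not tie this description to a particular implementation; the following is a minimal sketch of fitting an LDA topic model with scikit-learn (an assumed library choice) on a few hypothetical report snippets, recovering the per-document topic proportions and per-topic word distributions described above.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical snippets standing in for preprocessed CT reports
docs = [
    "axial and coronal images of the orbits",
    "the intraorbital structures are intact",
    "no evidence of entrapment",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)                  # bag-of-words counts

lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(X)                        # per-document topic proportions
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # per-topic word distributions

print(theta.round(2))   # each row sums to 1: the document's topic mixture
print(phi.round(2))     # each row sums to 1: the topic's distribution over the vocabulary
```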
Sample Topic Model
[Example topic model with K = 3 topics learned from the document collection]
Topic 1: axial 0.07, structure 0.05, images 0.02, ...
Topic 2: maxillary 0.04, coronal 0.02, intraorbital 0.01, ...
Topic 3: intact 0.05, evidence 0.02, entrapment 0.01, ...
Document 1: "Axial[1] images[1] and coronal[2] images[1] ..."
Document 2: "The intraorbital[1] structures[1] are intact[3] ..."
Document 3: "No evidence[3] of entrapment[3] ..."
(the bracketed number marks the topic assigned to each word)
Topic Model Training
1. Randomly assign a topic to every word in all of the documents
2. Repeat 1000 times: for each word w in each document d, and for each topic t, calculate a score s(t); then sample a new topic t_new for w based on the scores:

s(t) = (N_t|d + α) / (N_d + Tα) × (N_w|t + β) / (N_t + Vβ)

where N_t|d is the number of words in d assigned to topic t, N_d the number of words in d, N_w|t the number of times word w is assigned to t, N_t the total number of words assigned to t, T the number of topics, and V the vocabulary size.
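As an illustration of the sampling step and the score above, here is a small sketch (not the implementation used in the dissertation) of one collapsed Gibbs update for a single word; in a full sampler the word's current topic assignment is first decremented from the counts, and the update runs over every word in every document for many iterations (1000 in these slides).

```python
import numpy as np

def resample_topic(w, d, N_td, N_d, N_wt, N_t, alpha, beta, rng):
    """One collapsed Gibbs update for word w in document d (sketch).

    N_td[d, t]: words in document d assigned to topic t
    N_d[d]:     words in document d
    N_wt[w, t]: occurrences of word w assigned to topic t
    N_t[t]:     words assigned to topic t overall
    rng:        e.g. np.random.default_rng()
    """
    T = N_td.shape[1]          # number of topics
    V = N_wt.shape[0]          # vocabulary size
    # Score from the slide: (N_t|d + alpha)/(N_d + T*alpha) * (N_w|t + beta)/(N_t + V*beta)
    scores = (N_td[d] + alpha) / (N_d[d] + T * alpha) * (N_wt[w] + beta) / (N_t + V * beta)
    scores /= scores.sum()     # normalize by Z = sum of the scores
    return rng.choice(T, p=scores)   # sample t_new proportionally to the scores
```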
Initialization (toy example: 10 documents over the vocabulary {River, Stream, Bank, Money, Loan})

Document-topic statistics:
Doc   Topic A   Topic B   N_d
1     12        0         12
2     3         6         9
3     5         10        15
4     6         9         15
5     8         3         11
6     4         9         13
7     5         3         8
8     6         4         10
9     4         6         10
10    4         5         9

Topic-word statistics:
          River   Stream   Bank   Money   Loan   N_t
Topic A   4       5        20     9       18     56
Topic B   5       7        20     12      11     55
Document #5
23
DOC 5 word-topic assignments: River→B, Stream→A, Bank→A, Bank→A, Bank→A, Money→B, Money→A, Money→B, Loan→A, Loan→A, Loan→A
Topic Modeling Training
24
Resampling the topic of the word River in DOC 5:
for each possible topic in [A, B], compute
score(A | N_A|5, N_River|A)
score(B | N_B|5, N_River|B)
Topic Modeling Training
25
For each possible topic in [A, B], sum up the scores to get the normalizer:
Z = score(A | N_A|5, N_River|A) + score(B | N_B|5, N_River|B)
Topic Modeling Training
26
Sample a new topic in proportion to the scores:
u = rand() × Z; return the topic whose cumulative score interval contains u, e.g., t = A
Topic Modeling Training
27
DOC 5: initial vs. final topic assignments
Word     Initial   Final
River    A         A
Stream   B         A
Bank     A         A
Bank     A         B
Bank     A         B
Money    B         B
Money    A         B
Money    B         B
Loan     A         B
Loan     A         B
Loan     A         B
Document-Topic Statistics
28
Initial (random assignment):
Doc   Topic A   Topic B
1     12        0
2     3         6
3     5         10
4     6         9
5     8         3
6     4         9
7     5         3
8     6         4
9     4         6
10    4         5

After 1000 iterations with sampling:
Doc   Topic A   Topic B
1     0         12
2     2         7
3     0         15
4     0         15
5     3         8
6     4         9
7     2         6
8     8         2
9     10        0
10    8         1
Topic-Word statistics
29
Initial:
          River   Stream   Bank   Money   Loan
Topic A   4       5        20     9       18
Topic B   5       7        20     12      11

After 1000 iterations with sampling:
          River   Stream   Bank   Money   Loan
Topic A   9       12       16     0       0
Topic B   0       0        24     21      29
30
[Figure: the ten toy documents shown alongside the initial and final document-topic statistics (same tables as above).]
Background
Text Classification
Bag-of-Words (BOW) Representation
32
¡ Document-term matrix where columns are terms and rows are
documents
¡ NxM matrix for N documents and M terms
¡ Pros: Simple, conventional
¡ Cons: Word ordering is lost
¡ Weighting
¡ Binary
¡ Term frequency: tf_t,d
¡ Inverse document frequency: idf_t = log(N / df_t)
¡ Combined measure: tf-idf_t,d = tf_t,d × idf_t
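A small sketch of the weighting formulas above in plain Python (no library assumed); `docs` is a hypothetical list of tokenized reports.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using idf_t = log(N / df_t)."""
    N = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency per term
    weights = []
    for doc in docs:
        tf = Counter(doc)                                     # term frequency in this document
        weights.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return weights

docs = [["no", "acute", "fracture"],
        ["orbital", "fracture", "seen"],
        ["no", "evidence", "of", "entrapment"]]
print(tf_idf(docs)[1])
```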
Text Classification
¡ Given a set of n training vectors {x_1, x_2, …, x_n} ⊂ R^d in BoW representation with binary labels {y_1, y_2, …, y_n}, a classifier is a binary-valued function f : R^d → {0, 1}
¡ Trivial rejector/zero rule (ZR) classifier
¡ Baseline
¡ Decision Tree
¡ Popular classification algorithm due to its human-interpretable binary-tree output
¡ Support Vector Machines (SVM)
¡ Shown to perform well in text classification [Joachims98, Sebastiani02]
33
Decision Tree
¡ Binary tree where internal nodes represent criteria based on terms and leaves represent the class label
¡ Build the tree top-down: start with all training examples in a single node
¡ Repeat until all leaves are pure
¡ Consider all leaves and all possible splits
¡ Choose the split that most decreases the uncertainty based on entropy, Gini index, etc.
[Example tree: splits on the terms Fracture > 0, Orbital > 0, and Facial > 0, with one Positive leaf and three Negative leaves]
34
35
Uncertainty Measures
[Example: 20 points (10 +, 10 o) over x ∈ [0, 1], with candidate splits at x = 0.25, 0.5, 0.75]
¡ Gini index: u = 2p(1 − p)
¡ Uncertainty before the split: 2 × 10/20 × 10/20 = 1/2 = 0.5
¡ Uncertainty after a split: p_L u_L + p_R u_R
¡ For the split x < 0.5, the left branch holds [10, 2] and the right branch [0, 8]:
p_L = 12/20, u_L = 2 × 10/12 × 2/12 = 10/36; p_R = 8/20, u_R = 0
uncertainty after split = 12/20 × 10/36 + 8/20 × 0 = 1/6 ≈ 0.17
¡ The three candidate splits yield uncertainties of 0.33, 0.38, and ≈0.17, so the split x < 0.5 with the lowest uncertainty is chosen
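The worked example above can be checked with a few lines of Python (a sketch, not part of the original slides):

```python
def gini(pos, neg):
    """Gini uncertainty u = 2p(1 - p) for a node with the given class counts."""
    p = pos / (pos + neg)
    return 2 * p * (1 - p)

def uncertainty_after_split(left, right):
    """Weighted uncertainty p_L*u_L + p_R*u_R; left/right are (pos, neg) counts."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return n_left / n * gini(*left) + n_right / n * gini(*right)

print(gini(10, 10))                              # 0.5, uncertainty before the split
print(uncertainty_after_split((10, 2), (0, 8)))  # ~0.167 for the split x < 0.5
```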
Support Vector Machines (SVM)
¡ Large-margin classifier
¡ Margin: smallest distance from any example to the decision boundary, Margin = 1/‖θ‖
¡ Parameter vector: θ ∈ R^d
¡ Objective: minimize (1/2)‖θ‖²
¡ Inversely proportional to the margin
¡ Subject to y_t θᵀx_t ≥ 1 for all t = 1, …, n
¡ Constraints guarantee that it classifies each sample correctly
[Figure: decision boundary and its margin separating the positively and negatively labeled examples]
36
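A toy sketch (assuming scikit-learn) that fits a linear SVM on separable 2-D points and reads the margin back as 1/‖θ‖; it only illustrates the geometry on this slide, not the text classifiers used later.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes (toy data)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, 0, 0])

# A very large C approximates the hard-margin formulation on the slide
svm = SVC(kernel="linear", C=1e6).fit(X, y)
theta = svm.coef_[0]
print("theta =", theta)
print("margin = 1/||theta|| =", 1 / np.linalg.norm(theta))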
37
Maximum Margin
[Figure: two classes of points (+ and o) separated by the maximum-margin decision boundary]
Evaluation: Classification
¡ Possible cases for binary classification
¡ Precision (P), Recall (R), and F-score measures are used for
evaluating classification performance
                   Predicted Positive      Predicted Negative
Actual Positive    True Positive (TP)      False Negative (FN)
Actual Negative    False Positive (FP)     True Negative (TN)

P = TP / (TP + FP)
R = TP / (TP + FN)
F-score = 2 × P × R / (P + R)
38
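A direct transcription of the measures above (sketch, with hypothetical counts):

```python
def precision_recall_fscore(tp, fp, fn):
    """Precision, recall, and F-score from confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    return p, r, f

print(precision_recall_fscore(tp=90, fp=10, fn=20))
```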
Training and Testing
¡ Train the classifier using only a subset of the entire dataset
¡ Optionally, use a validation set to learn the best values of the parameters
¡ Evaluate its performance on unseen test dataset
¡ Stratified: similar class distribution to the original dataset
¡ Run the algorithm several times and take the average of the performances

TR = {d_1, …, d_|TR|}, |TR| > 0
VD = {d_|TR|+1, …, d_|TR|+|VD|}, |VD| ≥ 0
TS = {d_|TR|+|VD|+1, …, d_|D|}, |TS| > 0
39
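A sketch of this stratified, repeated evaluation protocol using scikit-learn (an assumed choice); `reports`, `labels`, and `make_classifier` are hypothetical placeholders, and binary 0/1 labels are assumed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def evaluate(reports, labels, make_classifier, test_size=0.25, runs=5):
    """Stratified train/test splits, repeated, averaging the F-score over runs."""
    scores = []
    for seed in range(runs):
        X_tr, X_ts, y_tr, y_ts = train_test_split(
            reports, labels, test_size=test_size, stratify=labels, random_state=seed)
        clf = make_classifier().fit(X_tr, y_tr)
        scores.append(f1_score(y_ts, clf.predict(X_ts)))
    return float(np.mean(scores))
```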
Related Work
NLP Related Work
¡ General NLP
¡ Daily English
¡ Stanford NLP, OpenNLP
¡ Biomedical NLP
¡ Use of medical dictionaries for medical terms and modifiers for
context
¡ MetaMap for UMLS mapping [Aronson01, Aronson10]
¡ Only negation modifier
¡ ConText system [Harkema09]
¡ Temporality, negation and experiencer of a given clinical term in a sentence
¡ Not a complete analysis since it expects the condition to be provided
¡ Clinical Text Analysis and Knowledge Extraction System (cTAKES) [Garla11]
¡ History, probability, and negation modifiers
¡ Medical Language Extraction and Encoding (MedLEE) [Friedman00]
¡ More types of modifiers (51) and wider range of modifier values
41
Biomedical NLP
42
¡ UMLS for classification
¡ Classify conditions into disease categories based on inter-concept
relations from UMLS [Bodenreider00]
¡ Expert rules
¡ MedLEE with expert rules for classification instead of machine learning
approaches [Mendonca05, Hripcsak02]
¡ NegEx (Negative Identification for Clinical Conditions) [Chapman13]
and SQL Server 2008 Free Text Search to identify fractures in radiograph
reports using expert rules [Womack10]
¡ Expert rules could be costly to construct
¡ Less generalizable to new clinical areas in contrast to machine learning
approaches
¡ Our approach: utilize MedLEE with machine learning approaches
Topic Modeling Related Work
¡ Applications of topic modeling
¡ Computer vision [Li05], biology [Yeh10], information retrieval [Wei06] and
text segmentation [Misra09]
¡ Structural extensions to topic modeling
¡ N-gram for phrases [Wallach06]
¡ Combined with POS tagging [Griffiths04]
¡ Combined with parse trees [Boyd-Graber08]
¡ Enhances the capability of topic modeling by combining NLP
techniques
¡ Computationally more expensive than standard topic modeling
¡ Clinical text
¡ Similarity measure based on topic distributions of patients for information
retrieval [Arnold10]
43
Topic Modeling for Text Classification
44
¡ Topic modeling for text classification
¡ Comparison of vector space model, LSA, and LDA for text classification
with SVM [Liu11]
¡ The performance of SVM with LDA surpasses the vector space model and LSA
¡ Keyword selection based on entropy of words in topics [Zhang08]
¡ Text classification with topic vectors with fixed number of topics
¡ In addition to BoW [Banerjee08] and instead of BoW [Sriurai11]
¡ Topic modeling based resampling instead of random sampling for
imbalanced classes [Chen11]
¡ Multi-label text classification using topic model [Rubin12]
¡ Our approaches can be extended to the multi-class setting by standard techniques
¡ Our approach: utilize topic modeling for clinical text classification
using LDA
Dissertation Research and
Results
Orbital Dataset
¡ CT imaging reports for patients suffering traumatic orbital injury
[Yadav12]
¡ Each report was dictated by a staff radiologist
¡ Outcomes were extracted by a trained data abstractor
¡ Positive for acute orbital fracture
¡ Negative for acute orbital fracture
¡ Among the 3,705 orbital CT reports, 3,242 were negative and 463 were
positive
46
Pediatric Dataset
47
¡ Prospectively collected patient CT report data for pediatric
traumatic brain injury [Kuppermann09]
¡ Obtained at emergency department clinicians' discretion and interpreted by site faculty radiologists
¡ The outcome of interest was extracted by a trained data abstractor
¡ Positive for traumatic brain injury
¡ Negative for traumatic brain injury
¡ Among the 2,126 pediatric head CT reports, 1,973 were negative and
153 were positive
Dataset Preparation
48
¡ Training and testing datasets
¡ Resampled datasets
Original, undersampled, and oversampled datasets (Pos / Neg / Total):
Dataset     Original               Undersampled       Oversampled
Orbital     463 / 3,242 / 3,705    463 / 463 / 926    1,895 / 1,810 / 3,705
Pediatric   153 / 1,973 / 2,126    153 / 151 / 304    1,094 / 1,032 / 2,126

Number of reports at each proportion:
Dataset     75%     66%     50%     34%     25%
Orbital     2,778   2,445   1,852   1,259   926
Pediatric   1,594   1,403   1,063   722     531
Raw Text Classification of
Clinical Reports
49
Raw Text Classification
[Pipeline: patient reports and labels are preprocessed into a BoW representation, which feeds the zero rule, decision tree, and SVM classifiers.]
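A minimal sketch of this raw-text path assuming scikit-learn: preprocess into BoW and train decision tree and SVM classifiers. The exact preprocessing and toolkit used in the dissertation may differ; lowercasing and English stop-word removal are assumptions here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

def raw_text_classifiers():
    """BoW preprocessing feeding decision tree and SVM classifiers."""
    return {
        "DT": make_pipeline(CountVectorizer(lowercase=True, stop_words="english"),
                            DecisionTreeClassifier()),
        "SVM": make_pipeline(CountVectorizer(lowercase=True, stop_words="english"),
                             LinearSVC()),
    }

# Usage with hypothetical data:
# clfs = raw_text_classifiers()
# clfs["SVM"].fit(train_reports, train_labels)
# predictions = clfs["SVM"].predict(test_reports)
```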
Decision Tree for Pediatric Dataset
51
Raw Text Classification Results
52
Orbital Pediatric
Algorithm Test (%) Precision Recall F-score Precision Recall F-score
ZR 76.57 87.50 81.67 86.12 92.80 89.34
DT
25 93.08 93.71 93.40 94.09 94.16 94.13
34 93.64 94.25 93.94 94.13 94.25 94.19
50 92.93 93.32 93.12 94.14 94.20 94.17
66 92.87 93.44 93.15 93.56 93.65 93.60
75 92.46 93.02 92.74 93.39 93.53 93.46
SVM
25 94.24 94.29 94.27 95.53 95.56 95.55
34 94.14 94.23 94.18 95.78 95.80 95.79
50 94.28 94.28 94.27 95.88 95.90 95.89
66 93.81 93.85 93.83 95.49 95.60 95.55
75 93.46 93.49 93.48 95.42 95.53 95.47
NLP-based Classification of
Clinical Reports
53
NLP-based Classification
Patient
Reports
Labels
NLP SVM
NextPrevious
Post-process All Features
Filtered
Features
Decision
Tree
MedLEE Lexicon Extension
Table 4.2: List of terms added to MedLEE lexicon for the orbital dataset
Term Category Target Form UMLS CUI
ramus Body location ramus of mandible C0222748
angle Body location angle of mandible C0222753
body Body location body of mandible C0222746
intraorbit Body location ocular orbit C0029180
nasal cavity Body location nasal cavity C0027423
mastoid air cell Body location pneumatic mastoid cell C0229427
pterygoid plate Body location pterygoid process C0222730
lamina papyracea Body location orbital plate of ethmoid bone C0222699
lamina paprycea Body location orbital plate of ethmoid bone C0222699
LeFort Finding Le Fort’s fracture C0272464
LeFort I Finding Le Fort’s fracture, type I C0435328
LeFort II Finding Le Fort’s fracture, type II C0435329
LeFort III Finding Le Fort’s fracture, type III C1402218
LeFort Type I Finding Le Fort’s fracture, type I C0435328
LeFort Type II Finding Le Fort’s fracture, type II C0435329
LeFort Type III Finding Le Fort’s fracture, type III C1402218
premaxilla Body location premaxillary bone C0687094
supraorbital Body location supraorbital C0230002
preorbital Body location periorbital C0230064
depressed fracture Finding depressed fracture C0332759
maxillary sinus Body location maxillary sinus C0024957
emphysema Finding subcutaneous emphysema C0038536
ramus fracture Finding Closed fracture of ramus of mandible C0272469
angle fracture Finding Mandible angle fracture C0746383
body fracture Finding Closed fracture of body of mandible C0272470
lamina papyracea fracture Finding Fracture of orbital plate of ethmoid bone C1264245
maxillary sinus fracture Finding sinus maxillaris fracture C1409796
orbital floor fracture Finding Fracture of orbital floor C0149944
nasal bone fracture Finding Fractured nasal bones C0339848
tripod fracture Finding Closed fracture of zygomatic tripod C1264249
Table 4.3: List of terms added to MedLEE lexicon for the pediatric dataset
Term Category Target Form UMLS CUI
mass effect Finding cerebral mass effect C0186894
shift Finding midline shift of brain C0576481
midline shift Finding midline shift of brain C0576481
extraaxial hemorrhage Finding enlarged extraaxial space on brain imaging C3280298
extra-axial hemorrhage Finding enlarged extraaxial space on brain imaging C3280298
extraaxial fluid collection Finding enlarged extraaxial space on brain imaging C3280298
extra-axial fluid collection Finding enlarged extraaxial space on brain imaging C3280298
extra-axial collection Finding enlarged extraaxial space on brain imaging C3280298
extraaxial collection Finding enlarged extraaxial space on brain imaging C3280298
extraaxial hematoma Finding enlarged extraaxial space on brain imaging C3280298
extra-axial hematoma Finding enlarged extraaxial space on brain imaging C3280298
sulcus Body location sulcus of brain C0228177
parenchymal hemorrhage Finding parenchymal hemorrhage C0747264
ventricle Body location cerebral ventricles C0007799
sutures Body location joint structure of suture of skull C0010272
ischemic changes Finding cerebral ischemia C0917798
depressed Finding depressed fracture C0332759
depression Finding depressed fracture C0332759
depressed fracture Finding depressed fracture C0332759
nasopharyngeal passage Body location entire nasal passage C1283892
Orbital Dataset
Pediatric Dataset
NLP Feature Selection
¡ Feature set 1: All
¡ Only problems with body locations
¡ Excluded findings: procedure, device, technique, etc.
¡ Feature set 2: Filtered
¡ Subset of feature set 1
¡ Current and highly probable findings based on modifiers
¡ Certainty modifier
¡ Included: high certainty, moderate certainty,…
¡ Included with a preceding ‘no_’: low certainty, negative
¡ Excluded: rule out
¡ Status modifier
¡ Included: active, recent…
¡ Excluded: previous, past…
¡ Section modifier
¡ Included: Indications and findings
¡ Excluded: past history
56
NLP Feature Selection
[Decision flow: find all problems with body locations; for each finding that is not past history and has high certainty, include it (with a preceding "no_" if it is negated, as it is otherwise); exclude all other findings.]
Sample MedLEE Filtered Features
Findings:
Extracranial, subcutaneous hyperdense hematoma is seen along
the right parietal region with underlying minimally
depressed right parietal skull fracture.
MedLEE structured text:
<problem v = "hematoma" code = "UMLS:C0018944_hematoma">
<bodyloc v = "subcutaneous"><region v = "extracranial">
</region></bodyloc>
<certainty v = "high certainty"></certainty>
<problemdescr v = "hyperdensity"></problemdescr>
<region v = "region"><region v = "parietal"><region v =
"right"></region></region></region>
<code v = "UMLS:C0018944_hematoma"></code>
<code v = "UMLS:C0520532_subcutaneous hematoma"></code>
</problem>
<problem v = "fracture" code = "UMLS:C0016658_fracture">
<bodyloc v = "skull" code = "UMLS:C0037303_bone structure of
cranium"> <region v = "parietal"><region v = "right">
</region></region>
<code v = "UMLS:C0037303_bone structure of cranium"></code>
</bodyloc>
<certainty v = "high certainty"></certainty>
<change v = "depressed"><degree v = "low degree"></degree>
</change>
<code v = "UMLS:C0016658_fracture"></code>
<code v = "UMLS:C0037304_skull fractures"></code>
<code v = "UMLS:C0272451_fracture of parietal bone
(disorder)"></code>
Filtered Feature Selection:
hematoma subcutaneous
C0018944 hematoma
C0520532 subcutaneous hematoma
fracture skull
C0037303 bone structure of cranium
C0016658 fracture
C0037304 skull fractures
C0272451 fracture of parietal bone (disorder)
Figure 2. Sample MedLEE and filtered feature outputs. MedLEE = Medical Language Extraction and Encoding.
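To make the filtering concrete, here is a hedged sketch that extracts "filtered" features from MedLEE-style XML such as the sample above. The element and attribute names follow that sample; the modifier value lists and the exact feature strings are simplified assumptions, not the dissertation's exact rules.

```python
import xml.etree.ElementTree as ET

NEGATED = {"no", "low certainty", "negative"}       # assumed negation-like certainty values
EXCLUDED_STATUS = {"previous", "past"}              # assumed historical status values

def filtered_features(medlee_xml: str):
    """Collect problem findings and their UMLS codes, honoring certainty/status modifiers."""
    root = ET.fromstring("<root>" + medlee_xml + "</root>")   # sample output has no single root
    features = []
    for problem in root.iter("problem"):
        status = problem.find("status")
        if status is not None and status.get("v") in EXCLUDED_STATUS:
            continue                                          # drop historical/chronic findings
        certainty = problem.find("certainty")
        negated = certainty is not None and certainty.get("v") in NEGATED
        values = [problem.get("v")] + [c.get("v") for c in problem.findall("code")]
        prefix = "no_" if negated else ""
        features.extend(prefix + v for v in values if v)
    return features
```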
Decision Tree for Orbital Dataset using
NLP All Features
59
Decision Tree for Orbital Dataset using
NLP Filtered Features
60
Raw Text vs NLP Features
Orbital Dataset
            Baseline   Decision Tree                      SVM
            (ZR)       Text     NLP All   NLP Filtered    Text     NLP All   NLP Filtered
Precision   76.57      93.64    96.28     96.53           94.28    96.13     96.96
Recall      87.50      94.25    96.33     96.59           94.28    96.14     97.00
F-Score     81.67      93.94    96.30     96.56           94.28    96.14     96.98

Pediatric Dataset
            Baseline   Decision Tree                      SVM
            (ZR)       Text     NLP All   NLP Filtered    Text     NLP All   NLP Filtered
Precision   86.12      94.13    95.21     96.63           95.88    96.74     97.13
Recall      92.80      94.25    95.46     96.80           95.90    96.90     97.25
F-Score     89.34      94.19    95.34     96.55           95.88    96.73     97.10
61
Raw Text vs NLP-based Classification
62
Orbital Dataset Pediatric Dataset
Classification Errors
Orbital Dataset: classification errors (combination of training and test sets; 102 errors, 2.7% of the total sample of 3,710)
Cause                                                Frequency (%)
Nonorbital fracture                                  32 (31.4)
Final reading disagrees with preliminary reading     19 (18.6)
Vague certainty                                      9 (8.8)
Fracture acuity                                      9 (8.8)
Recent facial fracture surgery                       6 (5.9)
MedLEE miscoding                                     5 (4.9)
Other (dictation error, filtering error, fracture implied but not stated, miscellaneous poor wording)   22 (21.6)

Pediatric Dataset: misclassification categorization (from both test and training sets)
Misclassification reason                             Number (%)
False negatives (from 1,829 coded negative)          7 (0.4)
  Decision tree misclassification                    7 (100)
False positives (from 292 coded positive)            147 (50.3)
  Abnormal but not PECARN TBI                        53 (36.1)
  Report ambiguity                                   12 (8.2)
  Report dictation error                             6 (4.1)
  Text conversion error                              3 (2.0)
  MedLEE misread                                     27 (18.4)
  Decision tree misclassification                    46 (31.3)
MedLEE = Medical Language Extraction and Encoding; PECARN = Pediatric Emergency Care Applied Research Network; TBI = traumatic brain injury.
Topic Modeling-based
Classification of Clinical
Reports
Topic Modeling-based Classification
[Pipeline: patient reports and labels are preprocessed and passed to topic modeling; the resulting topic vectors feed the decision tree (DT) and SVM classifiers as well as the topic modeling-based classifiers BTC, CTC, STC, and ATC.]
Topic Vectors
66
¡ Compact representation of documents via topics
¡ Each document is represented by a vector of its topic proportions
¡ Number of topics k
¡ Needs to be determined empirically
¡ Total number of attributes for orbital and pediatric datasets ~1300-1500
¡ Without preprocessing: 6K to 9K
¡ Number of topics: 5-150
¡ Dimension reduction achieved:
¡ Orbital dataset: 88.4% - 99.6%
¡ Pediatric dataset: 90.0% - 99.7%
¡ Example with k = 3: d_1 = (0.2, 0.5, 0.3), d_2 = (0.3, 0.1, 0.6)
¡ DimensionReduction(%) = (Σ attributes − Σ topics) / Σ attributes
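A direct transcription of the dimension-reduction formula above, with hypothetical sizes in the ranges quoted on this slide:

```python
def dimension_reduction_pct(n_attributes: int, n_topics: int) -> float:
    """DimensionReduction(%) = (attributes - topics) / attributes * 100."""
    return 100.0 * (n_attributes - n_topics) / n_attributes

# e.g. roughly 1,400 BoW attributes reduced to 15 topics
print(dimension_reduction_pct(1400, 15))   # about 98.9%
```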
Baseline Topic Classifier (BTC)
Top words per topic (K = 2)
Topic   Orbital                                    Pediatric
0       acute, report, axial, facial, findings     contrast, head, report, evidence, intracranial
1       left, right, maxillary, fracture, sinus    fracture, findings, tissue, soft, impression
¡ Topic model is built using K=|C| where |C| is the total number of
classes
¡ The topic with the higher probability is
assigned as the predicted class
¡ Example: d_1 = (0.2, 0.8) ⇒ Positive, d_2 = (0.7, 0.3) ⇒ Negative
67
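A sketch of BTC under these definitions; which of the K = |C| topics corresponds to the positive class must be decided by inspecting its top words, so it is passed in explicitly here (an assumption).

```python
import numpy as np

def btc_predict(theta, positive_topic):
    """Baseline Topic Classifier: predict the class whose topic has the highest proportion.

    theta: (n_docs, n_classes) topic proportions from a model trained with K = |C| topics.
    positive_topic: index of the topic interpreted as the positive class.
    """
    return (theta.argmax(axis=1) == positive_topic).astype(int)

theta = np.array([[0.2, 0.8], [0.7, 0.3]])
print(btc_predict(theta, positive_topic=1))   # [1, 0] -> Positive, Negative
```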
BTC Results
68
¡ Oversampled and undersampled datasets
                           Orbital                          Pediatric
Dataset        Algorithm   Precision  Recall  F-score       Precision  Recall  F-score
Original       ZR          76.6       87.5    81.7          86.1       92.8    89.3
               BTC         88.6       73.4    80.3          83.3       59.4    69.3
Undersampled   ZR          49.6       49.7    49.7          25.3       50.3    33.7
               BTC         84.4       84.2    84.3          72.6       64.6    68.4
Oversampled    ZR          26.2       51.1    34.6          26.5       51.5    35.0
               BTC         83.4       82.5    82.9          73.3       66.7    69.8
Topic Vector Classifier
1. Train the topic model
2. Merge the documents in topic vector representation with their classes
3. Build classifiers using decision tree and SVM
Topic Vector Classifier
70
Orbital dataset
       Decision Tree                                     SVM
Rank   K     Test (%)  Precision  Recall   F-score       K     Test (%)  Precision  Recall   F-score
1      50    34        95.38      95.31    95.35         150   25        96.25      96.33    96.27
2      25    75        95.00      95.07    95.03         150   66        96.08      96.16    96.11
3      75    34        95.07      95.00    95.03         150   50        96.07      96.17    96.10

Pediatric dataset
       Decision Tree                                     SVM
Rank   K     Test (%)  Precision  Recall   F-score       K     Test (%)  Precision  Recall   F-score
1      15    25        95.79      96.05    95.87         15    75        96.11      96.16    96.13
2      15    50        95.50      95.58    95.54         15    50        96.00      96.23    96.06
3      15    66        95.53      95.51    95.52         15    66        96.00      96.23    96.06
Confidence-based Topic Classifier (CTC)
Training: train the topic model; merge the documents in topic vector representation with their classes; for each topic T and class C, calculate the confidence conf(T ⇒ C), where sup(T) = N_T / N and conf(T ⇒ C) = sup(T ∪ C) / sup(T); select the topic t with the highest confidence for the positive class; pick a threshold th on the selected topic t.
Testing: infer the topic distribution of the test document, find the selected topic t's value v, and predict positive if v > th, negative otherwise.
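A hedged sketch of CTC: it assumes a document "contains" topic T when T is its dominant topic, and that the threshold is chosen on the training documents (the dissertation may define both steps differently).

```python
import numpy as np

def ctc_train(theta, y):
    """Pick the topic with the highest confidence for the positive class and a threshold on it."""
    dominant = theta.argmax(axis=1)
    best_topic, best_conf = 0, -1.0
    for t in range(theta.shape[1]):
        sup_t = np.mean(dominant == t)                      # sup(T) = N_T / N
        if sup_t == 0:
            continue
        conf = np.mean((dominant == t) & (y == 1)) / sup_t  # conf(T => positive)
        if conf > best_conf:
            best_topic, best_conf = t, conf
    threshold = np.median(theta[y == 1, best_topic])        # threshold choice is an assumption
    return best_topic, threshold

def ctc_predict(theta_test, topic, threshold):
    return (theta_test[:, topic] > threshold).astype(int)
```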
Similarity-based Topic Classifier (STC)
Training: train the topic model; merge the documents in topic vector representation with their classes; for each class, calculate the average of its documents' topic distributions.
Testing: infer the topic distribution of the test document and compute its cosine similarity, cos(θ) = x·y / (‖x‖ ‖y‖), to each class average; predict positive if it is more similar to the positive class, negative otherwise.
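A sketch of STC, interpreting "similar to the positive class" as having higher cosine similarity to the positive class average than to the negative one (an assumption).

```python
import numpy as np

def stc_train(theta, y):
    """Average topic distribution of each class over the training documents."""
    return theta[y == 1].mean(axis=0), theta[y == 0].mean(axis=0)

def stc_predict(theta_test, pos_avg, neg_avg):
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.array([int(cos(d, pos_avg) >= cos(d, neg_avg)) for d in theta_test])
```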
Aggregate Topic Classifier (ATC)
Training: train the topic model; merge the documents in topic vector representation with their classes; for each class, calculate the average of its documents' topic distributions; select the topic t with the maximum difference between the class averages; pick a threshold th on the selected topic t.
Testing: infer the topic distribution of the test document, find the selected topic t's value v, and predict positive if v > th, negative otherwise.
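A sketch of ATC; the midpoint threshold between the class averages on the selected topic is an assumption about how "pick a threshold" is instantiated.

```python
import numpy as np

def atc_train(theta, y):
    """Select the topic whose average proportion is most elevated for the positive class."""
    pos_avg = theta[y == 1].mean(axis=0)
    neg_avg = theta[y == 0].mean(axis=0)
    t = int(np.argmax(pos_avg - neg_avg))        # topic with the largest positive-class margin
    threshold = (pos_avg[t] + neg_avg[t]) / 2.0  # midpoint threshold (assumption)
    return t, threshold

def atc_predict(theta_test, t, threshold):
    # Predict positive when the selected topic's proportion exceeds the threshold
    return (theta_test[:, t] > threshold).astype(int)
```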
CTC, STC, ATC Results
74
Pediatric dataset
       CTC                                   STC                                   ATC
Rank   K    Test  P      R      F            K    Test  P      R      F            K   Test  P      R      F
1      150  66    93.84  94.29  94.01        150  25    96.58  93.96  94.79        20  34    96.16  96.26  96.20
2      75   25    93.69  93.40  93.53        75   34    95.88  93.62  94.34        15  34    96.18  95.70  95.88
3      75   34    94.07  93.07  93.48        30   34    95.65  93.34  94.09        20  25    96.14  95.66  95.81

Orbital dataset
       CTC                                   STC                                   ATC
Rank   K    Test  P      R      F            K    Test  P      R      F            K   Test  P      R      F
1      75   50    93.65  93.56  93.61        75   34    95.72  95.39  95.51        15  66    96.07  96.03  96.05
2      50   50    93.86  92.86  93.20        75   66    95.69  95.22  95.37        20  66    96.07  95.95  95.97
3      30   34    93.86  92.77  93.14        15   66    95.93  95.09  95.33        15  75    95.47  95.43  95.45
Topic Modeling-based Classifiers’ Results
75
[Plots of F-score (0.78 to 0.96) versus number of topics K (0 to 150) for the SVM, DT, ATC, STC, and CTC classifiers on the orbital and pediatric datasets (test ratios 25% and 34%).]
Overall Classification Performances
                          Orbital                          Pediatric
           Algorithm      Precision  Recall  F-score       Precision  Recall  F-score
Baseline   ZR             76.6       87.5    81.7          86.1       92.8    89.3
           BTC            88.6       73.4    80.3          83.3       59.4    69.3
Raw Text   DT             93.6       94.3    93.9          94.1       94.3    94.2
           SVM            94.3       94.3    94.3          95.9       95.9    95.9
NLP-based  DT-All         96.3       96.3    96.3          95.2       95.5    95.3
           DT-Filtered    96.5       96.6    96.6          95.5       95.8    95.7
           SVM-All        96.1       96.1    96.1          96.7       96.9    96.8
           SVM-Filtered   97.0       97.0    97.0          97.1       97.3    97.2
TM-based   DT             95.4       95.3    95.4          95.8       96.1    95.9
           SVM            96.3       96.3    96.3          96.1       96.2    96.1
           CTC            93.7       93.6    93.6          93.8       94.3    94.0
           STC            95.7       95.4    95.5          96.6       94.0    94.8
           ATC            96.1       96.0    96.1          96.2       96.3    96.2
Best Classification Performances
77
Orbital Dataset
            Algorithm       Precision  Recall  F-score
Baseline    ZR              76.6       87.5    81.7
Raw Text    SVM             94.3       94.3    94.3
NLP-based   SVM-Filtered    97.0       97.0    97.0
TM-based    SVM             96.3       96.3    96.3

Pediatric Dataset
            Algorithm       Precision  Recall  F-score
Baseline    ZR              86.1       92.8    89.3
Raw Text    SVM             95.9       95.9    95.9
NLP-based   SVM-Filtered    97.1       97.3    97.1
TM-based    ATC             96.2       96.3    96.2
Discussion
78
¡ NLP-based classification approaches
¡ Best classification performance among all classifiers
¡ Requires more customization per domain
¡ Topic modeling-based classification approaches
¡ Provides dimension reduction
¡ Performs better than raw text classification
¡ Competitive with NLP-based classification
¡ More general framework than NLP-based classifiers
Summary and Contributions
Summary
80
¡ Large amounts of electronic clinical data have become available
with the increased use of Electronic Health Records (EHR)
¡ Automated processing of these records could benefit both the
patient and the provider
¡ Speeding up the decision process and reducing costs
¡ Classifiers that can automatically predict the outcome from raw
text clinical reports
¡ Raw text classification of clinical reports
¡ NLP-based classification of clinical reports
¡ Topic modeling-based classification of clinical reports
Significance and Contributions
81
¡ Addressing issues specific to automated processing of clinical text
¡ Unstructured data
¡ Medical terms
¡ Context sensitivity
¡ Real-world datasets
¡ Natural Language Processing-based classification of clinical reports
¡ Selecting and adjusting a biomedical NLP tool
¡ Best ways to extract NLP features for better classification
¡ Topic Modeling-based classification of clinical reports
¡ A more general framework than NLP-based solutions
¡ Utilizing an unsupervised technique, i.e. topic modeling, in a supervised
fashion, i.e. classification
¡ Topic modeling-based classifiers: BTC, CTC, STC, and ATC
Research Impacts
82
¡ Improve quality and efficiency of healthcare
¡ The classifiers can be used to automatically predict the conditions
in a clinical report
¡ Can replace the manual review of clinical reports, which can be time-consuming and error-prone
¡ Clinicians can have more confidence in utilizing such systems in
real life settings
¡ Increased accuracy and interpretability
83
References
1. K. Yadav, C. E, H. JS, A. Z, N. V, and G. P. et al., “Derivation of a clinical risk score for traumatic orbital fracture,” 2012,
in Press.
2. V. Garla, V. L. R. III, Z. Dorey-Stein, F. Kidwai, M. Scotch, J. Womack, A. Justice, and C. Brandt, “The Yale cTAKES
extensions for document classification: architecture and application.” JAMIA, 2011
3. C. Friedman, “A broad-coverage natural language processing system,” Proc AMIA Symp, pp. 270–274, 2000.
4. T. Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” in
Proceedings of the 10th European Conference on Machine Learning, ser. ECML ’98., pp. 137–142.
5. F. Sebastiani, “Machine learning in automated text categorization,” ACM Comput. Surv., vol. 34, no. 1, pp. 1–47, 2002
6. Aronson, A. R. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA
Symp (2001), 17–21.
7. Aronson, A. R., and Lang, F.-M. An overview of metamap: historical perspective and recent advances. Journal of the
American Medical Informatics Association 17, (2010).
8. S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,”
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, vol. 41, no. 6, pp. 391–407, 1990.
9. T. Hofmann, “Probabilistic latent semantic analysis,” in UAI, 1999.
10. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, Mar.
2003.
84
References – Continued
11. Z. Zhang, X.-H. Phan, and S. Horiguchi, “An Efficient Feature Selection Using Hidden Topic in Text Categorization.”
AINAW ’08.
12. W. Sriurai, “Improving Text Categorization by Using a Topic Model,” Advanced Computing: An International Journal
(ACIJ), 2011.
13. E. Chen, Y. Lin, H. Xiong, Q. Luo, and H. Ma, “Exploiting probabilistic topic models to improve text categorization
under class imbalance,” Inf. Process. Manage., 2011.
14. S. Banerjee, “Improving text classification accuracy using topic modeling over an additional corpus,”, SIGIR ’08.
15. Arnold, C. W., El-Saden, S. M., Bui, A. A. T., and Taira, R. Clinical Case-based Retrieval Using Latent Topic Analysis.
AMIA Annu Symp Proc 2010 (2010), 26–30.
16. H. M. Wallach, “Topic modeling: beyond bag-of-words.”, ICML ’06.
17. Griffiths, T. L., Steyvers, M., Blei, D. M., and Tenenbaum, J. B. Integrating topics and syntax. In In Advances in Neural
Information Processing Systems 17 (2005), MIT Press, pp. 537–544.
18. Boyd-Graber, J. L., and Blei, D. M. Syntactic topic models. CoRR abs/1002.4665 (2010).
19. N. Kuppermann, J. F. Holmes, P. S. Dayan, J. D. J. Hoyle, S. M. Atabaki, R. Holubkov, F. M. Nadel, D. Monroe, R. M.
Stanley, D. A. Borgialli, M. K. Badawy, J. E. Schunk, K. S. Quayle, P. Mahajan, R. Lichenstein, K. A. Lillis, M. G. Tunik, E. S.
Jacobs, J. M. Callahan, M. H. Gorelick, T. F. Glass, L. K. Lee, M. C. Bachman, A. Cooper, E. C. Powell, M. J. Gerardi, K.
A. Melville, J. P. Muizelaar, D. H. Wisner, S. J. Zuspan, J. M. Dean, and S. L. Wootton-Gorges, “Identification of children
at very low risk of clinically-important brain injuries after head trauma: a prospective cohort study.” Lancet, vol. 374,
no. 9696, pp. 1160–1170.
20. T. N. Rubin, A. Chambers, P. Smyth, and M. Steyvers, “Statistical topic models for multi-label document classification,”
Mach. Learn., vol. 88, no. 1-2, pp. 157–208
85
References – Continued
21. Z. Liu, M. Li, Y. Liu, and M. Ponraj, “Performance evaluation of Latent Dirichlet Allocation in text mining,” in Fuzzy
Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on, vol. 4, pp. 2695 –2698.
22. W. W. Chapman, D. Hillert, S. Velupillai, M. Kvist, M. Skeppstedt, B. E. Chapman, M. Conway, M. Tharp, D. L. Mowery,
and L. Deleger, “Extending the negex lexicon for multiple languages.” Stud Health Technol Inform, vol. 192, pp. 677–
681, 2013.
23. E. A. Mendonca, J. Haas, L. Shagina, E. Larson, and C. Friedman, “Extracting information on pneumonia in infants
using natural language processing of radiology reports.” J Biomed Inform, vol. 38, no. 4, pp. 314–321.
24. H. Harkema, J. N. Dowling, T. Thornblade, and W. W. Chapman, “Context: an algorithm for determining negation,
experiencer, and temporal status from clinical reports.” J Biomed Inform, vol. 42, no. 5, pp. 839–851.
25. H. Misra, F. Yvon, J. M. Jose, and O. Cappe, “Text segmentation via topic modeling: an analytical study,” in
Proceedings of the 18th ACM conference on Information and knowledge management, ser. CIKM ’09, pp. 1553–
1556.
26. J.-H. Yeh and C.-H. Chen, “Protein remote homology detection based on latent topic vector model,” in Networking
and Information Technology (ICNIT), 2010 International Conference on, pp. 456 –460.
27. J. A. Womack, M. Scotch, C. Gibert, W. Chapman, M. Yin, A. C. Justice, and C. Brandt, “A comparison of two
approaches to text processing: facilitating chart reviews of radiology reports in electronic medical records.”
Perspect Health Inf Manag,vol. 7, p. 1a, 2010.
28. O. Bodenreider, “Using UMLS semantics for classification purposes.” Proc AMIA Symp, pp. 86–90, 2000.
29. F.-F. Li and P. Perona, “A bayesian hierarchical model for learning natural scene categories,” in Proceedings of the
2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Volume 2. CVPR
’05. 2005, pp. 524–531.
30. G. Hripcsak, J. H. M. Austin, P. O. Alderson, and C. Friedman, “Use of natural language processing to translate clinical
information from a database of 889,921 chest radiographic reports.” Radiology, vol. 224, no. 1, pp. 157–163.
References – Continued
86
31. Y. Huang, H. J. Lowe, D. Klein, and R. J. Cucina, “Improved identification of noun phrases in clinical radiology reports
using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon.” J Am Med
Inform Assoc, vol. 12, no. 3, pp. 275–285, May-Jun 2005.
32. V. N. Vapnik, Statistical learning theory, 1st ed. Wiley, Sep. 1998.
33. J. H. Friedman, “Another approach to polychotomous classification,” Department of Statistics, Stanford University, Tech.
Rep., 1996.
34. T. Hastie and R. Tibshirani, “Classification by Pairwise Coupling,” 1998.
35. Stanford NLP: http://nlp.stanford.edu/software/index.shtml
36. Open NLP: http://opennlp.apache.org
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 

Recently uploaded (20)

1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 

Effective Classification of Clinical Reports: Natural Language Processing-Based and Topic Modeling-Based Approaches

  • 11. Natural Language Processing (NLP) ¡ Techniques for understanding the syntactic and semantic relations that exist in natural language ¡ Part-Of-Speech (POS) tagging ¡ Assigns each word its syntactic class ¡ Noun, verb, adjective, etc. ¡ Dependency parsing ¡ Finds the syntactic representation of a given sentence ¡ Dependencies between its words with labels showing grammatical relations ¡ Subject of a clause, object of a verb phrase 11
  • 12. Biomedical NLP ¡ General NLP tools, such as Stanford NLP, are trained on general English data, i.e., news ¡ Not suitable for use in the biomedical domain ¡ Biomedical data ¡ Medical terms ¡ Synonyms: ocular --> eye, eyes, optic, ophthalmic, ophthalmia, oculus, oculi ¡ Context sensitivity ¡ Temporal, negation, and certainty status of clinical terms ¡ Biomedical NLP tools ¡ Use biomedical vocabularies and translate clinical text into coded descriptions suitable for automated processing ¡ Medical Language Extraction and Encoding (MedLEE) [Friedman00] ¡ Standard representation via Unified Medical Language System (UMLS) ¡ Modifiers for each clinical term to evaluate context 12
  • 13. Unified Medical Language System (UMLS) ¡ Repository of vocabularies in biomedical sciences developed and maintained by the National Library of Medicine (NLM) ¡ 6.4 million unique terms ¡ 1.3 million unique concepts ¡ More than 119 families of biomedical vocabularies ¡ Three knowledge sources ¡ Metathesaurus ¡ Multi-lingual vocabulary organized by concepts ¡ Links alternative names and views of the same concept from different vocabularies ¡ Each concept is given a unique ID called a Concept Unique Identifier (CUI) ¡ Semantic Network ¡ Semantic types and relations to categorize and disambiguate concepts ¡ Semantic types: medical device, clinical drug, and laboratory ¡ Semantic relations: treats, diagnoses, and contains ¡ Specialist Lexicon ¡ General English and biomedical vocabulary 13
  • 14. UMLS Metathesaurus 14 ¡ “Orbital fractures” with CUI: C0029184
  • 15. MedLEE System Overview ¡ Pipeline: Preprocessor, Parser, Composer, and Encoder, driven by lexicon, grammar, mapping, and coding knowledge plus site-specific tailoring ¡ Raw text example: "Impression: Right lamina papyracea fracture. No evidence of entrapment." ¡ MedLEE output: <sectname v = "report impression item"></sectname> <sid idref = "s7"></sid> <code v = "UMLS:C0016658_Fracture"></code> <problem v = "entrapment" code = "UMLS:C1285497_Entrapment (morphologic abnormality)"> <certainty v = "no"></certainty> </problem>
  • 16. MedLEE Findings and Modifiers 16 ¡ Maps its findings to UMLS CUIs ¡ Problems (e.g. fracture, pain, trauma) ¡ Procedures (e.g. image, computerized axial tomography) ¡ Device (e.g. tube) ¡ Lab tests (e.g. blood type) ¡ Assigns modifiers to each finding for more context ¡ Body locations (e.g. head, brain, face) ¡ Section names (e.g. impression, findings) ¡ Certainty (e.g. high/moderate/low certainty) and ¡ Status (e.g. acute, new, recent, chronic, past, history)
  • 18. Topic Modeling ¡ Automatically finds the themes/topics of a document collection ¡ Low dimensional representation that captures the semantics ¡ A way to browse an unstructured collection in a structured way ¡ Topic Modeling techniques ¡ Matrix decomposition ¡ Latent Semantic Analysis (LSA) [Deerwester90] ¡ Probabilistic topic modeling ¡ Probabilistic Latent Semantic Analysis (PLSA) [Hofmann99] ¡ Latent Dirichlet Allocation (LDA) [Blei03] 18
  • 19. Latent Dirichlet Allocation (LDA) ¡ Topic is defined as a probability distribution over entire vocabulary ¡ Documents can exhibit multiple topics with different proportions ¡ Given the observed words in a set of documents and the total number of topics ¡ Finds out the topic model that is most likely to have generated the data ¡ For each topic, its probability distribution over words ¡ For each document, its probability distribution over topics ¡ The topic responsible for generating each word in each document 19
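As an illustration of how such a topic model can be fit in practice, below is a minimal sketch using scikit-learn; the toy reports, the choice of K, and all variable names are illustrative assumptions rather than the dissertation's actual setup.

```python
# Minimal LDA sketch (illustrative toy corpus; not the actual experimental pipeline).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = [
    "axial and coronal images of the orbits were obtained",
    "no evidence of acute orbital fracture or entrapment",
    "minimally depressed right parietal skull fracture with hematoma",
]

X = CountVectorizer(stop_words="english").fit_transform(reports)  # observed word counts
lda = LatentDirichletAllocation(n_components=2, random_state=0)   # K = 2 topics
doc_topics = lda.fit_transform(X)   # per-document topic proportions
topic_words = lda.components_       # per-topic word weights
print(doc_topics.round(2))
```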
  • 20. Sample Topic Model (K = 3)
Topic 1: axial 0.07, structure 0.05, images 0.02, ...
Topic 2: maxilary 0.04, coronal 0.02, intraorbital 0.01, ...
Topic 3: intact 0.05, evidence 0.02, entrapment 0.01, ...
Document 1: "Axial[1] images[1] and coronal[2] images[1] ..."
Document 2: "The intraorbital[1] structures[1] are intact[3] ..."
Document 3: "No evidence[3] of entrapment[3] ..."
(Each word in a document is annotated with the topic that generated it.)
  • 21. Topic Model Training ¡ Randomly assign a topic to every word in all of the documents ¡ Repeat 1000 times: for each word w in each document d, calculate a score s for each topic and sample a new topic t_new for w based on s ¡ Score: s(t) = (N_{t|d} + α) / (N_d + Tα) × (N_{w|t} + β) / (N_t + Vβ)
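A small sketch of the sampling score above in Python with numpy; the count matrices, hyperparameters, and function names are illustrative, and the bookkeeping of removing the current word's own assignment before scoring is omitted for brevity.

```python
import numpy as np

def sample_topic(w, d, N_td, N_wt, alpha, beta, rng):
    """Score s(t) = (N_t|d + a)/(N_d + T*a) * (N_w|t + b)/(N_t + V*b), then sample a topic."""
    V, T = N_wt.shape
    N_d = N_td[d].sum()                    # words currently assigned in document d
    N_t = N_wt.sum(axis=0)                 # words currently assigned to each topic
    score = ((N_td[d] + alpha) / (N_d + T * alpha)) * \
            ((N_wt[w] + beta) / (N_t + V * beta))
    return rng.choice(T, p=score / score.sum())   # normalize by Z and sample

rng = np.random.default_rng(0)
N_td = np.array([[8, 3]])                                        # one toy document, two topics
N_wt = np.array([[4, 5], [5, 7], [20, 20], [9, 12], [18, 11]])   # River..Loan counts per topic
print(sample_topic(w=0, d=0, N_td=N_td, N_wt=N_wt, alpha=0.1, beta=0.01, rng=rng))
```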
  • 22. Initialization 22 ¡ Toy corpus: 10 documents over the vocabulary River, Stream, Bank, Money, Loan, with topics A and B assigned at random
Document-topic statistics (document: N_A, N_B, N_d) - 1: 12, 0, 12; 2: 3, 6, 9; 3: 5, 10, 15; 4: 6, 9, 15; 5: 8, 3, 11; 6: 4, 9, 13; 7: 5, 3, 8; 8: 6, 4, 10; 9: 4, 6, 10; 10: 4, 5, 9
Topic-word statistics (River, Stream, Bank, Money, Loan, N_t) - Topic A: 4, 5, 20, 9, 18, 56; Topic B: 5, 7, 20, 12, 11, 55
  • 23. Document #5 23 ¡ Current word-topic assignments in DOC 5: River→B, Stream→A, Bank→A, Bank→A, Bank→A, Money→B, Money→A, Money→B, Loan→A, Loan→A, Loan→A
  • 24. Topic Modeling Training 24 ¡ For the word "River" in DOC 5, compute a score for each possible topic [A, B]: score(A | N_A|5, N_River|A) and score(B | N_B|5, N_River|B)
  • 25. Topic Modeling Training 25 ¡ Sum the scores over all possible topics [A, B] to get the normalizer Z = score(A | N_A|5, N_River|A) + score(B | N_B|5, N_River|B)
  • 26. Topic Modeling Training 26 ¡ Sample a new topic: draw u = rand() × Z and select the topic whose cumulative score first exceeds u; here the sample returns t = A
  • 27. Topic Modeling Training 27 ¡ DOC 5 initial topics: River→A, Stream→B, Bank→A, Bank→A, Bank→A, Money→B, Money→A, Money→B, Loan→A, Loan→A, Loan→A ¡ DOC 5 final topics: River→A, Stream→A, Bank→A, Bank→B, Bank→B, Money→B, Money→B, Money→B, Loan→B, Loan→B, Loan→B
  • 28. Document-Topic Statistics 28 ¡ Document-topic counts (document: N_A, N_B) before and after 1000 iterations with sampling
Initial - 1: 12, 0; 2: 3, 6; 3: 5, 10; 4: 6, 9; 5: 8, 3; 6: 4, 9; 7: 5, 3; 8: 6, 4; 9: 4, 6; 10: 4, 5
Final - 1: 0, 12; 2: 2, 7; 3: 0, 15; 4: 0, 15; 5: 3, 8; 6: 4, 9; 7: 2, 6; 8: 8, 2; 9: 10, 0; 10: 8, 1
  • 29. Topic-Word Statistics 29 ¡ Topic-word counts (River, Stream, Bank, Money, Loan) before and after 1000 iterations with sampling
Initial - Topic A: 4, 5, 20, 9, 18; Topic B: 5, 7, 20, 12, 11
Final - Topic A: 9, 12, 16, 0, 0; Topic B: 0, 0, 24, 21, 29
¡ After sampling, Topic A concentrates on River, Stream, and Bank, while Topic B concentrates on Bank, Money, and Loan
  • 30. 30 ¡ Initial and final document-topic statistics shown graphically (same counts as slide 28)
  • 32. Bag-of-Words (BOW) Representation 32 ¡ Document-term matrix where columns are terms and rows are documents ¡ NxM matrix for N documents and M terms ¡ Pros: Simple, conventional ¡ Cons: Word ordering is lost ¡ Weighting ¡ Binary ¡ Term frequency: tf_{t,d} ¡ Inverse document frequency: idf_t = log(N / df_t) ¡ Combined measure: tf-idf_{t,d} = tf_{t,d} × idf_t
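A short sketch of the weighting schemes above; the toy documents and counts are illustrative.

```python
import math

docs = [["fracture", "orbital", "fracture"],
        ["no", "acute", "fracture"],
        ["intact", "orbital", "structures"]]
N = len(docs)

def tf(t, d):                       # term frequency tf_{t,d}
    return d.count(t)

def idf(t):                         # inverse document frequency idf_t = log(N / df_t)
    df = sum(1 for d in docs if t in d)
    return math.log(N / df)

def tf_idf(t, d):                   # combined measure
    return tf(t, d) * idf(t)

print(tf("fracture", docs[0]), round(idf("fracture"), 3), round(tf_idf("orbital", docs[0]), 3))
```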
  • 33. Text Classification ¡ Given a set of n training vectors {x_1, x_2, …, x_n} in BoW representation in d dimensions with binary labels {y_1, y_2, …, y_n}, a classifier is a binary-valued function f : R^d → {0, 1} ¡ Trivial rejector/zero rule (ZR) classifier ¡ Baseline ¡ Decision Tree ¡ Popular classification algorithm due to its human-interpretable output of a binary tree ¡ Support Vector Machines (SVM) ¡ Shown to perform well in text classification [Joachims98, Sebastiani02] 33
  • 34. Decision Tree ¡ Binary tree where nodes represent criteria based on terms and leaves represent the class label ¡ Build the tree top-down: start with all attributes in one node ¡ Repeat until all leaves are pure ¡ Look at all leaves and all possible splits ¡ Choose the split that most decreases the uncertainty, based on entropy, the Gini index, etc. ¡ Example: a small tree testing fracture > 0, orbital > 0, and facial > 0 to predict Positive/Negative 34
  • 35. 35 Uncertainty Measures ¡ Gini index: u = 2p(1 − p) ¡ Example: 20 samples (10 positive, 10 negative), so the uncertainty before any split is u = 2 × (10/20) × (10/20) = 0.5 ¡ Candidate split x < 0.5 sends [10, 2] to the left node and [0, 8] to the right node ¡ p_L = 12/20, u_L = 2 × (10/12) × (2/12) = 10/36; p_R = 8/20, u_R = 0 ¡ Uncertainty after the split: p_L u_L + p_R u_R = 12/20 × 10/36 + 8/20 × 0 = 1/6 ≈ 0.17 ¡ The other candidate splits shown give u = 0.33 and u = 0.38, so x < 0.5 is the best split
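The worked example above can be reproduced with a few lines of Python; the counts follow the slide and are otherwise illustrative.

```python
def gini(pos, neg):                            # u = 2p(1 - p)
    p = pos / (pos + neg)
    return 2 * p * (1 - p)

def after_split(left, right):                  # weighted Gini: p_L*u_L + p_R*u_R
    n = sum(left) + sum(right)
    return sum(left) / n * gini(*left) + sum(right) / n * gini(*right)

print(gini(10, 10))                            # before the split: 0.5
print(round(after_split((10, 2), (0, 8)), 3))  # split x < 0.5: ~0.167
```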
  • 36. Support Vector Machines (SVM) ¡ Large-margin classifier ¡ Margin: smallest distance from any example to the decision boundary, Margin = 1 / ‖θ‖ for parameter vector θ ∈ R^d ¡ Objective: minimize (1/2)‖θ‖², which is inversely proportional to the margin ¡ Subject to y_t θ^T x_t ≥ 1 for all t = 1, …, n ¡ Constraints guarantee that it classifies each sample correctly 36
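A minimal sketch of training a linear SVM on BoW features with scikit-learn; the toy reports and labels are illustrative, not drawn from the study data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

reports = ["acute orbital floor fracture", "no acute fracture identified",
           "depressed skull fracture noted", "intact orbital structures"]
labels = [1, 0, 1, 0]                         # 1 = positive outcome

X = CountVectorizer().fit_transform(reports)  # BoW features
clf = LinearSVC(C=1.0).fit(X, labels)         # large-margin linear classifier
print(clf.predict(X))
```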
  • 38. Evaluation: Classification ¡ Possible cases for binary classification (actual vs. predicted class): True Positive (TP), False Negative (FN), False Positive (FP), True Negative (TN) ¡ Precision (P), Recall (R), and F-score measures are used for evaluating classification performance: P = TP / (TP + FP), R = TP / (TP + FN), F-score = 2 × P × R / (P + R) 38
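The three measures follow directly from the confusion-matrix counts; the counts below are toy values.

```python
def precision_recall_fscore(tp, fp, fn):
    p = tp / (tp + fp)                 # precision
    r = tp / (tp + fn)                 # recall
    return p, r, 2 * p * r / (p + r)   # F-score

print(precision_recall_fscore(tp=90, fp=10, fn=15))
```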
  • 39. Training and Testing ¡ Train the classifier using only a subset of the entire dataset: TR = {d_1, d_2, …, d_|TR|}, |TR| > 0 ¡ Optionally, use a validation set VD = {d_|TR|+1, …, d_|TR|+|VD|}, |VD| ≥ 0, to learn the best values of the parameters ¡ Evaluate performance on the unseen test dataset TS = {d_|TR|+|VD|+1, …, d_|D|}, |TS| > 0 ¡ Stratified: similar class distribution to the original dataset ¡ Run the algorithm several times and take the average of the performances 39
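A sketch of a stratified train/test split with scikit-learn; the toy labels mimic the class imbalance of the clinical datasets, and the sizes are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(-1, 1)          # 20 toy documents (feature rows)
y = np.array([1] * 4 + [0] * 16)          # imbalanced labels

X_tr, X_ts, y_tr, y_ts = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)   # stratified: keeps the class ratio
print(y_tr.mean(), y_ts.mean())           # similar positive rate in both splits
```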
  • 41. NLP Related Work ¡ General NLP ¡ Daily English ¡ Stanford NLP, OpenNLP ¡ Biomedical NLP ¡ Use of medical dictionaries for medical terms and modifiers for context ¡ MetaMap for UMLS mapping [Aronson01, Aronson10] ¡ Only negation modifier ¡ ConText system [Harkema09] ¡ Temporality, negation and experiencer of a given clinical term in a sentence ¡ Not a complete analysis since it expects the condition to be provided ¡ Clinical Text Analysis and Knowledge Extraction System (cTakes) [Garla11] ¡ History, probability, and negation modifiers ¡ Medical Language Extraction and Encoding (MedLEE) [Friedman00] ¡ More types of modifiers (51) and wider range of modifier values 41
  • 42. Biomedical NLP 42 ¡ UMLS for classification ¡ Classify conditions into disease categories based on inter-concept relations from UMLS [Bodenreider00] ¡ Expert rules ¡ MedLEE with expert rules for classification instead of machine learning approaches [Mendonca05, Hripcsak02] ¡ NegEx (Negative Identification for Clinical Conditions) [Chapman13] and SQL Server 2008 Free Text Search to identify fractures in radiograph reports using expert rules [Womack10] ¡ Expert rules could be costly to construct ¡ Less generalizable to new clinical areas in contrast to machine learning approaches ¡ Our approach: utilize MedLEE with machine learning approaches
  • 43. Topic Modeling Related Work ¡ Applications of topic modeling ¡ Computer vision [Li05], biology [Yeh10], information retrieval [Wei06] and text segmentation [Misra09] ¡ Structural extensions to topic modeling ¡ N-gram for phrases [Wallach06] ¡ Combined with POS tagging [Griffiths04] ¡ Combined with parse trees [Boyd-Graber08] ¡ Enhances the capability of topic modeling by combining NLP techniques ¡ Computationally more expensive than standard topic modeling ¡ Clinical text ¡ Similarity measure based on topic distributions of patients for information retrieval [Arnold10] 43
  • 44. Topic Modeling for Text Classification 44 ¡ Topic modeling for text classification ¡ Comparison of vector space model, LSA, and LDA for text classification with SVM [Liu11] ¡ Performance of SVM with LDA supersedes vector space model and LSA ¡ Keyword selection based on entropy of words in topics [Zhang08] ¡ Text classification with topic vectors with fixed number of topics ¡ In addition to BoW [Banerjee08] and instead of BoW [Sriurai11] ¡ Topic modeling based resampling instead of random sampling for imbalanced classes [Chen11] ¡ Multi-label text classification using topic model [Rubin12] ¡ Our approaches can be transformed into a multi-class by standard techniques ¡ Our approach: utilize topic modeling for clinical text classification using LDA
  • 46. Orbital Dataset ¡ CT imaging reports for patients suffering traumatic orbital injury [Yadav12] ¡ Each report was dictated by a staff radiologist ¡ Outcomes were extracted by a trained data abstractor ¡ Positive for acute orbital fracture ¡ Negative for acute orbital fracture ¡ Among the 3,705 orbital CT reports, 3,242 were negative and 463 were positive 46
  • 47. Pediatric Dataset 47 ¡ Prospectively collected patient CT report data for pediatric traumatic brain injury [Kuppermann09] ¡ Obtained at emergency department clinicians' discretion and interpreted by site faculty radiologists ¡ The outcome of interest was extracted by a trained data abstractor ¡ Positive for traumatic brain injury ¡ Negative for traumatic brain injury ¡ Among the 2,126 pediatric head CT reports, 1,973 were negative and 153 were positive
  • 48. Dataset Preparation 48 ¡ Training and testing datasets ¡ Resampled datasets
Class counts (Pos / Neg / Total) - Orbital: original 463 / 3,242 / 3,705; undersampled 463 / 463 / 926; oversampled 1,895 / 1,810 / 3,705
Class counts (Pos / Neg / Total) - Pediatric: original 153 / 1,973 / 2,126; undersampled 153 / 151 / 304; oversampled 1,094 / 1,032 / 2,126
Number of reports at dataset proportions of 75% / 66% / 50% / 34% / 25% - Orbital: 2,778 / 2,445 / 1,852 / 1,259 / 926; Pediatric: 1,594 / 1,403 / 1,063 / 722 / 531
  • 49. Raw Text Classification of Clinical Reports 49
  • 50. Raw Text Classification ¡ Pipeline: Patient Reports + Labels → Preprocess → BoW → Zero Rule / Decision Tree / SVM
  • 51. Decision Tree for Pediatric Dataset 51
  • 52. Raw Text Classification Results 52 (Precision / Recall / F-score, by test set proportion)
Orbital: ZR 76.57 / 87.50 / 81.67
Orbital DT - 25%: 93.08 / 93.71 / 93.40; 34%: 93.64 / 94.25 / 93.94; 50%: 92.93 / 93.32 / 93.12; 66%: 92.87 / 93.44 / 93.15; 75%: 92.46 / 93.02 / 92.74
Orbital SVM - 25%: 94.24 / 94.29 / 94.27; 34%: 94.14 / 94.23 / 94.18; 50%: 94.28 / 94.28 / 94.27; 66%: 93.81 / 93.85 / 93.83; 75%: 93.46 / 93.49 / 93.48
Pediatric: ZR 86.12 / 92.80 / 89.34
Pediatric DT - 25%: 94.09 / 94.16 / 94.13; 34%: 94.13 / 94.25 / 94.19; 50%: 94.14 / 94.20 / 94.17; 66%: 93.56 / 93.65 / 93.60; 75%: 93.39 / 93.53 / 93.46
Pediatric SVM - 25%: 95.53 / 95.56 / 95.55; 34%: 95.78 / 95.80 / 95.79; 50%: 95.88 / 95.90 / 95.89; 66%: 95.49 / 95.60 / 95.55; 75%: 95.42 / 95.53 / 95.47
  • 55. MedLEE Lexicon Extension
Table 4.2: List of terms added to the MedLEE lexicon for the orbital dataset (Term | Category | Target Form | UMLS CUI)
ramus | Body location | ramus of mandible | C0222748
angle | Body location | angle of mandible | C0222753
body | Body location | body of mandible | C0222746
intraorbit | Body location | ocular orbit | C0029180
nasal cavity | Body location | nasal cavity | C0027423
mastoid air cell | Body location | pneumatic mastoid cell | C0229427
pterygoid plate | Body location | pterygoid process | C0222730
lamina papyracea | Body location | orbital plate of ethmoid bone | C0222699
lamina paprycea | Body location | orbital plate of ethmoid bone | C0222699
LeFort | Finding | Le Fort's fracture | C0272464
LeFort I | Finding | Le Fort's fracture, type I | C0435328
LeFort II | Finding | Le Fort's fracture, type II | C0435329
LeFort III | Finding | Le Fort's fracture, type III | C1402218
LeFort Type I | Finding | Le Fort's fracture, type I | C0435328
LeFort Type II | Finding | Le Fort's fracture, type II | C0435329
LeFort Type III | Finding | Le Fort's fracture, type III | C1402218
premaxilla | Body location | premaxillary bone | C0687094
supraorbital | Body location | supraorbital | C0230002
preorbital | Body location | periorbital | C0230064
depressed fracture | Finding | depressed fracture | C0332759
maxillary sinus | Body location | maxillary sinus | C0024957
emphysema | Finding | subcutaneous emphysema | C0038536
ramus fracture | Finding | Closed fracture of ramus of mandible | C0272469
angle fracture | Finding | Mandible angle fracture | C0746383
body fracture | Finding | Closed fracture of body of mandible | C0272470
lamina papyracea fracture | Finding | Fracture of orbital plate of ethmoid bone | C1264245
maxillary sinus fracture | Finding | sinus maxillaris fracture | C1409796
orbital floor fracture | Finding | Fracture of orbital floor | C0149944
nasal bone fracture | Finding | Fractured nasal bones | C0339848
tripod fracture | Finding | Closed fracture of zygomatic tripod | C1264249
Table 4.3: List of terms added to the MedLEE lexicon for the pediatric dataset (Term | Category | Target Form | UMLS CUI)
mass effect | Finding | cerebral mass effect | C0186894
shift | Finding | midline shift of brain | C0576481
midline shift | Finding | midline shift of brain | C0576481
extraaxial hemorrhage | Finding | enlarged extraaxial space on brain imaging | C3280298
extra-axial hemorrhage | Finding | enlarged extraaxial space on brain imaging | C3280298
extraaxial fluid collection | Finding | enlarged extraaxial space on brain imaging | C3280298
extra-axial fluid collection | Finding | enlarged extraaxial space on brain imaging | C3280298
extra-axial collection | Finding | enlarged extraaxial space on brain imaging | C3280298
extraaxial collection | Finding | enlarged extraaxial space on brain imaging | C3280298
extraaxial hematoma | Finding | enlarged extraaxial space on brain imaging | C3280298
extra-axial hematoma | Finding | enlarged extraaxial space on brain imaging | C3280298
sulcus | Body location | sulcus of brain | C0228177
parenchymal hemorrhage | Finding | parenchymal hemorrhage | C0747264
ventricle | Body location | cerebral ventricles | C0007799
sutures | Body location | joint structure of suture of skull | C0010272
ischemic changes | Finding | cerebral ischemia | C0917798
depressed | Finding | depressed fracture | C0332759
depression | Finding | depressed fracture | C0332759
depressed fracture | Finding | depressed fracture | C0332759
nasopharyngeal passage | Body location | entire nasal passage | C1283892
  • 56. NLP Feature Selection ¡ Feature set 1: All ¡ Only problems with body locations ¡ Excluded findings: procedure, device, technique, etc. ¡ Feature set 2: Filtered ¡ Subset of feature set 1 ¡ Current and highly probable findings based on modifiers ¡ Certainty modifier ¡ Included: high certainty, moderate certainty,… ¡ Included with a preceding ‘no_’: low certainty, negative ¡ Excluded: rule out ¡ Status modifier ¡ Included: active, recent… ¡ Excluded: previous, past… ¡ Section modifier ¡ Included: Indications and findings ¡ Excluded: past history 56
  • 57. NLP Feature Selection ¡ Filtered feature selection: find all problems with body locations ¡ Exclude a finding if it comes from a past-history section or does not have high certainty ¡ If the finding is negated, include it with a preceding "no_" ¡ Otherwise, include the finding as it is
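A sketch of the filtering logic above, assuming MedLEE's XML output has already been parsed into small dictionaries; the modifier values, field names, and example findings are illustrative.

```python
NEGATED = {"no", "low certainty", "negative"}
EXCLUDE_CERTAINTY = {"rule out"}
EXCLUDE_STATUS = {"previous", "past", "history"}
EXCLUDE_SECTION = {"past history"}

def filtered_features(findings):
    feats = []
    for f in findings:                                   # each f: a problem with a body location
        if (f.get("section") in EXCLUDE_SECTION or f.get("status") in EXCLUDE_STATUS
                or f.get("certainty") in EXCLUDE_CERTAINTY):
            continue                                     # drop past or ruled-out findings
        if f.get("certainty") in NEGATED:
            feats.append("no_" + f["cui"])               # keep negated finding with a 'no_' prefix
        else:
            feats.append(f["cui"])                       # keep current, certain finding as is
    return feats

print(filtered_features([
    {"cui": "C0016658_fracture", "certainty": "high certainty", "section": "findings"},
    {"cui": "C1285497_entrapment", "certainty": "no", "section": "findings"},
    {"cui": "C0272464_le_fort_fracture", "status": "past", "section": "past history"},
]))
```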
  • 58. Sample MedLEE Filtered Features
Findings: "Extracranial, subcutaneous hyperdense hematoma is seen along the right parietal region with underlying minimally depressed right parietal skull fracture."
MedLEE structured text: <problem v = "hematoma" code = "UMLS:C0018944_hematoma"> <bodyloc v = "subcutaneous"><region v = "extracranial"></region></bodyloc> <certainty v = "high certainty"></certainty> <problemdescr v = "hyperdensity"></problemdescr> <region v = "region"><region v = "parietal"><region v = "right"></region></region></region> <code v = "UMLS:C0018944_hematoma"></code> <code v = "UMLS:C0520532_subcutaneous hematoma"></code> </problem> <problem v = "fracture" code = "UMLS:C0016658_fracture"> <bodyloc v = "skull" code = "UMLS:C0037303_bone structure of cranium"> <region v = "parietal"><region v = "right"></region></region> <code v = "UMLS:C0037303_bone structure of cranium"></code> </bodyloc> <certainty v = "high certainty"></certainty> <change v = "depressed"><degree v = "low degree"></degree></change> <code v = "UMLS:C0016658_fracture"></code> <code v = "UMLS:C0037304_skull fractures"></code> <code v = "UMLS:C0272451_fracture of parietal bone (disorder)"></code> </problem>
Filtered feature selection: hematoma, subcutaneous, C0018944 hematoma, C0520532 subcutaneous hematoma, fracture, skull, C0037303 bone structure of cranium, C0016658 fracture, C0037304 skull fractures, C0272451 fracture of parietal bone (disorder)
  • 59. Decision Tree for Orbital Dataset using NLP All Features 59
  • 60. Decision Tree for Orbital Dataset using NLP Filtered Features 60
  • 61. Raw Text vs NLP Features 61 (Precision / Recall / F-score)
Orbital: Baseline 76.57 / 87.50 / 81.67; DT-Text 93.64 / 94.25 / 93.94; DT-NLP-All 96.28 / 96.33 / 96.30; DT-NLP-Filtered 96.53 / 96.59 / 96.56; SVM-Text 94.28 / 94.28 / 94.28; SVM-NLP-All 96.13 / 96.14 / 96.14; SVM-NLP-Filtered 96.96 / 97.00 / 96.98
Pediatric: Baseline 86.12 / 92.80 / 89.34; DT-Text 94.13 / 94.25 / 94.19; DT-NLP-All 95.21 / 95.46 / 95.34; DT-NLP-Filtered 96.63 / 96.80 / 96.55; SVM-Text 95.88 / 95.90 / 95.88; SVM-NLP-All 96.74 / 96.90 / 96.73; SVM-NLP-Filtered 97.13 / 97.25 / 97.10
  • 62. Raw Text vs NLP-based Classification 62 Orbital Dataset Pediatric Dataset
  • 63. Classification Errors
Orbital dataset - classification errors (combination of training and test sets; 102 errors out of 3,710 reports, 2.7%): nonorbital fracture 32 (31.4%); final reading disagrees with preliminary reading 19 (18.6%); vague certainty 9 (8.8%); fracture acuity 9 (8.8%); recent facial fracture surgery 6 (5.9%); MedLEE miscoding 5 (4.9%); other (dictation error, filtering error, fracture implied but not stated, miscellaneous poor wording) 22 (21.6%)
Pediatric dataset - misclassification categorization (from both test and training sets): false negatives (from 1,829 coded negative) 7 (0.4%), all decision tree misclassification; false positives (from 292 coded positive) 147 (50.3%): abnormal but not PECARN TBI 53 (36.1%), report ambiguity 12 (8.2%), report dictation error 6 (4.1%), text conversion error 3 (2.0%), MedLEE misread 27 (18.4%), decision tree misclassification 46 (31.3%)
  • 65. Topic Modeling-based Classification ¡ Pipeline: Patient Reports + Labels → Preprocess → Topic Modeling → Topic Vectors → BTC / CTC / STC / ATC / DT / SVM
  • 66. Topic Vectors 66 ¡ Compact representation of documents via topics ¡ Each document is represented by a vector of its topic proportions, e.g., for k = 3: d1 = (0.2, 0.5, 0.3), d2 = (0.3, 0.1, 0.6) ¡ Number of topics k needs to be determined empirically ¡ Total number of attributes for the orbital and pediatric datasets: ~1,300-1,500 (6K to 9K without preprocessing) ¡ Number of topics: 5-150 ¡ Dimension reduction (%) = (Σ attributes − Σ topics) / Σ attributes ¡ Dimension reduction achieved: orbital dataset 88.4%-99.6%, pediatric dataset 90.0%-99.7%
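A tiny illustration of topic vectors and the dimension-reduction ratio above; the attribute and topic counts are round numbers, not the exact dataset figures.

```python
import numpy as np

d1 = np.array([0.2, 0.5, 0.3])    # k = 3 topic proportions for one document
d2 = np.array([0.3, 0.1, 0.6])

def dimension_reduction(n_attributes, n_topics):
    return (n_attributes - n_topics) / n_attributes

print(f"{dimension_reduction(1400, 5):.1%}")   # ~99.6% with ~1,400 BoW attributes and 5 topics
```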
  • 67. Baseline Topic Classifier (BTC) 67 ¡ Topic model is built using K = |C|, where |C| is the total number of classes ¡ The topic with the higher probability is assigned as the predicted class, e.g., d1 = (0.2, 0.8) ⇒ Positive, d2 = (0.7, 0.3) ⇒ Negative ¡ Top topic words (orbital): topic 0 - acute, report, axial, facial, findings; topic 1 - left, right, maxillary, fracture, sinus ¡ Top topic words (pediatric): topic 0 - contrast, head, report, evidence, intracranial; topic 1 - fracture, findings, tissue, soft, impression
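A minimal sketch of the BTC decision rule; the mapping from topic index to class label is an assumption of this sketch that would have to be established from the training data.

```python
import numpy as np

def btc_predict(topic_vector, topic_to_class):
    return topic_to_class[int(np.argmax(topic_vector))]    # higher-probability topic wins

topic_to_class = {0: "negative", 1: "positive"}             # assumed alignment of topics to classes
print(btc_predict(np.array([0.2, 0.8]), topic_to_class))    # -> positive
print(btc_predict(np.array([0.7, 0.3]), topic_to_class))    # -> negative
```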
  • 68. BTC Results 68 ¡ Original, undersampled, and oversampled datasets (Precision / Recall / F-score)
Original: Orbital ZR 76.6 / 87.5 / 81.7, BTC 88.6 / 73.4 / 80.3; Pediatric ZR 86.1 / 92.8 / 89.3, BTC 83.3 / 59.4 / 69.3
Undersampled: Orbital ZR 49.6 / 49.7 / 49.7, BTC 84.4 / 84.2 / 84.3; Pediatric ZR 25.3 / 50.3 / 33.7, BTC 72.6 / 64.6 / 68.4
Oversampled: Orbital ZR 26.2 / 51.1 / 34.6, BTC 83.4 / 82.5 / 82.9; Pediatric ZR 26.5 / 51.5 / 35.0, BTC 73.3 / 66.7 / 69.8
  • 69. Topic Vector Classifier ¡ Train topic model ¡ Merge the documents in topic vector representation with their classes ¡ Build classifiers using decision tree and SVM
  • 70. Topic Vector Classifier 70 (top 3 configurations; K = number of topics, test set %, Precision / Recall / F-score)
Orbital DT: (1) K=50, 34%: 95.38 / 95.31 / 95.35; (2) K=25, 75%: 95.00 / 95.07 / 95.03; (3) K=75, 34%: 95.07 / 95.00 / 95.03
Orbital SVM: (1) K=150, 25%: 96.25 / 96.33 / 96.27; (2) K=150, 66%: 96.08 / 96.16 / 96.11; (3) K=150, 50%: 96.07 / 96.17 / 96.10
Pediatric DT: (1) K=15, 25%: 95.79 / 96.05 / 95.87; (2) K=15, 50%: 95.50 / 95.58 / 95.54; (3) K=15, 66%: 95.53 / 95.51 / 95.52
Pediatric SVM: (1) K=15, 75%: 96.11 / 96.16 / 96.13; (2) K=15, 50%: 96.00 / 96.23 / 96.06; (3) K=15, 66%: 96.00 / 96.23 / 96.06
  • 71. Confidence-based Topic Classifier (CTC) ¡ Training: train topic model, merge the documents in topic vector representation with their classes, for each topic T and class C calculate the confidence conf(T ⇒ C), select the topic t with the biggest confidence for the positive class, and pick a threshold th on the selected topic t ¡ Testing: infer the topic distribution, find the selected topic t's value v, and predict positive if v > th, negative otherwise ¡ sup(T) = N_T / N, conf(T ⇒ C) = sup(T ∪ C) / sup(T)
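A sketch of the confidence computation used to select the decision topic in CTC; treating a document as supporting a topic when that topic is its dominant one is an assumption of this sketch, and the arrays are toy values.

```python
import numpy as np

def confidence(doc_topics, labels, topic, positive=1):
    """conf(T => C) = sup(T and C) / sup(T)."""
    supports_T = doc_topics.argmax(axis=1) == topic        # documents dominated by topic T
    sup_T = supports_T.mean()
    sup_TC = (supports_T & (labels == positive)).mean()
    return sup_TC / sup_T if sup_T > 0 else 0.0

doc_topics = np.array([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])
labels = np.array([1, 0, 0, 0])
print(confidence(doc_topics, labels, topic=0))             # confidence of topic 0 for the positive class
```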
  • 72. Similarity-based Topic Classifier (STC) ¡ Training: train topic model, merge the documents in topic vector representation with their classes, and for each class calculate the average of the topic distributions ¡ Testing: infer the topic distribution, compute the similarity to each class average, and predict positive if it is more similar to the positive class, negative otherwise ¡ Cosine similarity: cos(θ) = (x · y) / (‖x‖ ‖y‖)
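A minimal sketch of the STC decision rule with cosine similarity; the class-average topic vectors are toy values.

```python
import numpy as np

def cosine(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def stc_predict(doc_topics, class_means):
    sims = {c: cosine(doc_topics, m) for c, m in class_means.items()}
    return max(sims, key=sims.get)                          # most similar class average wins

class_means = {"positive": np.array([0.7, 0.3]), "negative": np.array([0.2, 0.8])}
print(stc_predict(np.array([0.6, 0.4]), class_means))       # -> positive
```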
  • 73. Aggregate Topic Classifier (ATC) ¡ Training: train topic model, merge the documents in topic vector representation with their classes, for each class calculate the average of the topic distributions, select the topic t with the maximum difference between classes, and pick a threshold th on the selected topic t ¡ Testing: infer the topic distribution, find the selected topic t's value v, and predict positive if v > th, negative otherwise
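A sketch of ATC's training and prediction steps; using the midpoint of the two class means as the threshold is an illustrative choice, not necessarily how the threshold was picked in the dissertation, and the arrays are toy values.

```python
import numpy as np

def atc_fit(doc_topics, labels, positive=1):
    pos_mean = doc_topics[labels == positive].mean(axis=0)
    neg_mean = doc_topics[labels != positive].mean(axis=0)
    t = int(np.argmax(np.abs(pos_mean - neg_mean)))        # most discriminative topic
    th = (pos_mean[t] + neg_mean[t]) / 2                   # illustrative threshold choice
    return t, th, pos_mean[t] > neg_mean[t]

def atc_predict(doc_topic, t, th, positive_above):
    return 1 if (doc_topic[t] > th) == positive_above else 0

doc_topics = np.array([[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]])
labels = np.array([1, 1, 0, 0])
t, th, pos_above = atc_fit(doc_topics, labels)
print(atc_predict(np.array([0.75, 0.25]), t, th, pos_above))   # -> 1 (positive)
```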
  • 74. CTC, STC, ATC Results 74 (top 3 configurations; K = number of topics, test set %, Precision / Recall / F-score)
Orbital CTC: (1) K=75, 50%: 93.65 / 93.56 / 93.61; (2) K=50, 50%: 93.86 / 92.86 / 93.20; (3) K=30, 34%: 93.86 / 92.77 / 93.14
Orbital STC: (1) K=75, 34%: 95.72 / 95.39 / 95.51; (2) K=75, 66%: 95.69 / 95.22 / 95.37; (3) K=15, 66%: 95.93 / 95.09 / 95.33
Orbital ATC: (1) K=15, 66%: 96.07 / 96.03 / 96.05; (2) K=20, 66%: 96.07 / 95.95 / 95.97; (3) K=15, 75%: 95.47 / 95.43 / 95.45
Pediatric CTC: (1) K=150, 66%: 93.84 / 94.29 / 94.01; (2) K=75, 25%: 93.69 / 93.40 / 93.53; (3) K=75, 34%: 94.07 / 93.07 / 93.48
Pediatric STC: (1) K=150, 25%: 96.58 / 93.96 / 94.79; (2) K=75, 34%: 95.88 / 93.62 / 94.34; (3) K=30, 34%: 95.65 / 93.34 / 94.09
Pediatric ATC: (1) K=20, 34%: 96.16 / 96.26 / 96.20; (2) K=15, 34%: 96.18 / 95.70 / 95.88; (3) K=20, 25%: 96.14 / 95.66 / 95.81
  • 75. Topic Modeling-based Classifiers' Results 75 ¡ F-score (0.78-0.96) vs. number of topics K (0-150) for SVM, ATC, STC, CTC, and DT; orbital dataset at 25% test ratio and pediatric dataset at 34% test ratio
  • 76. Overall Classification Performances (Precision / Recall / F-score)
Baseline ZR: Orbital 76.6 / 87.5 / 81.7; Pediatric 86.1 / 92.8 / 89.3
Baseline BTC: Orbital 88.6 / 73.4 / 80.3; Pediatric 83.3 / 59.4 / 69.3
Raw Text DT: Orbital 93.6 / 94.3 / 93.9; Pediatric 94.1 / 94.3 / 94.2
Raw Text SVM: Orbital 94.3 / 94.3 / 94.3; Pediatric 95.9 / 95.9 / 95.9
NLP-based DT-All: Orbital 96.3 / 96.3 / 96.3; Pediatric 95.2 / 95.5 / 95.3
NLP-based DT-Filtered: Orbital 96.5 / 96.6 / 96.6; Pediatric 95.5 / 95.8 / 95.7
NLP-based SVM-All: Orbital 96.1 / 96.1 / 96.1; Pediatric 96.7 / 96.9 / 96.8
NLP-based SVM-Filtered: Orbital 97.0 / 97.0 / 97.0; Pediatric 97.1 / 97.3 / 97.2
TM-based DT: Orbital 95.4 / 95.3 / 95.4; Pediatric 95.8 / 96.1 / 95.9
TM-based SVM: Orbital 96.3 / 96.3 / 96.3; Pediatric 96.1 / 96.2 / 96.1
TM-based CTC: Orbital 93.7 / 93.6 / 93.6; Pediatric 93.8 / 94.3 / 94.0
TM-based STC: Orbital 95.7 / 95.4 / 95.5; Pediatric 96.6 / 94.0 / 94.8
TM-based ATC: Orbital 96.1 / 96.0 / 96.1; Pediatric 96.2 / 96.3 / 96.2
  • 77. Best Classification Performances 77 (Precision / Recall / F-score)
Orbital: Baseline ZR 76.6 / 87.5 / 81.7; Raw Text SVM 94.3 / 94.3 / 94.3; NLP-based SVM-Filtered 97.0 / 97.0 / 97.0; TM-based SVM 96.3 / 96.3 / 96.3
Pediatric: Baseline ZR 86.1 / 92.8 / 89.3; Raw Text SVM 95.9 / 95.9 / 95.9; NLP-based SVM-Filtered 97.1 / 97.3 / 97.1; TM-based ATC 96.2 / 96.3 / 96.2
  • 78. Discussion 78 ¡ NLP-based classification approaches ¡ Best classification performance among all classifiers ¡ Needs more customizations ¡ Topic modeling-based classification approaches ¡ Provides dimension reduction ¡ Performs better than raw text classification ¡ Competitive with NLP-based classification ¡ More general framework than NLP-based classifiers
  • 80. Summary 80 ¡ Large amounts of electronic clinical data have become available with the increased use of Electronic Health Records (EHR) ¡ Automated processing of these records could benefit both the patient and the provider ¡ Speeding up the decision process and reducing costs ¡ Classifiers that can automatically predict the outcome from raw text clinical reports ¡ Raw text classification of clinical reports ¡ NLP-based classification of clinical reports ¡ Topic modeling-based classification of clinical reports
  • 81. Significance and Contributions 81 ¡ Addressing issues specific to automated processing of clinical text ¡ Unstructured data ¡ Medical terms ¡ Context sensitivity ¡ Real world dataset ¡ Natural Language Processing-based classification of clinical reports ¡ Selecting and adjusting a biomedical NLP tool ¡ Best ways to extract NLP features for better classification ¡ Topic Modeling-based classification of clinical reports ¡ A more general framework than NLP-based solutions ¡ Utilizing an unsupervised technique, i.e. topic modeling, in a supervised fashion, i.e. classification ¡ Topic modeling-based classifiers: BTC, CTC, STC, and ATC
  • 82. Research Impacts 82 ¡ Improve quality and efficiency of healthcare ¡ The classifiers can be used to automatically predict the conditions in a clinical report ¡ Can replace the manual review of clinical reports, which can be time consuming and error-prone ¡ Clinicians can have more confidence in utilizing such systems in real life settings ¡ Increased accuracy and interpretability
  • 83. 83 References 1. K. Yadav, C. E, H. JS, A. Z, N. V, and G. P. et al., “Derivation of a clinical risk score for traumatic orbital fracture,” 2012, in Press. 2. V. Garla, V. L. R. III, Z. Dorey-Stein, F. Kidwai, M. Scotch, J. Womack, A. Justice, and C. Brandt, “The Yale cTAKES extensions for document classification: architecture and application.” JAMIA, 2011 3. C. Friedman, “A broad-coverage natural language processing system,” Proc AMIA Symp, pp. 270–274, 2000. 4. T. Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features,” in Proceedings of the 10th European Conference on Machine Learning, ser. ECML ’98., pp. 137–142. 5. F. Sebastiani, “Machine learning in automated text categorization,” ACM Comput. Surv., vol. 34, no. 1, pp. 1–47, 2002 6. Aronson, A. R. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp (2001), 17–21. 7. Aronson, A. R., and Lang, F.-M. An overview of metamap: historical perspective and recent advances. Journal of the American Medical Informatics Association 17, (2010). 8. S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, vol. 41, no. 6, pp. 391–407, 1990. 9. T. Hofmann, “Probabilistic latent semantic analysis,” in UAI, 1999. 10. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, Mar. 2003.
  • 84. 84 References – Continued 11. Z. Zhang, X.-H. Phan, and S. Horiguchi, “An Efficient Feature Selection Using Hidden Topic in Text Categorization.” AINAW ’08. 12. W. Sriurai, “Improving Text Categorization by Using a Topic Model,” Advanced Computing: An International Journal (ACIJ), 2011. 13. E. Chen, Y. Lin, H. Xiong, Q. Luo, and H. Ma, “Exploiting probabilistic topic models to improve text categorization under class imbalance,” Inf. Process. Manage., 2011. 14. S. Banerjee, “Improving text classification accuracy using topic modeling over an additional corpus,”, SIGIR ’08. 15. Arnold, C. W., El-Saden, S. M., Bui, A. A. T., and Taira, R. Clinical Case-based Retrieval Using Latent Topic Analysis. AMIA Annu Symp Proc 2010 (2010), 26–30. 16. H. M. Wallach, “Topic modeling: beyond bag-of-words.”, ICML ’06. 17. Griffiths, T. L., Steyvers, M., Blei, D. M., and Tenenbaum, J. B. Integrating topics and syntax. In In Advances in Neural Information Processing Systems 17 (2005), MIT Press, pp. 537–544. 18. Boyd-Graber, J. L., and Blei, D. M. Syntactic topic models. CoRR abs/1002.4665 (2010). 19. N. Kuppermann, J. F. Holmes, P. S. Dayan, J. D. J. Hoyle, S. M. Atabaki, R. Holubkov, F. M. Nadel, D. Monroe, R. M. Stanley, D. A. Borgialli, M. K. Badawy, J. E. Schunk, K. S. Quayle, P. Mahajan, R. Lichenstein, K. A. Lillis, M. G. Tunik, E. S. Jacobs, J. M. Callahan, M. H. Gorelick, T. F. Glass, L. K. Lee, M. C. Bachman, A. Cooper, E. C. Powell, M. J. Gerardi, K. A. Melville, J. P. Muizelaar, D. H. Wisner, S. J. Zuspan, J. M. Dean, and S. L. Wootton-Gorges, “Identification of children at very low risk of clinically-important brain injuries after head trauma: a prospective cohort study.” Lancet, vol. 374, no. 9696, pp. 1160–1170. 20. T. N. Rubin, A. Chambers, P. Smyth, and M. Steyvers, “Statistical topic models for multi-label document classification,” Mach. Learn., vol. 88, no. 1-2, pp. 157–208
  • 85. 85 References – Continued 21. Z. Liu, M. Li, Y. Liu, and M. Ponraj, “Performance evaluation of Latent Dirichlet Allocation in text mining,” in Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on, vol. 4, pp. 2695 –2698. 22. W. W. Chapman, D. Hillert, S. Velupillai, M. Kvist, M. Skeppstedt, B. E. Chapman, M. Conway, M. Tharp, D. L. Mowery, and L. Deleger, “Extending the negex lexicon for multiple languages.” Stud Health Technol Inform, vol. 192, pp. 677– 681, 2013. 23. E. A. Mendonca, J. Haas, L. Shagina, E. Larson, and C. Friedman, “Extracting information on pneumonia in infants using natural language processing of radiology reports.” J Biomed Inform, vol. 38, no. 4, pp. 314–321. 24. H. Harkema, J. N. Dowling, T. Thornblade, and W. W. Chapman, “Context: an algorithm for determining negation, experiencer, and temporal status from clinical reports.” J Biomed Inform, vol. 42, no. 5, pp. 839–851. 25. H. Misra, F. Yvon, J. M. Jose, and O. Cappe, “Text segmentation via topic modeling: an analytical study,” in Proceedings of the 18th ACM conference on Information and knowledge management, ser. CIKM ’09, pp. 1553– 1556. 26. J.-H. Yeh and C.-H. Chen, “Protein remote homology detection based on latent topic vector model,” in Networking and Information Technology (ICNIT), 2010 International Conference on, pp. 456 –460. 27. J. A. Womack, M. Scotch, C. Gibert, W. Chapman, M. Yin, A. C. Justice, and C. Brandt, “A comparison of two approaches to text processing: facilitating chart reviews of radiology reports in electronic medical records.” Perspect Health Inf Manag,vol. 7, p. 1a, 2010. 28. O. Bodenreider, “Using UMLS semantics for classification purposes.” Proc AMIA Symp, pp. 86–90, 2000. 29. F.-F. Li and P. Perona, “A bayesian hierarchical model for learning natural scene categories,” in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Volume 2. CVPR ’05. 2005, pp. 524–531. 30. G. Hripcsak, J. H. M. Austin, P. O. Alderson, and C. Friedman, “Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports.” Radiology, vol. 224, no. 1, pp. 157–163.
  • 86. References 86 31. Y. Huang, H. J. Lowe, D. Klein, and R. J. Cucina, “Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon.” J Am Med Inform Assoc, vol. 12, no. 3, pp. 275–285, May-Jun 2005. 32. V. N. Vapnik, Statistical learning theory, 1st ed. Wiley, Sep. 1998. 33. J. H. Friedman, “Another approach to polychotomous classification,” Department of Statistics, Stanford University, Tech. Rep., 1996. 34. T. Hastie and R. Tibshirani, “Classification by Pairwise Coupling,” 1998. 35. Stanford NLP: http://nlp.stanford.edu/software/index.shtml 36. Open NLP: http://opennlp.apache.org