SlideShare a Scribd company logo
1 of 57
Download to read offline
Clinical data successes
using machine learning
A survey by Joseph Paul Cohen, PhD
Montreal Institute for Learning Algorithms
Topics:
1. Medical Concept Representation
2. Clinical Event Prediction
Tutorial
Where is the Deep Learning research?
[Shickel, Deep EHR : A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record, 2018]
(~word2vec)
(word2vec)
Topic of today!
Concept Representation
Mr. Smith is a 63-year-old gentleman
with coronary artery disease,
hypertension, hypercholesterolemia,
COPD and tobacco abuse. He reports
doing well. He did have some more knee
pain for a few weeks, but this has
resolved. He is having more trouble
with his sinuses. I had started him on
Flonase back in December. He says this
has not really helped. Over the past
couple weeks he has had significant
congestion and thick discharge. No
fevers or headaches but does have
diffuse upper right-sided teeth pain.
He denies any chest pains,
palpitations, PND, orthopnea, edema or
syncope. His breathing is doing fine.
No cough. He continues to smoke about
half-a-pack per day. He plans ontrying
the patches again.
Clinical Note Clinical Publication
Representations for:
Patient
Doctor
Visit
Disease
Drug
Symptom
Helpful to understand
similarity or make
semi-supervised
predictions!
Word Embeddings for Biomedical Language
Extract relationships
between words and
produce a latent
representation
Word Embeddings for Biomedical Language
What to do with word embeddings?
● We can compose them to create paragraph embeddings (bag of embeddings).
● Use in place of words for an RNNs (More on this later!)
● Augment learned representations on small datasets
● `
[Cultural Shift or Linguistic Drift, Hamilton, 2016]
Study how the meaning between two
texts varies (or hospitals, or doctors)?
[Pennington, 2014][Mikolov, 2013]
Study the compositionality of the
learned latent space
Token representations
One-hot encoding: binary vector per token
Example:
cat = [0 0 1 0 0 0 0 0 0 0 0 0 0 0 … 0]
dog = [0 0 0 0 1 0 0 0 0 0 0 0 0 0 … 0]
house = [1 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0]
Note!
If x is one hot
The dot product of Wx
= a Single column of W M x N N x 1
=
M x 1
word2vec
Target wordContext wordContext word Context word
involving respiratory system and other chest symptoms
Context word
involving
respiratory
doctor
chest
Mikolov, Efficient Estimation of Word Representations in Vector Space, 2013
1. Each word is a training example
2. Each word is used in many contexts
3. The context defines each word
system
1
1
0
1
0
0
0
0
0
1
-1
5.1
involving
respiratory
doctor
chest
system
Context window = 2
Learning in progress
king + (woman - man) = ?
The point that is
closest is queen!
Follow along online!
https://colab.research.google.com/drive/1g4zvEg921sLQK-VsBk5mMb2-h4goCGyd
window_size = 2
idx_pairs = []
for sentence in tokenized_corpus:
indices = [word2idx[word] for word in sentence]
for center_word_pos in range(len(indices)):
for w in range(-window_size, window_size + 1):
context_word_pos = center_word_pos + w
if context_word_pos < 0 or
context_word_pos >= len(indices) or
center_word_pos == context_word_pos:
Continue
context_word_idx = indices[context_word_pos]
idx_pairs.append((indices[center_word_pos], context_word_idx))
Code by github user: mbednarski
Output:
[[16, 12],
[16, 1],
...
[ 1, 16],
[ 1, 12]]
for each word, treated as center word
center_word_pos = 0, 1, 2, ...
for each window position
w = -2, -1, 0, 1, 2
make sure not to jump out of the sentence
sentence = ['paris', 'france', 'capital']
indices = [2, 5, 11]
class SkipGram(nn.Module):
def __init__(self, vocab_size, embd_size):
super(SkipGram, self).__init__()
self.W1 = Variable(torch.randn(embd_size, vocab_size).float(), requires_grad=True)
self.W2 = Variable(torch.randn(vocab_size, embd_size).float(), requires_grad=True)
def forward(self, focus):
z1 = torch.matmul(self.W1, focus)
z2 = torch.matmul(self.W2, z1)
softmax = F.log_softmax(z2, dim=0)
return softmax
model = SkipGram(vocabulary_size, 2)
Initialize two matrices
|E| x |V|
|V| x |E|
Encoder:
|E| x |V| dot |V| x 1 = |E| x 1
Softmax over context
prediction
Decoder:
|V| x |E| dot |E| x 1 = |V| x 1
Code by github user: mbednarski
A note on the softmax function
To predict multiple classes we project to a probability distribution
word2
word1
word3
3.2
5.1
-1.7
24.5
164.0
0.18
exp normalize
0.13
0.87
0.0
Simplex
Because it is on a simplex; the correction of one term impacts all
Tumor
word3
word2word1 word3
Image credits: http://gureckislab.org/
word1
word2
word2
word1
word3
0.13
0.87
0.0
x
Infinite ways to generate the same
output.
A correction of one sends gradients
to others
We can learn unseen classes
through a process of elimination.
https://github.com/ieee8023/NeuralNetwork-Examples/blob/master/general/simplex-softmax.ipynb
word2
word3word1
Word1
Word3
Word2
Softmax and Cross-entropy loss
To predict multiple class we can project the output onto a simplex and
compute the loss there.
word2
word1
word3
3.2
5.1
-1.7
24.5
164.0
0.18
exp normalize
0.13
0.87
0.0
loss
0.0
0.13
0.0
Open Access Subset of PubMedCentral
Subset of files that are available open access
● Papers available in PDF or XML
● 1.25 million biomedical articles and 2 million distinct words
● Available over FTP for bulk download (CompSci Friendly!)
● Metadata includes journal name and year
https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/
Breast Cancer Res.
Genome Biol.
Arthritis Res.
BMC Cell Biol.
…
Journal names
Example XML data
Word Embeddings for Biomedical Language
Wang, A Comparison of Word Embeddings for the Biomedical Natural Language Processing, 2018
Word Embeddings for Biomedical Language
(Mayo Clinic)
(c) Wikipedia + Gigaword
Representations are
biased by the data.
We can use this to our
advantage to control the
domain.
(Mayo Clinic)
Wikipedia +
Gigaword
Specific General
Example medical words and their five post similar words based on each training corpus of text
The full name of
diabetes is "diabetes
mellitus"
Articles say diabetics
are at increased risk of
hypertension (high
blood pressure)
A type of peptic ulcer
One symptom is
an ulcer
Colon cancer is
associated with
breast cancer!
Indiana University Hospital Reports
Chest X-ray images from the Indiana University hospital network
1000 reports available in XML format!
['heart size normal lungs ...',
'the heart size and ...',
'the heart is normal in ...',
'the lungs are clear the ...',
'heart size normal lungs ...',
'heart is mildly heart ...',
'the lungs are clear there ...',
'cardiac and mediastinal ...',
…
def clean(s):
for c in [".",",",":",";",""","/","[","]","<",">","?"]:
s = s.replace(c, " ").lower()
return s
corpus = []
for f in os.listdir("ecgen-radiology"):
tree = xml.etree.ElementTree.parse("ecgen-radiology/" + f)
root = tree.getroot()
node = root.findall(".//AbstractText/[@Label='FINDINGS']")[0]
corpus.append(clean(str(text)))
<Abstract>
<AbstractText Label="COMPARISON">None.</AbstractText>
<AbstractText Label="INDICATION">Positive TB test</AbstractText>
<AbstractText Label="FINDINGS">The cardiac silhouette and mediastinum size
are within normal limits. There is no pulmonary edema. There is no focal
consolidation. There are no XXXX of a pleural effusion. There is no evidence of
pneumothorax.</AbstractText>
<AbstractText Label="IMPRESSION">Normal chest x-XXXX.</AbstractText>
</Abstract>
Python code to parse the XML
Output!:
Input:
Hyperparameters!
Depending on the configuration of the model the embeddings can vary:
No information Compression
We can vary: the dimension of the embedding, learning rate, token words, window size, etc..
Size of embedding space matters
Dim = 2 Dim = 100
*via t-sne
Distance Distance
Time series medical records
Tasks:
● Predict probability of event in future
● Predict duration until next visit
● Find similar patients/events/visits
We need to define
what events are!
1893 - Causes of Death (International Statistical Institute)
1975 - ICD-9 - (WHO)
1990 - ICD-10 - (WHO)
2022 - ICD-11 - (WHO)
ICD (International Classification of Diseases)
ICD is the foundation for the identification of health trends and statistics globally, and the
international standard for reporting diseases and health conditions. It is the diagnostic
classification standard for all clinical and research purposes. ICD defines the universe of
diseases, disorders, injuries and other related health conditions, listed in a comprehensive,
hierarchical fashion (http://www.who.int/)
There are alternative
standards but they can
require a fee
ICD (International Classification of Diseases)
Example ICD-9 codes:
786 Symptoms involving respiratory system and other chest symptoms
786.0 Dyspnea and respiratory abnormalities
786.1 Stridor
786.2 Cough
786.3 Hemoptysis
786.4 Abnormal sputum
786.5 Chest pain
786.6 Swelling, mass or lump in chest
786.7 Abnormal chest sounds
786.8 Hiccough
786.9 Other
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/ICD9-CM/2011
ICD (International Classification of Diseases)
Example ICD-9 codes (786.5)
Cardialgia (see also Pain, precordial) 786.51
Diaphragmalgia 786.52
chest 786.59
anginoid (see also Pain, precordial) 786.51
chest (central) 786.50
atypical 786.59
midsternal 786.51
musculoskeletal 786.59
noncardiac 786.59
substernal 786.51
wall (anterior) 786.52
costochondral 786.52
diaphragm 786.52
heart (see also Pain, precordial) 786.51
intercostal 786.59
over heart (see also Pain, precordial) 786.51
pericardial (see also Pain, precordial) 786.51
pleura, pleural, pleuritic 786.52
precordial (region) 786.51
respiration 786.52
retrosternal 786.51
rib 786.50
substernal 786.51
respiration 786.52
Pleuralgia 786.52
Pleurodynia 786.52
Precordial pain 786.51
chest 786.59
Prinzmetal-Massumi syndrome (anterior chest wall) 786.52
painful 786.52
Lots of grouping!
ICD-10 is very detailed
V97.33XD: Sucked into jet engine, subsequent encounter
V00.15: Heelies Accident
Applicable To Rolling shoe, Wheeled shoe, Wheelies accident.
Supertypes:
V00-Y99 External causes of morbidity
V95-V97 Air and space transport accidents
V00 Pedestrian conveyance accident
V00.1 Rolling-type pedestrian conveyance accident
Time series medical records
Sequence of medical codes over time
787.2
787.2
358.2 682.1
Tasks:
● Predict probability of code in future
● Predict duration until next visit
● Find similar patients/codes/visits
MLPs on time series data
787.2 682.1
Before
After
-5 years +1 year
0
1
0
0
0
1
0
0
0
.
.
.
787.2
358.2
MLP 1 682.1
787.2
358.2
Multi-hot
vectors!
Med2Vec
Word2Vec for time series patient
visits with ICD codes.
Embeddings learned for codes
and demographics.
Visits
(ICD Codes)
Visits +
Demographics
ICD Codes over time
Choi, Multi-layer Representation Learning for Medical Concepts, 2016
Visit embedding
Visit embedding
conditioned on
demographics
Baseline methods
One-hot: In order to compare with the raw input data, the binary vector for the visit is used.
Stacked autoencoder (SA): Using the binary vector concatenated with patient demographic
information as the input the SA is trained to minimize the reconstruction error. Then will be used to
generate visit representations.
Sum of Skip-gram vectors (word2vec): First learn the code-level representations with Skip-gram only.
Then for the visit-level representation, add the representations of the codes within the visit.
Choi, Multi-layer Representation Learning for Medical Concepts, 2016
Med2Vec
Evaluation: Predicting codes of the next visit
Choi, Multi-layer Representation Learning for Medical Concepts, 2016
Private):
Dataset CHOA
Example clinical note
Mr. Smith is a 63-year-old gentleman with coronary artery
disease, hypertension, hypercholesterolemia, COPD and
tobacco abuse. He reports doing well. He did have some more
knee pain for a few weeks, but this has resolved. He is
having more trouble with his sinuses. I had started him on
Flonase back in December. He says this has not really
helped. Over the past couple weeks he has had significant
congestion and thick discharge. No fevers or headaches but
does have diffuse upper right-sided teeth pain. He denies
any chest pains, palpitations, PND, orthopnea, edema or
syncope. His breathing is doing fine. No cough. He continues
to smoke about half-a-pack per day. He plans on trying the
patches again.
● Notes are still written together with codes selected for each visit.
● Often explaining details which do not have equivalent codes.
● Natural language is very difficult to parse with certainty.
Predicting codes from notes
Converting text to codes can
● Adapt old databases
● Correct errors
● Upgrade ICD versions
Jagannatha, Bidirectional RNN for Medical Event Detection in Electronic Health Records, 2016
... with upset stomach
<done>
<ICD-9 787.0
Nausea and vomiting>
RNNs, Different types of sequential prediction tasks
Input
Output
State
one to one one to many many to one many to many many to many
Taken from http://vision.stanford.edu/teaching/cs231n/slides/2016/winter1516_lecture10.pdf and Francis Dutil
Cat "This is a cat"
“It's hairy and I'm
allergic to it”
Cat
“Ceci est un chat”
“This is a cat” “Meow Meow”
RNNs
An RNN applies a function over a sequence of inputs [x1
, x2
, …, xT
]
which produces a sequence of outputs [y1
, y2
, …, yT
]
and each input produces a internal state [h1
, h2
, …, hT
].
Sequence of outputs
Sequence of internal states
Sequence of inputs
xt
yt
ht
Image du blog de Christopher Olah, slide from l’École d’automne 2018, César Laurent
RNNs
Image du blog de Christopher Olah, slide from l’École d’automne 2018, César Laurent
● A simple RNN
● W, U and V are the parameters of
the network.
● The weights are shared over time.
xt
yt
ht
U
V
W
Unrolling the RNN over time
The weights are shared over time.
x0xt
x1
x2
xT
y2
yT
y1
y0yt
h2
hT
h1
h0ht
U U U U
W W W W
V
V V V V
U
Image du blog de Christopher Olah, slide from l’École d’automne 2018, César Laurent
Unrolling the RNN over time
x0xt
x1
x2
xT
y2
yTyt
h2
hT
h1
h0ht
U U U U
W
V
V V V V
U
The weights are shared over time.
Image du blog de Christopher Olah
W W W
Unrolling the RNN over time
x0xt
x1
x2
xT
y2
yT
y1
y0yt
h2
hT
h1
h0ht
U U U U
W W W W
V
V V V V
U
The weights are shared over time.
Image du blog de Christopher Olah
Predicting future events
785.1
345.1
xt
782.2
358.2
682.1782.2787.2yt
h2
hT
h1
h0ht
U U U U
W W W W
V
V V V V
U
785.1 - 785.1
A multi-hot vector
Predicting next time
step
Vanishing gradients
Problem with the basic RNN
Slide from l’École d’automne 2018, César Laurent
The shade of gray shows the influence of the input of the RNN at time 1. It decreases over time,
as the RNN gradually forgets its first input.
Issue addressed by:
LSTM
GRU
Attention
Recurrent batch norm
Weight regularization
Layer norm
More reading: [Pascanu 2013]
We can get creative with RNN designs
Slide from l’École d’automne 2018, César Laurent
x0
x1
x2
z2
z1
z0
h2
2
h2
1
h2
0
x0
x1
x2
y2
y1
y0
h1
2
h1
1
h1
0
h0
h’0
h’1
h’2
h’i
hi
h2
h1
Stacked RNNs Bi-directional RNN
RNNs for next visit prediction (Doctor AI)
● Treat codes equally: ICD diagnosis
codes, procedure codes, and
medication codes
● Grouped codes into higher-order
categories
Medical codes R40000
y = High level medical codes R1778
d = time since last visit
Choi, Doctor AI: Predicting Clinical Event via Recurrent Neural Networks, 2016
Patients from Sutter Health Palo Alto Medical Foundation
Pretraining RNNs (Doctor AI)
MIMIC II has 2,695 patients with 2+ visits
Pretrained using a larger Sutter Health dataset
~300,000 patients
Choi, Doctor AI: Predicting Clinical Events via Recurrent Neural Networks, 2016
Related work
Lipton, Zachary C et al. “Learning to Diagnose with LSTM Recurrent Neural
Networks.” International Conference on Learning Representations. 2016
Che, Zhengping et al. “Recurrent Neural Networks for Multivariate Time Series
with Missing Values.” Nature Scientific Reports. 2018
Romanov, Lessons from Natural Language Inference in the Clinical Domain, 2018
Medical Natural Language Inference
MedNLI dataset derived from the MIMIC-III dataset
Romanov, Lessons from Natural Language Inference in the Clinical Domain, 2018
Medical Natural Language Inference
Studying the errors
we can see the
limits of the model.
Discussion
"While code-based representations of clinical
concepts and patient encounters are a tractable first
step towards working with heterogeneous EHR data,
they ignore many important real-valued
measurements associated with items such as
laboratory tests, intravenous medication infusions,
vital signs, and more." [Shickel et al,, 2018]
Discussion
"While some researchers downplay the importance
of interpretability in favor of significant
improvements in model performance, we feel
advances in deep learning transparency will hasten
the widespread adoption of such methods in clinical
practice." [Shickel et al, 2018]
Discussion
"Many studies claim state-of-the-art results, but
few can be verified by external parties. This is a
barrier for future model development and one
cause of the slow pace of advancement."
[Shickel et al, 2018]
Pr. Yoshua Bengio,
PhD
Francis Dutil Martin Weiss
Tristan Sylvain Margaux Luck,
PhD
Assya Trofimov Vincent Frappier,
PhD
Joseph Paul Cohen,
PhD
Shawn Tan
Sina Honari
Geneviève BoucherMandana Samiei Georgy Derevyanko,
PhD
Myriam Côté,
PhD
References
Ching, T., et al. Opportunities And Obstacles For Deep Learning In Biology
And Medicine. Journal of The Royal Society Interface. 2018
Shickel, B. et al. Deep EHR : A Survey of Recent Advances in Deep Learning
Techniques for Electronic Health Record. IEEE Journal of Biomedical and
Health Informatics, 2018

More Related Content

Similar to Clinical data successes using machine learning (word2vec, RNN)

web-application.pdf
web-application.pdfweb-application.pdf
web-application.pdfouiamouhdifa
 
How to manage your Experimental Protocol with Basic Statistics
How to manage your Experimental Protocol with Basic StatisticsHow to manage your Experimental Protocol with Basic Statistics
How to manage your Experimental Protocol with Basic StatisticsYoann Pageaud
 
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Mark Wilkinson
 
Perl cures coronary heart disease
Perl cures coronary heart diseasePerl cures coronary heart disease
Perl cures coronary heart diseaseBiogeeks
 
1 SDEV 460 – Homework 4 Input Validation and Busine
1  SDEV 460 – Homework 4 Input Validation and Busine1  SDEV 460 – Homework 4 Input Validation and Busine
1 SDEV 460 – Homework 4 Input Validation and BusineAbbyWhyte974
 
1 SDEV 460 – Homework 4 Input Validation and Busine
1  SDEV 460 – Homework 4 Input Validation and Busine1  SDEV 460 – Homework 4 Input Validation and Busine
1 SDEV 460 – Homework 4 Input Validation and BusineMartineMccracken314
 
Enhancing Health Risk Assessment:The Individual Exposure Health Risk Profile ...
Enhancing Health Risk Assessment:The Individual Exposure Health Risk Profile ...Enhancing Health Risk Assessment:The Individual Exposure Health Risk Profile ...
Enhancing Health Risk Assessment:The Individual Exposure Health Risk Profile ...Richard Hartman, Ph.D.
 
Alexander ch07 lecture
Alexander ch07 lectureAlexander ch07 lecture
Alexander ch07 lecturecorynava00
 
ValueChain_OganGurel
ValueChain_OganGurelValueChain_OganGurel
ValueChain_OganGurelOgan Gurel MD
 
B2.2 Inside Bacteria
B2.2 Inside BacteriaB2.2 Inside Bacteria
B2.2 Inside BacteriaMiss Lavin
 
Daniels and Worthingham's Muscle Testing Techniques of Manual Examination[001...
Daniels and Worthingham's Muscle Testing Techniques of Manual Examination[001...Daniels and Worthingham's Muscle Testing Techniques of Manual Examination[001...
Daniels and Worthingham's Muscle Testing Techniques of Manual Examination[001...DianayanethIllanosnu
 

Similar to Clinical data successes using machine learning (word2vec, RNN) (16)

web-application.pdf
web-application.pdfweb-application.pdf
web-application.pdf
 
How to manage your Experimental Protocol with Basic Statistics
How to manage your Experimental Protocol with Basic StatisticsHow to manage your Experimental Protocol with Basic Statistics
How to manage your Experimental Protocol with Basic Statistics
 
Slight change of plans!
Slight change of plans!Slight change of plans!
Slight change of plans!
 
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
 
Perl cures coronary heart disease
Perl cures coronary heart diseasePerl cures coronary heart disease
Perl cures coronary heart disease
 
1 SDEV 460 – Homework 4 Input Validation and Busine
1  SDEV 460 – Homework 4 Input Validation and Busine1  SDEV 460 – Homework 4 Input Validation and Busine
1 SDEV 460 – Homework 4 Input Validation and Busine
 
1 SDEV 460 – Homework 4 Input Validation and Busine
1  SDEV 460 – Homework 4 Input Validation and Busine1  SDEV 460 – Homework 4 Input Validation and Busine
1 SDEV 460 – Homework 4 Input Validation and Busine
 
2 UEDA.pdf
2 UEDA.pdf2 UEDA.pdf
2 UEDA.pdf
 
Enhancing Health Risk Assessment:The Individual Exposure Health Risk Profile ...
Enhancing Health Risk Assessment:The Individual Exposure Health Risk Profile ...Enhancing Health Risk Assessment:The Individual Exposure Health Risk Profile ...
Enhancing Health Risk Assessment:The Individual Exposure Health Risk Profile ...
 
06.09.26
06.09.2606.09.26
06.09.26
 
2 ueda
2 ueda2 ueda
2 ueda
 
Alexander ch07 lecture
Alexander ch07 lectureAlexander ch07 lecture
Alexander ch07 lecture
 
ValueChain_OganGurel
ValueChain_OganGurelValueChain_OganGurel
ValueChain_OganGurel
 
Lab 1 intro
Lab 1 introLab 1 intro
Lab 1 intro
 
B2.2 Inside Bacteria
B2.2 Inside BacteriaB2.2 Inside Bacteria
B2.2 Inside Bacteria
 
Daniels and Worthingham's Muscle Testing Techniques of Manual Examination[001...
Daniels and Worthingham's Muscle Testing Techniques of Manual Examination[001...Daniels and Worthingham's Muscle Testing Techniques of Manual Examination[001...
Daniels and Worthingham's Muscle Testing Techniques of Manual Examination[001...
 

Recently uploaded

Premium Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangalor...
Premium Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangalor...Premium Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangalor...
Premium Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangalor...Sheetaleventcompany
 
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur RajasthanJaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthanindiancallgirl4rent
 
ooty Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
ooty Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetooty Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
ooty Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetdhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Call Girl in Indore 8827247818 {Low Price}👉 Nitya Indore Call Girls * ITRG...
Call Girl in Indore 8827247818 {Low Price}👉   Nitya Indore Call Girls  * ITRG...Call Girl in Indore 8827247818 {Low Price}👉   Nitya Indore Call Girls  * ITRG...
Call Girl in Indore 8827247818 {Low Price}👉 Nitya Indore Call Girls * ITRG...mahaiklolahd
 
Erode Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Erode Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetErode Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Erode Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetsurat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Chandigarh
 
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in LahoreBest Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in LahoreDeny Daniel
 
💚 Punjabi Call Girls In Chandigarh 💯Lucky 🔝8868886958🔝Call Girl In Chandigarh
💚 Punjabi Call Girls In Chandigarh 💯Lucky 🔝8868886958🔝Call Girl In Chandigarh💚 Punjabi Call Girls In Chandigarh 💯Lucky 🔝8868886958🔝Call Girl In Chandigarh
💚 Punjabi Call Girls In Chandigarh 💯Lucky 🔝8868886958🔝Call Girl In ChandigarhSheetaleventcompany
 
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetbhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Escorts Service Ahmedabad🌹6367187148 🌹 No Need For Advance Payments
Escorts Service Ahmedabad🌹6367187148 🌹 No Need For Advance PaymentsEscorts Service Ahmedabad🌹6367187148 🌹 No Need For Advance Payments
Escorts Service Ahmedabad🌹6367187148 🌹 No Need For Advance PaymentsAhmedabad Call Girls
 
Sangli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Sangli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetSangli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Sangli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Malda Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Malda Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMalda Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Malda Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...mahaiklolahd
 
Kolkata Call Girls Miss Inaaya ❤️ at @30% discount Everyday Call girl
Kolkata Call Girls Miss Inaaya ❤️ at @30% discount Everyday Call girlKolkata Call Girls Miss Inaaya ❤️ at @30% discount Everyday Call girl
Kolkata Call Girls Miss Inaaya ❤️ at @30% discount Everyday Call girlonly4webmaster01
 
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
nagpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
nagpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetnagpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
nagpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
bhopal Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhopal Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetbhopal Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhopal Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetNanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetErnakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Chandigarh
 

Recently uploaded (20)

Premium Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangalor...
Premium Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangalor...Premium Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangalor...
Premium Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bangalor...
 
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur RajasthanJaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
 
ooty Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
ooty Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetooty Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
ooty Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetdhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girl in Indore 8827247818 {Low Price}👉 Nitya Indore Call Girls * ITRG...
Call Girl in Indore 8827247818 {Low Price}👉   Nitya Indore Call Girls  * ITRG...Call Girl in Indore 8827247818 {Low Price}👉   Nitya Indore Call Girls  * ITRG...
Call Girl in Indore 8827247818 {Low Price}👉 Nitya Indore Call Girls * ITRG...
 
Erode Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Erode Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetErode Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Erode Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetsurat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
surat Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in LahoreBest Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
 
💚 Punjabi Call Girls In Chandigarh 💯Lucky 🔝8868886958🔝Call Girl In Chandigarh
💚 Punjabi Call Girls In Chandigarh 💯Lucky 🔝8868886958🔝Call Girl In Chandigarh💚 Punjabi Call Girls In Chandigarh 💯Lucky 🔝8868886958🔝Call Girl In Chandigarh
💚 Punjabi Call Girls In Chandigarh 💯Lucky 🔝8868886958🔝Call Girl In Chandigarh
 
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetbhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhubaneswar Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Escorts Service Ahmedabad🌹6367187148 🌹 No Need For Advance Payments
Escorts Service Ahmedabad🌹6367187148 🌹 No Need For Advance PaymentsEscorts Service Ahmedabad🌹6367187148 🌹 No Need For Advance Payments
Escorts Service Ahmedabad🌹6367187148 🌹 No Need For Advance Payments
 
Sangli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Sangli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetSangli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Sangli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Malda Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Malda Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMalda Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Malda Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
 
Kolkata Call Girls Miss Inaaya ❤️ at @30% discount Everyday Call girl
Kolkata Call Girls Miss Inaaya ❤️ at @30% discount Everyday Call girlKolkata Call Girls Miss Inaaya ❤️ at @30% discount Everyday Call girl
Kolkata Call Girls Miss Inaaya ❤️ at @30% discount Everyday Call girl
 
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
nagpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
nagpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetnagpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
nagpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
bhopal Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhopal Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetbhopal Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
bhopal Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetNanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetErnakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 

Clinical data successes using machine learning (word2vec, RNN)

  • 1. Clinical data successes using machine learning A survey by Joseph Paul Cohen, PhD Montreal Institute for Learning Algorithms Topics: 1. Medical Concept Representation 2. Clinical Event Prediction Tutorial
  • 2. Where is the Deep Learning research? [Shickel, Deep EHR : A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record, 2018] (~word2vec) (word2vec) Topic of today!
  • 3. Concept Representation Mr. Smith is a 63-year-old gentleman with coronary artery disease, hypertension, hypercholesterolemia, COPD and tobacco abuse. He reports doing well. He did have some more knee pain for a few weeks, but this has resolved. He is having more trouble with his sinuses. I had started him on Flonase back in December. He says this has not really helped. Over the past couple weeks he has had significant congestion and thick discharge. No fevers or headaches but does have diffuse upper right-sided teeth pain. He denies any chest pains, palpitations, PND, orthopnea, edema or syncope. His breathing is doing fine. No cough. He continues to smoke about half-a-pack per day. He plans ontrying the patches again. Clinical Note Clinical Publication Representations for: Patient Doctor Visit Disease Drug Symptom Helpful to understand similarity or make semi-supervised predictions!
  • 4. Word Embeddings for Biomedical Language Extract relationships between words and produce a latent representation Word Embeddings for Biomedical Language
  • 5. What to do with word embeddings? ● We can compose them to create paragraph embeddings (bag of embeddings). ● Use in place of words for an RNNs (More on this later!) ● Augment learned representations on small datasets ● ` [Cultural Shift or Linguistic Drift, Hamilton, 2016] Study how the meaning between two texts varies (or hospitals, or doctors)? [Pennington, 2014][Mikolov, 2013] Study the compositionality of the learned latent space
  • 6. Token representations One-hot encoding: binary vector per token Example: cat = [0 0 1 0 0 0 0 0 0 0 0 0 0 0 … 0] dog = [0 0 0 0 1 0 0 0 0 0 0 0 0 0 … 0] house = [1 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0] Note! If x is one hot The dot product of Wx = a Single column of W M x N N x 1 = M x 1
  • 7. word2vec Target wordContext wordContext word Context word involving respiratory system and other chest symptoms Context word involving respiratory doctor chest Mikolov, Efficient Estimation of Word Representations in Vector Space, 2013 1. Each word is a training example 2. Each word is used in many contexts 3. The context defines each word system 1 1 0 1 0 0 0 0 0 1 -1 5.1 involving respiratory doctor chest system Context window = 2
  • 9. king + (woman - man) = ? The point that is closest is queen!
  • 11. window_size = 2 idx_pairs = [] for sentence in tokenized_corpus: indices = [word2idx[word] for word in sentence] for center_word_pos in range(len(indices)): for w in range(-window_size, window_size + 1): context_word_pos = center_word_pos + w if context_word_pos < 0 or context_word_pos >= len(indices) or center_word_pos == context_word_pos: Continue context_word_idx = indices[context_word_pos] idx_pairs.append((indices[center_word_pos], context_word_idx)) Code by github user: mbednarski Output: [[16, 12], [16, 1], ... [ 1, 16], [ 1, 12]] for each word, treated as center word center_word_pos = 0, 1, 2, ... for each window position w = -2, -1, 0, 1, 2 make sure not to jump out of the sentence sentence = ['paris', 'france', 'capital'] indices = [2, 5, 11]
  • 12. class SkipGram(nn.Module): def __init__(self, vocab_size, embd_size): super(SkipGram, self).__init__() self.W1 = Variable(torch.randn(embd_size, vocab_size).float(), requires_grad=True) self.W2 = Variable(torch.randn(vocab_size, embd_size).float(), requires_grad=True) def forward(self, focus): z1 = torch.matmul(self.W1, focus) z2 = torch.matmul(self.W2, z1) softmax = F.log_softmax(z2, dim=0) return softmax model = SkipGram(vocabulary_size, 2) Initialize two matrices |E| x |V| |V| x |E| Encoder: |E| x |V| dot |V| x 1 = |E| x 1 Softmax over context prediction Decoder: |V| x |E| dot |E| x 1 = |V| x 1 Code by github user: mbednarski
  • 13. A note on the softmax function To predict multiple classes we project to a probability distribution word2 word1 word3 3.2 5.1 -1.7 24.5 164.0 0.18 exp normalize 0.13 0.87 0.0
  • 14. Simplex Because it is on a simplex; the correction of one term impacts all Tumor word3 word2word1 word3 Image credits: http://gureckislab.org/ word1 word2 word2 word1 word3 0.13 0.87 0.0 x
  • 15. Infinite ways to generate the same output. A correction of one sends gradients to others We can learn unseen classes through a process of elimination. https://github.com/ieee8023/NeuralNetwork-Examples/blob/master/general/simplex-softmax.ipynb word2 word3word1 Word1 Word3 Word2
  • 16. Softmax and Cross-entropy loss To predict multiple class we can project the output onto a simplex and compute the loss there. word2 word1 word3 3.2 5.1 -1.7 24.5 164.0 0.18 exp normalize 0.13 0.87 0.0 loss 0.0 0.13 0.0
  • 17. Open Access Subset of PubMedCentral Subset of files that are available open access ● Papers available in PDF or XML ● 1.25 million biomedical articles and 2 million distinct words ● Available over FTP for bulk download (CompSci Friendly!) ● Metadata includes journal name and year https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/ Breast Cancer Res. Genome Biol. Arthritis Res. BMC Cell Biol. … Journal names Example XML data
  • 18. Word Embeddings for Biomedical Language Wang, A Comparison of Word Embeddings for the Biomedical Natural Language Processing, 2018 Word Embeddings for Biomedical Language (Mayo Clinic) (c) Wikipedia + Gigaword Representations are biased by the data. We can use this to our advantage to control the domain.
  • 19. (Mayo Clinic) Wikipedia + Gigaword Specific General Example medical words and their five post similar words based on each training corpus of text The full name of diabetes is "diabetes mellitus" Articles say diabetics are at increased risk of hypertension (high blood pressure) A type of peptic ulcer One symptom is an ulcer Colon cancer is associated with breast cancer!
  • 20. Indiana University Hospital Reports Chest X-ray images from the Indiana University hospital network 1000 reports available in XML format!
  • 21. ['heart size normal lungs ...', 'the heart size and ...', 'the heart is normal in ...', 'the lungs are clear the ...', 'heart size normal lungs ...', 'heart is mildly heart ...', 'the lungs are clear there ...', 'cardiac and mediastinal ...', … def clean(s): for c in [".",",",":",";",""","/","[","]","<",">","?"]: s = s.replace(c, " ").lower() return s corpus = [] for f in os.listdir("ecgen-radiology"): tree = xml.etree.ElementTree.parse("ecgen-radiology/" + f) root = tree.getroot() node = root.findall(".//AbstractText/[@Label='FINDINGS']")[0] corpus.append(clean(str(text))) <Abstract> <AbstractText Label="COMPARISON">None.</AbstractText> <AbstractText Label="INDICATION">Positive TB test</AbstractText> <AbstractText Label="FINDINGS">The cardiac silhouette and mediastinum size are within normal limits. There is no pulmonary edema. There is no focal consolidation. There are no XXXX of a pleural effusion. There is no evidence of pneumothorax.</AbstractText> <AbstractText Label="IMPRESSION">Normal chest x-XXXX.</AbstractText> </Abstract> Python code to parse the XML Output!: Input:
  • 22. Hyperparameters! Depending on the configuration of the model the embeddings can vary: No information Compression We can vary: the dimension of the embedding, learning rate, token words, window size, etc..
  • 23. Size of embedding space matters Dim = 2 Dim = 100 *via t-sne Distance Distance
  • 24.
  • 25. Time series medical records Tasks: ● Predict probability of event in future ● Predict duration until next visit ● Find similar patients/events/visits We need to define what events are!
  • 26. 1893 - Causes of Death (International Statistical Institute) 1975 - ICD-9 - (WHO) 1990 - ICD-10 - (WHO) 2022 - ICD-11 - (WHO) ICD (International Classification of Diseases) ICD is the foundation for the identification of health trends and statistics globally, and the international standard for reporting diseases and health conditions. It is the diagnostic classification standard for all clinical and research purposes. ICD defines the universe of diseases, disorders, injuries and other related health conditions, listed in a comprehensive, hierarchical fashion (http://www.who.int/) There are alternative standards but they can require a fee
  • 27. ICD (International Classification of Diseases) Example ICD-9 codes: 786 Symptoms involving respiratory system and other chest symptoms 786.0 Dyspnea and respiratory abnormalities 786.1 Stridor 786.2 Cough 786.3 Hemoptysis 786.4 Abnormal sputum 786.5 Chest pain 786.6 Swelling, mass or lump in chest 786.7 Abnormal chest sounds 786.8 Hiccough 786.9 Other ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/ICD9-CM/2011
  • 28. ICD (International Classification of Diseases) Example ICD-9 codes (786.5) Cardialgia (see also Pain, precordial) 786.51 Diaphragmalgia 786.52 chest 786.59 anginoid (see also Pain, precordial) 786.51 chest (central) 786.50 atypical 786.59 midsternal 786.51 musculoskeletal 786.59 noncardiac 786.59 substernal 786.51 wall (anterior) 786.52 costochondral 786.52 diaphragm 786.52 heart (see also Pain, precordial) 786.51 intercostal 786.59 over heart (see also Pain, precordial) 786.51 pericardial (see also Pain, precordial) 786.51 pleura, pleural, pleuritic 786.52 precordial (region) 786.51 respiration 786.52 retrosternal 786.51 rib 786.50 substernal 786.51 respiration 786.52 Pleuralgia 786.52 Pleurodynia 786.52 Precordial pain 786.51 chest 786.59 Prinzmetal-Massumi syndrome (anterior chest wall) 786.52 painful 786.52 Lots of grouping!
  • 29. ICD-10 is very detailed V97.33XD: Sucked into jet engine, subsequent encounter V00.15: Heelies Accident Applicable To Rolling shoe, Wheeled shoe, Wheelies accident. Supertypes: V00-Y99 External causes of morbidity V95-V97 Air and space transport accidents V00 Pedestrian conveyance accident V00.1 Rolling-type pedestrian conveyance accident
  • 30. Time series medical records Sequence of medical codes over time 787.2 787.2 358.2 682.1 Tasks: ● Predict probability of code in future ● Predict duration until next visit ● Find similar patients/codes/visits
  • 31. MLPs on time series data 787.2 682.1 Before After -5 years +1 year 0 1 0 0 0 1 0 0 0 . . . 787.2 358.2 MLP 1 682.1 787.2 358.2 Multi-hot vectors!
  • 32. Med2Vec Word2Vec for time series patient visits with ICD codes. Embeddings learned for codes and demographics. Visits (ICD Codes) Visits + Demographics ICD Codes over time Choi, Multi-layer Representation Learning for Medical Concepts, 2016 Visit embedding Visit embedding conditioned on demographics
  • 33. Baseline methods One-hot: In order to compare with the raw input data, the binary vector for the visit is used. Stacked autoencoder (SA): Using the binary vector concatenated with patient demographic information as the input the SA is trained to minimize the reconstruction error. Then will be used to generate visit representations. Sum of Skip-gram vectors (word2vec): First learn the code-level representations with Skip-gram only. Then for the visit-level representation, add the representations of the codes within the visit. Choi, Multi-layer Representation Learning for Medical Concepts, 2016
  • 34. Med2Vec Evaluation: Predicting codes of the next visit Choi, Multi-layer Representation Learning for Medical Concepts, 2016 Private): Dataset CHOA
  • 35.
  • 36. Example clinical note Mr. Smith is a 63-year-old gentleman with coronary artery disease, hypertension, hypercholesterolemia, COPD and tobacco abuse. He reports doing well. He did have some more knee pain for a few weeks, but this has resolved. He is having more trouble with his sinuses. I had started him on Flonase back in December. He says this has not really helped. Over the past couple weeks he has had significant congestion and thick discharge. No fevers or headaches but does have diffuse upper right-sided teeth pain. He denies any chest pains, palpitations, PND, orthopnea, edema or syncope. His breathing is doing fine. No cough. He continues to smoke about half-a-pack per day. He plans on trying the patches again. ● Notes are still written together with codes selected for each visit. ● Often explaining details which do not have equivalent codes. ● Natural language is very difficult to parse with certainty.
  • 37. Predicting codes from notes Converting text to codes can ● Adapt old databases ● Correct errors ● Upgrade ICD versions Jagannatha, Bidirectional RNN for Medical Event Detection in Electronic Health Records, 2016 ... with upset stomach <done> <ICD-9 787.0 Nausea and vomiting>
  • 38. RNNs, Different types of sequential prediction tasks Input Output State one to one one to many many to one many to many many to many Taken from http://vision.stanford.edu/teaching/cs231n/slides/2016/winter1516_lecture10.pdf and Francis Dutil Cat "This is a cat" “It's hairy and I'm allergic to it” Cat “Ceci est un chat” “This is a cat” “Meow Meow”
  • 39. RNNs An RNN applies a function over a sequence of inputs [x1 , x2 , …, xT ] which produces a sequence of outputs [y1 , y2 , …, yT ] and each input produces a internal state [h1 , h2 , …, hT ]. Sequence of outputs Sequence of internal states Sequence of inputs xt yt ht Image du blog de Christopher Olah, slide from l’École d’automne 2018, César Laurent
  • 40. RNNs Image du blog de Christopher Olah, slide from l’École d’automne 2018, César Laurent ● A simple RNN ● W, U and V are the parameters of the network. ● The weights are shared over time. xt yt ht U V W
  • 41. Unrolling the RNN over time The weights are shared over time. x0xt x1 x2 xT y2 yT y1 y0yt h2 hT h1 h0ht U U U U W W W W V V V V V U Image du blog de Christopher Olah, slide from l’École d’automne 2018, César Laurent
  • 42. Unrolling the RNN over time x0xt x1 x2 xT y2 yTyt h2 hT h1 h0ht U U U U W V V V V V U The weights are shared over time. Image du blog de Christopher Olah W W W
  • 43. Unrolling the RNN over time x0xt x1 x2 xT y2 yT y1 y0yt h2 hT h1 h0ht U U U U W W W W V V V V V U The weights are shared over time. Image du blog de Christopher Olah
  • 44. Predicting future events 785.1 345.1 xt 782.2 358.2 682.1782.2787.2yt h2 hT h1 h0ht U U U U W W W W V V V V V U 785.1 - 785.1 A multi-hot vector Predicting next time step
  • 46. Problem with the basic RNN Slide from l’École d’automne 2018, César Laurent The shade of gray shows the influence of the input of the RNN at time 1. It decreases over time, as the RNN gradually forgets its first input. Issue addressed by: LSTM GRU Attention Recurrent batch norm Weight regularization Layer norm More reading: [Pascanu 2013]
  • 47. We can get creative with RNN designs Slide from l’École d’automne 2018, César Laurent x0 x1 x2 z2 z1 z0 h2 2 h2 1 h2 0 x0 x1 x2 y2 y1 y0 h1 2 h1 1 h1 0 h0 h’0 h’1 h’2 h’i hi h2 h1 Stacked RNNs Bi-directional RNN
  • 48. RNNs for next visit prediction (Doctor AI) ● Treat codes equally: ICD diagnosis codes, procedure codes, and medication codes ● Grouped codes into higher-order categories Medical codes R40000 y = High level medical codes R1778 d = time since last visit Choi, Doctor AI: Predicting Clinical Event via Recurrent Neural Networks, 2016 Patients from Sutter Health Palo Alto Medical Foundation
  • 49. Pretraining RNNs (Doctor AI) MIMIC II has 2,695 patients with 2+ visits Pretrained using a larger Sutter Health dataset ~300,000 patients Choi, Doctor AI: Predicting Clinical Events via Recurrent Neural Networks, 2016
  • 50. Related work Lipton, Zachary C et al. “Learning to Diagnose with LSTM Recurrent Neural Networks.” International Conference on Learning Representations. 2016 Che, Zhengping et al. “Recurrent Neural Networks for Multivariate Time Series with Missing Values.” Nature Scientific Reports. 2018
  • 51. Romanov, Lessons from Natural Language Inference in the Clinical Domain, 2018 Medical Natural Language Inference MedNLI dataset derived from the MIMIC-III dataset
  • 52. Romanov, Lessons from Natural Language Inference in the Clinical Domain, 2018 Medical Natural Language Inference Studying the errors we can see the limits of the model.
  • 53. Discussion "While code-based representations of clinical concepts and patient encounters are a tractable first step towards working with heterogeneous EHR data, they ignore many important real-valued measurements associated with items such as laboratory tests, intravenous medication infusions, vital signs, and more." [Shickel et al,, 2018]
  • 54. Discussion "While some researchers downplay the importance of interpretability in favor of significant improvements in model performance, we feel advances in deep learning transparency will hasten the widespread adoption of such methods in clinical practice." [Shickel et al, 2018]
  • 55. Discussion "Many studies claim state-of-the-art results, but few can be verified by external parties. This is a barrier for future model development and one cause of the slow pace of advancement." [Shickel et al, 2018]
  • 56. Pr. Yoshua Bengio, PhD Francis Dutil Martin Weiss Tristan Sylvain Margaux Luck, PhD Assya Trofimov Vincent Frappier, PhD Joseph Paul Cohen, PhD Shawn Tan Sina Honari Geneviève BoucherMandana Samiei Georgy Derevyanko, PhD Myriam Côté, PhD
  • 57. References Ching, T., et al. Opportunities And Obstacles For Deep Learning In Biology And Medicine. Journal of The Royal Society Interface. 2018 Shickel, B. et al. Deep EHR : A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record. IEEE Journal of Biomedical and Health Informatics, 2018