Deep learning in healthcare: Opportunities and challenges with Electronic Medical Records (EMR) data

Deep learning in healthcare
Opportunities and challenges
with Electronic Medical Records (EMR) data
Trần Quang Thiện
University of Tsukuba
(Master's student)
1

● Tran Quang Thien (1993)
● Has been in Japan since 2012
● Currently a master's student at the University of Tsukuba
● Research
○ Feature selection in prediction of infectious disease using large-scale
search log data (Yahoo Japan search log).
○ Applied research with Long-term care insurance data and Electronic
Medical Record (EMR) data.
● Recent interests
○ Bayesian modeling
○ Selective inference
Introduction
Who am I
2

1. Introduction to Electronic Medical Records (EMR) data
2. Some common tasks and example solutions
3. Opportunities and challenges
Introduction
Agenda
3

Introduction
What is EMR data
http://api.sunlab.org/static/media/uxJ/UL5/5b6aef5b241ba60001bec1c1.pdf
Age, sex, race,
incomes ….
- Medicine codes
- Procedure codes
- Vital signs and labs
- Pulse ox, heart rate,
temperature, CO2...
- Diagnosis codes
- ICD9, ICD10 ...
- Additional comments
Nowadays, many types of medical data are collected across multiple institutions
Electronic
Medical
Record
(EMR data)
4

CODE CODE CODE
CODE CODE CODE
CODE CODE CODE
CODE CODE CODE
CODE CODE CODE
5

● From Wikipedia:
“Diagnostic coding is the translation of written descriptions of
diseases, illnesses and injuries into codes from a particular
classification”
● ICD-9, ICD-10
○ The most popular and currently the international standard
○ Contains diseases, signs, symptoms, external cause, …
○ 14,000 different codes
○ Has a hierarchical structure
Introduction
OK, so what is a diagnosis code?
6

Introduction
Structure of ICD-10 code
https://doctors.practo.com/icd-10-codes-important-doctors/
Fibrosis and cirrhosis of the liver have the disease code K74
- Hepatic fibrosis K74.0
- Hepatic sclerosis K74.1
- Hepatic fibrosis with hepatic sclerosis K74.2
- Primary biliary cirrhosis K74.3
- Secondary biliary cirrhosis K74.4
- Biliary cirrhosis, unspecified K74.5
- Other and unspecified cirrhosis of the liver K74.6
7
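
The hierarchy in the K74 example can be read directly off the code string. A minimal sketch, assuming the common layout of a three-character category followed by an optional dot and subcategory characters; the function name `icd10_ancestors` is made up for illustration and is not from any library:

```python
def icd10_ancestors(code: str) -> list[str]:
    """Return the code followed by its coarser parents, most specific first.

    Assumes a 3-character category, then an optional dot and
    subcategory characters (for example "K74.3").
    """
    code = code.strip().upper()
    category, _, sub = code.partition(".")
    chain = []
    # Drop one subcategory character at a time: K74.31 -> K74.3 -> K74
    for end in range(len(sub), 0, -1):
        chain.append(f"{category}.{sub[:end]}")
    chain.append(category)
    return chain


print(icd10_ancestors("K74.3"))  # ['K74.3', 'K74']
```

Chapter and block levels (for example the range of K codes for digestive diseases) need a lookup table and are left out of this sketch.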

Introduction
A sample of diagnosis code
Risk Prediction on Electronic Health Records with Prior Medical Knowledge, Fenglong Ma et al.
ICD
codes
8

When
EMR data
meets
deep learning
9

Common tasks
The (typical) overall framework
http://api.sunlab.org/static/media/uxJ/UL5/5b6aef5b241ba60001bec1c1.pdf
10

Concept embedding
Learning a mapping from raw EMR data to a useful
representation or medical concept.
● Also called electronic phenotyping.
● The resulting embedding can be reused in any other task
● Many studies focus on diagnosis/medicine/procedure codes
● Some take inspiration from the NLP domain (GloVe, skip-gram, etc.),
since the codes in a visit can be treated as a bag of codes
● Mostly trained in an unsupervised setting (no ground-truth labels)
Common tasks
11
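
The "bag of codes" idea can be sketched as follows: because the codes inside one visit have no order, every other code in the same visit serves as context for a target code. This is only an illustration of how skip-gram style training pairs might be built; the visits and the helper `skipgram_pairs` are made up, and real systems differ in the details.

```python
from collections import Counter
from itertools import permutations

# Toy visits: each visit is an unordered bag of codes (values are made up).
visits = [
    ["K74.0", "K74.6", "I10"],
    ["K74.6", "I10"],
    ["E11.9", "I10"],
]

def skipgram_pairs(visits):
    """Yield (target, context) pairs for codes that share a visit."""
    for visit in visits:
        # Order inside a visit carries no meaning, so every other code
        # in the same visit is a context for the target code.
        for target, context in permutations(sorted(set(visit)), 2):
            yield target, context

pair_counts = Counter(skipgram_pairs(visits))
print(pair_counts[("K74.6", "I10")])  # 2
```

These pairs (or their co-occurrence counts, in a GloVe-like setup) would then feed an embedding model; no labels are needed, which is why the setting is unsupervised.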

GRAM: Graph-based Attention Model for Healthcare Representation Learning, Edward Choi et al. (KDD 2017)
Concept embedding
Common tasks
12

Concept embedding
Common tasks
● The embedding vector of a code is a mixture of the (base) embedding vectors of
○ The current leaf code
○ Its ancestors
● Advantages:
○ Uses information from high-level nodes, which is especially useful for rare codes
○ Similar in spirit to a hierarchical Bayes model
13
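
The mixture of leaf and ancestor embeddings can be sketched in NumPy. This is a simplified illustration, not the authors' implementation: the toy ontology, dimensions and random parameters are assumptions, and in practice the base embeddings and attention parameters are learned together with the prediction task.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, attn_dim = 4, 3

# Hypothetical toy ontology: leaf K74.3 with ancestors K74 and a root node.
nodes = ["K74.3", "K74", "ROOT"]
base = {n: rng.normal(size=dim) for n in nodes}   # base embeddings

# Attention parameters (learned in practice; random here).
W = rng.normal(size=(attn_dim, 2 * dim))
b = np.zeros(attn_dim)
u = rng.normal(size=attn_dim)

def gram_embedding(leaf, ancestors):
    """Mix the leaf's base embedding with those of its ancestors."""
    candidates = [leaf] + ancestors
    # Compatibility score between the leaf and each candidate node.
    scores = np.array([
        u @ np.tanh(W @ np.concatenate([base[leaf], base[c]]) + b)
        for c in candidates
    ])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over leaf + ancestors
    final = sum(w * base[c] for w, c in zip(weights, candidates))
    return final, weights

g, alpha = gram_embedding("K74.3", ["K74", "ROOT"])
print(alpha.round(3), g.shape)
```

When a leaf code is rare, training can push more attention weight onto its ancestors, which is where the resemblance to shrinkage in hierarchical models comes from.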

GRAM GloVe skip-gram
● The embedding result matches the medical ontology
● Similar to the shrinkage of parameters in hierarchical models
Concept embedding
Common tasks
14

● The embedding result matches the medical ontology
● It also helps improve prediction performance on downstream
tasks
Concept embedding
Common tasks
15

Detecting whether a specific disease or disease stage can
be confirmed in the EMR data
● Predicting a target disease using EMR data is not easy
○ Assigning a diagnosis is a noisy task due to
the different practices of doctors and hospitals
○ Diagnosis codes have to be predicted from other data
● Examples:
○ Predicting diagnosis codes from clinical notes
○ Classifying the stage of Parkinson’s disease
○ Predicting heart failure risk
○ Predicting medicine codes ← AI doctor...
Disease/drug classification
Common tasks
16

Common tasks
1. RNN
2. Reinforcement learning fine-tuning
3. Beam search
Main techniques
LEAP: Learning to Prescribe Effective and Safe Treatment Combinations for Multimorbidity,
Yutao Zhang et al. (KDD 2017)
17

Common tasks
Scoring criterion
[2] Complete, without unfavorable drug-drug interactions
[1] Partially complete, without unfavorable interactions
[0] Addresses less than 50% of the diagnoses, or includes
negative interactions
Evaluation criterion: Jaccard coefficient
● The proposed method works better than rule-based and other competing methods
● However, the quality does not yet seem sufficient for real applications
18
LEAP: Learning to Prescribe Effective and Safe Treatment Combinations for Multimorbidity,
Yutao Zhang et al. (KDD 2017)
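
For reference, the Jaccard coefficient used as the evaluation criterion compares the predicted set of medicines with the set actually prescribed: the size of the intersection divided by the size of the union. A minimal sketch with hypothetical drug identifiers (the handling of two empty sets is a convention chosen here, not taken from the paper):

```python
def jaccard(predicted, actual):
    """Jaccard coefficient between two sets of medicine codes."""
    predicted, actual = set(predicted), set(actual)
    if not predicted and not actual:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(predicted & actual) / len(predicted | actual)

# Hypothetical drug identifiers, for illustration only.
print(jaccard({"drug_a", "drug_b", "drug_c"}, {"drug_b", "drug_c", "drug_d"}))  # 0.5
```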

Predicting future clinical events based on past
longitudinal event sequences
● For example:
○ Prediction of 30-day hospital readmission
○ Prediction of heart failure risk
○ Prediction of mortality in the Intensive Care Unit (ICU)
● Useful for grouping or detecting high-risk patients
Sequential prediction of clinical events
Common tasks
19

Common tasks
RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism,
Edward Choi et al. NIPS 2016
- Interpretable model for sequential prediction
- Explains its predictions using an attention mechanism
- Two levels of attention
- Importance of each visit
- Importance of each code within a visit
20

Common tasks
- The attention is calculated using a time-reversed RNN model
- Mimics a clinician who looks at the EMR data backward
- Recent visits contribute more
21
RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism,
Edward Choi et al. NIPS 2016
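
The two attention levels and the reversed time order can be sketched in NumPy. This is a simplified illustration under several assumptions: a plain tanh RNN stands in for the recurrent units, all parameters are random rather than trained, the patient data is made up, and the final classification layer is omitted. Visit-level weights `alpha` are scalars per visit, while `beta` acts on the dimensions of each visit embedding; per-code importance is then derived through the embedding matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_codes, emb, hid = 6, 4, 3

# Toy patient: 3 visits, each a multi-hot vector over 6 codes (made-up data).
X = np.array([[1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 1, 1]], dtype=float)

W_emb = rng.normal(size=(emb, n_codes))           # code embedding matrix

def simple_rnn(inputs, W_in, W_h):
    """Plain tanh RNN; returns the hidden state at every step."""
    h, states = np.zeros(W_h.shape[0]), []
    for x in inputs:
        h = np.tanh(W_in @ x + W_h @ h)
        states.append(h)
    return np.array(states)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Random parameters stand in for trained ones.
Wa_in, Wa_h = rng.normal(size=(hid, emb)), rng.normal(size=(hid, hid))
Wb_in, Wb_h = rng.normal(size=(hid, emb)), rng.normal(size=(hid, hid))
w_alpha = rng.normal(size=hid)
W_beta = rng.normal(size=(emb, hid))

V = X @ W_emb.T                                   # visit embeddings, shape (3, emb)
V_rev = V[::-1]                                   # most recent visit first

g = simple_rnn(V_rev, Wa_in, Wa_h)[::-1]          # back to chronological order
h = simple_rnn(V_rev, Wb_in, Wb_h)[::-1]

alpha = softmax(g @ w_alpha)                      # visit-level attention, shape (3,)
beta = np.tanh(h @ W_beta.T)                      # variable-level attention, shape (3, emb)

context = (alpha[:, None] * beta * V).sum(axis=0) # patient representation
print(alpha.round(3), context.shape)
```

Because the RNNs only produce attention weights and the context vector is a weighted sum of visit embeddings, the contribution of each visit and code to the prediction can be traced back, which is what makes the model interpretable.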

Common tasks
Model     Description
MLP       Logistic regression with a hidden layer
RNN       Two-layer RNN, no attention
RNN+α_M   RNN with visit-level attention
RNN+α_R   Reversed RNN with visit-level attention
RETAIN    Reversed RNN with two levels of attention
● Task: predict whether a heart failure (HF) event occurs
Input: EMR data up to the date the HF event happens
● The AUC is around 0.87 (high?)
● However, the plain RNN also scores high...
● NOTE: the y-axis starts from 0.76
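
To put an AUC of 0.87 in context: AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so 0.5 is chance level and 1.0 is a perfect ranking. A minimal sketch with made-up scores (the quadratic pairwise loop is fine for illustration but slow on large data):

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up risk scores for 3 HF cases (label 1) and 3 controls (label 0).
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))  # 8 of 9 pairs are ranked correctly
```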

Common tasks
● The difference in performance is not significant
● However, RETAIN gives us an answer to the question WHY

Data augmentation:
Creating synthetic data elements or patient records
based on real EMR data
● Saves labeling cost
● Generated synthetic data can help with privacy issues
● Most research is based on GAN techniques
● The medGAN paper reported that the generated data is
indistinguishable from real data to a human doctor.
Data augmentation
Common tasks
24

Data augmentation
Common tasks
Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Edward Choi et al. (JMLR 2017)
● Task
Generating synthetic patient records from real data
○ Requirement 1: (statistically) similar to the real records
○ Requirement 2: individual patient information cannot be exploited
● Result
Indistinguishable to a human doctor, except for a few outliers
25

Opportunities
and challenges
are out there
26

Opportunities
Growing data availability
Adoption of EMR system in Japan
Source: 医療実施調査 (Ministry of Health, Labour and Welfare, Japan)
● The data is quite old...
● In 2017, only 41.6% of hospitals and clinics in Japan had adopted an EMR system
● However, the figure is about 85% for big hospitals (>400 beds)
27

Opportunities
Adoption of EMR system in US
● The data is also old...
● 87% of the physicians interviewed use an EMR system
● However, the coverage across hospitals and clinics is unclear
28

Opportunities
In implementing Resolution 20, the Ministry of Health has so far finished building
electronic health record (EHR) software that uses Vietnam Social Security's data on
households enrolled in health insurance to create identification numbers (IDs).
According to the plan, from January 2019 to June 2019 the EHR software will be
deployed and refined for eight provinces and cities in a pilot model. From July 2019
it will be rolled out nationwide. By the end of 2019 an electronic health record
system will be in place for every citizen. When a person visits a medical facility,
a doctor anywhere in Vietnam will need only one mouse click for the computer to
display complete information on that person's current health status, which will
greatly help diagnosis and treatment.
(Nhân Dân online newspaper, 31/01/2019)
Adoption of EMR system in Vietnam
● Recent news!
● A nationwide EMR system is planned to be completed in 2019!
● A better, digitized healthcare system is coming!
29

Opportunities
Adoption of EMR system in Vietnam
Image from a doctor friend in Vietnam…
Medical research scene, Cho Ray hospital, 11/4/2019
30

Opportunities
Promising achievements of deep learning
● Has shown outstanding performance in various tasks
○ Computer vision
○ Speech recognition
○ Natural language processing
○ Reinforcement learning
○ …….
● Can bypass the feature engineering process
○ E.g. national variants of ICD-10 such as ICD-10-CM contain more than 70,000 codes
How can we express these codes in our model?
● Deep learning architectures are well suited to various healthcare tasks
○ CNN
○ RNN, LSTM
○ GANs
○ Attention mechanism
○ …...
31
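
One common answer to the question of how to express such a large code vocabulary is to encode each visit as a multi-hot vector and map it through a learned embedding table, so no hand-crafted feature per code is needed. A minimal sketch; the four-code vocabulary, the 8-dimensional embedding and the random table `E` are assumptions for illustration:

```python
import numpy as np

# Hypothetical vocabulary; a real one would hold every code seen in training.
vocab = {"K74.0": 0, "K74.6": 1, "I10": 2, "E11.9": 3}

def multi_hot(visit_codes, vocab):
    """Encode one visit as a 0/1 vector; codes outside the vocabulary are skipped."""
    x = np.zeros(len(vocab))
    for code in visit_codes:
        if code in vocab:
            x[vocab[code]] = 1.0
    return x

rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))   # embedding table: one 8-dim row per code

x = multi_hot(["K74.6", "I10", "Z99.9"], vocab)
visit_vec = x @ E                      # sum of the embeddings of the visit's codes
print(x, visit_vec.shape)              # [0. 1. 1. 0.] (8,)
```

With tens of thousands of codes the multi-hot vector is very sparse, so in practice the product is usually computed as an index lookup followed by a sum.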

However,
developing deep
learning models
using EMR data
is a challenging
task.
32

Challenges
● Lack of data
○ EMR systems are not yet in place in some countries and regions
○ Getting access to healthcare data is a complicated process
because the data is highly sensitive.
○ Combining EMR data across institutions can be complicated
because institutions use different EMR versions.
● Lack of labels (both in quality and quantity)
○ Annotation requires the domain knowledge of trained experts
○ Rare diseases imply less data
○ Labels are noisy, even between clinicians or institutions
■ E.g. ICD-10 contains many similar codes
■ Different practices of doctors
Challenges with EMR data
33

Challenges
Challenges with EMR data
● Heterogeneous types of data
○ Numeric values (lab tests)
○ Discrete codes (diagnosis, medicine, procedures)
○ Continuous monitoring data
○ Free-text clinical notes
○ Medical images
How can we use all these data together effectively?
● Temporality and irregularity
○ Events are irregularly sampled (and the sampling is also biased)
○ Long-term and short-term dependencies
34
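
One simple way to expose irregular sampling to a sequence model (an assumption here, not something the slides prescribe) is to add the time elapsed since the previous visit as an extra input feature. A minimal sketch with made-up visit dates:

```python
from datetime import date

# Made-up visit dates for one patient.
visit_dates = [date(2018, 1, 5), date(2018, 1, 12), date(2018, 6, 30), date(2019, 2, 1)]

def days_since_previous(dates):
    """Days elapsed since the previous visit (0 for the first visit)."""
    dates = sorted(dates)
    return [0] + [(b - a).days for a, b in zip(dates, dates[1:])]

print(days_since_previous(visit_dates))  # [0, 7, 169, 216]
```

The gaps here range from a week to several months, which is the kind of mix of short-term and long-term dependencies a model has to cope with.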

Challenges
Challenges with healthcare domain
● Decisions in healthcare are sensitive
○ Require immediate information and decisions
○ Sometimes have life-or-death consequences
○ High standards required
How much accuracy will be considered enough?
● Models need interpretability
○ An accurate black box might not be enough
○ Clinicians often require the reasoning behind a prediction
● Toward more complex outputs
○ Question answering
35

Thank
you!
Contact:
Trần Quang Thiện
thientquang@gmail.com
36

Some public datasets
● MIMIC-III, a freely accessible critical care database.
Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P,
Celi LA, and Mark RG. Scientific Data (2016)
https://mimic.physionet.org/
● EMRBOTS, experiment with artificial large medical datasets without
worrying about privacy.
http://www.emrbots.org/
37

Other references
● Deep learning for healthcare: review, opportunities and challenges
Riccardo Miotto, Fei Wang, Shuang Wang, Xiaoqian Jiang and Joel T. Dudley
● Opportunities and challenges in developing deep learning models
using electronic health records data: a systematic review
Cao Xiao, Edward Choi and Jimeng Sun
● Harnessing the Power of Data in Health
Stanford Medicine 2017 Health Trends Report
● The opportunities and challenges of data analytics in health care
Paul B. Ginsburg, Andrés de Loera-Brust, Caitlin Brandt, and Abigail Durak
● AI And Healthcare: A Giant Opportunity
Insights Team, Insights Contributor FORBES INSIGHTS With Intel AI
38
