Early Hospital Mortality Prediction using Vital Signals

By
Reza Sadeghi
http://knoesis.org/Reza
Advisor
Dr. Banerjee
Mar 2018

Outline
• The importance of early mortality prediction
• Challenges in early mortality prediction in ICU patients
• Previous work pros and cons
• The proposed methods
• Experiments
• Conclusion

• More accurate and faster decisions bring greater benefit to patients and to health care resources
1. ICU departments consume 22% of total hospital costs in the United States [1]
2. Increasing ICU length of stay is associated with higher 1-year mortality [2]
[1] Tan, S. S., Bakker, J., Hoogendoorn, M. E., Kapila, A., Martin, J., Pezzi, A., ... & Hakkaart-van Roijen, L. (2012). Direct cost analysis of intensive care unit stay in four European countries: applying a standardized costing methodology. Value in Health, 15(1), 81-86.
[2] Moitra, V. K., Guerra, C., Linde-Zwirble, W. T., & Wunsch, H. (2016). Relationship between ICU length of stay and long-term mortality for elderly ICU survivors. Critical Care Medicine, 44(4), 655.
Why is early mortality prediction important?

• ICU patients are closely monitored with electronic equipment
• A decision support system for intensivists
• Higher quality and lower cost of treatment
How can we contribute to this field?

Challenges in early mortality prediction in ICU patients
• Big data
2.4 TB: the MIMIC-III Waveform Database Matched Subset
• Real-time application
• High cost of errors
• Distributed systems
• Missing data
• Comorbidity
• Parallel processing

Previous work pros and cons
Customized models
Score based models
Data mining models
- perform better than traditional scores [1]
- patients may suffer from heterogeneous diseases, so selecting the proper model is hard
- rely on panels of experts or statistical models
- refined for use within specified geographical areas
- extracting features from Electronic Medical Records
- only a few models are designed for early mortality prediction
- low discrimination power
- many of them require attributes that are not always available at ICU admission
- higher performance than previous methods
- descriptive modelling that explains hidden clinical implications
- the results of different algorithms on different datasets are not consistent
- performance depends on the population of interest, the variables measured and the outcome being tested

Missing Data Type:
1. missing completely at random (MCAR)
2. missing at random (MAR)
3. missing not at random (MNAR) [2]
Missing Data Handling:
1. interpolation
2. ignoring the incomplete records in the dataset
3. substituting each missing value with the mean or mode of its attribute
(the "ReplaceMissingValues" filter in Weka)
4. predicting the missing value with a learning algorithm, such
as Multiple Imputation or EMImputation [3]
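Strategy 3 (mean or mode substitution, as Weka's ReplaceMissingValues filter does) takes only a few lines. The sketch below is a minimal illustration, assuming each record is a dict and that a missing value is encoded as None; `replace_missing` and the toy records are hypothetical, and numeric attributes receive the mean while non-numeric ones receive the mode.

```python
from statistics import mean, mode


def replace_missing(records):
    """Fill None values with the attribute mean (numeric) or mode (otherwise).

    Hypothetical helper: all records are dicts with the same keys and
    None marks a missing value.
    """
    keys = records[0].keys()
    fill = {}
    for key in keys:
        observed = [r[key] for r in records if r[key] is not None]
        if not observed:
            fill[key] = None  # nothing to impute from, so the gap stays
        elif all(isinstance(v, (int, float)) for v in observed):
            fill[key] = mean(observed)
        else:
            fill[key] = mode(observed)
    return [{k: (fill[k] if r[k] is None else r[k]) for k in keys} for r in records]


rows = [
    {"heart_rate": 80, "sex": "F"},
    {"heart_rate": None, "sex": "M"},
    {"heart_rate": 90, "sex": None},
    {"heart_rate": 100, "sex": "M"},
]
print(replace_missing(rows))
```

Mean substitution keeps every record but shrinks the attribute's variance, which is one reason the learning-based imputers in item 4 exist.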
Imbalanced Data Handling:
1. re-sampling: undersampling and
oversampling
2. making the classifier cost-sensitive
3. hybrid methods
4. one-class classifiers
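Re-sampling (item 1) is the simplest of these options. The sketch below shows plain random oversampling, which duplicates randomly chosen minority-class samples until the classes are balanced. It is only an illustration under that assumption: it is not A-SUWO, the adaptive oversampling method used in this work, and `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


X = [[88.0], [81.5], [82.1], [80.9], [79.8]]
y = [1, 0, 0, 0, 0]  # hypothetical labels: 1 = passed away (minority), 0 = alive
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Duplicating samples adds no new information and can encourage overfitting; adaptive schemes such as A-SUWO instead synthesize new minority samples with attention to cluster structure.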
Data mining model issues
Interpretability:
1. Deep learning
2. Ensemble methods

Our main contributions
• A signal-based model for early mortality prediction
• A clinical decision support system that uses only the first hour of the
heart rate signal
• Faster feedback to healthcare professionals
• No dependence on clinical records, many of which contain missing values
• A real-time system, because it does not depend on laboratory test results
that must first be processed

MIMIC-III
Figure: age distribution over the whole MIMIC-III database and the Matched Subset
• The MIMIC-III database comprises the records of 46,520 patients
• The Matched Subset contains the records of 10,282 patients
• The MIMIC-III Waveform Database Matched Subset is 2.4 TB

Causality
Figure: diseases of the circulatory system
• High heart rate has been shown to be an independent risk factor for all-cause
and cardiovascular death in general population studies [1]
• Epidemiological studies have reported an increased risk of cardiovascular disease,
cancer and all-cause mortality with higher resting heart rate [2]
[1] https://www.medicographia.com/2010/07/recommendations-on-how-to-measure-resting-heart-rate/
[2] Aune, D., Sen, A., ó'Hartaigh, B., Janszky, I., Romundstad, P. R., Tonstad, S., & Vatten, L. J. (2017). Resting heart rate and the risk of cardiovascular disease, total cancer, and all-cause mortality–A systematic review and dose–response meta-analysis of prospective studies. Nutrition, Metabolism and Cardiovascular Diseases, 27(6), 504-517.

Signal processing
• Handling noise due to different recording systems
• Moving average filtering
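$$
S'(T) =
\begin{cases}
\dfrac{1}{T}\displaystyle\sum_{t=1}^{T} S(t), & \rho \ge T \ge 1 \\[6pt]
\dfrac{1}{\rho}\displaystyle\sum_{t=T-\rho+1}^{T} S(t), & L-\rho \ge T \ge \rho \\[6pt]
\dfrac{1}{T}\displaystyle\sum_{t=L-\rho+1}^{L} S(t), & L \ge T \ge L-\rho+1
\end{cases}
$$
with window size ρ and signal length L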
• Resampling with an anti-aliasing finite impulse response (FIR) low-pass filter
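The two filtering steps can be sketched in plain Python. The moving average follows the first two cases of the equation: a growing window for the first ρ samples, then a trailing window of ρ samples; for the last samples it keeps the same ρ-sample trailing window, which is an assumption. Resampling is shown as integer-factor decimation after a Hamming-windowed-sinc FIR low-pass filter. The tap count, window, cutoff and all function names are illustrative and hypothetical, not the parameters used in this work.

```python
import math


def moving_average(signal, rho):
    """Trailing moving average: the first rho outputs average every sample
    seen so far, later outputs average the last rho samples."""
    out, running = [], 0.0
    for t, value in enumerate(signal):
        running += value
        if t >= rho:
            running -= signal[t - rho]  # drop the sample leaving the window
        out.append(running / min(t + 1, rho))
    return out


def fir_lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc taps; cutoff is a fraction of the sampling rate
    (0 < cutoff < 0.5). Taps are normalized to unit gain at DC."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def downsample(signal, factor, num_taps=31):
    """Anti-aliasing FIR low-pass filter, then keep every factor-th sample."""
    taps = fir_lowpass_taps(num_taps, 0.5 / factor)
    half = num_taps // 2
    filtered = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i + half - j  # centered convolution; samples past the edges count as 0
            if 0 <= idx < len(signal):
                acc += h * signal[idx]
        filtered.append(acc)
    return filtered[::factor]


# Hypothetical heart-rate trace, smoothed and then reduced by a factor of 4.
hr = [80 + 5 * math.sin(2 * math.pi * t / 50) for t in range(500)]
smoothed = moving_average(hr, rho=10)
reduced = downsample(smoothed, factor=4)
print(len(hr), len(reduced))
```

In practice a library routine such as SciPy's `scipy.signal.resample_poly`, which applies its own FIR anti-aliasing filter, would replace the hand-written `downsample`.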

Feature extraction
Each signal is described in terms of 12 statistical
and signal-based features
- The averaged power of a finite discrete-time
signal is defined as the signal's energy divided by its number of samples
$$
P = \frac{E}{n_2 - n_1 + 1} = \frac{1}{n_2 - n_1 + 1}\sum_{n=n_1}^{n_2} S[n]^{2}
$$
- The signal power is computed by taking the
integral of the power spectral density (PSD)
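$$
\hat{P}(\rho) = \frac{\Delta T}{N}\left\lvert\sum_{n=0}^{N-1} S[n]\, e^{-i 2\pi \rho \, \Delta T \, n}\right\rvert^{2}
$$
where ΔT is the sampling interval and N is the number of samples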
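The two power definitions can be checked against each other numerically. The sketch below assumes ΔT is the sampling interval, evaluates the periodogram at the N frequencies ρ_k = k / (N ΔT) with a direct O(N²) DFT, and integrates it over one full frequency period with bin width 1 / (N ΔT). By Parseval's theorem that integral equals the averaged power. The function names and the sample values are hypothetical.

```python
import cmath
import math


def averaged_power(signal):
    """Mean energy per sample: (1 / N) * sum of S[n]^2."""
    return sum(abs(x) ** 2 for x in signal) / len(signal)


def periodogram(signal, dt):
    """P(rho_k) = (dt / N) * |sum_n S[n] exp(-i 2 pi rho_k dt n)|^2
    at rho_k = k / (N * dt), k = 0 .. N - 1 (direct DFT, O(N^2))."""
    n_samples = len(signal)
    psd = []
    for k in range(n_samples):
        rho = k / (n_samples * dt)
        acc = sum(x * cmath.exp(-2j * math.pi * rho * dt * n) for n, x in enumerate(signal))
        psd.append(dt / n_samples * abs(acc) ** 2)
    return psd


def power_from_psd(signal, dt):
    """Integrate the periodogram over one frequency period (rectangle rule)."""
    d_rho = 1.0 / (len(signal) * dt)
    return sum(periodogram(signal, dt)) * d_rho


hr = [80.0, 82.5, 79.0, 85.0, 88.0, 84.5, 81.0, 83.0]  # hypothetical samples
print(averaged_power(hr), power_from_psd(hr, dt=1.0))
```

For long recordings an FFT would replace the direct DFT; the result is the same, only faster.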
Column  Feature                   Passed away patients  Alive patients
1       Maximum                   97.82                 90.92
2       Minimum                   80.69                 76.24
3       Mean                      88.46                 81.92
4       Median                    88.45                 81.81
5       Mode                      85.25                 79.98
6       Standard deviation        2.63                  2.25
7       Variance                  15.84                 11.56
8       Range                     17.13                 14.68
9       Kurtosis                  17.48                 17.85
10      Skewness                  0.83                  1.02
11      Averaged power            8186.02               7045.04
12      Energy spectral density   5114.78               4420.38
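The statistical features in rows 1 to 11 can be computed with Python's standard `statistics` module. The conventions below are assumptions, since the slides do not state them: sample (n − 1) standard deviation and variance, moment-based non-excess kurtosis and skewness, and the mode taken as the smallest most frequent value. The spectral feature in row 12 is omitted from this sketch, and `extract_features` is a hypothetical name.

```python
import statistics


def central_moment(signal, order):
    """Population central moment of the given order."""
    mu = statistics.fmean(signal)
    return sum((x - mu) ** order for x in signal) / len(signal)


def extract_features(signal):
    """Return the table's statistical features as a dict (assumed conventions)."""
    m2 = central_moment(signal, 2)
    return {
        "maximum": max(signal),
        "minimum": min(signal),
        "mean": statistics.fmean(signal),
        "median": statistics.median(signal),
        "mode": min(statistics.multimode(signal)),  # smallest most frequent value
        "standard_deviation": statistics.stdev(signal),  # n - 1 denominator
        "variance": statistics.variance(signal),  # n - 1 denominator
        "range": max(signal) - min(signal),
        "kurtosis": central_moment(signal, 4) / m2 ** 2,  # non-excess
        "skewness": central_moment(signal, 3) / m2 ** 1.5,
        "averaged_power": sum(x * x for x in signal) / len(signal),
    }


features = extract_features([1, 2, 2, 3, 4])  # hypothetical toy signal
print(features)
```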

Unbiased and efficient modeling
• Avoid bias against the minority class (the imbalanced-data challenge)
Using adaptive semi-unsupervised weighted oversampling (A-SUWO)
• Robust against time dependency of samples
Experiments designed around a 10-fold cross-validation strategy
• Handling high computational complexity of big data
Leveraging parallel programming
• Efficiency vs. Transparency
Comparing both black-box and interpretable classifiers
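A 10-fold split that preserves the class ratio in every fold can be sketched as follows. This is an assumption about how such folds might be built (stratified round-robin assignment after shuffling each class), not the exact procedure used in this work, and `stratified_k_fold` is a hypothetical name. A common precaution when combining folds with oversampling is to oversample only the training indices of each fold, so duplicated or synthetic samples never leak into the test fold.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for k folds whose class
    proportions stay close to those of the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


labels = [1] * 20 + [0] * 80  # hypothetical: 20 deaths, 80 survivors
for fold, (train, test) in enumerate(stratified_k_fold(labels)):
    print(fold, len(train), len(test), sum(labels[i] for i in test))
```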

Classification
Classifier            Precision  Recall  F1-score  Interpretability
Random forest         0.97       0.97    0.97      Hard
Gaussian SVM          0.95       0.96    0.96      Hard
Decision tree         0.90       0.92    0.91      Easy
Boosted trees         0.91       0.83    0.87      Hard
K-NN                  0.80       0.85    0.82      Hard
Logistic regression   0.77       0.67    0.72      Easy
Linear discriminant   0.78       0.66    0.71      Easy
Linear SVM            0.80       0.63    0.70      Easy
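The F1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes all three metrics for a single positive class from predicted labels; the slides do not say how per-class scores were averaged, so that step is left out, and the function names are hypothetical. Because the table's precision and recall are rounded, recomputing F1 from them can differ from the reported value in the last digit.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_from(0.90, 0.92), 2))  # decision tree row
```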

Transparency and Feature importance

• Special thanks to Dr. Banerjee
for access to MIMIC-III, processing and storing this big data, consultation
with physicians, weekly meetings and guidance
The paper has been submitted to CHASE 2018
The paper and these slides are available at the following links:
https://arxiv.org/abs/1803.06589
https://www.slideshare.net/RezaSadeghi4
The code will be published on my GitHub soon:
https://github.com/RezaSadeghiWSU

Extra
• Gini index
$$
\mathrm{GDI} = 1 - \sum_{i} p(i)^{2}
$$
• The risk of splitting
$$
\mathrm{Risk}(x) = \mathrm{GDI}(x) \cdot \mathrm{Probability}(x)
$$
• The node probability is defined as the number of records reaching the node,
divided by the total number of records.
• We utilize a resampling method called adaptive semi-unsupervised
weighted oversampling (A-SUWO)
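The Gini diversity index, the node risk and the risk reduction produced by a split can be computed directly from the class labels reaching each node. The sketch below follows the definitions above; the function names and toy labels are hypothetical. Summing these reductions over every split made on a given feature is one common way to turn them into feature importance (MATLAB's `predictorImportance`, for example, sums risk changes per predictor and divides by the number of branch nodes), but the slides do not confirm that exact procedure.

```python
from collections import Counter


def gini_index(labels):
    """GDI = 1 - sum_i p(i)^2 over the classes present at a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def node_risk(node_labels, total_records):
    """Risk(x) = GDI(x) * P(x); P(x) is the share of all records reaching x."""
    return gini_index(node_labels) * len(node_labels) / total_records


def split_risk_reduction(parent, left, right, total_records):
    """Decrease in risk when `parent` is split into `left` and `right`."""
    children = node_risk(left, total_records) + node_risk(right, total_records)
    return node_risk(parent, total_records) - children


parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]
print(gini_index(parent))
print(split_risk_reduction(parent, left, right, total_records=len(parent)))
```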