Early Hospital Mortality Prediction using Vital Signals
1) The document proposes methods for early prediction of mortality in ICU patients using only the initial one hour of heart rate signals without dependency on other clinical records which can contain missing values.
2) A classification model is developed using 12 statistical features extracted from heart rate signals and evaluated on a MIMIC-III dataset, with classifiers such as random forest and SVM achieving precision and recall over 0.95.
3) Interpretable classifiers like decision trees performed reasonably well but with lower performance, highlighting the transparency-accuracy tradeoff in early mortality prediction.
1. Early Hospital Mortality Prediction using Vital Signals
By
Reza Sadeghi
http://knoesis.org/Reza
Advisor
Dr. Banerjee
Mar 2018
2. Outline
• The importance of early mortality prediction
• Challenges in early mortality prediction in ICU patients
• Previous work pros and cons
• The proposed methods
• Experiments
• Conclusion
3. • More accurate and faster decisions bring greater benefit to patients and health care resources
1. ICU departments consume 22% of total hospital costs in the United States [1]
2. Increasing ICU length of stay is associated with higher 1-year mortality [2]
[1] Tan, S. S., Bakker, J., Hoogendoorn, M. E., Kapila, A., Martin, J., Pezzi, A., ... & Hakkaart-van Roijen, L. (2012). Direct cost analysis of intensive care unit stay in four European countries: applying a standardized costing methodology. Value in Health, 15(1), 81-86.
[2] Moitra, V. K., Guerra, C., Linde-Zwirble, W. T., & Wunsch, H. (2016). Relationship between ICU length of stay and long-term mortality for elderly ICU survivors. Critical Care Medicine, 44(4), 655.
Why is early mortality prediction important?
4. • The ICU patient is highly monitored using electronic equipment
• A decision support system for intensivists
• Higher quality and lower cost of treatment
How can we contribute to this field?
5. Challenges in early mortality prediction in ICU patients
• Big data: the MIMIC-III Waveform Database Matched Subset is 2.4 TB
• Real-time application
• Costly faults
• Distributed systems
• Missing data
• Comorbidity
• Parallel processing
7. Previous work pros and cons
Customized models
Score based models
Data mining models
- perform better than traditional scores [1]
- patients may suffer from heterogeneous diseases, so selecting the proper model is a hard job
- rely on panels of experts or statistical models
- refined for use within specified geographical areas
- extract features from Electronic Medical Records
- only a few models are designed for early mortality
- low discrimination power
- many of them require attributes that are not always available at ICU admission
- higher performance than the previous methods
- descriptive modelling, as it explains hidden clinical implications
- the results of different algorithms on different data sets are not consistent
- performance depends on the population of interest, the variables measured, and the outcome being tested
8. Missing data types:
1. missing completely at random (MCAR)
2. missing at random (MAR)
3. missing not at random (MNAR) [2]
Missing data handling:
1. interpretation
2. ignoring records in the dataset that are not complete
3. substituting the missing value with the mean or mode value of each attribute (the "ReplaceMissingValues" filter in Weka)
4. predicting the missing value with a learning algorithm, such as Multiple Imputation or EMImputation [3]
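Mean or mode substitution (option 3 above) can be sketched in a few lines. This is a minimal illustration, not the Weka filter itself; the function name `impute_column`, the use of `None` to mark a missing entry, and the sample values are assumptions made for the example.

```python
from statistics import mean, mode


def impute_column(values):
    """Fill None entries with the mean (numeric) or mode (categorical) of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to estimate from, so return the column unchanged.
        return list(values)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical columns with one missing entry each.
heart_rate = [88.0, None, 92.0, 84.0]
gender = ["F", "M", None, "F"]

imputed_hr = impute_column(heart_rate)
imputed_gender = impute_column(gender)
```

Substituting a single value keeps every record but shrinks the variance of the attribute, which is one reason the learning-based options in item 4 are sometimes preferred.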
Imbalanced data handling:
1. re-sampling: undersampling and oversampling
2. making the classifier cost-sensitive
3. hybrid methods
4. one-class classifiers
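As an illustration of the re-sampling option, the sketch below balances classes by plain random oversampling: minority-class samples are duplicated at random until every class matches the largest one. This is the simplest variant and is not the A-SUWO method used later in this work; the function name, the seed, and the toy data are assumptions.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw with replacement only as many extra samples as the class is short.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical one-feature data set: label 1 is the minority class.
features = [[88.4], [81.9], [82.3], [80.7], [83.1]]
labels = [1, 0, 0, 0, 0]
balanced_x, balanced_y = random_oversample(features, labels)
```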
Data mining models issues
Interpretability:
1. Deep learning
2. Ensemble methods
9. Our main contributions
• A signal-based model for early mortality prediction
• A clinical decision support system that uses only the initial one hour of the heart rate signal
• Faster feedback to healthcare professionals
• No dependency on the many clinical records that contain missing values
• A real-time system, since it does not wait for laboratory test results to be processed
10. MIMIC-III
Figure: the age distribution over the whole MIMIC-III and the Matched Subset
• The MIMIC-III database comprises the records of 46,520 patients
• The Matched Subset contains the records of 10,282 patients
• The MIMIC-III Waveform Database Matched Subset is 2.4 TB
11. Causality
Figure label: diseases of the circulatory system
• High heart rate has been shown to be an independent risk factor for all-cause and cardiovascular death in general population studies [1]
• Epidemiological studies have reported increased risk of cardiovascular disease, cancer and all-cause mortality with greater resting heart rate [2]
[1] https://www.medicographia.com/2010/07/recommendations-on-how-to-measure-resting-heart-rate/
[2] Aune, D., Sen, A., Ó'Hartaigh, B., Janszky, I., Romundstad, P. R., Tonstad, S., & Vatten, L. J. (2017). Resting heart rate and the risk of cardiovascular disease, total cancer, and all-cause mortality – A systematic review and dose–response meta-analysis of prospective studies. Nutrition, Metabolism and Cardiovascular Diseases, 27(6), 504-517.
12. Signal processing
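• Handling noise due to different recording systems
• Moving average filtering
\[
S'(i) =
\begin{cases}
\dfrac{1}{N}\sum_{t=1}^{N} S(t), & 1 \le i \le N \\
\dfrac{1}{N}\sum_{t=i-N+1}^{i} S(t), & N \le i \le L-N \\
\dfrac{1}{N}\sum_{t=L-N+1}^{L} S(t), & L-N+1 \le i \le L
\end{cases}
\]
where S is the signal of length L and N is the window length
• Resampling using the anti-aliasing finite impulse response low-pass filter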
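The moving average step can be sketched as follows. This is one reading of the piecewise filter, stated as an assumption: positions within the first N samples take the mean of the first N samples, positions within the last N samples take the mean of the last N samples, and interior positions use a trailing window of N samples. The function name and the sample values are hypothetical.

```python
def moving_average(signal, window):
    """Smooth a signal with an N-sample moving average and fixed edge windows."""
    length = len(signal)
    if window < 1 or 2 * window > length:
        raise ValueError("window must satisfy 1 <= window <= len(signal) / 2")

    head = sum(signal[:window]) / window           # mean of the first N samples
    tail = sum(signal[length - window:]) / window  # mean of the last N samples

    smoothed = []
    for i in range(1, length + 1):  # i is the 1-based sample position
        if i <= window:
            smoothed.append(head)
        elif i <= length - window:
            # Trailing window covering positions i-N+1 .. i (1-based).
            smoothed.append(sum(signal[i - window:i]) / window)
        else:
            smoothed.append(tail)
    return smoothed


# Hypothetical heart rate samples in beats per minute.
heart_rate = [80, 82, 90, 84, 86, 88, 85, 83]
smoothed = moving_average(heart_rate, window=3)
```

Recomputing each window sum keeps the sketch short; a running sum would be the usual choice for signals as long as those in the waveform database.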
13. Feature extraction
Each signal is described in terms of 12 statistical
and signal-based features
- The averaged power of a finite discrete-time signal is defined as the mean of the signal's energy
\[
P = \frac{E}{n_2 - n_1 + 1} = \frac{1}{n_2 - n_1 + 1} \sum_{n=n_1}^{n_2} x[n]^2
\]
- The signal power is computed by taking the integral of the power spectral density (PSD)
\[
P = \int_{-\pi}^{\pi} \left| \sum_{n=0}^{N-1} x[n]\, e^{-j\omega n} \right|^{2} d\omega
\]
Column  Feature                  Passed-away patients  Alive patients
1       Maximum                  97.82                 90.92
2       Minimum                  80.69                 76.24
3       Mean                     88.46                 81.92
4       Median                   88.45                 81.81
5       Mode                     85.25                 79.98
6       Standard deviation       2.63                  2.25
7       Variance                 15.84                 11.56
8       Range                    17.13                 14.68
9       Kurtosis                 17.48                 17.85
10      Skewness                 0.83                  1.02
11      Averaged power           8186.02               7045.04
12      Energy spectral density  5114.78               4420.38
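The first eleven features in the table can be computed from a single heart rate signal with the standard library alone. The sketch below is an assumption-laden illustration, not the authors' code: it uses population (not sample) variance, defines kurtosis and skewness through central moments without bias correction, and omits the energy spectral density feature because its exact normalisation is not given here. The function names and the sample signal are hypothetical.

```python
from statistics import mean, median, mode, pstdev, pvariance


def central_moment(signal, order):
    """Return the central moment of the given order (population form)."""
    mu = mean(signal)
    return sum((x - mu) ** order for x in signal) / len(signal)


def describe_signal(signal):
    """Compute statistical features of one signal as a name -> value mapping."""
    m2 = central_moment(signal, 2)
    return {
        "maximum": max(signal),
        "minimum": min(signal),
        "mean": mean(signal),
        "median": median(signal),
        "mode": mode(signal),
        "standard_deviation": pstdev(signal),
        "variance": pvariance(signal),
        "range": max(signal) - min(signal),
        # Moment-based shape features; a constant signal has no spread, so use 0.0.
        "kurtosis": central_moment(signal, 4) / m2 ** 2 if m2 > 0 else 0.0,
        "skewness": central_moment(signal, 3) / m2 ** 1.5 if m2 > 0 else 0.0,
        # Averaged power: mean of the squared samples.
        "averaged_power": sum(x * x for x in signal) / len(signal),
    }


# Hypothetical heart rate samples in beats per minute.
heart_rate = [80.0, 82.0, 82.0, 84.0, 92.0]
features = describe_signal(heart_rate)
```

Applied to every patient's first hour of heart rate data, a function like this would yield one fixed-length feature vector per patient for the classifiers.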
14. Unbiased and efficient modeling
• Avoid bias against the minority class (the imbalanced data challenge)
Using adaptive semi-unsupervised weighted oversampling (A-SUWO)
• Robust against time dependency of samples
Design experiments based on a 10-fold cross-validation strategy
• Handling the high computational complexity of big data
Leveraging parallel programming
• Efficiency vs. transparency
Comparing both black-box and interpretable classifiers
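The 10-fold cross-validation strategy can be sketched as an index splitter: the samples are shuffled once, divided into ten folds, and each fold serves as the test set exactly once while the other nine form the training set. The function name, the fixed seed, and the sample count of 25 are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
```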
15. Classification
Classifier            Precision  Recall  F1-score  Interpretability
Random forest         0.97       0.97    0.97      Hard
Gaussian SVM          0.95       0.96    0.96      Hard
Decision tree         0.90       0.92    0.91      Easy
Boosted trees         0.91       0.83    0.87      Hard
K-NN                  0.80       0.85    0.82      Hard
Logistic regression   0.77       0.67    0.72      Easy
Linear discriminant   0.78       0.66    0.71      Easy
Linear SVM            0.80       0.63    0.70      Easy
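The F1-score is the harmonic mean of precision and recall, so the third column can be checked against the first two. The sketch below recomputes it from the rounded values in the table; because the inputs are already rounded to two decimals, a recomputed score can differ from the reported one by up to about 0.01. The dictionary layout is an assumption made for the example.

```python
def f1_score(precision, recall):
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "Random forest": (0.97, 0.97, 0.97),
    "Gaussian SVM": (0.95, 0.96, 0.96),
    "Decision tree": (0.90, 0.92, 0.91),
    "Boosted trees": (0.91, 0.83, 0.87),
    "K-NN": (0.80, 0.85, 0.82),
    "Logistic regression": (0.77, 0.67, 0.72),
    "Linear discriminant": (0.78, 0.66, 0.71),
    "Linear SVM": (0.80, 0.63, 0.70),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}
```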
17. • Special thanks to Dr. Banerjee
for access to MIMIC-III, support in processing and storing this big data, consultation with physicians, weekly meetings, and guidance
The paper has been submitted to CHASE 2018
The paper and these slides are accessible via the following links
https://arxiv.org/abs/1803.06589
https://www.slideshare.net/RezaSadeghi4
The code will be published on my GitHub soon
https://github.com/RezaSadeghiWSU
19. Extra
• Gini index
\[
GDI = 1 - \sum_{i} p(i)^2
\]
where p(i) is the fraction of records at the node that belong to class i
• The risk of splitting
\[
Risk(x) = GDI(x) \cdot probability(x)
\]
• The node probability is defined as the number of records reaching the node, divided by the total number of records.
• We utilize a resampling method called adaptive semi-unsupervised weighted oversampling (A-SUWO)
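The Gini index and the risk of a node can be worked through on a small example. The sketch below assumes that the class proportions are the fractions of each label among the records reaching the node; the function names, the labels, and the record counts are hypothetical.

```python
from collections import Counter


def gini_diversity_index(labels):
    """Return 1 minus the sum of squared class proportions at a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def node_risk(node_labels, total_records):
    """Weight the node's Gini index by the probability of reaching the node."""
    node_probability = len(node_labels) / total_records
    return gini_diversity_index(node_labels) * node_probability


# Hypothetical node: 4 of the 10 records reach it, one of them from a patient who died.
node = ["died", "alive", "alive", "alive"]
gdi = gini_diversity_index(node)        # 1 - (0.25**2 + 0.75**2) = 0.375
risk = node_risk(node, total_records=10)  # 0.375 * 0.4 = 0.15
```

A pure node, where every record has the same label, has a Gini index of 0 and therefore contributes no risk.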
Editor's Notes
The Photo originated from
https://articles.mercola.com/sites/articles/archive/2010/03/13/hospitals-now-kill-48000-in-us-per-year-up-nearly-500-percent.aspx
The Photo originated from
https://www.rand.org/topics/health-care-cost-inflation.html
The Photo originated from
https://www.statnews.com/2016/09/07/hospital-icu-modernize/
The news titles accessible from internet:
1. Stanford's AI Predicts Death for Better End-of-Life Care
https://spectrum.ieee.org/the-human-os/biomedical/diagnostics/stanfords-ai-predicts-death-for-better-end-of-life-care
2. NEW SOFTWARE MEASURES MORTALITY RISK AT ADMISSION
https://ryortho.com/breaking/new-software-measures-mortality-risk-at-admission/
3. TEAM DEVELOPS EARLY MORTALITY RISK PREDICTION TOOL FOR INTENSIVE CARE
http://www.porthosp.nhs.uk/PHTNEWS/Team-develops-early-mortality-risk-prediction-tool-for-intensive-care.htm
[1]-> [11]
1. interpretation: if an individual patient's record has multiple entries missing, this may be because the patient was regarded as less sick than others and so was not prioritized. Equally, the patient may have been regarded as extremely sick and died before much could be done. Distinguishing these cases is not simple in the absence of other information.
[2] -> [42]
[3] -> [44]
[4,5] -> [42, 45]
Diseases of the circulatory system formed the biggest category of primary issues in the patient admissions recorded in the whole MIMIC-III.
The PSD is the Fourier transform of the biased estimate of the autocorrelation sequence.