Using Bayes' Theorem to predict who will die within 6 months. Likelihood ratios were calculated for each icd9 code, together with sensitivity, specificity, and posterior odds.
3. Objective of work
• The objective of this work is to estimate the probability of death within
6 months based on the diagnosis, medical history, category of diagnosis and
recurring diagnoses.
• I used Bayes' Theorem to solve for likelihood ratio, probability,
sensitivity and specificity.
• This work is important because it supports evidence-based decisions: it
identifies living patients who need more attention or follow-up, and it uses
data from deceased patients to help understand the effects of these diseases.
4. Data & Its Source
• The original data source contains 17443442 rows of data.
• Sorted by icd9 count in descending order, the distribution is right-skewed:
a few codes are very frequent and the counts fall off in a long tail. Without
the sort, the counts show no visible pattern.
[Figure: Count of ICD9, Top 50. Chart of ICD9Count (axis from 0 to 1000000) by ICD9 Code.]
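The descending count behind the Top 50 chart can be sketched in a few lines. This is a minimal illustration, not the query used for the project: the `rows` list is made-up sample data, and `top_codes` is a hypothetical helper name.

```python
from collections import Counter

# Made-up (id, icd9) rows for illustration; the real source has 17443442 rows.
rows = [
    (1, "I401.9"), (2, "I401.9"), (3, "I401.9"),
    (1, "I272.4"), (2, "I272.4"),
    (3, "I305.1"),
]

def top_codes(rows, n=50):
    """Return the n most frequent icd9 codes as (code, count), highest first."""
    counts = Counter(code for _patient_id, code in rows)
    return counts.most_common(n)

top = top_codes(rows)
print(top)
```

Plotting the counts in this order puts the most frequent codes first and leaves the rare ones in the tail.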
5. Preparation of the Data
• Original data: 17443442 rows of id, icd9
• Remove cases where patients died before visit: 17443442 to 17439892 rows of
id, icd9
• Remove cases in a year that exceed 365: 829801 to 828616 ids
• Final data to use for analysis: 15715093 rows of id, icd9
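The first cleaning step, removing cases where the patient died before the visit, could look like the sketch below. The field names (`visit_date`, `death_date`) and the sample records are assumptions made for illustration; the slides do not show the actual schema or the SQL that was used.

```python
from datetime import date

# Hypothetical records; field names are illustrative, not the source schema.
visits = [
    {"id": 1, "icd9": "I401.9", "visit_date": date(2010, 3, 1), "death_date": None},
    {"id": 2, "icd9": "I427.5", "visit_date": date(2011, 6, 5), "death_date": date(2011, 6, 1)},
    {"id": 3, "icd9": "I428.0", "visit_date": date(2012, 1, 9), "death_date": date(2012, 4, 2)},
]

def drop_died_before_visit(records):
    """Keep rows where the patient is alive or died on or after the visit date."""
    return [
        r for r in records
        if r["death_date"] is None or r["death_date"] >= r["visit_date"]
    ]

clean = drop_died_before_visit(visits)
print([r["id"] for r in clean])
```

Patient 2 is dropped here because the recorded death precedes the visit, which is the kind of inconsistent row this step is meant to remove.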
7. Calculation of Likelihood Ratios
LR = (DeadwithDx / Dead) / (AlivewithDx / Alive)
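As a worked illustration of the ratio above, the sketch below computes LR from four counts. The function name and the sample counts are hypothetical, and returning infinity when no living patient has the diagnosis is a choice made for this sketch, not something stated in the slides.

```python
def likelihood_ratio(dead_with_dx, dead, alive_with_dx, alive):
    """LR = (DeadwithDx / Dead) / (AlivewithDx / Alive)."""
    if dead <= 0 or alive <= 0:
        raise ValueError("dead and alive totals must be positive")
    if alive_with_dx == 0:
        # Assumption for this sketch: no living patient has the diagnosis.
        return float("inf")
    return (dead_with_dx / dead) / (alive_with_dx / alive)

# Made-up counts: 30 of 1000 dead patients and 10 of 2000 living patients
# carry the diagnosis.
lr = likelihood_ratio(dead_with_dx=30, dead=1000, alive_with_dx=10, alive=2000)
print(round(lr, 2))
```

An LR above 1 means the diagnosis is more common among patients who died; an LR below 1 means it is more common among those who survived.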
8. Top Ten Deadliest Diseases – Each Dx from Training Set
Rank  icd9     LR     Diagnosis
1     I348.82  11.78  Brain Death
2     I789.51  5.53   Malignant Ascites
3     I156.9   5.04   Malignant Neoplasm of biliary tract
4     IV66.7   4.95   Encounter for palliative care
5     I427.5   4.85   Cardiac Arrest
6     I780.01  4.75   Coma
7     I511.81  4.62   Malignant Pleural Effusion
8     I198.7   4.45   Secondary Malignant Neoplasm of Adrenal Gland
9     I199.0   4.42   Disseminated Malignant Neoplasm without specification of site
10    I198.3   4.35   Secondary Malignant Neoplasm of brain and spinal cord
9. Ten Least Deadly Diseases – Each Dx from Training Set
Rank  icd9     LR      Diagnosis
1     I625.3   0.0009  Dysmenorrhea
2     I717.7   0.001   Chondromalacia of patella
3     I474.11  0.002   Hypertrophy of tonsils alone
4     IV15.5   0.002   Personal History of injury presenting hazards to health
5     I295.64  0.002   Schizophrenic disorders, residual type, chronic with acute exacerbation
6     I625.9   0.002   Unspecified symptom associated with female genital organs
7     I614.6   0.003   Pelvic peritoneal adhesions, female (postoperative)(post infection)
8     I616.0   0.004   Cervicitis and endocervicitis
9     I346.10  0.004   Migraine without aura, without mention of intractable migraine without mention of status migrainosus
10    I305.72  0.004   Amphetamine or related acting sympathomimetic abuse, episodic
10. How medical history can be used to predict prognosis
• Medical history can be used to predict prognosis because it allows us to
estimate the probability of developing another medical condition, of the same
condition recurring a given number of times, and of death within a certain
timespan.
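One way to turn a patient's history into a probability of death is the odds form of Bayes' Theorem: posterior odds = prior odds × LR of each diagnosis. The sketch below is illustrative only. It assumes the diagnoses are conditionally independent so their LRs can be multiplied, and the prior of 0.5 is a made-up value; neither is confirmed by the slides. The two LRs are the Cardiac Arrest (4.85) and Coma (4.75) values from the training set.

```python
import math

def posterior_probability(prior_prob, likelihood_ratios):
    """Odds form of Bayes' Theorem: posterior odds = prior odds * product of LRs."""
    if not 0 < prior_prob < 1:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    prior_odds = prior_prob / (1 - prior_prob)
    # Assumption: diagnoses are independent, so their LRs multiply.
    posterior_odds = prior_odds * math.prod(likelihood_ratios)
    return posterior_odds / (1 + posterior_odds)

# Hypothetical prior of 0.5; LRs for Cardiac Arrest and Coma from the slides.
p = posterior_probability(0.5, [4.85, 4.75])
print(round(p, 3))
```

With no diagnoses in the history the function returns the prior unchanged, which is a quick sanity check on the arithmetic.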
13. Accuracy of the prediction
• I used the formula: Accuracy = (TN+TP)/(TN+TP+FN+FP)
Probability cutoff  Sensitivity  Specificity  Accuracy
0                   1            0            50%
0.2                 0.738        0.493        62%
0.4                 0.3033       0.86         58%
0.8                 0.01206      0.997        50%
0.9                 0.00001084   0.9999       50%
0.95                4.0669E-06   1            50%
1                   0            1            50%
• Sensitivity – among patients with a disease, the probability of a positive test
• Specificity – among patients without disease, the probability of a negative test
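The three measures can be computed from confusion-matrix counts as in the sketch below. The counts are made up: they assume 1000 dead and 1000 alive patients and were chosen to reproduce the sensitivity and specificity reported at probability 0.2. The real counts are not given in the slides.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # positives correctly flagged
    specificity = tn / (tn + fp)   # negatives correctly cleared
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts with equal class sizes (1000 dead, 1000 alive).
sens, spec, acc = classification_metrics(tp=738, fp=507, tn=493, fn=262)
print(sens, spec, round(acc, 4))
```

When the two classes are the same size, accuracy is simply the average of sensitivity and specificity. The accuracy column in the table is consistent with that: (0.738 + 0.493) / 2 ≈ 0.62 and (0.3033 + 0.86) / 2 ≈ 0.58.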
15. Usefulness of the Project
• Hospitals can devote more time to the patients identified as high risk,
which may delay death or increase patient satisfaction.
• Researchers can use this project to better understand trends and stages of
different icd9 codes, e.g. why a given code is associated with such high or
low odds/probability of death.
• For me, it gave some insight into how to apply Bayes' Theorem (statistical
processes) to a big dataset and how to use different functions within SQL to
complete the desired tasks.