Report on Predicting Who Will Die When
By: Khuloud Edwards
HAP 464 Data Analysis
Q1: The objective of the work. Why is this work important?
1. Identify which patients will die and when. To do that, we must first identify and handle the errors in the dataset. For example:
A: Patients who continue to visit the medical center after their death is reported,
B: Patients with no visits, and
C: Missing values, which must be imputed.
2. Calculate the risk associated with each diagnosis using likelihood ratios.
3. Predict the probability of mortality for patients based on their hospital diagnoses.
2. The data and its source. How many cases, what is the distribution of the data
3. Preparation of the data. How many patients and diagnoses were excluded by each criterion.
Show this as a flow diagram depicting the reduction in sample at each step.
17,443,442 cases
(Includes live and dead patients, patients who continue to visit the medical center after their death, and null and negative values)
-> Remove patients with visits after death (168 IDs without duplication)
17,432,694 cases
(Live, dead, null, and negative values)
-> Remove cases > 365 dx, null, and negative values
17,379,218 cases
-> Split into training and validation sets; remove duplication to count distinct IDs
80% Training Set: 13,760,073 Dx (657,885 distinct IDs)
20% Validation Set: 3,619,145 Dx (171,692 distinct IDs)
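The cleaning and splitting steps in the flow above could be sketched roughly as follows. This is a minimal illustration, not the procedure actually used for the report: the record layout (`patient_id`, `icd9_code`, `visit_date`), the `death_dates` lookup, and the function name `clean_and_split` are all assumptions, and splitting by patient rather than by diagnosis row is a design choice made here so that one patient's diagnoses never appear in both sets.

```python
import random
from datetime import date


def clean_and_split(records, death_dates, train_fraction=0.8, seed=0):
    """Clean diagnosis records and split them into training and validation sets.

    records: list of (patient_id, icd9_code, visit_date) tuples (hypothetical layout).
    death_dates: dict mapping patient_id -> date of death (absent if alive).
    """
    # Step 1: flag patients with any visit recorded after their reported death.
    visit_after_death = {
        pid for pid, _, visit in records
        if pid in death_dates and visit is not None and visit > death_dates[pid]
    }
    # Step 2: drop flagged patients and records with a missing code or date.
    kept = [
        (pid, code, visit) for pid, code, visit in records
        if pid not in visit_after_death and code and visit is not None
    ]
    # Step 3: split by patient so each patient's diagnoses stay in one set.
    patient_ids = sorted({pid for pid, _, _ in kept})
    random.Random(seed).shuffle(patient_ids)
    cut = int(len(patient_ids) * train_fraction)
    train_ids = set(patient_ids[:cut])
    train = [r for r in kept if r[0] in train_ids]
    validation = [r for r in kept if r[0] not in train_ids]
    return train, validation, visit_after_death


# Made-up records for illustration only.
records = [
    (1, "I197.0", date(2010, 1, 5)),
    (1, "I401.9", date(2010, 6, 1)),  # after reported death: patient 1 is dropped
    (2, "I599.7", date(2011, 2, 3)),
    (3, None, date(2011, 4, 9)),      # missing code: this record is dropped
    (3, "I218.9", date(2011, 5, 1)),
    (4, "I626.2", date(2012, 7, 7)),
    (5, "I478.0", date(2012, 8, 8)),
    (6, "I620.2", date(2012, 9, 9)),
]
death_dates = {1: date(2010, 3, 1)}
train, validation, flagged = clean_and_split(records, death_dates)
print(len(train), len(validation), flagged)
```

Counting distinct IDs in each returned list gives the per-set patient totals, in the same way the flow reports distinct IDs next to the Dx counts.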
4. Calculation of likelihood ratios. Give examples of 10 most deadly and 10 least deadly diseases.
Likelihood Ratio (LR) =
(Number of patients with the Dx who died within 6 months / Total number of dead patients)
/ (Number of patients with the Dx who were still alive after 6 months / Total number of alive patients)
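As a minimal sketch of the ratio defined above, the function below computes the LR for one diagnosis from four counts. The function name, parameter names, and the counts in the example are hypothetical and are not taken from the report's data; returning infinity when no surviving patient has the diagnosis is an assumption made here, not something the report specifies.

```python
def likelihood_ratio(dead_with_dx, total_dead, alive_with_dx, total_alive):
    """Likelihood ratio of a diagnosis for death within 6 months."""
    if total_dead <= 0 or total_alive <= 0:
        raise ValueError("total_dead and total_alive must be positive")
    # Share of dead patients who had the diagnosis.
    p_dx_given_dead = dead_with_dx / total_dead
    # Share of surviving patients who had the diagnosis.
    p_dx_given_alive = alive_with_dx / total_alive
    if p_dx_given_alive == 0:
        # Assumption: treat a diagnosis never seen among survivors as infinite LR.
        return float("inf")
    return p_dx_given_dead / p_dx_given_alive


# Hypothetical counts: 30 of 1,000 dead patients and 10 of 9,000 alive patients
# carry the diagnosis, which gives an LR of about 27.
lr = likelihood_ratio(dead_with_dx=30, total_dead=1000,
                      alive_with_dx=10, total_alive=9000)
print(round(lr, 2))
```

An LR above 1 means the diagnosis is more common among patients who died; an LR below 1 means it is more common among survivors.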
10 Most Deadly:
ICD9 code  LR     Diagnosis
I853.05    14.51  Unspecified intracranial hemorrhage following injury without mention of open intracranial wound
I798.2     14.51  Death occurring in less than 24 hours from onset of symptoms, not otherwise explained
I183.2     9.67   Malignant neoplasm of fallopian tube
I798.2     9.67   Unspecified intracranial hemorrhage following injury without mention of open intracranial wound
I194.8     0.67   Malignant neoplasm of other endocrine glands and related structures
I960.7     9.67   Poisoning by antineoplastic antibiotics
I862.21    9.67   Injury to bronchus without mention of open wound into cavity
I852.05    9.67   Subarachnoid hemorrhage following injury without mention of open intracranial wound
I718.59    9.67   Ankylosis of joint, multiple sites
I531.21    9.67   Acute gastric ulcer with hemorrhage and perforation, with obstruction
10 Least Deadly:
ICD9 code  LR      Diagnosis
I218.9     0.0045  Leiomyoma of uterus, unspecified
I626.2     0.0049  Disorders of menstruation
I478.0     0.0082  Hypertrophy of nasal turbinates
I599.7     0.0089  Hematuria
I620.2     0.0125  Other and unspecified ovarian cyst
I717.83    0.0138  Old disruption of anterior cruciate ligament
I474.00    0.0141  Effusion, right foot
I296.42    0.0143  Bipolar I disorder, most recent episode (or current) manic, moderate
I716.17    0.0143  Traumatic arthropathy, ankle and foot
IV57.22    0.0165  Operations On Urinary Bladder
5. How medical history could be used to predict prognosis. Give example of making prediction
for 1 person. Give his medical history and prediction
id Death ICD9List LR
778942 89.5
I041.6, I197.0, I198.2, I272.4, I276.1, I294.10, I331.0, I401.9, I427.31, I428.0, I441.4, I584.9, I591., I599.0, I600.01, I682.6, I715.38, I728.88, I733.14, I780.97, IV10.83, IV57.1, IV58.61 (23 ICD9 codes)
I041.6: Proteus infection in conditions classified elsewhere and of unspecified site (a bacterial infection) 0.9889
I197.0: Secondary malignant neoplasm of lung 4.4349
I198.2: Secondary malignant neoplasm of skin 3.5932
I294.10: Alzheimer's Dementia 1.5544
I441.4: Abdominal aneurysm without mention of rupture 0.6378
I331.0: Alzheimer's disease 1.5857
IV58.61: Long-term (current) use of anticoagulants (substances that prevent blood from forming clots) 0.5804
6. The accuracy of prediction (how well does the model predict mortality). Give a contingency
table.
Sensitivity: True Positive Rate (percentage of sick people who are correctly identified as
HAVING the condition)
Specificity: True Negative Rate (percentage of healthy people who are correctly identified as
NOT having the condition)
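To make the two definitions concrete, here is a small sketch that derives both rates from the four cells of a contingency table. The report gives no cell counts, so the numbers in the example are purely hypothetical, and the function name `sensitivity_specificity` is introduced only for illustration. In the mortality setting, "having the condition" would correspond to patients who died.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from contingency-table counts."""
    # Sensitivity: share of patients with the condition who were predicted positive.
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: share of patients without the condition who were predicted negative.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical contingency table (not the report's results):
#                  predicted dead   predicted alive
# actually dead          80               20
# actually alive        100              900
sens, spec = sensitivity_specificity(true_pos=80, false_neg=20,
                                     true_neg=900, false_pos=100)
print(sens, spec)
```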
Posterior odds = Prior odds * MM Index
Probability of Mortality = Posterior odds / (1 + Posterior odds)
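The two formulas above can be chained as in the sketch below. The report does not define the MM Index; this sketch assumes it is the product of the likelihood ratios of the patient's diagnoses, which treats the diagnoses as independent. The prior probability of 0.05 is a made-up value, and the seven LRs are only those listed for the example patient, not all 23 of that patient's codes, so the printed probability is not the report's prediction.

```python
import math


def mortality_probability(prior_probability, likelihood_ratios):
    """Posterior probability of mortality from a prior and per-diagnosis LRs."""
    if not 0 < prior_probability < 1:
        raise ValueError("prior_probability must be strictly between 0 and 1")
    prior_odds = prior_probability / (1 - prior_probability)
    # Assumption: MM Index = product of the individual likelihood ratios.
    mm_index = math.prod(likelihood_ratios)
    posterior_odds = prior_odds * mm_index
    return posterior_odds / (1 + posterior_odds)


# The seven LRs listed for the example patient; the prior is hypothetical.
listed_lrs = [0.9889, 4.4349, 3.5932, 1.5544, 0.6378, 1.5857, 0.5804]
probability = mortality_probability(prior_probability=0.05,
                                    likelihood_ratios=listed_lrs)
print(round(probability, 4))
```

With an empty list of LRs the product is 1, so the function returns the prior probability unchanged.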
7. Usefulness of the project to others and to you (introspection of what you learned)
A: Data analysis is important to all businesses.
B: Healthcare facilities such as hospitals and clinics, both large and small, are beginning to use big data
and associated analysis approaches to gain information that helps them better support their facilities and
serve their customers.
C: Combining multiple data sources creates new expectations for consistent quality and for measuring the
quality of health.
D: Decisions made in preparing the data could radically change the findings.
