Q1: The objective of the work. Why is this work important?
1. Identify which patients will die and when. To do that, we must first identify the errors in the dataset, for example:
A: Patients who continue to visit the medical center after their death is reported,
B: Patients with no visits, and
C: Missing values, which must be imputed.
2. Calculate the risk associated with each diagnosis using likelihood ratios.
3. Predict the probability of mortality for patients based on their hospital diagnoses.
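The error checks in objective 1 can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the (patient_id, visit_date) record layout, the `death_dates` mapping, the `registered_ids` list of all known patients, and the function name `flag_data_errors` are hypothetical. Records with a null date are only flagged here; imputing them would be a separate step.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical layout: a visit is (patient_id, visit_date); None stands for a null date.
Visit = Tuple[int, Optional[date]]


def flag_data_errors(
    visits: List[Visit],
    death_dates: Dict[int, date],
    registered_ids: Iterable[int],
) -> Dict[int, Set[str]]:
    """Return {patient_id: error labels} for the error types listed in objective 1."""
    flags: Dict[int, Set[str]] = {}
    seen: Set[int] = set()
    for pid, visit_date in visits:
        seen.add(pid)
        if visit_date is None:
            # Null date: a candidate for imputation (error C).
            flags.setdefault(pid, set()).add("missing_value")
            continue
        death = death_dates.get(pid)
        if death is not None and visit_date > death:
            # Visit recorded after the reported death (error A).
            flags.setdefault(pid, set()).add("visit_after_death")
    for pid in registered_ids:
        if pid not in seen:
            # Registered patient with no visits at all (error B).
            flags.setdefault(pid, set()).add("no_visits")
    return flags
```

Patients with no problems do not appear in the returned dictionary, so its keys are exactly the IDs that need attention.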
Q2: The data and its source. How many cases are there, and what is the distribution of the data?
Q3: Preparation of the data. How many patients and diagnoses were excluded by each criterion? Show this as a flow diagram depicting the reduction in sample size at each step.
17,443,442 cases in total (including living and dead patients, patients who continued to visit the medical center after their reported death, and negative and null values).
Flow diagram of data preparation:
17,443,442 cases (live, dead, visit after death, null, and negative values)
↓ Remove patients who visited after death: 168 IDs (without duplication)
17,432,694 cases (live, dead, null, and negative values)
↓ Remove cases > 365 dx, null, and negative values
17,379,218 cases
↓ Remove duplication (distinct IDs) and split 80% / 20%
80% training set: 13,760,073 Dx (657,885 distinct IDs)
20% validation set: 3,619,145 Dx (171,692 distinct IDs)
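The exclusion steps in the diagram can be sketched as below, assuming the 80/20 split is made at the patient level so that no patient appears in both sets. The record layout, the hypothetical `value` field checked against 365, and the function name `prepare_and_split` are illustrative assumptions; the fixed seed only makes the shuffle reproducible.

```python
import random
from typing import List, Optional, Set, Tuple

# Hypothetical layout: (patient_id, icd9_code, value), where "value" stands in for
# the quantity checked by the "> 365, null, negative" step of the diagram.
Record = Tuple[int, str, Optional[int]]


def prepare_and_split(
    records: List[Record],
    after_death_ids: Set[int],
    train_frac: float = 0.8,
    seed: int = 0,
) -> Tuple[List[Record], List[Record]]:
    # Step 1: drop every record of patients who visited after their reported death.
    kept = [r for r in records if r[0] not in after_death_ids]
    # Step 2: drop null, negative, and > 365 values.
    kept = [r for r in kept if r[2] is not None and 0 <= r[2] <= 365]
    # Step 3: shuffle the distinct patient IDs and split them, not the rows.
    ids = sorted({r[0] for r in kept})
    random.Random(seed).shuffle(ids)
    train_ids = set(ids[: round(len(ids) * train_frac)])
    train = [r for r in kept if r[0] in train_ids]
    valid = [r for r in kept if r[0] not in train_ids]
    return train, valid
```

Splitting by ID rather than by row means the diagnosis counts in each set are only approximately 80% and 20%, which is expected when patients have different numbers of diagnoses.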
Q4: Calculation of likelihood ratios. Give examples of the 10 most deadly and 10 least deadly diseases.
Likelihood ratio (LR):
$$\mathrm{LR}(\mathrm{Dx}) = \frac{(\text{patients who died within 6 months with Dx}) \,/\, (\text{total dead patients})}{(\text{patients still alive after 6 months with Dx}) \,/\, (\text{total alive patients})}$$
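A minimal Python sketch of this formula, assuming each patient's record has been reduced to a set of ICD9 codes and a known 6-month outcome. `likelihood_ratios` and its `pseudo_count` smoothing parameter are hypothetical; with `pseudo_count=0` it computes the ratio exactly as written, and a diagnosis seen only in patients who died gets an infinite LR.

```python
from collections import defaultdict
from typing import Dict, Set


def likelihood_ratios(
    patient_dx: Dict[int, Set[str]],
    died_within_6m: Set[int],
    pseudo_count: float = 0.0,
) -> Dict[str, float]:
    """LR per diagnosis; pseudo_count is an assumed smoothing term, 0.0 = exact formula."""
    total_dead = sum(1 for pid in patient_dx if pid in died_within_6m)
    total_alive = len(patient_dx) - total_dead
    dead_with: Dict[str, int] = defaultdict(int)
    alive_with: Dict[str, int] = defaultdict(int)
    for pid, codes in patient_dx.items():
        counter = dead_with if pid in died_within_6m else alive_with
        for code in codes:  # a set, so each patient counts once per diagnosis
            counter[code] += 1
    lrs: Dict[str, float] = {}
    for code in set(dead_with) | set(alive_with):
        numerator = (dead_with[code] + pseudo_count) / total_dead
        denominator = (alive_with[code] + pseudo_count) / total_alive
        lrs[code] = numerator / denominator if denominator > 0 else float("inf")
    return lrs
```

A small positive `pseudo_count` keeps every LR finite, which matters if the LRs are later multiplied or logged; whether the project used any smoothing is not stated.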
10 Most Deadly:
ICD9 Code   LR   Diagnosis
I853.05 14.51 Unspecified intracranial hemorrhage following injury without mention of open intracranial wound
I798.2 14.51 Death occurring in less than 24 hours from onset of symptoms, not otherwise explained
I183.2 9.67 Malignant neoplasm of fallopian tube
I798.2 9.67 Unspecified intracranial hemorrhage following injury without mention of open intracranial wound
I194.8 0.67 Malignant neoplasm of other endocrine glands and related structures
I960.7 9.67 Poisoning by antineoplastic antibiotics
I862.21 9.67 Injury to bronchus without mention of open wound into cavity
I852.05 9.67 Subarachnoid hemorrhage following injury without mention of open intracranial wound
I718.59 9.67 Ankylosis of joint, multiple sites
I531.21 9.67 Acute gastric ulcer with hemorrhage and perforation, with obstruction
10 Least Deadly:
ICD9 Code   LR   Diagnosis
I218.9 0.0045 Leiomyoma of uterus, unspecified
I626.2 0.0049 Disorders of menstruation
I478.0 0.0082 Hypertrophy of nasal turbinates
I599.7 0.0089 Hematuria
I620.2 0.0125 Other and unspecified ovarian cyst
I717.83 0.0138 Old disruption of anterior cruciate ligament
I474.00 0.0141 Effusion, right foot
I296.42 0.0143 Bipolar I disorder, most recent episode (or current) manic, moderate
I716.17 0.0143 Traumatic arthropathy, ankle and foot
IV57.22 0.0165 Operations On Urinary Bladder
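Once every diagnosis has an LR, the two lists above amount to sorting. In this sketch the function name and the `min_patients` filter (which would skip diagnoses recorded for very few patients) are hypothetical additions, not something the slides describe.

```python
from typing import Dict, List, Tuple


def most_and_least_deadly(
    lrs: Dict[str, float],
    patient_counts: Dict[str, int],
    k: int = 10,
    min_patients: int = 1,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Optionally ignore rare diagnoses, whose LRs rest on very few patients.
    eligible = [(code, lr) for code, lr in lrs.items()
                if patient_counts.get(code, 0) >= min_patients]
    # Ties on LR are broken by code so the output is deterministic.
    most = sorted(eligible, key=lambda item: (-item[1], item[0]))[:k]
    least = sorted(eligible, key=lambda item: (item[1], item[0]))[:k]
    return most, least
```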
Q5: How medical history can be used to predict prognosis. Give an example of making a prediction for one person, including their medical history and the prediction.
id   Death   ICD9List   LR
778942   89.5
ICD9 list (23 ICD9 codes): I041.6, I197.0, I198.2, I272.4, I276.1, I294.10, I331.0, I401.9, I427.31, I428.0, I441.4, I584.9, I591., I599.0, I600.01, I682.6, I715.38, I728.88, I733.14, I780.97, IV10.83, IV57.1, IV58.61
I041.6: Proteus infection in conditions classified elsewhere and of unspecified site (a bacterial infection) 0.9889
I197.0: Secondary malignant neoplasm of lung 4.4349
I198.2: Secondary malignant neoplasm of skin 3.5932
I294.10: Alzheimer's dementia 1.5544
I441.4: Abdominal aneurysm without mention of rupture 0.6378
I331.0: Alzheimer's disease 1.5857
IV58.61: Long-term (current) use of anticoagulants (drugs that prevent blood from forming clots) 0.5804
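One common way to turn a patient's list of diagnosis LRs into a single number is to multiply them, which treats the diagnoses as conditionally independent (the naive Bayes assumption). The slides do not state how the patient's value was computed, so this is a hedged sketch: `combined_lr` and its default of 1.0 for codes without an LR are assumptions, and it has not been checked against the 89.5 shown above.

```python
import math
from typing import Dict, Iterable


def combined_lr(codes: Iterable[str], lrs: Dict[str, float], default: float = 1.0) -> float:
    """Multiply per-diagnosis LRs, assuming the diagnoses are conditionally independent.

    Summing logs instead of multiplying avoids overflow or underflow for long
    diagnosis lists. Codes missing from `lrs` get `default` (1.0 = no evidence
    either way). Every LR must be finite and strictly positive for math.log.
    """
    return math.exp(sum(math.log(lrs.get(code, default)) for code in codes))
```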
Q6: The accuracy of the prediction (how well does the model predict mortality?). Give a contingency table.
Sensitivity: true positive rate (the percentage of people with the condition who are correctly identified as HAVING it)
Specificity: true negative rate (the percentage of people without the condition who are correctly identified as NOT having it)
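Sensitivity and specificity both come from the 2x2 contingency table of predicted versus actual death. A minimal sketch, assuming the model outputs a probability per patient and that a hypothetical cut-off of 0.5 turns it into a yes/no prediction; the function names are illustrative.

```python
from typing import Dict, Sequence


def contingency_table(died: Sequence[bool], predicted_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, int]:
    """2x2 counts of predicted vs actual death; the 0.5 cut-off is an assumption."""
    table = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, prob in zip(died, predicted_prob):
        predicted = prob >= threshold
        if predicted and actual:
            table["TP"] += 1
        elif predicted:
            table["FP"] += 1
        elif actual:
            table["FN"] += 1
        else:
            table["TN"] += 1
    return table


def sensitivity(t: Dict[str, int]) -> float:
    return t["TP"] / (t["TP"] + t["FN"])  # share of deaths correctly predicted


def specificity(t: Dict[str, int]) -> float:
    return t["TN"] / (t["TN"] + t["FP"])  # share of survivors correctly predicted
```

Raising the threshold trades sensitivity for specificity, so the chosen cut-off should be reported alongside the table.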
$$\text{Posterior odds} = \text{Prior odds} \times \text{MM Index}$$
$$\text{Probability of mortality} = \frac{\text{Posterior odds}}{1 + \text{Posterior odds}}$$
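The two formulas translate directly into code. A minimal sketch: the name `mortality_probability` is hypothetical, and using the training-set death rate as the prior probability is an assumption, since the slides do not say how the prior odds were chosen.

```python
def mortality_probability(prior_probability: float, mm_index: float) -> float:
    """Convert a prior probability to odds, scale by the MM Index, convert back.

    prior_probability is assumed to be a baseline 6-month death rate (for example
    the share of dead patients in the training set); it must lie strictly
    between 0 and 1.
    """
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * mm_index
    return posterior_odds / (1.0 + posterior_odds)
```

Working in odds rather than probabilities is what makes the update a single multiplication; converting back at the end keeps the result between 0 and 1.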
Q7: Usefulness of the project to others and to you (reflection on what you learned)
A: Data analysis is important to every business.
B: Healthcare facilities such as hospitals and clinics, both large and small, are beginning to use big data and associated analysis methods to gain information that helps them better support their facilities and serve their customers.
C: Combining multiple data sources creates new expectations for consistent data quality and for measuring the quality of health care.
D: Decisions made in preparing the data could radically change the findings