Predictive Medicine

BY: Khuloud Edwards
HAP464: Data Analysis

Q:1
Clean the data. Remove zombies (visits from dead patients), diagnoses before birth (negative age at diagnosis), patients with more than 365 diagnoses per year of age, diagnoses after age 110, records with no age at diagnosis or no diagnosis, and patients with less than 365 days between first and last diagnosis. Report the number of patients and diagnoses excluded from the study as you cleaned the data.

Data cleaning flow (starting from 17,443,442 Dx):
1. Removed invalid data (patients with visits after their death): 168 IDs
2. Removed patients with > 365 Dx per year of age: 56 IDs
3. Removed patients with < 365 days between first and last Dx: 829,632 IDs
4. Removed records with:
   a. NULL AgeAtDx
   b. Negative AgeAtDx
   c. AgeAtDx >= 110
   d. NULL ICD9
Result: #DATACLN with 17,379,218 Dx
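The cleaning rules can be sketched in plain Python as below. This is a minimal illustration, not the author's SQL: the row layout (a hypothetical `Dx` record with `patient_id`, `age_at_dx` in years, `icd9`, and `age_at_death`) is an assumption, and so is using the patient's oldest age at diagnosis as "age" in the diagnoses-per-year rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dx:
    # Hypothetical record layout; the source's actual column names may differ
    patient_id: int
    age_at_dx: Optional[float]  # age in years at diagnosis
    icd9: Optional[str]
    age_at_death: Optional[float] = None  # None if no recorded death


def clean(rows, max_age=110, max_dx_per_year=365, min_span_days=365):
    # 1. Zombies: any diagnosis after the recorded death removes the whole patient
    zombies = {
        r.patient_id
        for r in rows
        if r.age_at_death is not None
        and r.age_at_dx is not None
        and r.age_at_dx > r.age_at_death
    }
    rows = [r for r in rows if r.patient_id not in zombies]

    # 2. Row-level filters: NULL, negative, or >= 110 age at diagnosis; NULL ICD9 code
    rows = [
        r for r in rows
        if r.age_at_dx is not None and 0 <= r.age_at_dx < max_age and r.icd9 is not None
    ]

    # 3. Patient-level filters computed on the remaining diagnoses
    ages = defaultdict(list)
    for r in rows:
        ages[r.patient_id].append(r.age_at_dx)
    keep = set()
    for pid, a in ages.items():
        # Assumption: Dx per year of age uses the oldest age at diagnosis (floored at 1 year)
        dx_per_year = len(a) / max(max(a), 1.0)
        span_days = (max(a) - min(a)) * 365
        if dx_per_year <= max_dx_per_year and span_days >= min_span_days:
            keep.add(pid)
    return [r for r in rows if r.patient_id in keep]
```

Comparing patient and diagnosis counts before and after each step gives the exclusion numbers the question asks for.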

Q:2
For patients who have diabetes, identify the average age at diabetes diagnosis and the standard deviation of that age. Select a baseline period (e.g., diagnoses before age 50) that is relatively diabetes free. Exclude patients who had diabetes during the baseline period. Report the baseline period and exclusions.

Average (AvgAge) and standard deviation (SdDevAge) of the age at diabetes diagnosis, in years, for patients with diabetes:
AvgAge    SdDevAge
62.2      12.4
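A minimal sketch of this summary, assuming rows of `(patient_id, age_at_dx, icd9)` and assuming diabetes is identified by ICD-9 codes starting with "250" (the source does not list its diabetes codes). Each patient contributes one value, the age at the first diabetes diagnosis. `statistics.stdev` is the sample standard deviation (like T-SQL's `STDEV`); the source does not say which form was used.

```python
import statistics


def first_dm_ages(rows):
    """Return {patient_id: age at first diabetes diagnosis}."""
    first = {}
    for pid, age, code in rows:
        if code.startswith("250"):  # assumption: ICD-9 250.xx marks diabetes
            first[pid] = min(age, first.get(pid, age))
    return first


def dm_age_summary(rows):
    ages = list(first_dm_ages(rows).values())
    # Sample standard deviation; requires at least two diabetic patients
    return statistics.mean(ages), statistics.stdev(ages)
```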

• Baseline period: age 50 or younger. Patients with AgeAtFirstDM <= 50 had diabetes during the baseline period and are excluded.
• Number of patients excluded (diabetes during the baseline period): 40,432 IDs
• Patients with diabetes first diagnosed after age 50, after removing NULL values:
  4,519,842 Dx; 194,157 IDs
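The baseline exclusion can be sketched as follows, again assuming `(patient_id, age_at_dx, icd9)` rows and ICD-9 "250" codes for diabetes (both assumptions). Any patient whose first diabetes diagnosis falls at or before the cutoff is dropped entirely, so the remaining patients are diabetes free during the baseline period.

```python
BASELINE_MAX_AGE = 50  # baseline period: ages 50 and younger


def exclude_baseline_dm(rows, max_age=BASELINE_MAX_AGE):
    """Drop every patient whose first diabetes diagnosis is within the baseline period."""
    first_dm = {}
    for pid, age, code in rows:
        if code.startswith("250"):  # assumption: ICD-9 250.xx marks diabetes
            first_dm[pid] = min(age, first_dm.get(pid, age))
    excluded = {pid for pid, age in first_dm.items() if age <= max_age}
    kept = [r for r in rows if r[0] not in excluded]
    return kept, excluded
```

The size of `excluded` corresponds to the excluded-patient count reported above.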

Q:3
Calculate the likelihood ratio of diabetes for each diagnosis from a randomly selected 90% training set. The medical history for diabetic patients is any diagnosis that precedes diabetes in the baseline period. For non-diabetic patients, it is any diagnosis they had in the baseline period. In this analysis we are doing a case-control study, where cases are patients with diabetes and controls are patients without diabetes.
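The medical-history definition can be sketched as below, assuming `(patient_id, age_at_dx, icd9)` rows, the age-50 baseline, and ICD-9 "250" codes for diabetes (all assumptions about the underlying data). Each patient's history is a set, so a diagnosis counts once per patient no matter how often it recurs.

```python
BASELINE_MAX_AGE = 50  # baseline period: ages 50 and younger


def baseline_histories(rows, max_age=BASELINE_MAX_AGE):
    """Return ({patient_id: set of history codes}, {patient_id: has_dm})."""
    first_dm = {}
    for pid, age, code in rows:
        if code.startswith("250"):  # assumption: ICD-9 250.xx marks diabetes
            first_dm[pid] = min(age, first_dm.get(pid, age))
    histories, has_dm = {}, {}
    for pid, age, code in rows:
        histories.setdefault(pid, set())
        has_dm[pid] = pid in first_dm
        if age > max_age or code.startswith("250"):
            continue  # outside the baseline period, or the diabetes diagnosis itself
        if pid in first_dm and age >= first_dm[pid]:
            continue  # for cases, history must precede diabetes
        histories[pid].add(code)
    return histories, has_dm
```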

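Training Set and Validation Set
Training set (TS), 90%: WHERE Rand(ID) <= .9
   15,566,207 Dx; 743,734 IDs
Validation set (VS), 10%: FROM #DATACLN a LEFT JOIN #TrainID b (patients in #DATACLN that are not in the training set)
   1,813,011 Dx; 85,843 IDs
Together the two sets contain all 17,379,218 Dx in #DATACLN.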
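A minimal sketch of a patient-level 90/10 split. Splitting by patient ID rather than by diagnosis keeps all of a patient's records in the same set, which is what selecting on `Rand(ID)` achieves. The function name and the fixed seed are hypothetical choices made here for reproducibility; they are not from the source.

```python
import random


def split_patients(patient_ids, train_frac=0.9, seed=464):
    """Assign each distinct patient (not each diagnosis) to training or validation."""
    rng = random.Random(seed)  # hypothetical fixed seed, for a reproducible split
    train, valid = set(), set()
    for pid in sorted(set(patient_ids)):  # sorted so the assignment order is deterministic
        if rng.random() < train_frac:
            train.add(pid)
        else:
            valid.add(pid)
    return train, valid
```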

Likelihood Ratio
$$ LR = \frac{\text{Number of patients with Dx and DM} \,/\, \text{Total number of patients with DM}}{\text{Number of patients with Dx and not DM} \,/\, \text{Total number of patients without DM}} $$
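A minimal sketch of the likelihood-ratio formula, assuming per-patient baseline histories (sets of codes) and a diabetes flag per patient as inputs; these input names are hypothetical. A code that never appears among controls would give an infinite ratio, and this sketch simply skips such codes, which is an assumption since the source does not say how zero counts were handled.

```python
from collections import Counter


def likelihood_ratios(histories, has_dm):
    """histories: {pid: set of codes}; has_dm: {pid: bool}. Returns {code: LR}."""
    n_dm = sum(1 for pid in histories if has_dm[pid])
    n_ctrl = len(histories) - n_dm
    dm_counts, ctrl_counts = Counter(), Counter()
    for pid, codes in histories.items():
        # Count each code once per patient ("number of patients with Dx")
        (dm_counts if has_dm[pid] else ctrl_counts).update(set(codes))
    lrs = {}
    for code in set(dm_counts) | set(ctrl_counts):
        p_dm = dm_counts[code] / n_dm
        p_ctrl = ctrl_counts[code] / n_ctrl
        if p_ctrl > 0:  # assumption: skip codes with an undefined (infinite) LR
            lrs[code] = p_dm / p_ctrl
    return lrs
```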

Q:4
Use the likelihood ratios for the medical history to predict the probability of diabetes in the 10% validation set that was set aside.
$$ \text{Probability of DM} = \frac{\text{Posterior odds}}{1 + \text{Posterior odds}} $$
$$ \text{Posterior odds} = \text{Prior odds} \times \text{MM Index} $$
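A minimal sketch of the prediction step. The source does not define the MM Index; this sketch assumes it is the product of the likelihood ratios of the diagnoses in the patient's history (the usual naive Bayes combination) and ignores codes with no LR. Prior odds would be the number of diabetic patients divided by the number of non-diabetic patients in the training set.

```python
import math


def predict_dm_probability(history, lrs, prior_odds):
    # Assumption: MM Index = product of the LRs of the patient's history codes
    mm_index = math.prod(lrs[c] for c in set(history) if c in lrs)
    posterior_odds = prior_odds * mm_index
    return posterior_odds / (1 + posterior_odds)
```

With an empty history the product is 1, so the prediction falls back to the prior probability.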

Q:5
Calculate the sensitivity and specificity of the predictions. Calculate the area under the receiver operating characteristic (ROC) curve.

Sensitivity: the true positive rate, i.e., the proportion of actual positives that are correctly identified as such (here, the percentage of patients with DM whom the model correctly identifies as having the condition). It measures the test's ability to correctly detect patients who do have the condition.

Specificity: the true negative rate, i.e., the proportion of actual negatives that are correctly identified as such (here, the percentage of patients without DM whom the model correctly identifies as not having the condition). It measures the test's ability to return a negative result for patients known not to have the disease.
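In terms of the four cells of the contingency table (true positives TP, false negatives FN, false positives FP, true negatives TN), the two definitions are:

```latex
\mathrm{SN} = \frac{TP}{TP + FN}, \qquad \mathrm{SP} = \frac{TN}{TN + FP}
```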

Contingency Table
A contingency table is a table in matrix format that displays the (multivariate) frequency distribution of variables. Contingency tables are widely used in survey research, business intelligence, engineering, and scientific research. They provide a basic picture of the interrelation between two variables and can help reveal interactions between them.

Total population                  | Predicted condition positive | Predicted condition negative |
True condition positive (DM)      | True Positive (TP)           | False Negative (FN)          | SN = 96.4%
True condition negative (Control) | False Positive (FP)          | True Negative (TN)           | SP = 98%
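A minimal sketch of filling the table from predicted probabilities and computing SN and SP. The 0.5 cutoff is an assumption; the source does not state the threshold used to turn probabilities into positive or negative predictions.

```python
def contingency_table(probs, labels, threshold=0.5):
    """probs: predicted P(DM); labels: True if the patient actually has DM."""
    tp = fn = fp = tn = 0
    for p, has_dm in zip(probs, labels):
        predicted = p >= threshold  # assumption: hypothetical 0.5 cutoff
        if has_dm and predicted:
            tp += 1
        elif has_dm:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def sensitivity_specificity(tp, fn, fp, tn):
    return tp / (tp + fn), tn / (tn + fp)
```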

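Receiver Operating Characteristic (ROC) Curve
The accompanying graph shows the number of patients with and without the disease, arranged according to the value of a diagnostic test. The two distributions overlap: the test (like most tests) does not distinguish normal from disease with 100% accuracy, and the area of overlap indicates where the test cannot distinguish normal from disease. The ROC curve plots sensitivity against 1 - specificity as the decision threshold moves across the test values; the area under the curve (AUC) summarizes how well the test separates the two groups (0.5 is no better than chance, 1.0 is perfect separation).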
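A minimal sketch of the AUC using the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen patient with DM receives a higher predicted probability than a randomly chosen control, with ties counted as one half. Sorting makes it O(n log n), which matters for validation sets of this size. It assumes both classes are present.

```python
def roc_auc(probs, labels):
    """AUC from predicted probabilities and True/False DM labels."""
    pairs = sorted(zip(probs, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the run of tied scores and give each the average 1-based rank
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y)
    # Mann-Whitney U for the positives, normalized by the number of pos/neg pairs
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```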

Q:6
Is the model accurate enough to guide individual patients? Public policy? Prepare a presentation about the value of predictive medicine and your effort.

Conclusion
• Predictive medicine is a branch of medicine that aims to identify patients at risk of developing a disease, thereby enabling either prevention or early treatment of that disease.
• To guide individual patients: use the likelihood ratio, which expresses how strongly a given test result (such as a diagnosis in the patient's history) is associated with the presence or absence of disease.
• For public policy: use probability, which describes the likelihood that an event will occur.
