Krishna Chaitanya Yarlagadda Main Poster- Memory Based Reasoning
1. Yarlagadda, Merla-
Introduction
Health can be affected by a person's physical and psychological
characteristics. Many types of models can be built to predict the occurrence
of a disease from a set of symptoms. The objective of this research is to
show that a memory-based reasoning (MBR) model can be more effective than the
other models compared in this domain, perhaps because MBR uses k nearest
neighbors: it predicts unknown values for a case from the k most similar
known cases. MBR may suit clinical data well because a disease and its
causes become known only after the symptoms of earlier patients with that
disease have been observed and analyzed.
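The k-nearest-neighbor idea behind MBR can be illustrated with a minimal Python sketch. This is not the SAS Enterprise Miner MBR node; the function name `knn_predict`, the Euclidean distance, the majority vote, and the small standardized data set are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest stored cases."""
    # Euclidean distance from the query to every stored case
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    # Keep the k closest cases and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, already-standardized cases: (age, systolic BP, cholesterol)
train_X = [(-1.0, -0.8, -0.9), (-0.7, -1.1, -0.5),
           (0.9, 1.2, 0.8), (1.1, 0.7, 1.3), (0.8, 1.0, 0.9)]
train_y = [0, 0, 1, 1, 1]  # 1 = heart disease present (made-up labels)

prediction = knn_predict(train_X, train_y, (1.0, 0.9, 1.0), k=3)
```

Because distances are sensitive to scale, the inputs are assumed to be standardized before the neighbors are searched.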
Data Preparation
The data was obtained from the UCI Machine Learning Repository. The data set
consists of 89,000 observations and 20 variables, such as triglycerides,
cholesterol, body fat percentage, HDL, LDL, systolic and diastolic BP,
percentage of fat in various body parts, and other contributing factors to
heart disease such as smoking and alcohol consumption.
Several imputation techniques, such as tree imputation and synthetic
distribution, were used to replace some of the missing values. The
distribution of each variable was examined, and some variables were
transformed toward a normal distribution using the Transform Variables node
in order to improve model performance.
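The two preparation steps, filling missing values and reducing skew, can be sketched in Python as follows. Note the substitution: this sketch uses median imputation and a log(1 + x) transform as simple stand-ins, not the tree imputation or synthetic distribution methods used in the study, and the readings are hypothetical.

```python
import math
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def log_transform(values):
    """Apply log(1 + x) to reduce right skew; assumes non-negative values."""
    return [math.log1p(v) for v in values]

# Hypothetical triglyceride readings with two missing entries
triglycerides = [110.0, None, 150.0, 480.0, None, 130.0]

filled = impute_median(triglycerides)   # missing entries become the median
transformed = log_transform(filled)     # the large reading is pulled toward the rest
```

The median is chosen here because it is less affected by extreme readings than the mean.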
Model building and evaluation
Regression, neural network, and decision tree models were compared with two
MBR models, one using the RD Tree method and one using the Scan method.
Validation average squared error, misclassification rate, the ROC curve, and
cumulative lift were used to evaluate the models. The RD Tree MBR model
turned out to be the best model, with a validation average squared error of
0.07, a misclassification rate of 0.58, and a cumulative lift of 1.76.
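For readers unfamiliar with these statistics, the sketch below computes three of them in Python from predicted probabilities. The definitions are common textbook forms and may differ in detail from what SAS Enterprise Miner reports; the 0.5 cutoff, the 20% depth, and the scores are assumptions chosen for illustration, not results from the study.

```python
def average_squared_error(y_true, p_hat):
    """Mean of squared differences between the outcome (0/1) and predicted probability."""
    return sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def misclassification_rate(y_true, p_hat, threshold=0.5):
    """Fraction of cases whose thresholded prediction disagrees with the outcome."""
    wrong = sum((p >= threshold) != bool(y) for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)

def cumulative_lift(y_true, p_hat, depth=0.2):
    """Event rate among the top `depth` fraction of scored cases, divided by the overall event rate."""
    ranked = sorted(zip(p_hat, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(depth * len(ranked)))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    overall_rate = sum(y_true) / len(y_true)
    return top_rate / overall_rate

# Hypothetical validation outcomes and model scores
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
p_hat = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.2, 0.4]

ase = average_squared_error(y_true, p_hat)
mis = misclassification_rate(y_true, p_hat)
lift = cumulative_lift(y_true, p_hat, depth=0.2)
```

A cumulative lift above 1 means the highest-scored cases contain a larger share of events than the data set as a whole.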
Figure 2. Model building
Discussion:
1) Both MBR methods worked very well for predicting heart disease from the
set of symptoms, because the MBR node retains the stored cases and their
symptoms in memory.
2) Rather than relying only on established models, it is worth trying
different types of newer models to uncover additional insights.
3) Clinical research organizations should try MBR models on clinical data
for disease prediction, because MBR was the best model in the case above:
it stores the training cases in memory and predicts from the k nearest
neighbors.
Data Insights:
Age is among the most significant factors for heart disease, along with the
main factors of high blood pressure, smoking, and high cholesterol.
The skinfold measurements of the abdomen and the thigh play a more
significant role than the other skinfold measurements in predicting body fat
percentage, which in turn might lead to heart disease.
The effective use of Memory Based Reasoning model in predicting the illness of
disease using SAS Enterprise Miner 12.1
Krishna Chaitanya Yarlagadda
Data Mining and Reporting Analyst, IQR Consulting, Oklahoma State University (Alumni), Stillwater, OK 74078
Faculty Advisor: Dr. Goutam Chakraborty
Figure 3. Prediction Accuracy Results
Figure 4. ROC curve of different models
Results
Figure 1. Existing and Proposed solution
Acknowledgement:
The authors wish to thank Dr. Goutam Chakraborty for his guidance and advice
on this project.
Authors Information:
Krishna Chaitanya Yarlagadda
E-mail: krishna.chaitanya.yarlagadda@okstate.edu
Work Phone: (269)365-1975
Figure 5. Cumulative lift of RD Tree MBR model