SlideShare a Scribd company logo
1 of 4
Download to read offline
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 12, Issue 6 (Jul. - Aug. 2013), PP 83-86
www.iosrjournals.org
www.iosrjournals.org 83 | Page
A Heart Disease Prediction Model using Decision Tree
Atul Kumar Pandey1
,Prabhat Pandey2
,K.L. Jaiswal3
,Ashish Kumar Sen4
1
Assistant Professor of Computer Science, Department of Physics, Govt. PG Science College, Rewa(M.P.)-
India,
2
OSD, Additional Directorate, Higher Education, Division Rewa (M.P.)-India,
3
Assistant Professor and In charge of BCA, DCA & PGDCA, Department of Physics, Govt. PG Science College,
Rewa(M.P.)-India-486001,
4
Assistant Professor, Department of Mathematics & Computer Science, Govt. PG Science College, Rewa(M.P.)-
India,
Abstract—In this paper, we develop a heart disease prediction model that can assist medical professionals in
predicting heart disease status based on the clinical data of patients. Firstly, we select 14 important clinical
features, i.e., age, sex, chest pain type, trestbps, cholesterol, fasting blood sugar, resting ecg, max heart rate,
exercise induced angina, old peak, slope, number of vessels colored, thal and diagnosis of heart disease.
Secondly, we develop an prediction model using J48 decision tree for classifying heart disease based on these
clinical features against unpruned, pruned and pruned with reduced error pruning approach.. Finally, the
accuracy of Pruned J48 Decision Tree with Reduced Error Pruning Approach is more better then the simple
Pruned and Unpruned approach. The result obtained that which shows that fasting blood sugar is the most
important attribute which gives better classification against the other attributes but its gives not better
accuracy.
Keywords—Data mining, Reduced Error Pruning, Gain Ratio and Decision Tree.
I. Introduction
The diagnosis of heart disease in most cases depends on a complex combination of clinical and
pathological data; this complexity leads to the excessive medical costs affecting the quality of the medical care
[1]. According to the statistic data from WHO, one third population worldwide died from heart disease; heart
disease is found to be the leading cause of death in developing countries by 2010. It shows one third American
adult have one or more types of heart diseases based on American Heart Association report. Computational
biology is often applied in the process of translating biological knowledge into clinical practice, as well as in the
understanding of biological phenomena from the clinical data. The discovery of biomarkers in heart disease is
one of the key contributions using computational biology. This process involves the development of a predictive
model and the integration of different types of data and knowledge for diagnostic purposes.
Furthermore, this process requires the design and combination of different methodologies from
statistical analysis and data mining [2,3].
In the past decades, data mining have played an important role in heart disease research. To find the
hidden medical information from the different expression between the healthy and the heart disease individuals
in the existed clinical data is a noticeable and powerful approach in the study of heart disease classification.
Heart disease classification provides the critical basis for the therapy of patients. Statistics and machine learning
are two main approaches which have been applied to predict the status of heart disease based on the expression
of the clinical data [4,5].
Data mining (DM) is the core stage of knowledge discovery in databases (KDD), which is a "nontrivial
extraction of implicit, novel, and potentially useful information from data” [6]. It applies machine learning and
statistical methods in order to discover areas of previously unknown knowledge. As a rule, the KDD process
involves the following steps: data selection, data pre-processing, transformation, DM (induction of useful
patterns), and interpretation of results. Several data mining techniques are used in the diagnosis of heart disease
such as naïve bayes, decision tree, neural network, kernel density, bagging algorithm, and support vector
machine showing different levels of accuracies.
One of the most common classification models is the decision tree, which is a tree-like structure where
each internal node denotes a test on a predictive attribute and each branch denotes an attribute value. A leaf
node represents predicted classes or class distributions [7]. An unlabeled object is classified by starting at the
topmost (root) node of the tree, then traversing the tree, based on the values of the predictive attributes in this
object. Decision-tree techniques assume that the data objects are described by a fixed set of attributes, where
each predictive attribute takes a small number of disjoint possible values and the target (dependent) variable has
discrete output values, each value representing a class label.
A Heart Disease Prediction Model using Decision Tree
www.iosrjournals.org 84 | Page
There are several known algorithms of decision tree induction: ID3 - which uses information gain with
statistical pre-pruning, C4.5, an advanced version of ID3, and probably the most popular decision-tree algorithm
[8], ART, which minimizes a cost-complexity function, See5 - which builds several models and uses unequal
misclassification costs, and IFN – Info-Fuzzy Network which utilizes information theory to minimize the
number of predictive attributes in a decision-tree model [9] [10]. In [9], the IFN algorithm is shown empirically
to produce more compact models than C4.5, while preserving nearly the same level of classification accuracy.
II. Methods
A. Data sources
In this paper, we use the heart disease data from machine learning repository of UCI [11]. We have
total 303 instances of which 164 instances belonged to the healthy and 139 instances belonged to the heart
disease. 14 clinical features have been recorded for each instance.
B. Features description
Table 1 shows the 14 clinical features and their description.
. Table 1- Clinical features and their description
Clinical features Description
Age Instance age in years
Sex Instance gender
Cp Chest pain type
Trestbps (mmHg) Resting blood pressure
Chol (mg/dl) Serum cholesterol
Fbs Fasting blood sugar
Restecg Resting electrocardiographic results
Thalach Maximum heart rate achieved
Exang Exercise induced angina
Oldpeak ST depression induced by exercise relative to rest
Slope The slope of the peak exercise ST segment
Ca Number of major vessels (0-3)colored by flourosopy
Thal 3 = normal; 6 = fixed defect; 7 = reversible defect
Num Diagnosis of heart disease
In Table 1, there are 14 attributes used in this system, including 8 symbolic and 6 numeric: age (age in
years), sex (male, female), Chest pain type (typical angina, atypical angina, non-angina pain, asymptomatic),
Trestbps (resting blood pressure in mm Hg), cholesterol (serum cholesterol in mg/dl), fasting blood sugar < 120
mg/dl (true or false), resting electrocardiographic results (normal, having ST-T wave abnormality, showing
probable or definite left ventricular hypertrophy by Estes' criteria), max heart rate, exercise induced angina (true
or false), oldpeak (ST depression induced by exercise relative to rest), slope (up, flat, down), number of vessels
colored by fluoroscopy (0-3), thal (normal, fixed defect, reversible defect), and class (healthy, with heart-
disease).
III. Decision Tree
The decision tree type used in this research is the gain ratio decision tree. The gain ratio decision tree is
based on the entropy (information gain) approach, which selects the splitting attribute that minimizes the value
of entropy, thus maximizing the information gain [15]. Information gain [12] is the difference between the
original information content and the amount of information needed. The features are ranked by the information
gains, and then the top ranked features are chosen as the potential attributes used in the classifier. To identify the
splitting attribute of the decision tree, one must calculate the information gain for each attribute and then select
the attribute that maximizes the information gain. The information gain for each attribute is calculated using the
following formula [14, 15]:
E =
Where k is the number of classes of the target attributes Pi is the number of occurrences of class i
divided by the total number of instances (i.e. the probability of i occurring). To reduce the effect of bias
resulting from the use of information gain, a variant known as gain ratio was introduced by the Australian
academic Ross Quinlan [15]. The information gain measure is biased toward tests with many outcomes. That is,
it prefers to select attributes having a large number of values [13]. Gain Ratio adjusts the information gain for
each attribute to allow for the breadth and uniformity of the attribute values.
Gain Ratio = Information Gain / Split Information
Where the split information is a value based on the column sums of the frequency table [15].
A Heart Disease Prediction Model using Decision Tree
www.iosrjournals.org 85 | Page
IV. Results and Discussion
We demonstrate here the usefulness of the prediction model to the clinical data of heart disease where
training instances 200 and testing instances 103 using split test mode.
Table 2- summarizes the results generated from different approches.
J48 Unpruned tree J48 Pruned tree J48 Reduced Error Pruning
Percent Correct 72.82 73.79 75.73
Number Correct 75 76 78
IR Precision 0.68 0.72 0.74
No. of Rules 14 33 11
Tree Size 24 56 17
F Measure 0.78 0.77 0.79
After extracting the decision tree rules, reduced error pruning was used to prune the extracted decision
rules. Reduced error pruning is one of the fastest pruning methods and known to produce both accurate and
small decision rules [16]. Applying reduced error pruning provides more compact decision rules and reduces the
number of extracted rules.
Fig 1- J48 pruned tree with reduced error pruning approach
V. Conclusion
This work will provide better diagnosis the heart patients than the previous. We develop a heart disease
prediction model that can assist medical professionals in predicting heart disease status based on the clinical
data of patients. Firstly, we select 14 important clinical features, i.e., age, sex, chest pain type, trestbps,
cholesterol, fasting blood sugar, resting ecg, max heart rate, exercise induced angina, old peak, slope, number of
vessels colored, thal and diagnosis of heart disease. Secondly, we develop a prediction model using J48 decision
tree for classifying heart disease based on these clinical features against unpruned, pruned and pruned with
reduced error pruning approach. Finally, the accuracy of Pruned J48 Decision Tree with Reduced Error Pruning
Approach is more better then the simple Pruned and Unpruned approach. The result obtained that which shows
that fasting blood sugar is the most important attribute which gives better classification against the other
attributes but its gives not better accuracy.
A Heart Disease Prediction Model using Decision Tree
www.iosrjournals.org 86 | Page
References
[1.] Wu R, Peters W, Morgan MW. The next generation clinical decision support: linking evidence to best practice. J Healthc Inf
Manag, 2002; 16:50-5.
[2.] Thuraisingham BM. A Primer for Understanding and applying data mining. IT Professional 2000; 1:28-31.
[3.] Rajkumar A, Reena GS. Diagnosis of heart disease using datamining algorithm. Global Journal of Computer Science and
Technology 2010; 10:38-43.
[4.] Anbarasi M, Anupriya E, Iyengar NCHSN. Enhanced prediction of heart Disease with feature subset selection using genetic
algorithm. International Journal of Engineering Science and Technology 2010; 2:5370-76.
[5.] Palaniappan S, Awang R. Intelligent heart disease prediction system using data mining techniques. International Journal of
Computer Science and Network Security 2008; 8:343-50.
[6.] J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2006.
[7.] T. Mitchell, Machine Learning, McGraw Hill, 1997.
[8.] J.R. Quinlan: C4.5, Programs for MachineLearning, Morgan Kaufmann, 1993.
[9.] M. Last and O. Maimon, “A Compact and Accurate Model for Classification”, IEEE Transactions on Knowledge and Data
Engineering 2004; 16, 2: 203-215.
[10.] O. Maimon and M. Last, Knowledge Discovery and Data Mining – The InfoFuzzy Network (IFN) Methodology, Kluwer Academic
Publishers, Massive Computing, Boston, December 2000.
[11.] UCI Machine Learning Repository [homepage on the Internet]. Arlington: The Association; 2006 [updated 1996 Dec 3; cited 2011
Feb 2]. Available from: http://archive.ics.uci.edu/ml/datasets/Heart+Disease.
[12.] Kwong-Sak Leung,kin hong Lee,Jin-Feng Wang,Eddie Y.T.Ng,Henry L.Y.Chan,Stephen K.W.Tsui,Tony S.K. Mok,Pete Chi-Hang
Tse, Joseph Jao-yui Sung, Data Mining on DNA Sequences of Hepatitis B virus. IEEE/ACM Transactions on Computational
Biology and Bioinformatics, Vol 8,No 2,March/April 2011.
[13.] Sellappan Palaniappan Rafiah Awang, Intelligent Heart Disease Prediction System Using Data Mining Techniques. IJCSNS
International Journal of Computer Science and Network Security, VOL.8 No.8, August 2008.
[14.] Han, j. and M. Kamber, Data Mining Concepts and Techniques. 2006: Morgan Kaufmann Publishers. Lee, I.-N., S.-C. Liao, and M.
Embrechts, Data.
[15.] Bramer, M., Principles of data mining. 2007: Springer.
[16.] Esposito, F., D. Malerba and G. Semeraro, A Comparative Analysis of method for pruning decision trees. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 997. Vol. 19, No. 5.

More Related Content

What's hot

Heart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine LearningHeart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine Learningmohdshoaibuddin1
 
Hybrid Technique for Associative Classification of Heart Diseases
Hybrid Technique for Associative Classification of Heart DiseasesHybrid Technique for Associative Classification of Heart Diseases
Hybrid Technique for Associative Classification of Heart DiseasesJagdeep Singh Malhi
 
Detection of heart diseases by data mining
Detection of heart diseases by data miningDetection of heart diseases by data mining
Detection of heart diseases by data miningAbheepsa Pattnaik
 
Chronic Kidney Disease Prediction
Chronic Kidney Disease PredictionChronic Kidney Disease Prediction
Chronic Kidney Disease PredictionRajandeep Gill
 
Brain tumor mri image segmentation and detection
Brain tumor mri image segmentation and detectionBrain tumor mri image segmentation and detection
Brain tumor mri image segmentation and detectioneSAT Publishing House
 
Predict Breast Cancer using Deep Learning
Predict Breast Cancer using Deep LearningPredict Breast Cancer using Deep Learning
Predict Breast Cancer using Deep LearningAyesha Shafique
 
Health care analytics
Health care analyticsHealth care analytics
Health care analyticsGaurav Dubey
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment systemKOYELMAJUMDAR1
 
Normalization & join
Normalization & joinNormalization & join
Normalization & joinSohag Babu
 
3 V's of Big Data
3 V's of Big Data3 V's of Big Data
3 V's of Big Datasunil173422
 
DISEASE PREDICTION SYSTEM USING DATA MINING
DISEASE PREDICTION SYSTEM USING  DATA MININGDISEASE PREDICTION SYSTEM USING  DATA MINING
DISEASE PREDICTION SYSTEM USING DATA MININGshivaniyadav112
 
Classification Techniques
Classification TechniquesClassification Techniques
Classification TechniquesKiran Bhowmick
 
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...cscpconf
 
Disease prediction and doctor recommendation system
Disease prediction and doctor recommendation systemDisease prediction and doctor recommendation system
Disease prediction and doctor recommendation systemsabafarheen
 
Datascience - bigmart data analysis
Datascience - bigmart data analysisDatascience - bigmart data analysis
Datascience - bigmart data analysisRvk Reddy
 
Big Data in Manufacturing Final PPT
Big Data in Manufacturing Final PPTBig Data in Manufacturing Final PPT
Big Data in Manufacturing Final PPTNikhil Atkuri
 
Grid based method & model based clustering method
Grid based method & model based clustering methodGrid based method & model based clustering method
Grid based method & model based clustering methodrajshreemuthiah
 

What's hot (20)

Heart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine LearningHeart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine Learning
 
Hybrid Technique for Associative Classification of Heart Diseases
Hybrid Technique for Associative Classification of Heart DiseasesHybrid Technique for Associative Classification of Heart Diseases
Hybrid Technique for Associative Classification of Heart Diseases
 
Retail Analytics
Retail AnalyticsRetail Analytics
Retail Analytics
 
Detection of heart diseases by data mining
Detection of heart diseases by data miningDetection of heart diseases by data mining
Detection of heart diseases by data mining
 
Final ppt
Final pptFinal ppt
Final ppt
 
Chronic Kidney Disease Prediction
Chronic Kidney Disease PredictionChronic Kidney Disease Prediction
Chronic Kidney Disease Prediction
 
Brain tumor mri image segmentation and detection
Brain tumor mri image segmentation and detectionBrain tumor mri image segmentation and detection
Brain tumor mri image segmentation and detection
 
Predict Breast Cancer using Deep Learning
Predict Breast Cancer using Deep LearningPredict Breast Cancer using Deep Learning
Predict Breast Cancer using Deep Learning
 
Health care analytics
Health care analyticsHealth care analytics
Health care analytics
 
Disease Prediction And Doctor Appointment system
Disease Prediction And Doctor Appointment  systemDisease Prediction And Doctor Appointment  system
Disease Prediction And Doctor Appointment system
 
Normalization & join
Normalization & joinNormalization & join
Normalization & join
 
KNN
KNNKNN
KNN
 
3 V's of Big Data
3 V's of Big Data3 V's of Big Data
3 V's of Big Data
 
DISEASE PREDICTION SYSTEM USING DATA MINING
DISEASE PREDICTION SYSTEM USING  DATA MININGDISEASE PREDICTION SYSTEM USING  DATA MINING
DISEASE PREDICTION SYSTEM USING DATA MINING
 
Classification Techniques
Classification TechniquesClassification Techniques
Classification Techniques
 
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...
 
Disease prediction and doctor recommendation system
Disease prediction and doctor recommendation systemDisease prediction and doctor recommendation system
Disease prediction and doctor recommendation system
 
Datascience - bigmart data analysis
Datascience - bigmart data analysisDatascience - bigmart data analysis
Datascience - bigmart data analysis
 
Big Data in Manufacturing Final PPT
Big Data in Manufacturing Final PPTBig Data in Manufacturing Final PPT
Big Data in Manufacturing Final PPT
 
Grid based method & model based clustering method
Grid based method & model based clustering methodGrid based method & model based clustering method
Grid based method & model based clustering method
 

Similar to A Heart Disease Prediction Model using Decision Tree

08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyanIAESIJEECS
 
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...IJCSEA Journal
 
A Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction TechniquesA Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction Techniquesijtsrd
 
Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...IJECEIAES
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksIAEME Publication
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksIAEME Publication
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksIAEME Publication
 
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...mlaij
 
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...mlaij
 
Heart Attack Prediction System Using Fuzzy C Means Classifier
Heart Attack Prediction System Using Fuzzy C Means ClassifierHeart Attack Prediction System Using Fuzzy C Means Classifier
Heart Attack Prediction System Using Fuzzy C Means ClassifierIOSR Journals
 
Prediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewPrediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewIRJET Journal
 
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNINGHEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNINGIJDKP
 
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes ClassifierIRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes ClassifierIRJET Journal
 
Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...BASMAJUMAASALEHALMOH
 
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...IRJET Journal
 
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...IRJET Journal
 
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationAcute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationCemal Ardil
 
Mining of medical data to identify risk factors of heart disease using freque...
Mining of medical data to identify risk factors of heart disease using freque...Mining of medical data to identify risk factors of heart disease using freque...
Mining of medical data to identify risk factors of heart disease using freque...IRJET Journal
 
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...ijdmtaiir
 

Similar to A Heart Disease Prediction Model using Decision Tree (20)

Heart failure prediction based on random forest algorithm using genetic algo...
Heart failure prediction based on random forest algorithm  using genetic algo...Heart failure prediction based on random forest algorithm  using genetic algo...
Heart failure prediction based on random forest algorithm using genetic algo...
 
08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan
 
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
 
A Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction TechniquesA Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction Techniques
 
Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networks
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networks
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networks
 
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...
 
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
 
Heart Attack Prediction System Using Fuzzy C Means Classifier
Heart Attack Prediction System Using Fuzzy C Means ClassifierHeart Attack Prediction System Using Fuzzy C Means Classifier
Heart Attack Prediction System Using Fuzzy C Means Classifier
 
Prediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewPrediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A Review
 
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNINGHEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
 
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes ClassifierIRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
 
Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...
 
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
 
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
 
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationAcute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
 
Mining of medical data to identify risk factors of heart disease using freque...
Mining of medical data to identify risk factors of heart disease using freque...Mining of medical data to identify risk factors of heart disease using freque...
Mining of medical data to identify risk factors of heart disease using freque...
 
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 

Recently uploaded (20)

Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 

A Heart Disease Prediction Model using Decision Tree

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 12, Issue 6 (Jul. - Aug. 2013), PP 83-86 www.iosrjournals.org www.iosrjournals.org 83 | Page A Heart Disease Prediction Model using Decision Tree Atul Kumar Pandey1 ,Prabhat Pandey2 ,K.L. Jaiswal3 ,Ashish Kumar Sen4 1 Assistant Professor of Computer Science, Department of Physics, Govt. PG Science College, Rewa(M.P.)- India, 2 OSD, Additional Directorate, Higher Education, Division Rewa (M.P.)-India, 3 Assistant Professor and In charge of BCA, DCA & PGDCA, Department of Physics, Govt. PG Science College, Rewa(M.P.)-India-486001, 4 Assistant Professor, Department of Mathematics & Computer Science, Govt. PG Science College, Rewa(M.P.)- India, Abstract—In this paper, we develop a heart disease prediction model that can assist medical professionals in predicting heart disease status based on the clinical data of patients. Firstly, we select 14 important clinical features, i.e., age, sex, chest pain type, trestbps, cholesterol, fasting blood sugar, resting ecg, max heart rate, exercise induced angina, old peak, slope, number of vessels colored, thal and diagnosis of heart disease. Secondly, we develop an prediction model using J48 decision tree for classifying heart disease based on these clinical features against unpruned, pruned and pruned with reduced error pruning approach.. Finally, the accuracy of Pruned J48 Decision Tree with Reduced Error Pruning Approach is more better then the simple Pruned and Unpruned approach. The result obtained that which shows that fasting blood sugar is the most important attribute which gives better classification against the other attributes but its gives not better accuracy. Keywords—Data mining, Reduced Error Pruning, Gain Ratio and Decision Tree. I. Introduction The diagnosis of heart disease in most cases depends on a complex combination of clinical and pathological data; this complexity leads to the excessive medical costs affecting the quality of the medical care [1]. According to the statistic data from WHO, one third population worldwide died from heart disease; heart disease is found to be the leading cause of death in developing countries by 2010. It shows one third American adult have one or more types of heart diseases based on American Heart Association report. Computational biology is often applied in the process of translating biological knowledge into clinical practice, as well as in the understanding of biological phenomena from the clinical data. The discovery of biomarkers in heart disease is one of the key contributions using computational biology. This process involves the development of a predictive model and the integration of different types of data and knowledge for diagnostic purposes. Furthermore, this process requires the design and combination of different methodologies from statistical analysis and data mining [2,3]. In the past decades, data mining have played an important role in heart disease research. To find the hidden medical information from the different expression between the healthy and the heart disease individuals in the existed clinical data is a noticeable and powerful approach in the study of heart disease classification. Heart disease classification provides the critical basis for the therapy of patients. Statistics and machine learning are two main approaches which have been applied to predict the status of heart disease based on the expression of the clinical data [4,5]. Data mining (DM) is the core stage of knowledge discovery in databases (KDD), which is a "nontrivial extraction of implicit, novel, and potentially useful information from data” [6]. It applies machine learning and statistical methods in order to discover areas of previously unknown knowledge. As a rule, the KDD process involves the following steps: data selection, data pre-processing, transformation, DM (induction of useful patterns), and interpretation of results. Several data mining techniques are used in the diagnosis of heart disease such as naïve bayes, decision tree, neural network, kernel density, bagging algorithm, and support vector machine showing different levels of accuracies. One of the most common classification models is the decision tree, which is a tree-like structure where each internal node denotes a test on a predictive attribute and each branch denotes an attribute value. A leaf node represents predicted classes or class distributions [7]. An unlabeled object is classified by starting at the topmost (root) node of the tree, then traversing the tree, based on the values of the predictive attributes in this object. Decision-tree techniques assume that the data objects are described by a fixed set of attributes, where each predictive attribute takes a small number of disjoint possible values and the target (dependent) variable has discrete output values, each value representing a class label.
  • 2. A Heart Disease Prediction Model using Decision Tree www.iosrjournals.org 84 | Page There are several known algorithms of decision tree induction: ID3 - which uses information gain with statistical pre-pruning, C4.5, an advanced version of ID3, and probably the most popular decision-tree algorithm [8], ART, which minimizes a cost-complexity function, See5 - which builds several models and uses unequal misclassification costs, and IFN – Info-Fuzzy Network which utilizes information theory to minimize the number of predictive attributes in a decision-tree model [9] [10]. In [9], the IFN algorithm is shown empirically to produce more compact models than C4.5, while preserving nearly the same level of classification accuracy. II. Methods A. Data sources In this paper, we use the heart disease data from machine learning repository of UCI [11]. We have total 303 instances of which 164 instances belonged to the healthy and 139 instances belonged to the heart disease. 14 clinical features have been recorded for each instance. B. Features description Table 1 shows the 14 clinical features and their description. . Table 1- Clinical features and their description Clinical features Description Age Instance age in years Sex Instance gender Cp Chest pain type Trestbps (mmHg) Resting blood pressure Chol (mg/dl) Serum cholesterol Fbs Fasting blood sugar Restecg Resting electrocardiographic results Thalach Maximum heart rate achieved Exang Exercise induced angina Oldpeak ST depression induced by exercise relative to rest Slope The slope of the peak exercise ST segment Ca Number of major vessels (0-3)colored by flourosopy Thal 3 = normal; 6 = fixed defect; 7 = reversible defect Num Diagnosis of heart disease In Table 1, there are 14 attributes used in this system, including 8 symbolic and 6 numeric: age (age in years), sex (male, female), Chest pain type (typical angina, atypical angina, non-angina pain, asymptomatic), Trestbps (resting blood pressure in mm Hg), cholesterol (serum cholesterol in mg/dl), fasting blood sugar < 120 mg/dl (true or false), resting electrocardiographic results (normal, having ST-T wave abnormality, showing probable or definite left ventricular hypertrophy by Estes' criteria), max heart rate, exercise induced angina (true or false), oldpeak (ST depression induced by exercise relative to rest), slope (up, flat, down), number of vessels colored by fluoroscopy (0-3), thal (normal, fixed defect, reversible defect), and class (healthy, with heart- disease). III. Decision Tree The decision tree type used in this research is the gain ratio decision tree. The gain ratio decision tree is based on the entropy (information gain) approach, which selects the splitting attribute that minimizes the value of entropy, thus maximizing the information gain [15]. Information gain [12] is the difference between the original information content and the amount of information needed. The features are ranked by the information gains, and then the top ranked features are chosen as the potential attributes used in the classifier. To identify the splitting attribute of the decision tree, one must calculate the information gain for each attribute and then select the attribute that maximizes the information gain. The information gain for each attribute is calculated using the following formula [14, 15]: E = Where k is the number of classes of the target attributes Pi is the number of occurrences of class i divided by the total number of instances (i.e. the probability of i occurring). To reduce the effect of bias resulting from the use of information gain, a variant known as gain ratio was introduced by the Australian academic Ross Quinlan [15]. The information gain measure is biased toward tests with many outcomes. That is, it prefers to select attributes having a large number of values [13]. Gain Ratio adjusts the information gain for each attribute to allow for the breadth and uniformity of the attribute values. Gain Ratio = Information Gain / Split Information Where the split information is a value based on the column sums of the frequency table [15].
  • 3. A Heart Disease Prediction Model using Decision Tree www.iosrjournals.org 85 | Page IV. Results and Discussion We demonstrate here the usefulness of the prediction model to the clinical data of heart disease where training instances 200 and testing instances 103 using split test mode. Table 2- summarizes the results generated from different approches. J48 Unpruned tree J48 Pruned tree J48 Reduced Error Pruning Percent Correct 72.82 73.79 75.73 Number Correct 75 76 78 IR Precision 0.68 0.72 0.74 No. of Rules 14 33 11 Tree Size 24 56 17 F Measure 0.78 0.77 0.79 After extracting the decision tree rules, reduced error pruning was used to prune the extracted decision rules. Reduced error pruning is one of the fastest pruning methods and known to produce both accurate and small decision rules [16]. Applying reduced error pruning provides more compact decision rules and reduces the number of extracted rules. Fig 1- J48 pruned tree with reduced error pruning approach V. Conclusion This work will provide better diagnosis the heart patients than the previous. We develop a heart disease prediction model that can assist medical professionals in predicting heart disease status based on the clinical data of patients. Firstly, we select 14 important clinical features, i.e., age, sex, chest pain type, trestbps, cholesterol, fasting blood sugar, resting ecg, max heart rate, exercise induced angina, old peak, slope, number of vessels colored, thal and diagnosis of heart disease. Secondly, we develop a prediction model using J48 decision tree for classifying heart disease based on these clinical features against unpruned, pruned and pruned with reduced error pruning approach. Finally, the accuracy of Pruned J48 Decision Tree with Reduced Error Pruning Approach is more better then the simple Pruned and Unpruned approach. The result obtained that which shows that fasting blood sugar is the most important attribute which gives better classification against the other attributes but its gives not better accuracy.
  • 4. A Heart Disease Prediction Model using Decision Tree www.iosrjournals.org 86 | Page References [1.] Wu R, Peters W, Morgan MW. The next generation clinical decision support: linking evidence to best practice. J Healthc Inf Manag, 2002; 16:50-5. [2.] Thuraisingham BM. A Primer for Understanding and applying data mining. IT Professional 2000; 1:28-31. [3.] Rajkumar A, Reena GS. Diagnosis of heart disease using datamining algorithm. Global Journal of Computer Science and Technology 2010; 10:38-43. [4.] Anbarasi M, Anupriya E, Iyengar NCHSN. Enhanced prediction of heart Disease with feature subset selection using genetic algorithm. International Journal of Engineering Science and Technology 2010; 2:5370-76. [5.] Palaniappan S, Awang R. Intelligent heart disease prediction system using data mining techniques. International Journal of Computer Science and Network Security 2008; 8:343-50. [6.] J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2006. [7.] T. Mitchell, Machine Learning, McGraw Hill, 1997. [8.] J.R. Quinlan: C4.5, Programs for MachineLearning, Morgan Kaufmann, 1993. [9.] M. Last and O. Maimon, “A Compact and Accurate Model for Classification”, IEEE Transactions on Knowledge and Data Engineering 2004; 16, 2: 203-215. [10.] O. Maimon and M. Last, Knowledge Discovery and Data Mining – The InfoFuzzy Network (IFN) Methodology, Kluwer Academic Publishers, Massive Computing, Boston, December 2000. [11.] UCI Machine Learning Repository [homepage on the Internet]. Arlington: The Association; 2006 [updated 1996 Dec 3; cited 2011 Feb 2]. Available from: http://archive.ics.uci.edu/ml/datasets/Heart+Disease. [12.] Kwong-Sak Leung,kin hong Lee,Jin-Feng Wang,Eddie Y.T.Ng,Henry L.Y.Chan,Stephen K.W.Tsui,Tony S.K. Mok,Pete Chi-Hang Tse, Joseph Jao-yui Sung, Data Mining on DNA Sequences of Hepatitis B virus. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol 8,No 2,March/April 2011. [13.] Sellappan Palaniappan Rafiah Awang, Intelligent Heart Disease Prediction System Using Data Mining Techniques. IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.8, August 2008. [14.] Han, j. and M. Kamber, Data Mining Concepts and Techniques. 2006: Morgan Kaufmann Publishers. Lee, I.-N., S.-C. Liao, and M. Embrechts, Data. [15.] Bramer, M., Principles of data mining. 2007: Springer. [16.] Esposito, F., D. Malerba and G. Semeraro, A Comparative Analysis of method for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 997. Vol. 19, No. 5.