SlideShare a Scribd company logo
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 12, Issue 6 (Jul. - Aug. 2013), PP 83-86
www.iosrjournals.org
www.iosrjournals.org 83 | Page
A Heart Disease Prediction Model using Decision Tree
Atul Kumar Pandey1
,Prabhat Pandey2
,K.L. Jaiswal3
,Ashish Kumar Sen4
1
Assistant Professor of Computer Science, Department of Physics, Govt. PG Science College, Rewa(M.P.)-
India,
2
OSD, Additional Directorate, Higher Education, Division Rewa (M.P.)-India,
3
Assistant Professor and In charge of BCA, DCA & PGDCA, Department of Physics, Govt. PG Science College,
Rewa(M.P.)-India-486001,
4
Assistant Professor, Department of Mathematics & Computer Science, Govt. PG Science College, Rewa(M.P.)-
India,
Abstract—In this paper, we develop a heart disease prediction model that can assist medical professionals in
predicting heart disease status based on the clinical data of patients. Firstly, we select 14 important clinical
features, i.e., age, sex, chest pain type, trestbps, cholesterol, fasting blood sugar, resting ecg, max heart rate,
exercise induced angina, old peak, slope, number of vessels colored, thal and diagnosis of heart disease.
Secondly, we develop an prediction model using J48 decision tree for classifying heart disease based on these
clinical features against unpruned, pruned and pruned with reduced error pruning approach.. Finally, the
accuracy of Pruned J48 Decision Tree with Reduced Error Pruning Approach is more better then the simple
Pruned and Unpruned approach. The result obtained that which shows that fasting blood sugar is the most
important attribute which gives better classification against the other attributes but its gives not better
accuracy.
Keywords—Data mining, Reduced Error Pruning, Gain Ratio and Decision Tree.
I. Introduction
The diagnosis of heart disease in most cases depends on a complex combination of clinical and
pathological data; this complexity leads to the excessive medical costs affecting the quality of the medical care
[1]. According to the statistic data from WHO, one third population worldwide died from heart disease; heart
disease is found to be the leading cause of death in developing countries by 2010. It shows one third American
adult have one or more types of heart diseases based on American Heart Association report. Computational
biology is often applied in the process of translating biological knowledge into clinical practice, as well as in the
understanding of biological phenomena from the clinical data. The discovery of biomarkers in heart disease is
one of the key contributions using computational biology. This process involves the development of a predictive
model and the integration of different types of data and knowledge for diagnostic purposes.
Furthermore, this process requires the design and combination of different methodologies from
statistical analysis and data mining [2,3].
In the past decades, data mining have played an important role in heart disease research. To find the
hidden medical information from the different expression between the healthy and the heart disease individuals
in the existed clinical data is a noticeable and powerful approach in the study of heart disease classification.
Heart disease classification provides the critical basis for the therapy of patients. Statistics and machine learning
are two main approaches which have been applied to predict the status of heart disease based on the expression
of the clinical data [4,5].
Data mining (DM) is the core stage of knowledge discovery in databases (KDD), which is a "nontrivial
extraction of implicit, novel, and potentially useful information from data” [6]. It applies machine learning and
statistical methods in order to discover areas of previously unknown knowledge. As a rule, the KDD process
involves the following steps: data selection, data pre-processing, transformation, DM (induction of useful
patterns), and interpretation of results. Several data mining techniques are used in the diagnosis of heart disease
such as naïve bayes, decision tree, neural network, kernel density, bagging algorithm, and support vector
machine showing different levels of accuracies.
One of the most common classification models is the decision tree, which is a tree-like structure where
each internal node denotes a test on a predictive attribute and each branch denotes an attribute value. A leaf
node represents predicted classes or class distributions [7]. An unlabeled object is classified by starting at the
topmost (root) node of the tree, then traversing the tree, based on the values of the predictive attributes in this
object. Decision-tree techniques assume that the data objects are described by a fixed set of attributes, where
each predictive attribute takes a small number of disjoint possible values and the target (dependent) variable has
discrete output values, each value representing a class label.
A Heart Disease Prediction Model using Decision Tree
www.iosrjournals.org 84 | Page
There are several known algorithms of decision tree induction: ID3 - which uses information gain with
statistical pre-pruning, C4.5, an advanced version of ID3, and probably the most popular decision-tree algorithm
[8], ART, which minimizes a cost-complexity function, See5 - which builds several models and uses unequal
misclassification costs, and IFN – Info-Fuzzy Network which utilizes information theory to minimize the
number of predictive attributes in a decision-tree model [9] [10]. In [9], the IFN algorithm is shown empirically
to produce more compact models than C4.5, while preserving nearly the same level of classification accuracy.
II. Methods
A. Data sources
In this paper, we use the heart disease data from machine learning repository of UCI [11]. We have
total 303 instances of which 164 instances belonged to the healthy and 139 instances belonged to the heart
disease. 14 clinical features have been recorded for each instance.
B. Features description
Table 1 shows the 14 clinical features and their description.
. Table 1- Clinical features and their description
Clinical features Description
Age Instance age in years
Sex Instance gender
Cp Chest pain type
Trestbps (mmHg) Resting blood pressure
Chol (mg/dl) Serum cholesterol
Fbs Fasting blood sugar
Restecg Resting electrocardiographic results
Thalach Maximum heart rate achieved
Exang Exercise induced angina
Oldpeak ST depression induced by exercise relative to rest
Slope The slope of the peak exercise ST segment
Ca Number of major vessels (0-3)colored by flourosopy
Thal 3 = normal; 6 = fixed defect; 7 = reversible defect
Num Diagnosis of heart disease
In Table 1, there are 14 attributes used in this system, including 8 symbolic and 6 numeric: age (age in
years), sex (male, female), Chest pain type (typical angina, atypical angina, non-angina pain, asymptomatic),
Trestbps (resting blood pressure in mm Hg), cholesterol (serum cholesterol in mg/dl), fasting blood sugar < 120
mg/dl (true or false), resting electrocardiographic results (normal, having ST-T wave abnormality, showing
probable or definite left ventricular hypertrophy by Estes' criteria), max heart rate, exercise induced angina (true
or false), oldpeak (ST depression induced by exercise relative to rest), slope (up, flat, down), number of vessels
colored by fluoroscopy (0-3), thal (normal, fixed defect, reversible defect), and class (healthy, with heart-
disease).
III. Decision Tree
The decision tree type used in this research is the gain ratio decision tree. The gain ratio decision tree is
based on the entropy (information gain) approach, which selects the splitting attribute that minimizes the value
of entropy, thus maximizing the information gain [15]. Information gain [12] is the difference between the
original information content and the amount of information needed. The features are ranked by the information
gains, and then the top ranked features are chosen as the potential attributes used in the classifier. To identify the
splitting attribute of the decision tree, one must calculate the information gain for each attribute and then select
the attribute that maximizes the information gain. The information gain for each attribute is calculated using the
following formula [14, 15]:
E =
Where k is the number of classes of the target attributes Pi is the number of occurrences of class i
divided by the total number of instances (i.e. the probability of i occurring). To reduce the effect of bias
resulting from the use of information gain, a variant known as gain ratio was introduced by the Australian
academic Ross Quinlan [15]. The information gain measure is biased toward tests with many outcomes. That is,
it prefers to select attributes having a large number of values [13]. Gain Ratio adjusts the information gain for
each attribute to allow for the breadth and uniformity of the attribute values.
Gain Ratio = Information Gain / Split Information
Where the split information is a value based on the column sums of the frequency table [15].
A Heart Disease Prediction Model using Decision Tree
www.iosrjournals.org 85 | Page
IV. Results and Discussion
We demonstrate here the usefulness of the prediction model to the clinical data of heart disease where
training instances 200 and testing instances 103 using split test mode.
Table 2- summarizes the results generated from different approches.
J48 Unpruned tree J48 Pruned tree J48 Reduced Error Pruning
Percent Correct 72.82 73.79 75.73
Number Correct 75 76 78
IR Precision 0.68 0.72 0.74
No. of Rules 14 33 11
Tree Size 24 56 17
F Measure 0.78 0.77 0.79
After extracting the decision tree rules, reduced error pruning was used to prune the extracted decision
rules. Reduced error pruning is one of the fastest pruning methods and known to produce both accurate and
small decision rules [16]. Applying reduced error pruning provides more compact decision rules and reduces the
number of extracted rules.
Fig 1- J48 pruned tree with reduced error pruning approach
V. Conclusion
This work will provide better diagnosis the heart patients than the previous. We develop a heart disease
prediction model that can assist medical professionals in predicting heart disease status based on the clinical
data of patients. Firstly, we select 14 important clinical features, i.e., age, sex, chest pain type, trestbps,
cholesterol, fasting blood sugar, resting ecg, max heart rate, exercise induced angina, old peak, slope, number of
vessels colored, thal and diagnosis of heart disease. Secondly, we develop a prediction model using J48 decision
tree for classifying heart disease based on these clinical features against unpruned, pruned and pruned with
reduced error pruning approach. Finally, the accuracy of Pruned J48 Decision Tree with Reduced Error Pruning
Approach is more better then the simple Pruned and Unpruned approach. The result obtained that which shows
that fasting blood sugar is the most important attribute which gives better classification against the other
attributes but its gives not better accuracy.
A Heart Disease Prediction Model using Decision Tree
www.iosrjournals.org 86 | Page
References
[1.] Wu R, Peters W, Morgan MW. The next generation clinical decision support: linking evidence to best practice. J Healthc Inf
Manag, 2002; 16:50-5.
[2.] Thuraisingham BM. A Primer for Understanding and applying data mining. IT Professional 2000; 1:28-31.
[3.] Rajkumar A, Reena GS. Diagnosis of heart disease using datamining algorithm. Global Journal of Computer Science and
Technology 2010; 10:38-43.
[4.] Anbarasi M, Anupriya E, Iyengar NCHSN. Enhanced prediction of heart Disease with feature subset selection using genetic
algorithm. International Journal of Engineering Science and Technology 2010; 2:5370-76.
[5.] Palaniappan S, Awang R. Intelligent heart disease prediction system using data mining techniques. International Journal of
Computer Science and Network Security 2008; 8:343-50.
[6.] J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2006.
[7.] T. Mitchell, Machine Learning, McGraw Hill, 1997.
[8.] J.R. Quinlan: C4.5, Programs for MachineLearning, Morgan Kaufmann, 1993.
[9.] M. Last and O. Maimon, “A Compact and Accurate Model for Classification”, IEEE Transactions on Knowledge and Data
Engineering 2004; 16, 2: 203-215.
[10.] O. Maimon and M. Last, Knowledge Discovery and Data Mining – The InfoFuzzy Network (IFN) Methodology, Kluwer Academic
Publishers, Massive Computing, Boston, December 2000.
[11.] UCI Machine Learning Repository [homepage on the Internet]. Arlington: The Association; 2006 [updated 1996 Dec 3; cited 2011
Feb 2]. Available from: http://archive.ics.uci.edu/ml/datasets/Heart+Disease.
[12.] Kwong-Sak Leung,kin hong Lee,Jin-Feng Wang,Eddie Y.T.Ng,Henry L.Y.Chan,Stephen K.W.Tsui,Tony S.K. Mok,Pete Chi-Hang
Tse, Joseph Jao-yui Sung, Data Mining on DNA Sequences of Hepatitis B virus. IEEE/ACM Transactions on Computational
Biology and Bioinformatics, Vol 8,No 2,March/April 2011.
[13.] Sellappan Palaniappan Rafiah Awang, Intelligent Heart Disease Prediction System Using Data Mining Techniques. IJCSNS
International Journal of Computer Science and Network Security, VOL.8 No.8, August 2008.
[14.] Han, j. and M. Kamber, Data Mining Concepts and Techniques. 2006: Morgan Kaufmann Publishers. Lee, I.-N., S.-C. Liao, and M.
Embrechts, Data.
[15.] Bramer, M., Principles of data mining. 2007: Springer.
[16.] Esposito, F., D. Malerba and G. Semeraro, A Comparative Analysis of method for pruning decision trees. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 997. Vol. 19, No. 5.

More Related Content

What's hot

Detection of heart diseases by data mining
Detection of heart diseases by data miningDetection of heart diseases by data mining
Detection of heart diseases by data mining
Abheepsa Pattnaik
 
A Heart Disease Prediction Model using Logistic Regression By Cleveland DataBase
A Heart Disease Prediction Model using Logistic Regression By Cleveland DataBaseA Heart Disease Prediction Model using Logistic Regression By Cleveland DataBase
A Heart Disease Prediction Model using Logistic Regression By Cleveland DataBase
ijtsrd
 
HPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine LearningHPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine Learning
Nimai Chand Das Adhikari
 
Prediction of Heart Disease using Machine Learning Algorithms: A Survey
Prediction of Heart Disease using Machine Learning Algorithms: A SurveyPrediction of Heart Disease using Machine Learning Algorithms: A Survey
Prediction of Heart Disease using Machine Learning Algorithms: A Survey
rahulmonikasharma
 
Cardiovascular Disease Prediction Using Machine Learning Approaches.pptx
Cardiovascular Disease Prediction Using Machine Learning Approaches.pptxCardiovascular Disease Prediction Using Machine Learning Approaches.pptx
Cardiovascular Disease Prediction Using Machine Learning Approaches.pptx
Taminul Islam
 
PPT.pptx
PPT.pptxPPT.pptx
PPT.pptx
CHANDRAJEETJHA4
 
DISEASE PREDICTION SYSTEM USING DATA MINING
DISEASE PREDICTION SYSTEM USING  DATA MININGDISEASE PREDICTION SYSTEM USING  DATA MINING
DISEASE PREDICTION SYSTEM USING DATA MINING
shivaniyadav112
 
Heart disease prediction
Heart disease predictionHeart disease prediction
Heart disease prediction
Ariful Haque
 
Android Based Questionnaires Application for Heart Disease Prediction System
Android Based Questionnaires Application for Heart Disease Prediction SystemAndroid Based Questionnaires Application for Heart Disease Prediction System
Android Based Questionnaires Application for Heart Disease Prediction System
ijtsrd
 
heart final last sem.pptx
heart final last sem.pptxheart final last sem.pptx
heart final last sem.pptx
rakshashadu
 
50 Years of Data Science
50 Years of Data Science50 Years of Data Science
50 Years of Data Science
Nafiseh Navabpour
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MINING
Ashish Salve
 
Heart disease prediction system
Heart disease prediction systemHeart disease prediction system
Heart disease prediction system
SWAMI06
 
Diagnosis of lung cancer prediction system using data mining Classification T...
Diagnosis of lung cancer predictionsystem using data mining Classification T...Diagnosis of lung cancer predictionsystem using data mining Classification T...
Diagnosis of lung cancer prediction system using data mining Classification T...
Toushik Paul
 
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
Disease Prediction by Machine Learning Over Big Data From Healthcare CommunitiesDisease Prediction by Machine Learning Over Big Data From Healthcare Communities
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
Khulna University of Engineering & Tecnology
 
Heart Disease Prediction using Machine Learning Algorithm
Heart Disease Prediction using Machine Learning AlgorithmHeart Disease Prediction using Machine Learning Algorithm
Heart Disease Prediction using Machine Learning Algorithm
ijtsrd
 
Machine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer DiagnosisMachine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer Diagnosis
Pramod Sharma
 
A Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic RegressionA Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic Regression
ijtsrd
 
Case Study on GINA(Global Innovation Network and Analysis) based on Data Anal...
Case Study on GINA(Global Innovation Network and Analysis) based on Data Anal...Case Study on GINA(Global Innovation Network and Analysis) based on Data Anal...
Case Study on GINA(Global Innovation Network and Analysis) based on Data Anal...
divyawani2
 
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHMHEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
amiteshg
 

What's hot (20)

Detection of heart diseases by data mining
Detection of heart diseases by data miningDetection of heart diseases by data mining
Detection of heart diseases by data mining
 
A Heart Disease Prediction Model using Logistic Regression By Cleveland DataBase
A Heart Disease Prediction Model using Logistic Regression By Cleveland DataBaseA Heart Disease Prediction Model using Logistic Regression By Cleveland DataBase
A Heart Disease Prediction Model using Logistic Regression By Cleveland DataBase
 
HPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine LearningHPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine Learning
 
Prediction of Heart Disease using Machine Learning Algorithms: A Survey
Prediction of Heart Disease using Machine Learning Algorithms: A SurveyPrediction of Heart Disease using Machine Learning Algorithms: A Survey
Prediction of Heart Disease using Machine Learning Algorithms: A Survey
 
Cardiovascular Disease Prediction Using Machine Learning Approaches.pptx
Cardiovascular Disease Prediction Using Machine Learning Approaches.pptxCardiovascular Disease Prediction Using Machine Learning Approaches.pptx
Cardiovascular Disease Prediction Using Machine Learning Approaches.pptx
 
PPT.pptx
PPT.pptxPPT.pptx
PPT.pptx
 
DISEASE PREDICTION SYSTEM USING DATA MINING
DISEASE PREDICTION SYSTEM USING  DATA MININGDISEASE PREDICTION SYSTEM USING  DATA MINING
DISEASE PREDICTION SYSTEM USING DATA MINING
 
Heart disease prediction
Heart disease predictionHeart disease prediction
Heart disease prediction
 
Android Based Questionnaires Application for Heart Disease Prediction System
Android Based Questionnaires Application for Heart Disease Prediction SystemAndroid Based Questionnaires Application for Heart Disease Prediction System
Android Based Questionnaires Application for Heart Disease Prediction System
 
heart final last sem.pptx
heart final last sem.pptxheart final last sem.pptx
heart final last sem.pptx
 
50 Years of Data Science
50 Years of Data Science50 Years of Data Science
50 Years of Data Science
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MINING
 
Heart disease prediction system
Heart disease prediction systemHeart disease prediction system
Heart disease prediction system
 
Diagnosis of lung cancer prediction system using data mining Classification T...
Diagnosis of lung cancer predictionsystem using data mining Classification T...Diagnosis of lung cancer predictionsystem using data mining Classification T...
Diagnosis of lung cancer prediction system using data mining Classification T...
 
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
Disease Prediction by Machine Learning Over Big Data From Healthcare CommunitiesDisease Prediction by Machine Learning Over Big Data From Healthcare Communities
Disease Prediction by Machine Learning Over Big Data From Healthcare Communities
 
Heart Disease Prediction using Machine Learning Algorithm
Heart Disease Prediction using Machine Learning AlgorithmHeart Disease Prediction using Machine Learning Algorithm
Heart Disease Prediction using Machine Learning Algorithm
 
Machine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer DiagnosisMachine Learning - Breast Cancer Diagnosis
Machine Learning - Breast Cancer Diagnosis
 
A Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic RegressionA Heart Disease Prediction Model using Logistic Regression
A Heart Disease Prediction Model using Logistic Regression
 
Case Study on GINA(Global Innovation Network and Analysis) based on Data Anal...
Case Study on GINA(Global Innovation Network and Analysis) based on Data Anal...Case Study on GINA(Global Innovation Network and Analysis) based on Data Anal...
Case Study on GINA(Global Innovation Network and Analysis) based on Data Anal...
 
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHMHEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
HEART DISEASE PREDICTION USING NAIVE BAYES ALGORITHM
 

Similar to A Heart Disease Prediction Model using Decision Tree

Heart failure prediction based on random forest algorithm using genetic algo...
Heart failure prediction based on random forest algorithm  using genetic algo...Heart failure prediction based on random forest algorithm  using genetic algo...
Heart failure prediction based on random forest algorithm using genetic algo...
International Journal of Reconfigurable and Embedded Systems
 
08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan
IAESIJEECS
 
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
IJCSEA Journal
 
A Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction TechniquesA Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction Techniques
ijtsrd
 
Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...
IJECEIAES
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networks
IAEME Publication
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networks
IAEME Publication
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networks
IAEME Publication
 
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
mlaij
 
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...
mlaij
 
Prediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewPrediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A Review
IRJET Journal
 
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNINGHEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
IJDKP
 
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes ClassifierIRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET Journal
 
Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...
BASMAJUMAASALEHALMOH
 
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET Journal
 
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
IRJET Journal
 
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationAcute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Cemal Ardil
 
Mining of medical data to identify risk factors of heart disease using freque...
Mining of medical data to identify risk factors of heart disease using freque...Mining of medical data to identify risk factors of heart disease using freque...
Mining of medical data to identify risk factors of heart disease using freque...
IRJET Journal
 
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
ijdmtaiir
 
238_heartdisease (1).pdf
238_heartdisease (1).pdf238_heartdisease (1).pdf
238_heartdisease (1).pdf
ArpitGupta563794
 

Similar to A Heart Disease Prediction Model using Decision Tree (20)

Heart failure prediction based on random forest algorithm using genetic algo...
Heart failure prediction based on random forest algorithm  using genetic algo...Heart failure prediction based on random forest algorithm  using genetic algo...
Heart failure prediction based on random forest algorithm using genetic algo...
 
08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan08. 9804 11737-1-rv edit dhyan
08. 9804 11737-1-rv edit dhyan
 
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
APPLYING MACHINE LEARNING TECHNIQUES TO FIND IMPORTANT ATTRIBUTES FOR HEART F...
 
A Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction TechniquesA Survey on Heart Disease Prediction Techniques
A Survey on Heart Disease Prediction Techniques
 
Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...Predicting Heart Ailment in Patients with Varying number of Features using Da...
Predicting Heart Ailment in Patients with Varying number of Features using Da...
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networks
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networks
 
A data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networksA data mining approach for prediction of heart disease using neural networks
A data mining approach for prediction of heart disease using neural networks
 
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
 
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...
 
Prediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A ReviewPrediction of Heart Disease Using Data Mining Techniques- A Review
Prediction of Heart Disease Using Data Mining Techniques- A Review
 
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNINGHEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNING
 
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes ClassifierIRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
 
Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...Heart disease prediction by using novel optimization algorithm_ A supervised ...
Heart disease prediction by using novel optimization algorithm_ A supervised ...
 
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma...
 
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
IRJET- Genetic Algorithm for Feature Selection to Improve Heart Disease Predi...
 
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationAcute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
 
Mining of medical data to identify risk factors of heart disease using freque...
Mining of medical data to identify risk factors of heart disease using freque...Mining of medical data to identify risk factors of heart disease using freque...
Mining of medical data to identify risk factors of heart disease using freque...
 
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
 
238_heartdisease (1).pdf
238_heartdisease (1).pdf238_heartdisease (1).pdf
238_heartdisease (1).pdf
 

More from IOSR Journals

A011140104
A011140104A011140104
A011140104
IOSR Journals
 
M0111397100
M0111397100M0111397100
M0111397100
IOSR Journals
 
L011138596
L011138596L011138596
L011138596
IOSR Journals
 
K011138084
K011138084K011138084
K011138084
IOSR Journals
 
J011137479
J011137479J011137479
J011137479
IOSR Journals
 
I011136673
I011136673I011136673
I011136673
IOSR Journals
 
G011134454
G011134454G011134454
G011134454
IOSR Journals
 
H011135565
H011135565H011135565
H011135565
IOSR Journals
 
F011134043
F011134043F011134043
F011134043
IOSR Journals
 
E011133639
E011133639E011133639
E011133639
IOSR Journals
 
D011132635
D011132635D011132635
D011132635
IOSR Journals
 
C011131925
C011131925C011131925
C011131925
IOSR Journals
 
B011130918
B011130918B011130918
B011130918
IOSR Journals
 
A011130108
A011130108A011130108
A011130108
IOSR Journals
 
I011125160
I011125160I011125160
I011125160
IOSR Journals
 
H011124050
H011124050H011124050
H011124050
IOSR Journals
 
G011123539
G011123539G011123539
G011123539
IOSR Journals
 
F011123134
F011123134F011123134
F011123134
IOSR Journals
 
E011122530
E011122530E011122530
E011122530
IOSR Journals
 
D011121524
D011121524D011121524
D011121524
IOSR Journals
 

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
awadeshbabu
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
JamalHussainArman
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
PuktoonEngr
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
Ratnakar Mikkili
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
drwaing
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
rpskprasana
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
IJNSA Journal
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
mahammadsalmanmech
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
ClaraZara1
 
Series of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.pptSeries of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.ppt
PauloRodrigues104553
 
bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
Divyam548318
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 

Recently uploaded (20)

[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
Series of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.pptSeries of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.ppt
 
bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 

A Heart Disease Prediction Model using Decision Tree

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p- ISSN: 2278-8727Volume 12, Issue 6 (Jul. - Aug. 2013), PP 83-86 www.iosrjournals.org www.iosrjournals.org 83 | Page A Heart Disease Prediction Model using Decision Tree Atul Kumar Pandey1 ,Prabhat Pandey2 ,K.L. Jaiswal3 ,Ashish Kumar Sen4 1 Assistant Professor of Computer Science, Department of Physics, Govt. PG Science College, Rewa(M.P.)- India, 2 OSD, Additional Directorate, Higher Education, Division Rewa (M.P.)-India, 3 Assistant Professor and In charge of BCA, DCA & PGDCA, Department of Physics, Govt. PG Science College, Rewa(M.P.)-India-486001, 4 Assistant Professor, Department of Mathematics & Computer Science, Govt. PG Science College, Rewa(M.P.)- India, Abstract—In this paper, we develop a heart disease prediction model that can assist medical professionals in predicting heart disease status based on the clinical data of patients. Firstly, we select 14 important clinical features, i.e., age, sex, chest pain type, trestbps, cholesterol, fasting blood sugar, resting ecg, max heart rate, exercise induced angina, old peak, slope, number of vessels colored, thal and diagnosis of heart disease. Secondly, we develop an prediction model using J48 decision tree for classifying heart disease based on these clinical features against unpruned, pruned and pruned with reduced error pruning approach.. Finally, the accuracy of Pruned J48 Decision Tree with Reduced Error Pruning Approach is more better then the simple Pruned and Unpruned approach. The result obtained that which shows that fasting blood sugar is the most important attribute which gives better classification against the other attributes but its gives not better accuracy. Keywords—Data mining, Reduced Error Pruning, Gain Ratio and Decision Tree. I. Introduction The diagnosis of heart disease in most cases depends on a complex combination of clinical and pathological data; this complexity leads to the excessive medical costs affecting the quality of the medical care [1]. According to the statistic data from WHO, one third population worldwide died from heart disease; heart disease is found to be the leading cause of death in developing countries by 2010. It shows one third American adult have one or more types of heart diseases based on American Heart Association report. Computational biology is often applied in the process of translating biological knowledge into clinical practice, as well as in the understanding of biological phenomena from the clinical data. The discovery of biomarkers in heart disease is one of the key contributions using computational biology. This process involves the development of a predictive model and the integration of different types of data and knowledge for diagnostic purposes. Furthermore, this process requires the design and combination of different methodologies from statistical analysis and data mining [2,3]. In the past decades, data mining have played an important role in heart disease research. To find the hidden medical information from the different expression between the healthy and the heart disease individuals in the existed clinical data is a noticeable and powerful approach in the study of heart disease classification. Heart disease classification provides the critical basis for the therapy of patients. Statistics and machine learning are two main approaches which have been applied to predict the status of heart disease based on the expression of the clinical data [4,5]. Data mining (DM) is the core stage of knowledge discovery in databases (KDD), which is a "nontrivial extraction of implicit, novel, and potentially useful information from data” [6]. It applies machine learning and statistical methods in order to discover areas of previously unknown knowledge. As a rule, the KDD process involves the following steps: data selection, data pre-processing, transformation, DM (induction of useful patterns), and interpretation of results. Several data mining techniques are used in the diagnosis of heart disease such as naïve bayes, decision tree, neural network, kernel density, bagging algorithm, and support vector machine showing different levels of accuracies. One of the most common classification models is the decision tree, which is a tree-like structure where each internal node denotes a test on a predictive attribute and each branch denotes an attribute value. A leaf node represents predicted classes or class distributions [7]. An unlabeled object is classified by starting at the topmost (root) node of the tree, then traversing the tree, based on the values of the predictive attributes in this object. Decision-tree techniques assume that the data objects are described by a fixed set of attributes, where each predictive attribute takes a small number of disjoint possible values and the target (dependent) variable has discrete output values, each value representing a class label.
  • 2. A Heart Disease Prediction Model using Decision Tree www.iosrjournals.org 84 | Page There are several known algorithms of decision tree induction: ID3 - which uses information gain with statistical pre-pruning, C4.5, an advanced version of ID3, and probably the most popular decision-tree algorithm [8], ART, which minimizes a cost-complexity function, See5 - which builds several models and uses unequal misclassification costs, and IFN – Info-Fuzzy Network which utilizes information theory to minimize the number of predictive attributes in a decision-tree model [9] [10]. In [9], the IFN algorithm is shown empirically to produce more compact models than C4.5, while preserving nearly the same level of classification accuracy. II. Methods A. Data sources In this paper, we use the heart disease data from machine learning repository of UCI [11]. We have total 303 instances of which 164 instances belonged to the healthy and 139 instances belonged to the heart disease. 14 clinical features have been recorded for each instance. B. Features description Table 1 shows the 14 clinical features and their description. . Table 1- Clinical features and their description Clinical features Description Age Instance age in years Sex Instance gender Cp Chest pain type Trestbps (mmHg) Resting blood pressure Chol (mg/dl) Serum cholesterol Fbs Fasting blood sugar Restecg Resting electrocardiographic results Thalach Maximum heart rate achieved Exang Exercise induced angina Oldpeak ST depression induced by exercise relative to rest Slope The slope of the peak exercise ST segment Ca Number of major vessels (0-3)colored by flourosopy Thal 3 = normal; 6 = fixed defect; 7 = reversible defect Num Diagnosis of heart disease In Table 1, there are 14 attributes used in this system, including 8 symbolic and 6 numeric: age (age in years), sex (male, female), Chest pain type (typical angina, atypical angina, non-angina pain, asymptomatic), Trestbps (resting blood pressure in mm Hg), cholesterol (serum cholesterol in mg/dl), fasting blood sugar < 120 mg/dl (true or false), resting electrocardiographic results (normal, having ST-T wave abnormality, showing probable or definite left ventricular hypertrophy by Estes' criteria), max heart rate, exercise induced angina (true or false), oldpeak (ST depression induced by exercise relative to rest), slope (up, flat, down), number of vessels colored by fluoroscopy (0-3), thal (normal, fixed defect, reversible defect), and class (healthy, with heart- disease). III. Decision Tree The decision tree type used in this research is the gain ratio decision tree. The gain ratio decision tree is based on the entropy (information gain) approach, which selects the splitting attribute that minimizes the value of entropy, thus maximizing the information gain [15]. Information gain [12] is the difference between the original information content and the amount of information needed. The features are ranked by the information gains, and then the top ranked features are chosen as the potential attributes used in the classifier. To identify the splitting attribute of the decision tree, one must calculate the information gain for each attribute and then select the attribute that maximizes the information gain. The information gain for each attribute is calculated using the following formula [14, 15]: E = Where k is the number of classes of the target attributes Pi is the number of occurrences of class i divided by the total number of instances (i.e. the probability of i occurring). To reduce the effect of bias resulting from the use of information gain, a variant known as gain ratio was introduced by the Australian academic Ross Quinlan [15]. The information gain measure is biased toward tests with many outcomes. That is, it prefers to select attributes having a large number of values [13]. Gain Ratio adjusts the information gain for each attribute to allow for the breadth and uniformity of the attribute values. Gain Ratio = Information Gain / Split Information Where the split information is a value based on the column sums of the frequency table [15].
  • 3. A Heart Disease Prediction Model using Decision Tree www.iosrjournals.org 85 | Page IV. Results and Discussion We demonstrate here the usefulness of the prediction model to the clinical data of heart disease where training instances 200 and testing instances 103 using split test mode. Table 2- summarizes the results generated from different approches. J48 Unpruned tree J48 Pruned tree J48 Reduced Error Pruning Percent Correct 72.82 73.79 75.73 Number Correct 75 76 78 IR Precision 0.68 0.72 0.74 No. of Rules 14 33 11 Tree Size 24 56 17 F Measure 0.78 0.77 0.79 After extracting the decision tree rules, reduced error pruning was used to prune the extracted decision rules. Reduced error pruning is one of the fastest pruning methods and known to produce both accurate and small decision rules [16]. Applying reduced error pruning provides more compact decision rules and reduces the number of extracted rules. Fig 1- J48 pruned tree with reduced error pruning approach V. Conclusion This work will provide better diagnosis the heart patients than the previous. We develop a heart disease prediction model that can assist medical professionals in predicting heart disease status based on the clinical data of patients. Firstly, we select 14 important clinical features, i.e., age, sex, chest pain type, trestbps, cholesterol, fasting blood sugar, resting ecg, max heart rate, exercise induced angina, old peak, slope, number of vessels colored, thal and diagnosis of heart disease. Secondly, we develop a prediction model using J48 decision tree for classifying heart disease based on these clinical features against unpruned, pruned and pruned with reduced error pruning approach. Finally, the accuracy of Pruned J48 Decision Tree with Reduced Error Pruning Approach is more better then the simple Pruned and Unpruned approach. The result obtained that which shows that fasting blood sugar is the most important attribute which gives better classification against the other attributes but its gives not better accuracy.
  • 4. A Heart Disease Prediction Model using Decision Tree www.iosrjournals.org 86 | Page References [1.] Wu R, Peters W, Morgan MW. The next generation clinical decision support: linking evidence to best practice. J Healthc Inf Manag, 2002; 16:50-5. [2.] Thuraisingham BM. A Primer for Understanding and applying data mining. IT Professional 2000; 1:28-31. [3.] Rajkumar A, Reena GS. Diagnosis of heart disease using datamining algorithm. Global Journal of Computer Science and Technology 2010; 10:38-43. [4.] Anbarasi M, Anupriya E, Iyengar NCHSN. Enhanced prediction of heart Disease with feature subset selection using genetic algorithm. International Journal of Engineering Science and Technology 2010; 2:5370-76. [5.] Palaniappan S, Awang R. Intelligent heart disease prediction system using data mining techniques. International Journal of Computer Science and Network Security 2008; 8:343-50. [6.] J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2006. [7.] T. Mitchell, Machine Learning, McGraw Hill, 1997. [8.] J.R. Quinlan: C4.5, Programs for MachineLearning, Morgan Kaufmann, 1993. [9.] M. Last and O. Maimon, “A Compact and Accurate Model for Classification”, IEEE Transactions on Knowledge and Data Engineering 2004; 16, 2: 203-215. [10.] O. Maimon and M. Last, Knowledge Discovery and Data Mining – The InfoFuzzy Network (IFN) Methodology, Kluwer Academic Publishers, Massive Computing, Boston, December 2000. [11.] UCI Machine Learning Repository [homepage on the Internet]. Arlington: The Association; 2006 [updated 1996 Dec 3; cited 2011 Feb 2]. Available from: http://archive.ics.uci.edu/ml/datasets/Heart+Disease. [12.] Kwong-Sak Leung,kin hong Lee,Jin-Feng Wang,Eddie Y.T.Ng,Henry L.Y.Chan,Stephen K.W.Tsui,Tony S.K. Mok,Pete Chi-Hang Tse, Joseph Jao-yui Sung, Data Mining on DNA Sequences of Hepatitis B virus. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol 8,No 2,March/April 2011. [13.] Sellappan Palaniappan Rafiah Awang, Intelligent Heart Disease Prediction System Using Data Mining Techniques. IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.8, August 2008. [14.] Han, j. and M. Kamber, Data Mining Concepts and Techniques. 2006: Morgan Kaufmann Publishers. Lee, I.-N., S.-C. Liao, and M. Embrechts, Data. [15.] Bramer, M., Principles of data mining. 2007: Springer. [16.] Esposito, F., D. Malerba and G. Semeraro, A Comparative Analysis of method for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 997. Vol. 19, No. 5.