This document summarizes research on using data mining techniques to predict heart disease. It discusses previous work that applies classification, clustering, association rule mining and other techniques to several heart disease data sets. Classification algorithms such as Naïve Bayes, decision trees and neural networks have been widely used, and Naïve Bayes is often reported to give the best performance. Feature selection and attribute reduction are also examined. The document gives an overview of the key steps and techniques in medical data mining and predictive analysis for heart disease.
Survey on data mining techniques in heart disease prediction
1. Presented by
S. Sivagowry
Research Scholar
Bharathidasan University,
Trichy
Under the Guidance of
Dr. M.Durai Raj,
Assistant Professor
School of Computer Science and Engineering,
Bharathidasan University,
Trichy
2. Data Mining
• Exploration of large data sets to extract hidden and
previously unknown patterns
• Two tasks:
Predictive Tasks
Descriptive Tasks
• Predictive tasks
predict the value of a specific attribute based on other
attributes
Classification, Regression and Deviation Detection
3. Contd.
• Descriptive Tasks
– Derive patterns that summarize the relationships in the data
– Clustering, Association Rule Mining and Sequential Pattern
Discovery
• Steps in Data Mining
Data Cleaning, Data Integration, Data Selection, Data
Transformation, Data Mining, Pattern Evaluation and
Knowledge Representation
4. Contd.
Medical Data Mining
Demands high accuracy and involves considerable uncertainty
Quality service at affordable cost is a major challenge
Data is massive
Decisions based on a doctor’s experience may fail in some
cases
Data mining in health care – an intelligent diagnostic tool
5. Heart Disease
29.2% of deaths are due to cardiovascular disease (CVD).
CVD is the leading cause of death in developing countries.
7. Contd.
Data sets are collected from the University of California, Irvine (UCI) repository.
Cleveland, Hungary, Switzerland, Long Beach and Statlog
data sets
76 attributes are available,
of which 14 are used
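As a minimal sketch of working with the 14-attribute subset: the column names below are the ones conventionally used with the processed Cleveland file, the three sample rows are hypothetical values written in that comma-separated layout, and `load_records` and `binarize_target` are illustrative helper names, not part of any tool mentioned in the slides.

```python
import csv
import io

# Attribute names conventionally used for the 14-column processed Cleveland file.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# Hypothetical rows in the same layout; "?" marks a missing value.
SAMPLE = """54,1,4,140,239,0,0,160,0,1.2,1,0,3,0
61,0,4,130,330,0,2,169,0,0.0,1,0,3,1
48,1,2,110,229,0,0,168,0,1.0,3,?,7,1
"""

def load_records(text):
    """Parse rows into dicts of floats, using None for missing values."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        values = [None if cell.strip() == "?" else float(cell) for cell in row]
        records.append(dict(zip(COLUMNS, values)))
    return records

def binarize_target(record):
    """Collapse the 'num' field into disease absent (0) vs present (1)."""
    return 0 if record["num"] == 0 else 1

records = load_records(SAMPLE)
labels = [binarize_target(r) for r in records]
```

Collapsing the target to two classes is a common simplification, assumed here for illustration; the surveyed papers may treat the class attribute differently.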
8. Data Mining Techniques in Heart Disease Prediction
Clustering
Classification
Regression
Association Rule Mining
9. Data Mining and Association Rules
Carlos Ordonez et al. [7] used a simple mapping
algorithm.
It treats numerical and categorical attributes uniformly.
Decision trees are less suitable: they automatically split
numerical values (medical data are mostly numerical).
Interpreting experimental results from decision trees is difficult.
Clustering of medical data deserves further research.
The authors justify the use of association rules on medical data.
10. Contd.
Deepika [11] used Pruning Classification Association
Rule (PCAR).
PCAR is derived from the Apriori algorithm.
It deletes the minimum-frequency items from the
minimum-frequency item sets.
It deletes infrequent items from the item sets.
It classifies items based on the frequency of the item sets and
discovers frequent item sets.
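The frequency-based pruning described above can be illustrated with a minimal sketch of plain Apriori, the algorithm PCAR is said to derive from. This is not PCAR itself; the transactions and item names are hypothetical discretized attributes, and `min_support` is an absolute count.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: count single items and delete the infrequent ones.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that contain any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical patient records as sets of discretized findings.
patients = [
    {"chest_pain", "high_bp", "disease"},
    {"chest_pain", "high_bp", "disease"},
    {"chest_pain", "high_chol"},
    {"high_bp", "disease"},
]
frequent = apriori(patients, min_support=2)
```

Here `high_chol` occurs only once, so it is dropped at the first level and never appears in a larger candidate.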
11. Data Mining and Classification
Usha Rani [38] applied an ANN to heart disease data using
feed-forward and back-propagation algorithms.
Experiments used single-layer and multi-layer neural network
models.
Parallelism is implemented to speed up the learning process.
The neural network provides satisfactory results.
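A minimal sketch of feed-forward prediction with back-propagation training, assuming one hidden layer, sigmoid units and squared error. The toy data, layer size, learning rate and seed are illustrative assumptions, and the parallelism mentioned above is not reproduced here.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w1, w2):
    """Feed-forward pass; the trailing 1.0 is the bias input."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
    y = sigmoid(sum(w * v for w, v in zip(w2, h + [1.0])))
    return h, y

def mse(data, w1, w2):
    return sum((forward(x, w1, w2)[1] - t) ** 2 for x, t in data) / len(data)

def train(data, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial = mse(data, w1, w2)
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x, w1, w2)
            # Output delta for squared error with a sigmoid output unit.
            d_out = (y - t) * y * (1 - y)
            # Hidden deltas: propagate the output delta back through w2.
            d_hid = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            for j, v in enumerate(h + [1.0]):
                w2[j] -= lr * d_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w1[j][i] -= lr * d_hid[j] * v
    return w1, w2, initial, mse(data, w1, w2)

# Toy, linearly separable data standing in for two normalized attributes.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w1, w2, initial_loss, final_loss = train(data)
```

Setting `n_hidden` to a larger value, or stacking further layers, would correspond to the multi-layer models the slide mentions.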
12. Contd.
In [3], classification is based on supervised machine
learning algorithms.
The Tanagra tool is used to classify the data.
Evaluation uses 10-fold cross-validation.
Performance is analysed in terms of accuracy and the time
taken to build the model.
Naïve Bayes is the better algorithm.
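The evaluation protocol on this slide (10-fold cross-validation, reporting accuracy and model-building time) can be sketched as follows. This is an illustration, not the Tanagra procedure: the folds are contiguous and unshuffled, and the majority-class learner is only a stand-in so the example stays self-contained.

```python
import time

def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs that partition range(n) into k folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_validate(fit, predict, X, y, k=10):
    """Return (overall accuracy, total seconds spent building models)."""
    correct, fit_seconds = 0, 0.0
    for train, test in k_fold_indices(len(X), k):
        t0 = time.perf_counter()
        model = fit([X[i] for i in train], [y[i] for i in train])
        fit_seconds += time.perf_counter() - t0
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X), fit_seconds

# Stand-in learner (an assumption for the demo): predict the majority class.
def fit_majority(X, y):
    return max(set(y), key=y.count)

def predict_majority(model, x):
    return model

X = [[i] for i in range(20)]
y = [0] * 14 + [1] * 6
accuracy, seconds = cross_validate(fit_majority, predict_majority, X, y, k=10)
```

In practice the records would usually be shuffled or stratified before splitting, so that every fold contains both classes.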
The table below shows the performance study of the algorithms.
14. Contd.
In [24], a novel neuro-fuzzy technique is used.
Preprocessing uses a Genetic Algorithm (GA).
A four-layered fuzzy neural network is used.
A Radial Basis Function neural network is constructed with
5 inputs, training and normalization in the hidden layer, and
an output layer with 1 node.
In [25], an Intelligent Heart Disease Prediction System
(IHDPS) is proposed using Decision Tree, Naïve Bayes (NB) and Neural
Network.
NB is the most effective one.
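Since Naïve Bayes recurs as the strongest classifier in the surveyed work, a minimal sketch for categorical attributes may help. It assumes add-one (Laplace) smoothing and uses hypothetical attribute values; it does not reproduce the IHDPS implementation.

```python
import math
from collections import Counter, defaultdict

def fit_nb(rows, labels):
    """Estimate class counts and per-attribute value counts."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attr_index, label) -> value counts
    domains = defaultdict(set)            # attr_index -> values seen in training
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains

def predict_nb(model, row):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)       # log prior
        for i, v in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            p = (value_counts[(i, label)][v] + 1) / (n + len(domains[i]))
            score += math.log(p)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical records: (chest pain type, blood pressure level) -> disease flag.
rows = [("typical", "high"), ("typical", "high"), ("typical", "normal"),
        ("none", "normal"), ("none", "normal"), ("none", "high")]
labels = [1, 1, 1, 0, 0, 0]
model = fit_nb(rows, labels)
```

Training is a single counting pass over the data, which is one plausible reason the surveyed studies report short model-building times for NB.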
15. Contd.
In [1], GA is used to determine the number of attributes.
NB, Decision Tree (DT) and Classification via Clustering (CVC) are compared.
DT takes more time to build the model.
NB performs consistently before and after reduction
of attributes.
CVC performs poorly.
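A minimal sketch of GA-based attribute reduction, assuming bit-mask chromosomes, tournament selection, one-point crossover, bit-flip mutation and elitism. The fitness function is a hypothetical stand-in; in a study like [1] it would typically be a classifier's accuracy on the selected attributes.

```python
import random

def genetic_select(n_attrs, fitness, pop_size=20, generations=30,
                   p_mut=0.05, seed=0):
    """Search for an attribute mask (list of 0/1) that maximizes fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]                    # elitism: keep the best mask
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_attrs)       # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per attribute.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

# Hypothetical fitness: reward four "informative" attributes, penalize mask size.
INFORMATIVE = {0, 2, 5, 7}

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.25 * sum(mask)

best_mask, history = genetic_select(13, toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
```

Because the best mask is always carried into the next generation, the recorded best fitness never decreases from one generation to the next.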
16. Contd.
In [30], the k-means clustering algorithm is used.
The Maximal Frequent Itemset Algorithm (MAFIA) is used.
A multilayer perceptron network is trained with the back-propagation
algorithm.
Pseudocode for MAFIA [29]:
MAFIA(C, MFI, Boolean IsHUT) {
    name HUT = C.head ∪ C.tail;
    if HUT is in MFI
        stop generation of children and return
    Count all children, use PEP to trim the tail, and reorder by increasing support
    For each item i in C.trimmed_tail {
        IsHUT = whether i is the first item in the tail
        newNode = C ∪ i
        MAFIA(newNode, MFI, IsHUT)
    }
    if (IsHUT and all extensions are frequent)
        stop search and go back up subtree
    if (C is a leaf and C.head is not in MFI)
        add C.head to MFI
}
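To make the output of MAFIA concrete, here is a brute-force sketch that returns the same kind of result, the maximal frequent item sets, on a tiny example. It deliberately omits MAFIA's search-tree pruning (HUT checks, PEP, reordering), so it only suits very small item sets; the transactions are hypothetical.

```python
from itertools import combinations

def maximal_frequent_itemsets(transactions, min_support):
    """Enumerate frequent itemsets, then keep those with no frequent superset."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = frozenset(combo)
            support = sum(1 for t in transactions if s <= t)
            if support >= min_support:
                frequent.append(s)
    # An itemset is maximal if no other frequent itemset strictly contains it.
    return [s for s in frequent if not any(s < other for other in frequent)]

transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "d"}, {"b", "c"}]
mfi_2 = maximal_frequent_itemsets(transactions, min_support=2)
mfi_3 = maximal_frequent_itemsets(transactions, min_support=3)
```

With a support threshold of 2 the single maximal set is {a, b, c}; raising it to 3 leaves {a} and {b, c}.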
17. Contd.
In [35], Naïve Bayes is used to provide decision support in a heart
disease prediction system.
NB is found to be the best for heart disease prediction.
It can be used as a tool for training nurses and medical students in
diagnosis.
It provides new ways of understanding and exploring the data.
In [6], NB classification is proposed as the best decision support system.
In [10], hybridization is used: the neural network is trained using GA, with
feed-forward and back-propagation as the learning algorithm.
When two more attributes are added to the existing attributes, the neural
network shows better performance in both cases.
18. Contd.
RIPPER, SVM, Decision Tree and ANN are compared
on Sensitivity, Specificity, Accuracy, Error Rate,
TP Rate and FP Rate [20].
SVM predicts with the lowest error rate and the highest
accuracy.
Data mining with Fuzzy Logic reduces the number of attributes
and the number of tests for patients [21].
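The comparison measures named on this slide all follow from the four confusion-matrix counts. A minimal sketch using the standard definitions, with hypothetical labels:

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn),    # also the TP rate
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "fp_rate": fp / (fp + tn),        # equals 1 - specificity
    }

# Hypothetical ground truth and predictions (1 = disease present).
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
counts = confusion_counts(actual, predicted)
scores = metrics(*counts)
```

In a diagnostic setting sensitivity is often weighted more heavily than accuracy, because a false negative corresponds to a missed case of disease.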
19. Data Mining and Clustering
The k-means clustering algorithm is used for the prediction of
heart disease [4].
The Euclidean distance formula is used.
NB is slow and the neural network takes a number of iterations.
The performance of clustering and classification algorithms is
compared [28].
NB predicts with higher accuracy than the clustering algorithm.
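A minimal sketch of k-means with Euclidean distance, as referred to on this slide. The points and starting centroids are hypothetical, and the initial centroids are passed in explicitly so the run is deterministic.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm starting from the given centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [min(range(len(centroids)),
                          key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append([sum(col) / len(members)
                                      for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster as is
        if new_centroids == centroids:
            break                                   # converged
        centroids = new_centroids
    return centroids, assignment

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignment = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
```

Using clusters for prediction requires a further step, such as labeling each cluster with the majority class of its members; that step is not shown here.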
20. Conclusion
The classification task plays a vital role when compared with
Clustering, Association Rule Mining and Regression.
In Classification, each technique has its own merits and
demerits.
Reduction of attributes is considered.
Hybridization of Classification with Fuzzy Logic can predict
with the highest accuracy.
21. References
1. Anbarasi.M, Anupriya and Iyengar “Enhanced Prediction of Heart Disease with Feature Subset Selection using
Genetic Algorithm”, International Journal of Engineering and Technology, Vol 2(10), 2010, pp 5370-5376.
2. Annoj P.K., “Clinical decision support system: Risk level prediction of heart disease using Data Mining
Algorithms”, Journal of King Saud University - Computer and Information Sciences, 2012, pp 27-40.
3. Asha Rajkumar and Mrs. Sophia Reena, “Diagnosis of Heart Disease using Data Mining Algorithms”, Global
Journal of Computer Science and Technology, vol. 10(10), 2010, pp 38-43.
4. Bala Sundar V, “Development of Data Clustering Algorithm for predicting Heart”, IJCA, Vol 48(7), June 2012,
pp 8-13.
5. Bhagyashree Ambulkar and Vaishali Borkar “Data Mining in Cloud Computing”, MPGINMC, Recent Trends in
Computing, ISSN 0975-8887,2012, pp 23-26.
6. Bhuvaneswari. R, “Naïve Bayesian Classification Approach in Health Care Application”, International Journal
of Computer Science and Telecommunication, vol 3(1), Jan 2012, pp 106-112.
7. Carlos Ordonez, Edward Omiecinski and Levien de Braal “Mining Constrained Association Rules to Predict Heart
Disease”, Proceedings of the 2001 IEEE International Conference on Data Mining, IEEE Computer Society,
ISBN 0-7695-1119-8, 2001, pp 433-440.
8. Cengiz colak.M , Cemiz colak and Hasan Kocatruk “Predicting coronary artery disease using different artificial
neural network models”, CAD and Artificial neural network, pp 249-254, 2008.
9. Chaltrali S. Dangare and Sulabha, “Improved Study of Heart Disease Prediction System using Data Mining
Classification Techniques”, IJCA, Vol 47(10), pp 44-48, June 2012.
10. Chen A.H., “HDPS: Heart Disease Prediction System”, Computing in Cardiology, ISSN 0276-6574, pp 557-560,
2011.
11. Deepika. N, “Association Rule for Classification of Heart Attack patients”, IJAEST, Vol 11(2), pp 253-257, 2011.
12. Jabbar M.A., “Knowledge discovery from mining association rules for Heart disease Prediction”,
JATIT, Vol 41(2), pp 166-174, 2012.
13. Jyothi Soni, Uzma ansari and Dipesh Ansari “Intelligent and Effective Heart Disease Prediction System
using Weighted Associative Classifier”, IJCSE, Vol 3(6), pp 2385-2392, June 2011.
14. K.Rajeswari, “Prediction of Risk Score for Heart Disease in India using Machine Intelligence”,IPCSIT,
Vol 4, 2011.
15. Kavitha K.S, “Modeling and designing of evolutionary neural network for heart disease prediction”,
IJCSI, Vol 7(5), pp 272-283, September 2010.
16. Latha Parthiban and R.Subramanian, “Intelligent Heart Disease Prediction System using CANFIS and
Genetic Algorithm”, International Journal of Biological and Life Sciences, Vol 3(3), pp157-160,2007.
17. Liangxiao. J, Harry.Z, Zhihua.C and Jiang.S “One Dependency Augmented Naïve Bayes”, ADMA, pp
186-194, 2005.
18. Mia Shouman, “Using data mining techniques in heart disease diagnosis and treatment”, 978-1-4673-
0483-2, Japan-Egypt Conference on Electronics, Communications and Computers, pp 189-193, 2012.
19. Milan Kumari and Sunila Godara, “Review of Data Mining Classification Model in Cardio Vascular
Disease diagnosis”, IJCA, 2011.
20. Milan Kumari and Sunila Godara, “Comparative Study of Data Mining Classification Methods in
Cardio-Vascular Disease Prediction”, IJCST, Vol 2(2), June 2011.
21. Nidhi Bhatia and Kiran Jyothi, “A Novel Approach for heart disease diagnosis using Data Mining and Fuzzy
logic”, IJCA, Vol 54(17), pp 16-21, September 2012.
22. Nithya N.S, Sarumathi. S and Dr. Duraisamy. K “Assessment of the risk factors of Heart Attack using frequent
feature Selection Method”, International Journal of Communications and Engineering, Vol 1(1), ISSN 0988-0382,
pp 127-133, March 2012.
23. Qeethara Kadhim Al. Shayea, “Artificial neural network in Medical Diagnosis”, IJCSI, Vol 3(2), March 2011.
24. R. Setthukkarase and Kannan “An Intelligent System for mining Temporal rules in Clinical database using Fuzzy
neural network”,European Journal of Scientific Research, ISSN 1450-216, Vol 70(3), pp 386-395, 2012.
25. Rafiah Awang and Palaniappan. S “Intelligent Heart Disease Prediction System Using Data Mining techniques”,
IJCSNS, Vol 8(8), pp 343-350, Aug 2008.
26. Rafiah Awang and Palaniappan. S “Web based Heart Disease Decision Support System using Data Mining
Classification Modeling techniques” , Proceedings of iiWAS, pp 177-187, 2007.
27. Raghu. D.Dr, “Probability Based Heart Disease Prediction using Data Mining Techniques”, IJCST, Vol 2(4), pp
66-68, Dec 2011.
28. Santhi. P, “Improving the performance of Data Mining Algorithm in Health Care data”, IJCST, Vol 2(3), 2011.
29. Setiawan N.A, “ Rule Selection for Coronary Artery Disease Diagnosis Based on Rough Set” ,International
Journal of Recent Trends in Engineering, Vol 2(5), pp 198-202, Dec 2009.
30. Shantakumar B.Patil, “Intelligent and Effective Heart Attack Prediction System using Data Mining and Artificial
Neural Network”, European Journal of Scientific Research, Vol 31(4), pp 642-656, 2009.
31. Shanthakumar B. Patil, “Extraction of Significant patterns from Heart Disease Ware Houses for Heart
Attack Prediction”, IJCSNS, Vol 9(2), pp 228-235, Feb 2009.
32. Shouman.M, Turner.T and Stocker.R, “Applying K-Nearest Neighbour in diagnosing Heart Disease
Patients”, International Journal of Information and Education Technology, Vol 2(3), June 2012.
33. Siri Krishnan Wasan, Vasutha Bhatnagar and Harleen Kaur “The Impact of Data Mining techniques on
medical diagnostics”, Data Science Journal, Vol 5(19), pp 119-126, October 2006.
34. Srinivas, Kavitha Rani and Dr. Govarthan, “Application of Data Mining Techniques in Health Care and
Prediction of Heart Attack”, IJCSE, Vol 2(2), pp 250-255, 2010.
35. Subbulakshmi, Ramesh and Chinna Rao “Decision Support in Heart Disease Prediction System using
Naïve Bayes”, IJCSE, ISSN 0976-5166, Vol 2(2), May 2011.
36. Sudha.A, Gayathri.p and Jaishankar. N “Utilization of Data Mining Approaches for prediction of life
Threatening Disease Survivability”, IJAC (0975-8887), Vol 14(17), March 2012.
37. Jyothi. S, Ujma.A, Dipesh. S and Sunita. S “Predictive Data Mining for Medical Diagnosis : An Overview
of Heart Disease Prediction”, IJCA, Vol 17(8), pp 43-48, March 2011.
38. Usha. K Dr, “Analysis of Heart Disease Dataset using Neural network approach”, IJDKP, Vol 1(5), Sep
2011.