International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
DOI: 10.5121/ijcsit.2018.10102
AN ALGORITHM FOR PREDICTIVE DATA
MINING APPROACH IN MEDICAL DIAGNOSIS
Shakuntala Jatav1 and Vivek Sharma2
1 M.Tech Scholar, Department of CSE, TIT College, Bhopal
2 Professor, Department of CSE, TIT College, Bhopal
ABSTRACT
The healthcare industry holds large and complex data that must be mined in order to discover
interesting disease patterns and to make effective decisions with the help of machine learning
techniques. Advanced data mining techniques are used to discover knowledge in databases and to
support medical research. This paper analyzes prediction systems for diabetes, kidney and liver
disease using a larger number of input attributes. Two data mining classification techniques,
Support Vector Machine (SVM) and Random Forest (RF), are analyzed on diabetes, kidney and liver
disease databases. The performance of these techniques is compared in terms of precision, recall,
accuracy, F-measure and execution time. Based on this study, the proposed algorithm is designed
using the SVM and RF algorithms, and the experimental results show an accuracy of 99.35%, 99.37%
and 99.14% on diabetes, kidney and liver disease respectively.
KEYWORDS
Data Mining, Clinical Decision Support System, Disease Prediction, Classification, SVM, RF.
1. INTRODUCTION
Computational health informatics is a rising research topic that involves varied sciences such as
biomedicine, medicine, nursing, information technology, computer science and statistics [1]. Data
mining techniques are applied to large and complex clinical data in order to diagnose disease and
to extract information that suggests effective medical assistance [2]. In bioscience, hospitals
have introduced different information systems holding plenty of data to manage medical insurance
and patient information; unfortunately, this data is rarely mined to find hidden information for
effective decision making [2][3].
Clinical decisions are often made on the basis of the doctor’s perception and experience rather
than on the knowledge-rich data hidden within the database. This practice can lead to unintended
biases, and a doctor’s experience alone may not be sufficient to diagnose a disease accurately,
which affects the disease diagnosis system [2][3]. In the healthcare sector, data mining means
analyzing clinical data to predict a patient’s health status. To discover interesting patterns in
healthcare data, different data mining techniques are therefore applied together with statistical
analysis, machine learning and database technology.
Predictive systems are systems that are used to predict an outcome on the basis of pattern
recognition, as shown in Figure 1. Disease detection is the process by which a patient is
diagnosed on the basis of the symptoms analyzed, which may cause difficulty when predicting the
effect of a disease [4]. For example, fever is a symptom of many disorders, so by itself it does
not tell the healthcare professional exactly what the disease is. Because the results or
opinions vary from one physician to another, there is a need to assist the physician with a system
that gives a consistent opinion for given symptoms and disorders [5]. This can be done by
analyzing the information contained in medical data or medical records.
Figure 1: A Typical Health Informatics Processing
As a result, new information can be compared with previous records and a more confident diagnosis
can be made. Predictive medical diagnosis can be delivered as a web application that predicts a
particular disorder on the basis of symptoms and provides a diagnosis for that disorder, detected
by rule. Healthcare professionals use their prior knowledge and insight to reach a decision
regarding any disease or disorder [6]. In a similar manner, this paper proposes different
classification techniques for diagnosis using generic disease datasets. It highlights the benefits
and drawbacks of the various data mining techniques for disease prediction. It also reports the
prediction rate of the various techniques, thereby allowing a comparison between them [7]-[10].
Data mining has been used successfully in knowledge discovery for predictive purposes, enabling
more effective and accurate decisions [11]. The main focus of this paper is on classification as
well as clustering techniques. In clustering methods such as K-means, EM and fuzzy c-means, data
is partitioned into sets of clusters or sub-classes [12]. Machine learning techniques such as KNN,
SVM and Naïve Bayes can be used to classify objects on the basis of a training set whose outcome
values are known.
Clustering Techniques
The clustering process divides the data into cluster groups or subclasses. We used four clustering
algorithms, namely K-Means, EM, PAM and Fuzzy C-Means [12]. The K-Means clustering algorithm works
by partitioning n observations into k subclasses defined by centroids, where k is chosen before
the algorithm begins. K-Means and EM are both iterative algorithms. EM (expectation-maximization)
estimates the maximum likelihood parameters of a statistical model that depends on unobserved
latent variables. Partitioning around medoids (PAM) is similar to K-means: partitioning is based
on the K-medoids method, which divides the data into a number of disjoint clusters [12]. In fuzzy
clustering, data elements can belong to multiple clusters. This is also called soft clustering.
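The assignment and update steps of K-Means described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited work: the function names, the toy points and the caller-supplied starting centroids are assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, iterations=10):
    """Partition points into k = len(initial_centroids) clusters.

    Returns (centroids, labels). k is fixed before the loop starts.
    """
    centroids = list(initial_centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical two-feature toy data with two visibly separate groups.
toy_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
toy_centroids, toy_labels = kmeans(toy_points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
```

In practice the starting centroids are usually chosen at random or by a seeding heuristic; they are passed in explicitly here only to keep the sketch deterministic.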
Classification Techniques
Machine learning based classification techniques can be used to classify objects based on a set of
training data whose outcome values are known. In this study four classification algorithms are
used: KNN, SVM, Naive Bayes and C5.0. In K-nearest neighbors (KNN), an object is classified by a
majority vote of its neighbors and is assigned to the most common class among its k nearest
neighbors. In SVM (Support Vector Machine), the data is first mapped to a set of points in a
feature space and then separated into classes by a linear boundary. The Naive Bayes model uses
Bayes’ rule to calculate the probability that an instance belongs to a class. The C5.0 algorithm
is a decision tree that recursively splits observations into branches to form a tree that improves
prediction accuracy. It is an improved version of the C4.5 and ID3 algorithms [10]. It also
provides a boosting method to increase the accuracy of the classification [11].
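The majority-vote rule of KNN can be illustrated with a short sketch. The function name `knn_predict`, the choice of squared Euclidean distance and the toy training set are assumptions for illustration only; they are not taken from the paper.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the most common label among the k training items nearest to query.

    train is a list of (feature_tuple, label) pairs.
    """
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with known outcomes.
toy_train = [
    ((1.0, 1.0), "negative"),
    ((1.2, 0.9), "negative"),
    ((0.8, 1.1), "negative"),
    ((5.0, 5.2), "positive"),
    ((5.1, 4.9), "positive"),
    ((4.8, 5.0), "positive"),
]
near_low = knn_predict(toy_train, (1.1, 1.0))
near_high = knn_predict(toy_train, (5.0, 5.0))
```

An odd k avoids tied votes when there are only two classes, which is why k=3 is used as the default here.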
2. RELATED WORK
In [1], neural network, decision tree and Naïve Bayes machine learning approaches are used to
diagnose heart disease. A genetic algorithm is used for optimized feature selection, and
accuracies of 100%, 99.62% and 90.74% are obtained respectively.
In [2], the authors predicted heart disease using a neural network with genetic algorithm based
feature extraction. The weights of the back propagation neural network are optimized by a genetic
algorithm, and an accuracy of 89% is obtained for heart disease prediction.
In [3], the authors developed a heart disease prediction system using an ANN with LVQ and achieved
an accuracy of about 80%, a sensitivity of about 85% and a specificity of about 70%.
In [4], heart disease diagnosis is proposed using a lazy data mining approach with data reduction
strategies, i.e. principal component analysis, to obtain class association rules. The result
analysis reports enhancements of 10.26% with respect to J4.8 and 8.6% over Naïve Bayes.
In [5], a heart disease prediction system is presented using a data mining approach with two
additional features, obesity and smoking, to boost the prediction rate. Neural Networks, Naive
Bayes and Decision Trees were used for predicting heart disease with accuracies of 99.25%, 94.44%
and 96.66% respectively.
A web-based application was introduced in [6] using the Naïve Bayes algorithm; it takes symptoms
from the user and returns the diagnosis result to the user or patient. In [7], the association
rule mining technique was used for the diagnosis of diabetes. The authors concluded that data
mining techniques, when used appropriately, improve both computational and classification
performance. These rules have the potential to improve the expert system and to support better
clinical decision making. For predicting diabetes with the Weka tool, the authors of [8] presented
a comparison between the Naïve Bayes algorithm and the decision tree algorithm and achieved
accuracies of about 79.56% and 76.96% respectively.
In [9], Decision Tree, Naive Bayes and NBTree algorithms are used for liver disease detection with
10 features. The result analysis shows that the NBTree algorithm has the highest accuracy, whereas
the Naive Bayes algorithm performs better with respect to computational time.
In [11], the authors performed a comparative analysis of clustering and classification algorithms.
The result analysis shows that classification performs better than clustering, with an accuracy of
about 81%.
In [12], the authors proposed classification combined with clustering, i.e. KNN with FCM
clustering and F-KNN with FCM clustering. The Fuzzy KNN with fuzzy c-means model produced better
results than the KNN with fuzzy c-means model on both the PIMA and Liver-disorder datasets. Using
the fuzzy c-means clustering algorithm to preprocess the datasets also improved classification
accuracy and speed by reducing the number of tuples in the original datasets. In the experiments,
KNN with fuzzy c-means has an accuracy of 97.02% and Fuzzy KNN with fuzzy c-means an accuracy of
99.25% on the PIMA dataset, whereas on the Liver-disorder dataset KNN with fuzzy c-means has an
accuracy of 96.13% and Fuzzy KNN with fuzzy c-means an accuracy of 98.95%.
In [13], chronic disease prediction is performed using data mining approaches such as Naïve Bayes,
Decision Tree, Support Vector Machine (SVM) and Artificial Neural Network (ANN) for the diagnosis
of diabetes and heart disease. The result analysis shows that SVM gives the highest accuracy of
95.556% for heart disease and Naïve Bayes gives an accuracy of 73.588% for diabetes. Table 1
gives a comparative analysis of the existing techniques for heart, liver and diabetes diseases.
Table 1: Comparative Analysis of Different Techniques in terms of Accuracy
Name of Author | Technique | Accuracy
Bhatla et al. | Neural Network, Naïve Bayes and Decision Tree for heart disease detection | Neural Network = 100%; Decision Tree = 99.62%; Naïve Bayes = 90.74%
Amin et al. | Optimized Neural Network for heart disease detection | Training data = 89%; Validation data = 96.2%
Chen et al. | ANN based heart disease detection | ANN = 80%
Dangare et al. | Decision Trees, Neural Networks and Naive Bayes for heart disease detection | Neural Networks = 99.25%; Naive Bayes = 94.44%; Decision Tree = 96.66%
Iyer et al. | Decision Tree and Naïve Bayes algorithm for diabetes detection | Decision Tree = 76.96%; Naïve Bayes = 79.56%
Uma Ojha and Savita Goel | Decision Tree, SVM, FCM | Decision Tree (C5.0) = 81%; SVM = 81%; FCM = 37%
Chetty et al. | KNN and F-KNN for diabetes and liver disease detection | Diabetes data: KNN = 97.02%, F-KNN = 99.25%; Liver disease data: KNN = 96.13%, F-KNN = 98.95%
Kumari Deepika and Dr. S. Seema | Naïve Bayes, Decision Tree, Support Vector Machine (SVM) and Artificial Neural Networks (ANN) for diabetes and heart disease detection | SVM = 95.556% (heart disease); Naïve Bayes = 73.588% (diabetes)
3. METHODOLOGY
3.1 Proposed Methodology
One of the interesting and important subjects for researchers in medicine and computer science is
diagnosing illness by considering the features that have the most impact on recognition. This
subject has given rise to a new concept called Medical Data Mining (MDM). Data mining methods use
approaches such as classification and clustering to categorize diseases and their symptoms, which
is helpful for diagnosis.
A disease diagnosis system is created to predict different diseases such as diabetes, kidney
disease and liver disease. The system’s workflow is described below:
Step 1: Through the proposed application, the user (doctor, patient, physician, etc.) inputs the
attribute values of a disease and sends them to the decision support system for analysis.
Step 2: At the decision support system, the datasets of the different diseases are loaded and data
mining algorithms are applied to the training dataset. The user’s inputs are collected and
processed on the server to predict the diagnosis result.
Step 3: To analyze the healthcare data, the major data mining steps (data preprocessing, missing
value replacement, feature selection, machine learning and decision making) are applied to the
training dataset. On the decision support system side, the different classification algorithms are
executed on the training dataset and are then ready to classify the test dataset.
In the proposed algorithm, Support Vector Machine and Random Forest are used to produce class
labels for the different data subspaces. A voting model then combines these results and outputs
the final classification result. Finally, the predicted results are compared with the true labels
of the testing phase.
Figure 2: Proposed Model
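The voting step of the proposed model can be illustrated as follows. The paper does not state the exact voting rule, so this sketch assumes a plain majority vote over the label lists produced by the individual classifiers; the function name `majority_vote` and all prediction lists are hypothetical stand-ins, not outputs of the authors' SVM and RF models.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample.

    predictions_per_model holds one equal-length list of labels per model.
    On a tie, the label that was voted first (earliest model) wins.
    """
    final = []
    for sample_votes in zip(*predictions_per_model):
        final.append(Counter(sample_votes).most_common(1)[0][0])
    return final


# Hypothetical predictions for four test samples from three voters.
svm_subspace_a = [1, 0, 1, 1]
rf_subspace_a = [1, 0, 0, 1]
rf_subspace_b = [0, 0, 1, 1]
ensemble = majority_vote([svm_subspace_a, rf_subspace_a, rf_subspace_b])

# Compare the combined prediction with (hypothetical) true labels.
true_labels = [1, 0, 1, 1]
agreement = sum(p == t for p, t in zip(ensemble, true_labels)) / len(true_labels)
```

With only two voters a majority vote can tie, so a real system would need either an odd number of voters or a probability-based (soft) vote; which of these the authors used is not stated.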
Support Vector Machine
Support vector machine is a machine learning approach that can be used for classification as well
as for regression. SVM classifies data into different classes by finding the hyperplane (a line in
two dimensions) that separates the training data into classes. Because it maximizes the margin
between classes, SVM is relatively resistant to overfitting and often gives good classification
performance in terms of precision and accuracy.
SVM does not make strong assumptions about the data, which helps it classify future data
correctly. SVMs fall into two categories, linear and non-linear. In the linear approach, the
training data is separated by a hyperplane.
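For the linear case, classification reduces to checking on which side of the hyperplane w.x + b = 0 a point falls. The sketch below shows only this decision rule; the weights and bias are hand-picked, hypothetical values rather than the result of training, and the function names are assumptions for illustration.

```python
import math


def decision_score(weights, bias, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def linear_svm_predict(weights, bias, x):
    """Return class 1 for points on the non-negative side, otherwise class 0."""
    return 1 if decision_score(weights, bias, x) >= 0 else 0


def distance_to_hyperplane(weights, bias, x):
    """Geometric distance of x from the separating hyperplane."""
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(decision_score(weights, bias, x)) / norm


# Hypothetical hyperplane x1 + x2 - 6 = 0 in a two-feature space.
example_weights = (1.0, 1.0)
example_bias = -6.0
low_point_class = linear_svm_predict(example_weights, example_bias, (1.0, 1.0))
high_point_class = linear_svm_predict(example_weights, example_bias, (5.0, 5.0))
```

Training an SVM amounts to choosing the weights and bias so that the smallest such distance over the training points is as large as possible.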
Random Forest
The Random Forest algorithm is capable of performing both classification and regression tasks. The
fundamental principle of RF is that a group of weak learners come together to form a strong
learner. Random Forest uses the bagging approach to build a collection of decision trees, each on
a random subset of the data. The model is trained several times on random samples of the dataset
to attain good prediction performance. In this ensemble learning technique, the outputs of all
decision trees in the forest are combined to form a final prediction, which is derived by polling
the results of every decision tree.
Suppose there are N cases in the training set. N samples are then drawn at random, but with
replacement, and these samples form the training set for growing a tree. If there are M input
variables, a number m < M is specified such that at each node m variables are selected at random
out of the M. The best split on these m variables is used to split the node. The value of m is
held constant while the forest is grown.
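The two sources of randomness in Random Forest, sampling N cases with replacement and picking m of the M variables at each node, can be sketched as below. Only the sampling is shown, not tree growing; the function names and the example sizes are assumptions chosen for illustration.

```python
import random


def bootstrap_sample(n_cases, rng):
    """Draw N row indices at random with replacement (one training set per tree)."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]


def random_feature_subset(n_features, m, rng):
    """Pick m of the M feature indices, without repeats, as split candidates for one node."""
    if not 0 < m < n_features:
        raise ValueError("m must satisfy 0 < m < M")
    return rng.sample(range(n_features), m)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = bootstrap_sample(10, rng)              # N = 10 hypothetical cases
features = random_feature_subset(8, 3, rng)   # M = 8 variables, m = 3
```

Because sampling is done with replacement, some cases appear more than once in `rows` and others not at all; the unused cases are what is commonly called the out-of-bag set.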
3.2 Proposed Algorithm
Input: D {Clinical data};
Output: Label {Disease Label};
Patient’s Label {Normal, Disease}
Step 1: Pre-processing and data cleansing
Step 2: For each instance in D do
    Find feature vector (V)
Step 3: For each V do
    Cluster the data using FCM, split the data into two halves and classify it using the SVM and
    RF algorithms
Step 4: Determine the total class label
    Find
    True_positive (TP)
    True_negative (TN)
    False_positive (FP)
    False_negative (FN)
Step 5: Find Performance Parameters
Step 6: Predict Disease Class as
    if (class = 1) Patient = Normal State
    else if (class = 0) Patient = Disease State
end for
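Step 4 of the algorithm, counting TP, TN, FP and FN, can be sketched as follows. The paper does not say which class is treated as the positive one, so the `positive` parameter is an assumption, as are the function name and the example label lists.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for paired true and predicted labels."""
    tp = tn = fp = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Hypothetical labels for six test instances.
example_true = [1, 1, 0, 0, 1, 0]
example_pred = [1, 0, 0, 1, 1, 0]
example_counts = confusion_counts(example_true, example_pred)
```

For these labels the counts are 2 true positives, 2 true negatives, 1 false positive and 1 false negative.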
3.3 Dataset Description
Pima Indians Diabetes Database
This study uses the Pima Indians Diabetes Database of the National Institute of Diabetes [14].
The dataset consists of 768 samples with 8 numerical attributes, of which 500 instances tested
negative and 268 tested positive.
Chronic Kidney Disease Dataset
This study uses a dataset from the University of California, Irvine (UCI) repository. The dataset
contains 400 patients, of whom 250 are affected by kidney disease and 150 do not suffer from
kidney disease.
Liver Disorders Data Set
This study uses a dataset from the University of California, Irvine (UCI) repository. The dataset
contains 416 liver patient records and 167 non-liver patient records collected from the north east
of Andhra Pradesh, India.
3.4 Performance Measures
In this study, we used five performance measures: precision, accuracy, recall, F-measure and
total execution time.
Accuracy is the ratio of the number of correctly classified instances to the total number of
instances.
Accuracy = (TP+TN)/(TP+TN+FP+FN)
Precision is the ratio of correctly predicted positive instances to all instances predicted as
positive.
Precision = TP/(TP+FP)
Recall is the ratio of correctly predicted positive instances to all instances that are actually
positive.
Recall = TP/(TP+FN)
F-measure is the harmonic mean of precision and recall.
F_Measure = 2*(Precision*Recall)/(Precision + Recall)
Where TP, TN, FP and FN denote true positives, true negatives, false positives and false
negatives respectively.
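The four formulas above translate directly into code. The following is a minimal sketch; the function name `classification_metrics`, the zero-denominator guards and the example counts are assumptions added for illustration and do not come from the paper's experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F-measure from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * (precision * recall) / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts: 8 TP, 9 TN, 2 FP, 1 FN out of 20 instances.
example_metrics = classification_metrics(tp=8, tn=9, fp=2, fn=1)
```

For these counts, accuracy is 17/20 = 0.85, precision is 8/10 = 0.8, recall is 8/9 and the F-measure is 16/19.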
4. RESULT ANALYSIS
Tables 2 and 3 and Figure 3 below show the results of the proposed algorithm for diabetes and its
comparison with existing work.
Table 2: Result Analysis of Diabetes Disease Detection
Diabetes Disease Detection
Recall | Precision | Accuracy | F_measure
1 | 0.9821 | 0.9935 | 0.991
Table 3: Comparative Result Analysis of Diabetes Disease Detection
Work | Accuracy Measurement
Existing Work [14] | 90.43%
Proposed Work | 99.35%
Figure 3: Comparative Chart of Diabetes Disease Detection
Tables 2, 4 and 5 show the parameter values for diabetes, kidney disease and liver disease
respectively. Similarly, Figures 4 and 5 show the corresponding parameter and time comparison
charts for the different diseases using the proposed algorithm.
Table 4: Result Analysis of Kidney Disease Detection
Kidney Disease Detection
Recall | Precision | Accuracy | F_measure
1 | 0.9875 | 0.9937 | 0.9937
Table 5: Result Analysis of Liver Disease Detection
Liver Disease Detection
Recall | Precision | Accuracy | F_measure
0.9667 | 1 | 0.9914 | 0.9831
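As a quick consistency check, the F-measure values reported in Tables 2, 4 and 5 can be recomputed from the reported precision and recall using the formula from Section 3.4. The dictionary below only restates the table values; the variable and function names are assumptions for illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * (precision * recall) / (precision + recall)


# Recall, precision and F-measure as reported in Tables 2, 4 and 5.
reported = {
    "diabetes": {"recall": 1.0, "precision": 0.9821, "f_measure": 0.991},
    "kidney": {"recall": 1.0, "precision": 0.9875, "f_measure": 0.9937},
    "liver": {"recall": 0.9667, "precision": 1.0, "f_measure": 0.9831},
}

# Recompute each F-measure from the reported precision and recall.
recomputed = {
    name: round(f_measure(row["precision"], row["recall"]), 4)
    for name, row in reported.items()
}
```

The recomputed values agree with the reported F-measures to within rounding, so the three tables are internally consistent on this point.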
Figure 4: Parameter Comparison Chart of Different Disease Detection
Figure 5: Time Comparison Chart of Different Disease Detection
5. CONCLUSION
This paper focuses on predicting disease using data mining or machine learning approaches in order
to enhance the accuracy and precision of disease detection expert systems. It also reviews related
work on approaches such as neural networks, Naïve Bayes, SVM, KNN and FCM, and concludes that SVM
gives the best performance compared with the other existing techniques. Based on this study, the
proposed algorithm is designed using the SVM and RF algorithms, and the experimental results show
an accuracy of 99.35%, 99.37% and 99.14% on diabetes, kidney and liver disease respectively. In
future, a new optimized intelligent system can be designed using data mining approaches to give
accurate and efficient results.
REFERENCES
[1] Nidhi Bhatla, Kiran Jyoti, “An Analysis of Heart Disease Prediction using Different Data Mining
Techniques”, IJERT, Vol. 1, Issue 8, 2012.
[2] Syed Umar Amin, Kavita Agarwal, Rizwan Beg, “Genetic Neural Network based Data Mining in
Prediction of Heart Disease using Risk Factors”, IEEE, 2013.
[3] A H Chen, S Y Huang, P S Hong, C H Cheng, E J Lin, “HDPS: Heart Disease Prediction System”,
IEEE, 2011.
[4] M. Akhil Jabbar, B. L Deekshatulu, Priti Chandra, “Heart Disease Prediction using Lazy Associative
Classification”, IEEE, 2013.
[5] Chaitrali S. Dangare, Sulabha S. Apte, “Improved Study of Heart Disease Prediction System using
Data Mining Classification Techniques”, IJCA, Volume 47– No.10, June 2012.
[6] P. Bhandari, S. Yadav, S. Mote, D. Rankhambe, “Predictive System for Medical Diagnosis with
Expertise Analysis”, IJESC, Vol. 6, pp. 4652-4656, 2016.
[7] Nishara Banu, Gomathy, “Disease Forecasting System using Data Mining Methods”, IEEE Transaction
on Intelligent Computing Applications, 2014.
[8] A. Iyer, S. Jeyalatha and R. Sumbaly, “Diagnosis of Diabetes using Classification Mining Techniques”,
IJDKP, Vol. 5, pp. 1-14, 2015.
[9] Sadiyah Noor Novita Alfisahrin and Teddy Mantoro, “Data Mining Techniques for Optimatization of
Liver Disease Classification”, International Conference on Advanced Computer Science Applications
and Technologies, IEEE, pp. 379-384, 2013.
[10] A. Naik and L. Samant, “Correlation Review of Classification Algorithm using Data Mining Tool:
WEKA, Rapidminer, Tanagra, Orange and Knime”, ELSEVIER, Vol. 85, pp. 662-668, 2016.
[11] Uma Ojha and Savita Goel, “A study on prediction of breast cancer recurrence using data mining
techniques”, International Conference on Cloud Computing, Data Science & Engineering, IEEE, 2017.
[12] Naganna Chetty, Kunwar Singh Vaisla, Nagamma Patil, “An Improved Method for Disease Prediction
using Fuzzy Approach”, International Conference on Advances in Computing and Communication
Engineering, IEEE, pp. 568-572, 2015.
[13] Kumari Deepika and Dr. S. Seema, “Predictive Analytics to Prevent and Control Chronic Diseases”,
International Conference on Applied and Theoretical Computing and Communication Technology,
IEEE, pp. 381-386, 2016.
[14] Emrana Kabir Hashi, Md. Shahid Uz Zaman and Md. Rokibul Hasan, “An Expert Clinical Decision
Support System to Predict Disease Using Classification Techniques”, IEEE, 2017.

 
Propose a Enhanced Framework for Prediction of Heart Disease
Propose a Enhanced Framework for Prediction of Heart DiseasePropose a Enhanced Framework for Prediction of Heart Disease
Propose a Enhanced Framework for Prediction of Heart DiseaseIJERA Editor
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MININGAshish Salve
 
Efficiency of Prediction Algorithms for Mining Biological Databases
Efficiency of Prediction Algorithms for Mining Biological  DatabasesEfficiency of Prediction Algorithms for Mining Biological  Databases
Efficiency of Prediction Algorithms for Mining Biological DatabasesIOSR Journals
 
An Approach for Disease Data Classification Using Fuzzy Support Vector Machine
An Approach for Disease Data Classification Using Fuzzy Support Vector MachineAn Approach for Disease Data Classification Using Fuzzy Support Vector Machine
An Approach for Disease Data Classification Using Fuzzy Support Vector MachineIOSRJECE
 
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION IJCI JOURNAL
 

Similar to AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS (20)

EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...EVALUATING THE ACCURACY OF CLASSIFICATION  ALGORITHMS FOR DETECTING HEART DIS...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...
 
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...
 
Chronic Kidney Disease Prediction Using Machine Learning
Chronic Kidney Disease Prediction Using Machine LearningChronic Kidney Disease Prediction Using Machine Learning
Chronic Kidney Disease Prediction Using Machine Learning
 
IRJET - E-Health Chain and Anticipation of Future Disease
IRJET - E-Health Chain and Anticipation of Future DiseaseIRJET - E-Health Chain and Anticipation of Future Disease
IRJET - E-Health Chain and Anticipation of Future Disease
 
IRJET-Survey on Data Mining Techniques for Disease Prediction
IRJET-Survey on Data Mining Techniques for Disease PredictionIRJET-Survey on Data Mining Techniques for Disease Prediction
IRJET-Survey on Data Mining Techniques for Disease Prediction
 
Paper id 212014112
Paper id 212014112Paper id 212014112
Paper id 212014112
 
IRJET - Chronic Kidney Disease Prediction using Data Mining and Machine Learning
IRJET - Chronic Kidney Disease Prediction using Data Mining and Machine LearningIRJET - Chronic Kidney Disease Prediction using Data Mining and Machine Learning
IRJET - Chronic Kidney Disease Prediction using Data Mining and Machine Learning
 
Comparing Data Mining Techniques used for Heart Disease Prediction
Comparing Data Mining Techniques used for Heart Disease PredictionComparing Data Mining Techniques used for Heart Disease Prediction
Comparing Data Mining Techniques used for Heart Disease Prediction
 
Multi Disease Detection using Deep Learning
Multi Disease Detection using Deep LearningMulti Disease Detection using Deep Learning
Multi Disease Detection using Deep Learning
 
A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...A hybrid model for heart disease prediction using recurrent neural network an...
A hybrid model for heart disease prediction using recurrent neural network an...
 
Prognosis of Cardiac Disease using Data Mining Techniques A Comprehensive Survey
Prognosis of Cardiac Disease using Data Mining Techniques A Comprehensive SurveyPrognosis of Cardiac Disease using Data Mining Techniques A Comprehensive Survey
Prognosis of Cardiac Disease using Data Mining Techniques A Comprehensive Survey
 
A comparative analysis of classification techniques on medical data sets
A comparative analysis of classification techniques on medical data setsA comparative analysis of classification techniques on medical data sets
A comparative analysis of classification techniques on medical data sets
 
IRJET- Role of Different Data Mining Techniques for Predicting Heart Disease
IRJET-  	  Role of Different Data Mining Techniques for Predicting Heart DiseaseIRJET-  	  Role of Different Data Mining Techniques for Predicting Heart Disease
IRJET- Role of Different Data Mining Techniques for Predicting Heart Disease
 
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes ClassifierIRJET- The Prediction of Heart Disease using Naive Bayes Classifier
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier
 
Hybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease PredictionHybrid CNN and LSTM Network For Heart Disease Prediction
Hybrid CNN and LSTM Network For Heart Disease Prediction
 
Propose a Enhanced Framework for Prediction of Heart Disease
Propose a Enhanced Framework for Prediction of Heart DiseasePropose a Enhanced Framework for Prediction of Heart Disease
Propose a Enhanced Framework for Prediction of Heart Disease
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MINING
 
Efficiency of Prediction Algorithms for Mining Biological Databases
Efficiency of Prediction Algorithms for Mining Biological  DatabasesEfficiency of Prediction Algorithms for Mining Biological  Databases
Efficiency of Prediction Algorithms for Mining Biological Databases
 
An Approach for Disease Data Classification Using Fuzzy Support Vector Machine
An Approach for Disease Data Classification Using Fuzzy Support Vector MachineAn Approach for Disease Data Classification Using Fuzzy Support Vector Machine
An Approach for Disease Data Classification Using Fuzzy Support Vector Machine
 
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION
 

Recently uploaded

Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Recently uploaded (20)

Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS

International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
DOI:10.5121/ijcsit.2018.10102

Shakuntala Jatav1 and Vivek Sharma2
1 M.Tech Scholar, Department of CSE, TIT College, Bhopal
2 Professor, Department of CSE, TIT College, Bhopal

ABSTRACT
The healthcare industry holds large and complex data that can be mined to discover interesting disease patterns and to make effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in databases and for medical research. This paper analyzes prediction systems for diabetes, kidney and liver disease using a larger number of input attributes. The data mining classification techniques Support Vector Machine (SVM) and Random Forest (RF) are analyzed on diabetes, kidney and liver disease databases. The performance of these techniques is compared on the basis of precision, recall, accuracy, F-measure and time. As a result of the study, the proposed algorithm is designed using the SVM and RF algorithms, and the experimental results show accuracies of 99.35%, 99.37% and 99.14% on diabetes, kidney and liver disease respectively.

KEYWORDS
Data Mining, Clinical Decision Support System, Disease Prediction, Classification, SVM, RF.

1. INTRODUCTION
Computational health informatics is a rising research topic that involves varied sciences such as biomedical, medical, nursing, information technology, computer science, and statistics [1]. Data mining techniques are applied to huge and complex clinical data in order to diagnose disease and extract information that suggests effective medical assistance [2].
In the biosciences, hospitals have introduced various information systems holding large amounts of data to manage medical insurance and patient information. Unfortunately, this data is rarely mined to find hidden information for effective decisions [2][3]. Clinical test outcomes are often determined on the basis of the doctor's perception and experience rather than on the knowledge-rich data hidden in the database. This practice can lead to unintended biases, and a doctor's experience alone may not be sufficient for an accurate diagnosis, which affects the disease diagnosis system [2][3]. In the healthcare sector, data mining means analyzing clinical data to predict a patient's health status. To discover interesting patterns in healthcare data, different data mining techniques are therefore applied together with statistical analysis, machine learning and database technology. Predictive systems are systems used to predict an outcome on the basis of pattern recognition, as shown in Figure 1. Disease detection is the process by which a patient is diagnosed on the basis of the symptoms analyzed, which can make predicting the disease difficult [4]. For example, fever is a symptom of many disorders, so by itself it does not tell the healthcare professional exactly what the disease is. Because results or opinions vary from one physician to another, there is a need to assist physicians so that they reach similar opinions for given symptoms and disorders [5]. This can be done by analyzing the information in medical data or medical records.

Figure 1: A Typical Health Informatics Processing

As a result, new information can be compared with previous records and an optimal diagnosis can be made. Predictive medical diagnosis can be delivered as a web application that predicts a particular disorder on the basis of symptoms and provides a diagnosis for that disorder, which is detected by the algorithm. Healthcare professionals use their prior knowledge and insight to reach a decision regarding any disease or disorder [6]. In a similar manner, this paper proposes different classification techniques for diagnosis using generic disease datasets. It highlights the benefits and drawbacks of using various data mining techniques for disease prediction. It also reports the prediction rate of the various techniques, thereby allowing a comparison between them [7]-[10].

Data mining has been successfully used in knowledge discovery for predictive purposes to make more active and accurate decisions [11]. The main focus of the paper is on classification as well as clustering techniques. In clustering processes such as K-Means, EM and Fuzzy C-Means, data is partitioned into sets of clusters or subclasses [12]. Machine learning techniques such as KNN, SVM and Naïve Bayes can be used to classify different objects on the basis of a training set of data whose outcome value is known.

Clustering Techniques
The clustering process divides the data into cluster groups or subclasses. We used four clustering algorithms, namely K-Means, EM, PAM and Fuzzy C-Means [12].
The K-Means clustering algorithm works by partitioning n observations into k subclasses defined by centroids, where k is chosen before the algorithm begins. K-Means and EM are both iterative algorithms. EM (expectation-maximization) is a statistical method that estimates maximum-likelihood parameters for models that depend on unobserved latent variables. Partitioning around medoids (PAM) is similar to K-Means in that partitioning is based on the K-medoids method, which divides the data into a number of disjoint clusters [12]. In fuzzy clustering, data elements can belong to multiple clusters. This is also called soft clustering.
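The K-Means procedure described above can be illustrated with a minimal sketch. It assumes two-dimensional points, Euclidean distance and caller-supplied initial centroids; the function name `kmeans` and the toy data are hypothetical and are not taken from the paper's experiments.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans(points, centroids, iterations=10):
    """Plain K-Means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
                continue
            new_centroids.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
            )
        if new_centroids == centroids:  # no centroid moved, so stop early
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups, k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

Here k is fixed in advance by the number of initial centroids passed in, which mirrors the requirement that k be chosen before the algorithm begins.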
Classification Techniques
Machine-learning-based classification techniques can be used to classify various objects based on a set of training data whose result value is known. In this study four classification algorithms are used: KNN, SVM, Naïve Bayes and C5.0. In K-nearest neighbors (KNN), an object is classified by a majority vote of its neighbors, being assigned to the most common class among its nearest neighbors. In SVM (Support Vector Machine), data is first converted into a set of points and then classified into classes that can be separated linearly. The Naïve Bayes model calculates the probability that a data instance belongs to a class using Bayes' rule. The C5.0 algorithm is a decision tree that recursively separates observations into branches to form a tree that improves prediction accuracy. It is an improved version of the C4.5 and ID3 algorithms [10]. It also provides the powerful gain method to increase the accuracy of this classification algorithm [11].

2. RELATED WORK
In [1], neural network, decision tree and Naïve Bayes machine learning approaches are used to diagnose heart disease. A genetic algorithm is used for optimized feature selection, and accuracies of 100%, 99.62% and 90.74% respectively are obtained. In [2], heart disease is predicted using a neural network with genetic-algorithm-based feature extraction. The weights of a back-propagation neural network are optimized by a genetic algorithm, and an accuracy of 89% is obtained for heart disease prediction. In [3], the authors developed a heart disease prediction system using ANN with LVQ and achieved an accuracy of about 80%, a sensitivity of about 85% and a specificity of about 70%. In [4], heart disease diagnosis is proposed using a lazy data mining approach with data reduction strategies, i.e. principal component analysis is used to get category association rules. The result analysis shows that J4.8 has a 10.26 enhancement as well as an 8.6% enhancement over Naïve Bayes. In [5], a heart disease prediction system is presented using a data mining approach with two additional features, obesity and smoking, to boost the prediction rate. Neural networks, Naïve Bayes and decision trees were used for predicting heart disease with accuracies of 99.25%, 94.44% and 96.66% respectively. A web-based application is introduced in [6] using the Naïve Bayes algorithm; it takes symptoms from the user and returns the diagnosis result to the user or patient.

In [7], association rule mining was used for the diagnosis of diabetes. The authors concluded that data mining techniques, when used appropriately, increase the computational and classification performance. These rules have the potential to improve the expert system and to support better clinical decision making. For predicting diabetes with the Weka tool, the authors of [8] presented a comparison between the Naïve Bayes algorithm and the decision tree algorithm and achieved system accuracies of about 79.56% and 76.96% respectively. In [9], the Decision Tree, Naïve Bayes and NBTree algorithms are used for liver disease detection with 10 features. The result analysis shows that NBTree has the highest accuracy, whereas Naïve Bayes performs better with respect to computational time. In [11], the authors performed a comparative analysis of clustering and classification algorithms. The result analysis shows that classification performs better than clustering, with an accuracy of about 81%.
In [12], the authors proposed classification combined with clustering, i.e. KNN with FCM clustering and F-KNN with FCM clustering. The Fuzzy KNN with Fuzzy C-Means model produced better results than the KNN with Fuzzy C-Means model on both the PIMA and Liver-disorder datasets. Using the Fuzzy C-Means clustering algorithm to preprocess the datasets also improved classification accuracy and speed by reducing the number of tuples in the original datasets. In the experiments, KNN with Fuzzy C-Means has an accuracy of 97.02% and Fuzzy KNN with Fuzzy C-Means has an accuracy of 99.25% on the PIMA dataset, whereas on the Liver-disorder dataset KNN with Fuzzy C-Means has an accuracy of 96.13% and Fuzzy KNN with Fuzzy C-Means has an accuracy of 98.95%.

In [13], chronic disease prediction is performed using data mining approaches such as Naïve Bayes, Decision Tree, Support Vector Machine (SVM) and Artificial Neural Networks (ANN) for the diagnosis of diabetes and heart disease. The result analysis shows that SVM gives the highest accuracy of 95.556% for heart disease and Naïve Bayes gives an accuracy of 73.588% for diabetes. Table 1 gives a comparative analysis of different existing techniques for heart, liver and diabetes diseases.

Table 1: Comparative Analysis of Different Techniques in terms of Accuracy

Name of Author | Technique | Accuracy
---------------|-----------|---------
Bhatla et al. | Neural Network, Naïve Bayes and Decision Tree for heart disease detection | Neural Networks = 100%; Decision Tree = 99.62%; Naïve Bayes = 90.74%
Amin et al. | Optimized Neural Network for heart disease detection | Training data = 89%; Validation data = 96.2%
Chen et al. | ANN based heart disease detection | ANN = 80%
Dangare et al. | Decision Trees, Neural Networks and Naïve Bayes for heart disease detection | Neural Networks = 99.25%; Naïve Bayes = 94.44%; Decision Tree = 96.66%
Iyer et al. | Decision Tree and Naïve Bayes for diabetes detection | Decision Tree = 76.96%; Naïve Bayes = 79.56%
Uma Ojha and Savita Goel | Decision Tree, SVM, FCM | Decision Tree (C5.0) = 81%; SVM = 81%; FCM = 37%
Chetty et al. | KNN and F-KNN for diabetes and liver disease detection | Diabetes data: KNN = 97.02%, F-KNN = 99.25%; Liver disease data: KNN = 96.13%, F-KNN = 98.95%
Kumari Deepika and Dr. S. Seema | Naïve Bayes, Decision Tree, Support Vector Machine (SVM) and Artificial Neural Networks (ANN) for diabetes and heart disease detection | SVM = 95.556% (heart disease); Naïve Bayes = 73.588% (diabetes)
3. METHODOLOGY

3.1 Proposed Methodology
One of the interesting and important subjects among researchers in medicine and computer science is diagnosing illness by considering the features that have the most impact on recognition. This subject has given rise to a concept called Medical Data Mining (MDM). Data mining methods such as classification and clustering are used to classify diseases and their symptoms, which is helpful for diagnosis. A disease diagnosis system is created in order to predict different diseases such as diabetes, kidney disease and liver disease. The system's workflow is as follows:

Step 1: Through the proposed application, the user (doctor, patient, physician, etc.) inputs the attribute values of the disease and sends them to the decision support system for analysis.
Step 2: At the decision support system, the datasets of the different diseases are loaded and data mining algorithms are applied to the training dataset. The user's inputs are collected and processed on the server to predict the diagnosis result.
Step 3: To analyze the healthcare data, the major steps of data mining (data preprocessing, replacement of missing values, feature selection, machine learning and decision making) are applied to the training dataset.

On the decision support system side, different classification algorithms are executed on the training dataset and are then ready to classify the test dataset. In the proposed algorithm, Support Vector Machine and Random Forest are used to give a clustering level for different subspaces. The voting model ensembles all these results and outputs the final classification result. Finally, the predicted results are compared with the true labels in the testing phase.

Figure 2: Proposed Model
Support Vector Machine
Support vector machine is a machine learning approach that can be used for classification as well as regression. SVM classifies the data into different classes by finding a hyperplane (a line in two dimensions) that separates the training data into classes. SVM tends not to overfit the data and gives strong classification performance in terms of precision and accuracy. SVM does not make any strong assumptions about the data, and it classifies future data efficiently and correctly. SVM falls into two categories, linear and non-linear. In the linear approach, the training data is separated by a line, i.e. a hyperplane.

Random Forest
The Random Forest algorithm is capable of performing both classification and regression tasks. The fundamental principle of RF is that a group of weak learners combine to form a strong learner. The random forest algorithm uses a bagging approach to build a collection of decision trees, each on a random subset of the data. The model is trained several times on random samples of the dataset to attain the best prediction performance. In this ensemble learning technique, the outputs of all decision trees in the RF are combined to form a final prediction, which is derived by polling the results of every decision tree. Suppose there are N cases in the training set. Then N samples are drawn at random, but with replacement. These samples form the training set for growing a tree. If there are M input variables, a number m < M is specified, and at each node m variables are selected at random from the M. The best split on these m variables is used to split the node. The value of m is held constant while the forest is grown.
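The three randomized ingredients of Random Forest described above (drawing N cases with replacement, choosing m of the M features at a node, and polling the trees) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the helper names, the toy rows and the values M = 8 and m = 3 are hypothetical, and the tree-growing step itself is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) cases at random with replacement (bagging)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(num_features, m, rng):
    """Pick m of the M feature indices to consider at a node (m < M)."""
    return sorted(rng.sample(range(num_features), m))


def majority_vote(predictions):
    """Combine per-tree class predictions by polling the most common one."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical (identifier, class label) rows standing in for patient records.
rows = [("patient_a", 1), ("patient_b", 0), ("patient_c", 1), ("patient_d", 0)]

sample = bootstrap_sample(rows, rng)                       # training set for one tree
features = random_feature_subset(num_features=8, m=3, rng=rng)
final_class = majority_vote([1, 0, 1, 1, 0])               # votes from five trees
print(len(sample), features, final_class)
```

Because sampling is done with replacement, a bootstrap sample usually repeats some cases and leaves others out, which is what makes the individual trees differ from one another.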
3.2 Proposed Algorithm

Input: D {Clinical data}
Output: Label {Disease label}; Patient's label {Normal, Disease}

Step 1: Pre-processing and data cleansing
Step 2: For each instance in D do
            Find feature vector (V)
Step 3: For each V do
            Cluster the data using FCM, split the data into two halves, and classify the data using the SVM and RF algorithms
Step 4: Determine the total class label
            Find True_positive (TP), True_negative (TN), False_positive (FP), False_negative (FN)
Step 5: Find performance parameters
Step 6: Predict disease class as
            if (class = 1)
                Patient = Normal State
            else if (class = 0)
                Patient = Disease State
        end for

3.3 Dataset Description

Pima Indians Diabetes Database

This study used data from the Pima Indians Diabetes Database of the National Institute of Diabetes and Digestive and Kidney Diseases [14]. The dataset consists of 768 samples with 8 numeric attributes, of which 500 instances tested negative and 268 tested positive.

Chronic Kidney Disease Dataset

This study used data from the University of California, Irvine (UCI) repository. The dataset contains 400 patients, of whom 250 were affected by kidney disease and 150 were not.

Liver Disorders Data Set

This study used data from the University of California, Irvine (UCI) repository. The dataset contains 416 liver patient records and 167 non-liver patient records collected from the North East of Andhra Pradesh, India.

3.4 Performance Measures

In this study, we used five performance measures: precision, accuracy, recall, F_measure and total execution time.

Accuracy is the ratio of the number of correctly classified instances to the total number of instances.

Accuracy = (TP+TN)/(TP+TN+FP+FN)

Precision is the ratio of correctly predicted positive instances to all instances predicted as positive.

Precision = TP/(TP+FP)

Recall is the ratio of correctly predicted positive instances to all instances that are actually positive.

Recall = TP/(TP+FN)

F-measure is the harmonic mean of precision and recall.

F_Measure = 2*(Precision*Recall)/(Precision + Recall)
Here, TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives respectively.

4. RESULT ANALYSIS

Tables 2 and 3, together with Figure 3, show the comparative analysis of the proposed algorithm against existing work.

Table 2: Result Analysis of Diabetes Disease Detection

Diabetes Disease Detection
Recall    Precision    Accuracy    F_measure
1         0.9821       0.9935      0.991

Table 3: Comparative Result Analysis of Diabetes Disease Detection

                      Accuracy Measurement
Existing Work [14]    90.43%
Proposed Work         99.35%

Figure 3: Comparative Chart of Diabetes Disease Detection

Tables 4, 5 and 6 show the parameter values for the different diseases: diabetes, kidney disease and liver disease. Similarly, Figures 4 and 5 show the corresponding parametric charts of disease detection using the proposed algorithm.

Table 4: Result Analysis of Kidney Disease Detection

Kidney Disease Detection
Recall    Precision    Accuracy    F_measure
1         0.9875       0.9937      0.9937

Table 5: Result Analysis of Liver Disease Detection

Liver Disease Detection
Recall    Precision    Accuracy    F_measure
0.9667    1            0.9914      0.9831
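The performance measures of Section 3.4 can be written as a short sketch, which also makes it possible to cross-check the tables: the F_measure column should equal the harmonic mean of the reported precision and recall. The function names are hypothetical, and the confusion-matrix counts in the first example are made up for illustration, since the paper reports only the final ratios. The recall, precision and F_measure values in `reported` are copied from Tables 2, 4 and 5.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumes their sum is non-zero)."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the Section 3.4 measures from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }


# Hypothetical counts, for illustration only (not taken from the paper).
example = classification_metrics(tp=50, tn=40, fp=5, fn=5)

# (recall, precision, reported F_measure) as listed in Tables 2, 4 and 5.
reported = {
    "diabetes": (1.0, 0.9821, 0.991),
    "kidney": (1.0, 0.9875, 0.9937),
    "liver": (0.9667, 1.0, 0.9831),
}
for disease, (recall, precision, f_reported) in reported.items():
    f_computed = f_measure(precision, recall)
    print(disease, round(f_computed, 4), f_reported)
```

For all three diseases the recomputed F_measure agrees with the tabulated value to the precision shown, so the recall, precision and F_measure columns are internally consistent.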
Figure 4: Parameter Comparison Chart of Different Disease Detection

Figure 5: Time Comparison Chart of Different Disease Detection

5. CONCLUSION

This paper focuses on predicting the likelihood of disease using data mining or machine learning approaches in order to improve the accuracy and precision of a disease detection expert system. It also reviews related work on approaches such as neural networks, naïve Bayes, SVM, KNN and FCN, and concludes that SVM gives the best performance among the existing techniques. Based on this study, the proposed algorithm is designed using the SVM and RF algorithms, and the experimental results show accuracies of 99.35%, 99.37% and 99.14% on the diabetes, kidney and liver disease datasets respectively. In future, a new optimized intelligent system could be designed using the data mining approach to give accurate and efficient results.
REFERENCES

[1] Nidhi Bhatla, Kiran Jyoti, “An Analysis of Heart Disease Prediction using Different Data Mining Techniques”, IJERT, Vol. 1, Issue 8, 2012.
[2] Syed Umar Amin, Kavita Agarwal, Rizwan Beg, “Genetic Neural Network based Data Mining in Prediction of Heart Disease using Risk Factors”, IEEE, 2013.
[3] A. H. Chen, S. Y. Huang, P. S. Hong, C. H. Cheng, E. J. Lin, “HDPS: Heart Disease Prediction System”, IEEE, 2011.
[4] M. Akhil Jabbar, B. L. Deekshatulu, Priti Chandra, “Heart Disease Prediction using Lazy Associative Classification”, IEEE, 2013.
[5] Chaitrali S. Dangare, Sulabha S. Apte, “Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques”, IJCA, Volume 47, No. 10, June 2012.
[6] P. Bhandari, S. Yadav, S. Mote, D. Rankhambe, “Predictive System for Medical Diagnosis with Expertise Analysis”, IJESC, Vol. 6, pp. 4652-4656, 2016.
[7] Nishara Banu, Gomathy, “Disease Forecasting System using Data Mining Methods”, IEEE Transaction on Intelligent Computing Applications, 2014.
[8] A. Iyer, S. Jeyalatha and R. Sumbaly, “Diagnosis of Diabetes using Classification Mining Techniques”, IJDKP, Vol. 5, pp. 1-14, 2015.
[9] Sadiyah Noor Novita Alfisahrin and Teddy Mantoro, “Data Mining Techniques for Optimatization of Liver Disease Classification”, International Conference on Advanced Computer Science Applications and Technologies, IEEE, pp. 379-384, 2013.
[10] A. Naik and L. Samant, “Correlation Review of Classification Algorithm using Data Mining Tool: WEKA, Rapidminer, Tanagra, Orange and Knime”, ELSEVIER, Vol. 85, pp. 662-668, 2016.
[11] Uma Ojha and Savita Goel, “A study on prediction of breast cancer recurrence using data mining techniques”, International Conference on Cloud Computing, Data Science & Engineering, IEEE, 2017.
[12] Naganna Chetty, Kunwar Singh Vaisla, Nagamma Patil, “An Improved Method for Disease Prediction using Fuzzy Approach”, International Conference on Advances in Computing and Communication Engineering, IEEE, pp. 568-572, 2015.
[13] Kumari Deepika and Dr. S. Seema, “Predictive Analytics to Prevent and Control Chronic Diseases”, International Conference on Applied and Theoretical Computing and Communication Technology, IEEE, pp. 381-386, 2016.
[14] Emrana Kabir Hashi, Md. Shahid Uz Zaman and Md. Rokibul Hasan, “An Expert Clinical Decision Support System to Predict Disease Using Classification Techniques”, IEEE, 2017.