nternational Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Data mining techniques are rapidly developed for many applications. In recent year, Data mining in healthcare is an emerging field research and development of intelligent medical diagnosis system. Classification is the major research topic in data mining. Decision trees are popular methods for classification. In this paper many decision tree classifiers are used for diagnosis of medical datasets. AD Tree, J48, NB Tree, Random Tree and Random Forest algorithms are used for analysis of medical dataset. Heart disease dataset, Diabetes dataset and Hepatitis disorder dataset are used to test the decision tree models. Aung Nway Oo | Thin Naing ""Decision Tree Models for Medical Diagnosis"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3 , April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23510.pdf
Paper URL: https://www.ijtsrd.com/computer-science/data-miining/23510/decision-tree-models-for-medical-diagnosis/aung-nway-oo
Medical informatics growth can be observed now days. Advancement in different medical fields
discovers the various critical diseases and provides the guidelines for their cure. This has been possible
only because of well heeled medical databases as well as automation of data analysis process. Towards
this analysis process lots of learning and intelligence is required, the data mining techniques provides the
basis for that and various data mining techniques are available like Decision tree Induction, Rule Based
Classification or mining, Support vector machine, Stochastic classification, Logistic regression, Naïve
bayes, Artificial Neural Network & Fuzzy Logic, Genetic Algorithms. This paper provides the basic of
data mining with their effective techniques availability in medical sciences & reveals the efforts done on
medical databases using data mining techniques for human disease diagnosis.
A comparative study on remote tracking of parkinson’s disease progression usi...ijfcstjournal
In recent years, applications of data mining method
s are become more popular in many fields of medical
diagnosis and evaluations. The data mining methods
are appropriate tools for discovering and extractin
g
of available knowledge in medical databases. In thi
s study, we divided 11 data mining algorithms into
five
groups which are applied to a dataset of patient’s
clinical variables data with Parkinson’s Disease (P
D) to
study the disease progression. The dataset includes
22 properties of 42 people that all of our algorit
hms
are applied to this dataset. The Decision Table wit
h 0.9985 correlation coefficients has the best accu
racy
and Decision Stump with 0.7919 correlation coeffici
ents has the lowest accuracy.
Propose a Enhanced Framework for Prediction of Heart DiseaseIJERA Editor
Heart disease diagnosis requires more experience and it is a complex task. The Heart MRI, ECG and Stress Test etc are the numbers of medical tests are prescribed by the doctor for examining the heart disease and it is the way of tradition in the prediction of heart disease. Today world, the hidden information of the huge amount of health care data is contained by the health care industry. The effective decisions are made by means of this hidden information. For appropriate results, the advanced data mining techniques with the information which is based on the computer are used. In any empirical sciences, for the inference and categorisation, the new mathematical techniques to be used called Artificial neural networks (ANNs) it also be used to the modelling of the real neural networks. Acting, Wanting, knowing, remembering, perceiving, thinking and inferring are the nature of mental phenomena and these can be understand by using the theory of ANN. The problem of probability and induction can be arised for the inference and classification because these are the powerful instruments of ANN. In this paper, the classification techniques like Naive Bayes Classification algorithm and Artificial Neural Networks are used to classify the attributes in the given data set. The attribute filtering techniques like PCA (Principle Component Analysis) filtering and Information Gain Attribute Subset Evaluation technique for feature selection in the given data set to predict the heart disease symptoms. A new framework is proposed which is based on the above techniques, the framework will take the input dataset and fed into the feature selection techniques block, which selects any one techniques that gives the least number of attributes and then classification task is done using two algorithms, the same attributes that are selected by two classification task is taken for the prediction of heart disease. This framework consumes the time for predicting the symptoms of heart disease which make the user to know the important attributes based on the proposed framework.
Data mining techniques are rapidly developed for many applications. In recent year, Data mining in healthcare is an emerging field research and development of intelligent medical diagnosis system. Classification is the major research topic in data mining. Decision trees are popular methods for classification. In this paper many decision tree classifiers are used for diagnosis of medical datasets. AD Tree, J48, NB Tree, Random Tree and Random Forest algorithms are used for analysis of medical dataset. Heart disease dataset, Diabetes dataset and Hepatitis disorder dataset are used to test the decision tree models. Aung Nway Oo | Thin Naing ""Decision Tree Models for Medical Diagnosis"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3 , April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23510.pdf
Paper URL: https://www.ijtsrd.com/computer-science/data-miining/23510/decision-tree-models-for-medical-diagnosis/aung-nway-oo
Medical informatics growth can be observed now days. Advancement in different medical fields
discovers the various critical diseases and provides the guidelines for their cure. This has been possible
only because of well heeled medical databases as well as automation of data analysis process. Towards
this analysis process lots of learning and intelligence is required, the data mining techniques provides the
basis for that and various data mining techniques are available like Decision tree Induction, Rule Based
Classification or mining, Support vector machine, Stochastic classification, Logistic regression, Naïve
bayes, Artificial Neural Network & Fuzzy Logic, Genetic Algorithms. This paper provides the basic of
data mining with their effective techniques availability in medical sciences & reveals the efforts done on
medical databases using data mining techniques for human disease diagnosis.
A comparative study on remote tracking of parkinson’s disease progression usi...ijfcstjournal
In recent years, applications of data mining method
s are become more popular in many fields of medical
diagnosis and evaluations. The data mining methods
are appropriate tools for discovering and extractin
g
of available knowledge in medical databases. In thi
s study, we divided 11 data mining algorithms into
five
groups which are applied to a dataset of patient’s
clinical variables data with Parkinson’s Disease (P
D) to
study the disease progression. The dataset includes
22 properties of 42 people that all of our algorit
hms
are applied to this dataset. The Decision Table wit
h 0.9985 correlation coefficients has the best accu
racy
and Decision Stump with 0.7919 correlation coeffici
ents has the lowest accuracy.
Propose a Enhanced Framework for Prediction of Heart DiseaseIJERA Editor
Heart disease diagnosis requires more experience and it is a complex task. The Heart MRI, ECG and Stress Test etc are the numbers of medical tests are prescribed by the doctor for examining the heart disease and it is the way of tradition in the prediction of heart disease. Today world, the hidden information of the huge amount of health care data is contained by the health care industry. The effective decisions are made by means of this hidden information. For appropriate results, the advanced data mining techniques with the information which is based on the computer are used. In any empirical sciences, for the inference and categorisation, the new mathematical techniques to be used called Artificial neural networks (ANNs) it also be used to the modelling of the real neural networks. Acting, Wanting, knowing, remembering, perceiving, thinking and inferring are the nature of mental phenomena and these can be understand by using the theory of ANN. The problem of probability and induction can be arised for the inference and classification because these are the powerful instruments of ANN. In this paper, the classification techniques like Naive Bayes Classification algorithm and Artificial Neural Networks are used to classify the attributes in the given data set. The attribute filtering techniques like PCA (Principle Component Analysis) filtering and Information Gain Attribute Subset Evaluation technique for feature selection in the given data set to predict the heart disease symptoms. A new framework is proposed which is based on the above techniques, the framework will take the input dataset and fed into the feature selection techniques block, which selects any one techniques that gives the least number of attributes and then classification task is done using two algorithms, the same attributes that are selected by two classification task is taken for the prediction of heart disease. This framework consumes the time for predicting the symptoms of heart disease which make the user to know the important attributes based on the proposed framework.
Hospital Medicine Classification using Data Mining Techniquesijtsrd
Data mining is a process which finds useful patterns from large amount of data. This paper is analyzing decision trees technique using the medicine data from central medical store, department of health. The results are support to best solution to share many type of medicine for manager at Central Medical store. This Central Medical store is supporting medicine to hospitals from upper Myanmar. Yin Yin Cho | Khin Htay | Win Myat Thuzar "Hospital Medicine Classification using Data Mining Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26774.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26774/hospital-medicine-classification-using-data-mining-techniques/yin-yin-cho
A comparative analysis of classification techniques on medical data setseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
HQR Framework optimization for predicting patient treatment time in big datadbpublications
Today most of the hospital face overcrowded with patients long queues for different tasks. Hospital management face difficulty to handle these patients to provide optimal treatment time for each patients waiting in the long queue. Unnecessary and annoying waits for long periods result in substantial human resource and time wastage and increase the frustration endured by patients.It would be convenient and preferable if the patients could receive the most efficient treatment plan and know the predicted waiting time updates in real time. Because of the large-scale, realistic data-set and the requirement for real-time response, the PTTP algorithm and HQR system mandate efficiency and low-latency response. Extensive experimentation and simulation results demonstrate the effectiveness and applicability of the proposed model to recommend an effective and convenient treatment plan for patients to minimize their wait times in hospitals.
HQR Framework optimization for predicting patient treatment time in big datadbpublications
Today most of the hospital face overcrowded with patients long queues for different tasks. Hospital management face difficulty to handle these patients to provide optimal treatment time for each patients waiting in the long queue. Unnecessary and annoying waits for long periods result in substantial human resource and time wastage and increase the frustration endured by patients.It would be convenient and preferable if the patients could receive the most efficient treatment plan and know the predicted waiting time updates in real time. Because of the large-scale, realistic data-set and the requirement for real-time response, the PTTP algorithm and HQR system mandate efficiency and low-latency response. Extensive experimentation and simulation results demonstrate the effectiveness and applicability of the proposed model to recommend an effective and convenient treatment plan for patients to minimize their wait times in hospitals.
As we are into the age of digital information, the problem of data overload emerges so worryingly ahead. Our ability to analyze and understand immense datasets wrap extreme behind our ability together and stores the data. But a new age group of computational techniques and tools is required to support the extraction of useful knowledge from the rapidly increasing volumes of data. These techniques and tools are the focus of emerging fields of Knowledge Discovery in Databases (KDD) and also called data mining. Data mining is highly noticeable in the fields like marketing, e-commerce or e-business or the fame of its use in KDD in other sectors or industries also. Among these sectors that are just discovering data mining are the fields of medicine and public health also. This research paper provides a survey of current technique of data mining/KDD for healthcare.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CAREijistjournal
Dichotomous data is a type of categorical data, which is binary with categories zero and one. Health care data is one of the heavily used categorical data. Binary data are the simplest form of data used for heath care databases in which close ended questions can be used; it is very efficient based on computational efficiency and memory capacity to represent categorical type data. Clustering health care or medical data is very tedious due to its complex data representation models, high dimensionality and data sparsity. In this paper, clustering is performed after transforming the dichotomous data into real by wiener transformation. The proposed algorithm can be usable for determining the correlation of the health disorders and symptoms observed in large medical and health binary databases. Computational results show that the clustering based on Wiener transformation is very efficient in terms of objectivity and subjectivity.
Prognosis of Cardiac Disease using Data Mining Techniques A Comprehensive Surveyijtsrd
The Healthcare exchange generally clinical diagnosis is ended commonly by doctor's knowledge and practice. Computer Aided Decision Support System plays a major task in the medical field. Data mining provides the methodology and technology to modify these rises of data into valuable data for decision making. By utilizing data mining techniques it requires less time for the prediction of the diseases with more accuracy. Among the expanding research on coronary diseases predicting system, it has happened significant to classifications the exploration results and gives readers with a layout of the current coronary diseases forecast strategies in every discussion. Data mining tools can respond to exchange addresses that expectedly being used much time over riding to decide. In this paper we study different papers in which at least one algorithm of data mining used for the prediction of coronary diseases. As of the study it is observed that Naïve Bayes Technique increase the accuracy of the coronary diseases prediction system. The commonly used techniques for Heart Disease Prediction and their complexities are outlined in this paper. D. Haripriya | Dr. M. Lovelin Ponn Felciah "Prognosis of Cardiac Disease using Data Mining Techniques: A Comprehensive Survey" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26605.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26605/prognosis-of-cardiac-disease-using-data-mining-techniques-a-comprehensive-survey/d-haripriya
Heart Disease Prediction Using Data Mining TechniquesIJRES Journal
There are huge amounts of data in the medical industry which is not processed properly and hence cannot be used effectively in making decisions. We can use data mining techniques to mine these patterns and relationships. This research has developed a prototype Heart Disease Prediction using data mining techniques, namely Neural Network, K-Means Clustering and Frequent Item Set Generation. Using medical profiles such as age, sex, blood pressure and blood sugar it can predict the likelihood patients getting a heart disease. It enables significant knowledge, e.g. patterns, relationships between medical factors related to heart disease to be established. Performance of these techniques is compared through sensitivity, specificity and accuracy. It has been observed that Artificial Neural Networks outperform K Means clustering in all the parameters i.e. Sensitivity, Specificity and Accuracy.
Data mining techniques play very important role in health care industry. Liver disease is one of the growing
diseases these days due to the changed life style of people. Various authors have worked in the field of classification of
data and they have used various classification techniques like Decision Tree, Support Vector Machine, Naïve Bayes,
Artificial Neural Network (ANN) etc. These techniques can be very useful in timely and accurate classification and
prediction of diseases and better care of patients. The main focus of this work is to analyze the use of data mining
techniques by different authors for the prediction and classification of liver patient.
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...cscpconf
Machine learning algorithms are used to diagnosis for many diseases after very important improvements of classification algorithms as well as having large data sets and high performing computational units. All of these increased the accuracy of these methods. The diagnosis of thyroid gland disorders is one of the application for important classification problem. This study majorly focuses on thyroid gland medical diseases caused by underactive or overactive thyroid glands. The dataset used for the study was taken from UCI repository. Classification of this thyroid disease dataset was a considerable task using decision tree algorithm. The overall
prediction accuracy is 100% for training and in range between 98.7% and 99.8% for testing. In this study, we developed the Machine Learning tool for Thyroid Disease Diagnosis (MLTDD), an Intelligent thyroid gland disease prediction tool in Python, which can effectively help to make the right decision, has been designed using PyDev, which is python IDE for Eclipse.
Mining of Prevalent Ailments in a Health Database Using Fp-Growth AlgorithmWaqas Tariq
Health databases are characterised by large number of attributes such as personal biological and diagnosis information, health history, prescription, billing information and so on. The increasing need for providing enhanced medical system has necessitated the need for adopting an efficient data mining technique for extracting hidden and useful information from health database. In the past, many data mining algorithms such as Apriori, Eclat, H-Mine have been developed with deficiency in time-space trade off. In this work, an enhanced FP-growth frequent pattern mining algorithm coined FP-Ail is applied to students’ health database with a view to provide information about prevalent ailments and suggestions for managing the identified ailments. FP-Ail is tested on a student’s health database of a tertiary institution in Nigeria and the results obtained could be used by the management of the health centre for enhanced strategic decision making about health care. FP-Ail also provides the possibility to refine the minimum support threshold interactively, and to see the changes instantly.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
In the present study, the abilities of three classification methods of data mining namely artificial
neural networks with feed-forward back propagation algorithm, J48 decision tree method and
logistic regression analysis are compared in a medical real dataset. The prediction of
malignancy in suspected thyroid tumour patients is the objective of the study. The accuracy of
the correct predictions (the minimum error rate), the amount of time consuming in the
modelling process and the interpretability and simplicity of the results for clinical experts are
the factors considered to choose the best method
sis of health condition is very challenging task for every human being because life is directly related to health
condition. Data mining based classification is one of the important applications for classification of data. In this
research work, we have used various classification techniques for classification of thyroid data. CART gives highest
accuracy 99.47% as best model. Feature selection plays very important role to computationally efficient and increase
the performance of model. This research work focus on Info Gain and Gain Ratio feature selection technique to
reduce the irrelevant features from original data set and computationally increase the performance of model. We have
applied both the feature selection techniques on best model i. e. CART. Our proposed CART-Info Gain and CARTGain
Ratio gives 99.47% and 99.20% accuracy with 25 and 3 feature respectively.
Preprocessing and Classification in WEKA Using Different ClassifiersIJERA Editor
Data mining is a process of extracting information from a dataset and transform it into understandable structure
for further use, also it discovers patterns in large data sets [1]. Data mining has number of important techniques
such as preprocessing, classification. Classification is one such technique which is based on supervised learning.
It is a technique used for predicting group membership for the data instance. Here in this paper we use
preprocessing, classification on diabetes database. Here we apply classifiers on this database and compare the
result based on certain parameters using WEKA. 77.2 million people in India are suffering from pre diabetes.
ICMR estimates that around 65.1million are diabetes patients. Globally in year 2010, 227 to 285 million people
had diabetes, out of that 90% cases are related to type 2 ,this is equal to 3.3% of the population with equal rates
in both women and men in 2011 it resulted in 1.4 million deaths worldwide making it the leading cause of
death.
T OP K-O PINION D ECISIONS R ETRIEVAL IN H EALTHCARE S YSTEM csandit
The aim of this paper is to use data mining techniq
ue and opinion mining(OM) concepts to the
field of health informatics. The decision making in
health informatics involves number of
opinions given by the group of medical experts for
specific disease in the form of decision based
opinions which will be presented in medical databas
e in the form of text. These decision based
opinions are then mined from database with the help
of mining technique. Text document
clustering plays major role in the fast developing
information Explosion. It is considered as tool
for performing information based operations. Text d
ocument clustering generates clusters from
whole document collection automatically, normally K
-means clustering technique used for text
document clustering. In this paper we use Bisecting
K-means clustering technique and it is
better compared to traditional K-means technique. T
he objective is to study the revealed
groupings of similar opinion-types associated with
the likelihood of physicians and medical
experts.
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION IJCI JOURNAL
Data mining is a non-trivial process of categorizing valid, novel, potentially useful and ultimately understandable patterns in data. In terms, it accurately state as the extraction of information from a huge database. Data mining is a vital role in several applications such as business organizations, educational institutions, government sectors, health care industry, scientific and engineering. . In the health care
industry, the data mining is predominantly used for disease prediction. Enormous data mining techniques are existing for predicting diseases namely classification, clustering, association rules, summarizations, regression and etc. The main objective of this research work is to predict kidney diseases using classification algorithms such as Naïve Bayes and Support Vector Machine. This research work mainly
focused on finding the best classification algorithm based on the classification accuracy and execution time performance factors. From the experimental results it is observed that the performance of the SVM is better than the Naive Bayes classifier algorithm.
Hospital Medicine Classification using Data Mining Techniquesijtsrd
Data mining is a process which finds useful patterns from large amount of data. This paper is analyzing decision trees technique using the medicine data from central medical store, department of health. The results are support to best solution to share many type of medicine for manager at Central Medical store. This Central Medical store is supporting medicine to hospitals from upper Myanmar. Yin Yin Cho | Khin Htay | Win Myat Thuzar "Hospital Medicine Classification using Data Mining Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26774.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26774/hospital-medicine-classification-using-data-mining-techniques/yin-yin-cho
A comparative analysis of classification techniques on medical data setseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
HQR Framework optimization for predicting patient treatment time in big datadbpublications
Today most of the hospital face overcrowded with patients long queues for different tasks. Hospital management face difficulty to handle these patients to provide optimal treatment time for each patients waiting in the long queue. Unnecessary and annoying waits for long periods result in substantial human resource and time wastage and increase the frustration endured by patients.It would be convenient and preferable if the patients could receive the most efficient treatment plan and know the predicted waiting time updates in real time. Because of the large-scale, realistic data-set and the requirement for real-time response, the PTTP algorithm and HQR system mandate efficiency and low-latency response. Extensive experimentation and simulation results demonstrate the effectiveness and applicability of the proposed model to recommend an effective and convenient treatment plan for patients to minimize their wait times in hospitals.
HQR Framework optimization for predicting patient treatment time in big datadbpublications
Today most of the hospital face overcrowded with patients long queues for different tasks. Hospital management face difficulty to handle these patients to provide optimal treatment time for each patients waiting in the long queue. Unnecessary and annoying waits for long periods result in substantial human resource and time wastage and increase the frustration endured by patients.It would be convenient and preferable if the patients could receive the most efficient treatment plan and know the predicted waiting time updates in real time. Because of the large-scale, realistic data-set and the requirement for real-time response, the PTTP algorithm and HQR system mandate efficiency and low-latency response. Extensive experimentation and simulation results demonstrate the effectiveness and applicability of the proposed model to recommend an effective and convenient treatment plan for patients to minimize their wait times in hospitals.
As we are into the age of digital information, the problem of data overload emerges so worryingly ahead. Our ability to analyze and understand immense datasets wrap extreme behind our ability together and stores the data. But a new age group of computational techniques and tools is required to support the extraction of useful knowledge from the rapidly increasing volumes of data. These techniques and tools are the focus of emerging fields of Knowledge Discovery in Databases (KDD) and also called data mining. Data mining is highly noticeable in the fields like marketing, e-commerce or e-business or the fame of its use in KDD in other sectors or industries also. Among these sectors that are just discovering data mining are the fields of medicine and public health also. This research paper provides a survey of current technique of data mining/KDD for healthcare.
CLUSTERING DICHOTOMOUS DATA FOR HEALTH CAREijistjournal
Dichotomous data is a type of categorical data, which is binary with categories zero and one. Health care data is one of the heavily used categorical data. Binary data are the simplest form of data used for heath care databases in which close ended questions can be used; it is very efficient based on computational efficiency and memory capacity to represent categorical type data. Clustering health care or medical data is very tedious due to its complex data representation models, high dimensionality and data sparsity. In this paper, clustering is performed after transforming the dichotomous data into real by wiener transformation. The proposed algorithm can be usable for determining the correlation of the health disorders and symptoms observed in large medical and health binary databases. Computational results show that the clustering based on Wiener transformation is very efficient in terms of objectivity and subjectivity.
Prognosis of Cardiac Disease using Data Mining Techniques A Comprehensive Surveyijtsrd
The Healthcare exchange generally clinical diagnosis is ended commonly by doctor's knowledge and practice. Computer Aided Decision Support System plays a major task in the medical field. Data mining provides the methodology and technology to modify these rises of data into valuable data for decision making. By utilizing data mining techniques it requires less time for the prediction of the diseases with more accuracy. Among the expanding research on coronary diseases predicting system, it has happened significant to classifications the exploration results and gives readers with a layout of the current coronary diseases forecast strategies in every discussion. Data mining tools can respond to exchange addresses that expectedly being used much time over riding to decide. In this paper we study different papers in which at least one algorithm of data mining used for the prediction of coronary diseases. As of the study it is observed that Naïve Bayes Technique increase the accuracy of the coronary diseases prediction system. The commonly used techniques for Heart Disease Prediction and their complexities are outlined in this paper. D. Haripriya | Dr. M. Lovelin Ponn Felciah "Prognosis of Cardiac Disease using Data Mining Techniques: A Comprehensive Survey" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26605.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26605/prognosis-of-cardiac-disease-using-data-mining-techniques-a-comprehensive-survey/d-haripriya
Heart Disease Prediction Using Data Mining TechniquesIJRES Journal
There are huge amounts of data in the medical industry which is not processed properly and hence cannot be used effectively in making decisions. We can use data mining techniques to mine these patterns and relationships. This research has developed a prototype Heart Disease Prediction using data mining techniques, namely Neural Network, K-Means Clustering and Frequent Item Set Generation. Using medical profiles such as age, sex, blood pressure and blood sugar it can predict the likelihood patients getting a heart disease. It enables significant knowledge, e.g. patterns, relationships between medical factors related to heart disease to be established. Performance of these techniques is compared through sensitivity, specificity and accuracy. It has been observed that Artificial Neural Networks outperform K Means clustering in all the parameters i.e. Sensitivity, Specificity and Accuracy.
Data mining techniques play very important role in health care industry. Liver disease is one of the growing
diseases these days due to the changed life style of people. Various authors have worked in the field of classification of
data and they have used various classification techniques like Decision Tree, Support Vector Machine, Naïve Bayes,
Artificial Neural Network (ANN) etc. These techniques can be very useful in timely and accurate classification and
prediction of diseases and better care of patients. The main focus of this work is to analyze the use of data mining
techniques by different authors for the prediction and classification of liver patient.
MLTDD : USE OF MACHINE LEARNING TECHNIQUES FOR DIAGNOSIS OF THYROID GLAND DIS...cscpconf
Machine learning algorithms are used to diagnosis for many diseases after very important improvements of classification algorithms as well as having large data sets and high performing computational units. All of these increased the accuracy of these methods. The diagnosis of thyroid gland disorders is one of the application for important classification problem. This study majorly focuses on thyroid gland medical diseases caused by underactive or overactive thyroid glands. The dataset used for the study was taken from UCI repository. Classification of this thyroid disease dataset was a considerable task using decision tree algorithm. The overall
prediction accuracy is 100% for training and in range between 98.7% and 99.8% for testing. In this study, we developed the Machine Learning tool for Thyroid Disease Diagnosis (MLTDD), an Intelligent thyroid gland disease prediction tool in Python, which can effectively help to make the right decision, has been designed using PyDev, which is python IDE for Eclipse.
Mining of Prevalent Ailments in a Health Database Using Fp-Growth AlgorithmWaqas Tariq
Health databases are characterised by large number of attributes such as personal biological and diagnosis information, health history, prescription, billing information and so on. The increasing need for providing enhanced medical system has necessitated the need for adopting an efficient data mining technique for extracting hidden and useful information from health database. In the past, many data mining algorithms such as Apriori, Eclat, H-Mine have been developed with deficiency in time-space trade off. In this work, an enhanced FP-growth frequent pattern mining algorithm coined FP-Ail is applied to students’ health database with a view to provide information about prevalent ailments and suggestions for managing the identified ailments. FP-Ail is tested on a student’s health database of a tertiary institution in Nigeria and the results obtained could be used by the management of the health centre for enhanced strategic decision making about health care. FP-Ail also provides the possibility to refine the minimum support threshold interactively, and to see the changes instantly.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
In the present study, the abilities of three classification methods of data mining namely artificial
neural networks with feed-forward back propagation algorithm, J48 decision tree method and
logistic regression analysis are compared in a medical real dataset. The prediction of
malignancy in suspected thyroid tumour patients is the objective of the study. The accuracy of
the correct predictions (the minimum error rate), the amount of time consuming in the
modelling process and the interpretability and simplicity of the results for clinical experts are
the factors considered to choose the best method
sis of health condition is very challenging task for every human being because life is directly related to health
condition. Data mining based classification is one of the important applications for classification of data. In this
research work, we have used various classification techniques for classification of thyroid data. CART gives highest
accuracy 99.47% as best model. Feature selection plays very important role to computationally efficient and increase
the performance of model. This research work focus on Info Gain and Gain Ratio feature selection technique to
reduce the irrelevant features from original data set and computationally increase the performance of model. We have
applied both the feature selection techniques on best model i. e. CART. Our proposed CART-Info Gain and CARTGain
Ratio gives 99.47% and 99.20% accuracy with 25 and 3 feature respectively.
Preprocessing and Classification in WEKA Using Different ClassifiersIJERA Editor
Data mining is a process of extracting information from a dataset and transform it into understandable structure
for further use, also it discovers patterns in large data sets [1]. Data mining has number of important techniques
such as preprocessing, classification. Classification is one such technique which is based on supervised learning.
It is a technique used for predicting group membership for the data instance. Here in this paper we use
preprocessing, classification on diabetes database. Here we apply classifiers on this database and compare the
result based on certain parameters using WEKA. 77.2 million people in India are suffering from pre diabetes.
ICMR estimates that around 65.1million are diabetes patients. Globally in year 2010, 227 to 285 million people
had diabetes, out of that 90% cases are related to type 2 ,this is equal to 3.3% of the population with equal rates
in both women and men in 2011 it resulted in 1.4 million deaths worldwide making it the leading cause of
death.
T OP K-O PINION D ECISIONS R ETRIEVAL IN H EALTHCARE S YSTEM csandit
The aim of this paper is to use data mining techniq
ue and opinion mining(OM) concepts to the
field of health informatics. The decision making in
health informatics involves number of
opinions given by the group of medical experts for
specific disease in the form of decision based
opinions which will be presented in medical databas
e in the form of text. These decision based
opinions are then mined from database with the help
of mining technique. Text document
clustering plays major role in the fast developing
information Explosion. It is considered as tool
for performing information based operations. Text d
ocument clustering generates clusters from
whole document collection automatically, normally K
-means clustering technique used for text
document clustering. In this paper we use Bisecting
K-means clustering technique and it is
better compared to traditional K-means technique. T
he objective is to study the revealed
groupings of similar opinion-types associated with
the likelihood of physicians and medical
experts.
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION IJCI JOURNAL
Data mining is a non-trivial process of categorizing valid, novel, potentially useful and ultimately understandable patterns in data. In terms, it accurately state as the extraction of information from a huge database. Data mining is a vital role in several applications such as business organizations, educational institutions, government sectors, health care industry, scientific and engineering. . In the health care
industry, the data mining is predominantly used for disease prediction. Enormous data mining techniques are existing for predicting diseases namely classification, clustering, association rules, summarizations, regression and etc. The main objective of this research work is to predict kidney diseases using classification algorithms such as Naïve Bayes and Support Vector Machine. This research work mainly
focused on finding the best classification algorithm based on the classification accuracy and execution time performance factors. From the experimental results it is observed that the performance of the SVM is better than the Naive Bayes classifier algorithm.
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...IJARIIT
Two approaches to building models for prediction of the onset of Type diabetes mellitus in juvenile subjects were examined. A set of tests performed immediately before diagnosis was used to build classifiers to predict whether the subject would be diagnosed with juvenile diabetes. A modified training set consisting of differences between test results taken at different times was also used to build classifiers to predict whether a subject would be diagnosed with juvenile diabetes. Supervised were compared with decision trees and unsupervised of both types of classifiers. In this study, the system and the test most likely to confirm a diagnosis based on the pre-test probability computed from the patient's information including symptoms and the results of previous tests. If the patient's disease post-test probability is higher than the treatment threshold, a diagnostic decision will be made, and vice versa. Otherwise, the patient needs more tests to help make a decision. The system will then recommend the next optimal test and repeat the same process. In this thesis find out which approach is better on diabetes dataset in weka framework. Also use feature selection techniques which reduce the features and complexities of process
Standardization and wider use of Electronic Health records (EHR) creates opportunities for
better understanding patterns of illness and care within and across medical systems. In the healthcare
systems, hidden event signatures allow taking decision for patient’s diagnosis, prognosis, and
management. Temporal history of event codes embedded in patients' records, investigates frequently
occurring sequences of event codes across patients. There is a framework that enables the
representation, retrieval, and mining of high order latent event structure and relationships within
single and multiple event sequences. There is a wealth of hidden information present in the large
databases. Different data mining techniques can be used for retrieving data. A classifier approach for
detection of diabetes is presented in this paper and shows how Naive Bayes can be used for
classification purpose. In this system, medical data is categories into five categories namely low,
average, high and very high and critical, treatment is given as per the predicted category. The system
will predict the class label of unknown sample. Hence two basic functions namely classification
(training) and prediction (testing) will be performed. An algorithm and database used affects the
accuracy of the system. It can answer complex queries for diagnosing diabetes disease and thus assist
healthcare practitioners to make intelligent clinical decisions which traditional decision support
systems cannot.Over the last decade, so many information visualization techniques have been
developed to support the exploration of large data sets. There are various interactive visual data
mining tools available for visual data analysis. It is possible to perform clinical assessment for visual
interactive knowledge discovery in large electronic health record databases. In this paper, we
proposed that it is possible to develop a tool for data visualization for interactive knowledge
discovery.
SURVEY OF DATA MINING TECHNIQUES USED IN HEALTHCARE DOMAINijistjournal
Health care industry produces enormous quantity of data that clutches complex information relating to patients and their medical conditions. Data mining is gaining popularity in different research arenas due to its infinite applications and methodologies to mine the information in correct manner. Data mining techniques have the capabilities to discover hidden patterns or relationships among the objects in the medical data. In last decade, there has been increase in usage of data mining techniques on medical data for determining useful trends or patterns that are used in analysis and decision making. Data mining has an infinite potential to utilize healthcare data more efficiently and effectually to predict different kind of disease. This paper features various Data Mining techniques such as classification, clustering, association and also highlights related work to analyse and predict human disease.
An efficient stacking based NSGA-II approach for predicting type 2 diabetesIJECEIAES
Diabetes has been acknowledged as a well-known risk factor for renal and cardiovascular disorders, cardiac stroke and leads to a lot of morbidity in the society. Reducing the disease prevalence in the community will provide substantial benefits to the community and lessen the burden on the public health care system. So far, to detect the disease innumerable data mining approaches have been used. These days, incorporation of machine learning is conducive for the construction of a faster, accurate and reliable model. Several methods based on ensemble classifiers are being used by researchers for the prediction of diabetes. The proposed framework of prediction of diabetes mellitus employs an approach called stacking based ensemble using non-dominated sorting genetic algorithm (NSGA-II) scheme. The primary objective of the work is to develop a more accurate prediction model that reduces the lead time i.e., the time between the onset of diabetes and clinical diagnosis. Proposed NSGA-II stacking approach has been compared with Boosting, Bagging, Random Forest and Random Subspace method. The performance of Stacking approach has eclipsed the other conventional ensemble methods. It has been noted that k-nearest neighbors (KNN) gives a better performance over decision tree as a stacking combiner.
Disease prediction in big data healthcare using extended convolutional neural...IJAAS Team
Diabetes Mellitus is one of the growing fatal diseases all over the world. It leads to complications that include heart disease, stroke, and nerve disease, kidney damage. So, Medical Professionals want a reliable prediction system to diagnose Diabetes. To predict the diabetes at earlier stage, different machine learning techniques are useful for examining the data from different sources and valuable knowledge is synopsized. So, mining the diabetes data in an efficient way is a crucial concern. In this project, a medical dataset has been accomplished to predict the diabetes. The R-Studio and Pypark software was employed as a statistical computing tool for diagnosing diabetes. The PIMA Indian database was acquired from UCI repository will be used for analysis. The dataset was studied and analyzed to build an effective model that predicts and diagnoses the diabetes disease earlier.
ARTICLEAnalysing the power of deep learning techniques ovedessiechisomjj4
ARTICLE
Analysing the power of deep learning techniques over the
traditional methods using medicare utilisation and provider data
Varadraj P. Gurupura, Shrirang A. Kulkarnib, Xinliang Liua, Usha Desai c and Ayan Nasird
aDepartment of Health Management and Informatics, University of Central Florida, Orlando, FL, USA; bSchool of
Computer Science and Engineering, Vellore Institute of Technology, Vellore, India; cDepartment of Electronics and
Communication Engineering, Nitte Mahalinga Adyanthaya Memorial Institute of Technology, Nitte, Udupi, India;
dUCF School of Medicine, University of Central Florida, Orlando, FL, USA
ABSTRACT
Deep Learning Technique (DLT) is the sub-branch of Machine
Learning (ML) which assists to learn the data in multiple levels of
representation and abstraction and shows impressive performance
on many Artificial Intelligence (AI) tasks. This paper presents a new
method to analyse the healthcare data using DLT algorithms and
associated mathematical formulations. In this study, we have first
developed a DLT to programme two types of deep learning neural
networks, namely: (a) a two-hidden layer network, and (b) a three-
hidden layer network. The data was analysed for predictability in
both of these networks. Additionally, a comparison was also made
with simple and multiple Linear Regression (LR). The demonstration
of successful application of this method is carried out using the
dataset that was constructed based on 2014 Medicare Provider
Utilization and Payment Data. The results indicate a stronger case
to use DLTs compared to traditional techniques like LR. Furthermore,
it was identified that adding more hidden layers to neural network
constructed for performing deep learning analysis did not have
much impact on predictability for the dataset considered in this
study. Therefore, the experimentation described in this article sets
up a case for using DLTs over the traditional predictive analytics. The
investigators assume that the algorithms described for deep learning
is repeatable and can be applied for other types of predictive ana-
lysis on healthcare data. The observed results indicate, the accuracy
obtained by DLT was 40% more accurate than the traditional multi-
variate LR analysis.
ARTICLE HISTORY
Received 16 April 2018
Accepted 30 August 2018
KEYWORDS
Deep Learning Technique
(DLT); medicare data;
Machine Learning (ML);
Linear Regression (LR);
Confusion Matrix (CM)
Introduction
Methods involving Artificial Intelligence (AI) associated with Deep Learning Technique (DLT)
and Machine Learning (ML) are slowly but surely being used in medical and health infor-
matics. Traditionally, techniques such as Linear Regression (LR) (Nimon & Oeswald, 2013),
Analysis of Variance (ANOVA) (Kim, 2014), and Multivariate Analysis of Variance (MANOVA)
(Xu, 2014) (Malehi et al., 2015) have been used for predicting outcomes in healthcare.
However, in the recent years the methods of analysis applied are changing towards the
aforementi ...
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...IAEMEPublication
Machine learning algorithms play a vital role in prediction of many diseases such as heart disease, diabetes, cancer, lung disease etc. The applicability of machine learning algorithms to healthcare domain relieves the burden of physicians as it is impractical to scan manually all the data collected over a period of time in order to arrive at some valuable information. Machine learning algorithms learn from the training dataset and they become capable of thinking like a human. Once the algorithm completes it learning with training dataset, it can automatically predict the target output label of any unseen data. In this work, predicting diabetes using machine learning algorithms has been taken up. A conceptual architecture has been proposed based on big data architecture.
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU...IAEME Publication
Machine learning algorithms play a vital role in prediction of many diseases such as heart disease, diabetes, cancer, lung disease etc. The applicability of machine learning algorithms to healthcare domain relieves the burden of physicians as it is impractical to scan manually all the data collected over a period of time in order to arrive at some valuable information. Machine learning algorithms learn from the training dataset and they become capable of thinking like a human. Once the algorithm completes it learning with training dataset, it can automatically predict the target output label of any unseen data. In this work, predicting diabetes using machine learning algorithms has been taken up. A conceptual architecture has been proposed based on big data architecture.
Ascendable Clarification for Coronary Illness Prediction using Classification...ijtsrd
Coronary disease is predicted by classification technique. The data mining tool WEKA has been exploited for implementing Naïve Bayes classifier. Proposed work is trapped with a specific end goal to enhance the execution of models. For improving the classification accuracy Naïve Bayes is combined with Bagging and Attribute Selection. Trial results demonstrated a critical change over in the current Naïve Bayes classifier. This approach enhances the classification accuracy and reduces computational time. D. Haripriya | Dr. M. Lovelin Ponn Felciah "Ascendable Clarification for Coronary Illness Prediction using Classification Mining and Feature Selection Performances" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26690.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26690/ascendable-clarification-for-coronary-illness-prediction-using-classification-mining-and-feature-selection-performances/d-haripriya
Kidney Failure Due to Diabetics – Detection using Classification Algorithm in...IIRindia
In order to analyse the chosen data from various points of view, data mining is used as the effective process. This process is also used to sum-up all those views into useful information. There are several types of algorithms in data mining such as Classification algorithms, Regression, Segmentation algorithms, association algorithms, sequence analysis algorithms, etc.,. The classification algorithm can be used to bifurcate the data set from the given data set and foretell one or more discrete variables, based on the other attributes in the dataset. The ID3 (Iterative Dichotomiser 3) algorithm is an original data set S as the root node. An unutilised attribute of the data set S calculates the entropy H(S) (or Information gain IG (A)) of the attribute. Upon its selection, the attribute should have the smallest entropy (or largest information gain) value. The prime objective of this paper is to analyze the data from a Kidney disorder due to diabetics by using classification technique to predict class accurately.
COMPARISON AND EVALUATION DATA MINING TECHNIQUES IN THE DIAGNOSIS OF HEART DI...ijcsa
Heart disease is one of the biggest health problems in the world because of high mortality and morbidity
caused by the disease. The use of data mining on medical data brought valuable and effective life
achievements and can enhance medical knowledge to make necessary decisions. Data mining plays an
important role in the field of medical science to solve health problems and diagnose ailments in critical
conditions and in normal conditions. For this reason, in this paper, data mining techniques are used to
diagnose heart disease from a dataset that includes 200 samples from different patients. Techniques used to
diagnose heart disease include Bagging, AdaBoostM1, Random Forest, Naive Bayes, RBF Network, IBK,
and NNge that all the techniques used to diagnose heart disease use Weka tool. Then these techniques are
compared to determine which is more accurate in the diagnosis of heart disease that according to the
results, it was found that the RBF Network with the accuracy of 88.2% is the most accurate classification in
the diagnosis of heart disease.
Machine learning approach for predicting heart and diabetes diseases using da...IAESIJAI
Environmental changes and food habits affect people's health with numerous diseases in today's life. Machine learning is a technique that plays a vital role in predicting diseases from collected data. The health sector has plenty of electronic medical data, which helps this technique to diagnose various diseases quickly and accurately. There has been an improvement in accuracy in medical data analysis as data continues to grow in the medical field. Doctors may have a hard time predicting symptoms accurately. This proposed work utilized Kaggle data to predict and diagnose heart and diabetic diseases. The diseases heart and diabetes are the foremost cause of higher death rates for people. The dataset contains target features for the diagnosis of heart disease. This work finds the target variable for diabetic disease by comparing the patient's blood sugars to normal levels. Blood pressure, body mass index (BMI), and other factors diagnose these diseases and disorders. This work justifies the filter method and principal component analysis for selecting and extracting the feature. The main aim of this work is to highlight the implementation of three ensemble techniques-Adaptive boost, Extreme Gradient boosting, and Gradient boosting-as well as the emphasis placed on the accuracy of the results.
Abstract:In health care domain, data mining plays a vital role for predicting diseases. For detecting a disease number of tests should be required from the patient but number of test should be reduced while using data mining techniques. The data mining technique analyze the test parameters and it concludes the associative relation between the parameters that reduces the tests and the reduced test plays key role in time and performance. In this study medical terms such as sex, blood pressure, and cholesterol like nineteen input attributes are used. In this paper association among various attributes which are the causative factors of heart diseases are analyzed. The patient’s records are observed before prediction and the factors are grouped as per its severity level. In this system the level of causative factors are categorized using K-Means clustering technique and it distinguishes the risky and non-risky factors. Frequent risk factors are mined from the clinical heart database using Apriori algorithm. The risk factors are taken for this study to predict the risk level and find the co-ordination among the factors that helps the medical people to predict the disease with minimum tests and treatments.
Similar to International Journal of Computational Engineering Research(IJCER) (20)
Heart Diseases Diagnosis Using Data Mining Techniques
International Journal of Computational Engineering Research(IJCER)
1. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
An Empirical Study about Type2 Diabetics using Duo mining Approach
1
V.V.Jaya Rama Krishnaiah, 2D.V. ChandraShekar,
3
Dr. R. Satya Prasad, 4 Dr. K. Ramchand H Rao
1
Associate Prof, Dept of CS, ASN College, Tenali,
2
Associate Prof, Dept of CS, TJPS COLLEGE, GUNTUR,
3
Associate Professor, Department of CSE, Acharya Nagarjuna University, Guntur,
4
Professor, Department of CSE, ASN Womens Engineering College, Nelapadu,
Abstract:
Due to the revolutionary change in data mining and bio-informatics, it is very useful to use data mining techniques to
evaluate and analyze bio-medical data. In this paper we propose a frame work called duo-mining tool for intelligent Text
mining system for diabetic patients depending on their medical test reports. Diabetes is a chronic disease and major
problem of morbidity and mortality in developing countries. The International Diabetes Federation estimates that 285
million people around the world have diabetes. This total is expected to rise to 438 million within 20 years. Type-2 diabetes
mellitus (T2DM) is the most common type of diabetes and accounts for 90-95% of all diabetes. Detection of T2DM from
various factors or symptoms became an issue which was not free from false presumptions accompanied by unpredictable
effects. According to this context, data mining and machine learning could be used as an alternative way help us in knowledge
discovery from data. We applied several learning methods, such as K-Nearest Neighbor, decision tree, support vector
machines, acquire information from historical data of patient‟s from medical practicing centers in and around Guntur. Rules
are extracted from Decision tree to offer decision-making support through early detection of T2DM for clinicians. Through
this paper, we tried to determine how the extracted knowledge by the Text Mining is integrated with expert system knowledge
to assist crucial decision making process.
Keywords: Text Mining, K-Nearest Neighbor, Support Vector Machines, Decision Trees, Type-2 diabetes, Duo Mining, Data
Mining.
1. Introduction
Diabetes is an illness which occurs as a result of problems with the production and supply of insulin in the body [1].
People with diabetes have high level of glucose or “high blood sugar” called hyperglycemia. This leads to serious long-term
complications such as eye disease, kidney disease, nerve disease, disease of the circulatory system, and amputation this is not
the result of an accident.
Diabetes also imposes a large economic impact on the national healthcare system. Healthcare expenditures on
diabetes will account for 11.6% of the total healthcare expenditure in the world in 2010. About 95% of the countries covered
in this report will spend 5% or more, and about 80% of the countries will spend between 5% and 13% of their total healthcare
dollars on diabetes [2]
Type-2 diabetes mellitus (T2DM) is the most common type of diabetes and accounts for 90-95% of all diabetes
patients and most common in people older than 45 who are overweight. However, as a consequence of increased obesity
among the young, it is becoming more common in children and young adults [1]. In T2DM, the pancreas may produce
adequate amounts of insulin to metabolize glucose (sugar), but the body is unable to utilize it efficiently. Over time, insulin
production decreases and blood glucose levels rise. T2DM patients do not require insulin treatment to remain alive, although
up to 20% are treated with insulin to control blood glucose levels [3].
Diabates has no obvious clinical symptoms and not been easy to know, so that many diabetes patient unable to obtain
the right diagnosis and the treatment. Therefore, it is important to take the early detection, prevent and treat diabetes disease,
especially for T2DM.
Recent studies by the National Institute of Diabetes and Digestive and Kidney Diseases (DCCT) in India shown that
effective control of blood sugar level is beneficial in preventing and delaying the progression of complications of diabetes [4].
Adequate treatment of diabetes is also important, as well as lifestyle factor such as smoking and maintaining healthy
bodyweight [3].
According to this context, data mining and machine learning could be used as an alternative way in discovering
knowledge from the patient medical records and classification task has shown remarkable success in the area of
employing computer aided diagnostic systems (CAD) as a “second opinion” to improve diagnostic decisions [5]. In this
area, classifier such as SVMs have demonstrated highly competitive performance in numerous real-world application such
medical diagnosis, SVMs as one of the most popular, state-of-the-art data mining tools for data mining and learning [6].
In modern medicine, large amount of data are collected, but there is no comprehensive analysis to this data.
Intelligent data analysis such as data mining was deployed in order to support the creation of knowledge to help clinicians in
Issn 2250-3005(online) October| 2012 Page 33
2. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
making decisions. The role of data mining is to extract interesting (non-trivial, implicit, previously unknown and potentially
useful) patterns or knowledge from large amounts of data, in such a way that they can be put to use in areas such as decision
support, prediction and estimation [7].
With this paper, we make two contributions. We present empirical result of inductive methods for detecting T2DM
using machine learning and data mining. We structured the paper as follow: section 2 provides a brief explanation about
several classifiers and medical data used in this research and the detailed information about text tokens were also explained in
this section. Section 3 gives experimental design, and experimental result and discussion; we concluded the paper with
summarization of the result by emphasizing this study and further research.
2. Research Method
2.1. Data Collection
We collected diabetic‟s patients from one of the government public hospital NRI and Manipal Institute , Guntur-
Andhra Pradesh, India from 2009 to 2011. The patients included only type-2 diabetes, whereas other types of diabetes were
excluded. All patients of this database are men and women of age between 0-20 Years. The variable takes the value “TRUE”
and “FALSE”, where “TRUE” means a positive test for T2DM and “FALSE” means a negative test for T2DM.
It is important to examine the data with preprocessing which consist of cleaning, transformation and integration. The analysis
clinical attributes: (1) Gender, (2) Body mass, (3) Blood pressure, (4) Hyperlipidemia or Cholesterol (5) Fasting blood sugar
(FBS), (6) Instant blood sugar, (7) Family history, (8) Diabetes Guest history, (9) Habitual Smoker, (10) Plasma insulin or
Glucose, and (11) Age. The preprocessing method is briefly explained in next section.
2.2. Methodology
2.2.1 Text Mining
Text mining is also called as intelligent text analysis, text data mining, or knowledge discovery in text uncovers
previously invisible patterns in existing resources. Text mining involves the application of techniques from areas such as
information retrieval, natural language processing, information extraction and data mining Text Mining itself is not a function,
it combines different functionalities Searching, Information Extraction (IE), Categorization, Summarization, Prioritization,
Clustering, Information Monitor and Information Retrieval. Information Retrieval (IR) systems identify the documents in a
collection which match a user‟s query. The major steps in text mining were 1) Text Gathering and 2) Text Preprocessing. Text
gathering includes collection of raw documents like Patient information which were in the text or script format and these
documents may contain unstructured data. Preprocessing phase starts with tokenization. Tokenization is division of a document
into terms. This process also referred as feature generation. In text preprocessing The raw data in the form of text files is
collected from text scripts or Flat files. The data is converted in to structured format and stored in Microsoft Access Database.
To do the Information Extraction and Search, we use an mechanism call Duo-Mining developed in Java as the part of
the Text Mining Tool and converts the Unstructured data into the structures manner.
The following Figure 1, shows the Text Mining and Conversion of unstructured data into structures data set.
Text Tokeniz Feature
Gatheri ation Extracti
ng on
Transfor Data
mation Analysis
Scrip
into
ts
Dataset
Figure 1: Conversion of Unstructured data into Structured Data Set using Duo-Mining Tool.
Issn 2250-3005(online) October| 2012 Page 34
3. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
Figure 2: A Prototype Model of a Duo-Mining Tools
2.2.2. Support Vector Machines (SVMs)
Support vector machine (SVMs) are supervised learning methods that generate input- output mapping functions from
a set of labeled training datasets. The mapping function can be either a classifiaction function or a regression function [6].
According to Vapnik [11], SVMs has strategy to find the best hyperplane on input space called the structural minimization
principle from statistical learning theory.
Given the training datasets of the form {(x1,c1),(x2,c2),...,(xn,cn)} where ci is either 1 (“yes”) or 0 (“no”), an SVM finds
the optimal separating hyperplane with the largest margin. Equation (1) and (2) represents the separating hyperplanes in the
case of separable datasets.
w.xi+b ≥ +1, for ci = +1 …………………………………(1)
w.xi+b ≤ -1, for ci = -1 ………………………………..(2)
The problem is to minimize |w| subject to constraint (1). This is called constrained quadratic programming (QP)
optimization problem represented by:
2
minimize (1/2) ||w||
subject to ci(w.xi – b) ≥ 1 …………………………………….(3)
Sequential minimal optimization (SMO) is one of efficient algorithm for training SVM [12]
2.2.3. K-Nearest Neighbors
One of the simplest learning methods is the Nearest Neibhor [13]. To classify an unknown instance, the performance
element finds the example in the collection most similar to the unknown and returns the example‟s class label as its prediction
for the unknown. Variants of this method, such as IBSk, find the k most similar instances and return the majority vote of their
class labels as the prediction.
2.2.4 Decision Tree
A decision tree is a tree with internal nodes corresponding to attributes and leaf nodes corresponding to class labels.
Most implementations use the gain ratio for attribute selection, a measure based on the information gain [12].
The following Table 1 illustrates the data specifications
Table 1: Data Set for Diabetes Specification
NO Attribute Name Explanation
1 Plasma Insulin/Glucose Glucose Concentration (high/Low)
2 FBS Fasting blood sugar (mg/dl)
3 Body Mass Index (BMI) Body mass of patients (Kg)
4 Blood Pressure Blood Pressure in mmHg
5 IBS Instant Blood Sugar (mg/dl)
6 AGE Patient age (child, adult, old)
7 Diabetes Gestational When pregnant women, who have never had diabetes before (Boolean)
8 Family History Condition of abnormally elevated levels of any or all lipids and/or lipoproteins in the
blood(Boolean)
9 Hyperlipidemia/Cholesterol In the blood(Boolean)
10 Smoker Patient‟s smoking habit (Boolean)
11 Gender Patient‟s gender (male or female)
Issn 2250-3005(online) October| 2012 Page 35
4. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
3. Analysis and Results
We conducted two experimental studies using our data collection described previously. We first applied all the
classification methods the dataset collected, and we examined and validated the accuracy both in quantitative and qualitative
measure. Quantitative measure is computed in percent, whereas qualitative measure is acceptance degree of patterns by
clinicians.
Testing for type 2 diabetes in children
Criteria – To identify and group classes of people according to age, sex and race in the form of cluster patterns, based on
various datasets, we also performed various types of comparison with health with respect to symptoms of diabetic‟s patients.
Overweight (BMI > 85th percentile for age and sex, weight for height > 85th percentile, or weight > 120% of ideal for height)
Plus, Any two of the following risk factors:
1. Family history of type 2 diabetes in first- or second-degree relative
2. Race/ethnicity (Indian)
3. Signs of insulin resistance or conditions associated with insulin resistance (acanthosis, hypertension, dyslipidaemia, or
PCOS)
The following Tables 2.1(a) to Table 7.1(b) illustrates the different comparisons between healthy people and Diabetes.
Table-2.1(a): (Diabetes Mellitus): Comparison of Anti Oxidants with GHb in healthy People between the age group of 0- 20
years
Patient Age Sex Blood Calcium
code Glucose
(>120) Zinc Copper Folic acid Homo-cysteine
1 4 F 216 8 45 75 4.45 17
2 9 F 158 7.8 60 69 3.98 19
3 11 M 182 8.1 57 62 4.01 14
4 14 M 172 9 52 70 4.06 27
5 16 F 201 8.4 59 74 3.76 16
6 18 M 177 8.3 44 68 5.29 13
7 19 F 208 7.8 52 76 3.18 39
8 20 M 182 7.5 49 79 4.97 15
Table-2.1(b) (Healthy): Comparison of Anti Oxidants with GHb in healthy People between the age group of 0- 20 years
Patient code Age Sex Blood Glucose Calcium
(<=120) Zinc Copper Folic acid Homocysteine
1 5 M 98 9.3 80 105 5.98 6
2 8 F 102 9.2 85 110 6.2 8
3 10 F 79 9.5 86 106 6.51 8
4 15 M 110 9.8 90 101 6.16 9
5 20 M 120 9.6 92 112 6.05 7
Table-3.1(a) (Diabetis): Comparison of Thyroid function in Diabetes Mellitus Patients between the age group of 0- 20
years
Patient code Age Sex T3 T4 TSH
1 4 F 0.98 5.62 1.09
2 9 F 0.85 6.24 2.64
3 11 M 0.82 6.24 2.5
4 14 M 1.01 7.25 3.18
5 16 F 0.48 3.52 15.9
6 18 M 1.09 6.85 2.51
7 19 F 0.71 8.1 1.42
8 20 M 1.12 5.97 1.08
Issn 2250-3005(online) October| 2012 Page 36
5. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
Table-3.1(b) (Healthy): Comparison of Thyroid function in Healthy people between the age group of 0- 20 years
Patient code Age Sex T3 T4 TSH
1 5 M 0.87 6.45 0.98
2 8 F 1.04 5.93 1.05
3 10 F 0.92 7.01 2.45
4 15 M 0.89 5.64 1.81
5 20 M 1.02 6.2 2.15
Table-4.1(a) (Diabetis): Comparison of Enzymes and GHb in Diabetes Mellitus Patients between the age group of 0- 20
years
Patient code Age Sex SGPT SGOT Alk. PHO Glycosilated haemoglobin
(GHb)
1 4 F 17 29 418 8.2
2 9 F 28 31 214 6.6
3 11 M 24 11 129 7.3
4 14 M 34 36 98 7
5 16 F 21 28 254 7.8
6 18 M 19 25 208 7.1
7 19 F 34 24 205 8
8 20 M 32 34 365 7.3
Table-4.1(b) (Healthy): Comparison of Enzymes and GHb in Healthy Peoples between the age group of 0- 20 years
Patient code Age Sex SGPT SGOT Alk. PHO Glycosilated haemoglobin
GHb)
1 5 M 22 26 148 4.9
2 8 F 25 30 152 5
3 10 F 31 34 160 4.4
4 15 M 27 25 132 5.3
5 20 M 20 24 141 5.6
Table-5.1(a) (Diabetis): Comparison of Lipids (Different type of cholesterols) in Diabetes Mellitus Patients between the
age group of 0- 20 years
Patient Age Sex BMI Cholesterol HDL-Chol Triglycrides VLDL LDL
code
1 4 F Over weight 101 41 125 25 35
2 9 F Over weight 132 38 148 30 64
3 11 M Over weight 130 45 150 30 55
4 14 M Over weight 148 40 129 26 82
5 16 F Proportional 167 42 115 23 102
6 18 M Over weight 180 41 127 25 114
7 19 F Proportional 125 39 142 28 58
8 20 M Over weight 184 39 150 30 115
Table-5.1(b) (Healthy): Comparison of Lipids (Different type of cholesterols) in Healthy People between the age group
of 0- 20 years
Patient code Age Sex BMI Cholesterol HDL-Chol Triglycr VLDL LDL
Ides
1 5 M Thin 162 42 128 27 93
2 8 F Proportional 158 40 110 22 96
3 10 F Thin 170 38 132 26 106
4 15 M Thin 165 40 108 22 103
5 20 M Proportional 160 42 115 23 95
Issn 2250-3005(online) October| 2012 Page 37
6. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
Table-6.1(a) (Diabetes): Comparison of Regular parameters in Diabetes Mellitus Patients between the age group of 0- 20
years
Patient code Age Sex Macro Anti-Insulin Hb % Micro
Albunuria (in g/dl) albunuria
1 4 F Normal 8.1 8.9 10
2 9 F Normal 5.6 7.6 18
3 11 M Normal 3.7 10.1 9
4 14 M Traces 9.1 9.5 22
5 16 F Normal 5.4 8.4 17
6 18 M Normal 7.9 9.1 16
7 19 F Normal 2.4 10.2 20
8 20 M + 9.1 6.9 42
Table-6.1(b) (Healthy): Comparison of Regular parameters in Healthy people between the age group of 0- 20 years
Patient code Age Sex Macro Anti-Insulin Hb % Micro
albunuria (in g/dl) albunuria
1 5 M Normal 0.5 12.5 12
2 8 F Normal 0.1 13.6 17
3 10 F Normal 0.2 13.0 18
4 15 M Normal 0.6 12.9 20
5 20 M Normal 0.3 14.0 16
Table-7.1(a) (Diabetes): Comparison of Regular parameters in Diabetes Mellitus Patients between the age group of 0- 20
years
Patient code Age Sex Blood Urea Creatinine Albumin Bilirubin
Glucose
1 4 F 216 45 1.2 2 0.9
2 9 F 158 38 1.1 2.9 0.7
3 11 M 182 42 1 2.6 1.1
4 14 M 172 29 1.4 3.8 0.9
5 16 F 201 34 1.2 2.5 0.8
6 18 M 177 24 0.8 2.9 1
7 19 F 208 45 1.3 3.2 1
8 20 M 182 56 1.8 3.9 1.1
Table-7.1(b) (Healthy): Comparison of Regular parameters in Healthy people between the age group of 0- 20 years
Patient code Age Sex Blood Urea Creatinine Albumin Bilirubin
Glucose
1 5 M 98 18 0.7 3.6 1.8
2 8 F 102 24 0.6 3.8 1.2
3 10 F 79 30 1 3.2 1
4 15 M 110 22 0.9 3.5 0.8
5 20 M 121 28 1 3.8 1
The following Table 8 illustrates the Classification accuracy (%) of each spitted feature in the dataset
Table 8. Classification Accuracy (%)
Attribute KNN SVMs DT Average
Blood Sugar/ 95,86 95,86 95,17 95,51
FBS
Plasmainsulin/ 95,40 96,55 96,78 95,86
Glucose
Hyperlipidemia/Choleste 95,17 97,01 96,55 96,36
rol
BMI 95,40 96,55 96,78 95,86
Average 95,45 95,49 96,32
Issn 2250-3005(online) October| 2012 Page 38
7. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
This research has four main outcomes regarding to detect T2DM. First, we start the assumption with Plasma insulin or
Glucose and then BMI, Cholesterol and Blood Sugar. As compared the average classification accuracy of three mechanisms.
Decision Tree mechanism shows the better performance. Surprisingly the average accuracy for Plasma insulin or Glucose and
BMI were appearing same in all mechanisms.
Second, internist detected T2DM only by their experience. Thus, presumption attributes such as smoker and
gestational history was avoided. Our study finds those attributes was found in many diabetic patients.
The following Table 9 illustrates about the extracted rules for determining the diabetics for the age group 0-20.
Table 9. Qualitative Measure in Detecting T2DM
PATTERN/RULES Extracted from Decision Tree Internist’s acceptance (Yes/No)
R1 : If Plasmainsulin is high and BMI is overweight and Hyperlipidemia is equal to 0 and YES
family history is equal to 0 and smoker is equal to 1 then class YES
R2 : Else IF plasmainsulin is high and BMI is proportional and hyperlipidemia is equal to 1 YES
and IBS is greater than or equal to 200 mg/dl AND Class YES ELSE IF plasmainsulin is low
AND FBS is less than or equal to 126 mg/dl then Class NO
R3: AND blood pressure is greater than or equal to 140/90 mmHg AND IBS is less than NO
or equal to 200 mg/dl THEN class NO ELSE IF plasmainsulin is low AND FBS is greater than
or equal to 126 mg/dl AND BMI is proportional AND IBS is less than or equal to 200 mg/dl
THEN class NO
R4 : ELSE IF plasmainsulin is high AND BMI is thin AND FBS is greater than or equal to YES
126 mg/dl AND hyperlipidemia is equal to 1 THEN class Yes
R5 : hyperlipidemia is equal to 1 AND IBS is greater than or equal to 200 mg/d AND FBS is NO
greater than or equal to 126 mg/dl THEN class NO
The following Figure 3 shows the decision tree for the above deduced Rules
Figure 3: Decision Tree for Type 2 Diabetics for the age group 0-20
Figure 4: Accuracy Chart
Issn 2250-3005(online) October| 2012 Page 39
8. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
The above graph gives a comparative report of predicting clinical trial of type 2 diabetics patients for age group 0-20
years, Decision tree method is the best prediction from the other two algorithms.
Our study involved every other children's hospital and NRI Hospital (n = 31), each diabetologist in private practice (n
= 122), and every internal medicine unit (n = 164) in BW. A written clinical results and test with questionnaire and were used
to identify children with T2DM and MODY who had been examined at any of these institutions between 2009 and 2011.
Population data were drawn from the national census of 2011 and the subsequent annual updates.
Fig 5: Count of Diabetic Effected Persons for the Age Group 0-20 Years
Fig 6 : BMI Comparison Cart for Age Groups 0-20
From the live chart (Figure 4, figure 5 and Figure 6), Type 2 diabetes is estimated to affect next over 4.5 million people
in the Andhra Pradesh , India. The likelihood of developing type 2 diabetes is influenced by genetics and environment, If either
parent has type 2 diabetes, the risk of inheritance of type 2 diabetes is 15%, If both parents have type 2 diabetes, the risk of
inheritance is 75%, Almost 1 in 3 people with type 2 diabetes develops over kidney disease, Within 5 years of diagnosis of type
2 diabetes, 60% of people diagnosed have some degree of retinopathy, type 2 diabetes carries the risk of diabetes complications
over a long period of time.
The major complication effect the patients with type 2 diabetes from the above three graphs :
Retinopathy. Up to 20%, most commonly occurs after the onset of puberty and after 5 to 10 years of diabetes duration, it has
been reported in prepubertal children and with diabetes duration of only 1 to 2 years. Referrals should be made to eye care
professionals with expertise in diabetic retinopathy, an understanding of the risk for retinopathy in the pediatric population, as
well as experience in counseling the pediatric patient and family on the importance of early prevention/intervention. For
children with type 2 diabetes, the first ophthalmologic examination should be obtained once the child is 10 years of age or older
and has had diabetes for 3 to 5 years.
In type 2 diabetes, the initial examination should be shortly after diagnosis. In type 1 and type 2 diabetes, annual
routine follow-up is generally recommended. Less frequent examinations may be acceptable on the advice of an eye care
professional.
Issn 2250-3005(online) October| 2012 Page 40
9. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
Nephropathy. – 12% of the patients were affected, To reduce the risk and/or slow the progression of nephropathy, optimize
glucose and blood pressure control. In type 2 diabetes, annual screening should be considered at diagnosis. Screening may be
done with a random spot urine sample analyzed for microalbumin-to-creatinine ratio. Confirmed, persistently elevated
microalbumin levels should be treated with an ACE inhibitor, titrated to normalization of microalbumin excretion if possible.
Neuropathy. Although it is unclear whether foot examinations are important in children and adolescents, annual foot
examinations are painless, inexpensive, and provide an opportunity for education about foot care. The risk for foot
complications is increased in people who have had diabetes over 10 years.
Lipids. Based on data obtained from studies in adults, having diabetes is equivalent to having had a heart attack, making
diabetes a key risk factor for future cardiovascular disease.
In children older than 2 years of age with a family history of total cholesterol over 240 mg/dl, or a cardiovascular
event before age 55, or if family history is unknown, perform a lipid profile after diagnosis of diabetes and when glucose
control has been established. If family history is not a concern, then perform a lipid profile at puberty.
Children with type 2 diabetes should have a lipid profile soon after diagnosis when blood glucose control has been
achieved and annually thereafter. Experts also recommend lipid testing every two years if the lipid profile is normal.
To assess the prevalence of type 2 diabetes mellitus (T2DM) and Maturity onset diabetes of the young (MODY) in
children and adolescents aged 0-20 yr in Guntur-Vijawayada(AP), India, and to compare our results with various algorithm
techniques.
4. Conclusions:
The prevalence of T2DM for the age range from 0 to 20 yr is 2.30/100 000, whereas the prevalence of MODY in the
same age range is 2.39/100 000. The median age of patients with T2DM was 15.8 yr, and 13.9 yr for MODY patients. The
majority of patients with either T2DM or MODY were treated in children's hospitals and by consultant diabetologists. A
molecular genetic analysis was done to substantiate the clinical diagnosis in less than half of the recruits (14.3% of T2DM and
44.8% of MODY patients). The prevalence of T2DM and MODY is considerably lower than the prevalence of type 1 diabetes.
Type 2 diabetes thus continues to be a rare disease in children and adolescents in Andhra Pradesh, as is also the case in other
states of India.
This paper collects and analyzes medical patient record of type-2 diabetes mellitus (T2DM) with knowledge discovery
techniques to extract the information in the form of text or script format from T2DM patients in one of public hospital in
Guntur, Andhra Pradesh, India. The experiment has successfully performed with several data mining techniques and Decision
Tree mechanism as part of data mining technique achieves better performance than other classical methods such as K-NN
Method. Extracted rules using decision tree are conformed to clinician‟s knowledge and more importantly, we found some
major attributes such as plasmainsulin, BMI became a significant factor in our case study.This research might have some
limitations and is being optimized. Later, it will focus on increasing the datasets in order to maximize result and discover novel
optimal algorithm. As further researches, it would interest to include other risk factors such as sedentary lifestyle, Familiy
History, Blood Pressure and Smoker.
References
[1] International Diabetes Federation (IDF), What is diabetes?, World Health Organisation, accessed January 2010,
http://www.idf.org
[2] Zang Ping, et al. Economic Impact of Diabetes, International Diabetes Federation, accessed January 2010,
http://www.diabetesatlas.org/sites/default/files/Economic% 20impact%20of%20Diabetes.pdf.
[3] Holt, Richard I. G., et al, editors. Textbook of Diabetes. 4th ed., West Sussex: Wiley-Blackwell; 2010.
[4] National Diabetes Information Clearinghouse (NDIC), The Diabetes Control and Complications Trial and Follow-up
Study, accessed January 2010, http://diabetes.niddk.nih.gov/dm/pubs/control.
[5] N. Lavrac, E. Keravnou, and B. Zupan, Intelligent Data Analysis in Medicine, in Encyclopedia of Computer Science and
Technology, vol.42, New York: Dekker, 2000.
[6] Olson, David L and Dursun Dulen. Advanced Data Mining Techniques, Berlin: Springer Verlag, 2008.
[7] Huang, Y., et al. Feature Selection and Classification Model Construction on Type 2 Diabetic Patients‟ Data. Journal of
Artificial Intelligence in Medicine, 2007; 41: 251-262.
[8] Barakat, et al. Intelligible Support Vector Machines for Diagnosis of Diabetes Mellitus. IEEE Transactions on
Information Technology in BioMedicine, 2009.
[9] Polat, Kemal and Salih Gunes. An Expert System Approach Based on Principal Component Analysis and Adaptive
Neuro-Fuzzy Inference System to Diagnosis of Diabetes Disease. Expert System with Applications, Elsevier, 2007: 702-
710.
Issn 2250-3005(online) October| 2012 Page 41
10. International Journal Of Computational Engineering Research (ijceronline.com) Vol. 2 Issue. 6
[10] Yue, et al. An Intelligent Diagnosis to Type 2 Diabetes Based on QPSO Algorithm and WLSSVM. International
Symposium on Intelligent Information Technology Application Workshops, IEEE Computer Society, 2008.
[11] Vapnik, V. The Nature of Statistical Learning Theory 2nd Edition, New York: Springer Verlag, 2000.
[12] Witten, I.H., Frank, E. Data mining: Practical Machine Learning Tools and Techniques 2nd Edition. San Fransisco:
Morgan Kaufmann. 2005.
[13] Alpaydm, Ethem. Introduction to Machine Learning, Massachusetts: MIT Press, 2004: 154-155.
[14] Han, J. and Micheline Kamber. Data Mining: Concepts and Techniques, San Fransisco: Morgan Kaufmann Publisher,
2006: 310-311.
[15] Kohavi, R., Scaling Up the Accuracy of Naive Bayes Classifiers: A Decision Tree Hybrid, Proceedings of the 2nd
International Conference on Knowledge Discovery and Data Mining, 1996.
[16] Freund, Y., Schapire, R.E. Experiments with a New Boosting Algorithm. Proceedings of the Thirteenth International
Conference on Machine Learning. San Francisco: Morgan Kaufmann, 1996: 148–156.
[17] Opitz, D., Maclin, R.: Popular Ensemble Methods: An Empirical Study. Journal of Artificial Intelligence Research,1999,
11: 169–198.
[18] Fawcett, Tom. An Introduction to ROC Analysis. Pattern Recognition Letters, Elsevier, 2006; 27: 861- 874.
[19] Zou, Kelly H. ROC literature research, On-line bibiography accessed February 2011,
http://www.spl.harvard.edu/archive/spl-pre2007/pages/ppl/zou/roc.html
[20] V.V.Jaya Rama Krishnaiah, D.V.Chandra Sekhar, Dr. K.Ramchand H Rao, Dr. R Satya Prasad, Predicting the Diabetes
using Duo Mining Approach, International Journal of Advanced Research in Computer and Communication
Engineering, Vol. 1, Issue 6, August 2012
AUTHOR’S PROFILE
V.V.Jaya Rama Krishnaiah, received Master„s degree in Computer Application from Acharya Nagrajuna
University,Guntur, India, Master of Philosophy from Vinayaka University, Salem . He is currently working
as Associate Professor, in the Department of Computer Science, A.S.N. Degree College, Tenali, which is
affiliated to Acharya Nagarjuna University. He has 14 years teaching experience. He is currently pursuing
Ph.D., at Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur,
Andhra Pradesh, India. His research area is Clustering in Databases. He has published several papers in
National & International Journals.
D.V. Chandra Shekar, received Master of Engineering with Computer Science & Engineering He is
currently working as Associate Professor, in the Department of Computer Science, T.J.P.S COLLEGE (P.G
COURSES),Guntur, which is affiliated to Acharya Nagarjuna University. He has 14 years teaching
experience and 1 years of Industry experience. He has published 52 papers in National & International
Journals.
Dr. R.Satya Prasad, received Ph.D in Computer Sceicne in the faculty of Engineering in 2007 from Acharya
Nagarjuna University, Guntur, Andhra Pradesh, India. He have a satisfactory consistent academic track of
record and received Gold medal from Acharya Nagarjuna University for his outstanding performance in a
first rank in Masters Degree. He is currently working as Associate Professor in the Department of Computer
Science and Engineering.
Dr.K Ramchand H Rao, received Doctorate in from Acharya Nagarjuna University, Master„s degree in
Technology with Computer Science from Dr. M.G.R University, Chennai, Tamilnadu, India. He is currently
working as Professor and Head of the Department, Department of Computer Science and Engineering, A.S.N.
Women‟s Engineering College, Tenali, which is affiliated to JNTU Kakinada. He has 18 years teaching
experience and 2 years of Industry experience at Morgan Stanly, USA as Software Analyst. His research area
is Software Engineering. He has published several papers in National & International Journals.
Issn 2250-3005(online) October| 2012 Page 42