The Healthcare industry contains big and complex data that may be required in order to discover fascinating pattern of diseases & makes effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in database and for medical research. This paper has analyzed prediction systems for Diabetes, Kidney and Liver disease using more number of input attributes. The data mining classification techniques, namely Support Vector Machine(SVM) and Random Forest (RF) are analyzed on Diabetes, Kidney and Liver disease database. The performance of these techniques is compared, based on precision, recall, accuracy, f_measure as well as time. As a result of study the proposed algorithm is designed using SVM and RF algorithm and the experimental result shows the accuracy of 99.35%, 99.37 and 99.14 on diabetes, kidney and liver disease respectively.
Performance evaluation of random forest with feature selection methods in pre...IJECEIAES
Data mining is nothing but the process of viewing data in different angle and compiling it into appropriate information. Recent improvements in the area of data mining and machine learning have empowered the research in biomedical field to improve the condition of general health care. Since the wrong classification may lead to poor prediction, there is a need to perform the better classification which further improves the prediction rate of the medical datasets. When medical data mining is applied on the medical datasets the important and difficult challenges are the classification and prediction. In this proposed work we evaluate the PIMA Indian Diabtes data set of UCI repository using machine learning algorithm like Random Forest along with feature selection methods such as forward selection and backward elimination based on entropy evaluation method using percentage split as test option. The experiment was conducted using R studio platform and we achieved classification accuracy of 84.1%. From results we can say that Random Forest predicts diabetes better than other techniques with less number of attributes so that one can avoid least important test for identifying diabetes.
Various Data Mining Techniques for Diabetes Prognosis: A Reviewijtsrd
Most of the food we eat is converted to glucose, or sugar which is used for energy. When you have diabetes, your body either doesnt make enough insulin or cannot use its own insulin as well as it should. This causes sugar to build up in your blood leading to complications like heart disease, stroke, neuropathy, poor circulation leading to loss of limbs, blindness, kidney failure, nerve damage, and death. Data mining adopts a series of pattern recognition technologies and statistical and mathematical techniques to discover the possible rules or relationships that govern the data in the databases. Data mining plays an important role in data prediction. There are different types of diseases predicted in data mining namely Hepatitis, Lung Cancer, Liver disorder, Breast cancer, Thyroid disease, Diabetes etc¦ This paper analyzes the Diabetes predictions. Misba Reyaz | Gagan Dhawan"Various Data Mining Techniques for Diabetes Prognosis: A Review" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-4 , June 2018, URL: http://www.ijtsrd.com/papers/ijtsrd12927.pdf http://www.ijtsrd.com/engineering/computer-engineering/12927/various-data-mining-techniques-for-diabetes-prognosis-a-review/misba-reyaz
PREDICTION OF DIABETES MELLITUS USING MACHINE LEARNING TECHNIQUESIAEME Publication
Diabetes mellitus is a common disease caused by a set of metabolic ailments
where the sugar stages over drawn-out period is very high. It touches diverse organs
of the human body which therefore harm a huge number of the body's system, in
precise the blood strains and nerves. Early prediction in such disease can be exact
and save human life. To achieve the goal, this research work mainly discovers
numerous factors associated to this disease using machine learning techniques.
Machine learning methods provide effectual outcome to extract knowledge by building
predicting models from diagnostic medical datasets together from the diabetic
patients. Quarrying knowledge from such data can be valuable to predict diabetic
patients. In this research, six popular used machine learning techniques, namely
Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), C4.5 Decision
Tree (DT), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) are
compared in order to get outstanding machine learning techniques to forecast diabetic
mellitus. Our new outcome shows that Support Vector Machine (SVM) achieved
higher accuracy compared to other machine learning techniques.
dkNET Webinar: Multi-Omics Data Integration for Phenotype Prediction of Type-...dkNET
Abstract
Omics techniques (e.g., i.e., transcriptomics, genomics, and epigenomics) report quantitative measures of more than tens of thousands of biological features and provide a more comprehensive molecular perspective of studied diabetes mechanisms compared to transitional approaches. Identifying representative molecular signatures from the tremendous number of biological features becomes a central problem in utilizing the data for clinical decision-making. Exploring the complex causal relations of the identified representative molecular signatures and diabetes phenotypes can be the most effective and efficient ways to improve the understanding of diabetes and assess the cause of diabetes for the new patients with already collected data influencing (e.g., TEDDY project). However, due to the unavoidable patient heterogeneity, statistical randomness, and experimental noise in the high-dimension, low-sample-size omics data of the diabetic patients, utilizing the available data for clinical decision-making remains an ongoing challenge for many researchers. To overcome the limitations, in this study we developed (1) a generative adversarial network (GAN)-based model to generate synthetic omics data for the samples with few omics profiles available; (2) a deep learning-based fusion network model for phenotype prediction of type-1 diabetes; (3) a long short-term memory (LSTM)-based model for predicting outcomes of islet autoantibody and persistent positivity. The models are tested on the multi-omics data in TEDDY project.
Presenter: Wei Zhang, Ph.D. Assistant Professor, Department of Computer Science & Genomics and Bioinformatics Cluster, University of Central Florida
Upcoming webinars schedule: https://dknet.org/about/webinar
Performance evaluation of random forest with feature selection methods in pre...IJECEIAES
Data mining is nothing but the process of viewing data in different angle and compiling it into appropriate information. Recent improvements in the area of data mining and machine learning have empowered the research in biomedical field to improve the condition of general health care. Since the wrong classification may lead to poor prediction, there is a need to perform the better classification which further improves the prediction rate of the medical datasets. When medical data mining is applied on the medical datasets the important and difficult challenges are the classification and prediction. In this proposed work we evaluate the PIMA Indian Diabtes data set of UCI repository using machine learning algorithm like Random Forest along with feature selection methods such as forward selection and backward elimination based on entropy evaluation method using percentage split as test option. The experiment was conducted using R studio platform and we achieved classification accuracy of 84.1%. From results we can say that Random Forest predicts diabetes better than other techniques with less number of attributes so that one can avoid least important test for identifying diabetes.
Various Data Mining Techniques for Diabetes Prognosis: A Reviewijtsrd
Most of the food we eat is converted to glucose, or sugar which is used for energy. When you have diabetes, your body either doesnt make enough insulin or cannot use its own insulin as well as it should. This causes sugar to build up in your blood leading to complications like heart disease, stroke, neuropathy, poor circulation leading to loss of limbs, blindness, kidney failure, nerve damage, and death. Data mining adopts a series of pattern recognition technologies and statistical and mathematical techniques to discover the possible rules or relationships that govern the data in the databases. Data mining plays an important role in data prediction. There are different types of diseases predicted in data mining namely Hepatitis, Lung Cancer, Liver disorder, Breast cancer, Thyroid disease, Diabetes etc¦ This paper analyzes the Diabetes predictions. Misba Reyaz | Gagan Dhawan"Various Data Mining Techniques for Diabetes Prognosis: A Review" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-4 , June 2018, URL: http://www.ijtsrd.com/papers/ijtsrd12927.pdf http://www.ijtsrd.com/engineering/computer-engineering/12927/various-data-mining-techniques-for-diabetes-prognosis-a-review/misba-reyaz
PREDICTION OF DIABETES MELLITUS USING MACHINE LEARNING TECHNIQUESIAEME Publication
Diabetes mellitus is a common disease caused by a set of metabolic ailments
where the sugar stages over drawn-out period is very high. It touches diverse organs
of the human body which therefore harm a huge number of the body's system, in
precise the blood strains and nerves. Early prediction in such disease can be exact
and save human life. To achieve the goal, this research work mainly discovers
numerous factors associated to this disease using machine learning techniques.
Machine learning methods provide effectual outcome to extract knowledge by building
predicting models from diagnostic medical datasets together from the diabetic
patients. Quarrying knowledge from such data can be valuable to predict diabetic
patients. In this research, six popular used machine learning techniques, namely
Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), C4.5 Decision
Tree (DT), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) are
compared in order to get outstanding machine learning techniques to forecast diabetic
mellitus. Our new outcome shows that Support Vector Machine (SVM) achieved
higher accuracy compared to other machine learning techniques.
dkNET Webinar: Multi-Omics Data Integration for Phenotype Prediction of Type-...dkNET
Abstract
Omics techniques (e.g., i.e., transcriptomics, genomics, and epigenomics) report quantitative measures of more than tens of thousands of biological features and provide a more comprehensive molecular perspective of studied diabetes mechanisms compared to transitional approaches. Identifying representative molecular signatures from the tremendous number of biological features becomes a central problem in utilizing the data for clinical decision-making. Exploring the complex causal relations of the identified representative molecular signatures and diabetes phenotypes can be the most effective and efficient ways to improve the understanding of diabetes and assess the cause of diabetes for the new patients with already collected data influencing (e.g., TEDDY project). However, due to the unavoidable patient heterogeneity, statistical randomness, and experimental noise in the high-dimension, low-sample-size omics data of the diabetic patients, utilizing the available data for clinical decision-making remains an ongoing challenge for many researchers. To overcome the limitations, in this study we developed (1) a generative adversarial network (GAN)-based model to generate synthetic omics data for the samples with few omics profiles available; (2) a deep learning-based fusion network model for phenotype prediction of type-1 diabetes; (3) a long short-term memory (LSTM)-based model for predicting outcomes of islet autoantibody and persistent positivity. The models are tested on the multi-omics data in TEDDY project.
Presenter: Wei Zhang, Ph.D. Assistant Professor, Department of Computer Science & Genomics and Bioinformatics Cluster, University of Central Florida
Upcoming webinars schedule: https://dknet.org/about/webinar
Predictive Analytics and Machine Learning for Healthcare - DiabetesDr Purnendu Sekhar Das
Machine Learning on clinical datasets to predict the risk of chronic disease conditions like Type 2 Diabetes mellitus beforehand; as well as predicting outcomes like hospital readmission using EMR RWE data.
This is a small presentation on my project , diabetes prediction using R language.The method used is knn(K nearest neighbour). it the basic Machine learning algorithm.
We are predicting Heart Disease by Taking 14 Medical Parameters as an inputs through 2 data Minning Techniques(Decision Tree(Faster) And KNN neighbour Algorithms(Slower)).
And Visualizing The dataset.If the output 1 then it means Higher Chances of getting Heart Attack ,if 0 then it means Less chances of Heart Attack.
Machine Learning for Disease PredictionMustafa Oğuz
A great application field of machine learning is predicting diseases. This presentation introduces what is preventable diseases and deaths. Then examines three diverse papers to explain what has been done in the field and how the technology works. Finishes with future possibilities and enablers of the disease prediction technology.
Psdot 14 using data mining techniques in heartZTech Proje
FINAL YEAR IEEE PROJECTS,
EMBEDDED SYSTEMS PROJECTS,
ENGINEERING PROJECTS,
MCA PROJECTS,
ROBOTICS PROJECTS,
ARM PIC BASED PROJECTS, MICRO CONTROLLER PROJECTS Z Technologies, Chennai
HEALTH PREDICTION ANALYSIS USING DATA MININGAshish Salve
Data mining techniques are used for a variety of applications. In healthcare industry, datamining plays an important
role in predicting diseases. For detecting a disease number of tests should be required from the patient. But using data
mining technique the number of tests can be reduced. This reduced test plays an important role in time and performance.
This report analyses data mining techniques which can be used for predicting different types of diseases. This report reviewed
the research papers which mainly concentrate on predicting various disease
A major challenge facing healthcare organizations (hospitals, medical centers) is
the provision of quality services at affordable costs. Quality service implies diagnosing
patients correctly and administering treatments that are effective. Poor clinical decisions
can lead to disastrous consequences which are therefore unacceptable. Hospitals must
also minimize the cost of clinical tests. They can achieve these results by employing
appropriate computer-based information and/or decision support systems.
Most hospitals today employ some sort of hospital information systems to manage
their healthcare or patient data.
These systems are designed to support patient billing, inventory management and generation of simple statistics. Some hospitals use decision support systems, but they are largely limited. Clinical decisions are often made based on doctors’ intuition and experience rather than on the knowledge rich data hidden in the database.
This practice leads to unwanted biases, errors and excessive medical costs which affects the quality of service provided to patients.
Prediction of Heart Disease using Machine Learning Algorithms: A Surveyrahulmonikasharma
According to recent survey by WHO organisation 17.5 million people dead each year. It will increase to 75 million in the year 2030[1].Medical professionals working in the field of heart disease have their own limitation, they can predict chance of heart attack up to 67% accuracy[2], with the current epidemic scenario doctors need a support system for more accurate prediction of heart disease. Machine learning algorithm and deep learning opens new door opportunities for precise predication of heart attack. Paper provideslot information about state of art methods in Machine learning and deep learning. An analytical comparison has been provided to help new researches’ working in this field.
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%.
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%
Predictive Analytics and Machine Learning for Healthcare - DiabetesDr Purnendu Sekhar Das
Machine Learning on clinical datasets to predict the risk of chronic disease conditions like Type 2 Diabetes mellitus beforehand; as well as predicting outcomes like hospital readmission using EMR RWE data.
This is a small presentation on my project , diabetes prediction using R language.The method used is knn(K nearest neighbour). it the basic Machine learning algorithm.
We are predicting Heart Disease by Taking 14 Medical Parameters as an inputs through 2 data Minning Techniques(Decision Tree(Faster) And KNN neighbour Algorithms(Slower)).
And Visualizing The dataset.If the output 1 then it means Higher Chances of getting Heart Attack ,if 0 then it means Less chances of Heart Attack.
Machine Learning for Disease PredictionMustafa Oğuz
A great application field of machine learning is predicting diseases. This presentation introduces what is preventable diseases and deaths. Then examines three diverse papers to explain what has been done in the field and how the technology works. Finishes with future possibilities and enablers of the disease prediction technology.
Psdot 14 using data mining techniques in heartZTech Proje
FINAL YEAR IEEE PROJECTS,
EMBEDDED SYSTEMS PROJECTS,
ENGINEERING PROJECTS,
MCA PROJECTS,
ROBOTICS PROJECTS,
ARM PIC BASED PROJECTS, MICRO CONTROLLER PROJECTS Z Technologies, Chennai
HEALTH PREDICTION ANALYSIS USING DATA MININGAshish Salve
Data mining techniques are used for a variety of applications. In healthcare industry, datamining plays an important
role in predicting diseases. For detecting a disease number of tests should be required from the patient. But using data
mining technique the number of tests can be reduced. This reduced test plays an important role in time and performance.
This report analyses data mining techniques which can be used for predicting different types of diseases. This report reviewed
the research papers which mainly concentrate on predicting various disease
A major challenge facing healthcare organizations (hospitals, medical centers) is
the provision of quality services at affordable costs. Quality service implies diagnosing
patients correctly and administering treatments that are effective. Poor clinical decisions
can lead to disastrous consequences which are therefore unacceptable. Hospitals must
also minimize the cost of clinical tests. They can achieve these results by employing
appropriate computer-based information and/or decision support systems.
Most hospitals today employ some sort of hospital information systems to manage
their healthcare or patient data.
These systems are designed to support patient billing, inventory management and generation of simple statistics. Some hospitals use decision support systems, but they are largely limited. Clinical decisions are often made based on doctors’ intuition and experience rather than on the knowledge rich data hidden in the database.
This practice leads to unwanted biases, errors and excessive medical costs which affects the quality of service provided to patients.
Prediction of Heart Disease using Machine Learning Algorithms: A Surveyrahulmonikasharma
According to recent survey by WHO organisation 17.5 million people dead each year. It will increase to 75 million in the year 2030[1].Medical professionals working in the field of heart disease have their own limitation, they can predict chance of heart attack up to 67% accuracy[2], with the current epidemic scenario doctors need a support system for more accurate prediction of heart disease. Machine learning algorithm and deep learning opens new door opportunities for precise predication of heart attack. Paper provideslot information about state of art methods in Machine learning and deep learning. An analytical comparison has been provided to help new researches’ working in this field.
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%.
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%
Chronic Kidney Disease prediction is one of the most important issues in healthcare analytics. The most interesting and challenging tasks in day to day life is prediction in medical field. In this paper, we employ some machine learning techniques for predicting the chronic kidney disease using clinical data. We use three machine learning algorithms such as Decision Tree(DT) algorithm, Naive Bayesian (NB) algorithm. The performance of the above models are compared with each other in order to select the best classifier in predicting the chronic kidney disease for given dataset.
Prognosis of Cardiac Disease using Data Mining Techniques A Comprehensive Surveyijtsrd
The Healthcare exchange generally clinical diagnosis is ended commonly by doctor's knowledge and practice. Computer Aided Decision Support System plays a major task in the medical field. Data mining provides the methodology and technology to modify these rises of data into valuable data for decision making. By utilizing data mining techniques it requires less time for the prediction of the diseases with more accuracy. Among the expanding research on coronary diseases predicting system, it has happened significant to classifications the exploration results and gives readers with a layout of the current coronary diseases forecast strategies in every discussion. Data mining tools can respond to exchange addresses that expectedly being used much time over riding to decide. In this paper we study different papers in which at least one algorithm of data mining used for the prediction of coronary diseases. As of the study it is observed that Naïve Bayes Technique increase the accuracy of the coronary diseases prediction system. The commonly used techniques for Heart Disease Prediction and their complexities are outlined in this paper. D. Haripriya | Dr. M. Lovelin Ponn Felciah "Prognosis of Cardiac Disease using Data Mining Techniques: A Comprehensive Survey" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26605.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26605/prognosis-of-cardiac-disease-using-data-mining-techniques-a-comprehensive-survey/d-haripriya
A comparative analysis of classification techniques on medical data setseSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Propose a Enhanced Framework for Prediction of Heart DiseaseIJERA Editor
Heart disease diagnosis requires more experience and it is a complex task. The Heart MRI, ECG and Stress Test etc are the numbers of medical tests are prescribed by the doctor for examining the heart disease and it is the way of tradition in the prediction of heart disease. Today world, the hidden information of the huge amount of health care data is contained by the health care industry. The effective decisions are made by means of this hidden information. For appropriate results, the advanced data mining techniques with the information which is based on the computer are used. In any empirical sciences, for the inference and categorisation, the new mathematical techniques to be used called Artificial neural networks (ANNs) it also be used to the modelling of the real neural networks. Acting, Wanting, knowing, remembering, perceiving, thinking and inferring are the nature of mental phenomena and these can be understand by using the theory of ANN. The problem of probability and induction can be arised for the inference and classification because these are the powerful instruments of ANN. In this paper, the classification techniques like Naive Bayes Classification algorithm and Artificial Neural Networks are used to classify the attributes in the given data set. The attribute filtering techniques like PCA (Principle Component Analysis) filtering and Information Gain Attribute Subset Evaluation technique for feature selection in the given data set to predict the heart disease symptoms. A new framework is proposed which is based on the above techniques, the framework will take the input dataset and fed into the feature selection techniques block, which selects any one techniques that gives the least number of attributes and then classification task is done using two algorithms, the same attributes that are selected by two classification task is taken for the prediction of heart disease. This framework consumes the time for predicting the symptoms of heart disease which make the user to know the important attributes based on the proposed framework.
HEALTH PREDICTION ANALYSIS USING DATA MININGAshish Salve
As we know that health care industry is completely based on assumptions, which after get tested and verified via various tests and patient have to be depend on the doctors knowledge on that topic . so we made a system that uses data mining techniques to predict the health of a person based on various medical test results. so we can predict the health of that person based on that analysis performed by the system.The system currently design only for heart issues, for that we had used Statlog (Heart) Data Set from UCI Machine Learning Repository it includes attributes like age, sex, chest pain type, cholesterol, sugar, outcomes,etc.for training the system. we only need to passed few general inputs in order to generate the prediction and the prediction results from all algorithms are they merged together by calculating there mean value that value shows the actual outcome of the prediction process which entirely works in background
An Approach for Disease Data Classification Using Fuzzy Support Vector MachineIOSRJECE
: Data Mining has great scope in the field of medicine. In this article we introduced one new fuzzy approach for prediction of hepatitis disease. Many researchers have proposed the use of K-nearest neighbor (KNN) for diabetes disease prediction. Some have proposed a different approach by using K-means clustering for reprocessing and then using KNN for classification. In our approach Naive Bayes classifier is used to clean the data. Finally, the classification is done using Fuzzy SVM algorithm. Hepatitis diseases data set is used to test our method. We are able to obtain model more precise than any others available in the literature. The Fuzzy SVM approach produced better result than KNN with Fuzzy c-meansand Fuzzy KNN with Fuzzy c-means. Theintroduction of Fuzzy Support Vector Machine algorithm certainly has a positive effect on the outcome of hepatitis disease. This fuzzy SVM model led to remarkable increase in classification accuracy
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION IJCI JOURNAL
Data mining is a non-trivial process of categorizing valid, novel, potentially useful and ultimately understandable patterns in data. In terms, it accurately state as the extraction of information from a huge database. Data mining is a vital role in several applications such as business organizations, educational institutions, government sectors, health care industry, scientific and engineering. . In the health care
industry, the data mining is predominantly used for disease prediction. Enormous data mining techniques are existing for predicting diseases namely classification, clustering, association rules, summarizations, regression and etc. The main objective of this research work is to predict kidney diseases using classification algorithms such as Naïve Bayes and Support Vector Machine. This research work mainly
focused on finding the best classification algorithm based on the classification accuracy and execution time performance factors. From the experimental results it is observed that the performance of the SVM is better than the Naive Bayes classifier algorithm.
In the era of data-driven warfare, the integration of big data and machine learning (ML) techniques has
become paramount for enhancing defence capabilities. This research report delves into the applications of
big data and ML in the defence sector, exploring their potential to revolutionize intelligence gathering,
strategic decision-making, and operational efficiency. By leveraging vast amounts of data and advanced
algorithms, these technologies offer unprecedented opportunities for threat detection, predictive analysis,
and optimized resource allocation. However, their adoption also raises critical concerns regarding data
privacy, ethical implications, and the potential for misuse. This report aims to provide a comprehensive
understanding of the current state of big data and ML in defence, while examining the challenges and
ethical considerations that must be addressed to ensure responsible and effective implementation.
Cloud Computing, being one of the most recent innovative developments of the IT world, has been
instrumental not just to the success of SMEs but, through their productivity and innovative contribution to
the economy, has even made a remarkable contribution to the economic growth of the United States. To
this end, the study focuses on how cloud computing technology has impacted economic growth through
SMEs in the United States. Relevant literature connected to the variables of interest in this study was
reviewed, and secondary data was generated and utilized in the analysis section of this paper. The findings
of this paper revealed that there have been meaningful contributions that the usage of virtualization has
made in the commercial dealings of small firms in the United States, and this has also been reflected in the
economic growth of the country. This paper further revealed that as important as cloud-based software is,
some SMEs are still skeptical about how it can help improve their business and increase their bottom line
and hence have failed to adopt it. Apart from the SMEs, some notable large firms in different industries,
including information and educational services, have adopted cloud computing technology and hence
contributed to the economic growth of the United States. Lastly, findings from our inferential statistics
revealed that no discernible change has occurred in innovation between small and big businesses in the
adoption of cloud computing. Both categories of businesses adopt cloud computing in the same way, and
their contribution to the American economy has no significant difference in the usage of virtualization.
Energy-constrained Wireless Sensor Networks (WSNs) have garnered significant research interest in
recent years. Multiple-Input Multiple-Output (MIMO), or Cooperative MIMO, represents a specialized
application of MIMO technology within WSNs. This approach operates effectively, especially in
challenging and resource-constrained environments. By facilitating collaboration among sensor nodes,
Cooperative MIMO enhances reliability, coverage, and energy efficiency in WSN deployments.
Consequently, MIMO finds application in diverse WSN scenarios, spanning environmental monitoring,
industrial automation, and healthcare applications.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to fields of Computer Science and Information Systems. The IJCSIT is a open access peer-reviewed scientific journal published in electronic form as well as print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication. IJCSIT publishes original research papers and review papers, as well as auxiliary material such as: research papers, case studies, technical reports etc.
With growing, Car parking increases with the number of car users. With the increased use of smartphones
and their applications, users prefer mobile phone-based solutions. This paper proposes the Smart Parking
Management System (SPMS) that depends on Arduino parts, Android applications, and based on IoT. This
gave the client the ability to check available parking spaces and reserve a parking spot. IR sensors are
utilized to know if a car park space is allowed. Its area data are transmitted using the WI-FI module to the
server and are recovered by the mobile application which offers many options attractively and with no cost
to users and lets the user check reservation details. With IoT technology, the smart parking system can be
connected wirelessly to easily track available locations.
Welcome to AIRCC's International Journal of Computer Science and Information Technology (IJCSIT), your gateway to the latest advancements in the dynamic fields of Computer Science and Information Systems.
Computer-Assisted Language Learning (CALL) are computer-based tutoring systems that deal with
linguistic skills. Adding intelligence in such systems is mainly based on using Natural Language
Processing (NLP) tools to diagnose student errors, especially in language grammar. However, most such
systems do not consider the modeling of student competence in linguistic skills, especially for the Arabic
language. In this paper, we will deal with basic grammar concepts of the Arabic language taught for the
fourth grade of the elementary school in Egypt. This is through Arabic Grammar Trainer (AGTrainer)
which is an Intelligent CALL. The implemented system (AGTrainer) trains the students through different
questions that deal with the different concepts and have different difficulty levels. Constraint-based student
modeling (CBSM) technique is used as a short-term student model. CBSM is used to define in small grain
level the different grammar skills through the defined skill structures. The main contribution of this paper
is the hierarchal representation of the system's basic grammar skills as domain knowledge. That
representation is used as a mechanism for efficiently checking constraints to model the student knowledge
and diagnose the student errors and identify their cause. In addition, satisfying constraints and the number
of trails the student takes for answering each question and fuzzy logic decision system are used to
determine the student learning level for each lesson as a long-term model. The results of the evaluation
showed the system's effectiveness in learning in addition to the satisfaction of students and teachers with its
features and abilities.
In the realm of computer security, the importance of efficient and reliable user authentication methods has
become increasingly critical. This paper examines the potential of mouse movement dynamics as a
consistent metric for continuous authentication. By analysing user mouse movement patterns in two
contrasting gaming scenarios, "Team Fortress" and "Poly Bridge," we investigate the distinctive
behavioral patterns inherent in high-intensity and low-intensity UI interactions. The study extends beyond
conventional methodologies by employing a range of machine learning models. These models are carefully
selected to assess their effectiveness in capturing and interpreting the subtleties of user behavior as
reflected in their mouse movements. This multifaceted approach allows for a more nuanced and
comprehensive understanding of user interaction patterns. Our findings reveal that mouse movement
dynamics can serve as a reliable indicator for continuous user authentication. The diverse machine
learning models employed in this study demonstrate competent performance in user verification, marking
an improvement over previous methods used in this field. This research contributes to the ongoing efforts to
enhance computer security and highlights the potential of leveraging user behavior, specifically mouse
dynamics, in developing robust authentication systems.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to fields of Computer Science and Information Systems. The IJCSIT is a open access peer-reviewed scientific journal published in electronic form as well as print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication.
Image segmentation and classification tasks in computer vision have proven to be highly effective using neural networks, specifically Convolutional Neural Networks (CNNs). These tasks have numerous
practical applications, such as in medical imaging, autonomous driving, and surveillance. CNNs are capable
of learning complex features directly from images and achieving outstanding performance across several
datasets. In this work, we have utilized three different datasets to investigate the efficacy of various preprocessing and classification techniques in accurssedately segmenting and classifying different structures
within the MRI and natural images. We have utilized both sample gradient and Canny Edge Detection
methods for pre-processing, and K-means clustering have been applied to segment the images. Image
augmentation improves the size and diversity of datasets for training the models for image classification
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to fields of Computer Science and Information Systems. The IJCSIT is a open access peer-reviewed scientific journal published in electronic form as well as print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication.
This research aims to further understanding in the field of continuous authentication using behavioural
biometrics. We are contributing a novel dataset that encompasses the gesture data of 15 users playing
Minecraft with a Samsung Tablet, each for a duration of 15 minutes. Utilizing this dataset, we employed
machine learning (ML) binary classifiers, being Random Forest (RF), K-Nearest Neighbors (KNN), and
Support Vector Classifier (SVC), to determine the authenticity of specific user actions. Our most robust
model was SVC, which achieved an average accuracy of approximately 90%, demonstrating that touch
dynamics can effectively distinguish users. However, further studies are needed to make it viable option
for authentication systems. You can access our dataset at the following
link:https://github.com/AuthenTech2023/authentech-repo
This paper discusses the capabilities and limitations of GPT-3 (0), a state-of-the-art language model, in the
context of text understanding. We begin by describing the architecture and training process of GPT-3, and
provide an overview of its impressive performance across a wide range of natural language processing
tasks, such as language translation, question-answering, and text completion. Throughout this research
project, a summarizing tool was also created to help us retrieve content from any types of document,
specifically IELTS (0) Reading Test data in this project. We also aimed to improve the accuracy of the
summarizing, as well as question-answering capabilities of GPT-3 (0) via long text
In the realm of computer security, the importance of efficient and reliable user authentication methods has
become increasingly critical. This paper examines the potential of mouse movement dynamics as a
consistent metric for continuous authentication. By analysing user mouse movement patterns in two
contrasting gaming scenarios, "Team Fortress" and "Poly Bridge," we investigate the distinctive
behavioral patterns inherent in high-intensity and low-intensity UI interactions. The study extends beyond
conventional methodologies by employing a range of machine learning models. These models are carefully
selected to assess their effectiveness in capturing and interpreting the subtleties of user behavior as
reflected in their mouse movements. This multifaceted approach allows for a more nuanced and
comprehensive understanding of user interaction patterns. Our findings reveal that mouse movement
dynamics can serve as a reliable indicator for continuous user authentication. The diverse machine
learning models employed in this study demonstrate competent performance in user verification, marking
an improvement over previous methods used in this field. This research contributes to the ongoing efforts to
enhance computer security and highlights the potential of leveraging user behavior, specifically mouse
dynamics, in developing robust authentication systems.
Image segmentation and classification tasks in computer vision have proven to be highly effective using neural networks, specifically Convolutional Neural Networks (CNNs). These tasks have numerous
practical applications, such as in medical imaging, autonomous driving, and surveillance. CNNs are capable
of learning complex features directly from images and achieving outstanding performance across several
datasets. In this work, we have utilized three different datasets to investigate the efficacy of various preprocessing and classification techniques in accurssedately segmenting and classifying different structures
within the MRI and natural images. We have utilized both sample gradient and Canny Edge Detection
methods for pre-processing, and K-means clustering have been applied to segment the images. Image
augmentation improves the size and diversity of datasets for training the models for image classification.
This work highlights transfer learning’s effectiveness in image classification using CNNs and VGG 16 that
provides insights into the selection of pre-trained models and hyper parameters for optimal performance.
We have proposed a comprehensive approach for image segmentation and classification, incorporating preprocessing techniques, the K-means algorithm for segmentation, and employing deep learning models such
as CNN and VGG 16 for classification.
The security of Electric Vehicle (EV) charging has gained momentum after the increase in the EV adoption
in the past few years. Mobile applications have been integrated into EV charging systems that mainly use a
cloud-based platform to host their services and data. Like many complex systems, cloud systems are
susceptible to cyberattacks if proper measures are not taken by the organization to secure them. In this
paper, we explore the security of key components in the EV charging infrastructure, including the mobile
application and its cloud service. We conducted an experiment that initiated a Man in the Middle attack
between an EV app and its cloud services. Our results showed that it is possible to launch attacks against
the connected infrastructure by taking advantage of vulnerabilities that may have substantial economic and
operational ramifications on the EV charging ecosystem. We conclude by providing mitigation suggestions
and future research directions.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to fields of Computer Science and Information Systems. The IJCSIT is a open access peer-reviewed scientific journal published in electronic form as well as print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to fields of Computer Science and Information Systems. The IJCSIT is a open access peer-reviewed scientific journal published in electronic form as well as print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication.
This paper describes the outcome of an attempt to implement the same transitive closure (TC) algorithm
for Apache MapReduce running on different Apache Hadoop distributions. Apache MapReduce is a
software framework used with Apache Hadoop, which has become the de facto standard platform for
processing and storing large amounts of data in a distributed computing environment. The research
presented here focuses on the variations observed among the results of an efficient iterative transitive
closure algorithm when run against different distributed environments. The results from these comparisons
were validated against the benchmark results from OYSTER, an open source Entity Resolution system. The
experiment results highlighted the inconsistencies that can occur when using the same codebase with
different implementations of Map Reduce.
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
This slides describes the basic concepts of ICT, basics of Email, Emerging Technology and Digital Initiatives in Education. This presentations aligns with the UGC Paper I syllabus.
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Palestine last event orientationfvgnh .pptxRaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...Levi Shapiro
Letter from the Congress of the United States regarding Anti-Semitism sent June 3rd to MIT President Sally Kornbluth, MIT Corp Chair, Mark Gorenberg
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic
harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or
unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied
students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
Honest Reviews of Tim Han LMA Course Program.pptxtimhan337
Personal development courses are widely available today, with each one promising life-changing outcomes. Tim Han’s Life Mastery Achievers (LMA) Course has drawn a lot of interest. In addition to offering my frank assessment of Success Insider’s LMA Course, this piece examines the course’s effects via a variety of Tim Han LMA course reviews and Success Insider comments.
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
2024.06.01 Introducing a competency framework for languag learning materials ...
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS
1. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
DOI:10.5121/ijcsit.2018.10102 11
AN ALGORITHM FOR PREDICTIVE DATA
MINING APPROACH IN MEDICAL DIAGNOSIS
Shakuntala Jatav1
and Vivek Sharma2
1
M.Tech Scholar, Department of CSE, TIT College, Bhopal
2
Professor, Department of CSE, TIT College, Bhopal
ABSTRACT
The Healthcare industry contains big and complex data that may be required in order to discover
fascinating pattern of diseases & makes effective decisions with the help of different machine learning
techniques. Advanced data mining techniques are used to discover knowledge in database and for medical
research. This paper has analyzed prediction systems for Diabetes, Kidney and Liver disease using more
number of input attributes. The data mining classification techniques, namely Support Vector
Machine(SVM) and Random Forest (RF) are analyzed on Diabetes, Kidney and Liver disease database.
The performance of these techniques is compared, based on precision, recall, accuracy, f_measure as well
as time. As a result of study the proposed algorithm is designed using SVM and RF algorithm and the
experimental result shows the accuracy of 99.35%, 99.37 and 99.14 on diabetes, kidney and liver disease
respectively.
KEYWORDS
Data Mining, Clinical Decision Support System, Disease Prediction, Classification, SVM, RF.
1. INTRODUCTION
Computational health informatics is rising research topic that involving varied sciences like
biomedical, medical, nursing, data technology, computer science, and statistics [1]. Data mining
techniques are applied to predict the effectiveness of huge and complex clinical data in order to
diagnose disease and extract information to suggest effective medical assistance [2]. In bioscience,
doctor’s facilities introduced different knowledge frameworks with plenty of data to manage
medical insurance and patient information however unfortunately, knowledge don't seem to be
mined to find hidden data for effective decision [2][3].
Clinical test outcomes are often created on the basis of doctor’s perception and experience instead
of on the knowledge enrich data masked within the database and generally this procedure prompts
unintended predispositions, doctor’s experience might not be capable to diagnose it accurately that
affects the disease diagnosis system [2][3]. In aid sector, the term data mining will mean to
research the clinical data to predict patient’s health status. Therefore discovering fascinating
pattern from healthcare information, different data mining techniques are applied with statistical
analysis, machine learning and database technology.
Predictive systems are the systems that are wont to predict some outcome on the basis of some
pattern recognition, as shown in figure 1. Disease detection is that the method by which patient’s
diagnosis is performed on the basis of symptoms analyzed which may causes difficulty while
predicting disease affect [4]. As an example, fever itself could be a symptom of the many disorders
that doesn’t tell the healthcare professional what exactly the disease is. Because the results or
2. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
12
opinion vary from one physician to a different, there's a requirement to help a medical physician
which will have similar opinion certainly symptoms and disorders [5]. This may be done by
analyzing the information generated by medical data or medical records.
Figure 1: A Typical Health Informatics Processing
As a result, the new information is often compared with previous records and optimistic diagnosis
is often done. Predictive medical diagnosis could be a net application which is able to predict a
selected disorder on the basis of symptoms and supply diagnosis for same disorder which is able to
be detected by rule. Healthcare professionals use their previous data and insights to reach a
particular decision regarding any disease or disorder [6]. Within the similar manner, this paper
proposes different classification techniques for diagnosis by using generic disease datasets. This
paper brings into limelight all the benefits and drawbacks of using the various data mining
techniques for the prediction of diseases. It conjointly accounts for the prediction rate for various
techniques therefore, bringing out the comparison between each of them [7]-[10].
Data mining has been successfully used in knowledge discovery for predictive purposes to make
more active and accurate decision [11]. The main focus of the paper is on classification as well as
clustering techniques. In clustering process such as K-means, EM, Fuzzy c-means, etc, data is
partitioned into sets of clusters or sub-classes [12].Machine learning techniques such as KNN,
SVM, Naïve bayes etc, can be used to classify different objects on the basis of a training set of data
whose outcome value is known.
Clustering Techniques
The clustering process divides the data into cluster groups or subclasses. We used four clustering
algorithms, namely K-Means, EM, PAM, Fuzzy C-Means [12]. The K-Means classification
algorithm works by partitioning n observations in k-subclasses defined by centroids, where k is
chosen before the algorithm begins. K-Means and EM are two iterative algorithms. EM
(expectation-maximization) is a statistical model that depends on the unobserved latent variables to
estimate the maximum likelihood parameters. Partitioning around medoids (MAP) is similar to K-
means that partitioning is based on the K-medoids method, which divides data into a number of
disjoint clusters [12]. In fuzzy clustering, data elements can belong to multiple clusters. This is also
called soft clustering.
3. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
13
Classification Techniques
Machine learning based classification techniques can be used to classify various objects based on a
series of training data whose result value is known. In this study four classification algorithms are
used: KNN, SVM, Naive Bayes and C5.0. In the nearest neighbor KN, the object is classified by
the majority of its neighbors, with the object being assigned to the most commonly used class
among its nearest neighbors. In SVM (Support Vector Machines), data is first converted into a set
of points and then classified into classes that can be separated linearly. The Naive Bayes model
calculates the probability of a set of data that can belong to a class using the Bayes rule. The C5.0
algorithm is a decision tree that recursively separates observations in branches to form a tree to
improve prediction accuracy. It is an improved version of the C4.5 and ID3 algorithms [10]. It also
provides the powerful gain method to increase the accuracy of this classification algorithm [11].
2. RELATED WORK
In [1] neural networks, decision tree and naïve bayes machine learning approach is used to
diagnose heart disease. For optimized feature selection genetic algorithm is used and obtained an
accuracy of 100%, 99.62 and 90.74 respectively.
In [2], performed a prediction of heart disease detection using neural network with genetic
algorithm based feature extraction. Back propagation based neural network weight optimization by
Genetic algorithm is designed and obtained the 89% accuracy of prediction of heart disease.
In [3], author developed a heart disease prediction system using an approach of ANN with LVQ
and achieved accuracy of about 80%, sensitivity of about 85% sensitivity as well as specificity of
about 70%.
In [4] proposed a heart disease diagnosis is proposed using lazy data mining approach with data
reduction strategies i.e. principal component analysis is used to get category association rules. The
result analysis shows that J4.8 has 10.26 enhancement as well as 8.6% enhancement over naïve
bayes.
In [5] presented a heart disease prediction system using data mining approach with two additional
features i.e. obesity and smoking to boost the prediction rate. Neural networks, Decision trees and
Naive Bayes was used in for predicting heart disease with an accuracy of 99.25%, 94.44% and
96.66% respectively.
A web based application has introduced in [6] using Naïve Bayesian algorithm which took
symptoms from user and gave the diagnosis result to the user or patient. In [7] Association rule
mining technique was used for diagnosis of diabetes. The authors concluded that the data mining
techniques when used appropriately increases the computation and also the classification
performance. These rules have the potential to improve the expert system and to make better
clinical decision making. For predicting diabetes disease on weka tool, author in research work [8]
had presented a comparison between Naïve bayes algorithm and decision tree algorithm and
achieved system accuracy of about 79.56% and 76.96% respectively.
In [9] Decision Tree, Naive Bayes, and NBTree algorithms is used for liver disease detection with
10 features. The result analysis with respect to accuracy NBTree algorithm has the highest
accuracy whereas with respect to computational time Naive Bayes algorithm performs better.
In [11] author performed a comparative analysis on clustering and classification algorithms. The
result analysis shows that the classification is better than clustering algorithms with an accuracy of
about 81%.
4. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
14
In [12] author proposed classification with clustering technique i.e. KNN with FCM clustering and
F-KNN with FCM clustering. It is clear that the Fuzzy KNN with Fuzzy c-means model produced
the better result than the KNN with Fuzzy c-means model on both PIMA and Liver-disorder
datasets. It is also clear that the use of Fuzzy c-means clustering algorithm for preprocessing of
datasets improved the result in terms of classification accuracy and speed by reducing the number
of tuples from the original datasets. From experiment, it is been found that KNN with Fuzzy c-
means have accuracy of 97.02 and Fuzzy KNN with Fuzzy c-means have accuracy of 99.25 on
PIMA dataset whereas on Liver disorder KNN with Fuzzy c-means have accuracy of 96.13 and
Fuzzy KNN with Fuzzy c-means with accuracy of 98.95.
In [13] performed the chronic disease prediction by using data mining approach such as Naïve
Bayes, Decision tree, Support Vector Machine (SVM) and Artificial Neural Networks (ANN) for
the diagnosis of diabetes and heart disease. The result analysis shows that SVM gives highest
accuracy of 95.556% in case of heart disease and Naïve bayes gives accuracy of 73.588% in case
of diabetes. Table I gives the comparative analysis of different existing techniques for heart, liver
and diabetes diseases.
Table 1: Comparative Analysis of Different Techniques in terms of Accuracy
Name of Author Technique Accuracy
Bhatla et al.
Neural Network, Naïve Bayes
and Decision Tree for heart
disease detection.
Neural networks = 100 %
Decision tree= 99.62 %
Naïve Bayes 90.74 %
Amin et al.
Optimized Neural Network for
heart disease detection.
Training data was = 89%
Validation data = 96.2%.
Chen et al.
ANN based heart disease
detection.
ANN = 80%.
Dangare et al.
Decision trees, Neural networks
and Naive Bayes for heart
disease detection.
Neural Networks = 99.25%
Naive Bayes = 94.44 %
Decision Tree =96.66 %
Iyer et al.
Decision tree and Naïve bayes
algorithm for diabetes detection.
Decision Tree =76.96%
Naïve Bayes= 79.56%
Uma Ojha and
Savita Goel
Decision Tree, SVM, FCM
Decision tree (C5.0) =81%
SVM =81%
FCM = 37%
Chetty et al.
KNN and F-KNN for diabetes
and liver disease detection.
KNN = 97.02%
F-KNN = 99.25% for diabetes data.
KNN = 96.13%
F-KNN =98.95% for liver disease data.
Kumari Deepika
and Dr. S. Seema
Naïve Bayes, Decision tree,
Support Vector Machine (SVM)
and Artificial Neural Networks
(ANN) for diabetes and heart
disease detection.
SVM = 95.556% (heart disease)
Naïve Bayes = 73.588% (diabetes).
5. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
15
3. METHODOLOGY
3.1 Proposed Methodology
One of the interesting and important subjects among researchers in the field of medical and
computer science is diagnosing illness by considering the features that have the most impact on
recognitions. The subject discusses a new concept which is called Medical Data Mining (MDM).
Indeed, data mining methods use different ways such as classification and clustering to classify
diseases and their symptoms which are helpful for diagnosing.
A disease diagnosis system is created in order to predict different diseases such as diabetes,
kidney disease as well as liver disease, etc. System’s workflow is discussed below:
Step 1: Through the proposed application user (doctor, patient, physician etc.) can input the
attribute values of disease and send it to the decision support system for analysis.
Step 2: At decision support system, dataset of different diseases are loaded and apply data mining
algorithms to train dataset. Requested user inputs are collected and processed on server to predict
the diagnosis result.
Step 3: For analyzing healthcare data, major steps of data mining approaches like preprocess data,
replace missing values, feature selection, machine learning and make decision are applied on train
dataset. On the decision support system end different classification algorithms would be executed
on train dataset and ready to classify the test dataset.
In the proposed algorithm Support Vector Machine and Random Forest is used to give clustering
level for different subspaces. The voting model will ensemble all these results and output the final
classification result. Finally, the predicted results will be compared with true labels of the testing
phase.
Figure 2: Proposed Model
6. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
16
Support Vector Machine
Support vector machine is a machine learning approach that can be used as classifier as well as
for regression. SVM classifies the data into different classes by finding hyperplane (line) which
separates training data into classes. SVM does not overfit the data and gives best classification
performance in terms of precision and accuracy.
SVM does not make any strong assumptions on data. It shows more efficiency for correct
classification of the future data. SVM is classified into 2 categories i.e. Linear and non-Linear. In
linear approach, training data is separated by line i.e. hyperplane.
Random Forest
Random Forest algorithm is capable of performing each classification and regression tasks. the
fundamental principle of RF is that a group of weak learner’s come together to create a robust
learner. Random forest rule uses bagging approach to form the bunch of decision trees with
random set of the information. The model is trained few times on random sample of the dataset to
attain best prediction performance from the RF rule. In this ensemble technique of learning, the
output of all decision trees within the RF is combined to form a final prediction. The final
prediction of the RF rule is derived after polling the results of every decision tree.
Suppose there are N cases within the training set. Then these N samples are taken randomly
however with replacement. These samples are training set for growth of tree. If m < M is specific.
The simplest split of this m is employed to separate the node. The value of m is constant whereas
growing the forest.
3.2 Proposed Algorithm
Input: D {Clininal data};
Output: Label {Disease Label};
Patient’s Label{Normal, Disease}
Step1: Pre-processing and data cleansing
Step2: For each instance in D, do
Find feature vector (V)
Step 3: For each V do
Data clustering using FCM and split data in two halves and classify data using SVM and RF
algorithm
Step 4: Determine the total class label
Find
True_positive (TP)
True_negative (TN)
7. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
17
False_positive (FP)
False_negative (FN)
Step 5: Find Performance Parameters
Step 6: Predict Disease Class as
if ( class=1) Patient=Normal State
else_if(class =0) Patient=Disease State
end for
3.3 Dataset Description
Pima Indians Diabetes Database
This study used data sets from the Pima Indians Diabetes Database of National Institute of
Diabetes [14]. This dataset consists of 768 samples with 8 numerical valued attribute where 500
are tested negative and 268 are tested positive instances.
Chronic Kidney Disease Dataset
This study used data sets from the university of California Irvine (UCI) repository. The data set
contains 400 patients, where 250 patients were positively affected by kidney disease and as many
as 150 patients do not suffer from kidney disease.
Liver Disorders Data Set
This study used data sets from the university of California Irvine (UCI) repository. This data set
contains 416 liver patient records and 167 non liver patient records collected from North East of
Andhra Pradesh, India.
3.4 Performance Measures
In this study, we used three performance measures: Precision, Accuracy, Recall, F_measure and
Total execution time.
Accuracy is termed as ratio of the number of correctly classified instances to the total number of
instances.
Accuracy = (TP+TN)/(TP+TN+FP+FN)
Precision is the ration of actually true predicted instances out of the total true instances.
Precision = TP/(TP+FP)
Recall is the ratio of actual true instances out of all the items which are true.
Recall = TP/(TP+FN)
F-measure is the harmonic mean of both precision and recall.
F_Measure = 2*(Precision*Recall)/(Precision + Recall)
8. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
18
Where TP, TN, FP and FN denote true positives, true negatives, false positives and false
negatives respectively.
4. RESULT ANALYSIS
Below Table 2 and 3 as well as Figure 3 shows the comparative analysis of proposed algorithm
with some existing algorithms.
Table 2: Result Analysis of Diabetes Disease Detection
Diabetes Disease Detection
Recall Precision Accuracy F_measure
1 0.9821 0.9935 0.991
Table 3: Comparative Result Analysis of Diabetes Disease Detection
Accuracy Measurement
Existing Work [14] 90.43%
Proposed Work 99.35%
Figure 3: Comparative Chart of Diabetes Disease Detection
Below Table 4, 5 and 6 shows the parameter values for different diseases such as diabetes disease,
kidney disease as well as liver disease. Similarly figure 4 and 5 shows the corresponding
parametric chart of different disease detection using proposed algorithm.
Table 4: Result Analysis of Kidney Disease Detection
Kidney Disease Detection
Recall Precision Accuracy F_measure
1 0.9875 0.9937 0.9937
Table 5: Result Analysis of Liver Disease Detection
Liver Disease Detection
Recall Precision Accuracy F_measure
0.9667 1 0.9914 0.9831
9. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
19
Figure 4: Parameter Comparison Chart of Different Disease Detection
Figure 5: Time Comparison Chart of Different Disease Detection
5. CONCLUSION
This research paper is mainly focused to predict disease possibility using data mining or machine
learning approach in order to enhance the accuracy or precision of the disease detection expert
system. This paper also shows the related work study of different approaches such as neural
network, naïve bayes, SVM, KNN, FCN, etc and it is concluded that SVM gives the best
performance as compared to the other existing techniques. As a result of study the proposed
algorithm is designed using SVM and RF algorithm and the experimental result shows the
accuracy of 99.35%, 99.37 and 99.14 on diabetes, kidney and liver disease respectively. In future
using data mining approach a new optimized intelligent system can be designed which can give
accurate and efficient result.
10. International Journal of Computer Science & Information Technology (IJCSIT) Vol 10, No 1, February 2018
20
REFERENCES
[1] Nidhi Bhatla, Kiran Jyoti, “An Analysis of Heart Disease Prediction using Different Data Mining
Techniques”,IJERT,Vol 1, Issue 8, 2012.
[2] Syed Umar Amin, Kavita Agarwal, Rizwan Beg, “Genetic Neural Network based Data Mining in
Prediction of Heart Disease using Risk Factors”, IEEE, 2013.
[3] A H Chen, S Y Huang, P S Hong, C H Cheng, E J Lin, “HDPS: Heart Disease Prediction System”,
IEEE, 2011.
[4] M. Akhil Jabbar, B. L Deekshatulu, Priti Chandra, “Heart Disease Prediction using Lazy Associative
Classification”, IEEE, 2013.
[5] Chaitrali S. Dangare, Sulabha S. Apte, “Improved Study of Heart Disease Prediction System using
Data Mining Classification Techniques”, IJCA, Volume 47– No.10, June 2012.
[6] P. Bhandari, S. Yadav, S. Mote, D.Rankhambe, “Predictive System for Medical Diagnosis with
Expertise Analysis”, IJESC, Vol. 6, pp. 4652-4656, 2016.
[7] Nishara Banu, Gomathy, “Disease Forecasting System using Data Mining Methods”, IEEE Transaction
on Intelligent Computing Applications, 2014.
[8] A. Iyer, S. Jeyalatha and R. Sumbaly, “Diagnosis of Diabetes using Classification Mining Techniques”,
IJDKP, Vol. 5, pp. 1-14, 2015.
[9] Sadiyah Noor Novita Alfisahrin and Teddy Mantoro, “Data Mining Techniques for Optimatization of
Liver Disease Classification”, International Conference on Advanced Computer Science Applications
and Technologies, IEEE, pp. 379-384, 2013.
[10] A. Naik and L. Samant, “Correlation Review of Classification Algorithm using Data Mining Tool:
WEKA, Rapidminer , Tanagra ,Orange and Knime”, ELSEVIER, Vol. 85, pp. 662-668, 2016.
[11] Uma Ojha and Savita Goel, “A study on prediction of breast cancer recurrence using data mining
techniques”, International Conference on Cloud Computing, Data Science & Engineering, IEEE, 2017.
[12] Naganna Chetty, Kunwar Singh Vaisla, Nagamma Patil, “An Improved Method for Disease Prediction
using Fuzzy Approach”, International Conference on Advances in Computing and Communication
Engineering, IEEE, pp. 568-572, 2015.
[13] Kumari Deepika and Dr. S. Seema, “Predictive Analytics to Prevent and Control Chronic Diseases”,
International Conference on Applied and Theoretical Computing and Communication Technology,
IEEE, pp. 381-386, 2016.
[14] Emrana Kabir Hashi, Md. Shahid Uz Zaman and Md. Rokibul Hasan, “An Expert Clinical Decision
Support System to Predict Disease Using Classification Techniques”, IEEE, 2017.