Diabetes mellitus is a fatal disease whose prevalence is growing all over the world. It leads to complications that include heart disease, stroke, nerve disease and kidney damage, so medical professionals want a reliable prediction system for diagnosing it. To predict diabetes at an earlier stage, machine learning techniques are useful for examining data from different sources and summarizing the valuable knowledge it contains. Mining diabetes data efficiently is therefore a crucial concern. In this project, a medical dataset is used to predict diabetes. RStudio and PySpark were employed as statistical computing tools for diagnosing diabetes. The PIMA Indian database, acquired from the UCI repository, is used for the analysis. The dataset was studied and analyzed to build an effective model that predicts and diagnoses diabetes earlier.
A Survey on Heart Disease Prediction Techniques (ijtsrd)
Heart disease has been the main cause of a huge number of deaths worldwide over the last few decades and has become the most life-threatening disease. The health care industry is rich in information, so there is a need to discover the hidden patterns and trends in it. For this purpose, data mining techniques can be applied to extract knowledge from large data sets. In recent times, many researchers have used machine learning techniques to predict heart-related diseases because these techniques can predict the disease effectively. Even though machine learning has proved effective in assisting decision makers, there is still scope for developing an accurate and efficient system that diagnoses and predicts heart disease and thereby eases doctors' work. This paper presents a survey of various techniques used for predicting heart disease and reviews their performance. G. Niranjana | Dr I. Elizabeth Shanthi, "A Survey on Heart Disease Prediction Techniques", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-2, February 2021, URL: https://www.ijtsrd.com/papers/ijtsrd38349.pdf Paper URL: https://www.ijtsrd.com/computer-science/other/38349/a-survey-on-heart-disease-prediction-techniques/g-niranjana
Smart Health Prediction Using Data Mining. Data mining is a powerful new technology that is of high interest in the computing world. It is a subfield of computer science that takes existing data in different databases and transforms it into new research and results. It makes use of artificial intelligence, machine learning and database management to extract new patterns from large data sets, along with the knowledge associated with these patterns. The actual task is to extract data by automatic or semi-automatic means. Data mining includes clustering, forecasting, path analysis and predictive analysis.
Propose a Enhanced Framework for Prediction of Heart Disease (IJERA Editor)
Diagnosing heart disease is a complex task that requires considerable experience. Traditionally, doctors predict heart disease by prescribing medical tests such as heart MRI, ECG and stress tests. Today, the health care industry holds a huge amount of health care data that contains hidden information, and effective decisions can be made by means of this hidden information. Advanced data mining techniques applied to computer-based information are used to obtain appropriate results. Artificial neural networks (ANNs) are a newer mathematical technique that can be used for inference and categorisation in any empirical science, and also for modelling real neural networks. The theory of ANNs can help in understanding the nature of mental phenomena such as acting, wanting, knowing, remembering, perceiving, thinking and inferring. Because ANNs are powerful instruments for inference and classification, problems of probability and induction arise when they are used. In this paper, the Naive Bayes classification algorithm and artificial neural networks are used to classify the attributes in the given data set. Attribute filtering techniques, namely PCA (Principal Component Analysis) filtering and Information Gain Attribute Subset Evaluation, are used for feature selection in the given data set to predict heart disease symptoms. A new framework based on these techniques is proposed. The framework feeds the input dataset into the feature selection block, which chooses whichever technique yields the fewest attributes. Classification is then performed with the two algorithms, and the attributes selected in both classification tasks are used to predict heart disease.
The framework is intended to save time in predicting the symptoms of heart disease and lets the user identify the important attributes.
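The information gain criterion mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names and the toy records are hypothetical, and the attributes are assumed to be categorical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on one categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records, invented for illustration only.
labels = ["sick", "sick", "healthy", "healthy"]
attributes = {
    "chest_pain": ["yes", "yes", "no", "no"],  # separates the classes perfectly
    "sex": ["m", "f", "m", "f"],               # carries no information here
}

# Rank attributes from most to least informative.
ranking = sorted(attributes,
                 key=lambda name: information_gain(attributes[name], labels),
                 reverse=True)
print(ranking)
```

An attribute-subset evaluator would keep only the top-ranked attributes before handing the data to the classifiers.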
Psdot 14 using data mining techniques in heart (ZTech Proje)
Final year IEEE projects, embedded systems projects, engineering projects, MCA projects, robotics projects, ARM PIC based projects, micro controller projects. Z Technologies, Chennai
Heart Disease Prediction Using Data Mining Techniques (IJRES Journal)
There are huge amounts of data in the medical industry that are not processed properly and hence cannot be used effectively in making decisions. Data mining techniques can be used to mine the patterns and relationships in this data. This research has developed a prototype heart disease prediction system using data mining techniques, namely neural networks, K-means clustering and frequent item set generation. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of a patient getting heart disease. It enables significant knowledge, e.g. patterns and relationships between medical factors related to heart disease, to be established. The performance of these techniques is compared through sensitivity, specificity and accuracy. Artificial neural networks were observed to outperform K-means clustering on all three measures.
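For reference, the three measures used in that comparison can be computed from predicted and actual labels as in the sketch below. The function names and the sample labels are illustrative assumptions, not taken from the paper.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity_accuracy(actual, predicted, positive=1):
    """Return (sensitivity, specificity, accuracy) as fractions in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # share of sick patients detected
    specificity = tn / (tn + fp) if tn + fp else 0.0   # share of healthy patients cleared
    accuracy = (tp + tn) / len(actual) if actual else 0.0
    return sensitivity, specificity, accuracy


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(sensitivity_specificity_accuracy(actual, predicted))
```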
prediction of heart disease using machine learning algorithms (INFOGAIN PUBLICATION)
The successful use of data mining in highly visible fields like marketing, e-business and retail has led to its application in other sectors and industries, and healthcare is among them. There is an abundance of data available within healthcare systems. However, there is a scarcity of useful analysis tools for finding hidden relationships in the data. This research provides a detailed description of the Naïve Bayes and decision tree classifiers applied in our research, particularly in the prediction of heart disease. Experiments were conducted to compare the performance of predictive data mining techniques on the same dataset, and the results reveal that the decision tree outperforms Bayesian classification.
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS (ijcsit)
The healthcare industry holds large and complex data that can be used, with the help of different machine learning techniques, to discover interesting disease patterns and make effective decisions. Advanced data mining techniques are used to discover knowledge in databases and for medical research. This paper analyzes prediction systems for diabetes, kidney and liver disease using a larger number of input attributes. The data mining classification techniques Support Vector Machine (SVM) and Random Forest (RF) are analyzed on diabetes, kidney and liver disease databases. The performance of these techniques is compared on precision, recall, accuracy and F-measure as well as time. As a result of the study, the proposed algorithm is designed using the SVM and RF algorithms, and the experimental results show accuracies of 99.35%, 99.37% and 99.14% on diabetes, kidney and liver disease respectively.
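The precision, recall and F-measure named above follow directly from confusion-matrix counts. The sketch below shows the standard definitions; the function name and the example counts are assumptions for illustration and do not reproduce the paper's results.

```python
def precision_recall_f_measure(tp, fp, fn):
    """Standard definitions from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 8 correct positive predictions, 2 false alarms, 4 missed cases.
precision, recall, f_measure = precision_recall_f_measure(tp=8, fp=2, fn=4)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```

Reporting all three alongside accuracy matters on medical data, where the classes are often imbalanced and accuracy alone can look high while many cases are missed.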
An efficient feature selection algorithm for health care data analysis (journalBEEI)
Diabetes is a silent killer that will slowly kill a person if it goes undetected. Existing systems, which use the F-score method and K-means clustering to check whether a person has diabetes, are not 100% accurate, and anything short of 100% is not acceptable in the medical field, as it could cost the lives of many people. Our proposed system aims to combine some of the best features of the existing algorithms for predicting diabetes into a novel algorithm intended to be 100% accurate in its prediction. With the surge in technological advancements, data mining can be used to predict when a person would be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This research work uses the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification methods, we consider different factors such as age, BMI and blood pressure, together with the overall importance given to these attributes, single these attributes out, and use them for the prediction of diabetes.
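As a rough illustration of chi-square feature scoring, the sketch below computes the chi-square statistic between one categorical feature and the class label. It is a generic textbook calculation, not the paper's algorithm; the function name and the binned sample values are hypothetical.

```python
from collections import Counter


def chi_square(feature, labels):
    """Chi-square statistic of independence between a categorical feature and the class."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    class_totals = Counter(labels)
    statistic = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in class_totals.items():
            expected = f_count * c_count / n   # count expected if feature and class were independent
            statistic += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return statistic


# Hypothetical binned BMI values and diabetes labels (1 = diabetic).
bmi = ["high", "high", "high", "high", "low", "low", "low", "low"]
diabetic = [1, 1, 1, 0, 0, 0, 0, 1]
print(chi_square(bmi, diabetic))
```

A larger statistic suggests a stronger association with the class, so features can be ranked by it and the weakest ones dropped.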
Prediction of Heart Disease using Machine Learning Algorithms: A Survey (rahulmonikasharma)
According to a recent WHO survey, 17.5 million people die each year, and the figure will increase to 75 million by the year 2030 [1]. Medical professionals working in the field of heart disease have their own limitations: they can predict the chance of a heart attack with up to 67% accuracy [2]. In the current epidemic scenario, doctors need a support system for more accurate prediction of heart disease. Machine learning and deep learning algorithms open up new opportunities for precise prediction of heart attacks. This paper provides a lot of information about state-of-the-art methods in machine learning and deep learning. An analytical comparison is provided to help new researchers working in this field.
Framework for efficient transformation for complex medical data for improving... (IJECEIAES)
Various technological advancements have already been adopted in the healthcare sector. This adoption facilitates involuntary generation of medical data that can be programmed to be forwarded autonomously to a destination hub in the form of cloud storage units. However, such technologies produce massive volumes of complex medical data, which add significant overhead to analytical operations and cause unwanted storage utilization. Therefore, the proposed system implements a novel transformation technique that uses a template-based structure over the cloud to generate structured data from highly unstructured data in a non-conventional manner. The contribution of the proposed methodology is that it offers faster processing and storage optimization. The study outcome supports this, showing that the proposed scheme performs better than existing data transformation schemes.
Comparative Study of Diabetic Patient Data’s Using Classification Alg... (Editor IJCATR)
Data mining refers to extracting knowledge from large amounts of data. Real-life data mining approaches are interesting because they often present a different set of problems for diabetic patients' data. Classification is one of the main problems in this research area. The research gives an algorithmic discussion of J48, J48 Graft, Random Tree, REP and LAD, and compares them on computing time, correctly classified instances, kappa statistics, MAE, RMSE, RAE and RRSE to measure the error rate of the different classifiers in Weka. In this paper, the diabetic patients' data set used for classification was developed by collecting data from a hospital repository and consists of 1865 instances with different attributes. The instances in the dataset fall into two categories, blood tests and urine tests. The Weka tool is used to classify the data, the classifiers are evaluated using 10-fold cross-validation, and the results are compared. Comparing the performance of the algorithms, we found that J48 is the better algorithm in most cases.
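The evaluation protocol described here can be sketched in plain Python. This is not Weka's implementation: the interleaved split below does no shuffling or stratification, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold i tests on rows i, i+k, i+2k, ..."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


def cohen_kappa(actual, predicted):
    """Agreement between predictions and truth, corrected for chance agreement."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    if expected == 1.0:          # only one class present on both sides
        return 1.0
    return (observed - expected) / (1 - expected)


def mae(actual, predicted):
    """Mean absolute error between numeric targets and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between numeric targets and predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# 1865 instances (the data set size quoted above) split into 10 folds.
fold_sizes = [len(test) for _, test in k_fold_indices(1865, 10)]
print(fold_sizes)
```

In a full run, each classifier would be trained on every `train` split and scored on the matching `test` split, with the metrics averaged over the ten folds.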
An efficient stacking based NSGA-II approach for predicting type 2 diabetes (IJECEIAES)
Diabetes has been acknowledged as a well-known risk factor for renal and cardiovascular disorders and cardiac stroke, and it leads to a lot of morbidity in society. Reducing the prevalence of the disease will provide substantial benefits to the community and lessen the burden on the public health care system. So far, innumerable data mining approaches have been used to detect the disease. These days, incorporating machine learning is conducive to constructing a faster, more accurate and more reliable model. Researchers are using several methods based on ensemble classifiers for the prediction of diabetes. The proposed framework for predicting diabetes mellitus employs a stacking-based ensemble built with the non-dominated sorting genetic algorithm (NSGA-II). The primary objective of the work is to develop a more accurate prediction model that reduces the lead time, i.e., the time between the onset of diabetes and clinical diagnosis. The proposed NSGA-II stacking approach has been compared with Boosting, Bagging, Random Forest and the Random Subspace method, and the stacking approach outperformed these conventional ensemble methods. It was also noted that k-nearest neighbors (KNN) performs better than a decision tree as the stacking combiner.
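To make the idea of a stacking combiner concrete, here is a minimal sketch in which a KNN meta-learner is trained on the outputs of base models. It leaves out NSGA-II entirely, the base models are hypothetical fixed threshold rules standing in for trained classifiers, and the patient records are invented.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows nearest to x (squared Euclidean distance)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def stack_features(base_models, row):
    """Level-1 features: the prediction of every base model for one input row."""
    return [model(row) for model in base_models]


# Hypothetical base learners (fixed rules, so no out-of-fold predictions are needed here).
base_models = [
    lambda r: int(r["glucose"] > 125),
    lambda r: int(r["bmi"] > 30),
    lambda r: int(r["age"] > 45),
]

# Invented training records with labels (1 = diabetic).
train_rows = [
    {"glucose": 150, "bmi": 33, "age": 50},
    {"glucose": 140, "bmi": 31, "age": 30},
    {"glucose": 130, "bmi": 25, "age": 60},
    {"glucose": 100, "bmi": 32, "age": 50},
    {"glucose": 90, "bmi": 22, "age": 25},
    {"glucose": 95, "bmi": 24, "age": 55},
]
train_labels = [1, 1, 1, 0, 0, 0]

# The combiner learns from base-model outputs, not from the raw attributes.
meta_X = [stack_features(base_models, row) for row in train_rows]


def stacked_predict(row):
    return knn_predict(meta_X, train_labels, stack_features(base_models, row))


print(stacked_predict({"glucose": 160, "bmi": 35, "age": 28}))
```

With trained base learners, the level-1 features should come from out-of-fold predictions so that the combiner does not see outputs produced on the base learners' own training data.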
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU... (IAEME Publication)
Machine learning algorithms play a vital role in the prediction of many diseases such as heart disease, diabetes, cancer and lung disease. Applying machine learning algorithms to the healthcare domain relieves the burden on physicians, as it is impractical to scan manually all the data collected over a period of time in order to arrive at valuable information. Machine learning algorithms learn from the training dataset and become capable of thinking like a human. Once an algorithm completes its learning on the training dataset, it can automatically predict the target output label of any unseen data. This work takes up the prediction of diabetes using machine learning algorithms. A conceptual architecture based on big data architecture is proposed.
Standardization and wider use of electronic health records (EHRs) create opportunities for better understanding patterns of illness and care within and across medical systems. In healthcare systems, hidden event signatures support decisions about a patient's diagnosis, prognosis and management. The temporal history of event codes embedded in patients' records can be used to investigate frequently occurring sequences of event codes across patients. A framework exists that enables the representation, retrieval and mining of high-order latent event structure and relationships within single and multiple event sequences. There is a wealth of hidden information in large databases, and different data mining techniques can be used to retrieve it. This paper presents a classifier approach for the detection of diabetes and shows how Naive Bayes can be used for classification. In this system, medical data is grouped into five categories, namely low, average, high, very high and critical, and treatment is given according to the predicted category. The system predicts the class label of an unknown sample, so two basic functions are performed: classification (training) and prediction (testing). The algorithm and the database used affect the accuracy of the system. It can answer complex queries for diagnosing diabetes and thus assist healthcare practitioners in making intelligent clinical decisions, which traditional decision support systems cannot. Over the last decade, many information visualization techniques have been developed to support the exploration of large data sets, and various interactive visual data mining tools are available for visual data analysis. It is possible to perform clinical assessment for visual interactive knowledge discovery in large electronic health record databases. In this paper, we propose that it is possible to develop a data visualization tool for interactive knowledge discovery.
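The training and prediction functions of a Naive Bayes classifier over categorical levels can be sketched as follows. This is a generic version with Laplace smoothing, not the system described in the abstract; the function names, the two attributes and the records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class value frequencies for each attribute."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value -> count
    vocabulary = defaultdict(set)         # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            vocabulary[j].add(value)
    return class_counts, value_counts, vocabulary


def predict(model, row):
    """Return the class with the highest posterior log-probability for one sample."""
    class_counts, value_counts, vocabulary = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)   # log prior
        for j, value in enumerate(row):
            # Laplace smoothing keeps unseen values from zeroing the product.
            numerator = value_counts[(j, label)][value] + 1
            score += math.log(numerator / (count + len(vocabulary[j])))
        if score > best_score:
            best_class, best_score = label, score
    return best_class


# Invented samples: (glucose level, BMI level) -> risk category.
rows = [("high", "high"), ("high", "average"), ("high", "high"),
        ("low", "average"), ("low", "low"), ("average", "low")]
labels = ["high", "high", "high", "low", "low", "low"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "high")))
```

The same code handles five categories without change, since the classes are taken from whatever labels appear in the training data.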
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se... (IJARIIT)
Two approaches to building models for predicting the onset of Type diabetes mellitus in juvenile subjects were examined. A set of tests performed immediately before diagnosis was used to build classifiers that predict whether the subject would be diagnosed with juvenile diabetes. A modified training set, consisting of differences between test results taken at different times, was also used to build classifiers for the same prediction. Supervised approaches, using decision trees, were compared with unsupervised approaches for both types of classifiers. In this study, the system recommends the test most likely to confirm a diagnosis, based on the pre-test probability computed from the patient's information, including symptoms and the results of previous tests. If the patient's post-test probability of disease is higher than the treatment threshold, a diagnostic decision is made, and vice versa. Otherwise, the patient needs more tests to help make a decision, and the system recommends the next optimal test and repeats the same process. This thesis determines which approach performs better on a diabetes dataset in the Weka framework. It also uses feature selection techniques, which reduce the number of features and the complexity of the process.
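The pre-test and post-test probability reasoning can be illustrated with the standard likelihood-ratio update. The sketch assumes that each test is characterised by a sensitivity and a specificity strictly between 0 and 1; the lower rule-out threshold and all numeric values are hypothetical and do not come from the thesis.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive_result):
    """Update a disease probability after one test result using likelihood ratios."""
    if positive_result:
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        likelihood_ratio = (1 - sensitivity) / specificity
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def decide(probability, treatment_threshold=0.5, rule_out_threshold=0.05):
    """Diagnose above the treatment threshold, rule out below the lower one, else keep testing."""
    if probability >= treatment_threshold:
        return "diagnose"
    if probability <= rule_out_threshold:
        return "rule out"
    return "test further"


# Hypothetical patient: 20% pre-test probability, test with 90% sensitivity, 80% specificity.
after_positive = post_test_probability(0.2, 0.9, 0.8, positive_result=True)
print(round(after_positive, 3), decide(after_positive))
```

When the result is "test further", the post-test probability becomes the pre-test probability for the next recommended test, which matches the repeat-until-decided loop in the description.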
HEALTH PREDICTION ANALYSIS USING DATA MININGAshish Salve
As we know that health care industry is completely based on assumptions, which after get tested and verified via various tests and patient have to be depend on the doctors knowledge on that topic . so we made a system that uses data mining techniques to predict the health of a person based on various medical test results. so we can predict the health of that person based on that analysis performed by the system.The system currently design only for heart issues, for that we had used Statlog (Heart) Data Set from UCI Machine Learning Repository it includes attributes like age, sex, chest pain type, cholesterol, sugar, outcomes,etc.for training the system. we only need to passed few general inputs in order to generate the prediction and the prediction results from all algorithms are they merged together by calculating there mean value that value shows the actual outcome of the prediction process which entirely works in background
MULTI MODEL DATA MINING APPROACH FOR HEART FAILURE PREDICTIONIJDKP
Developing predictive modelling solutions for risk estimation is extremely challenging in health-care
informatics. Risk estimation involves integration of heterogeneous clinical sources having different
representation from different health-care provider making the task increasingly complex. Such sources are
typically voluminous, diverse, and significantly change over the time. Therefore, distributed and parallel
computing tools collectively termed big data tools are in need which can synthesize and assist the physician
to make right clinical decisions. In this work we propose multi-model predictive architecture, a novel
approach for combining the predictive ability of multiple models for better prediction accuracy. We
demonstrate the effectiveness and efficiency of the proposed work on data from Framingham Heart study.
Results show that the proposed multi-model predictive architecture is able to provide better accuracy than
best model approach. By modelling the error of predictive models we are able to choose sub set of models
which yields accurate results. More information was modelled into system by multi-level mining which has
resulted in enhanced predictive accuracy.
The Healthcare industry contains big and complex data that may be required in order to discover fascinating pattern of diseases & makes effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in database and for medical research. This paper has analyzed prediction systems for Diabetes, Kidney and Liver disease using more number of input attributes. The data mining classification techniques, namely Support Vector Machine(SVM) and Random Forest (RF) are analyzed on Diabetes, Kidney and Liver disease database. The performance of these techniques is compared, based on precision, recall, accuracy, f_measure as well as time. As a result of study the proposed algorithm is designed using SVM and RF algorithm and the experimental result shows the accuracy of 99.35%, 99.37 and 99.14 on diabetes, kidney and liver disease respectively.
Disease prediction in big data healthcare using extended convolutional neural network techniques
International Journal of Advances in Applied Sciences (IJAAS)
Vol. 9, No. 2, June 2020, pp. 85~92
ISSN: 2252-8814, DOI: 10.11591/ijaas.v9.i2.pp85-92
Journal homepage: http://ijaas.iaescore.com
Asadi Srinivasulu, Asadi Pushpa
Data Analytics Research Laboratory, Sree Vidyanikethan Engineering College, India
Article Info
Article history:
Received May 3, 2019
Revised Feb 2, 2020
Accepted Mar 14, 2020
ABSTRACT
Diabetes mellitus is a growing and potentially fatal disease worldwide. It leads to complications that include heart disease, stroke, nerve disease, and kidney damage, so medical professionals want a reliable prediction system to diagnose diabetes. To predict diabetes at an earlier stage, different machine learning techniques are useful for examining data from different sources and summarizing valuable knowledge. Mining diabetes data efficiently is therefore a crucial concern. In this project, a medical dataset was used to predict diabetes. R Studio and PySpark were employed as statistical computing tools for diagnosing diabetes. The PIMA Indian database, acquired from the UCI repository, was used for analysis. The dataset was studied and analyzed to build an effective model that predicts and diagnoses diabetes earlier.
Keywords: CNN, Diabetes, Neural networks, RNN, SVM
This is an open access article under the CC BY-SA license.
Corresponding Author:
Asadi Srinivasulu,
Data Analytics Research Laboratory,
Sree Vidyanikethan Engineering College,
Sree Sainath Nagar, Tirupati, Andhra Pradesh 517102, India.
Email: srinu.asadi@gmail.com
1. INTRODUCTION
Advances in technology allow computers to produce huge amounts of data. Likewise, advances and innovations in medical database management systems generate large volumes of medical data. The healthcare industry holds very large and sensitive datasets, which must be handled carefully to be of benefit. Diabetes mellitus is a group of related diseases in which the human body is unable to control the quantity of sugar in the blood. The result is a high blood sugar level, either because the body does not produce sufficient insulin or because cells do not respond to the insulin that is produced. The focus of this work is to develop prediction models using selected machine learning algorithms. Machine learning (ML) is an application of artificial intelligence that helps the computer learn on its own. ML is divided into two classes, supervised and unsupervised. A supervised learning algorithm uses past experience to make predictions on new or unseen data, whereas unsupervised algorithms draw inferences from unlabelled datasets. The machine learning algorithms considered are:
Supervised learning techniques:
Classification
Classification is the procedure of predicting the unknown class label of new data from data whose class labels are already known. The following are popular classification algorithms:
Random forest
SVM
K-Nearest neighbors
Decision tree
Naïve Bayes
Regression
Regression is a supervised learning technique that finds the relationship between one or more independent variables and a dependent variable. The popular regression algorithms are:
Simple Linear Regression
Multiple Linear Regression
Logistic Regression
Polynomial Regression
Linear Discriminant Analysis (LDA)
Unsupervised Learning techniques:
Clustering
Clustering is the process of grouping similar objects together. Some of the clustering techniques are:
K-means clustering
Hierarchical clustering
R studio
R Studio is an Integrated Development Environment (IDE) for the R programming language, founded by J. J. Allaire. Its console runs the R interpreter. R Studio is used for statistical computing and graphics, and its many packages allow it to manipulate large datasets for analysis.
2. LITERATURE REVIEW
Many studies have used big data to predict diabetes. Table 1 lists research in the field together with a critique of each paper. Taking the critique and notes on each published study into account, this research proposes a new model that addresses problems in previous work.
Table 1. Review of related research
No. 1
Paper: Predicting Diabetes in Medical Datasets Using Machine Learning Techniques
Author(s): Uswa Ali Zia, Dr. Naeem Khan
Name of the Journal: International Journal of Scientific & Engineering Research (IJSER)
Methods: Bootstrapping resampling technique to enhance the accuracy, then applying (i) Naive Bayes, (ii) Decision Trees, (iii) k-Nearest Neighbors (k-NN)
Findings: Accuracy after bootstrapping: (i) Naive Bayes 74.89%, (ii) Decision Trees 94.44%, (iii) k-NN (for k=1) 93.79%, (iv) k-NN (for k=3) 76.79%
Notes/Critique: (i) Plan to use further more advanced classifiers such as Neural Networks. (ii) It should consider some other important factors that are related to gestational diabetes, like metabolic syndrome, family history, habit of smoking, lazy routines, some dietary patterns etc.

No. 2
Paper: Prediction of Diabetes Using Data Mining Techniques
Author(s): Fikirte Girma Woldemichael, Sumitra Menaria
Name of the Journal: International Conference on Trends in Electronics and Informatics (ICOEI)
Methods: (i) Back Propagation Algorithm, (ii) J48 Algorithm, (iii) Naïve Bayes Classifier, (iv) Support Vector Machine
Findings: Back Propagation Algorithm has Accuracy 83.11%, Sensitivity 86.53%, Specificity 76%
Notes/Critique: (i) Increment the accuracy of the algorithms.

No. 3
Paper: Diabetes Disease Prediction Using Data Mining
Author(s): Deeraj Shetty, Kishor Rit, Sohail Shaikh, Nikita Patils
Name of the Journal: International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS)
Methods: (i) Naïve Bayes, (ii) k-NN algorithms
Findings: Prediction of the disease will be done with the help of Bayesian algorithm and KNN algorithm and analyze them by taking various attributes of diabetes.
Notes/Critique: (i) Increment the accuracy of the algorithms. (ii) So working on some more attributes which is used to tackle the diabetes even more.

No. 4
Paper: Classification of Diabetic Patients by using Efficient Prediction from Big Data using R Studio
Author(s): K. Sharmila, Dr. S.A. Vetha Manickam
Name of the Journal: International Journal of Advanced Engineering Research and Science (IJAERS)
Methods: Decision tree
Findings: (i) Using R, the dataset is analyzed and the correlation coefficient for two attributes is calculated. (ii) Decision Tree is used to predict the type of Diabetes.
Notes/Critique: Possibility of developing efficient predictive models using the information from the analysis which is already carried out.

No. 5
Paper: Diagnosis of diabetes using Classification mining Techniques
Author(s): Aiswarya Iyer, S. Jeyalatha and Ronak Sumbaly
Name of the Journal: International Journal of Data Mining & Knowledge Management Process (IJDKP)
Methods: Decision tree, Naïve Bayes
Findings: J48 Cross validation 74.8698%, J48 Percentage Split 76.9565%, Naive Bayes 79.5652%
Notes/Critique: (i) In future the work, planned to be gathering the information from different locales over the world. (ii) This work can be improved and extended for the automation of diabetes analysis.

No. 6
Paper: An Disease Diagnosis using Data Mining Techniques And Empirical study
Author(s): M. Deepika, Dr. K. Kalaiselvi
Name of the Journal: The 2nd International Conference on Inventive Communication and Computational Technologies (ICICCT)
Methods: (i) Artificial Neural Network, (ii) Decision Tree, (iii) Logistic Regression, (iv) Naïve Bayes, (v) SVM
Findings: Artificial Neural Network 73.23%, Logistic Regression 76.13%, Decision Tree 77.87%
Notes/Critique: Efficient and accurate classifier can be developed.
3. PROPOSED SYSTEM
We propose a classification model with boosted accuracy to predict diabetic patients. The model employs different machine learning techniques, namely classification, regression and clustering. The major focus is to increase accuracy by applying a resampling technique to the well-known benchmark PIMA diabetes dataset, acquired from the UCI machine learning repository, which has eight attributes and one class label. The proposed framework is shown in Figure 1. Each phase is described below.
3.1. Data selection
Data selection is the process of selecting the most relevant data from a specific domain to derive values that are informative and facilitate learning. The PIMA diabetes dataset has 8 attributes that are used to predict diabetes at an earlier stage. This dataset was obtained from the UCI repository.
3.2. Data pre-processing
Data pre-processing is a machine learning step that transforms raw data into an understandable format. It includes data cleaning, data integration, data transformation, and data discretization.
3.3. Feature extraction through principal component analysis
Feature extraction is applied to the dataset to determine the most suitable set of attributes for achieving better classification. The set of attributes suggested by principal component analysis (PCA) is termed the feature vector. Feature reduction, or dimensionality reduction, reduces computation and space complexity.
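As an illustration of the idea only, the following Python sketch extracts the first principal component of a small table of records. The paper does not describe its PCA implementation; the power-iteration approach, the function name `first_principal_component` and the sample values are assumptions made for this example.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, projected scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    # Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centred record onto that direction.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores

# Hypothetical two-attribute records (not taken from the PIMA dataset).
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
direction, scores = first_principal_component(data)
```

Here the two attributes are reduced to one score per record, which is the kind of dimensionality reduction the section refers to.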
3.4. Resampling Filter
The supervised Resample filter is applied to the pre-processed dataset. Resampling is a family of methods used to reconstruct sample datasets, including training sets and validation sets. In this study, the bootstrap resampling technique is used to enhance accuracy.
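A generic bootstrap draw can be sketched as follows. This is not the Resample filter used in the study, whose settings are not given; the function name `bootstrap_sample` and the placeholder records are assumptions for illustration.

```python
import random

def bootstrap_sample(records, rng):
    """Draw len(records) items with replacement; also return the items never drawn."""
    n = len(records)
    picked = [rng.randrange(n) for _ in range(n)]
    sample = [records[i] for i in picked]
    chosen = set(picked)
    out_of_bag = [records[i] for i in range(n) if i not in chosen]
    return sample, out_of_bag

rng = random.Random(42)      # fixed seed so the sketch is reproducible
records = list(range(10))    # hypothetical stand-ins for patient records
sample, oob = bootstrap_sample(records, rng)
```

The sample has the same size as the original data but may repeat records, while the out-of-bag records can serve as a validation set.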
Figure 1. Proposed system for diabetes prediction system
4. MACHINE LEARNING TECHNIQUES
4.1. Classification
4.1.1. Random forest
Random forest is an ensemble learning technique for classification and regression. It operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes predicted by the individual trees (classification) or the mean of their predictions (regression). Random decision forests correct for the tendency of decision trees to overfit their training set.
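The aggregation step alone can be sketched in a few lines of Python. Training the individual trees is omitted, and the function names and the per-tree outputs below are hypothetical values chosen for illustration.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_votes):
    """Return the mode of the class labels predicted by the individual trees."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_outputs):
    """Return the mean of the numeric predictions of the individual trees."""
    return mean(tree_outputs)

# Hypothetical outputs of five trees for one patient (1 = diabetic, 0 = not).
votes = [1, 0, 1, 1, 0]
outputs = [0.62, 0.48, 0.71, 0.66, 0.53]
forest_class = aggregate_classification(votes)
forest_value = aggregate_regression(outputs)
```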
4.1.2. Support vector machine (SVM)
SVM is a family of supervised learning algorithms used to perform classification, regression and outlier detection. An SVM groups the data according to a hyperplane that separates the two classes as cleanly as possible, and the maximum-margin hyperplane should be chosen as the best separator. Two types of SVM classifier are used: the linear classifier and the non-linear classifier.
4.1.3. Decision tree
The decision tree algorithm builds a classification or regression model from training data in the form of a tree structure. It uses previous data to classify or predict the class or target variable of new data by means of decision rules. Decision trees can be used for both numerical and categorical data. In a complete decision tree, the root node at each level is the starting point, that is, the best splitting attribute at that position, and it tests on that attribute. The outcomes of the test create branches. A leaf node acts as the final class label or target variable used to classify or predict the new data. Classification rules are drawn from root to leaf.
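Choosing the best split on a single numeric attribute can be sketched as below. The paper does not state which splitting criterion it uses; Gini impurity is assumed here, and the glucose readings are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Find the threshold on one numeric attribute that minimises weighted Gini."""
    best = (None, float("inf"))
    # Skip the largest value: splitting there would leave the right branch empty.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical glucose readings and outcomes (1 = diabetic), for illustration only.
glucose = [85, 90, 110, 140, 155, 170]
outcome = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(glucose, outcome)
```

A full tree would apply the same search recursively to each branch until the leaves are pure or a depth limit is reached.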
4.1.4. Naïve bayes
Naïve Bayes is an algorithm that performs classification tasks in ML. It performs well even on datasets with a large number of records, for both multi-class and binary classification problems. Naïve Bayes is mainly applied to text analysis and natural language processing. It works on the basis of conditional probability, as represented in (1).
$$P(M \mid N) = \frac{P(N \mid M)\,P(M)}{P(N)} \tag{1}$$
Here M and N are two events, P(M|N) is the conditional probability of M given N, P(M) is the probability of M, P(N) is the probability of N, and P(N|M) is the conditional probability of N given M.
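A small worked example of equation (1) may help. The probabilities below are hypothetical, and P(N) is expanded with the law of total probability, which the text does not spell out.

```python
def posterior(prior_m, likelihood_n_given_m, likelihood_n_given_not_m):
    """Bayes' theorem: P(M|N) = P(N|M) P(M) / P(N), with P(N) from total probability."""
    evidence = (likelihood_n_given_m * prior_m
                + likelihood_n_given_not_m * (1.0 - prior_m))
    return likelihood_n_given_m * prior_m / evidence

# Hypothetical numbers: M = "patient is diabetic", N = "test is positive".
p = posterior(prior_m=0.10, likelihood_n_given_m=0.90, likelihood_n_given_not_m=0.20)
```

With these inputs the numerator is 0.09 and P(N) is 0.27, so the posterior probability is one third even though the test detects 90% of true cases.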
4.1.5. K-nearest neighbors
k-Nearest Neighbors (k-NN) is a supervised classifier. To predict the target label of a test data point, k-NN computes the distance between the new point and the training data points and assigns the label that is most common among the K nearest ones. K is normally a small positive integer, typically between 1 and 10.
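The prediction step can be sketched as follows, assuming Euclidean distance and a simple majority vote, neither of which the paper specifies. The attribute pairs are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical (glucose, BMI) pairs; 1 = diabetic, 0 = not diabetic.
points = [(85, 22.0), (90, 24.5), (95, 23.0), (150, 33.0), (160, 35.5), (170, 31.0)]
labels = [0, 0, 0, 1, 1, 1]
prediction = knn_predict(points, labels, query=(155, 34.0), k=3)
```

In practice the attributes would be scaled first, since a distance measure is dominated by whichever attribute has the largest numeric range.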
4.2. Regression
4.2.1. Simple linear regression
Simple linear regression explains the relationship between an independent and a dependent variable in order to predict the values of the dependent variable. It uses one independent variable. The simple linear regression model is represented in (2).
$$y = b_0 + b_1 x \tag{2}$$
Here x (the independent variable) and y (the dependent variable) are the two factors involved in simple linear regression analysis, $b_0$ is the y-intercept and $b_1$ is the slope.
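The coefficients in (2) are commonly estimated by ordinary least squares. The sketch below assumes that estimator, which the paper does not name, and uses made-up data points.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates of b0 (intercept) and b1 (slope) for y = b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical points that lie exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_linear_regression(x, y)
```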
4.2.2. Multiple linear regressions
Multiple linear regression explains the relationship between two or more independent variables and a dependent variable in order to predict the values of the dependent variable. The dependent variable takes continuous values, while the independent variables may be discrete or continuous. The multiple linear regression model is represented in (3).
$$y = p_0 + p_1 x_1 + p_2 x_2 + \dots + p_n x_n \tag{3}$$
Here $x_1, x_2, \dots, x_n$ (the independent variables) and y (the dependent variable) are the factors involved in multiple linear regression analysis, $p_0$ is the y-intercept and $p_1, p_2, \dots, p_n$ are the slopes.
4.2.3. Logistic regression
Logistic regression is a predictive analysis used when the dependent variable is categorical. It explains the relationship between one dependent variable and one or more independent variables. The types of logistic regression are:
Multinomial Logistic Regression (many)
Binary Logistic Regression (two)
Ordinal Logistic Regression (ordered)
In binary logistic regression, the categorical response has only two possible outcomes. Multinomial logistic regression has three or more outcomes without ordering, whereas ordinal logistic regression has three or more outcomes with ordering.
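For the binary case, a minimal sketch with one independent variable is given below. Fitting by batch gradient descent on the log-loss is an assumption, as are the learning rate, the epoch count and the sample data; the paper does not describe how its model was fitted.

```python
import math

def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, learning_rate=0.1, epochs=5000):
    """Fit P(y=1|x) = sigmoid(w0 + w1*x) by batch gradient descent on log-loss."""
    w0, w1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        errors = [sigmoid(w0 + w1 * xi) - yi for xi, yi in zip(x, y)]
        w0 -= learning_rate * sum(errors) / n
        w1 -= learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return w0, w1

# Hypothetical standardised attribute values with a binary outcome.
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w0, w1 = fit_logistic(x, y)
p_high = sigmoid(w0 + w1 * 2.0)    # predicted probability for a high value
p_low = sigmoid(w0 + w1 * -2.0)    # predicted probability for a low value
```

A record is assigned to the positive class when its predicted probability exceeds a chosen cut-off, commonly 0.5.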
4.2.4. Polynomial regression
Polynomial regression is a form of regression analysis that models the relationship between the independent variable and the dependent variable as an nth-degree polynomial. It fits a non-linear relationship between the value of the independent variable and the conditional mean of the dependent variable. It is represented in (4).
$$p = a + b\,q^{n} \tag{4}$$
Here p is the dependent variable, q is the independent variable and n is the degree.
It fits the data well when the data points lie both above and below a linear regression line. It minimizes the cost function and provides the optimum result for the regression.
4.2.5. Linear discriminant analysis
Linear discriminant analysis is the process of taking various data items and applying different functions to that set in order to analyze classes of objects or items separately. Image recognition and predictive analytics use linear discriminant analysis.
4.3. Clustering
4.3.1. K-means clustering
K-means is an unsupervised machine learning algorithm that solves clustering problems by partitioning the dataset into k clusters (groups of similar objects), where the number of clusters k is chosen before the dataset is partitioned.
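The assignment and update loop usually associated with k-means (Lloyd's algorithm) can be sketched as follows. The starting centroids, the iteration count and the sample points are assumptions for illustration.

```python
import math

def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups; k = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
```

The number of initial centroids passed in fixes k, which matches the requirement that k be chosen beforehand.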
4.3.2. Hierarchical clustering
Hierarchical clustering is a type of clustering algorithm that builds a hierarchy of clusters. The two types of hierarchical clustering are:
4.3.3. Agglomerative clustering
It groups objects into clusters based on their similarity. The final result is a tree representation of the objects called a dendrogram.
4.3.4. Divisive analysis
This is a top-down approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. A hierarchical clustering is often represented as a dendrogram. Each cluster is represented by its centroid, and the distance between clusters is calculated using a linkage criterion.
5. RESULTS AND ANALYSIS
The Indian diabetes dataset named PIMA was used for analysis in this study. It consists of eight independent attributes and one dependent class attribute. The study was implemented in the R programming language using R Studio. Machine learning algorithms for classification (Decision Tree, Naïve Bayes, k-NN and Random Forest), regression (linear, multiple, logistic, LDA) and clustering (k-means, hierarchical agglomerative) are used to predict diabetes in its early stages, as shown in Table 1. Model performance is measured using accuracy, as shown in Figure 2.
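Accuracy, together with the sensitivity and specificity reported for some of the related work in Table 1, can be computed from predicted and actual labels as sketched below. The labels are invented for illustration and are not results from this study.

```python
def evaluate(actual, predicted):
    """Accuracy, sensitivity and specificity for binary labels (1 = diabetic).

    Assumes both classes occur in `actual`, so neither denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels for ten patients.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluate(actual, predicted)
```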