This document describes a study that uses machine learning techniques to predict heart disease and diabetes from medical data. The study collected data from a public repository and preprocessed it to handle missing values. Feature selection was performed using chi-square and principal component analysis to identify important features. Three boosting classifiers - Adaptive boosting, Gradient boosting, and Extreme Gradient boosting - were trained on the data and evaluated based on accuracy. The results showed that the boosting classifiers achieved accurate prediction for both heart disease and diabetes, with the highest accuracy reported for specific classifiers and diseases.
Predictive machine learning applying cross industry standard process for data...IAESIJAI
Currently, type 2 diabetes mellitus is one of the world's most prevalent diseases and has claimed millions of people's lives. The present research aims to know the impact of the use of machine learning in the diagnostic process of type 2 diabetes mellitus and to offer a tool that facilitates the diagnosis of the dis-ease quickly and easily. Different machine learning models were designed and compared, being random forest was the algorithm that generated the model with the best performance (90.43% accuracy), which was integrated into a web platform, working with the PIMA dataset, which was validated by specialists from the Peruvian League for the Fight against Diabetes organization. The result was a decrease of (A) 88.28% in the information collection time, (B) 99.99% in the diagnosis time, (C) 44.42% in the diagnosis cost, and (D) 100% in the level of difficulty, concluding that the application of machine learning can significantly optimize the diagnostic process of type 2 diabetes mellitus.
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...IJARIIT
Two approaches to building models for prediction of the onset of Type diabetes mellitus in juvenile subjects were examined. A set of tests performed immediately before diagnosis was used to build classifiers to predict whether the subject would be diagnosed with juvenile diabetes. A modified training set consisting of differences between test results taken at different times was also used to build classifiers to predict whether a subject would be diagnosed with juvenile diabetes. Supervised were compared with decision trees and unsupervised of both types of classifiers. In this study, the system and the test most likely to confirm a diagnosis based on the pre-test probability computed from the patient's information including symptoms and the results of previous tests. If the patient's disease post-test probability is higher than the treatment threshold, a diagnostic decision will be made, and vice versa. Otherwise, the patient needs more tests to help make a decision. The system will then recommend the next optimal test and repeat the same process. In this thesis find out which approach is better on diabetes dataset in weka framework. Also use feature selection techniques which reduce the features and complexities of process
A disorder or illness called heart failure results in the heart becoming weak or
damaged. In order to avoid heart failure early on, it is crucial to understand the
causes of heart failure. Based on validation, two experimental processing steps
will be applied to the dataset of clinical records related to heart failure. Testing
will be done in the first step utilizing six different classification algorithms,
including K-nearest neighbor, neural network, random forest, decision tree,
Naïve Bayes, and support vector machine (SVM). Cross-validation was
employed to conduct the test. According to the results, the random forest
algorithm performed better than the other five algorithms in tests employing
the algorithm. Subsequent testing uses an algorithm with the best accuracy
value, which will then be tested again using split validation with varying split
ratios and genetic algorithms as a selection feature. The value generated from
testing using the genetic algorithm selection feature is better than the random
forest algorithm alone, which is recorded to produce an accuracy value of
93.36% in predicting the survival of heart failure patients.
Predictive machine learning applying cross industry standard process for data...IAESIJAI
Currently, type 2 diabetes mellitus is one of the world's most prevalent diseases and has claimed millions of people's lives. The present research aims to know the impact of the use of machine learning in the diagnostic process of type 2 diabetes mellitus and to offer a tool that facilitates the diagnosis of the dis-ease quickly and easily. Different machine learning models were designed and compared, being random forest was the algorithm that generated the model with the best performance (90.43% accuracy), which was integrated into a web platform, working with the PIMA dataset, which was validated by specialists from the Peruvian League for the Fight against Diabetes organization. The result was a decrease of (A) 88.28% in the information collection time, (B) 99.99% in the diagnosis time, (C) 44.42% in the diagnosis cost, and (D) 100% in the level of difficulty, concluding that the application of machine learning can significantly optimize the diagnostic process of type 2 diabetes mellitus.
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se...IJARIIT
Two approaches to building models for prediction of the onset of Type diabetes mellitus in juvenile subjects were examined. A set of tests performed immediately before diagnosis was used to build classifiers to predict whether the subject would be diagnosed with juvenile diabetes. A modified training set consisting of differences between test results taken at different times was also used to build classifiers to predict whether a subject would be diagnosed with juvenile diabetes. Supervised were compared with decision trees and unsupervised of both types of classifiers. In this study, the system and the test most likely to confirm a diagnosis based on the pre-test probability computed from the patient's information including symptoms and the results of previous tests. If the patient's disease post-test probability is higher than the treatment threshold, a diagnostic decision will be made, and vice versa. Otherwise, the patient needs more tests to help make a decision. The system will then recommend the next optimal test and repeat the same process. In this thesis find out which approach is better on diabetes dataset in weka framework. Also use feature selection techniques which reduce the features and complexities of process
A disorder or illness called heart failure results in the heart becoming weak or
damaged. In order to avoid heart failure early on, it is crucial to understand the
causes of heart failure. Based on validation, two experimental processing steps
will be applied to the dataset of clinical records related to heart failure. Testing
will be done in the first step utilizing six different classification algorithms,
including K-nearest neighbor, neural network, random forest, decision tree,
Naïve Bayes, and support vector machine (SVM). Cross-validation was
employed to conduct the test. According to the results, the random forest
algorithm performed better than the other five algorithms in tests employing
the algorithm. Subsequent testing uses an algorithm with the best accuracy
value, which will then be tested again using split validation with varying split
ratios and genetic algorithms as a selection feature. The value generated from
testing using the genetic algorithm selection feature is better than the random
forest algorithm alone, which is recorded to produce an accuracy value of
93.36% in predicting the survival of heart failure patients.
PERFORMANCE OF DATA MINING TECHNIQUES TO PREDICT IN HEALTHCARE CASE STUDY: CH...ijdms
With the promises of predictive analytics in big data, and the use of machine learning algorithms,
predicting future is no longer a difficult task, especially for health sector, that has witnessed a great
evolution following the development of new computer technologies that gave birth to multiple fields of
research. Many efforts are done to cope with medical data explosion on one hand, and to obtain useful
knowledge from it, predict diseases and anticipate the cure on the other hand. This prompted researchers
to apply all the technical innovations like big data analytics, predictive analytics, machine learning and
learning algorithms in order to extract useful knowledge and help in making decisions. In this paper, we
will present an overview on the evolution of big data in healthcare system, and we will apply three learning
algorithms on a set of medical data. The objective of this research work is to predict kidney disease by
using multiple machine learning algorithms that are Support Vector Machine (SVM), Decision Tree (C4.5),
and Bayesian Network (BN), and chose the most efficient one.
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNINGIJDKP
Heart disease is most common disease reported currently in the United States among both the genders and
according to official statistics about fifty percent of the American population is suffering from some form of
cardiovascular disease. This paper performs chi square tests and linear regression analysis to predict
heart disease based on the symptoms like chest pain and dizziness. This paper will help healthcare sectors
to provide better assistance for patients suffering from heart disease by predicting it in beginning stage of
disease. Chi square test is conducted to identify whether there is a relation between chest pain and heart
disease cases in the United States by analyzing heart disease dataset from IEEE Data Port. The test results
and analysis show that males in the United States are most likely to develop heart disease with the
symptoms like chest pain, dizziness, shortness of breath, fatigue, and nausea. This test also shows that
there is a week corelation of 0.5 is identified which shows people with all ages including teens can face
heart diseases and its prevalence increase with age. Also, the tests indicate that 90 percent of the
participant who are facing severe chest pain is suffering from heart disease where majority of the
successful heart disease identified is in males and only 10 percent participants are identified as healthy.
The evaluated p-values are much greater than the statistical threshold of 0.05 which concludes factors like
sex, Exercise angina, Cholesterol, old peak, ST_Slope, obesity, and blood sugar play significant role in
onset of cardiovascular disease. We have tested the dataset with prediction model built on logistic
regression and observed an accuracy of 85.12 percent.
Prediction of the risk of developing heart disease using logistic regressionIJECEIAES
Heart disease (HD) accounts for more deaths every year than other illnesses. World Health Organization (WHO) assessed 17.9 million life losses caused by heart disease in 2016, demonstrating 31% of all international life losses. Three-quarters of these life losses occur in low and middle-income nations. Machine learning (ML), due to advanced precision in pattern recognition and classification, demonstrates to be in effect in complementing decisionmaking and threat prediction from the huge number of HD data created by the healthcare sector. Thus, this study aims to develop a logistic regression model (LRM) for predicting the risk of getting HD in ten years. The study explores the different methodologies for improving the performance of base LRM for predicting whether a person gets HD after ten years or not. The result demonstrates the capability of LRM in predicting the risks of getting HD after ten years. The LRM achieves 97.35% accuracy with the recursive feature elimination and random under-sampling. This implies that the LRM can play an important role in precautionary methods to avoid the risk of HD.
Chronic disease (CD) such as kidney disease and causes severe challenging issues to the people all around the world. Chronic kidney disease (CKD) and diabetes mellitus (DM) are considered in this paper. Predicting the diseases in earlier stage, gives better preventive measures to the people. Healthcare domain leads to tremendous cost savings and improved health status of the society. The main objective of this paper is to develop an algorithm to predict CKD occurrence using machine learning (ML) technique. The commonly used classification algorithms namely logistic regression (LR), random forest (RF), conditional random forest (CRF), and recurrent neural networks (RNN) are considered to predict the disease at an earlier stage. The proposed algorithm in this paper uses medical code data to predict disease at an earlier stage.
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACHindexPub
Predicting the onset of heart disease accurately is essential for early diagnosis and prevention of this global pandemic. The paper suggests a hybrid method to improve heart disease prediction. The research examines several machine learning (ML) models for detecting heart illness and assesses how well they predict heart disease. To enhance precision, the hybrid method employs not one but many machine learning methods. The hybrid method employs SVMs, random forests, and neural networks as its machine- learning algorithms. When it comes to classification, SVM is a very effective method. The data points are separated into classes, and the optimal hyperplane to do this is the goal. SVM can learn the boundaries and patterns between various risk variables and efficiently categorize people as having heart disease or not. Random forests are a kind of ensemble learning that uses several individual decision trees to make a final determination. The characteristics used to construct each decision tree are chosen at random. Each decision tree contributes to the overall forecast, which is then aggregated. Due to their versatility, random forests may be used to the prediction of cardiovascular disease. Neural networks are a kind of algorithm that takes their cues from the way the human brain operates. They are made up of several layers of artificial neurons working together to learn intricate patterns from data. Medical diagnosis is only one field where neural networks have been shown to be useful. In the hybrid method, neural networks may learn complex associations between risk factors and cardiovascular disease and provide reliable prognoses based on this information. The hybrid method enhances the accuracy of heart disease prediction by combining the benefits of various machine-learning techniques
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%.
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%
SUPERVISED FEATURE SELECTION FOR DIAGNOSIS OF CORONARY ARTERY DISEASE BASED O...csitconf
Feature Selection (FS) has become the focus of much research on decision support systems
areas for which datasets with tremendous number of variables are analyzed. In this paper we
present a new method for the diagnosis of Coronary Artery Diseases (CAD) founded on Genetic
Algorithm (GA) wrapped Bayes Naïve (BN) based FS.
Basically, CAD dataset contains two classes defined with 13 features. In GA–BN algorithm, GA
generates in each iteration a subset of attributes that will be evaluated using the BN in the
second step of the selection procedure. The final set of attribute contains the most relevant
feature model that increases the accuracy. The algorithm in this case produces 85.50%
classification accuracy in the diagnosis of CAD. Thus, the asset of the Algorithm is then
compared with the use of Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and
C4.5 decision tree Algorithm. The result of classification accuracy for those algorithms are
respectively 83.5%, 83.16% and 80.85%. Consequently, the GA wrapped BN Algorithm is
correspondingly compared with other FS algorithms. The Obtained results have shown very
promising outcomes for the diagnosis of CAD.
Supervised Feature Selection for Diagnosis of Coronary Artery Disease Based o...cscpconf
Feature Selection (FS) has become the focus of much research on decision support systems areas for which datasets with tremendous number of variables are analyzed. In this paper we
present a new method for the diagnosis of Coronary Artery Diseases (CAD) founded on Genetic Algorithm (GA) wrapped Bayes Naïve (BN) based FS. Basically, CAD dataset contains two classes defined with 13 features. In GA–BN algorithm, GA
generates in each iteration a subset of attributes that will be evaluated using the BN in the second step of the selection procedure. The final set of attribute contains the most relevant feature model that increases the accuracy. The algorithm in this case produces 85.50% classification accuracy in the diagnosis of CAD. Thus, the asset of the Algorithm is then compared with the use of Support Vector Machine (SVM), Multi-Layer erceptron (MLP) and C4.5 decision tree Algorithm. The result of classification accuracy for those algorithms are respectively 83.5%, 83.16% and 80.85%. Consequently, the GA wrapped BN Algorithm is correspondingly compared with other FS algorithms. The Obtained results have shown very promising outcomes for the diagnosis of CAD.
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUEScscpconf
The health sector has witnessed a great evolution following the development of new computer technologies, and that pushed this area to produce more medical data, which gave birth to multiple fields of research. Many efforts are done to cope with the explosion of medical data on one hand, and to obtain useful knowledge from it on the other hand. This prompted researchers to apply all the technical innovations like big data analytics, predictive analytics, machine learning and learning algorithms in order to extract useful knowledge and help in making decisions. With the promises of predictive analytics in big data, and the use of machine learning
algorithms, predicting future is no longer a difficult task, especially for medicine because predicting diseases and anticipating the cure became possible. In this paper we will present an overview on the evolution of big data in healthcare system, and we will apply a learning algorithm on a set of medical data. The objective is to predict chronic kidney diseases by using Decision Tree (C4.5) algorithm.
Clinical data science is a rapidly evolving field that utilizes advanced analytics and machine learning techniques to extract meaningful insights from large scale healthcare data. In recent years, there has been a significant increase in the availability of electronic health records, genomic data, wearable devices, and other digital health technologies, generating vast amounts of data. This article presents a comprehensive review of the current state of clinical data science and its future prospects. The review begins by providing an overview of the foundational concepts and methodologies employed in clinical data science. It explores various data sources, including structured and unstructured data, and highlights the challenges associated with data quality, privacy, and interoperability. The role of artificial intelligence and machine learning algorithms in data analysis and prediction is examined, along with the importance of data preprocessing and feature selection techniques. G. Dileepkumar | Nimisha Prajapati | Simhavalli Godavarthi "Clinical Data Science and its Future" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-3 , June 2023, URL: https://www.ijtsrd.com.com/papers/ijtsrd58588.pdf Paper URL: https://www.ijtsrd.com.com/pharmacy/pharmacy-practice/58588/clinical-data-science-and-its-future/g-dileepkumar
AN IMPROVED MODEL FOR CLINICAL DECISION SUPPORT SYSTEMijaia
Misguided information in health care has caused much havoc that have led to the death of millions of people as a result of misclassification, and inconsistent health care records; hence the objective of this paper is to develop an improved clinical decision support system. This system incorporated hybrid system
of non-knowledge based and knowledge based decision support system for the diagnosis of diseases and proper health care delivery records using prostate cancer and diabetes datasets to train and validate the model. The min-max method was adopted in normalizing the datasets, while genetic algorithm was
deployed in initiating the training weights of the MLP. The result obtained in this paper yielded a classification accuracy of 98%, sensitivity of 0.98 and specificity of 100 for prostate cancer and accuracy of 94%, sensitivity of 0.94 and specificity of 0.67 for diabetes.
K-Nearest Neighbours based diagnosis of hyperglycemiaijtsrd
AI or artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using the rules to reach approximate or definite conclusions), and self-correction. As a result, Artificial Intelligence is gaining Importance in science and engineering fields. The use of Artificial Intelligence in medical diagnosis too is becoming increasingly common and has been used widely in the diagnosis of cancers, tumors, hepatitis, lung diseases, etc... The main aim of this paper is to build an Artificial Intelligent System that after analysis of certain parameters can predict that whether a person is diabetic or not. Diabetes is the name used to describe a metabolic condition of having higher than normal blood sugar levels. Diabetes is becoming increasingly more common throughout the world, due to increased obesity - which can lead to metabolic syndrome or pre-diabetes leading to higher incidences of type 2 diabetes. Authors have identified 10 parameters that play an important role in diabetes and prepared a rich database of training data which served as the backbone of the prediction algorithm. Keeping in view this training data authors developed a system that uses the artificial neural networks algorithm to serve the purpose. These are capable of predicting new observations (on specific variables) from previous observations (on the same or other variables) after executing a process of so-called learning from existing training data (Haykin 1998).The results indicate that the performance of KNN method when compared with the medical diagnosis system was found to be 91%. This system can be used to assist medical programs especially in geographically remote areas where expert human diagnosis not possible with an advantage of minimal expenses and faster results. Abid Sarwar"K-Nearest Neighbours based diagnosis of hyperglycemia" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-1 , December 2017, URL: http://www.ijtsrd.com/papers/ijtsrd7046.pdf http://www.ijtsrd.com/computer-science/artificial-intelligence/7046/k-nearest-neighbours-based-diagnosis-of-hyperglycemia/abid-sarwar
Detection of myocardial infarction on recent dataset using machine learningIJICTJOURNAL
In developing countries such as India, with a large aging population and limited access to medical facilities, remote and timely diagnosis of myocardial infarction (MI) has the potential to save the life of many. An electrocardiogram is the primary clinical tool utilized in the onset or detection of a previous MI incident. Artificial intelligence has made a great impact on every area of research as well as in medical diagnosis. In medical diagnosis, the hypothesis might be doctors' experience which would be used as input to predict a disease that saves the life of mankind. It is been observed that a properly cleaned and pruned dataset provides far better accuracy than an unclean one with missing values. Selection of suitable techniques for data cleaning alongside proper classification algorithms will cause the event of prediction systems that give enhanced accuracy. In this proposal detection of myocardial infarction using new parameters is proposed with increased accuracy and efficiency of the existing model. Additional parameters are used to predict MI with more accuracy. The proposed model is used to predict an early diagnosis of MI with the help of expertise experiences and data gathered from hospitals.
Convolutional neural network with binary moth flame optimization for emotion ...IAESIJAI
Electroencephalograph (EEG) signals have the ability of real-time reflecting brain activities. Utilizing the EEG signal for analyzing human emotional states is a common study. The EEG signals of the emotions aren’t distinctive and it is different from one person to another as every one of them has different emotional responses to same stimuli. Which is why, the signals of the EEG are subject dependent and proven to be effective for the subject dependent detection of the Emotions. For the purpose of achieving enhanced accuracy and high true positive rate, the suggested system proposed a binary moth flame optimization (BMFO) algorithm for the process of feature selection and convolutional neural networks (CNNs) for classifications. In this proposal, optimum features are chosen with the use of accuracy as objective function. Ultimately, optimally chosen features are classified after that with the use of a CNN for the purpose of discriminating different emotion states.
A novel ensemble model for detecting fake newsIAESIJAI
Due the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of article news as being either real or fake. For this purpose, we opt for a blending technique that combines three models, namely bidirectional long short-term memory (Bi-LSTM), stochastic gradient descent classifier and ridge classifier. The implementation of the proposed model (i.e. BI-LSR) on real world datasets, has shown outstanding results. In fact, it achieved an accuracy score of 99.16%. Accordingly, this ensemble learning has proven to do perform better than individual conventional machine learning and deep learning models as well as many ensemble learning approaches cited in the literature.
More Related Content
Similar to Machine learning approach for predicting heart and diabetes diseases using data-driven analysis
PERFORMANCE OF DATA MINING TECHNIQUES TO PREDICT IN HEALTHCARE CASE STUDY: CH...ijdms
With the promises of predictive analytics in big data, and the use of machine learning algorithms,
predicting future is no longer a difficult task, especially for health sector, that has witnessed a great
evolution following the development of new computer technologies that gave birth to multiple fields of
research. Many efforts are done to cope with medical data explosion on one hand, and to obtain useful
knowledge from it, predict diseases and anticipate the cure on the other hand. This prompted researchers
to apply all the technical innovations like big data analytics, predictive analytics, machine learning and
learning algorithms in order to extract useful knowledge and help in making decisions. In this paper, we
will present an overview on the evolution of big data in healthcare system, and we will apply three learning
algorithms on a set of medical data. The objective of this research work is to predict kidney disease by
using multiple machine learning algorithms that are Support Vector Machine (SVM), Decision Tree (C4.5),
and Bayesian Network (BN), and chose the most efficient one.
HEART DISEASE PREDICTION USING MACHINE LEARNING AND DEEP LEARNINGIJDKP
Heart disease is most common disease reported currently in the United States among both the genders and
according to official statistics about fifty percent of the American population is suffering from some form of
cardiovascular disease. This paper performs chi square tests and linear regression analysis to predict
heart disease based on the symptoms like chest pain and dizziness. This paper will help healthcare sectors
to provide better assistance for patients suffering from heart disease by predicting it in beginning stage of
disease. Chi square test is conducted to identify whether there is a relation between chest pain and heart
disease cases in the United States by analyzing heart disease dataset from IEEE Data Port. The test results
and analysis show that males in the United States are most likely to develop heart disease with the
symptoms like chest pain, dizziness, shortness of breath, fatigue, and nausea. This test also shows that
there is a week corelation of 0.5 is identified which shows people with all ages including teens can face
heart diseases and its prevalence increase with age. Also, the tests indicate that 90 percent of the
participant who are facing severe chest pain is suffering from heart disease where majority of the
successful heart disease identified is in males and only 10 percent participants are identified as healthy.
The evaluated p-values are much greater than the statistical threshold of 0.05 which concludes factors like
sex, Exercise angina, Cholesterol, old peak, ST_Slope, obesity, and blood sugar play significant role in
onset of cardiovascular disease. We have tested the dataset with prediction model built on logistic
regression and observed an accuracy of 85.12 percent.
Prediction of the risk of developing heart disease using logistic regressionIJECEIAES
Heart disease (HD) accounts for more deaths every year than other illnesses. World Health Organization (WHO) assessed 17.9 million life losses caused by heart disease in 2016, demonstrating 31% of all international life losses. Three-quarters of these life losses occur in low and middle-income nations. Machine learning (ML), due to advanced precision in pattern recognition and classification, demonstrates to be in effect in complementing decisionmaking and threat prediction from the huge number of HD data created by the healthcare sector. Thus, this study aims to develop a logistic regression model (LRM) for predicting the risk of getting HD in ten years. The study explores the different methodologies for improving the performance of base LRM for predicting whether a person gets HD after ten years or not. The result demonstrates the capability of LRM in predicting the risks of getting HD after ten years. The LRM achieves 97.35% accuracy with the recursive feature elimination and random under-sampling. This implies that the LRM can play an important role in precautionary methods to avoid the risk of HD.
Chronic disease (CD) such as kidney disease and causes severe challenging issues to the people all around the world. Chronic kidney disease (CKD) and diabetes mellitus (DM) are considered in this paper. Predicting the diseases in earlier stage, gives better preventive measures to the people. Healthcare domain leads to tremendous cost savings and improved health status of the society. The main objective of this paper is to develop an algorithm to predict CKD occurrence using machine learning (ML) technique. The commonly used classification algorithms namely logistic regression (LR), random forest (RF), conditional random forest (CRF), and recurrent neural networks (RNN) are considered to predict the disease at an earlier stage. The proposed algorithm in this paper uses medical code data to predict disease at an earlier stage.
ENHANCING ACCURACY IN HEART DISEASE PREDICTION: A HYBRID APPROACHindexPub
Predicting the onset of heart disease accurately is essential for early diagnosis and prevention of this global pandemic. The paper suggests a hybrid method to improve heart disease prediction. The research examines several machine learning (ML) models for detecting heart illness and assesses how well they predict heart disease. To enhance precision, the hybrid method employs not one but many machine learning methods. The hybrid method employs SVMs, random forests, and neural networks as its machine- learning algorithms. When it comes to classification, SVM is a very effective method. The data points are separated into classes, and the optimal hyperplane to do this is the goal. SVM can learn the boundaries and patterns between various risk variables and efficiently categorize people as having heart disease or not. Random forests are a kind of ensemble learning that uses several individual decision trees to make a final determination. The characteristics used to construct each decision tree are chosen at random. Each decision tree contributes to the overall forecast, which is then aggregated. Due to their versatility, random forests may be used to the prediction of cardiovascular disease. Neural networks are a kind of algorithm that takes their cues from the way the human brain operates. They are made up of several layers of artificial neurons working together to learn intricate patterns from data. Medical diagnosis is only one field where neural networks have been shown to be useful. In the hybrid method, neural networks may learn complex associations between risk factors and cardiovascular disease and provide reliable prognoses based on this information. The hybrid method enhances the accuracy of heart disease prediction by combining the benefits of various machine-learning techniques
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DISE...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%.
EVALUATING THE ACCURACY OF CLASSIFICATION ALGORITHMS FOR DETECTING HEART DIS...mlaij
The healthcare industry generates enormous amounts of complex clinical data that make the prediction of
disease detection a complicated process. In medical informatics, making effective and efficient decisions is
very important. Data Mining (DM) techniques are mainly used to identify and extract hidden patterns and
interesting knowledge to diagnose and predict diseases in medical datasets. Nowadays, heart disease is
considered one of the most important problems in the healthcare field. Therefore, early diagnosis leads to
a reduction in deaths. DM techniques have proven highly effective for predicting and diagnosing heart
diseases. This work utilizes the classification algorithms with a medical dataset of heart disease; namely,
J48, Random Forest, and Naïve Bayes to discover the accuracy of their performance. We also examine the
impact of the feature selection method. A comparative and analysis study was performed to determine the
best technique using Waikato Environment for Knowledge Analysis (Weka) software, version 3.8.6. The
performance of the utilized algorithms was evaluated using standard metrics such as accuracy, sensitivity
and specificity. The importance of using classification techniques for heart disease diagnosis has been
highlighted. We also reduced the number of attributes in the dataset, which showed a significant
improvement in prediction accuracy. The results indicate that the best algorithm for predicting heart
disease was Random Forest with an accuracy of 99.24%
SUPERVISED FEATURE SELECTION FOR DIAGNOSIS OF CORONARY ARTERY DISEASE BASED O...csitconf
Feature Selection (FS) has become the focus of much research on decision support systems
areas for which datasets with tremendous number of variables are analyzed. In this paper we
present a new method for the diagnosis of Coronary Artery Diseases (CAD) founded on Genetic
Algorithm (GA) wrapped Bayes Naïve (BN) based FS.
Basically, CAD dataset contains two classes defined with 13 features. In GA–BN algorithm, GA
generates in each iteration a subset of attributes that will be evaluated using the BN in the
second step of the selection procedure. The final set of attribute contains the most relevant
feature model that increases the accuracy. The algorithm in this case produces 85.50%
classification accuracy in the diagnosis of CAD. Thus, the asset of the Algorithm is then
compared with the use of Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and
C4.5 decision tree Algorithm. The result of classification accuracy for those algorithms are
respectively 83.5%, 83.16% and 80.85%. Consequently, the GA wrapped BN Algorithm is
correspondingly compared with other FS algorithms. The Obtained results have shown very
promising outcomes for the diagnosis of CAD.
Supervised Feature Selection for Diagnosis of Coronary Artery Disease Based o...cscpconf
Feature Selection (FS) has become the focus of much research on decision support systems areas for which datasets with tremendous number of variables are analyzed. In this paper we
present a new method for the diagnosis of Coronary Artery Diseases (CAD) founded on Genetic Algorithm (GA) wrapped Bayes Naïve (BN) based FS. Basically, CAD dataset contains two classes defined with 13 features. In GA–BN algorithm, GA
generates in each iteration a subset of attributes that will be evaluated using the BN in the second step of the selection procedure. The final set of attribute contains the most relevant feature model that increases the accuracy. The algorithm in this case produces 85.50% classification accuracy in the diagnosis of CAD. Thus, the asset of the Algorithm is then compared with the use of Support Vector Machine (SVM), Multi-Layer erceptron (MLP) and C4.5 decision tree Algorithm. The result of classification accuracy for those algorithms are respectively 83.5%, 83.16% and 80.85%. Consequently, the GA wrapped BN Algorithm is correspondingly compared with other FS algorithms. The Obtained results have shown very promising outcomes for the diagnosis of CAD.
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUEScscpconf
The health sector has witnessed a great evolution following the development of new computer technologies, and that pushed this area to produce more medical data, which gave birth to multiple fields of research. Many efforts are done to cope with the explosion of medical data on one hand, and to obtain useful knowledge from it on the other hand. This prompted researchers to apply all the technical innovations like big data analytics, predictive analytics, machine learning and learning algorithms in order to extract useful knowledge and help in making decisions. With the promises of predictive analytics in big data, and the use of machine learning
algorithms, predicting future is no longer a difficult task, especially for medicine because predicting diseases and anticipating the cure became possible. In this paper we will present an overview on the evolution of big data in healthcare system, and we will apply a learning algorithm on a set of medical data. The objective is to predict chronic kidney diseases by using Decision Tree (C4.5) algorithm.
Clinical data science is a rapidly evolving field that utilizes advanced analytics and machine learning techniques to extract meaningful insights from large scale healthcare data. In recent years, there has been a significant increase in the availability of electronic health records, genomic data, wearable devices, and other digital health technologies, generating vast amounts of data. This article presents a comprehensive review of the current state of clinical data science and its future prospects. The review begins by providing an overview of the foundational concepts and methodologies employed in clinical data science. It explores various data sources, including structured and unstructured data, and highlights the challenges associated with data quality, privacy, and interoperability. The role of artificial intelligence and machine learning algorithms in data analysis and prediction is examined, along with the importance of data preprocessing and feature selection techniques. G. Dileepkumar | Nimisha Prajapati | Simhavalli Godavarthi "Clinical Data Science and its Future" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-3 , June 2023, URL: https://www.ijtsrd.com.com/papers/ijtsrd58588.pdf Paper URL: https://www.ijtsrd.com.com/pharmacy/pharmacy-practice/58588/clinical-data-science-and-its-future/g-dileepkumar
AN IMPROVED MODEL FOR CLINICAL DECISION SUPPORT SYSTEMijaia
Misguided information in health care has caused much havoc that have led to the death of millions of people as a result of misclassification, and inconsistent health care records; hence the objective of this paper is to develop an improved clinical decision support system. This system incorporated hybrid system
of non-knowledge based and knowledge based decision support system for the diagnosis of diseases and proper health care delivery records using prostate cancer and diabetes datasets to train and validate the model. The min-max method was adopted in normalizing the datasets, while genetic algorithm was
deployed in initiating the training weights of the MLP. The result obtained in this paper yielded a classification accuracy of 98%, sensitivity of 0.98 and specificity of 100 for prostate cancer and accuracy of 94%, sensitivity of 0.94 and specificity of 0.67 for diabetes.
K-Nearest Neighbours based diagnosis of hyperglycemiaijtsrd
AI or artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using the rules to reach approximate or definite conclusions), and self-correction. As a result, Artificial Intelligence is gaining Importance in science and engineering fields. The use of Artificial Intelligence in medical diagnosis too is becoming increasingly common and has been used widely in the diagnosis of cancers, tumors, hepatitis, lung diseases, etc... The main aim of this paper is to build an Artificial Intelligent System that after analysis of certain parameters can predict that whether a person is diabetic or not. Diabetes is the name used to describe a metabolic condition of having higher than normal blood sugar levels. Diabetes is becoming increasingly more common throughout the world, due to increased obesity - which can lead to metabolic syndrome or pre-diabetes leading to higher incidences of type 2 diabetes. Authors have identified 10 parameters that play an important role in diabetes and prepared a rich database of training data which served as the backbone of the prediction algorithm. Keeping in view this training data authors developed a system that uses the artificial neural networks algorithm to serve the purpose. These are capable of predicting new observations (on specific variables) from previous observations (on the same or other variables) after executing a process of so-called learning from existing training data (Haykin 1998).The results indicate that the performance of KNN method when compared with the medical diagnosis system was found to be 91%. This system can be used to assist medical programs especially in geographically remote areas where expert human diagnosis not possible with an advantage of minimal expenses and faster results. Abid Sarwar"K-Nearest Neighbours based diagnosis of hyperglycemia" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-1 , December 2017, URL: http://www.ijtsrd.com/papers/ijtsrd7046.pdf http://www.ijtsrd.com/computer-science/artificial-intelligence/7046/k-nearest-neighbours-based-diagnosis-of-hyperglycemia/abid-sarwar
Detection of myocardial infarction on recent dataset using machine learningIJICTJOURNAL
In developing countries such as India, with a large aging population and limited access to medical facilities, remote and timely diagnosis of myocardial infarction (MI) has the potential to save the life of many. An electrocardiogram is the primary clinical tool utilized in the onset or detection of a previous MI incident. Artificial intelligence has made a great impact on every area of research as well as in medical diagnosis. In medical diagnosis, the hypothesis might be doctors' experience which would be used as input to predict a disease that saves the life of mankind. It is been observed that a properly cleaned and pruned dataset provides far better accuracy than an unclean one with missing values. Selection of suitable techniques for data cleaning alongside proper classification algorithms will cause the event of prediction systems that give enhanced accuracy. In this proposal detection of myocardial infarction using new parameters is proposed with increased accuracy and efficiency of the existing model. Additional parameters are used to predict MI with more accuracy. The proposed model is used to predict an early diagnosis of MI with the help of expertise experiences and data gathered from hospitals.
Similar to Machine learning approach for predicting heart and diabetes diseases using data-driven analysis (20)
Convolutional neural network with binary moth flame optimization for emotion ...IAESIJAI
Electroencephalograph (EEG) signals have the ability of real-time reflecting brain activities. Utilizing the EEG signal for analyzing human emotional states is a common study. The EEG signals of the emotions aren’t distinctive and it is different from one person to another as every one of them has different emotional responses to same stimuli. Which is why, the signals of the EEG are subject dependent and proven to be effective for the subject dependent detection of the Emotions. For the purpose of achieving enhanced accuracy and high true positive rate, the suggested system proposed a binary moth flame optimization (BMFO) algorithm for the process of feature selection and convolutional neural networks (CNNs) for classifications. In this proposal, optimum features are chosen with the use of accuracy as objective function. Ultimately, optimally chosen features are classified after that with the use of a CNN for the purpose of discriminating different emotion states.
A novel ensemble model for detecting fake newsIAESIJAI
Due the growing proliferation of fake news over the past couple of years, our objective in this paper is to propose an ensemble model for the automatic classification of article news as being either real or fake. For this purpose, we opt for a blending technique that combines three models, namely bidirectional long short-term memory (Bi-LSTM), stochastic gradient descent classifier and ridge classifier. The implementation of the proposed model (i.e. BI-LSR) on real world datasets, has shown outstanding results. In fact, it achieved an accuracy score of 99.16%. Accordingly, this ensemble learning has proven to do perform better than individual conventional machine learning and deep learning models as well as many ensemble learning approaches cited in the literature.
K-centroid convergence clustering identification in one-label per type for di...IAESIJAI
Disease prediction is a high demand field which requires significant support from machine learning (ML) to enhance the result efficiency. The research works on application of K-means clustering supervised classification in disease prediction where each class only has one labeled data. The K-centroid convergence clustering identification (KC3 I) system is based on semi-K-means clustering but only requires single labeled data per class for the training process with the training dataset to update the centroid. The KC3 I model also includes a dictionary box to index all the input centroids before and after the updating process. Each centroid matches with a corresponding label inside this box. After the training process, each time the input features arrive, the trained centroid will put them to its cluster depending on the Euclidean distance, then convert them into the specific class name, which is coherent to that centroid index. Two validation stages were carried out and accomplished the expectation in terms of precision, recall, F1-score, and absolute accuracy. The last part demonstrates the possibility of feature reduction by selecting the most crucial feature with the extra tree classifier method. Total data are fed into the KC3 I system with the most important features and remain the same accuracy.
Plant leaf detection through machine learning based image classification appr...IAESIJAI
Since maize is a staple diet for people, especially vegetarians and vegans, maize leaf disease has a significant influence here on the food industry including maize crop productivity. Therefore, it should be understood that maize quality must be optimal; yet, to do so, maize must be safeguarded from several illnesses. As a result, there is a great demand for such an automated system that can identify the condition early on and take the appropriate action. Early disease identification is crucial, but it also poses a major obstacle. As a result, in this research project, we adopt the fundamental k-nearest neighbor (KNN) model and concentrate on building and developing the enhanced k-nearest neighbor (EKNN) model. EKNN aids in identifying several classes of disease. To gather discriminative, boundary, pattern, and structurally linked information, additional high-quality fine and coarse features are generated. This information is then used in the classification process. The classification algorithm offers high-quality gradient-based features. Additionally, the proposed model is assessed using the Plant-Village dataset, and a comparison with many standard classification models using various metrics is also done.
Backbone search for object detection for applications in intrusion warning sy...IAESIJAI
In this work, we propose a novel backbone search method for object detection for applications in intrusion warning systems. The goal is to find a compact model for use in embedded thermal imaging cameras widely used in intrusion warning systems. The proposed method is based on faster region-based convolutional neural network (Faster R-CNN) because it can detect small objects. Inspired by EfficientNet, the sought-after backbone architecture is obtained by finding the most suitable width scale for the base backbone (ResNet50). The evaluation metrics are mean average precision (mAP), number of parameters, and number of multiply–accumulate operations (MACs). The experimental results showed that the proposed method is effective in building a lightweight neural network for the task of object detection. The obtained model can keep the predefined mAP while minimizing the number of parameters and computational resources. All experiments are executed elaborately on the person detection in intrusion warning systems (PDIWS) dataset.
Deep learning method for lung cancer identification and classificationIAESIJAI
Lung cancer (LC) is calming many lives and is becoming a serious cause of concern. The detection of LC at an early stage assists the chances of recovery. Accuracy of detection of LC at an early stage can be improved with the help of a convolutional neural network (CNN) based deep learning approach. In this paper, we present two methodologies for Lung cancer detection (LCD) applied on Lung image database consortium (LIDC) and image database resource initiative (IDRI) data sets. Classification of these LC images is carried out using support vector machine (SVM), and deep CNN. The CNN is trained with i) multiple batches and ii) single batch for LC image classification as non cancer and cancer image. All these methods are being implemented in MATLAB. The accuracy of classification obtained by SVM is 65%, whereas deep CNN produced detection accuracy of 80% and 100% respectively for multiple and single batch training. The novelty of our experimentation is near 100% classification accuracy obtained by our deep CNN model when tested on 25 Lung computed tomography (CT) test images each of size 512×512 pixels in less than 20 iterations as compared to the research work carried out by other researchers using cropped LC nodule images.
Optically processed Kannada script realization with Siamese neural network modelIAESIJAI
Optical character recognition (OCR) is a technology that allows computers to recognize and extract text from images or scanned documents. It is commonly used to convert printed or handwritten text into machine-readable format. This Study presents an OCR system on Kannada Characters based on siamese neural network (SNN). Here the SNN, a Deep neural network which comprises of two identical convolutional neural network (CNN) compare the script and ranks based on the dissimilarity. When lesser dissimilarity score is identified, prediction is done as character match. In this work the authors use 5 classes of Kannada characters which were initially preprocessed using grey scaling and convert it to pgm format. This is directly input into the Deep convolutional network which is learnt from matching and non-matching image between the CNN with contrastive loss function in Siamese architecture. The Proposed OCR system uses very less time and gives more accurate results as compared to the regular CNN. The model can become a powerful tool for identification, particularly in situations where there is a high degree of variation in writing styles or limited training data is available.
Embedded artificial intelligence system using deep learning and raspberrypi f...IAESIJAI
Melanoma is a kind of skin cancer that originates in melanocytes responsible for producing melanin, it can be a severe and potentially deadly form of cancer because it can metastasize to other regions of the body if not detected and treated early. To facilitate this process, Recently, various computer-assisted low-cost, reliable, and accurate diagnostic systems have been proposed based on artificial intelligence (AI) algorithms, particularly deep learning techniques. This work proposed an innovative and intelligent system that combines the internet of things (IoT) with a Raspberry Pi connected to a camera and a deep learning model based on the deep convolutional neural network (CNN) algorithm for real-time detection and classification of melanoma cancer lesions. The key stages of our model before serializing to the Raspberry Pi: Firstly, the preprocessing part contains data cleaning, data transformation (normalization), and data augmentation to reduce overfitting when training. Then, the deep CNN algorithm is used to extract the features part. Finally, the classification part with applied Sigmoid Activation Function. The experimental results indicate the efficiency of our proposed classification system as we achieved an accuracy rate of 92%, a precision of 91%, a sensitivity of 91%, and an area under the curve- receiver operating characteristics (AUC-ROC) of 0.9133.
Deep learning based biometric authentication using electrocardiogram and irisIAESIJAI
Authentication systems play an important role in wide range of applications. The traditional token certificate and password-based authentication systems are now replaced by biometric authentication systems. Generally, these authentication systems are based on the data obtained from face, iris, electrocardiogram (ECG), fingerprint and palm print. But these types of models are unimodal authentication, which suffer from accuracy and reliability issues. In this regard, multimodal biometric authentication systems have gained huge attention to develop the robust authentication systems. Moreover, the current development in deep learning schemes have proliferated to develop more robust architecture to overcome the issues of tradition machine learning based authentication systems. In this work, we have adopted ECG and iris data and trained the obtained features with the help of hybrid convolutional neural network- long short-term memory (CNN-LSTM) model. In ECG, R peak detection is considered as an important aspect for feature extraction and morphological features are extracted. Similarly, gabor-wavelet, gray level co-occurrence matrix (GLCM), gray level difference matrix (GLDM) and principal component analysis (PCA) based feature extraction methods are applied on iris data. The final feature vector is obtained from MIT-BIH and IIT Delhi Iris dataset which is trained and tested by using CNN-LSTM. The experimental analysis shows that the proposed approach achieves average accuracy, precision, and F1-core as 0.985, 0.962 and 0.975, respectively.
Hybrid channel and spatial attention-UNet for skin lesion segmentationIAESIJAI
Melanoma is a type of skin cancer which has affected many lives globally. The American Cancer Society research has suggested that it a serious type of skin cancer and lead to mortality but it is almost 100% curable if it is detected and treated in its early stages. Currently automated computer vision-based schemes are widely adopted but these systems suffer from poor segmentation accuracy. To overcome these issue, deep learning (DL) has become the promising solution which performs extensive training for pattern learning and provide better classification accuracy. However, skin lesion segmentation is affected due to skin hair, unclear boundaries, pigmentation, and mole. To overcome this issue, we adopt UNet based deep learning scheme and incorporated attention mechanism which considers low level statistics and high-level statistics combined with feedback and skip connection module. This helps to obtain the robust features without neglecting the channel information. Further, we use channel attention, spatial attention modulation to achieve the final segmentation. The proposed DL based scheme is instigated on publically available dataset and experimental investigation shows that the proposed Hybrid Attention UNet approach achieves average performance as 0.9715, 0.9962, 0.9710.
Photoplethysmogram signal reconstruction through integrated compression sensi...IAESIJAI
The transmission of photoplethysmogram (PPG) signals in real-time is extremely challenging and facilitates the use of an internet of things (IoT) environment for healthcare- monitoring. This paper proposes an approach for PPG signal reconstruction through integrated compression sensing and basis function aware shallow learning (CSBSL). Integrated-CSBSL approach for combined compression of PPG signals via multiple channels thereby improving the reconstruction accuracy for the PPG signals essential in healthcare monitoring. An optimal basis function aware shallow learning procedure is employed on PPG signals with prior initialization; this is further fine-tuned by utilizing the knowledge of various other channels, which exploit the further sparsity of the PPG signals. The proposed method for learning combined with PPG signals retains the knowledge of spatial and temporal correlation. The proposed Integrated-CSBSL approach consists of two steps, in the first step the shallow learning based on basis function is carried out through training the PPG signals. The proposed method is evaluated using multichannel PPG signal reconstruction, which potentially benefits clinical applications through PPG monitoring and diagnosis.
Speaker identification under noisy conditions using hybrid convolutional neur...IAESIJAI
Speaker identification is biometrics that classifies or identifies a person from other speakers based on speech characteristics. Recently, deep learning models outperformed conventional machine learning models in speaker identification. Spectrograms of the speech have been used as input in deep learning-based speaker identification using clean speech. However, the performance of speaker identification systems gets degraded under noisy conditions. Cochleograms have shown better results than spectrograms in deep learning-based speaker recognition under noisy and mismatched conditions. Moreover, hybrid convolutional neural network (CNN) and recurrent neural network (RNN) variants have shown better performance than CNN or RNN variants in recent studies. However, there is no attempt conducted to use a hybrid CNN and enhanced RNN variants in speaker identification using cochleogram input to enhance the performance under noisy and mismatched conditions. In this study, a speaker identification using hybrid CNN and the gated recurrent unit (GRU) is proposed for noisy conditions using cochleogram input. VoxCeleb1 audio dataset with real-world noises, white Gaussian noises (WGN) and without additive noises were employed for experiments. The experiment results and the comparison with existing works show that the proposed model performs better than other models in this study and existing works.
Multi-channel microseismic signals classification with convolutional neural n...IAESIJAI
Identifying and classifying microseismic signals is essential to warn of mines’ dangers. Deep learning has replaced traditional methods, but labor-intensive manual identification and varying deep learning outcomes pose challenges. This paper proposes a transfer learning-based convolutional neural network (CNN) method called microseismic signals-convolutional neural network (MS-CNN) to automatically recognize and classify microseismic events and blasts. The model was instructed on a limited sample of data to obtain an optimal weight model for microseismic waveform recognition and classification. A comparative analysis was performed with an existing CNN model and classical image classification models such as AlexNet, GoogLeNet, and ResNet50. The outcomes demonstrate that the MS-CNN model achieved the best recognition and classification effect (99.6% accuracy) in the shortest time (0.31 s to identify 277 images in the test set). Thus, the MS-CNN model can efficiently recognize and classify microseismic events and blasts in practical engineering applications, improving the recognition timeliness of microseismic signals and further enhancing the accuracy of event classification.
Sophisticated face mask dataset: a novel dataset for effective coronavirus di...IAESIJAI
Efficient and accurate coronavirus disease (COVID-19) surveillance necessitates robust identification of individuals wearing face masks. This research introduces the sophisticated face mask dataset (SFMD), a comprehensive compilation of high-quality face mask images enriched with detailed annotations on mask types, fits, and usage patterns. Leveraging cutting-edge deep learning models—EfficientNet-B2, ResNet50, and MobileNet-V2—, we compare SFMD against two established benchmarks: the real-world masked face dataset (RMFD) and the masked face recognition dataset (MFRD). Across all models, SFMD consistently outperforms RMFD and MFRD in key metrics, including accuracy, precision, recall, and F1 score. Additionally, our study demonstrates the dataset's capability to cultivate robust models resilient to intricate scenarios like low-light conditions and facial occlusions due to accessories or facial hair.
Transfer learning for epilepsy detection using spectrogram imagesIAESIJAI
Epilepsy stands out as one of the common neurological diseases. The neural activity of the brain is observed using electroencephalography (EEG). Manual inspection of EEG brain signals is a slow and arduous process, which puts heavy load on neurologists and affects their performance. The aim of this study is to find the best result of classification using the transfer learning model that automatically identify the epileptic and the normal activity, to classify EEG signals by using images of spectrogram which represents the percentage of energy for each coefficient of the continuous wavelet. Dataset includes the EEG signals recorded at monitoring unit of epilepsy used in this study to presents an application of transfer learning by comparing three models Alexnet, visual geometry group (VGG19) and residual neural network ResNet using different combinations with seven different classifiers. This study tested the models and reached a different value of accuracy and other metrics used to judge their performances, and as a result the best combination has been achieved with ResNet combined with support vector machine (SVM) classifier that classified EEG signals with a high success rate using multiple performance metrics such as 97.22% accuracy and 2.78% the value of the error rate.
Deep neural network for lateral control of self-driving cars in urban environ...IAESIJAI
The exponential growth of the automotive industry clearly indicates that self-driving cars are the future of transportation. However, their biggest challenge lies in lateral control, particularly in urban bottlenecking environments, where disturbances and obstacles are abundant. In these situations, the ego vehicle has to follow its own trajectory while rapidly correcting deviation errors without colliding with other nearby vehicles. Various research efforts have focused on developing lateral control approaches, but these methods remain limited in terms of response speed and control accuracy. This paper presents a control strategy using a deep neural network (DNN) controller to effectively keep the car on the centerline of its trajectory and adapt to disturbances arising from deviations or trajectory curvature. The controller focuses on minimizing deviation errors. The Matlab/Simulink software is used for designing and training the DNN. Finally, simulation results confirm that the suggested controller has several advantages in terms of precision, with lateral deviation remaining below 0.65 meters, and rapidity, with a response time of 0.7 seconds, compared to traditional controllers in solving lateral control.
Attention mechanism-based model for cardiomegaly recognition in chest X-Ray i...IAESIJAI
Recently, cardiovascular diseases (CVDs) have become a rapidly growing problem in the world, especially in developing countries. The latter are facing a lifestyle change that introduces new risk factors for heart disease, that requires a particular and urgent interest. Besides, cardiomegaly is a sign of cardiovascular diseases that refers to various conditions; it is associated with the heart enlargement that can be either transient or permanent depending on certain conditions. Furthermore, cardiomegaly is visible on any imaging test including Chest X-Radiation (X-Ray) images; which are one of the most common tools used by Cardiologists to detect and diagnose many diseases. In this paper, we propose an innovative deep learning (DL) model based on an attention module and MobileNet architecture to recognize Cardiomegaly patients using the popular Chest X-Ray8 dataset. Actually, the attention module captures the spatial relationship between the relevant regions in Chest X-Ray images. The experimental results show that the proposed model achieved interesting results with an accuracy rate of 81% which makes it suitable for detecting cardiomegaly disease.
Efficient commodity price forecasting using long short-term memory modelIAESIJAI
Predicting commodity prices, particularly food prices, is a significant concern for various stakeholders, especially in regions that are highly sensitive to commodity price volatility. Historically, many machine learning models like autoregressive integrated moving average (ARIMA) and support vector machine (SVM) have been suggested to overcome the forecasting task. These models struggle to capture the multifaceted and dynamic factors influencing these prices. Recently, deep learning approaches have demonstrated considerable promise in handling complex forecasting tasks. This paper presents a novel long short-term memory (LSTM) network-based model for commodity price forecasting. The model uses five essential commodities namely bread, meat, milk, oil, and petrol. The proposed model focuses on advanced feature engineering which involves moving averages, price volatility, and past prices. The results reveal that our model outperforms traditional methods as it achieves 0.14, 3.04%, and 98.2% for root mean square error (RMSE), mean absolute percentage error (MAPE), and R-squared (R2 ), respectively. In addition to the simplicity of the model, which consists of an LSTM single-cell architecture that reduced the training time to a few minutes instead of hours. This paper contributes to the economic literature on price prediction using advanced deep learning techniques as well as provides practical implications for managing commodity price instability globally.
1-dimensional convolutional neural networks for predicting sudden cardiacIAESIJAI
Sudden cardiac arrest (SCA) is a serious heart problem that occurs without symptoms or warning. SCA causes high mortality. Therefore, it is important to estimate the incidence of SCA. Current methods for predicting ventricular fibrillation (VF) episodes require monitoring patients over time, resulting in no complications. New technologies, especially machine learning, are gaining popularity due to the benefits they provide. However, most existing systems rely on manual processes, which can lead to inefficiencies in disseminating patient information. On the other hand, existing deep learning methods rely on large data sets that are not publicly available. In this study, we propose a deep learning method based on one-dimensional convolutional neural networks to learn to use discrete fourier transform (DFT) features in raw electrocardiogram (ECG) signals. The results showed that our method was able to accurately predict the onset of SCA with an accuracy of 96% approximately 90 minutes before it occurred. Predictions can save many lives. That is, optimized deep learning models can outperform manual models in analyzing long-term signals.
A deep learning-based approach for early detection of disease in sugarcane pl...IAESIJAI
In many regions of the nation, agriculture serves as the primary industry. The farming environment now faces a number of challenges to farmers. One of the major concerns, and the focus of this research, is disease prediction. A methodology is suggested to automate a process for identifying disease in plant growth and warning farmers in advance so they can take appropriate action. Disease in crop plants has an impact on agricultural production. In this work, a novel DenseNet-support vector machine: explainable artificial intelligence (DNet-SVM: XAI) interpretation that combines a DenseNet with support vector machine (SVM) and local interpretable model-agnostic explanation (LIME) interpretation has been proposed. DNet-SVM: XAI was created by a series of modifications to DenseNet201, including the addition of a support vector machine (SVM) classifier. Prior to using SVM to identify if an image is healthy or un-healthy, images are first feature extracted using a convolution network called DenseNet. In addition to offering a likely explanation for the prediction, the reasoning is carried out utilizing the visual cue produced by the LIME. In light of this, the proposed approach, when paired with its determined interpretability and precision, may successfully assist farmers in the detection of infected plants and recommendation of pesticide for the identified disease.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 4
Machine learning approach for predicting heart and diabetes diseases using data-driven analysis
1. IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 12, No. 4, December 2023, pp. 1687~1694
ISSN: 2252-8938, DOI: 10.11591/ijai.v12.i4.pp1687-1694 1687
Journal homepage: http://ijai.iaescore.com
Machine learning approach for predicting heart and diabetes
diseases using data-driven analysis
Usha Sekar, Kanchana Selvarajan
Department of Computer Science, Faculty of Science and Humanities, SRM Institute of Science and Technology, Kattankulathur,
TamilNadu, India
Article Info ABSTRACT
Article history:
Received Jun 23, 2022
Revised Jan 28, 2023
Accepted Mar 10, 2023
Environmental changes and food habits affect people's health with numerous
diseases in today's life. Machine learning is a technique that plays a vital role
in predicting diseases from collected data. The health sector has plenty of
electronic medical data, which helps this technique to diagnose various
diseases quickly and accurately. There has been an improvement in accuracy
in medical data analysis as data continues to grow in the medical field. Doctors
may have a hard time predicting symptoms accurately. This proposed work
utilized Kaggle data to predict and diagnose heart and diabetic diseases. The
diseases heart and diabetes are the foremost cause of higher death rates for
people. The dataset contains target features for the diagnosis of heart disease.
This work finds the target variable for diabetic disease by comparing the
patient's blood sugars to normal levels. Blood pressure, body mass index
(BMI), and other factors diagnose these diseases and disorders. This work
justifies the filter method and principal component analysis for selecting and
extracting the feature. The main aim of this work is to highlight the
implementation of three ensemble techniques-Adaptive boost, Extreme
Gradient boosting, and Gradient boosting-as well as the emphasis placed on
the accuracy of the results.
Keywords:
Adaptive boost
Chi-square
Gradient boost
Prediction
Principal component analysis
This is an open access article under the CC BY-SA license.
Corresponding Author:
Usha Sekar
Department of Computer Science, Faculty of Science and Humanities, SRM Institute of Science and
Technology, Kattankulathur, Chengalpattu Dt., TamilNadu, India
Email: us3648@srmist.edu.in
1. INTRODUCTION
Healthy living is crucial to a good quality of life. A healthcare professional prevents, treats, and
inspects diseases to improve health. Because of the inexactness of the information provided by the patient, it
can be challenging to determine a specific disease based on their symptoms [1]. Globally predicting diseases
is a crucial challenge in fundamental problems [2]. Many diseases are associated with particular symptoms and
signs. It can be inherited, caused by infection, or triggered by stress [3]. Due to the residents' modern lifestyle,
there is a risk of mortality and morbidity from diseases like heart disease, chronic respiratory disease, and
diabetes [4]. Nowadays, millions of people worldwide suffer and die from many diseases [5]. The majority of
people with multiple disorders are also distressed by numerous infections.
In today's society, predicting disease based on early-stage symptoms is a very tough challenge for
physicians in the medical field. The field of medical informatics and disease prediction has become increasingly
relevant to the community of data scientists in recent years. Data repute, multi-attribution, incompleteness, and a
close correlation will occur when manually collecting medical data, making it difficult to identify disease
symptoms. The extensive use of computer-based technologies in the health sector has resulted in the availability
of colossal health databases for researchers. Many surgical research studies are using these electronic records [6].
Nowadays, all hospitals maintain electronic health data for patients to find the symptoms and diagnose the disease.
2. ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 4, December 2023: 1687-1694
1688
A health care system can be revolutionized by analyzing and interpreting the information recorded in electronic
health records, providing feedback, and implementing changes based on collected data [7].
Machine learning techniques have become increasingly significant in various fields in the past decade,
including the health care system and biomedical research [8]. In addition, correctly medicating a patient with
a large amount of data is an enormous task. Since the advent of the digital era and technological innovations,
several multidimensional patient data sets have been developed, including clinical data, hospital resource
information, and patient disease diagnosis information. A complex data set must be analyzed to extract valuable
insights [9]. The proposed work aims to develop heart and diabetic disease prediction models from a single
dataset incorporating machine learning algorithms, specifically supervised learning methods that employ the
ensemble method for more than one disease prediction in a single dataset.
Heart disease has been the leading cause of death worldwide in recent decades [10]. Since the heart
is a significant part of the human body, various factors cause heart disease, and people exhibit different
symptoms [11]. People consider the disease diabetes is a high sub challenge with deadly chronic disease [12].
Due to the increase in sugar in blood and fat, people affect their daily lives a lot.
In health care systems, machine learning techniques can support medical practitioners in promptly
and cost-effectively diagnosing various diseases from medical data [13]. Specifying probable disorders could
help patients conduct medical tests on targeted medicine. The patient might skip extensive medical tests due to
a lack of medical information, leading to severe health problems. In most cases, machine learning identifies
the patterns in massive datasets, which can involve human intelligence. Machine learning (ML) approaches
can aid in building prediction models that can handle and analyze vast volumes of complex medical data and
efficiently find the presence or absence of disease in a patient, which can help address this difficulty [14].
The main aim of recursion enhanced random forest with an improved linear model (RFRF-ILM) is to
find the key features. The prediction model produces better performance by combining the classification model.
This work compares the essential variables that suggest that coronary artery disease develops more frequently
as people age [15]. This paper describes disease progression and predicts disease outcomes. A proposed novel
approach [16] uses a model with various features and known classifier techniques to recognize the relevant
factors through a machine learning algorithm, leading to better predicting accuracy of cardiovascular disease.
Based on the prediction model's hybrid random forests with the linear model (HRFLM), it produced 88.7% of
the accuracy value. According to [17], it is tough to identify diabetic disease. A rigorous framework has been
developed by rejecting outliers and eliminating missing values. After selecting features, various machine
classifiers standardize the data. By estimating the area under receiver operator characteristic curve, the method
improved the outcome by weighting the classifier model and producing a better prediction.
The work [18] has proposed a hybrid technique by applying different machine learning classifiers to
diagnose cardiovascular disease-the various classifier helps this study to evaluate the performance metrics
using weka and keel tools. The primary intent of this work is to choose the best classifier by comparing each
classifier's accuracy value. This system [19] used a python tool to perform preprocessing with neighborhood
cleaning rule and feature engineering. AutoML, advanced extended gradient boost and advanced ensemble
bagging models are applied. Specialists perform this work to identify whether or not someone has
cardiovascular disease and diagnose the patient's condition. Studies suggested in [20] used four ML methods
to estimate diabetes risk, where bagging and boosting techniques were used to enhance robustness. Among the
existing algorithms, the Random Forest algorithm provides the most accurate results. The study employs [21]
the AdaBoost and bagging ensemble techniques using the J48 (c4.5) decision tree as a base learner and
standalone data mining methodology. The method applied was to classify the patients with diabetes using
diabetes risk indicators. In the study, the Adaboost ensemble method outperformed bagging and a standalone
J48 decision tree in terms of overall performance.
The [22] work aims to develop a model to predict diabetics. K-nearest neighbor (KNN) helps reduce
the processing time, and support vector machine (SVM) allocates a class for all the sample datasets. Selecting
features in this work helps build four classifiers. In addition, the researchers used four algorithms in this study
to determine the efficacy and accuracy of predicting whether or not people will have diabetes. According to
the study [23], A hierarchical ensemble model combines a decision tree and logistic regression classifiers
trained independently. The neural network joined with the previous model at the next level provides overall
better accuracy.
This work mainly focuses on diagnosing the risk of heart and diabetes diseases and encourages people
to have good health. The proposed study reveals that two chronic ailments, such as diabetes and heart disease,
can be predicted using the filter method chi-square and principal component analysis (PCA). Creating
classification techniques in diagnostics can help to avoid human error. The model utilized ensemble boosting
strategies such as Adaboost, Gradient boost, and Extreme Gradient boost to improve prediction accuracy.
Accordingly, the rest of the paper follows section 2 as a method. Section 3 presents a result and discussion.
Finally, section 4 covers the conclusion with future work.
3. Int J Artif Intell ISSN: 2252-8938
Machine learning approach for predicting heart and diabetes diseases using data-driven … (Usha Sekar)
1689
2. METHOD
This section describes datasets, feature selection, and the ensemble, such as Adaptive boost, Gradient
boost, and Extreme Gradient boost classifier. Figure 1 depicts the pipeline of disease prediction—the proposed
system structured into different phases. The phases contained in this work are data collection, data
preprocessing and selecting features, feature extraction, splitting the data, classifier models, evaluation metrics,
and comparison of ensemble classifier models.
Figure 1. Disease prediction pipeline
2.1. Dataset collection
The detection of disease using machine learning is a challenging task. Rather than model complexity,
interpretability, or computational burden, the doctor is most concerned with whether the model is reliable and
effective in predicting illness. Phase one of the work proposed was a collection of data from the University of
California Irvine (UCI) repository. The data collection has 12 attributes and one target attribute. The dataset
has continuous or categorical data types.
2.2. Data preprocessing
Data preprocessing is one of the processes considered the most crucial step in classification. The
process is to remove the inappropriate inexplicable, and continual features. The feature contains noisy data, a
format that cannot be used in the model and fills the missing values using the KNN Impute method. The range
of glucose values helps to determine the independent variable for diabetes.
2.3. Feature selection and extraction
In machine learning, feature selection techniques play a pivotal role in selecting the features [24]. The
selection method reduced the original feature set into several sub-features to reduce model complexity, improve
computational efficiency, and reduce generalization errors caused by irrelevant features. All the features are
ranked based on the chi-square method's score. The chi-square approach is a statistical procedure that shows
how well the observed frequency data values match the predicted frequency data values for independent
variables. Extracting only the best features is essential to maximize a machine learning classifier's performance
since irrelevant features can negatively affect performance. This phase involves identifying imperative features
within the dataset using principal component analysis.
2.4. Boosting classifier
In a boosting algorithm, the classifiers generate sequentially. Boosting [25] is designed to train a set
of classifiers consecutively and then combine them for prediction, where the later classifiers correct mistakes
made by the earlier ones. A boosting classifier turns weak classifiers into more robust models to enhance
accuracy. This work trained different boosting classifier techniques, i.e., Adaptive boosting, Extreme Gradient
boosting machine, and Gradient boosting machine, to predict the heart and diabetics diseases.
2.4.1. Adaptive boosting
The adaptive boosting algorithm has been widely used in classification [26]. In 1997, Freund and
Schapire proposed the adaptive boosting (AdaBoost) algorithm. This boosting technique helps weak learners
perform better using an ensemble approach. It improves the performance of the classifier when used alongside
different algorithm types. Adaptative boosting is exceptionally robust to noise and outliers in data.
4. ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 4, December 2023: 1687-1694
1690
2.4.2. Gradient boosting
Gradient boosting (GB) has typically solved the regression and classification problems. The prediction
model constructs through a set of decision trees constructed stage-by-stage. Decision trees are frequently used
to accomplish Gradient boosting. The primary benefit of Gradient boosting is that it reduces the remaining
preceding time in each calculation. In terms of generalization, the GB performs well as an ensemble classifier.
The GB implements a regularisation term, regulating the model's complexity and preventing overfitting [27].
2.4.3. Extreme gradient boosting
The XGBoost algorithm is said to be a robust boosting algorithm. It's a more advanced variant of the
Gradient boosting method that was introduced to predict the errors or residuals from prior models, and then the
new model is blended with the old. This model controls overfitting, eliminating interference with outliers and
making the model more accurate and stable [28].
2.5. Classification accuracy
The proposed system used a boosting classifier such as Adaboost, XGBoost, and Gradient boost to
predict the disease. One of the essential performance indicators for categorization is accuracy. It says that the
percentage of the total sample is correctly classified. A classification report performs all the boosting classifiers
to produce accurate results. These classifiers combine to assess their performance based on classification
accuracy [29]. The accuracy measure assesses the ability of a model to predict the future. The given (1)
represents the formula for accurate classification.
FN
FP
TN
TP
TN
TP
Accuracy
+
+
+
+
=
(1)
Where,
− TP: The classifier predicted TRUE, which was the correct class in the case of true positive.
− TN: In the case of real negatives, the classifier predicted FALSE, and it was the suitable class.
− FP: When there are false positives (FP), the classifier predicts TRUE, and the correct class is FALSE.
− FN: Models predict false when they have diseases in the case of false negatives.
3. RESULTS AND DISCUSSION
The proposed system utilizes the same dataset to diagnose heart and diabetic ailments. The dataset
contains 12 independent and one dependent feature. The features included in this dataset are id, age, sex,
weight, height, gender, blood pressure (both systolic and diastolic), cholesterol, glucose, smoking, alcohol, and
physical activity. It has come to be known that the given dataset can also predict diabetic disease by finding
separate target variables. Algorithm 1 shows that the target feature of a person with diabetes has been evaluated
based on the range of glucose values. In the target attribute, the value is 1 for patients with diabetes, whereas
0 is the value for those who do not have diabetes. The given dataset contains two target attributes: diabetes and
heart target. The proposed work develops a prediction model that takes the symptoms from the user and predicts
the heart and diabetics diseases.
Algorithm 1. Finding target features for diabetic
Require: Input: Health Care dataset.
for ∀ glucose feature do
if data(value)>range
dia_target=1
else
dia_target=0
end if
end for
The proposed work uses the Correlation and Chi-square selection method to select features after data
preprocessing. A heat map represents the correlation between the target and other features and shows the
relationship between the features. Figure 2 and Figure 3 uses a heat map to highlight the correlation between
the dataset's attributes in both predictions. This heat map represents values as colors in a two-dimensional
representation. In one glance, it provides a quick visual summary of data. The viewer can easily comprehend
complex datasets using more elaborate heat maps. The feature method increases the classification accuracy.
According to the principle of feature importance, all the features have a score value that determines
the extent. Figure 4 describes the most significant feature for prediction based on the feature importance
5. Int J Artif Intell ISSN: 2252-8938
Machine learning approach for predicting heart and diabetes diseases using data-driven … (Usha Sekar)
1691
generated by the filter method. The subset of features used to predict both diseases is different in this work.
This works estimated the most significant features as sysbp, glucose, age, chol, ciger for heart disease and
api_hi, weight, api, age, and cholesterol for diabetic disease. The highest rank helps to select the features to
predict heart and diabetes disease based on the importance score.
Figure 2. Correlation features of diabetic’s disease
Figure 3. Correlation features of heart disease
6. ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 4, December 2023: 1687-1694
1692
In the final phase, the methodology provides a better accuracy by boosting classifiers, including
AdaBoosting, Gradient Boosting, and XGBoosting. These results were obtained by combining selected
parameters (by using chi-square) with PCA to devise the best classifiers to diagnose the disease. PCA reduces
the dimensionality of the input and lowers computation complexity, and speeds up the training process by
applying the principle component analysis to the input features. In this study Figure 5 depicts the most
significant accuracy outcomes of heart and diabetics diseases.
Figure 4. Feature importance of heart and diabetics disease
Figure 5. Classification accuracy of heart and diabetic disease
The result section compares all boosting classifiers in both diseases' predictions. The classification
results are shown in Table 1. The table shows the classification results for heart disease and people with
diabetes using the boosting classifier and concluded that Extreme Gradient boosting (XGBoosting) performed
well and produced the highest accuracy value.
Table 1. Classification result
Classifier
Accuracy Score
Heart Disease Diabetic Disease
Adaptive Boosting 71.4 94.09
Gradient Boosting 70.9 95.2
Extreme Gradient Boosting 71.9 96.48
4. CONCLUSION
This study aims to develop a dependable and accurate predictive model for heart and diabetic disease.
It has used a single dataset for predicting heart and diabetic disease. The given dataset has both target variables
for heart and diabetic diseases. Filter method Chi-square selected the feature to diagnose disease. PCA was
7. Int J Artif Intell ISSN: 2252-8938
Machine learning approach for predicting heart and diabetes diseases using data-driven … (Usha Sekar)
1693
used to extract the features. The three ensemble boosting classifiers: are Adaboost, Gradient boost, and
XGBoost. Results showed that XGBoost provides a higher accuracy value than other boosting algorithms in
both disease predictions. Future work needs a better performance metric value by implementing a hybrid model
for both diseases.
ACKNOWLEDGEMENTS
I am pleased to thank my research supervisor Dr. S. Kanchana for her guidance and enthusiastic
encouragement of my research work. This article receives no financial support for its research, authorship,
and/or publication.
REFERENCES
[1] A. K. Yadav, R. Shukla, and T. R. Singh, “Machine learning in expert systems for disease diagnostics in human healthcare,”
Machine Learning, Big Data, and IoT for Medical Informatics, pp. 179–200, 2021, doi: 10.1016/B978-0-12-821777-1.00022-7.
[2] P. G. Shynu, V. G. Menon, R. L. Kumar, S. Kadry, and Y. Nam, “Blockchain-based secure healthcare application for diabetic-
cardio disease prediction in fog computing,” IEEE Access, vol. 9, pp. 45706–45720, 2021, doi: 10.1109/ACCESS.2021.3065440.
[3] K. Burse, V. P. S. Kirar, A. Burse, and R. Burse, “Various preprocessing methods for neural network based heart disease prediction,”
Advances in Intelligent Systems and Computing, vol. 851, pp. 55–65, 2019, doi: 10.1007/978-981-13-2414-7_6.
[4] P. Priyanga, V. V. Pattankar, and S. Sridevi, “A hybrid recurrent neural network-logistic chaos-based whale optimization framework
for heart disease prediction with electronic health records,” Computational Intelligence, vol. 37, no. 1, pp. 315–343, 2021,
doi: 10.1111/coin.12405.
[5] A. Elumalai, P. B. Maruthi, N. Gautam, S. Priyadharshini, and M. Suganthy, “RETRACTED ARTICLE: Optimal prediction of
attacks and arterial stiffness effects on heart disease by hybrid machine learning algorithm,” Journal of Ambient Intelligence and
Humanized Computing, vol. 13, p. 83, 2022, doi: 10.1007/s12652-020-02706-4.
[6] S. Uddin, A. Khan, M. E. Hossain, and M. A. Moni, “Comparing different supervised machine learning algorithms for disease
prediction,” BMC Medical Informatics and Decision Making, vol. 19, no. 1, 2019, doi: 10.1186/s12911-019-1004-8.
[7] A. M. Khedr, Z. Al Aghbari, A. Al Ali, and M. Eljamil, “An efficient association rule mining from distributed medical databases
for predicting heart diseases,” IEEE Access, vol. 9, pp. 15320–15333, 2021, doi: 10.1109/ACCESS.2021.3052799.
[8] A. K. Dubey, “Optimized hybrid learning for multi disease prediction enabled by lion with butterfly optimization algorithm,”
Sadhana - Academy Proceedings in Engineering Sciences, vol. 46, no. 2, 2021, doi: 10.1007/s12046-021-01574-8.
[9] R. Manne and S. C. Kantheti, “Application of artificial intelligence in healthcare: chances and challenges,” Current Journal of
Applied Science and Technology, pp. 78–89, 2021, doi: 10.9734/cjast/2021/v40i631320.
[10] R. C. Ripan et al., “A data-driven heart disease prediction model through k-means clustering-based anomaly detection,” SN
Computer Science, vol. 2, no. 2, 2021, doi: 10.1007/s42979-021-00518-7.
[11] R. Kumar and P. Rani, “Comparative analysis of decision support system for heart disease,” Advances in Mathematics: Scientific
Journal, vol. 9, no. 6, pp. 3349–3356, 2020, doi: 10.37418/amsj.9.6.15.
[12] U. Ahmed et al., “Prediction of diabetes empowered with fused machine learning,” IEEE Access, vol. 10, pp. 8529–8538, 2022,
doi: 10.1109/ACCESS.2022.3142097.
[13] L. Men, N. Ilk, X. Tang, and Y. Liu, “Multi-disease prediction using LSTM recurrent neural networks,” Expert Systems with
Applications, vol. 177, 2021, doi: 10.1016/j.eswa.2021.114905.
[14] M. N. Uddin and R. K. Halder, “An ensemble method based multilayer dynamic system to predict cardiovascular disease using
machine learning approach,” Informatics in Medicine Unlocked, vol. 24, 2021, doi: 10.1016/j.imu.2021.100584.
[15] C. Guo, J. Zhang, Y. Liu, Y. Xie, Z. Han, and J. Yu, “Recursion enhanced random forest with an improved linear model (RERF-
ILM) for heart disease detection on the internet of medical things platform,” IEEE Access, vol. 8, pp. 59247–59256, 2020,
doi: 10.1109/ACCESS.2020.2981159.
[16] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE
Access, vol. 7, pp. 81542–81554, 2019, doi: 10.1109/ACCESS.2019.2923707.
[17] M. K. Hasan, M. A. Alam, D. Das, E. Hossain, and M. Hasan, “Diabetes prediction using ensembling of different machine learning
classifiers,” IEEE Access, vol. 8, pp. 76516–76531, 2020, doi: 10.1109/ACCESS.2020.2989857.
[18] F. Z. Abdeldjouad, M. Brahami, and N. Matta, “A hybrid approach for heart disease diagnosis and prediction using machine learning
techniques,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 12157 LNCS, pp. 299–306, 2020, doi: 10.1007/978-3-030-51517-1_26.
[19] M. Fayez and S. Kurnaz, “RETRACTED ARTICLE: Novel method for diagnosis diseases using advanced high-performance
machine learning system (Applied Nanoscience, (2023), 13),” Applied Nanoscience (Switzerland), vol. 13, no. 3, p. 1787, 2023,
doi: 10.1007/s13204-021-01990-6.
[20] N. Nai-Arun and R. Moungmai, “Comparison of classifiers for the risk of diabetes prediction,” Procedia Computer Science, vol.
69, pp. 132–142, 2015, doi: 10.1016/j.procs.2015.10.014.
[21] S. Perveen, M. Shahbaz, A. Guergachi, and K. Keshavjee, “Performance analysis of data mining classification techniques to predict
diabetes,” Procedia Computer Science, vol. 82, pp. 115–121, 2016, doi: 10.1016/j.procs.2016.04.016.
[22] M. Panda, D. P. Mishra, S. M. Patro, and S. R. Salkuti, “Prediction of diabetes disease using machine learning algorithms,” IAES
International Journal of Artificial Intelligence, vol. 11, no. 1, pp. 284–290, 2022, doi: 10.11591/ijai.v11.i1.pp284-290.
[23] M. Abedini, A. Bijari, and T. Banirostam, “Classification of Pima Indian diabetes dataset using ensemble of decision tree, logistic
regression and neural network,” Ijarcce, vol. 9, no. 7, pp. 1–4, 2020, doi: 10.17148/ijarcce.2020.9701.
[24] E. Nasarian et al., “Association between work-related features and coronary artery disease: A heterogeneous hybrid feature selection
integrated with balancing approach,” Pattern Recognition Letters, vol. 133, pp. 33–40, 2020, doi: 10.1016/j.patrec.2020.02.010.
[25] B. A. Tama and K. H. Rhee, “Tree-based classifier ensembles for early detection method of diabetes: an exploratory study,”
Artificial Intelligence Review, vol. 51, no. 3, pp. 355–370, 2019, doi: 10.1007/s10462-017-9565-3.
[26] Y. Wang and L. Feng, “An adaptive boosting algorithm based on weighted feature selection and category classification confidence,”
Applied Intelligence, vol. 51, no. 10, pp. 6837–6858, 2021, doi: 10.1007/s10489-020-02184-3.
8. ISSN: 2252-8938
Int J Artif Intell, Vol. 12, No. 4, December 2023: 1687-1694
1694
[27] P. Theerthagiri and J. Vidya, “Cardiovascular disease prediction using recursive feature elimination and gradient boosting
classification techniques,” Expert Systems, vol. 39, no. 9, 2022, doi: 10.1111/exsy.13064.
[28] H. Jiang et al., “Machine learning-based models to support decision-making in emergency department triage for patients with
suspected cardiovascular disease,” International Journal of Medical Informatics, vol. 145, 2021,
doi: 10.1016/j.ijmedinf.2020.104326.
[29] D. Ananey-Obiri and E. Sarku, “Predicting the presence of heart diseases using comparative data mining and machine learning
algorithms,” International Journal of Computer Applications, vol. 176, no. 11, pp. 17–21, 2020, doi: 10.5120/ijca2020920034.
BIOGRAPHIES OF AUTHORS
S. Usha received the B.Sc. & MCA. degree, respectively, from Madurai Kamaraj
Univeristy. She has worked as an Assistant Professor for 12 yrs in SRM Institute of Science
& Technology. Now, currently she is pursuing Ph.D as Full Time Research Scholar in
Department of Computer Science, SRM Institute of Science & Technology, Kattankulathur,
Chennai, India. Her research area includes Image Processing, Data Mining, Cloud
Computing, Machine Learning, and Deep Learning. She has published a paper in
International journal and presented paper in national and international conference. She can
be contacted at email: us3648@srmist.edu.in
Dr. S. Kanchana Working as an Assistant Professor in the Department of
Computer Science at SRM Institute of Science and Technology, Chennai. She obtained her
Ph.D degree from Bharathiar University. She has published more than 15 research papers in
National and International Journals and presented paper in a Conferences. She has received
Best poster Presentation Award in ISCA-2015. Her research interest includes Data Mining,
Machine Learning, IOT, and Cloud Computing. She can be contacted at email:
kanchans@srmist.edu.in