This paper presents an optimized stacking-based hybrid machine learning approach for predicting early-stage diabetes mellitus (DM) using the PIMA Indian diabetes (PID) dataset and the early-stage diabetes risk prediction (ESDRP) dataset. The methodology involves handling missing values through mean imputation, balancing the dataset using the synthetic minority over-sampling technique (SMOTE), normalizing features, and employing a stratified train-test split. Logistic regression (LR), naïve Bayes (NB), AdaBoost with support vector machines (AdaBoost+SVM), an artificial neural network (ANN), and k-nearest neighbors (k-NN) are used as base learners (level 0), while a random forest (RF) meta-classifier serves as the level 1 model that combines their predictions. The proposed model achieves accuracies of 99.7222% on the ESDRP dataset and 94.2085% on the PID dataset, surpassing existing literature by absolute differences ranging from 10.2085 to 16.7222 percentage points. The stacking-based hybrid model offers advantages for early-stage DM prediction by leveraging multiple base learners and a meta-classifier. SMOTE addresses class imbalance, while feature normalization ensures fair treatment of features during training. The findings suggest that the proposed approach holds promise for early-stage DM prediction, enabling timely interventions and preventive measures.
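The SMOTE balancing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote`, the neighbour count `k`, and the toy minority samples are all assumptions made for illustration. The idea shown is only the core one, namely that each synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; real projects would normally use a
    maintained library implementation instead.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Made-up two-feature minority samples, for illustration only.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the range spanned by the minority class. To avoid leaking information into the evaluation, oversampling is usually applied to the training split only.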
DIABETES PREDICTOR USING ENSEMBLE TECHNIQUE | IRJET Journal
This document describes a study that developed an ensemble machine learning model to predict diabetes using the Pima Indian Diabetes dataset. The study used various machine learning algorithms like decision trees, random forest, SVM, and multilayer perceptron. It then proposed weighting and integrating the outputs of these models to improve diabetes prediction performance, where weights were calculated based on each model's AUC, F1 score, accuracy, and recall on the classification task. The models were evaluated using cross-validation on the Pima Indian Diabetes dataset under the same parameter settings. Previous literature that used machine learning techniques for diabetes prediction is also reviewed.
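The weighting idea described above can be sketched as follows. The summary does not give the exact weighting formula, so this sketch assumes one plausible choice: each model's weight is the mean of its AUC, F1 score, accuracy, and recall, normalized so the weights sum to one. The function names and all metric values are hypothetical.

```python
def metric_weights(scores):
    """Turn per-model metric scores into normalized ensemble weights.

    Assumption: a model's raw weight is the plain mean of its metrics.
    """
    raw = {name: sum(m.values()) / len(m) for name, m in scores.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-model positive-class probabilities into one prediction."""
    combined = sum(weights[name] * p for name, p in probabilities.items())
    return combined, int(combined >= threshold)


# Made-up validation metrics, for illustration only.
scores = {
    "decision_tree": {"auc": 0.70, "f1": 0.60, "accuracy": 0.70, "recall": 0.60},
    "random_forest": {"auc": 0.80, "f1": 0.70, "accuracy": 0.80, "recall": 0.70},
}
weights = metric_weights(scores)

# Made-up probabilities that one patient is diabetic, one per model.
combined, label = weighted_vote(
    {"decision_tree": 0.40, "random_forest": 0.70}, weights
)
```

With this scheme a better-scoring model pulls the combined probability toward its own output, while a weaker model still contributes.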
Machine learning approach for predicting heart and diabetes diseases using da... | IAESIJAI
This document describes a study that uses machine learning techniques to predict heart disease and diabetes from medical data. The study collected data from a public repository and preprocessed it to handle missing values. Feature selection was performed using chi-square and principal component analysis to identify important features. Three boosting classifiers - Adaptive boosting, Gradient boosting, and Extreme Gradient boosting - were trained on the data and evaluated based on accuracy. The results showed that the boosting classifiers achieved accurate prediction for both heart disease and diabetes, with the highest accuracy reported for specific classifiers and diseases.
FORESTALLING GROWTH RATE IN TYPE II DIABETIC PATIENTS USING DATA MINING AND A... | IAEME Publication
The race for urbanization and the desire for a high standard of living lead to unhealthy lifestyles. As a result, the number of diabetic patients in urban areas is growing rapidly and approaching a critical level. In this situation, it becomes a prime necessity for physicians and health workers to estimate accurately the growth rate in the number of diabetic patients. An Artificial Neural Network is used as one artificial intelligence technique for forecasting the growth rate of type II diabetic patients. Diabetes occurs due to an increased level of glucose in the blood. This paper presents an extensive survey of the prediction of Type II diabetes using different data mining tools and Artificial Neural Network techniques. The survey aims to identify and propose an effective technique for earlier prediction of Type II diabetes. Data mining techniques such as the C4.5 classifier, Support Vector Machine and K-Nearest Neighbour are compared with the Artificial Neural Network. In the results, the Artificial Neural Network achieved an accuracy of 89%.
An Empirical Study On Diabetes Mellitus Prediction For Typical And Non-Typica... | Scott Faria
The document presents a study that uses machine learning approaches to predict diabetes for both typical and non-typical cases. Three machine learning algorithms (Bagging, Logistic Regression, Random Forest) were applied to a dataset of 340 patients with 26 features, and their accuracy was measured. Random Forest performed best with an accuracy of 90.29%, followed by Bagging at 89.12% and Logistic Regression at 83.24%.
IRJET - E-Health Chain and Anticipation of Future Disease | IRJET Journal
The document proposes an E-Health Chain system that uses machine learning algorithms to predict future diseases and maintain electronic health records more efficiently than traditional paper-based systems. Key features include digital prescriptions that allow patients to purchase medicines using a unique ID, electronic health records that give doctors access to patient data from remote locations, emergency response features like ambulance dispatch, and medication reminders for patients. The system aims to help patients take preventive measures by predicting future diseases using algorithms trained on historical health data. A literature review covers previous research on disease prediction using decision trees, neural networks, and other machine learning methods applied to medical datasets.
An efficient stacking based NSGA-II approach for predicting type 2 diabetes | IJECEIAES
Diabetes is a well-known risk factor for renal and cardiovascular disorders and stroke, and it causes considerable morbidity in society. Reducing the prevalence of the disease will provide substantial benefits to the community and lessen the burden on the public health care system. Numerous data mining approaches have been used so far to detect the disease. Incorporating machine learning now helps in constructing faster, more accurate and more reliable models, and researchers use several methods based on ensemble classifiers for the prediction of diabetes. The proposed framework for predicting diabetes mellitus employs a stacking-based ensemble that uses the non-dominated sorting genetic algorithm (NSGA-II). The primary objective of the work is to develop a more accurate prediction model that reduces the lead time, i.e., the time between the onset of diabetes and clinical diagnosis. The proposed NSGA-II stacking approach has been compared with Boosting, Bagging, Random Forest and the Random Subspace method, and it outperformed these conventional ensemble methods. It was also noted that k-nearest neighbors (KNN) performs better than a decision tree as the stacking combiner.
This document describes an advanced machine learning approach for predicting skin cancer. It discusses using machine learning algorithms like Naive Bayes, Decision Tree, Random Forest on a dataset to estimate disease risk and determine algorithm accuracy. The paper focuses on developing a system that integrates symptom and medical data using machine learning algorithms like K-means to provide accurate disease predictions.
DIAGNOSIS OF OBESITY LEVEL BASED ON BAGGING ENSEMBLE CLASSIFIER AND FEATURE S... | ijaia
In the current era, the amount of data generated from various device sources and business transactions is rising exponentially, and current machine learning techniques are not feasible for handling this massive volume of data. Two commonly adopted schemes exist to solve such issues: scaling up the data mining algorithms and data reduction. Scaling the data mining algorithms is not the best way, but data reduction is feasible. There are two approaches to reducing datasets: selecting an optimal subset of features from the initial dataset, or eliminating those features that contribute less information. Overweight and obesity are increasing worldwide, and forecasting future overweight or obesity could help intervention. Our primary objective is to find the optimal subset of features to diagnose obesity. This article proposes adapting a bagging algorithm based on filter-based feature selection to improve the prediction accuracy of obesity with a minimal number of features. We utilized several machine learning algorithms for classifying the obesity classes and several filter feature selection methods to maximize classifier accuracy. Based on the results of the experiments, the Pairwise Consistency and Pairwise Correlation techniques are shown to be promising tools for feature selection with respect to the quality of the obtained feature subset and computational efficiency. Analysis of the results obtained from the original and modified datasets showed improved classification accuracy and established a relationship between obesity/overweight and common risk factors such as weight, age, and physical activity patterns.
GRC-MS: A GENETIC RULE-BASED CLASSIFIER MODEL FOR ANALYSIS OF MASS SPECTRA DATA | cscpconf
Many studies use different data mining techniques to analyze mass spectrometry data and extract useful knowledge about biomarkers. These biomarkers allow medical experts to determine whether an individual has a disease or not. Some of these studies have proposed models that obtained high accuracy. However, the black-box nature and complexity of the proposed models have posed significant issues. Thus, to address this problem and build an accurate model, we use a genetic algorithm for feature selection along with a rule-based classifier, namely the Genetic Rule-Based Classifier algorithm for Mass Spectra data (GRC-MS). According to the literature, rule-based classifiers provide understandable rules but are not accurate. In addition, genetic algorithms have achieved excellent results when used with different classifiers for feature selection. Experiments are conducted on a real dataset, and the proposed classifier GRC-MS achieves 99.7% accuracy. In addition, the generated rules are more understandable than those of other classifier models.
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU... | IAEME Publication
Machine learning algorithms play a vital role in the prediction of many diseases such as heart disease, diabetes, cancer, and lung disease. Applying machine learning algorithms to the healthcare domain relieves the burden on physicians, as it is impractical to scan manually all the data collected over a period of time in order to arrive at valuable information. Machine learning algorithms learn from the training dataset and become capable of thinking like a human. Once an algorithm completes its learning on the training dataset, it can automatically predict the target output label of any unseen data. In this work, predicting diabetes using machine learning algorithms has been taken up. A conceptual architecture has been proposed based on big data architecture.
Improving the performance of k nearest neighbor algorithm for the classificat... | IAEME Publication
The document discusses improving the performance of the k-nearest neighbor (kNN) algorithm for classifying diabetes datasets with missing values. It first provides background on diabetes and challenges with missing data. It then describes various data preprocessing techniques used to handle missing values, including mean imputation. The document outlines the kNN classification algorithm and metrics like accuracy and error rate to evaluate performance. It applies these techniques to the Pima Indian diabetes dataset and finds that imputing missing values along with suitable preprocessing like normalization increases classification accuracy compared to ignoring missing values or imputation alone.
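The preprocessing chain summarized above (mean imputation, then normalization, then kNN) can be sketched in a few lines. This is a minimal illustration, not the document's code: the helper names, the two-feature rows, and the labels are made up, and missing values are assumed to be encoded as `None`.

```python
from collections import Counter
from math import dist


def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    cols = list(zip(*rows))
    means = [
        sum(v for v in c if v is not None) / sum(v is not None for v in c)
        for c in cols
    ]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def min_max(rows):
    """Rescale every column to [0, 1] so no feature dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training rows closest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up rows (two features per patient); None marks a missing value.
rows = [[85.0, 22.0], [90.0, None], [160.0, 35.0], [None, 33.0], [150.0, 36.0]]
labels = [0, 0, 1, 1, 1]

imputed = impute_mean(rows)
scaled = min_max(imputed)
prediction = knn_predict(scaled, labels, scaled[2], k=3)
```

For brevity the sketch computes the imputation means and scaling ranges on all rows. In an actual evaluation these statistics would be estimated on the training split only and then applied to the test split.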
Efficiency of Prediction Algorithms for Mining Biological Databases | IOSR Journals
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
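The evaluation measures named above follow directly from the confusion matrix. The sketch below, with hypothetical helper names and made-up labels, computes them and pairs them with a ZeroR baseline, which always predicts the majority class of the training labels.

```python
from collections import Counter


def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and predictive values for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # None marks an undefined ratio (empty denominator).
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def zero_r(train_labels):
    """ZeroR baseline: ignore the features and always predict the majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: majority


# Made-up labels: 3 positive and 7 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
baseline = zero_r(y_true)
y_pred = [baseline(None) for _ in y_true]
metrics = confusion_metrics(y_true, y_pred)
```

On this toy data ZeroR reaches 70% accuracy while detecting no positive case at all, which is why accuracy alone is a weak basis for comparing classifiers. The sensitivity and predictive values also depend on which class is designated as positive.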
Hybrid filtering methods for feature selection in high-dimensional cancer data | IJECEIAES
Statisticians in both academia and industry have encountered problems with high-dimensional data. The rapid increase in features has caused the feature count to outstrip the instance count. There are several established methods for selecting features from massive amounts of breast cancer data. Even so, overfitting continues to be a problem. The challenge of choosing important features with minimum loss across different sample sizes is another area with room for development. As a result, the feature selection technique is crucial for dealing with high-dimensional data classification issues. This paper proposes a new architecture for high-dimensional breast cancer data using filtering techniques and a logistic regression model. Essential features are selected using a combination of hybrid chi-square and hybrid information gain (hybrid IG), with logistic regression as the classifier. The results showed that hybrid IG performed best for high-dimensional breast and prostate cancer data. The top 50 and top 22 features outperformed the other configurations, with the highest classification accuracies of 86.96% and 82.61%, respectively, after integrating the hybrid information gain and the logistic function (hybrid IG+LR) with a sample size of 75. In the future, multiclass classification of multidimensional medical data is to be evaluated using data from a different domain.
Prediction of the risk of developing heart disease using logistic regression | IJECEIAES
Heart disease (HD) accounts for more deaths every year than other illnesses. The World Health Organization (WHO) estimated 17.9 million deaths caused by heart disease in 2016, representing 31% of all global deaths. Three-quarters of these deaths occur in low- and middle-income nations. Machine learning (ML), owing to its precision in pattern recognition and classification, proves effective in complementing decision-making and risk prediction from the large volume of HD data created by the healthcare sector. Thus, this study aims to develop a logistic regression model (LRM) for predicting the risk of getting HD in ten years. The study explores different methodologies for improving the performance of the base LRM in predicting whether or not a person gets HD after ten years. The results demonstrate the capability of the LRM in predicting the risk of getting HD after ten years. The LRM achieves 97.35% accuracy with recursive feature elimination and random under-sampling. This implies that the LRM can play an important role in precautionary methods to avoid the risk of HD.
IRJET- Diabetes Diagnosis using Machine Learning Algorithms | IRJET Journal
This document presents research on using machine learning algorithms to diagnose diabetes. The researchers collected a dataset of 15,000 patient records from the National Institute of Diabetes and Digestive and Kidney Diseases. They analyzed the dataset and used machine learning algorithms like decision trees, naive Bayes, support vector machines, and k-nearest neighbors to build predictive models. The models were evaluated based on accuracy and other performance metrics. The naive Bayes classifier achieved the highest accuracy of 72% in predicting whether patients had diabetes. The research aims to develop a machine learning system that can predict diabetes early to help treat patients before the disease becomes critical.
"Predictive Modelling for Overweight and Obesity: Harnessing Machine Learning... | IRJET Journal
This document discusses various machine learning methods that can be used to predict obesity, including logistic regression, linear regression, decision trees, k-nearest neighbors (KNN), XGBoost, and neural networks. It provides examples of how each method works and some advantages and disadvantages. Specifically, it describes how logistic regression and linear regression can be applied to predict obesity risk based on characteristics like BMI, waist circumference, diet, activity level, and genetics. Decision trees and KNN are also explained as methods that can incorporate multiple risk factors to make predictions. The document emphasizes that no single model is perfect for predicting obesity due to its complex causes, and that combining methods may improve accuracy.
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se... | IJARIIT
Two approaches to building models for prediction of the onset of Type diabetes mellitus in juvenile subjects were examined. A set of tests performed immediately before diagnosis was used to build classifiers to predict whether the subject would be diagnosed with juvenile diabetes. A modified training set consisting of differences between test results taken at different times was also used to build classifiers for the same prediction. Supervised and unsupervised approaches, including decision trees, were compared for both types of classifiers. In this study, the system recommends the test most likely to confirm a diagnosis, based on the pre-test probability computed from the patient's information, including symptoms and the results of previous tests. If the patient's post-test probability of disease is higher than the treatment threshold, a diagnostic decision is made, and vice versa. Otherwise, the patient needs more tests to support a decision; the system then recommends the next optimal test and repeats the same process. This thesis investigates which approach performs better on a diabetes dataset in the Weka framework. It also uses feature selection techniques, which reduce the number of features and the complexity of the process.
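The threshold-based decision loop described above can be sketched as follows. The summary does not say how the post-test probability is computed, so this sketch assumes the standard Bayesian update in odds form (post-test odds equal pre-test odds times the likelihood ratio of the test result). The function names, both thresholds, and all numbers are hypothetical.

```python
def post_test_probability(pre_test, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pre-test odds * likelihood ratio."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def next_step(pre_test, likelihood_ratio, treat_threshold, rule_out_threshold):
    """Decide whether the evidence suffices or another test is needed."""
    p = post_test_probability(pre_test, likelihood_ratio)
    if p >= treat_threshold:
        return p, "diagnose"
    if p <= rule_out_threshold:
        return p, "rule out"
    return p, "order another test"


# Made-up numbers: 20% pre-test probability, positive result with likelihood ratio 8.
p, decision = next_step(
    0.20, 8.0, treat_threshold=0.60, rule_out_threshold=0.05
)
```

If neither threshold is crossed, the post-test probability becomes the pre-test probability for the next recommended test, which reproduces the repeated process described in the summary.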
PREDICTION OF DIABETES MELLITUS USING MACHINE LEARNING TECHNIQUES | IAEME Publication
Diabetes mellitus is a common disease caused by a set of metabolic disorders in which blood sugar levels remain very high over a prolonged period. It affects diverse organs of the human body and thereby harms many of the body's systems, in particular the blood vessels and nerves. Accurate early prediction of the disease can save human lives. To achieve this goal, this research work mainly explores numerous factors associated with the disease using machine learning techniques. Machine learning methods provide an effective way to extract knowledge by building predictive models from diagnostic medical datasets collected from diabetic patients. Mining knowledge from such data can be valuable for predicting diabetic patients. In this research, six popular machine learning techniques, namely Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), C4.5 Decision Tree (DT), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), are compared in order to identify the best-performing technique for forecasting diabetes mellitus. Our results show that Support Vector Machine (SVM) achieved higher accuracy than the other machine learning techniques.
The healthcare industry holds large and complex data that may be needed to discover interesting disease patterns and make effective decisions with the help of different machine learning techniques. Advanced data mining techniques are used to discover knowledge in databases and for medical research. This paper analyzes prediction systems for diabetes, kidney and liver disease using a larger number of input attributes. The data mining classification techniques Support Vector Machine (SVM) and Random Forest (RF) are analyzed on diabetes, kidney and liver disease databases. The performance of these techniques is compared based on precision, recall, accuracy, F-measure and execution time. As a result of the study, the proposed algorithm is designed using the SVM and RF algorithms, and the experimental results show accuracies of 99.35%, 99.37% and 99.14% on diabetes, kidney and liver disease respectively.
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS | ijcsit
The document describes a predictive data mining algorithm for medical diagnosis that uses support vector machine (SVM) and random forest (RF) algorithms. It analyzes diabetes, kidney, and liver disease databases using these techniques. The proposed algorithm applies SVM and RF to the datasets and achieves high prediction accuracies of 99.35%, 99.37%, and 99.14% for diabetes, kidney, and liver diseases respectively. It also compares the performance of SVM and RF based on metrics like precision, recall, accuracy, and execution time.
A comprehensive study on disease risk predictions in machine learning | IJECEIAES
Over recent years, multiple disease risk prediction models have been developed. These models use various patient characteristics to estimate the probability of outcomes over a certain period of time and hold the potential to improve decision making and individualize care. Discovering hidden patterns and interactions in medical databases, together with growing evaluation of disease prediction models, has become crucial. Traditional clinical findings require many trials, which can complicate disease prediction. This paper presents a comprehensive study of the different strategies used to predict disease. Applying these techniques to healthcare data has improved risk prediction models for identifying patients who would benefit from disease management programs that reduce hospital readmission and healthcare cost, but the results of these endeavors have been mixed.
Big data analytics in health care by data mining and classification techniques | ssuserc491ef2
This document summarizes a research article that proposes using big data analytics and data mining techniques like association rule mining and classification to analyze medical data from diabetes patients. The researchers apply an apriori algorithm within a MapReduce framework to discover associations between diseases and symptoms using a diabetes dataset from the UCI machine learning repository containing 50 attributes. The results of their proposed method are evaluated based on metrics like precision, accuracy, recall, and F-score.
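The association-rule step summarized above rests on two quantities: the support of an itemset and the confidence of a rule. The following single-machine sketch of the level-wise Apriori idea omits the MapReduce distribution entirely. The function names, the size cap `max_size`, and the toy symptom records are assumptions for illustration and are not taken from the dataset used in the article.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Level-wise search: keep itemsets whose support reaches min_support."""
    n = len(transactions)
    frequent = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates and size <= max_size:
        # Support = fraction of transactions containing the whole candidate.
        support = {c: sum(c <= t for t in transactions) / n for c in candidates}
        level = {c: s for c, s in support.items() if s >= min_support}
        frequent.update(level)
        # Join step: a larger candidate is kept only if all its subsets are frequent.
        items = set().union(*level) if level else set()
        size += 1
        candidates = {
            frozenset(c)
            for c in combinations(sorted(items), size)
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent = support(both) / support(antecedent)."""
    return frequent[antecedent | consequent] / frequent[antecedent]


# Made-up patient records, for illustration only.
transactions = [
    {"thirst", "fatigue", "diabetes"},
    {"thirst", "diabetes"},
    {"fatigue"},
    {"thirst", "fatigue", "diabetes"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rule_conf = confidence(frequent, frozenset({"thirst"}), frozenset({"diabetes"}))
```

In a MapReduce setting the counting inside the loop is the part that would typically be distributed, with mappers emitting candidate counts per data partition and reducers summing them.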
Big data analytics in health care by data mining and classification techniques | ssuserc491ef2
This document discusses using big data analytics and data mining techniques like hierarchical decision tree networks, association rule mining, and multiclass outlier classification to analyze medical data from diabetes patients. The proposed approach first uses MapReduce to process a large diabetes dataset from the UCI machine learning repository containing 50 attributes. It then applies a hierarchical decision tree network that uses decision trees and a hierarchical attention network. Next, it implements an association rule mining algorithm called Apriori to discover relationships between diseases and symptoms. Finally, it performs multiclass outlier classification based on the predictions from association rule mining. The goal is to accurately classify diabetes patient data and predict insulin needs based on attributes like medication and past patient records.
Improving of classification accuracy of cyst and tumor using local polynomial... TELKOMNIKA JOURNAL
This document presents a study that aims to improve the classification accuracy of cysts and tumors using dental panoramic images. The study proposes using a nonparametric regression model based on local polynomial estimation. Previous studies that used support vector machine methods achieved classification accuracies of 76.67% and 63.3%. The proposed method applies image processing, dimension reduction using GEFA, and classifies images as cysts or tumors using generalized additive models with local polynomial estimators. This approach achieved a classification accuracy of 90.91%, higher than the support vector machine method. Validation using Press's Q statistical test also showed the nonparametric regression model gave a significant result, indicating it improves classification accuracy over previous methods.
Diagnosis of rheumatoid arthritis using an ensemble learning approach csandit
Rheumatoid arthritis is a disease whose cause is still unknown; medical data mining can help in its early diagnosis and treatment. In this study, a predictive model is proposed for diagnosing rheumatoid arthritis. The rheumatoid arthritis dataset was collected from 2,564 patients referred to a rheumatology clinic. For each patient, a record consisting of several clinical and demographic features was saved. After data analysis and pre-processing, three different methods are combined to select suitable features. Various classification algorithms were applied to these features; among them, AdaBoost had the highest precision. In this paper, we propose a new classification algorithm, CS-Boost, that employs the cuckoo search algorithm to optimize the performance of AdaBoost. Experimental results show that CS-Boost improves the accuracy of AdaBoost in predicting rheumatoid arthritis.
DIAGNOSIS OF RHEUMATOID ARTHRITIS USING AN ENSEMBLE LEARNING APPROACH cscpconf
Rheumatoid arthritis is a disease whose cause is still unknown; medical data mining can help
in its early diagnosis and treatment. In this study, a predictive model is proposed for
diagnosing rheumatoid arthritis. The rheumatoid arthritis dataset was collected from 2,564
patients referred to a rheumatology clinic. For each patient, a record consisting of several
clinical and demographic features was saved. After data analysis and pre-processing, three
different methods are combined to select suitable features. Various classification algorithms
were applied to these features; among them, AdaBoost had the highest precision. In this paper,
we propose a new classification algorithm, CS-Boost, that employs the cuckoo search algorithm
to optimize the performance of AdaBoost. Experimental results show that CS-Boost improves the
accuracy of AdaBoost in predicting rheumatoid arthritis.
Performance evaluation of random forest with feature selection methods in pre... IJECEIAES
Data mining is the process of viewing data from different angles and compiling it into useful information. Recent improvements in data mining and machine learning have empowered biomedical research to improve the condition of general health care. Since wrong classification may lead to poor prediction, better classification is needed to improve the prediction rate on medical datasets. When data mining is applied to medical datasets, classification and prediction are the important and difficult challenges. In this work we evaluate the PIMA Indian Diabetes dataset from the UCI repository using a machine learning algorithm, random forest, together with feature selection methods such as forward selection and backward elimination based on an entropy evaluation method, using percentage split as the test option. The experiment was conducted on the RStudio platform and achieved a classification accuracy of 84.1%. From the results we can say that random forest predicts diabetes better than other techniques with fewer attributes, so that the least important tests for identifying diabetes can be avoided.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Embedded machine learning-based road conditions and driving behavior monitoring IJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Similar to Optimized stacking ensemble for early-stage diabetes mellitus prediction
GRC-MS: A GENETIC RULE-BASED CLASSIFIER MODEL FOR ANALYSIS OF MASS SPECTRA DATA cscpconf
Many studies use different data mining techniques to analyze mass spectrometry data and
extract useful knowledge about biomarkers. These biomarkers allow medical experts to
determine whether an individual has a disease or not. Some of these studies have proposed
models that have obtained high accuracy. However, the black-box nature and complexity of the
proposed models have posed significant issues. Thus, to address this problem and build an
accurate model, we use a genetic algorithm for feature selection along with a rule-based
classifier, namely the Genetic Rule-Based Classifier algorithm for Mass Spectra data (GRC-MS).
According to the literature, rule-based classifiers provide understandable rules but are not
accurate. In addition, genetic algorithms have achieved excellent results when used with
different classifiers for feature selection. Experiments are conducted on a real dataset, and
the proposed classifier GRC-MS achieves 99.7% accuracy. In addition, the generated rules are
more understandable than those of other classifier models.
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU... IAEMEPublication
Machine learning algorithms play a vital role in the prediction of many diseases such as heart disease, diabetes, cancer, and lung disease. Applying machine learning algorithms to the healthcare domain relieves the burden on physicians, as it is impractical to manually scan all the data collected over a period of time in order to arrive at valuable information. Machine learning algorithms learn from the training dataset and become capable of thinking like a human. Once the algorithm completes its learning on the training dataset, it can automatically predict the target output label of any unseen data. In this work, predicting diabetes using machine learning algorithms has been taken up. A conceptual architecture has been proposed based on big data architecture.
A CONCEPTUAL APPROACH TO ENHANCE PREDICTION OF DIABETES USING ALTERNATE FEATU... IAEME Publication
Machine learning algorithms play a vital role in the prediction of many diseases such as heart disease, diabetes, cancer, and lung disease. Applying machine learning algorithms to the healthcare domain relieves the burden on physicians, as it is impractical to manually scan all the data collected over a period of time in order to arrive at valuable information. Machine learning algorithms learn from the training dataset and become capable of thinking like a human. Once the algorithm completes its learning on the training dataset, it can automatically predict the target output label of any unseen data. In this work, predicting diabetes using machine learning algorithms has been taken up. A conceptual architecture has been proposed based on big data architecture.
Improving the performance of k nearest neighbor algorithm for the classificat... IAEME Publication
The document discusses improving the performance of the k-nearest neighbor (kNN) algorithm for classifying diabetes datasets with missing values. It first provides background on diabetes and challenges with missing data. It then describes various data preprocessing techniques used to handle missing values, including mean imputation. The document outlines the kNN classification algorithm and metrics like accuracy and error rate to evaluate performance. It applies these techniques to the Pima Indian diabetes dataset and finds that imputing missing values along with suitable preprocessing like normalization increases classification accuracy compared to ignoring missing values or imputation alone.
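The pipeline this summary describes (mean imputation, normalization, then kNN) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the helper names (`impute_mean`, `fit_min_max`, `apply_min_max`, `knn_predict`), the choice of min-max scaling as the normalization, and the two-column toy rows are hypothetical and are not the summarized document's code or the Pima Indian data.

```python
import math
from collections import Counter


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    columns = list(zip(*rows))
    means = []
    for col in columns:
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def fit_min_max(rows):
    """Return per-column minimum and maximum, learned from training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Scale each column to [0, 1] using previously fitted bounds."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query (Euclidean)."""
    nearest = sorted(zip((math.dist(x, query) for x in train_x), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical rows: [glucose, BMI]; None marks a missing value.
train_x = [[85, 22.0], [90, None], [160, 35.0], [170, 33.0], [None, 36.0]]
train_y = [0, 0, 1, 1, 1]

filled = impute_mean(train_x)
lows, highs = fit_min_max(filled)
scaled = apply_min_max(filled, lows, highs)
query = apply_min_max([[165, 34.0]], lows, highs)[0]
print(knn_predict(scaled, train_y, query, k=3))
```

Fitting the scaling bounds on the training rows and reusing them for the query keeps information from unseen data out of the preprocessing step; without scaling, the glucose column would dominate the distance simply because its numeric range is larger.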
Efficiency of Prediction Algorithms for Mining Biological Databases IOSR Journals
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
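The evaluation measures named in this summary all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not results from the summarized study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification measures from confusion-matrix counts.

    tp/fp: true/false positives, tn/fn: true/false negatives.
    A measure whose denominator is zero is reported as NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on the positive class
        "specificity": ratio(tn, tn + fp),   # recall on the negative class
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only.
for name, value in binary_metrics(tp=40, fp=10, tn=35, fn=15).items():
    print(f"{name}: {value:.3f}")
```

Note that a classifier which never predicts the positive class has no positive predictions at all, so its positive predictive value is undefined; the sketch reports that case as NaN rather than as a number.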
Hybrid filtering methods for feature selection in high-dimensional cancer data IJECEIAES
Statisticians in both academia and industry have encountered problems with high-dimensional data. The rapid increase in features has caused the feature count to outstrip the instance count. There are several established methods for selecting features from massive amounts of breast cancer data. Even so, overfitting continues to be a problem. The challenge of choosing important features with minimum loss across different sample sizes is another area with room for development. As a result, the feature selection technique is crucial for dealing with high-dimensional data classification issues. This paper proposed a new architecture for high-dimensional breast cancer data using filtering techniques and a logistic regression model. Essential features are filtered out using a combination of hybrid chi–square and hybrid information gain (hybrid IG) with logistic regression as classifier. The results showed that hybrid IG performed the best for high-dimensional breast and prostate cancer data. The top 50 and 22 features outperformed the other configurations, with the highest classification accuracies of 86.96% and 82.61%, respectively, after integrating the hybrid information gain and logistic function (hybrid IG+LR) with a sample size of 75. In the future, multiclass classification of multidimensional medical data is to be evaluated using data from a different domain.
Prediction of the risk of developing heart disease using logistic regression IJECEIAES
Heart disease (HD) accounts for more deaths every year than other illnesses. The World Health Organization (WHO) estimated 17.9 million deaths caused by heart disease in 2016, representing 31% of all deaths worldwide. Three-quarters of these deaths occur in low- and middle-income nations. Machine learning (ML), owing to its precision in pattern recognition and classification, proves effective in complementing decision-making and risk prediction from the huge amount of HD data created by the healthcare sector. Thus, this study aims to develop a logistic regression model (LRM) for predicting the risk of getting HD in ten years. The study explores different methodologies for improving the performance of the base LRM in predicting whether or not a person gets HD after ten years. The results demonstrate the capability of the LRM in predicting the risk of getting HD after ten years. The LRM achieves 97.35% accuracy with recursive feature elimination and random under-sampling. This implies that the LRM can play an important role in precautionary methods to avoid the risk of HD.
IRJET- Diabetes Diagnosis using Machine Learning Algorithms IRJET Journal
This document presents research on using machine learning algorithms to diagnose diabetes. The researchers collected a dataset of 15,000 patient records from the National Institute of Diabetes and Digestive and Kidney Diseases. They analyzed the dataset and used machine learning algorithms like decision trees, naive Bayes, support vector machines, and k-nearest neighbors to build predictive models. The models were evaluated based on accuracy and other performance metrics. The naive Bayes classifier achieved the highest accuracy of 72% in predicting whether patients had diabetes. The research aims to develop a machine learning system that can predict diabetes early to help treat patients before the disease becomes critical.
"Predictive Modelling for Overweight and Obesity: Harnessing Machine Learning...IRJET Journal
This document discusses various machine learning methods that can be used to predict obesity, including logistic regression, linear regression, decision trees, k-nearest neighbors (KNN), XGBoost, and neural networks. It provides examples of how each method works and some advantages and disadvantages. Specifically, it describes how logistic regression and linear regression can be applied to predict obesity risk based on characteristics like BMI, waist circumference, diet, activity level, and genetics. Decision trees and KNN are also explained as methods that can incorporate multiple risk factors to make predictions. The document emphasizes that no single model is perfect for predicting obesity due to its complex causes, and that combining methods may improve accuracy.
Diabetes Prediction by Supervised and Unsupervised Approaches with Feature Se... IJARIIT
Two approaches to building models for prediction of the onset of Type diabetes mellitus in juvenile subjects were examined. A set of tests performed immediately before diagnosis was used to build classifiers to predict whether the subject would be diagnosed with juvenile diabetes. A modified training set consisting of differences between test results taken at different times was also used to build classifiers to predict whether a subject would be diagnosed with juvenile diabetes. Supervised approaches, including decision trees, were compared with unsupervised approaches for both types of classifiers. In this study, the system selects the test most likely to confirm a diagnosis based on the pre-test probability computed from the patient's information, including symptoms and the results of previous tests. If the patient's post-test disease probability is higher than the treatment threshold, a diagnostic decision is made. Otherwise, the patient needs more tests to help make a decision; the system then recommends the next optimal test and repeats the same process. This thesis determines which approach performs better on a diabetes dataset in the Weka framework. Feature selection techniques are also used to reduce the number of features and the complexity of the process.
PREDICTION OF DIABETES MELLITUS USING MACHINE LEARNING TECHNIQUES IAEME Publication
Diabetes mellitus is a common disease caused by a group of metabolic disorders in which
blood sugar levels remain very high over a prolonged period. It affects various organs
of the human body and consequently harms many of the body's systems, in
particular the blood vessels and nerves. Early and accurate prediction of the disease can
save human lives. To achieve this goal, this research work mainly explores
numerous factors associated with the disease using machine learning techniques.
Machine learning methods provide effective results for extracting knowledge by building
predictive models from diagnostic medical datasets collected from diabetic
patients. Mining knowledge from such data can be valuable for predicting diabetic
patients. In this research, six popular machine learning techniques, namely
Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), C4.5 Decision
Tree (DT), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), are
compared in order to identify the best machine learning technique for forecasting diabetes
mellitus. Our results show that Support Vector Machine (SVM) achieved
higher accuracy than the other machine learning techniques.
The healthcare industry holds large and complex data that can be mined with machine learning techniques to discover interesting disease patterns and support effective decisions. Advanced data mining techniques are used to discover knowledge in databases and for medical research. This paper analyzes prediction systems for diabetes, kidney and liver disease using a larger number of input attributes. The data mining classification techniques Support Vector Machine (SVM) and Random Forest (RF) are analyzed on diabetes, kidney and liver disease databases. The performance of these techniques is compared based on precision, recall, accuracy, F-measure and time. As a result of the study, the proposed algorithm is designed using the SVM and RF algorithms, and the experimental results show accuracies of 99.35%, 99.37% and 99.14% on diabetes, kidney and liver disease respectively.
AN ALGORITHM FOR PREDICTIVE DATA MINING APPROACH IN MEDICAL DIAGNOSIS ijcsit
The document describes a predictive data mining algorithm for medical diagnosis that uses support vector machine (SVM) and random forest (RF) algorithms. It analyzes diabetes, kidney, and liver disease databases using these techniques. The proposed algorithm applies SVM and RF to the datasets and achieves high prediction accuracies of 99.35%, 99.37%, and 99.14% for diabetes, kidney, and liver diseases respectively. It also compares the performance of SVM and RF based on metrics like precision, recall, accuracy, and execution time.
Advanced control scheme of doubly fed induction generator for wind turbine us... IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par... IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zoneIJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features such as sea surface temperature (SST) and sea surface height (SSH), along with the classifiers used to classify the data. The study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms to potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models such as support vector machines (SVM), naïve Bayes, and artificial neural networks (ANN). In previous results, SVM classified fisheries test data with 97.6% accuracy, compared with 94.2% for naïve Bayes. By considering recent works in this area, several recommendations for future work are presented to further improve the performance of potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f...IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, the industry continues to follow Moore's law. The industry has worked hard to make small devices that boost productivity. The goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This led to three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and reduce interconnects. Electrical interference is a major concern in 3D integrated circuits. Researchers have developed and tested through-silicon vias (TSVs) and substrates to decrease electrical wave interference. This study illustrates a novel noise coupling reduction method using several electrical interference models. A 22% drop in electrical interference from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces a hybrid system combining PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram, which sets the priorities and requirements of the system, is presented. The proposed approach allows such plants to improve their power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The result of a case study of a dairy farmer supports the theoretical work and highlights its benefits to existing plants. The short return on investment of the proposed approach supports the paper's novel approach to a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption has increased quickly, contributing to climate change
that is evident in unusual flooding and droughts, and in global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they have succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing climate change. The analysis's
findings are discussed in relation to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency, and offer recommendations on how
to increase the role of women in addressing climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from grid-connected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems associated with power systems is the islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis yields overall weights of
24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods,
14.5% for remote methods, 26.6% for signal processing-based methods,
and 20.8% for computational intelligence-based methods, based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m², the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronizes
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed: the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA designs. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapid, remote monitoring of solar cell system status parameters (solar irradiance, temperature, and humidity) is critical to enhancing their efficiency. Hence, in the present article, an improved smart prototype of an internet of things (IoT) technique based on an embedded system using the NodeMCU ESP8266 (ESP-12E) was implemented experimentally. Three different regions in Egypt (Luxor, Cairo, and El-Beheira) were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were visualized live by Ubidots through the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m², respectively, during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system make it a strong candidate for application in monitoring solar cell systems. In addition, the solar power radiation results for the three regions strongly favor Luxor and Cairo over El-Beheira as suitable places to build a solar cell station.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions expose several security loopholes, leading to performance degradation and threats to data security in IoT operations. Therefore, IoT security systems must monitor and restrict unwanted events in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data for training. Such complex datasets are impossible to source in a large network like IoT. To address this problem, the proposed study introduces an efficient learning mechanism to strengthen IoT security. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with related works, the experimental outcome shows that the model performs well on a benchmark dataset. It achieves an improved detection accuracy of approximately 99.21%.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEMHODECEDSIET
Time Division Multiplexing (TDM) is a method of transmitting multiple signals over a single communication channel by dividing the signal into many segments, each having a very short duration of time. These time slots are then allocated to different data streams, allowing multiple signals to share the same transmission medium efficiently. TDM is widely used in telecommunications and data communication systems.
### How TDM Works
1. **Time Slots Allocation**: The core principle of TDM is to assign distinct time slots to each signal. During each time slot, the respective signal is transmitted, and then the process repeats cyclically. For example, if there are four signals to be transmitted, the TDM cycle will divide time into four slots, each assigned to one signal.
2. **Synchronization**: Synchronization is crucial in TDM systems to ensure that the signals are correctly aligned with their respective time slots. Both the transmitter and receiver must be synchronized to avoid any overlap or loss of data. This synchronization is typically maintained by a clock signal that ensures time slots are accurately aligned.
3. **Frame Structure**: TDM data is organized into frames, where each frame consists of a set of time slots. Each frame is repeated at regular intervals, ensuring continuous transmission of data streams. The frame structure helps in managing the data streams and maintaining the synchronization between the transmitter and receiver.
4. **Multiplexer and Demultiplexer**: At the transmitting end, a multiplexer combines multiple input signals into a single composite signal by assigning each signal to a specific time slot. At the receiving end, a demultiplexer separates the composite signal back into individual signals based on their respective time slots.
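The four steps above can be illustrated with a minimal Python sketch of slot-by-slot interleaving. It is a toy model under stated assumptions, not a description of any real TDM equipment: the function names `multiplex` and `demultiplex`, the `IDLE` marker, and the sample labels are all hypothetical, and synchronization is assumed to be perfect (the receiver knows the slot order).

```python
from itertools import zip_longest

IDLE = None  # hypothetical marker for a time slot with no data


def multiplex(streams):
    """Build TDM frames: each frame holds exactly one slot per input stream."""
    # zip_longest walks all streams in lockstep; exhausted streams send IDLE.
    return [list(frame) for frame in zip_longest(*streams, fillvalue=IDLE)]


def demultiplex(frames, n_streams):
    """Recover each stream by reading its slot position in every frame."""
    streams = [[] for _ in range(n_streams)]
    for frame in frames:
        for slot, sample in enumerate(frame):
            if sample is not IDLE:
                streams[slot].append(sample)
    return streams


# Four signals share one channel, so every frame has four slots.
signals = [["A1", "A2", "A3"], ["B1", "B2"], ["C1", "C2", "C3"], ["D1"]]
frames = multiplex(signals)
for frame in frames:
    print(frame)

recovered = demultiplex(frames, len(signals))
print(recovered == signals)
```

Each printed row is one frame. The slot position alone identifies the source, which is why the transmitter and receiver must agree on frame boundaries.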
### Types of TDM
1. **Synchronous TDM**: In synchronous TDM, time slots are pre-assigned to each signal, regardless of whether the signal has data to transmit or not. This can lead to inefficiencies if some time slots remain empty due to the absence of data.
2. **Asynchronous TDM (or Statistical TDM)**: Asynchronous TDM addresses the inefficiencies of synchronous TDM by allocating time slots dynamically based on the presence of data. Time slots are assigned only when there is data to transmit, which optimizes the use of the communication channel.
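The difference between the two types can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions: the function name `statistical_tdm`, the `slots_per_frame` parameter, and the simple "lowest source index first" scheduling are hypothetical choices, and each slot is assumed to carry a source index so the receiver can tell the streams apart.

```python
from collections import deque


def statistical_tdm(streams, slots_per_frame):
    """Fill each frame only with sources that currently have data queued."""
    if slots_per_frame < 1:
        raise ValueError("slots_per_frame must be at least 1")
    queues = [deque(s) for s in streams]
    frames = []
    while any(queues):  # an empty deque is falsy
        frame = []
        for source, queue in enumerate(queues):
            if queue and len(frame) < slots_per_frame:
                # Tag the sample with its source, since slot position
                # no longer identifies the sender.
                frame.append((source, queue.popleft()))
        frames.append(frame)
    return frames


# Bursty traffic: sources 1 and 3 have nothing to send.
bursty = [["A1", "A2", "A3"], [], ["C1"], []]
stat_frames = statistical_tdm(bursty, slots_per_frame=2)
used_slots = sum(len(frame) for frame in stat_frames)

# Synchronous TDM would reserve one slot per source in every frame.
sync_slots = max(len(s) for s in bursty) * len(bursty)

print(stat_frames)
print(f"statistical: {used_slots} slots, synchronous: {sync_slots} slots")
```

In this example the synchronous scheme would reserve 12 slots, 8 of them idle, while the dynamic allocation uses 4, at the cost of carrying a source tag in every slot.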
### Applications of TDM
- **Telecommunications**: TDM is extensively used in telecommunication systems, such as in T1 and E1 lines, where multiple telephone calls are transmitted over a single line by assigning each call to a specific time slot.
- **Digital Audio and Video Broadcasting**: TDM is used in broadcasting systems to transmit multiple audio or video streams over a single channel, ensuring efficient use of bandwidth.
- **Computer Networks**: TDM is used in network protocols and systems to manage the transmission of data from multiple sources over a single network medium.
### Advantages of TDM
- **Efficient Use of Bandwidth**: TDM all
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Optimized stacking ensemble for early-stage diabetes mellitus prediction
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 13, No. 6, December 2023, pp. 7048~7055
ISSN: 2088-8708, DOI: 10.11591/ijece.v13i6.pp7048-7055
Journal homepage: http://ijece.iaescore.com
Optimized stacking ensemble for early-stage diabetes mellitus
prediction
Aman, Rajender Singh Chhillar
Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, India
Article Info
Article history:
Received May 23, 2023
Revised Jul 9, 2023
Accepted Jul 17, 2023
ABSTRACT
This paper presents an optimized stacking-based hybrid machine learning
approach for predicting early-stage diabetes mellitus (DM) using the PIMA
Indian diabetes (PID) dataset and early-stage diabetes risk prediction
(ESDRP) dataset. The methodology involves handling missing values
through mean imputation, balancing the dataset using the synthetic minority
over-sampling technique (SMOTE), normalizing features, and employing a
stratified train-test split. Logistic regression (LR), naïve Bayes (NB),
AdaBoost with support vector machines (AdaBoost+SVM), artificial neural
network (ANN), and k-nearest neighbors (k-NN) are used as base learners
(level 0), while random forest (RF) meta-classifier serves as the level 1
model to combine their predictions. The proposed model achieves
impressive accuracy rates of 99.7222% for the ESDRP dataset and
94.2085% for the PID dataset, surpassing existing literature by absolute
differences ranging from 10.2085% to 16.7222%. The stacking-based hybrid
model offers advantages for early-stage DM prediction by leveraging
multiple base learners and a meta-classifier. SMOTE addresses class
imbalance, while feature normalization ensures fair treatment of features
during training. The findings suggest that the proposed approach holds
promise for early-stage DM prediction, enabling timely interventions and
preventive measures.
Keywords:
Artificial neural network
Diabetes mellitus
Feature normalization
Random forest
Stacking
This is an open access article under the CC BY-SA license.
Corresponding Author:
Aman
Department of Computer Science and Applications, Maharshi Dayanand University
Rohtak, India
Email: sei@live.in
1. INTRODUCTION
Diabetes Mellitus is a persistent metabolic ailment characterized by elevated blood glucose levels.
As per the International Diabetes Federation, approximately 463 million adults were affected by diabetes in
2019, with projections indicating a surge to 700 million individuals by 2045 [1]. In the Indian context, the
incidence of diabetes is substantial, with an approximate population of 77 million adult individuals affected
by the condition in the year 2019 [2]. Furthermore, there is a significant association between type 2 diabetes
mellitus (T2DM) and depression among individuals in India. In a study conducted in Hue City, Vietnam,
it was observed that approximately 23.2% of patients diagnosed with T2DM experienced symptoms of
depression [3].
Diabetes is often regarded as a systemic disease with far-reaching consequences, as it exerts
detrimental effects on vital organs. Individuals diagnosed with diabetes face an increased susceptibility to
various complications, including but not limited to miscarriage, renal failure, myocardial infarction, vision
impairment, and other chronic and potentially life-threatening conditions [4]. Therefore, it is essential to
diagnose diabetes mellitus (DM) early to prevent or delay the onset of these complications. Machine learning
(ML) algorithms (such as support vector machine (SVM) [5], k-nearest neighbors (k-NN) [6], random forest
(RF) [7], and artificial neural network (ANN) [8]) can help in the early diagnosis and accurate prediction of
DM by analyzing various health indicators such as plasma glucose concentration, serum insulin resistance,
and blood pressure [9]–[11]. Timely identification and precise prognostication of DM hold paramount
importance in facilitating efficacious interventions and optimal disease management. Leveraging the
capabilities of ML algorithms, which possess the ability to process vast volumes of data, enables the
detection of intricate patterns that might elude human experts [12]–[14]. However, there is a need for further
research and improved methodologies to enhance diagnostic accuracy and personalized treatment strategies.
To address this problem, this study proposes a novel stacking-based hybrid ML approach for the
prediction of early-stage DM. By integrating multiple base classifiers through a stacked classifier, the
proposed approach can capture complex relationships and patterns within the data, leading to improved
predictive performance. The use of ML algorithms offers a comprehensive understanding of DM and aids in
disease management, including the identification of individuals at risk of developing complications
[15]–[18]. This approach has the potential to improve health outcomes and enhance the quality of life for
individuals with DM [19], [20].
In recent years, there has been a growing emphasis on the early detection and prediction of DM,
prompting extensive research in this field. Doğru et al. [21] introduced a hybrid super ensemble learning
model that integrated multiple algorithms, yielding remarkable accuracy rates of 99.6%, 92%, and 98% for
the prediction of early-stage DM across diverse datasets. Krishnamoorthi et al. [22] devised an innovative
healthcare disease prediction framework for DM, leveraging RF and SVM models, attaining an accuracy of
86% in DM prediction. In [23], a stacking-based model was employed to predict the presence of DM
in individuals. Compared with other existing models such as LR, NB, and linear discriminant analysis (LDA),
the stacking-based model predicted DM with 93.1% accuracy. This demonstrates the
effectiveness of the stacked ensemble method for enhancing DM prediction results. In addition, Chakravarthy
and Rajaguru [24] proposed a voting-based approach for the early diagnosis of DM. To enhance DM
prediction, they applied a mixture of three ML algorithms: LR, RF, and XGBoost classifiers. After evaluating
the efficacy of each algorithm separately, it was found that the ensemble method with weighted voting
provided the best results for binary classification in terms of accuracy, precision, and F1-score. Mushtaq
et al. [25] studied the effectiveness of voting-based models and hyperparameter-tuned ML algorithms for
predicting DM. The study focused on addressing the challenge of imbalanced datasets through the
implementation of methods such as Tomek and synthetic minority over-sampling technique (SMOTE). A
two-stage model selection approach was employed, where LR, SVM, k-NN, gradient boost, NB, and RF
algorithms were evaluated. RF emerged as the top-performing algorithm, achieving an accuracy of 80.7%
after dataset balancing using SMOTE. Subsequently, a voting algorithm was applied to combine three
superior models, resulting in an accuracy of 82.0%. These studies have demonstrated the effectiveness of
various ML algorithms in enhancing DM prediction results.
The specific aim of this work is to develop a novel stacking-based hybrid ML approach for the
prediction of early-stage DM. The integration of a stacked classifier in this research enables the combination
of multiple base classifiers, leveraging their collective decision-making capabilities to enhance prediction
accuracy. By employing the stacked classifier, the proposed approach can effectively capture complex
relationships and patterns within the data, leading to improved predictive performance for early-stage DM
detection. Through a comprehensive review of existing literature, we compare our proposed approach with
the currently available models and methodologies. Section 2 describes the detailed methodology of the
proposed model containing data pre-processing, handling missing values, balancing the dataset, normalizing
features, and hyperparameter tuning of base and meta classifiers. Section 3 presents the results of the
experiments conducted, including accuracy rates and performance metrics of the stacking-based hybrid
model. Finally, section 4 compares the results and performance of the proposed stacking-based hybrid model
with existing literature on early-stage DM prediction, and highlights the improvements achieved by the
proposed approach.
2. MATERIAL AND METHOD
The methodology employed in this study aims to develop a precise and reliable model for predicting
early-stage DM. The process, as illustrated in Figure 1, follows a systematic approach involving data
acquisition, data preprocessing, model formulation, model training, and model evaluation. The initial step
involves gathering relevant datasets for analysis. Subsequently, the collected data undergoes preprocessing to
ensure its quality and eliminate inconsistencies. Once the data is appropriately prepared, the model formulation
stage entails selecting suitable algorithms and techniques for constructing the predictive model. The model is
then trained using the preprocessed data, optimizing its parameters and refining its performance. Finally, the
model's predictive accuracy and performance are evaluated using appropriate evaluation metrics.
To support reproducibility, all experiments were conducted on the Waikato
Environment for Knowledge Analysis (WEKA) platform [26]. The experimental setup utilized a system
equipped with a 3.2 GHz Intel Core i5 CPU and 16 GB RAM. WEKA is a comprehensive and user-friendly
environment for data analysis and machine learning tasks, providing a diverse range of algorithms and
evaluation methods. By utilizing the WEKA platform, researchers can replicate the procedures described in
this study and achieve comparable outcomes.
Figure 1. Workflow of the proposed stacking-based hybrid model for early-stage DM prediction
2.1. Data pre-processing
The data pre-processing phase involves handling missing values through mean imputation,
addressing class imbalance through SMOTE, and normalizing the datasets. These steps prepare the datasets
for subsequent modelling and analysis, ensuring a reliable and standardized foundation for accurate early-
stage DM prediction. In this study, both the PID dataset [27] and the ESDRP dataset [28] were collected from
the OpenML website, a reliable platform for sharing datasets and machine learning experiments. The PID
dataset initially consists of 768 instances and 9 attributes, while the ESDRP dataset comprises 520 instances
and 17 attributes as shown in Table 1. To handle missing values in the datasets, the mean imputation
technique is employed. This can be implemented using the “ReplaceMissingValues” filter available in the
WEKA platform (weka.filters.unsupervised.attribute.ReplaceMissingValues). This filter replaces missing
numeric values with the mean of the corresponding attribute (and missing nominal values with the mode),
ensuring that the datasets are complete and ready for analysis.
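The mean-imputation step can be illustrated with a minimal sketch in plain Python. This is not WEKA's implementation; the function name `impute_mean` and the toy records are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import mean

def impute_mean(rows):
    """Replace None entries in each numeric column with that column's mean."""
    n_cols = len(rows[0])
    # Column means are computed from the observed (non-missing) values only.
    col_means = [
        mean(row[j] for row in rows if row[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]

# Hypothetical toy records (glucose, BMI); None marks a missing value.
toy = [[148, 33.6], [None, 26.6], [183, None]]
filled = impute_mean(toy)
```

Observed values are left unchanged; only the `None` entries are replaced by the mean of their column.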
The ESDRP dataset contains 320 instances labelled “Positive” and 200 instances labelled “Negative”.
Before data preprocessing, the PID dataset contains 268 instances labelled “Positive” and 500 instances
labelled “Negative”. The SMOTE algorithm is utilized to address this class imbalance
by generating synthetic examples of the minority class. This can be achieved using the “SMOTE” filter in
WEKA (weka.filters.supervised.instance.SMOTE), with specific parameters such as the percentage of
synthetic instances to generate (P), the number of nearest neighbors (K), and the class index (C). By applying this
filter, the class imbalance is alleviated, allowing for more balanced datasets. Following the application of
SMOTE, the number of “Negative” instances in the ESDRP dataset rises from 200 to 400, while the number of
“Positive” instances remains at 320. Similarly, the number of “Positive” instances in the PID dataset rises
from 268 to 536, while the number of “Negative” instances remains at 500.
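The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration in plain Python rather than the WEKA filter; the function `smote`, its parameter names, and the toy data are assumptions made for the example.

```python
import random

def smote(minority, percentage=100, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours."""
    rng = random.Random(seed)
    n_new = len(minority) * percentage // 100
    synthetic = []
    for i in range(n_new):
        base = minority[i % len(minority)]
        # Rank the other minority points by squared Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy minority class with two features.
toy_minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.0, 2.0]]
new_points = smote(toy_minority, percentage=100, k=2)
```

With a percentage of 100, the sketch adds one synthetic sample per original minority sample, so a minority class of 200 grows to 400 and one of 268 grows to 536, in line with the counts reported above.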
Normalization is performed on the preprocessed datasets to ensure consistency and comparability.
The “Normalize” filter in WEKA (weka.filters.unsupervised.attribute.Normalize) can be applied to scale all
attribute values between 0 and 1. This step eliminates any potential bias introduced by varying attribute
scales, enhancing the accuracy and interpretability of the subsequent modeling and analysis.
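Scaling every attribute to the range 0 to 1 corresponds to min-max normalization. The sketch below shows the computation in plain Python; the function name and the toy values are hypothetical, and constant columns are mapped to 0.0 by assumption.

```python
def min_max_normalize(rows):
    """Scale each column to [0, 1] using its own minimum and maximum."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    scaled = []
    for r in rows:
        scaled.append([
            # A constant column has no spread; map it to 0.0 (an assumption).
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ])
    return scaled

# Hypothetical toy records (age, BMI) on very different scales.
toy = [[21, 18.5], [50, 40.0], [35.5, 29.25]]
scaled_toy = min_max_normalize(toy)
```

After scaling, both attributes lie on the same 0 to 1 range, so neither dominates distance-based learners such as k-NN merely because of its units.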
Table 1. Characteristics of diabetes datasets
Characteristic | PID dataset | ESDRP dataset
Description | This dataset was collected from the Pima Indian population in Arizona, USA. It consists of data from females of Pima Indian heritage and includes attributes such as age, body mass index (BMI), number of pregnancies, glucose levels, insulin levels | This dataset was collected from patients who were suspected of having early-stage DM. The data includes various biomedical attributes such as age, BMI, glucose levels, and insulin levels
Size | 768×9 | 520×17
Target variable | Outcome (Positive/Negative) | Class (Positive/Negative)
Missing values and unbalanced | Yes | Yes
2.2. Classifiers and hyperparameter tuning
This section encompasses the process of learner selection and hyperparameter tuning for the
stacking-based hybrid ML approach. In this approach, base learners are utilized as level 0 models within the
stacking ensemble, while hyperparameter tuning is employed to enhance their performance. A meta-classifier
is employed as the level 1 model in the stacking ensemble, which combines the predictions from the base
learners and generates the final prediction. This integration of diverse outputs from the base learners enables
improved overall predictive performance [29]. In this study, the meta-classifier used is RF. To optimize the
performance of the base learners, hyperparameter tuning was conducted. Table 2 provides an overview of the
hyperparameter settings for each classifier used in this study, specifically for the PID dataset and the ESDRP
dataset. The hyperparameters were fine-tuned using cross-validation with the CVParameterSelection
technique. This approach enabled the selection of the most suitable hyperparameter values for each classifier,
ensuring optimal performance within the stacking ensemble.
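The two-level structure described above can be sketched in plain Python. The toy learners `ThresholdRule` and `LookupMeta` are hypothetical stand-ins for the tuned base learners and the RF meta-classifier, and the unstratified index-based folds are a simplification; this is an illustration of stacking in general, not the configuration used in the experiments.

```python
from collections import Counter

class ThresholdRule:
    """Toy level-0 learner: predicts 1 when one feature exceeds its training mean."""
    def __init__(self, feature):
        self.feature = feature
    def fit(self, X, y):
        self.cut = sum(row[self.feature] for row in X) / len(X)
        return self
    def predict(self, X):
        return [int(row[self.feature] > self.cut) for row in X]

class LookupMeta:
    """Toy level-1 learner: majority label seen for each pattern of base predictions."""
    def fit(self, Z, y):
        votes = {}
        for z, label in zip(Z, y):
            votes.setdefault(tuple(z), Counter())[label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, Z):
        return [self.table.get(tuple(z), self.default) for z in Z]

def fit_stacking(make_bases, make_meta, X, y, folds=2):
    # Level-1 training data: out-of-fold predictions of every level-0 learner.
    Z = [[None] * len(make_bases) for _ in X]
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        held = [i for i in range(len(X)) if i % folds == f]
        for b, make in enumerate(make_bases):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(held, model.predict([X[i] for i in held])):
                Z[i][b] = p
    meta = make_meta().fit(Z, y)
    bases = [make().fit(X, y) for make in make_bases]  # refit on all data
    return bases, meta

def predict_stacking(bases, meta, X):
    # Each row of Z holds the level-0 predictions for one sample.
    Z = [list(col) for col in zip(*(b.predict(X) for b in bases))]
    return meta.predict(Z)

# Hypothetical toy data: the label follows the first feature.
X = [[1, 5], [9, 5], [2, 4], [8, 6], [1, 6], [9, 4], [2, 5], [8, 5]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
bases, meta = fit_stacking(
    [lambda: ThresholdRule(0), lambda: ThresholdRule(1)], LookupMeta, X, y
)
labels = predict_stacking(bases, meta, [[1.5, 5], [8.5, 5]])
```

Training the meta-learner on out-of-fold predictions keeps it from seeing base-learner outputs on samples those learners were fitted to, which is the usual way to limit overfitting at level 1.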
Table 2. Hyperparameters used for classifier tuning with CVParameterSelection
Classifier | PID dataset | ESDRP dataset
NB | No hyperparameters | No hyperparameters
LR | R 0.01 (Ridge Parameter), M 4 (Max Iterations) | R 0.01, M 4
RF | Bagsizepercent 100, Batch size 100, Max Depth 100, Num Features 8, Num Iterations 200 | Bagsizepercent 100, Batch size 100, Max Depth 100, Num Features 8, Num Iterations 200
k-NN | F (Weight by 1-Distance), k 5 (Number of Neighbors) | F (Weight by 1-Distance), k 1
ANN | Number of hidden layers (a): 8, Batch size: 1000, Epochs: 560 | Number of hidden layers (a): 16, Batch size: 1000, Epochs: 560
AdaBoost+SVM | P 10 (Weight Threshold), l 1 (Number of Iterations) | P 100, l 1
Naïve Bayes (NB) is a probabilistic classifier that assumes independence between features and
calculates the probability of a sample belonging to a specific class using Bayes' theorem. It is
computationally efficient and works well with high-dimensional data but may oversimplify complex
relationships between features [30]. To apply naïve Bayes in WEKA, researchers can utilize the naïve Bayes
classifier available in the platform's library.
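Under the independence assumption, the posterior used by naïve Bayes factorizes over the features. In the following expression, the symbols $c$ (class) and $x_1,\dots,x_n$ (feature values) are notation introduced here for illustration:

```latex
P(c \mid x_1,\dots,x_n) \;\propto\; P(c)\prod_{j=1}^{n} P(x_j \mid c),
\qquad
\hat{c} \;=\; \arg\max_{c}\; P(c)\prod_{j=1}^{n} P(x_j \mid c)
```

The predicted class $\hat{c}$ is the one with the largest product of class prior and per-feature likelihoods.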
Logistic regression (LR) is a linear classifier that models the connection between the input features
and the sample's likelihood of belonging to a certain class. It is interpretable, and supports categorical and
continuous data, but presupposes a linear relationship between features and the target variable's log odds
[31]. In WEKA, the Logistic classifier can be used to apply logistic regression.
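The linear relationship between the features and the log odds can be made concrete with a short sketch. The weights and bias below are hypothetical values chosen for illustration, not coefficients fitted in this study.

```python
import math

def predict_proba(weights, bias, x):
    """Probability of the positive class under a logistic regression model."""
    # The log odds are a linear function of the input features.
    log_odds = bias + sum(w * v for w, v in zip(weights, x))
    # The logistic (sigmoid) function maps log odds to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients for two normalized features.
weights, bias = [3.0, 1.0], -1.0
p = predict_proba(weights, bias, [0.8, 0.6])  # log odds = 2.4 + 0.6 - 1.0 = 2.0
```

Applying the inverse transform, `math.log(p / (1 - p))`, recovers the log odds of 2.0 up to floating-point error, which is what the linearity assumption refers to.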
k-nearest neighbors (k-NN) classifier is a non-parametric algorithm that assigns a sample to the
predominant class based on the class labels of its k closest neighbors in the feature space. It is easy to
comprehend and does not require model training, but it can be sensitive to the selection of k and may be
subject to the curse of dimensionality for high-dimensional data [32]. In WEKA, the instance-based learning
with k-NN (IBk) classifier can be used for k-NN classification.
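A minimal sketch of the majority-vote rule follows, using Euclidean distance and unweighted votes. It is plain Python rather than WEKA's IBk, and the function name and toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    """Assign the majority label among the k training points closest to the query."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data with two well-separated groups.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["Negative"] * 3 + ["Positive"] * 3
near_origin = knn_predict(train_X, train_y, [0.5, 0.5], k=3)
near_far = knn_predict(train_X, train_y, [5.5, 5.5], k=3)
```

No model is fitted in advance: all computation happens at prediction time, which is why the choice of k and the feature scaling matter so much for this learner.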
Artificial neural networks (ANNs) are a class of mathematical models that simulate the behavior of
biological neural networks. Composed of interconnected nodes called neurons, organized in layered
architectures, ANNs aim to unravel intricate relationships between input features and target variables. ANNs
can capture non-linear relationships, but they require careful architecture design and sufficient training data, and can be
computationally intensive [33], [34]. In WEKA, researchers can utilize the MultilayerPerceptron classifier to
implement ANNs.
AdaBoost with support vector machine is a boosting-based classifier that combines multiple weak
classifiers to create a strong classifier. In this case, SVMs are used as the weak classifiers. AdaBoost
iteratively adjusts the weights of misclassified samples to focus on difficult-to-classify instances. SVMs
provide robust classification boundaries but may be sensitive to the choice of kernel and hyperparameters
[35]. In WEKA, researchers can apply AdaBoost with SVM using the AdaBoostM1 classifier.
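The reweighting step can be sketched as one round of an AdaBoost.M1-style update. The function below is an illustration in plain Python, with hypothetical names and toy weights, and it leaves out the training of the weak classifier itself.

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost.M1-style reweighting step.

    weights: current sample weights summing to 1.
    correct: one boolean per sample, True if the weak learner classified it correctly.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0 < error < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    beta = error / (1 - error)
    # Shrink the weights of correctly classified samples, then renormalise,
    # so misclassified samples carry relatively more weight in the next round.
    updated = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(updated)
    learner_weight = math.log(1 / beta)  # this learner's say in the final vote
    return [w / total for w in updated], learner_weight

# Hypothetical round: four equally weighted samples, the last one misclassified.
new_weights, alpha = adaboost_round([0.25] * 4, [True, True, True, False])
```

In this toy round the misclassified sample ends up with half of the total weight, so the next weak classifier is pushed to get it right.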
Random forest (RF) is a classifier that uses a bagging approach to aggregate the predictions of many
decision trees (DTs). It utilizes bootstrapping and feature randomization to reduce overfitting and improve generalization. It
handles both categorical and continuous features and provides feature importance measures, but can be
computationally expensive for large datasets [36]. In WEKA, the random forest classifier can be used to
apply random forest classification.
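The three ingredients named above (bootstrapping, feature randomization, and vote aggregation) can be sketched separately. The helper names are hypothetical, and the tree-growing step is deliberately omitted, so this is an outline of the bagging mechanics rather than a complete random forest.

```python
import random
from collections import Counter

def bootstrap_sample(n_rows, rng):
    """Row indices drawn with replacement, one draw per original row."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_feature_subset(n_features, size, rng):
    """Feature indices drawn without replacement, considered at a split."""
    return sorted(rng.sample(range(n_features), size))

def majority_vote(tree_predictions):
    """Aggregate the labels predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
rows_for_tree = bootstrap_sample(10, rng)        # training rows for one tree
features_for_split = random_feature_subset(8, 3, rng)
forest_label = majority_vote(["Positive", "Negative", "Positive"])
```

Because each tree sees a different resample of the rows and a different subset of features, the trees make partly independent errors, and the vote averages them out.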
3. RESULTS AND DISCUSSION
After the evaluation in WEKA, the performance metrics for each hyper-tuned model were obtained.
These metrics typically include accuracy, precision, recall, F-measure, mean absolute error (MAE), and area
under the curve (AUC). For the ESDRP dataset, the hyper-tuned models were individually optimized using
the CVParameterSelection technique in WEKA to maximize their performance before being combined in the
proposed stacking-based hybrid model. The evaluation results showcased promising outcomes, with each
model demonstrating its strengths and areas of improvement. Table 3 presents a comprehensive performance
analysis, highlighting the accuracy achieved by each hyper-tuned model. Notably, the proposed stacking-
based hybrid model stands out with an impressive accuracy of 99.7222%. This result surpasses the other
hyper-tuned models, including NB (92.5926%), LR (93.5185%), RF (99.0741%), k-NN (98.6111%), ANN
(96.2963%), and AdaBoost with SVM (93.9815%). These findings emphasize the effectiveness of the hyper-
tuning process and the potential of the stacking ensemble approach.
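For reference, accuracy, precision, recall, and F-measure for a single positive class can be computed from confusion-matrix counts as sketched below. The counts are hypothetical, and the tables may report class-averaged values, so this sketch is not expected to reproduce the reported figures.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical counts: 45 true positives, 5 false positives,
# 5 false negatives, 45 true negatives.
acc, prec, rec, f1 = binary_metrics(45, 5, 5, 45)
```

The F-measure is the harmonic mean of precision and recall, so it is high only when both are high.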
Moving on to the PID dataset, similar efforts were made to optimize the hyper-tuned models using
the CVParameterSelection technique in WEKA. The performance analysis in Table 4 provides valuable
insights into the performance of each model on this specific dataset. The proposed stacking-based hybrid
model demonstrates remarkable accuracy, achieving a score of 94.2085%. This accuracy outperforms other
hyper-tuned models such as ANN (87.4598%), NB (77.8135%), LR (82.9582%), RF (90.9968%), k-NN
(81.0289%), and AdaBoost with SVM (81.672%). These results underscore the importance of the
hyperparameter tuning process and its impact on the models' performance.
The outstanding performance of the proposed model can be attributed to its stacking-based hybrid
approach, which combines the predictions of the hyper-tuned base learners, effectively leveraging their
strengths. It is important to note that each base learner was hyper-tuned independently to optimize its
performance, ensuring that the model benefits from the best possible configurations for each classifier.
The use of the RF meta-classifier in the stacking ensemble further enhances the model's predictive
capability.
Table 3. Comparative analysis of models performance on the ESDRP dataset
Model Accuracy (%) MAE Precision Recall F-measure AUC
ANN 96.2963 0.0418 0.963 0.963 0.963 0.974
NB 92.5926 0.0867 0.927 0.937 0.926 0.967
LR 93.5185 0.0803 0.936 0.935 0.935 0.958
RF 99.0741 0.0348 0.991 0.991 0.991 0.999
k-NN 98.6111 0.015 0.986 0.986 0.986 0.999
AdaBoost + SVM 93.9815 0.0602 0.940 0.940 0.940 0.937
Proposed 99.7222 0.002 0.997 0.997 0.997 1
Table 4. Comparative analysis of models performance on the PID dataset
Model Accuracy (%) MAE Precision Recall F-measure AUC
ANN 87.4598 0.1438 0.875 0.875 0.875 0.912
NB 77.8135 0.2717 0.780 0.778 0.778 0.841
LR 82.9582 0.2526 0.830 0.830 0.830 0.868
RF 90.9968 0.1534 0.910 0.910 0.910 0.965
k-NN 81.0289 0.241 0.817 0.810 0.808 0.884
AdaBoost + SVM 81.672 0.1833 0.817 0.817 0.817 0.816
Proposed 94.2085 0.0967 0.945 0.942 0.943 0.984
4. COMPARISON WITH THE EXISTING LITERATURE
The comparison with existing literature reveals a diverse range of models and methodologies
proposed for the prediction of early-stage DM. However, the proposed model, which utilizes a stacking
ensemble approach, demonstrates superior performance in DM prediction when compared to the existing
models. This highlights the effectiveness and potential of the novel approach in enhancing the accuracy and
reliability of early-stage DM prediction. According to Table 5, the proposed model achieves an accuracy of
94.2085% on the PID dataset and 99.7222% on the ESDRP dataset. This reflects a substantial improvement over
the existing models, with the proposed model outperforming them by absolute differences ranging from
10.2085% to 16.7222% in terms of accuracy.
Table 5. Comparison of models for early-stage DM prediction in existing literature
Reference | Model | Methodology | Accuracy in % (Dataset)
Proposed | Level 0 (ANN, NB, LR, AdaBoost + SVM, k-NN), Level 1 (RF) | Stacking ensemble | 94.2085% (PID dataset), and 99.7222% (ESDRP dataset)
[21] | Level 0 (LR, DT, RF, gradient boosting), Level 1 (SVM) | Stacked ensemble | 92% (PID dataset), 99.6% (ESDRP dataset), and 98% (Diabetes 130-US hospitals dataset)
[22] | LR, RF, SVM, and KNN with grid search | Parameter optimization using grid search | 83% (PID dataset)
[23] | Level 0 (RF, DT, gradient boost, Gaussian NB, SVM, KNN), Level 1 (LR) | Stacking ensemble | 93% (PID dataset)
[24] | LR, RF, extreme gradient boosting | Voting ensemble | 92.21% (PID dataset)
[25] | LR, SVM, k-NN, gradient boost, NB, and RF with majority voting | Voting ensemble | 82% (PID dataset)
5. CONCLUSION
In conclusion, this research work introduces a novel stacking-based hybrid machine learning
approach for accurately predicting early-stage DM. The approach combines multiple base learners at level 0
and utilizes an RF meta-classifier at level 1 to effectively aggregate their predictions. The obtained high
accuracy rates on the ESDRP and PID datasets highlight the effectiveness of the proposed model in
predicting early-stage DM. The proposed approach demonstrates significant improvements over existing
literature, outperforming them by absolute differences ranging from 10.2085% to 16.7222% in terms of
accuracy. This substantial enhancement in accuracy showcases the superiority of the proposed model. The
findings of this study have several implications. Firstly, the high accuracy rates achieved by the proposed
model indicate its potential as a valuable tool in aiding early intervention and prevention strategies for DM.
Timely identification of individuals at risk can enable proactive healthcare management and improve patient
outcomes. Secondly, the stacking-based hybrid approach proves to be an effective methodology for
integrating the predictions of multiple base learners, leveraging their strengths and enhancing overall
performance. Future research endeavors should focus on further validating the generalizability of the
proposed model on larger and more diverse datasets. Additionally, exploring feature importance analysis
techniques can enhance the interpretability of the model, enabling better insights into the factors influencing
early-stage DM.
REFERENCES
[1] F. Alanazi and V. Gay, “e-health for diabetes self-management in Saudi Arabia: barriers and solutions,” Proceedings of the
36th International Business Information Management Association Conference (IBIMA), pp. 4–5, Feb. 2020, doi:
10.2196/preprints.18085.
[2] M. S. Paulo, N. M. Abdo, R. Bettencourt-Silva, and R. H. Al-Rifai, “Gestational diabetes mellitus in Europe: a systematic review
and meta-analysis of prevalence studies,” Frontiers in Endocrinology, vol. 12, Dec. 2021, doi: 10.3389/fendo.2021.691033.
[3] T. L. Haraldsdottir et al., “Diabetes mellitus prevalence in tuberculosis patients and the background population in Guinea-Bissau:
a disease burden study from the capital Bissau,” Transactions of The Royal Society of Tropical Medicine and Hygiene, vol. 109,
no. 6, pp. 400–407, Jun. 2015, doi: 10.1093/trstmh/trv030.
[4] P. Fasching, “The new ways of preventing and treating diabetes mellitus,” in Practical Issues in Geriatrics, Springer International
Publishing, 2019, pp. 71–81.
[5] A. M. Elshewey, M. Y. Shams, N. El-Rashidy, A. M. Elhady, S. M. Shohieb, and Z. Tarek, “Bayesian optimization with support
vector machine model for Parkinson disease classification,” Sensors, vol. 23, no. 4, Feb. 2023, doi: 10.3390/s23042085.
[6] S. Lahmiri, “Integrating convolutional neural networks, kNN, and Bayesian optimization for efficient diagnosis of Alzheimer’s
disease in magnetic resonance images,” Biomedical Signal Processing and Control, vol. 80, Feb. 2023, doi:
10.1016/j.bspc.2022.104375.
[7] Q. Abbas, A. Hussain, and A. R. Baig, “CAD-ALZ: a blockwise fine-tuning strategy on convolutional model and random forest
classifier for recognition of multistage Alzheimer’s disease,” Diagnostics, vol. 13, no. 1, Jan. 2023, doi:
10.3390/diagnostics13010167.
[8] R. Sawhney, A. Malik, S. Sharma, and V. Narayan, “A comparative assessment of artificial intelligence models used for early
prediction and evaluation of chronic kidney disease,” Decision Analytics Journal, vol. 6, Mar. 2023, doi:
10.1016/j.dajour.2023.100169.
[9] G. Annuzzi et al., “Impact of nutritional factors in blood glucose prediction in type 1 diabetes through machine learning,” IEEE
Access, vol. 11, pp. 17104–17115, 2023, doi: 10.1109/ACCESS.2023.3244712.
[10] K. Liu et al., “Machine learning models for blood glucose prediction in patients with Diabetes Mellitus: a systematic review and
network meta-analysis,” SSRN Electronic Journal, 2023, doi: 10.2139/ssrn.4401684.
[11] D. Srivastava, H. Pandey, and A. K. Agarwal, “Complex predictive analysis for health care: a comprehensive review,” Bulletin of
Electrical Engineering and Informatics (BEEI), vol. 12, no. 1, pp. 521–531, Feb. 2023, doi: 10.11591/eei.v12i1.4373.
[12] H. Pallathadka, M. Mustafa, D. T. Sanchez, G. S. Sajja, S. Gour, and M. Naved, “Impact of machine learning on management,
healthcare and agriculture,” Materials Today: Proceedings, vol. 80, pp. 2803–2806, 2023, doi: 10.1016/j.matpr.2021.07.042.
[13] S. Yang, P. Varghese, E. Stephenson, K. Tu, and J. Gronsbell, “Machine learning approaches for electronic health records
phenotyping: a methodical review,” Journal of the American Medical Informatics Association, vol. 30, no. 2, pp. 367–381,
Jan. 2023, doi: 10.1093/jamia/ocac216.
[14] T. A. Assegie, T. Karpagam, S. Subramanian, S. M. Janakiraman, J. Arumugam, and D. O. Ahmed, “Prediction of patient survival
from heart failure using a cox-based model,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS),
vol. 27, no. 3, pp. 1550–1556, Sep. 2022, doi: 10.11591/ijeecs.v27.i3.pp1550-1556.
[15] Aman and R. S. Chhillar, “Analyzing predictive algorithms in data mining for cardiovascular disease using WEKA tool,”
International Journal of Advanced Computer Science and Applications, vol. 12, no. 8, pp. 144–150, 2021, doi:
10.14569/IJACSA.2021.0120817.
[16] A. Darolia and R. S. Chhillar, “Analyzing three predictive algorithms for diabetes mellitus against the pima indians dataset,” ECS
Transactions, vol. 107, no. 1, pp. 2697–2704, Apr. 2022, doi: 10.1149/10701.2697ecst.
[17] M. Atif, F. Anwer, F. Talib, R. Alam, and F. Masood, “Analysis of machine learning classifiers for predicting diabetes mellitus in
the preliminary stage,” IAES International Journal of Artificial Intelligence (IJ-AI), vol. 12, no. 3, pp. 1302–1311, Sep. 2023,
doi: 10.11591/ijai.v12.i3.pp1302-1311.
[18] K. Yothapakdee, S. Charoenkhum, and T. Boonnuk, “Improving the efficiency of machine learning models for predicting blood
glucose levels and diabetes risk,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 27, no. 1,
Jul. 2022, doi: 10.11591/ijeecs.v27.i1.pp555-562.
[19] G. Mathur, A. Pandey, and S. Goyal, “Applications of machine learning in healthcare,” in The Internet of Medical Things (IoMT)
and Telemedicine Frameworks and Applications, 2022, pp. 177–195.
[20] S. A. Abdulkareem, H. Y. Radhi, Y. A. Fadil, and H. Mahdi, “Soft computing techniques for early diabetes prediction,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 25, no. 2, pp. 1167–1176, Feb. 2022, doi:
10.11591/ijeecs.v25.i2.pp1167-1176.
[21] A. Doğru, S. Buyrukoğlu, and M. Arı, “A hybrid super ensemble learning model for the early-stage prediction of diabetes risk,”
Medical & Biological Engineering & Computing, vol. 61, no. 3, pp. 785–797, Mar. 2023, doi: 10.1007/s11517-022-02749-z.
[22] R. Krishnamoorthi et al., “A novel diabetes healthcare disease prediction framework using machine learning techniques,” Journal
of Healthcare Engineering, vol. 2022, pp. 1–10, Jan. 2022, doi: 10.1155/2022/1684017.
[23] S. R., S. M., M. K. Hasan, R. A. Saeed, S. A. Alsuhibany, and S. Abdel-Khalek, “An empirical model to predict the diabetic
positive using stacked ensemble approach,” Frontiers in Public Health, vol. 9, Jan. 2022, doi: 10.3389/fpubh.2021.792124.
[24] S. R. S. Chakravarthy and H. Rajaguru, “Ensemble-based weighted voting approach for the early diagnosis of diabetes mellitus,”
in Lecture Notes on Data Engineering and Communications Technologies, vol. 93, Springer Nature Singapore, 2022,
pp. 451–460.
[25] Z. Mushtaq, M. F. Ramzan, S. Ali, S. Baseer, A. Samad, and M. Husnain, “Voting classification-based diabetes mellitus
prediction using hypertuned machine-learning techniques,” Mobile Information Systems, vol. 2022, pp. 1–16, Mar. 2022, doi:
10.1155/2022/6521532.
[26] Weka Wiki, “Weka 3 - data mining with open source machine learning software in Java.” University of Waikato,
https://www.cs.waikato.ac.nz/ml/weka/ (accessed May 22, 2023).
[27] D. Carrion, “Pima-Indians-Diabetes,” OpenML, 2022. Accessed: May 22, 2023. [Online], Available:
https://www.openml.org/search?type=data&status=active&id=43582&sort=runs
[28] D. Carrion, “Early-stage-diabetes-risk-prediction-dataset,” OpenML, 2022. Accessed May 22, 2023. [Online], Available:
https://www.openml.org/search?type=data&status=active&id=43643&sort=runs
[29] T. Yoon and D. Kang, “Multi-modal stacking ensemble for the diagnosis of cardiovascular diseases,” Journal of Personalized
Medicine, vol. 13, no. 2, Feb. 2023, doi: 10.3390/jpm13020373.
[30] R. Rajni and A. Amandeep, “RB-Bayes algorithm for the prediction of diabetic in Pima Indian dataset,” International Journal of
Electrical and Computer Engineering (IJECE), vol. 9, no. 6, pp. 4866–4872, Dec. 2019, doi: 10.11591/ijece.v9i6.pp4866-4872.
[31] Z. Li, S. Pang, H. Qu, and W. Lian, “Logistic regression prediction models and key influencing factors analysis of diabetes based
on algorithm design,” Neural Computing and Applications, Mar. 2023, doi: 10.1007/s00521-023-08447-7.
[32] S. Avinash, H. N. N. Kumar, M. S. G. Prasad, R. M. Naik, and G. Parveen, “Early detection of malignant tumor in lungs using
feed-forward neural network and k-nearest neighbor classifier,” SN Computer Science, vol. 4, no. 2, Feb. 2023, doi:
10.1007/s42979-022-01606-y.
[33] S. P. Shankar, M. S. Supriya, D. Varadam, M. Kumar, H. Gupta, and R. Saha, “A comprehensive study on algorithms and
applications of artificial intelligence in diagnosis and prognosis: AI for healthcare,” in Digital Twins and Healthcare: Trends,
Techniques, and Challenges, 2022, pp. 35–54.
[34] G. Alfian, Y. M. Saputra, L. Subekti, A. D. Rahmawati, F. T. D. Atmaji, and J. Rhee, “Utilizing deep neural network for
web-based blood glucose level prediction system,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS),
vol. 30, no. 3, pp. 1829–1837, Jun. 2023, doi: 10.11591/ijeecs.v30.i3.pp1829-1837.
[35] A. Belghit, M. Lazri, F. Ouallouche, K. Labadi, and S. Ameur, “Optimization of one versus all-SVM using AdaBoost algorithm
for rainfall classification and estimation from multispectral MSG data,” Advances in Space Research, vol. 71, no. 1, pp. 946–963,
Jan. 2023, doi: 10.1016/j.asr.2022.08.075.
[36] R. R. K. AL-Taie, B. J. Saleh, A. Y. Falih Saedi, and L. A. Salman, “Analysis of WEKA data mining algorithms Bayes net,
random forest, MLP and SMO for heart disease prediction system: a case study in Iraq,” International Journal of Electrical and
Computer Engineering (IJECE), vol. 11, no. 6, pp. 5229–5239, Dec. 2021, doi: 10.11591/ijece.v11i6.pp5229-5239.
BIOGRAPHIES OF AUTHORS
Aman is a researcher in the field of Data Mining at the Department of Computer
Science and Applications, Maharshi Dayanand University, Rohtak, India. His research focuses
on data science, healthcare, metaheuristic algorithms, and cybersecurity. He has published his
work in Scopus- and Web of Science (WoS)-indexed venues and at respected
conferences. He has also authored a book on MATLAB. He is a member
of the International Association of Engineers (IAENG). He can be contacted at email:
sei@live.in.
Rajender Singh Chhillar is a professor and former head of the Department of
Computer Science at Maharshi Dayanand University, Rohtak, India. He obtained his Ph.D. in
Computer Science from Maharshi Dayanand University, Rohtak, India, and a master's degree
from Kurukshetra University, Kurukshetra, India. Additionally, he holds a Master of Business
Administration (MBA) degree from Sikkim Manipal University, Sikkim, India. His research
interests include software engineering, software testing, software metrics, web metrics,
biometrics, data mining, computer networking, and software design. He has published over 100
journal papers and 65 conference papers in recent years. Rajender has also written two books
in the fields of software engineering and information technology. He is a director of the CMAI
Asia Association in New Delhi and a senior member of the IACSIT in Singapore.
Furthermore, he is a member of the Computer Society of India. He can be contacted at email:
chhillar02@gmail.com.