A vehicle leasing company operates in the field of vehicle loans. Purchasing on credit has become its mainstay because it attracts potential customers and generates more profit. However, if a customer candidate is approved by mistake, the risk of stalled credit payments arises. To minimize this risk, data mining techniques can be applied to predict the future behavior of customers. This study explores two such techniques, C4.5 and Naive Bayes, for this purpose. The customer attributes used in this study are salary, age, marital status, other installments and worthiness. The experiments are performed with the Weka software. Based on the evaluation criterion of accuracy, the C4.5 algorithm outperforms Naive Bayes. The percentage split scenarios yield a precision of 89.16% and an accuracy of 83.33%, whereas the cross-validation scenarios yield higher accuracy for every value of k used. The C4.5 results also confirm that the most influential attribute in this research is salary.
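The abstract contrasts Weka's percentage-split and k-fold cross-validation scenarios. As a rough illustration of how the two protocols partition the records, the stdlib Python sketch below is offered under stated assumptions: it is not Weka's implementation (Weka's own evaluation may also randomize and stratify the folds), and the majority-class `majority_fit_predict` function and the `labels` list are hypothetical placeholders for C4.5 or Naive Bayes and the real customer data. The stand-in ignores attributes, so it receives only training labels.

```python
import random
from collections import Counter

def percentage_split(n, train_pct, seed=0):
    """Shuffle record indices, then cut them into a training part and a test part."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_pct / 100)
    return indices[:cut], indices[cut:]

def k_fold_splits(n, k):
    """Yield (train, test) index lists; every record lands in exactly one test fold."""
    folds = [list(range(i, n, k)) for i in range(k)]   # round-robin fold assignment
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

def majority_fit_predict(train_labels, n_test):
    """Hypothetical stand-in for C4.5 or Naive Bayes: predicts the most common training label."""
    label = Counter(train_labels).most_common(1)[0][0]
    return [label] * n_test

def cross_validated_accuracy(labels, k, fit_predict=majority_fit_predict):
    """Fraction of records predicted correctly when each fold is tested once."""
    correct = 0
    for train, test in k_fold_splits(len(labels), k):
        predictions = fit_predict([labels[j] for j in train], len(test))
        correct += sum(p == labels[j] for p, j in zip(predictions, test))
    return correct / len(labels)

labels = ["worthy"] * 8 + ["not_worthy"] * 2   # hypothetical credit outcomes
print(cross_validated_accuracy(labels, k=5))    # 0.8
```

Because every record appears in exactly one test fold, cross-validation tests on all records, whereas a percentage split tests on a single held-out portion; this is one reason the two scenarios can report different accuracies.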
IRJET- Develop Futuristic Prediction Regarding Details of Health System for H... (IRJET Journal)
This document discusses using data mining techniques to predict heart disease outcomes. It analyzes clinical data on cardiovascular diseases using predictive algorithms like naive Bayes, k-means clustering, and decision trees. The study aims to build models that can help identify relationships in medical data and predict future health system details for heart conditions. It compares the performance of different predictive data mining methods on a cardiovascular disease database. The top-performing technique was found to be naive Bayes classification. The models seek to help doctors better understand heart disease risk factors and trends to improve diagnosis.
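The summary names naive Bayes as the top performer without describing how it classifies. A minimal categorical naive Bayes with Laplace smoothing might look like the sketch below. The attribute names and records are invented for illustration, and the study itself may have used a different variant (for example, one that models numeric attributes directly).

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> Counter of values
    domain = defaultdict(set)             # attribute index -> set of observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domain[i].add(value)
    return class_counts, value_counts, domain

def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior, assuming attribute independence."""
    class_counts, value_counts, domain = model
    total = sum(class_counts.values())

    def log_posterior(c):
        n_c = class_counts[c]
        score = math.log(n_c / total)     # log prior P(c)
        for i, value in enumerate(row):
            # Laplace-smoothed log P(attribute_i = value | c)
            score += math.log((value_counts[(i, c)][value] + 1) / (n_c + len(domain[i])))
        return score

    return max(class_counts, key=log_posterior)

# Hypothetical records: (blood pressure, smoking status) -> diagnosis
rows = [("high", "smoker"), ("high", "smoker"), ("normal", "non_smoker"),
        ("normal", "non_smoker"), ("high", "non_smoker")]
labels = ["disease", "disease", "healthy", "healthy", "healthy"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("high", "smoker")))   # disease
```

Working in log space avoids numeric underflow when many attribute likelihoods are multiplied, and the +1 smoothing keeps an unseen attribute value from zeroing out a class.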
This document discusses using data mining classifiers and attribute reduction techniques to predict chronic kidney disease (CKD) more accurately and efficiently. It first provides background on CKD and the need for early detection. It then discusses data mining, classification algorithms, and attribute selection filters and wrappers. The document analyzes several studies that predicted CKD using techniques such as decision trees, SVM and Naive Bayes. It describes the dataset used, taken from the UCI repository, and the evaluation metrics. The results section compares J48, Decision Tree and IBK classifiers with and without attribute reduction using CfsSubsetEval, ClassifierSubsetEval and WrapperSubsetEval. Attribute reduction improved accuracy, especially for IBK, which achieved 100% accuracy with 72% fewer attributes.
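CfsSubsetEval is based on correlation-based feature selection (CFS), which scores a subset of k attributes by its merit, k * mean_cf / sqrt(k + k * (k - 1) * mean_ff), where mean_cf is the average attribute-class correlation and mean_ff the average attribute-attribute correlation. The sketch below applies that merit in a greedy forward search. The correlation values and CKD attribute abbreviations are hypothetical, and Weka's evaluator computes its correlations and runs its search in its own way.

```python
import math
from itertools import combinations

def cfs_merit(subset, r_cf, r_ff):
    """CFS merit: reward class correlation, penalize redundancy between attributes."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = list(combinations(sorted(subset), 2))   # keys of r_ff are sorted name pairs
    mean_ff = sum(r_ff[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)

def forward_select(features, r_cf, r_ff):
    """Greedily add the attribute that most improves merit; stop when none does."""
    chosen, best = [], 0.0
    while True:
        scored = [(cfs_merit(chosen + [f], r_cf, r_ff), f) for f in features if f not in chosen]
        if not scored:
            return chosen
        merit, f = max(scored)
        if merit <= best:
            return chosen
        chosen, best = chosen + [f], merit

# Hypothetical correlations for four CKD attributes
r_cf = {"hemo": 0.70, "pcv": 0.68, "sg": 0.60, "al": 0.55}
r_ff = {("al", "hemo"): 0.30, ("al", "pcv"): 0.50, ("al", "sg"): 0.40,
        ("hemo", "pcv"): 0.95, ("hemo", "sg"): 0.30, ("pcv", "sg"): 0.50}
print(forward_select(["hemo", "sg", "al", "pcv"], r_cf, r_ff))   # ['hemo', 'sg', 'al']
```

Here "pcv" is strongly correlated with the class but almost redundant with "hemo", so adding it lowers the merit and the search leaves it out.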
IRJET - Chronic Kidney Disease Prediction using Data Mining and Machine Learning (IRJET Journal)
This document discusses predicting chronic kidney disease with data mining and machine learning techniques. It applies KNN, SVM and ensemble models to a dataset of 400 patient records with 24 attributes related to chronic kidney disease. In the data mining experiments, SVM with an RBF kernel achieved 87% accuracy; in the machine learning experiments, an ensemble of KNN and SVM achieved over 92% accuracy. The document also reviews several related studies that applied classification algorithms such as decision trees, neural networks and Naive Bayes to chronic kidney disease prediction, along with their limitations. It then describes the KNN algorithm and its application to this problem in more detail.
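The k-nearest-neighbour idea described above can be sketched in a few lines: compute the distance from a new patient to every training record and take a majority vote among the k closest. The two numeric attributes and their values below are hypothetical. In practice attributes are usually rescaled first so that no single unit dominates the Euclidean distance. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training records closest to `query` (Euclidean distance)."""
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical records: (hemoglobin g/dL, serum creatinine mg/dL)
train_X = [(15.0, 1.0), (14.5, 0.9), (13.8, 1.1), (9.5, 4.2), (8.8, 5.0), (10.1, 3.8)]
train_y = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]
print(knn_predict(train_X, train_y, (9.9, 4.0)))   # ckd
```

KNN has no training step beyond storing the records, which makes it simple but means every prediction scans the whole training set.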
Psdot 14 using data mining techniques in heart (ZTech Proje)
The document proposes applying data mining techniques to identify suitable heart disease treatments. It discusses using single and hybrid data mining on diagnosis and treatment data to determine if models can reliably predict treatments as they do diagnoses. The proposed system would apply various data mining algorithms to both diagnosis and treatment data to investigate if hybrid models improve treatment prediction accuracy over single techniques.
Early Identification of Diseases Based on Responsible Attribute using Data Mi... (IRJET Journal)
This document describes a proposed method for early identification of diseases using data mining and classification techniques. It begins with an introduction to classification and discusses how it is commonly used in healthcare for tasks like predicting patient risk levels. It then reviews related literature applying classification methods to diseases like heart disease and diabetes. The document outlines the problem of selecting the best classification technique for a given healthcare dataset. It proposes an architecture and method for disease prediction that assigns recommended values to attributes and classifies unknown data based on calculating totals. The method is experimentally analyzed using a heart disease dataset, and its accuracy is compared to Bayesian classification. In conclusion, the proposed method seeks to reduce attributes and complexity while accurately classifying patient data for early disease identification.
Propose a Enhanced Framework for Prediction of Heart Disease (IJERA Editor)
This document proposes a new framework for predicting heart disease using machine learning techniques. It first discusses techniques like artificial neural networks and Naive Bayes classification that can be used for classification. It also discusses feature selection techniques like principal component analysis and information gain that can reduce the number of attributes before classification. The proposed framework would take a dataset, apply feature selection to reduce attributes, then use two classification algorithms (ANN and Naive Bayes) on the reduced dataset to select important attributes for heart disease prediction. This is intended to help identify key attributes and predict heart disease symptoms more efficiently.
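Information gain, one of the feature selection criteria mentioned above, measures how much knowing an attribute reduces the entropy of the class label. A minimal ranking sketch follows. The two attributes and four records are made up, and the framework in the paper may compute or threshold the scores differently.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Entropy of the class minus the weighted entropy after splitting on `column`."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def rank_attributes(rows, labels, names):
    """Return (gain, name) pairs, most informative attribute first."""
    columns = list(zip(*rows))
    return sorted(((information_gain(col, labels), name) for col, name in zip(columns, names)),
                  reverse=True)

# Hypothetical records: (chest pain type, sex) -> heart disease
rows = [("typical", "M"), ("typical", "F"), ("none", "M"), ("none", "F")]
labels = ["yes", "yes", "no", "no"]
print(rank_attributes(rows, labels, ["chest_pain", "sex"]))
```

Attributes near the bottom of the ranking are candidates for removal before the classifiers are trained.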
One of the major reasons manufacturers incorporate AI or ML into their applications is to ease software computations and to produce precise predictions. I think that, compared with any other application, a medical application requires a great deal of precise computation, and therefore AI is a perfect solution for enhancing performance and productivity. While reading health-tech news, I came across recent research on this topic: the use of AI to predict a potential stroke or cardiac arrest. ..
Heart Disease Prediction Using Data Mining (IRJET Journal)
This document discusses using data mining techniques like Naive Bayes and Weighted Associative Classifier (WAC) to predict heart disease. It analyzes a dataset containing factors like age, sex, medical history, and test results. Naive Bayes and WAC are used to generate rules for predicting whether a patient has heart disease risk. The system was able to indicate heart disease risk levels based on the patient's data. The document concludes the approach was effective for heart disease prediction and automation could further improve clinical decision making.
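Weighted associative classifiers score rules with weighted support and weighted confidence rather than plain counts. The sketch below assumes one formulation found in the WAC literature, not taken from this document: each transaction is weighted by the average weight of its items. The item weights and records are hypothetical.

```python
def transaction_weight(transaction, item_weights):
    """Assumed weighting: mean weight of the items in the transaction."""
    return sum(item_weights[item] for item in transaction) / len(transaction)

def weighted_support(itemset, transactions, item_weights):
    """Share of total transaction weight carried by transactions that contain `itemset`."""
    weights = [transaction_weight(t, item_weights) for t in transactions]
    covered = sum(w for t, w in zip(transactions, weights) if itemset <= t)
    return covered / sum(weights)

def weighted_confidence(antecedent, consequent, transactions, item_weights):
    """Weighted support of the whole rule divided by that of its antecedent."""
    return (weighted_support(antecedent | consequent, transactions, item_weights)
            / weighted_support(antecedent, transactions, item_weights))

# Hypothetical item weights and patient transactions (class items included)
item_weights = {"age>45": 0.6, "high_chol": 0.9, "risk_high": 0.8, "risk_low": 0.2}
transactions = [{"age>45", "high_chol", "risk_high"}, {"high_chol", "risk_high"},
                {"age>45", "risk_low"}, {"risk_low"}]
print(round(weighted_support({"age>45"}, transactions, item_weights), 3))
print(weighted_confidence({"high_chol"}, {"risk_high"}, transactions, item_weights))
```

Rules whose weighted support and confidence pass chosen thresholds would then be kept for classifying new patients.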
IRJET- Chronic Kidney Disease Prediction based on Naive Bayes Technique (IRJET Journal)
This document discusses using a Naive Bayes technique to predict chronic kidney disease (CKD) from patient data. It begins by introducing data mining and its applications in healthcare for extracting useful information from large datasets. It then reviews literature on using classification algorithms such as Naive Bayes for disease detection. Next, it describes the limitations of existing manual CKD prediction systems. The proposed system would automate CKD prediction using a Naive Bayes classifier to help doctors diagnose the disease, which affects many people worldwide. The methodology involves collecting clinical data, pre-processing it, and then applying the Naive Bayes technique to extract patterns and predict CKD.
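Clinical CKD attributes such as serum creatinine or hemoglobin are numeric, so a Naive Bayes classifier for this data needs a likelihood model for continuous values. A common choice, assumed here because the summary does not say which variant the paper uses, is a per-class Gaussian. The training values below are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y):
    """Per class: prior probability plus (mean, variance) of every numeric attribute."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        columns = list(zip(*rows))
        # a tiny variance floor avoids division by zero on constant columns
        params = [(mean(col), pvariance(col) + 1e-9) for col in columns]
        model[c] = (len(rows) / len(y), params)
    return model

def predict_gaussian_nb(model, x):
    """Class with the highest log prior plus summed Gaussian log-likelihoods."""
    def log_posterior(c):
        prior, params = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, (mu, var) in zip(x, params))
    return max(model, key=log_posterior)

# Hypothetical records: (serum creatinine mg/dL, hemoglobin g/dL)
X = [(1.0, 15.0), (0.9, 14.2), (1.2, 13.9), (4.5, 9.8), (6.1, 8.7), (3.9, 10.5)]
y = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (5.0, 9.0)))   # ckd
```

The pre-processing step mentioned above matters here: missing values must be imputed or dropped before the means and variances are estimated.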
Heart Disease Prediction Using Data Mining Techniques (IJRES Journal)
There are huge amounts of data in the medical industry that are not processed properly and hence cannot be used effectively in making decisions. Data mining techniques can be used to mine these data for patterns and relationships. This research has developed a prototype Heart Disease Prediction system using data mining techniques, namely Neural Network, K-Means Clustering and Frequent Item Set Generation. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of patients getting heart disease. It enables significant knowledge, such as patterns and relationships between medical factors related to heart disease, to be established. The performance of these techniques is compared through sensitivity, specificity and accuracy. Artificial Neural Networks were observed to outperform K-Means clustering on all three measures: sensitivity, specificity and accuracy.
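Sensitivity, specificity and accuracy all come from the four cells of a binary confusion matrix. The sketch below computes them for made-up predictions, treating 1 as "has heart disease".

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn

def sensitivity_specificity_accuracy(actual, predicted, positive=1):
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / len(actual)
    return sensitivity, specificity, accuracy

# Hypothetical outcomes for ten patients
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(tuple(round(v, 3) for v in sensitivity_specificity_accuracy(actual, predicted)))
# (0.75, 0.833, 0.8)
```

Reporting sensitivity and specificity alongside accuracy shows whether a model misses sick patients or raises false alarms, which a single accuracy figure hides.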
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUES (cscpconf)
The health sector has evolved greatly with the development of new computer technologies, which has pushed it to produce more medical data and given birth to multiple fields of research. Many efforts have been made to cope with the explosion of medical data on the one hand, and to obtain useful knowledge from it on the other. This prompted researchers to apply technical innovations such as big data analytics, predictive analytics, machine learning and learning algorithms in order to extract useful knowledge and support decision making. With the promise of predictive analytics in big data and the use of machine learning algorithms, predicting the future is no longer such a difficult task, especially in medicine, where predicting diseases and anticipating the cure have become possible. In this paper we present an overview of the evolution of big data in the healthcare system, and we apply a learning algorithm to a set of medical data. The objective is to predict chronic kidney disease using the Decision Tree (C4.5) algorithm.
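C4.5 chooses its splitting attribute by gain ratio: the information gain of the split divided by the split information, that is, the entropy of the attribute's own value distribution. The sketch below compares a hypothetical hypertension attribute with a unique patient identifier; the data are invented for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def information_gain(column, labels):
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(column, labels):
    split_info = entropy(column)   # entropy of the attribute's own value distribution
    return information_gain(column, labels) / split_info if split_info > 0 else 0.0

# Hypothetical CKD records
labels     = ["ckd", "ckd", "ckd", "notckd", "notckd", "ckd"]
htn        = ["yes", "yes", "yes", "no", "no", "no"]
patient_id = ["p1", "p2", "p3", "p4", "p5", "p6"]
for name, col in [("htn", htn), ("patient_id", patient_id)]:
    print(name, round(information_gain(col, labels), 3), round(gain_ratio(col, labels), 3))
```

The identifier has the higher plain information gain, because each value isolates one patient, but the lower gain ratio; the split-information denominator is what stops C4.5 from preferring such many-valued attributes.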
HEALTH PREDICTION ANALYSIS USING DATA MINING (Ashish Salve)
Data mining techniques are used for a variety of applications. In the healthcare industry, data mining plays an important
role in predicting diseases. Detecting a disease normally requires a number of tests on the patient, but data
mining techniques can reduce the number of tests. This reduction plays an important role in saving time and improving performance.
This report analyses data mining techniques that can be used to predict different types of diseases, and reviews
research papers that mainly concentrate on predicting various diseases.
PERFORMANCE OF DATA MINING TECHNIQUES TO PREDICT IN HEALTHCARE CASE STUDY: CH... (ijdms)
This document discusses applying machine learning algorithms to predict chronic kidney disease. It:
1) Applied three algorithms (C4.5 decision tree, SVM, and Bayesian Network) to a chronic kidney disease dataset containing records of 400 patients with 24 attributes, to classify patients as having chronic kidney disease or not.
2) Found that the C4.5 decision tree algorithm had the best performance based on accuracy (63%), error rate (0.37), kappa statistic (0.97), and other evaluation metrics. SVM and Bayesian Network performance was lower.
3) Concludes that the C4.5 decision tree is the most efficient algorithm for predicting chronic kidney disease on this medical dataset.
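The error rate used in such evaluations is 1 minus accuracy, while the kappa statistic measures agreement between predicted and actual classes after discounting the agreement expected by chance, (p_o - p_e) / (1 - p_e). A sketch computing both from a confusion matrix follows; its counts are hypothetical.

```python
def kappa_and_error(matrix):
    """matrix[i][j] = number of records of actual class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total   # accuracy p_o
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # chance agreement p_e: sum over classes of (actual share) * (predicted share)
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return kappa, 1 - observed

# Hypothetical counts for 400 patients: rows = actual (ckd, notckd), columns = predicted
matrix = [[240, 10], [5, 145]]
kappa, error_rate = kappa_and_error(matrix)
print(round(kappa, 4), round(error_rate, 4))
```

Kappa is lower than accuracy whenever one class dominates, because a classifier can agree with the labels often simply by predicting the majority class.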
IRJET - Prediction of Autistic Spectrum Disorder based on Behavioural Fea... (IRJET Journal)
This document summarizes a research paper that aims to predict autism spectrum disorder (ASD) based on behavioral features using machine learning. The researchers collected ASD screening data from different age groups to develop and evaluate neural network models for predicting ASD. They achieved up to 90% accuracy in predicting ASD. The researchers concluded that machine learning is a promising approach for ASD prediction but noted limitations like lack of large datasets. They plan to improve the models by collecting more data from various sources.
This document describes a heart disease prediction system that uses machine learning algorithms to analyze patient data and predict the presence and severity of heart disease. The system uses four algorithms (random forest, Naive Bayes, decision tree and linear regression) to build predictive models from a dataset of 801 patients with 12 medical attributes. The models are evaluated on their accuracy both in detecting heart disease and in classifying its severity on a scale from 0 to 4. Random forest achieved the highest accuracy, 95.09%, while Naive Bayes had the lowest, 60.38%. The system offers a way to diagnose heart disease earlier and more accurately by mining existing patient records.
IRJET - Classification and Prediction for Hospital Admissions through Emergen... (IRJET Journal)
This document discusses using machine learning techniques to predict hospital admissions from emergency departments in order to improve patient flow and reduce overcrowding. It compares the performance of logistic regression and random forest algorithms on a dataset. Logistic regression identified several factors related to admission, including age, arrival mode and previous admissions. Random forests had the lowest accuracy. Predictive models could allow advance planning of resources to prevent bottlenecks. Future work involves exploring additional machine learning methods.
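Logistic regression models the probability of admission as a sigmoid of a weighted sum of the patient's features, and the sign and size of each learned weight indicate how that factor relates to admission. A plain gradient-descent sketch follows. The three features (age/100, ambulance arrival, number of previous admissions) and all values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, target in zip(X, y):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def admission_probability(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical visits: (age / 100, arrived by ambulance, previous admissions) -> admitted
X = [(0.25, 0, 0), (0.30, 0, 0), (0.35, 0, 1), (0.80, 1, 2), (0.75, 1, 1), (0.85, 0, 3)]
y = [0, 0, 0, 1, 1, 1]
model = fit_logistic(X, y)
print([round(wi, 2) for wi in model[0]])          # learned weight per factor
print(round(admission_probability(model, (0.80, 1, 2)), 3))
```

A library implementation would normally add regularization; it is omitted here to keep the update rule visible.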
Survey on data mining techniques in heart disease prediction (Sivagowry Shathesh)
This document summarizes research on using data mining techniques to predict heart disease. It discusses previous work applying classification, clustering, association rule mining and other techniques to several heart disease datasets. Classification algorithms such as Naive Bayes, decision trees and neural networks have been widely used, and Naive Bayes was often found to provide the best performance. Feature selection and attribute reduction are also examined. The document provides an overview of the key steps and techniques in medical data mining and predictive analysis for heart disease.
Disease Prediction And Doctor Appointment system (KOYELMAJUMDAR1)
This document outlines a disease prediction and doctor appointment system using machine learning. The objectives are to provide quick medical diagnosis to rural patients and enhance access to medical specialists. Five machine learning algorithms - Decision Tree, Random Forest, Naive Bayes, K-Nearest Neighbors, and Support Vector Machine - are used for disease prediction. The system displays predicted diseases and accuracy scores for each algorithm. Users can then book appointments with specialist doctors for their predicted disease.
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier (IRJET Journal)
This document presents a study on using the Naive Bayes classification technique to predict heart disease risk levels based on patient attributes. The study uses a heart disease dataset containing records of patients with 13 attributes each. The Naive Bayes classifier is applied to both the training dataset of 457 records and testing dataset of 88 records. The performance is evaluated based on various metrics like accuracy, precision, recall, F-measure etc. On the training data, the Naive Bayes classifier achieved 96.28% accuracy and on the testing data it achieved 98.86% accuracy, demonstrating it can accurately predict heart disease risk levels.
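Precision, recall and F-measure, listed among the evaluation metrics above, are computed from true positives, false positives and false negatives; the F-measure is the harmonic mean of precision and recall. The counts in the example are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F-measure); 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure

# Hypothetical counts from a test set
print(precision_recall_f1(tp=45, fp=5, fn=15))   # precision 0.9, recall 0.75, F about 0.818
```

Because the harmonic mean is pulled toward the smaller value, a classifier cannot reach a high F-measure by maximizing only one of the two.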
This document presents a comparative study of various data mining techniques for predicting heart disease using collected and standard heart disease datasets. Five classifiers - KStar, J48, SMO, Bayes Net, and MLP - were evaluated based on their accuracy and training time. SMO had the highest accuracy of 84-89% and MLP had the lowest training time of 0.33-0.75 seconds. The techniques are also compared based on their average classification variance. The study concludes with receiver operating characteristic curves showing the performance of the techniques on the two datasets.
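A receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate as the decision threshold on a classifier's scores is lowered. The area under it equals the probability that a randomly chosen positive record is scored above a randomly chosen negative one. A sketch with hypothetical scores:

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) at each distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores and true labels (1 = heart disease)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(roc_points(scores, labels))
print(roc_auc(scores, labels))   # 8/9, about 0.889
```

Comparing curves on the collected and standard datasets shows whether one classifier dominates at every threshold or only at some.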
Existing models use structured data to predict whether patients are at high risk or low risk.
However, for a complex disease, structured data alone is not a good way to describe the disease.
We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm that uses both structured and unstructured data from hospitals.
In this paper, we mainly focus on the risk prediction of cerebral infarction.
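The text branch of a CNN-based model like the one described typically slides filters over a sequence of word vectors, max-pools the responses, and concatenates the pooled features with the structured attributes. The stdlib sketch below shows that operation for toy 2-dimensional word vectors. The vectors, filter weights, structured values and the simple concatenation step are assumptions for illustration, not the paper's architecture.

```python
def conv1d_relu_maxpool(seq, kernel, bias=0.0):
    """Slide a filter of height len(kernel) over the word-vector sequence (needs
    len(seq) >= len(kernel)), apply ReLU to each window response, and keep the
    maximum (max-over-time pooling)."""
    h = len(kernel)
    responses = []
    for i in range(len(seq) - h + 1):
        window = seq[i:i + h]
        s = bias + sum(w * x for k_row, word in zip(kernel, window) for w, x in zip(k_row, word))
        responses.append(max(0.0, s))
    return max(responses)

def fuse_features(structured, text_features):
    """Concatenate structured attributes with pooled text features for a downstream classifier."""
    return list(structured) + list(text_features)

# Hypothetical embedded clinical note (4 words, 2 dimensions) and two filters of height 2
words = [[1, 0], [0, 1], [1, 1], [0, 0]]
filters = [[[1, 0], [0, 1]], [[0, -1], [1, 0]]]
text_features = [conv1d_relu_maxpool(words, k) for k in filters]
print(fuse_features([67, 1], text_features))   # hypothetical (age, hypertension) + text
```

In a trained network the filter weights and word vectors are learned, and many filters of several heights are used; the fused vector then feeds fully connected layers that output the risk.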
Survey on data mining techniques in heart disease prediction (Sivagowry Shathesh)
This document describes a study on applying data mining techniques to analyze and predict heart disease. It discusses how data mining can extract valuable knowledge from healthcare data. The study uses several data mining techniques like decision trees, naive Bayes classification, clustering, and association rule mining on heart disease datasets from UC Irvine to predict heart disease. Experimental results show that multilayer neural networks and classification techniques like naive Bayes had higher prediction accuracy compared to other methods.
A Survey on Heart Disease Prediction Techniques (ijtsrd)
Heart disease has been the main cause of a huge number of deaths worldwide over the last few decades and has become the most life-threatening disease. The healthcare industry is rich in information, so there is a need to discover the hidden patterns and trends in it. For this purpose, data mining techniques can be applied to extract knowledge from large data sets. In recent times, many researchers have used machine learning techniques to predict heart-related diseases, as these techniques can predict the disease effectively. Even though machine learning techniques prove effective in assisting decision makers, there is still scope for developing an accurate and efficient system to diagnose and predict heart diseases, thereby easing doctors' work. This paper presents a survey of various techniques used for predicting heart disease and reviews their performance. G. Niranjana | Dr I. Elizabeth Shanthi "A Survey on Heart Disease Prediction Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-2, February 2021, URL: https://www.ijtsrd.com/papers/ijtsrd38349.pdf Paper Url: https://www.ijtsrd.com/computer-science/other/38349/a-survey-on-heart-disease-prediction-techniques/g-niranjana
An efficient feature selection algorithm for health care data analysis (journalBEEI)
Diabetes is a silent killer that slowly kills a person if it goes undetected. The existing systems that use the F-score method and K-means clustering to check whether a person has diabetes are not 100% accurate, and anything short of 100% is not acceptable in the medical field, as it could cost many lives. Our proposed system aims to take some of the best features of the existing algorithms for predicting diabetes and combine them into a novel algorithm intended to be 100% accurate in its predictions. With the surge in technological advancements, data mining can be used to predict when a person will be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This research uses the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification theorems and methods, we can consider different factors such as age, BMI and blood pressure, together with the overall importance given to these attributes, then single out these attributes and use them to predict diabetes.
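The chi-square statistic compares the observed counts in an attribute-by-class contingency table with the counts expected if attribute and class were independent; larger values suggest a more informative attribute. A sketch follows, assuming a numeric Pima attribute such as plasma glucose has first been discretized into bins. The counts are invented.

```python
def chi_square(table):
    """table[i][j] = count of records with attribute bin i and class j."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under independence
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows = glucose bin (high, normal), columns = (diabetic, not diabetic)
glucose_table = [[30, 10], [20, 40]]
print(round(chi_square(glucose_table), 3))   # 16.667
```

Computing this score for every attribute and keeping the highest-scoring ones is one way to "single out" attributes before classification.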
MULTI MODEL DATA MINING APPROACH FOR HEART FAILURE PREDICTION (IJDKP)
Developing predictive modelling solutions for risk estimation is extremely challenging in health-care
informatics. Risk estimation involves integrating heterogeneous clinical sources with different
representations from different health-care providers, which makes the task increasingly complex. Such sources are
typically voluminous and diverse, and they change significantly over time. Therefore, distributed and parallel
computing tools, collectively termed big data tools, are needed to synthesize these sources and assist physicians
in making the right clinical decisions. In this work we propose a multi-model predictive architecture, a novel
approach that combines the predictive ability of multiple models for better prediction accuracy. We
demonstrate the effectiveness and efficiency of the proposed work on data from the Framingham Heart Study.
Results show that the proposed multi-model predictive architecture provides better accuracy than the
best-model approach. By modelling the error of the predictive models, we are able to choose a subset of models
that yields accurate results. Multi-level mining incorporated more information into the system, which
resulted in enhanced predictive accuracy.
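One simple way to realize choosing a subset of models by modelling their error is to rank candidate models by their error on held-out data, keep the best few, and average their predicted probabilities. The error measure (Brier score), the averaging rule, the model names and the numbers below are all assumptions for illustration, not the architecture evaluated on the Framingham data.

```python
def brier_error(probabilities, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - t) ** 2 for p, t in zip(probabilities, labels)) / len(labels)

def pick_models(validation_preds, validation_labels, k):
    """Keep the names of the k models with the lowest validation error."""
    return sorted(validation_preds,
                  key=lambda m: brier_error(validation_preds[m], validation_labels))[:k]

def combine(preds_by_model, chosen):
    """Average the chosen models' probabilities record by record."""
    return [sum(col) / len(chosen) for col in zip(*(preds_by_model[m] for m in chosen))]

# Hypothetical validation outcomes and per-model predicted probabilities
validation_labels = [1, 0, 1, 0]
validation_preds = {"logit": [0.9, 0.2, 0.8, 0.1],
                    "tree":  [0.6, 0.5, 0.7, 0.4],
                    "nb":    [0.8, 0.3, 0.9, 0.2]}
chosen = pick_models(validation_preds, validation_labels, k=2)
print(chosen)                               # ['logit', 'nb']
print(combine(validation_preds, chosen))
```

In practice the combined model should be scored on data not used to select it, otherwise the reported accuracy is optimistic.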
IRJET-Survey on Data Mining Techniques for Disease Prediction (IRJET Journal)
This document discusses using data mining techniques to predict disease, specifically focusing on heart disease. It provides an overview of different classification algorithms that can be used for disease prediction, including decision trees, Bayesian classifiers, multilayer perceptrons, and ensemble techniques. These algorithms are analyzed based on their accuracy, time efficiency, and area under the ROC curve. The document also reviews related literature applying various data mining methods like decision trees, KNN, and support vector machines to heart disease prediction. Overall, the document examines using classification algorithms and data mining to extract patterns from medical data that can help predict heart disease and other illnesses.
Large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceuticals). Those data could be an enabling resource for deriving insights for improving care delivery and reducing waste. The enormity and complexity of these datasets present great challenges in analyses and subsequent applications to a practical clinical environment. More details are available here http://dmkd.cs.wayne.edu/TUTORIAL/Healthcare/
This document discusses medical data mining and classification techniques. It begins with an introduction to data mining and its applications in healthcare to improve treatment. Medical data mining can help discover patterns in medical data to aid diagnosis. Classification algorithms like decision trees can categorize medical records and help predict outcomes. Specifically, the document discusses the J48 decision tree algorithm available in the WEKA data mining tool, which implements the C4.5 algorithm for classification. Decision trees work by recursively splitting the data into subsets based on attribute values, forming a tree structure. The document concludes that while data mining can help with medical analysis, results from small medical datasets should be interpreted cautiously.
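The recursive splitting described above can be sketched as a small ID3-style builder over categorical attributes. It uses plain information gain. J48/C4.5 differs by using gain ratio, handling numeric attributes with thresholds and pruning the tree, none of which this sketch attempts. The attributes and records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a node (attr, {value: subtree}, majority_label)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return (best, branches, majority)

def classify(tree, row):
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        tree = branches.get(row[attr], majority)   # unseen value: fall back to majority class
    return tree

# Hypothetical records
rows = [{"bp": "high", "sugar": "high"}, {"bp": "high", "sugar": "normal"},
        {"bp": "normal", "sugar": "high"}, {"bp": "normal", "sugar": "normal"}]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["bp", "sugar"])
print(classify(tree, {"bp": "high", "sugar": "normal"}))   # yes
```

With only four records the tree fits the data perfectly, which echoes the caution above about interpreting results from small medical datasets.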
Enhanced Detection System for Trust Aware P2P Communication Networks (Editor IJCATR)
A botnet is a number of computers that have been set up to forward transmissions to other computers without the knowledge of the
systems' users, and detecting botnets is very important. However, peer-to-peer (P2P) structured botnets are very difficult to detect
because they do not have any centralized server. In this paper, we present a P2P infrastructure that improves trust in the peers
and their data. To identify botnets, we provide a technique called data provenance integrity. It ensures the correct origin or
source of information and prevents adversaries from using host resources. A reputation-based trust model is used for selecting
trusted peers. In this model, each peer has a reputation value that is calculated from its past activity. A hash table is used for
efficient file searching, and the data stored in it is based on the reputation value.
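The abstract does not give the reputation formula or the provenance mechanism, so the sketch below makes explicit assumptions. Reputation is a beta-style score (s + 1) / (s + f + 2) over a peer's past successful and failed interactions. Provenance is a SHA-256 digest binding data to its claimed origin. The hash table is a Python dict from file name to holders, returned in reputation order. Every name here is hypothetical. An unkeyed digest only reveals tampering if the digest itself is trusted; a deployed system would sign it.

```python
import hashlib

def reputation(successes, failures):
    """Hypothetical beta-style score; a peer with no history starts at 0.5."""
    return (successes + 1) / (successes + failures + 2)

def provenance_digest(origin, payload):
    """SHA-256 over the origin peer id and the data, binding the data to its claimed source."""
    return hashlib.sha256(origin.encode("utf-8") + b"\x00" + payload).hexdigest()

def verify_provenance(record, payload):
    return provenance_digest(record["origin"], payload) == record["digest"]

class ReputationIndex:
    """Hash table (dict) from file name to the peers holding it, ordered by reputation."""
    def __init__(self, history):
        self.history = history          # peer -> (successes, failures)
        self.table = {}

    def publish(self, filename, peer):
        self.table.setdefault(filename, []).append(peer)

    def lookup(self, filename):
        peers = self.table.get(filename, [])
        return sorted(peers, key=lambda p: reputation(*self.history[p]), reverse=True)

history = {"peer_a": (8, 2), "peer_b": (1, 0), "peer_c": (3, 5)}
index = ReputationIndex(history)
for peer in ("peer_c", "peer_a", "peer_b"):
    index.publish("report.csv", peer)
print(index.lookup("report.csv"))   # ['peer_a', 'peer_b', 'peer_c']

record = {"origin": "peer_a", "digest": provenance_digest("peer_a", b"file contents")}
print(verify_provenance(record, b"file contents"), verify_provenance(record, b"tampered"))
```

Selecting the first peer returned by `lookup` corresponds to choosing the most trusted holder of a file.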
IRJET- Chronic Kidney Disease Prediction based on Naive Bayes TechniqueIRJET Journal
This document discusses using a Naive Bayes technique to predict chronic kidney disease (CKD) based on patient data. It begins by introducing data mining and its applications in healthcare to extract useful information from large datasets. It then reviews literature on using classification algorithms like Naive Bayes for disease detection. Next, it describes the limitations of existing manual CKD prediction systems. The proposed system would automate CKD prediction using a Naive Bayes classifier to help doctors diagnose the disease which affects many worldwide. The methodology involves collecting clinical data, pre-processing it, then applying the Naive Bayes technique to extract patterns and predict CKD.
Heart Disease Prediction Using Data Mining TechniquesIJRES Journal
There are huge amounts of data in the medical industry which is not processed properly and hence cannot be used effectively in making decisions. We can use data mining techniques to mine these patterns and relationships. This research has developed a prototype Heart Disease Prediction using data mining techniques, namely Neural Network, K-Means Clustering and Frequent Item Set Generation. Using medical profiles such as age, sex, blood pressure and blood sugar it can predict the likelihood patients getting a heart disease. It enables significant knowledge, e.g. patterns, relationships between medical factors related to heart disease to be established. Performance of these techniques is compared through sensitivity, specificity and accuracy. It has been observed that Artificial Neural Networks outperform K Means clustering in all the parameters i.e. Sensitivity, Specificity and Accuracy.
PREDICTIVE ANALYTICS IN HEALTHCARE SYSTEM USING DATA MINING TECHNIQUEScscpconf
The health sector has witnessed a great evolution following the development of new computer technologies, which pushed the field to produce more medical data and gave birth to multiple fields of research. Many efforts have been made to cope with the explosion of medical data on the one hand, and to obtain useful knowledge from it on the other. This prompted researchers to apply technical innovations such as big data analytics, predictive analytics, machine learning and learning algorithms in order to extract useful knowledge and help in making decisions. With the promise of predictive analytics in big data and the use of machine learning algorithms, predicting the future is no longer a difficult task, especially in medicine, where predicting diseases and anticipating cures have become possible. In this paper, we present an overview of the evolution of big data in the healthcare system and apply a learning algorithm to a set of medical data. The objective is to predict chronic kidney disease by using the Decision Tree (C4.5) algorithm.
HEALTH PREDICTION ANALYSIS USING DATA MINING (Ashish Salve)
Data mining techniques are used for a variety of applications. In the healthcare industry, data mining plays an important role in predicting diseases. Detecting a disease normally requires a number of tests from the patient, but data mining techniques can reduce the number of tests needed, which saves time and improves performance. This report analyses data mining techniques that can be used to predict different types of diseases and reviews research papers that mainly concentrate on predicting various diseases.
PERFORMANCE OF DATA MINING TECHNIQUES TO PREDICT IN HEALTHCARE CASE STUDY: CH... (ijdms)
This document discusses applying machine learning algorithms to predict chronic kidney disease. It:
1) Applies three algorithms (C4.5 decision tree, SVM, and Bayesian Network) to a chronic kidney disease dataset of 400 patients and 24 attributes to classify patients as having chronic kidney disease or not.
2) Finds that the C4.5 decision tree algorithm had the best performance based on accuracy (63%), error rate (0.37), kappa statistic (0.97), and other evaluation metrics, while SVM and Bayesian Network performed worse.
3) Concludes that the C4.5 decision tree is the most efficient algorithm for predicting chronic kidney disease on this medical dataset.
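The kappa statistic in item 2 measures agreement between predicted and actual classes beyond what chance alone would produce. It is defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected from the row and column totals. Because p_e is never negative, kappa can never exceed the observed accuracy. Below is a minimal sketch that computes it from a confusion matrix. The matrix values are illustrative, not taken from the study.

```python
def cohens_kappa(matrix):
    """Cohen's kappa; matrix[i][j] counts records of actual class i predicted as class j."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    expected = sum(
        sum(matrix[i]) * sum(matrix[r][i] for r in range(k))  # row total x column total
        for i in range(k)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Illustrative 2x2 matrix: rows = actual (ckd, notckd), columns = predicted
matrix = [[45, 5], [10, 40]]
print(round(cohens_kappa(matrix), 3))  # 0.7
```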
IRJET - Prediction of Autistic Spectrum Disorder based on Behavioural Fea... (IRJET Journal)
This document summarizes a research paper that aims to predict autism spectrum disorder (ASD) based on behavioral features using machine learning. The researchers collected ASD screening data from different age groups to develop and evaluate neural network models for predicting ASD. They achieved up to 90% accuracy in predicting ASD. The researchers concluded that machine learning is a promising approach for ASD prediction but noted limitations like lack of large datasets. They plan to improve the models by collecting more data from various sources.
This document describes a heart disease prediction system that uses machine learning algorithms to analyze patient data and predict the presence and severity of heart disease. The system uses four algorithms (random forest, naive Bayes, decision tree, and linear regression) to build predictive models from a dataset of 801 patients with 12 medical attributes. The models are evaluated on their accuracy both in detecting heart disease and in classifying its severity on a scale from 0 to 4. Random forest achieved the highest accuracy, 95.09%, while naive Bayes had the lowest, 60.38%. The system provides a way to diagnose heart disease earlier and more accurately by mining existing patient records.
IRJET - Classification and Prediction for Hospital Admissions through Emergen... (IRJET Journal)
This document discusses using machine learning techniques to predict hospital admissions from emergency departments in order to improve patient flow and reduce overcrowding. It compares the performance of logistic regression and random forest algorithms on a dataset. Logistic regression identified several factors related to admission, including age, arrival mode, and previous admissions. Random forests had the lower accuracy of the two. Predictive models could allow advance planning of resources to prevent bottlenecks. Future work involves exploring additional machine learning methods.
Survey on data mining techniques in heart disease prediction (Sivagowry Shathesh)
This document summarizes research on using data mining techniques to predict heart disease. It discusses previous work applying classification, clustering, association rule mining and other techniques to several heart disease datasets. Classification algorithms such as naive Bayes, decision trees and neural networks have been widely used, with naive Bayes often found to provide the best performance. Feature selection and attribute reduction are also examined. The document provides an overview of the key steps and techniques in medical data mining and predictive analysis for heart disease.
Disease Prediction And Doctor Appointment system (KOYELMAJUMDAR1)
This document outlines a disease prediction and doctor appointment system using machine learning. The objectives are to provide quick medical diagnosis to rural patients and enhance access to medical specialists. Five machine learning algorithms - Decision Tree, Random Forest, Naive Bayes, K-Nearest Neighbors, and Support Vector Machine - are used for disease prediction. The system displays predicted diseases and accuracy scores for each algorithm. Users can then book appointments with specialist doctors for their predicted disease.
IRJET- The Prediction of Heart Disease using Naive Bayes Classifier (IRJET Journal)
This document presents a study on using the Naive Bayes classification technique to predict heart disease risk levels based on patient attributes. The study uses a heart disease dataset of patient records with 13 attributes each. The Naive Bayes classifier is applied to both a training dataset of 457 records and a testing dataset of 88 records. Performance is evaluated with metrics such as accuracy, precision, recall, and F-measure. On the training data, the Naive Bayes classifier achieved 96.28% accuracy, and on the testing data it achieved 98.86% accuracy, demonstrating that it can accurately predict heart disease risk levels.
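Precision, recall and F-measure, mentioned above alongside accuracy, come from the true-positive, false-positive and false-negative counts. The F-measure is the harmonic mean of precision and recall. Here is a small sketch. The counts are illustrative only and do not come from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-measure (F1) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Illustrative counts only: 6 true positives, 2 false positives, 4 false negatives
p, r, f = precision_recall_f1(tp=6, fp=2, fn=4)
print(p, r, round(f, 4))
```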
This document presents a comparative study of various data mining techniques for predicting heart disease using collected and standard heart disease datasets. Five classifiers - KStar, J48, SMO, Bayes Net, and MLP - were evaluated based on their accuracy and training time. SMO had the highest accuracy of 84-89% and MLP had the lowest training time of 0.33-0.75 seconds. The techniques are also compared based on their average classification variance. The study concludes with receiver operating characteristic curves showing the performance of the techniques on the two datasets.
Existing models use structured data to classify patients as either high risk or low risk. For a complex disease, however, structured data alone are not a good way to describe the disease. We propose a new convolutional neural network (CNN)-based multimodal disease risk prediction algorithm that uses structured and unstructured data from a hospital. In this paper, we mainly focus on the risk prediction of cerebral infarction.
Survey on data mining techniques in heart disease prediction (Sivagowry Shathesh)
This document describes a study on applying data mining techniques to analyze and predict heart disease. It discusses how data mining can extract valuable knowledge from healthcare data. The study uses several data mining techniques like decision trees, naive Bayes classification, clustering, and association rule mining on heart disease datasets from UC Irvine to predict heart disease. Experimental results show that multilayer neural networks and classification techniques like naive Bayes had higher prediction accuracy compared to other methods.
A Survey on Heart Disease Prediction Techniques (ijtsrd)
Heart disease has been the main cause of a huge number of deaths worldwide over the last few decades and has become the most life-threatening disease. The health care industry is rich in information, so there is a need to discover the hidden patterns and trends in it. For this purpose, data mining techniques can be applied to extract knowledge from large sets of data. In recent times, many researchers have been using machine learning techniques to predict heart-related diseases, as these techniques can predict the disease effectively. Even though machine learning techniques prove effective in assisting decision makers, there is still scope for developing an accurate and efficient system to diagnose and predict heart diseases and thereby ease doctors' work. This paper presents a survey of various techniques used for predicting heart disease and reviews their performance. G. Niranjana | Dr I. Elizabeth Shanthi "A Survey on Heart Disease Prediction Techniques" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5 | Issue-2, February 2021, URL: https://www.ijtsrd.com/papers/ijtsrd38349.pdf Paper Url: https://www.ijtsrd.com/computer-science/other/38349/a-survey-on-heart-disease-prediction-techniques/g-niranjana
An efficient feature selection algorithm for health care data analysis (journalBEEI)
Diabetes is a silent killer that slowly kills a person if it goes undetected. The existing systems, which use the F-score method and K-means clustering to check whether a person has diabetes, are not 100% accurate, and anything short of 100% is not acceptable in the medical field, as it could cost the lives of many people. Our proposed system aims to take some of the best features of the existing algorithms for predicting diabetes and combine them. Based on these features, this research work turns them into a novel algorithm intended to be 100% accurate in its predictions. With the surge in technological advancements, we can use data mining to predict when a person would be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This research work uses the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification theorems and methods, we can weigh factors such as age, BMI, and blood pressure by their overall importance, single out the most important attributes, and use them to predict diabetes.
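The chi-square step above can be read as the standard Pearson chi-square test of independence, used here to score attributes. The larger the statistic, the more strongly an attribute's values depend on the class. The sketch below assumes that reading. It also assumes that numeric attributes such as BMI or glucose have already been binned into categories. The binned values and outcome labels are hypothetical, and the sketch does not reproduce the paper's combined chi-square/ACA algorithm.

```python
def chi_square(feature, labels):
    """Pearson chi-square statistic between one categorical attribute and the class label."""
    n = len(labels)
    values = sorted(set(feature))
    classes = sorted(set(labels))
    observed = {(v, c): 0 for v in values for c in classes}
    for v, c in zip(feature, labels):
        observed[(v, c)] += 1
    stat = 0.0
    for v in values:
        row_total = sum(observed[(v, c)] for c in classes)
        for c in classes:
            col_total = sum(observed[(u, c)] for u in values)
            expected = row_total * col_total / n  # count expected under independence
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


# Hypothetical binned attributes for six patients; 1 = diabetic, 0 = not diabetic
outcome = [1, 1, 1, 0, 0, 0]
attributes = {
    "glucose_bin": ["high", "high", "high", "low", "low", "low"],
    "age_bin": ["old", "young", "old", "young", "old", "young"],
}
scores = {name: chi_square(col, outcome) for name, col in attributes.items()}
print(sorted(scores, key=scores.get, reverse=True))  # ['glucose_bin', 'age_bin']
```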
MULTI MODEL DATA MINING APPROACH FOR HEART FAILURE PREDICTION (IJDKP)
Developing predictive modelling solutions for risk estimation is extremely challenging in health-care informatics. Risk estimation involves integrating heterogeneous clinical sources that use different representations across health-care providers, which makes the task increasingly complex. Such sources are typically voluminous and diverse, and they change significantly over time. Therefore, distributed and parallel computing tools, collectively termed big data tools, are needed to synthesize these sources and assist physicians in making the right clinical decisions. In this work we propose a multi-model predictive architecture, a novel approach for combining the predictive ability of multiple models for better prediction accuracy. We demonstrate the effectiveness and efficiency of the proposed work on data from the Framingham Heart Study. Results show that the proposed multi-model predictive architecture provides better accuracy than the best-model approach. By modelling the error of the predictive models, we are able to choose the subset of models that yields accurate results. Modelling more information into the system through multi-level mining further enhanced predictive accuracy.
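One simple way to combine several models and keep only an accurate subset, in the spirit of the architecture described above, works in three steps: measure each model's error on held-out data, discard models above an error threshold, and let the remaining models vote. This is a hypothetical sketch. The threshold rule, the majority vote and the toy threshold "models" are assumptions made for illustration, not the paper's error-modelling method.

```python
from collections import Counter


def error_rate(model, rows, labels):
    """Fraction of validation records the model misclassifies."""
    return sum(1 for row, label in zip(rows, labels) if model(row) != label) / len(labels)


def select_models(models, rows, labels, max_error):
    """Keep only models whose validation error rate is at most max_error."""
    return [m for m in models if error_rate(m, rows, labels) <= max_error]


def majority_vote(models, row):
    """Combine the selected models; ties go to the label seen first."""
    return Counter(model(row) for model in models).most_common(1)[0][0]


# Toy "models": threshold rules on a single risk score (hypothetical)
models = [
    lambda x: int(x > 3),
    lambda x: int(x > 2),
    lambda x: 0,
    lambda x: int(x >= 4),
]
val_rows, val_labels = [1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]
selected = select_models(models, val_rows, val_labels, max_error=0.2)
print(len(selected), majority_vote(selected, 3), majority_vote(selected, 5))  # 3 0 1
```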
IRJET-Survey on Data Mining Techniques for Disease Prediction (IRJET Journal)
This document discusses using data mining techniques to predict disease, specifically focusing on heart disease. It provides an overview of different classification algorithms that can be used for disease prediction, including decision trees, Bayesian classifiers, multilayer perceptrons, and ensemble techniques. These algorithms are analyzed based on their accuracy, time efficiency, and area under the ROC curve. The document also reviews related literature applying various data mining methods like decision trees, KNN, and support vector machines to heart disease prediction. Overall, the document examines using classification algorithms and data mining to extract patterns from medical data that can help predict heart disease and other illnesses.
Large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceuticals). Those data could be an enabling resource for deriving insights for improving care delivery and reducing waste. The enormity and complexity of these datasets present great challenges in analyses and subsequent applications to a practical clinical environment. More details are available at http://dmkd.cs.wayne.edu/TUTORIAL/Healthcare/
This document discusses medical data mining and classification techniques. It begins with an introduction to data mining and its applications in healthcare to improve treatment. Medical data mining can help discover patterns in medical data to aid diagnosis. Classification algorithms like decision trees can categorize medical records and help predict outcomes. Specifically, the document discusses the J48 decision tree algorithm available in the WEKA data mining tool, which implements the C4.5 algorithm for classification. Decision trees work by recursively splitting the data into subsets based on attribute values, forming a tree structure. The document concludes that while data mining can help with medical analysis, results from small medical datasets should be interpreted cautiously.
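C4.5, which J48 implements, chooses the attribute to split on by its gain ratio. The gain ratio is the information gain of the split divided by the split information (the entropy of the partition sizes), which penalises attributes with many distinct values. Here is a minimal sketch for categorical attributes. Real C4.5 also handles numeric thresholds, missing values and pruning, which are omitted here, and the attribute values in the example are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio for splitting `labels` on the categorical attribute `values`."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["sick", "sick", "healthy", "healthy"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0: attribute separates the classes
print(gain_ratio(["a", "b", "a", "b"], labels))  # 0.0: attribute carries no information
```

A tree builder applies this at every node: it picks the attribute with the best gain ratio, splits the records by its values, and recurses on each subset.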
Enhanced Detection System for Trust Aware P2P Communication Networks (Editor IJCATR)
A botnet is a number of computers that have been set up to forward transmissions to other computers without the knowledge of their users, so detecting botnets is highly important. However, peer-to-peer (P2P) structured botnets are very difficult to detect because they do not have a centralized server. In this paper, we deliver a P2P infrastructure that improves the trust of the peers and their data. To identify botnets, we provide a technique called data provenance integrity, which ensures the correct origin or source of information and prevents opponents from using host resources. A reputation-based trust model is used to select trusted peers: each peer has a reputation value calculated from its past activity. A hash table is used for efficient file searching, and the data stored in it is based on the reputation value.
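A reputation-based trust table of this kind can be kept in an ordinary hash table keyed by peer id. The sketch below is hypothetical. The exponential-moving-average update, the neutral starting score of 0.5 and the trust threshold are assumptions chosen for illustration, not the paper's reputation formula. Data provenance checking is not shown.

```python
class ReputationTable:
    """Hash table (dict) mapping peer id -> reputation in [0, 1], built from past activity."""

    def __init__(self, initial=0.5, rate=0.2):
        self.scores = {}
        self.initial = initial  # assumed neutral score for peers with no history
        self.rate = rate        # assumed weight given to the newest interaction

    def reputation(self, peer):
        return self.scores.get(peer, self.initial)

    def record(self, peer, success):
        """Update a peer's reputation after one interaction (moving average of outcomes)."""
        outcome = 1.0 if success else 0.0
        self.scores[peer] = (1 - self.rate) * self.reputation(peer) + self.rate * outcome

    def most_trusted(self, candidates, threshold=0.5):
        """Pick the highest-reputation candidate at or above the threshold, else None."""
        trusted = [p for p in candidates if self.reputation(p) >= threshold]
        return max(trusted, key=self.reputation, default=None)


table = ReputationTable()
for _ in range(3):
    table.record("peer-A", success=True)   # behaves well
for _ in range(2):
    table.record("peer-B", success=False)  # e.g. served tampered data
print(table.most_trusted(["peer-A", "peer-B"]))  # peer-A
print(table.most_trusted(["peer-B"]))            # None
```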
Comparative Study of Diabetic Patient Data’s Using Classification Algorithm i... (Editor IJCATR)
Data mining refers to extracting knowledge from large amounts of data. Real-life data mining approaches are interesting because they often present a different set of problems for diabetic patients' data. Classification is one of the main problems in this research area. The research provides an algorithmic discussion of J48, J48 Graft, Random Tree, REP, and LAD, and compares their performance in terms of computing time, correctly classified instances, kappa statistics, MAE, RMSE, RAE, and RRSE to measure the error rates of the different classifiers in Weka. In this paper, the diabetic patient dataset was developed by collecting data from a hospital repository and consists of 1865 instances with different attributes. The instances in the dataset fall into two categories of tests: blood tests and urine tests. The data were classified with the Weka tool and evaluated using 10-fold cross-validation, and the results were compared. Comparing the performance of the algorithms, we found J48 to be the better algorithm in most cases.
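The error measures listed above follow standard definitions. MAE and RMSE average the absolute and squared prediction errors. RAE and RRSE divide the total absolute and squared error by that of a trivial predictor that always outputs the mean of the actual values, so values below 1 (100%, as Weka reports them) beat that baseline. Here is a minimal sketch on made-up predicted probabilities for a two-class problem. For nominal classes, Weka computes these measures from class-probability estimates, so its exact normalisation may differ from this simplified version.

```python
import math


def error_measures(actual, predicted):
    """MAE, RMSE, RAE and RRSE for numeric targets (e.g. predicted class probabilities)."""
    n = len(actual)
    mean = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        # relative measures: error divided by that of always predicting the mean
        "RAE": sum(abs_err) / sum(abs(a - mean) for a in actual),
        "RRSE": math.sqrt(sum(sq_err) / sum((a - mean) ** 2 for a in actual)),
    }


# 1 = diabetic, 0 = not; predicted = hypothetical probability of class 1
actual = [1, 0, 1, 0]
predicted = [0.8, 0.2, 0.6, 0.0]
for name, value in error_measures(actual, predicted).items():
    print(name, round(value, 4))
```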
Assessment of Decision Tree Algorithms on Student’s Recital (IRJET Journal)
This document presents a study that compares the performance of various decision tree algorithms (J48, Hoeffding Tree, Random Forest, Random Tree, REPTree, Decision Stump) on student academic performance data. The study uses educational datasets containing student marks and percentages to classify students into performance grades (A,B,C) and predict their marks in future semesters. The decision tree algorithms are implemented on the datasets using the WEKA data mining tool. The algorithms are evaluated and compared based on accuracy in classifying students and predicting future marks. The results show that J48, Random Forest and Random Tree algorithms achieved 100% accuracy on the training and some test datasets, performing the best among the algorithms evaluated.
This document describes a disease prediction system that uses machine learning algorithms like decision trees, random forests and naive Bayes to predict a disease based on symptoms provided by a patient. The researchers developed a logistic regression model to take in symptoms and predict the likely disease. It was created using Python and aims to help busy professionals more easily identify health issues before they become serious. The system was built using techniques like data collection, preprocessing, model training/evaluation and aims to improve performance over iterations. It was found to provide time savings and early disease warnings compared to traditional diagnosis methods.
IRJET -Improving the Accuracy of the Heart Disease Prediction using Hybrid Ma... (IRJET Journal)
This document describes research on improving the accuracy of heart disease prediction using hybrid machine learning techniques. The researchers collected data on patient biomarkers and risk factors from hospitals and online repositories. They applied data preprocessing, feature selection, and various classification models like decision trees, support vector machines, random forests, and K-nearest neighbors. Evaluating the models showed that a hybrid of fuzzy K-nearest neighbor and K-nearest neighbor achieved the highest accuracy rate of 94% for heart disease prediction. The researchers then built a web application using this hybrid model to allow users to predict their risk of heart disease online with high accuracy. The study demonstrates that machine learning can effectively analyze medical data and help predict diseases.
Multi Disease Detection using Deep Learning (IRJET Journal)
1) The document proposes a system for multi-disease detection using deep learning that could provide early detection of chronic diseases like heart disease, cancer, and diabetes from medical data and save lives.
2) It reviews literature on disease prediction using machine learning algorithms like CNN, KNN, decision trees, and support vector machines. CNN showed slightly better accuracy than KNN for general disease detection.
3) The proposed system would use deep learning models to detect and classify diseases from medical images and data with high accuracy, helping doctors verify test results and enhancing their experience with diseases. It aims to reduce the costs of diagnostic testing for chronic conditions.
IRJET- Data Mining Techniques to Predict Diabetes (IRJET Journal)
This document discusses using data mining techniques to predict diabetes. It begins with an introduction to diabetes and what causes high blood sugar. It then discusses how data mining of patient purchase histories can show connections to medication adherence. Various data mining techniques are explored, including decision trees and the Apriori algorithm, to analyze medical data and extract patterns to improve diagnosis and treatment recommendations for patients. The goal is to help doctors and patients choose the most effective and lowest cost treatment options based on analyses of large diabetes datasets.
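The Apriori algorithm mentioned above finds itemsets that occur in at least a minimum fraction of transactions. It grows candidates one item at a time and prunes any candidate that has an infrequent subset. Here is a minimal sketch. The medication "purchase histories" are invented for illustration, and association-rule generation from the frequent itemsets is omitted.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)
    current = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # join step: unions of frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


# Hypothetical per-patient purchase histories
histories = [
    {"metformin", "insulin"},
    {"metformin", "statin"},
    {"metformin", "insulin", "statin"},
    {"insulin"},
]
frequent = apriori(histories, min_support=0.5)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```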
Predictions And Analytics In Healthcare: Advancements In Machine Learning (IRJET Journal)
This document discusses advancements in machine learning and predictive analytics for healthcare. It begins with an introduction discussing how technologies like machine learning and artificial intelligence can help researchers and doctors achieve goals faster when integrated with healthcare. The document then reviews literature on challenges with analyzing big healthcare data due to issues like data variety, speed and volume. It discusses different machine learning algorithms that have been used for disease prediction and diagnosis, including decision trees, random forests, bagging and boosting. The methodology section outlines the use of an ensemble approach, combining multiple models to improve overall accuracy. Technologies implemented in this work include Python libraries like Pandas, NumPy and Scikit-learn for data processing and modeling, along with Flask and AWS for web app deployment. The
IRJET- Hybrid Architecture of Heart Disease Prediction System using Genetic N... (IRJET Journal)
This document proposes a hybrid system using genetic algorithms and neural networks to predict heart disease risk more efficiently. The system is trained on a dataset containing patient risk factors and diagnoses. It then classifies new patient samples to predict the presence or absence of heart disease. The accuracy, mean squared error, and regression of the proposed hybrid system are compared to other traditional algorithms and found to perform better. A genetic algorithm is used to optimize the neural network architecture and weights. The system could help doctors diagnose heart disease earlier based on risk factors before costly testing.
IRJET- Heart Disease Prediction and Recommendation (IRJET Journal)
This document describes a study that developed a machine learning model to predict heart disease risk and provide recommendations. The study used a decision tree algorithm and the Cleveland heart disease dataset to train a model. The model takes in 14 clinical attributes to predict the risk of heart disease on a scale of 0 to 1. It then provides control measure recommendations based on the predicted risk level to help users reduce their risk. The system was designed to be implemented as an Android application for users to input their data and receive the prediction and recommendations.
IRJET - Prediction and Analysis of Multiple Diseases using Machine Learni... (IRJET Journal)
This document discusses using machine learning techniques to predict and analyze multiple diseases. It presents research using KNN, support vector machine, random forest, and decision tree algorithms applied to a medical database to predict future and previous diseases. The goal is to provide a smart card method for easily and accurately diagnosing disease by storing an individual's full medical record. It reviews related work applying various machine learning classifiers like decision trees, naive Bayes, and logistic regression to diseases such as heart disease, diabetes, and cancer. The conclusion is that machine learning applied to medical data can help predict disease and save time for patients and doctors.
Efficiency of Prediction Algorithms for Mining Biological Databases (IOSR Journals)
This document analyzes the efficiency of various prediction algorithms for mining biological databases. It discusses prediction through mining biological databases to identify disease risks. It then evaluates several prediction algorithms (ZeroR, OneR, JRip, PART, Decision Table) on a breast cancer dataset using measures like accuracy, sensitivity, specificity, and predictive values. The results show that the JRip and PART algorithms generally had the highest accuracy rates, around 70%, while ZeroR had the lowest accuracy. However, ZeroR had a perfect positive predictive value. The study aims to assess the most efficient algorithms for predictive mining of biological data.
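ZeroR is the usual baseline in Weka. It ignores every attribute and always predicts the most frequent class, so its accuracy equals the majority-class share of the data, and a useful learner such as JRip or PART is expected to beat it. Here is a minimal sketch with made-up labels.

```python
from collections import Counter


class ZeroR:
    """Baseline classifier: ignores all attributes and predicts the majority class."""

    def fit(self, labels):
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.prediction for _ in rows]


labels = ["no-recurrence"] * 6 + ["recurrence"] * 4  # hypothetical class column
rows = [None] * len(labels)                            # attributes are never looked at
baseline = ZeroR().fit(labels)
predictions = baseline.predict(rows)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(baseline.prediction, accuracy)  # no-recurrence 0.6
```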
Credit Scoring Using CART Algorithm and Binary Particle Swarm Optimization (IJECEIAES)
Credit scoring is a procedure that exists in every financial institution. It is a way to predict whether a debtor qualifies to be given a loan, and it has been a major concern throughout the loan process. Almost all banks and other financial institutions have their own credit scoring methods. Nowadays, the data mining approach is accepted as one of the well-known methods, and accuracy is a major issue in this approach. This research proposes a hybrid method using the CART algorithm and Binary Particle Swarm Optimization. The performance indicators used in this research are classification accuracy, error rate, sensitivity, specificity, and precision. Experimental results based on the public dataset show that the proposed method achieves accuracies of 78% and 87.53%. Compared to several popular algorithms, such as neural networks, logistic regression and support vector machines, the proposed method showed outstanding performance.
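CART grows binary trees by choosing, at each node, the split that minimises the weighted Gini impurity of the two child nodes. The sketch below shows that criterion for one numeric attribute. The income figures and good/bad labels are hypothetical, and the Binary Particle Swarm Optimization part of the hybrid method is not shown.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Binary CART split on one numeric attribute: return (threshold, weighted Gini)."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical applicants: monthly income and credit outcome
incomes = [1000, 2000, 3000, 4000, 5000, 6000]
outcomes = ["bad", "bad", "bad", "good", "good", "good"]
print(best_threshold(incomes, outcomes))  # (3000, 0.0)
```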
Heart Disease Prediction using Data Mining (IRJET Journal)
This document describes a study that uses data mining techniques like neural networks and genetic algorithms to predict heart disease based on major risk factors. The proposed system initializes neural network weights using a genetic algorithm for feature selection and classification to build an intelligent clinical decision support system. It analyzes heart disease risk factors like age, cholesterol, blood pressure, smoking status and diabetes using a neuro-fuzzy model optimized with a genetic algorithm. The system is able to predict heart disease with 89% accuracy and can help detect the disease early to improve treatment outcomes.
IRJET- Diabetes Diagnosis using Machine Learning Algorithms (IRJET Journal)
This document presents research on using machine learning algorithms to diagnose diabetes. The researchers collected a dataset of 15,000 patient records from the National Institute of Diabetes and Digestive and Kidney Diseases. They analyzed the dataset and used machine learning algorithms like decision trees, naive Bayes, support vector machines, and k-nearest neighbors to build predictive models. The models were evaluated based on accuracy and other performance metrics. The naive Bayes classifier achieved the highest accuracy of 72% in predicting whether patients had diabetes. The research aims to develop a machine learning system that can predict diabetes early to help treat patients before the disease becomes critical.
A COMPREHENSIVE SURVEY ON CARDIAC ARREST RISK LEVEL PREDICTION SYSTEM (IRJET Journal)
This document summarizes research on predicting cardiac arrest risk levels using machine learning techniques. It discusses how techniques like naive Bayes, support vector machine, KNN, logistic regression, decision trees, and random forests can be used to classify patient risk levels based on medical data. Accuracy rates from prior studies using these methods on cardiac datasets ranged from 60% to over 99%, depending on the techniques and attributes used. The document also outlines some challenges in cardiac risk prediction, such as choosing the appropriate dataset, attributes, algorithms and evaluating model performance.
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms... (IRJET Journal)
This document describes a study that uses supervised machine learning algorithms to predict breast cancer. Three algorithms - decision tree, logistic regression, and random forest - are applied to preprocessed breast cancer data. The random forest model achieved the best accuracy at 98.6% for predicting whether a tumor was benign or malignant. The study aims to develop an early prediction system for breast cancer using machine learning techniques.
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNING (IRJET Journal)
This document summarizes a research paper that evaluates different machine learning algorithms for detecting blood diseases from laboratory test results. It first introduces the objective to classify and predict diseases like anemia and leukemia. It then evaluates three algorithms: Gaussian, Random Forest, and Support Vector Classification (SVC). SVC achieved the highest accuracy of 98% for anemia detection. The models are deployed using Streamlit so users can access them online or offline. Benefits include low hardware requirements and mobile access. Future work will add more disease predictions and integrate nutritional guidance.
Similar to Comparative Study of Classification Method on Customer Candidate Data to Predict its Potential Risk (20)
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw... (IJECEIAES)
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to precisely delineate tumor boundaries from magnetic resonance imaging (MRI) scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating the state-of-the-art Deeplabv3+ architecture with a ResNet18 backbone. The model is rigorously trained and evaluated, achieving a global accuracy of 99.286%, a class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of our proposed model. These findings underscore the model’s competence in precise brain tumor localization, highlighting its potential to revolutionize medical image analysis and enhance healthcare outcomes. This research paves the way for future exploration and optimization of advanced CNN models in medical imaging, with an emphasis on addressing false positives and resource efficiency.
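The mean and weighted IoU figures reported above can be computed from a ground-truth mask and a predicted mask. IoU for a class is the number of pixels both masks assign to it divided by the number of pixels either mask assigns to it. Mean IoU averages over classes, while weighted IoU weights each class by its share of ground-truth pixels, which tends to let a large background class dominate. Here is a minimal sketch on tiny flattened masks. The pixel labels are made up (0 = background, 1 = tumor).

```python
def iou_per_class(truth, pred, classes):
    """Intersection over union for each class, given two flattened label masks."""
    scores = {}
    for c in classes:
        intersection = sum(1 for t, p in zip(truth, pred) if t == c and p == c)
        union = sum(1 for t, p in zip(truth, pred) if t == c or p == c)
        scores[c] = intersection / union
    return scores


def mean_and_weighted_iou(truth, pred):
    classes = sorted(set(truth))  # classes present in the ground truth, so union > 0
    ious = iou_per_class(truth, pred, classes)
    mean_iou = sum(ious.values()) / len(classes)
    weighted_iou = sum(truth.count(c) / len(truth) * ious[c] for c in classes)
    return mean_iou, weighted_iou


truth = [0, 0, 0, 0, 0, 0, 1, 1]
pred = [0, 0, 0, 0, 0, 1, 1, 0]
mean_iou, weighted_iou = mean_and_weighted_iou(truth, pred)
print(round(mean_iou, 3), round(weighted_iou, 3))  # 0.524 0.619
```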
Embedded machine learning-based road conditions and driving behavior monitoring (IJECEIAES)
Car accident rates have increased in recent years, resulting in loss of human life and property and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events (normal street and normal drive, speed bumps, and circular yellow speed bumps) and three aggressive driving actions (sudden start, sudden stop, and sudden entry). The gathered data is processed and analyzed using a machine learning system designed for devices with limited power and memory. The developed system achieved 91.9% accuracy, 93.6% precision, and 92% recall. The inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms, and the model requires 2.6 kB of peak RAM and 139.9 kB of program flash memory, making it suitable for resource-constrained embedded systems.
Advanced control scheme of doubly fed induction generator for wind turbine us... (IJECEIAES)
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Neural network optimizer of proportional-integral-differential controller par...IJECEIAES
Wide application of proportional-integral-differential (PID)-regulator in industry requires constant improvement of methods of its parameters adjustment. The paper deals with the issues of optimization of PID-regulator parameters with the use of neural network technology methods. A methodology for choosing the architecture (structure) of neural network optimizer is proposed, which consists in determining the number of layers, the number of neurons in each layer, as well as the form and type of activation function. Algorithms of neural network training based on the application of the method of minimizing the mismatch between the regulated value and the target value are developed. The method of back propagation of gradients is proposed to select the optimal training rate of neurons of the neural network. The neural network optimizer, which is a superstructure of the linear PID controller, allows increasing the regulation accuracy from 0.23 to 0.09, thus reducing the power consumption from 65% to 53%. The results of the conducted experiments allow us to conclude that the created neural superstructure may well become a prototype of an automatic voltage regulator (AVR)-type industrial controller for tuning the parameters of the PID controller.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
A review on features and methods of potential fishing zoneIJECEIAES
This review focuses on the importance of identifying potential fishing zones in seawater for sustainable fishing practices. It explores features like sea surface temperature (SST) and sea surface height (SSH), along with classification methods such as classifiers. The features like SST, SSH, and different classifiers used to classify the data, have been figured out in this review study. This study underscores the importance of examining potential fishing zones using advanced analytical techniques. It thoroughly explores the methodologies employed by researchers, covering both past and current approaches. The examination centers on data characteristics and the application of classification algorithms for classification of potential fishing zones. Furthermore, the prediction of potential fishing zones relies significantly on the effectiveness of classification algorithms. Previous research has assessed the performance of models like support vector machines, naïve Bayes, and artificial neural networks (ANN). In the previous result, the results of support vector machine (SVM) were 97.6% more accurate than naive Bayes's 94.2% to classify test data for fisheries classification. By considering the recent works in this area, several recommendations for future works are presented to further improve the performance of the potential fishing zone models, which is important to the fisheries community.
Electrical signal interference minimization using appropriate core material f...IJECEIAES
As demand for smaller, quicker, and more powerful devices rises, Moore's law is strictly followed. The industry has worked hard to make little devices that boost productivity. The goal is to optimize device density. Scientists are reducing connection delays to improve circuit performance. This helped them understand three-dimensional integrated circuit (3D IC) concepts, which stack active devices and create vertical connections to diminish latency and lower interconnects. Electrical involvement is a big worry with 3D integrates circuits. Researchers have developed and tested through silicon via (TSV) and substrates to decrease electrical wave involvement. This study illustrates a novel noise coupling reduction method using several electrical involvement models. A 22% drop in electrical involvement from wave-carrying to victim TSVs introduces this new paradigm and improves system performance even at higher THz frequencies.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding and draughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they succeeded in playing a noticeable role in reducing climate change.
A bibliometric analysis of data from the last ten years has been carried out to
examine the role of women in addressing the climate change. The analysis's
findings discussed the relevant to the sustainable development goals (SDGs),
particularly SDG 7 and SDG 13. The results considered contributions made
by women in the various sectors while taking geographic dispersion into
account. The bibliometric analysis delves into topics including women's
leadership in environmental groups, their involvement in policymaking, their
contributions to sustainable development projects, and the influence of
gender diversity on attempts to mitigate climate change. This study's results
highlight how women have influenced policies and actions related to climate
change, point out areas of research deficiency and recommendations on how
to increase role of the women in addressing the climate change and
achieving sustainability. To achieve more successful results, this initiative
aims to highlight the significance of gender equality and encourage
inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from gridconnected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS), and the Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems that are associated to power systems is islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences on the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis shows that the overall
weight of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligent-based methods (20.8%) based on the
comparison of all criteria together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m2
, the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed,the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embeddedsystem design approach targeting FPGA technologies that are fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that allows users to seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during the experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the solar cell systems status parameters, solar irradiance, temperature, and humidity, are critical issues in enhancement their efficiency. Hence, in the present article an improved smart prototype of internet of things (IoT) technique based on embedded system through NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions at Egypt; Luxor, Cairo, and El-Beheira cities were chosen to study their solar irradiance profile, temperature, and humidity by the proposed IoT system. The monitoring data of solar irradiance, temperature, and humidity were live visualized directly by Ubidots through hypertext transfer protocol (HTTP) protocol. The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m 2 respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system made it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly candidate Luxor and Cairo as suitable places to build up a solar cells system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Low power architecture of logic gates using adiabatic techniquesnooriasukmaningtyas
The growing significance of portable systems to limit power consumption in ultra-large-scale-integration chips of very high density, has recently led to rapid and inventive progresses in low-power design. The most effective technique is adiabatic logic circuit design in energy-efficient hardware. This paper presents two adiabatic approaches for the design of low power circuits, modified positive feedback adiabatic logic (modified PFAL) and the other is direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the preliminary components in any digital circuit design. By improving the performance of basic gates, one can improvise the whole system performance. In this paper proposed circuit design of the low power architecture of OR/NOR, AND/NAND, and XOR/XNOR gates are presented using the said approaches and their results are analyzed for powerdissipation, delay, power-delay-product and rise time and compared with the other adiabatic techniques along with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. It has been found that the designs with DC-DB PFAL technique outperform with the percentage improvement of 65% for NOR gate and 7% for NAND gate and 34% for XNOR gate over the modified PFAL techniques at 10 MHz respectively.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 6, December 2018 : 4763 - 4771
customer candidates who are predicted to have payment problems in the future, thereby helping to assess the creditworthiness of prospective customers more reliably.
The study published in the journal article entitled "C4.5 Algorithm to Predict the Impact of the Earthquake" explains that although the timing of an earthquake cannot be predicted, its expected impact can be predicted from historical seismic data. One method for mining information from such historical data is the C4.5 data mining algorithm. In that study, the output of C4.5 for predicting the impact of an earthquake is divided into three classes: no impact/minor damage, severe damage, and damage with tsunami. Predicting these consequences is expected to help minimize the impact of earthquakes. The attributes used are epicenter, distance from the coast, depth, scale, duration, and effect. The resulting rules predict the effect of an earthquake as follows. If the scale is low, there is no effect. If the scale is medium and the duration is short, there is no effect; if the scale is medium and the duration is long, the earthquake causes damage. If the scale is high and the epicenter is at a certain distance from the coast or on land, it also causes damage. If the scale is high and the distance from the coast is very far, it causes damage and a tsunami. If the scale is high, the distance from the coast is far, and the epicenter is in the sea, it causes damage and a tsunami [6].
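The rules reported for [6] can be read as a small decision procedure. The Python sketch below encodes them under stated assumptions: the function name is hypothetical, and the inputs are assumed to be categorical with `scale` in {"low", "medium", "high"}, `duration` in {"short", "long"}, `coast_distance` in {"near", "far", "very far"}, and `epicenter` in {"land", "sea"}. The label "near" stands in for the rules' "a certain distance from the coast", and the order in which overlapping high-scale rules are checked is also an assumption, since the summarized rules do not specify it.

```python
def predict_quake_effect(scale: str, duration: str, coast_distance: str, epicenter: str) -> str:
    """Hypothetical encoding of the earthquake-effect rules summarised from [6].

    Assumed categories: scale in {"low", "medium", "high"}, duration in {"short", "long"},
    coast_distance in {"near", "far", "very far"}, epicenter in {"land", "sea"}.
    """
    if scale == "low":
        return "no effect"  # low scale: no effect regardless of the other attributes
    if scale == "medium":
        if duration == "short":
            return "no effect"
        if duration == "long":
            return "damage"
        raise ValueError(f"unknown duration: {duration!r}")
    if scale != "high":
        raise ValueError(f"unknown scale: {scale!r}")
    # High scale. Checking the on-land rule first is an assumption used to resolve overlaps.
    if epicenter == "land":
        return "damage"
    if coast_distance == "very far":
        return "damage and tsunami"
    if coast_distance == "far" and epicenter == "sea":
        return "damage and tsunami"
    return "damage"  # e.g. "near", i.e. "a certain distance from the coast"


print(predict_quake_effect("high", "long", "far", "sea"))  # damage and tsunami
```

Writing the rules this way makes explicit where they overlap (for example, a high-scale event that is both "very far" from the coast and on land), which the prose summary leaves open.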
Another study that utilizes C4.5 is presented in [7]. It uses rainfall, soil, and climate datasets to predict crop production. These datasets are preprocessed to remove unwanted and null data. A feature extraction method then derives a subset of new features from the datasets through functional mapping while preserving the information, and a genetic algorithm, which offers the opportunity to discover an optimal solution, is used to select the optimal features. An enhanced ANFIS classifier is then applied; it improves on the C4.5 classifier in the hidden layer to generate the rules for predicting the yield. With this enhancement, the proposed method achieves an accuracy of 92.50%, better than the existing classifier. Putri et al. [8] conducted a comparative study of the performance of decision tree variants for information mining in burned forest areas. The study compared three decision tree algorithms, i.e., CART, C5.0, and C4.5. Of these three techniques, C5.0 was the most suitable for the spatial data of burned forest areas, outperforming the others with an accuracy of 99.79%.
In [9], the authors use a Naive Bayes classifier to predict patients' hypertension. Hypertension is a significant health problem, and patients may not recognize the disease for years. On the other hand, it is still difficult to answer complex queries such as "Given patient records, predict the probability of patients getting hypertension". Clinical decisions are often made based on doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. In that study, the Naive Bayes algorithm is employed to build a model with predictive capabilities, providing new ways of exploring and understanding knowledge. The attributes used are sex, chest pain, exam, age, systolic BP, diastolic BP, cholesterol, fasting blood sugar, thalach, old peak, and the risk of hypertension. The Naive Bayes experiments in the study give a recall of 83.70%, a precision of 83.60%, and an accuracy of 83.67%. Another interesting application of naïve Bayes for classification is presented in [10], where the authors report a Zakah receiver classification experiment that utilizes the naïve Bayes classifier. According to the experiment results, the classifier provides good accuracy, i.e. 85%. An application of naïve Bayes classifiers in the social media mining domain is discussed in [11]. The study explored the Multinomial Naïve Bayes classifier to mine sentiment opinion patterns about GSM services from customers' Twitter accounts. Using 1665 features of the dataset, the technique achieves an accuracy of 73.15%.
In this work, we perform an experimental study of the Naive Bayes and C4.5 algorithms applied to the historical customer data of a leasing company. The purpose of the study is to evaluate the performance of both algorithms in assisting the leasing company to decide whether to approve a customer candidate who applies for a lease. Such a study is critical in the local Indonesian context, since financial technology is currently growing quickly while the supporting information technology environment, especially software applications, is still in its initial phase. To the authors' knowledge, there are very few publications on the application of Artificial Intelligence or Machine Learning to this domain for Indonesian cases.
3. Int J Elec & Comp Eng ISSN: 2088-8708
Comparative Study of Classification Method on Customer Candidate Data (Mujiono Sadikin)
4765
2. MATERIAL AND METHOD
2.1. Classification
Classification is one of the data mining techniques; it analyzes a given dataset, takes each instance of it, and assigns the instance to a particular class such that the classification error is minimized. It is used to extract models that accurately describe the important data classes within the given dataset. Classification is a two-step process. In the first step, the model is created by applying a classification algorithm to the training dataset; in the second step, the extracted model is tested against a predefined test dataset to measure the trained model's performance and accuracy. In short, classification is the process of assigning a class label to data whose class label is unknown [9].
2.2. C4.5 Algorithm
The C4.5 algorithm is used to construct a decision tree [12], a powerful and well-known classification and prediction method. The decision tree method turns a very large set of facts into a decision tree that represents rules. Decision trees are also useful for exploring data to find the relationship between input variables and a certain output/target variable. In general, the C4.5 algorithm constructs a decision tree as follows:
a. Select an attribute as the root.
b. Create a branch for each value of that attribute.
c. Divide the cases among the branches.
d. Repeat the process for each branch until all cases in the branch have the same class.
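The four steps above can be sketched in Python as a small recursive builder for categorical attributes. This is an illustrative sketch under our own assumptions, not the implementation used in the study: it splits on plain information gain as this section describes (the full C4.5 algorithm uses the gain ratio and also handles numeric attributes, missing values, and pruning), and the function names and the nested-dictionary tree format are hypothetical. The rows are the five categorized examples from Table 4.

```python
import math
from collections import Counter

# Simplified, ID3-style sketch of steps a-d: plain information gain, categorical
# attributes only, no pruning. Function names are hypothetical, not Weka's.

def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, attr, target):
    """Gain(S, A) = Entropy(S) - sum_i |S_i|/|S| * Entropy(S_i)."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

def majority(rows, target):
    """Most frequent class label among the rows."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def build_tree(rows, attrs, target):
    """Return a class label (leaf) or a nested dict {attr: {value: subtree}}."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:                 # step d: all cases in this branch share one class
        return labels.pop()
    if not attrs:                        # no attribute left: fall back to the majority class
        return majority(rows, target)
    best = max(attrs, key=lambda a: gain(rows, a, target))   # step a: highest gain is the root
    if gain(rows, best, target) <= 1e-12:                    # no split improves purity
        return majority(rows, target)
    rest = [a for a in attrs if a != best]
    # steps b and c: one branch per value of `best`, each receiving its matching cases
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}

# The five categorized examples of Table 4.
ATTRS = ["Age", "Salary", "OtherInstallments", "MaritalStatus"]
TABLE_4 = [
    ("Young", "Low", "No", "Married", "Not Worth It"),
    ("Young", "High", "Yes", "Married", "Not Worth It"),
    ("Young", "High", "Yes", "Married", "Worth It"),
    ("Old", "High", "No", "Married", "Worth It"),
    ("Young", "Low", "Yes", "Single", "Not Worth It"),
]
rows = [dict(zip(ATTRS + ["Worthiness"], t)) for t in TABLE_4]
tree = build_tree(rows, ATTRS, "Worthiness")
print(tree)
```

On these rows, Salary obtains the highest gain (about 0.42) and becomes the root, and the Low branch is a pure "Not Worth It" leaf. The High branch rests on only three rows, so its sub-tree should not be read as a finding.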
The attribute selected as the root is the one with the highest gain value among the existing attributes. The gain is calculated with the following formula:
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|} \times \mathrm{Entropy}(S_i)$$
Where:
S: The set of cases
A: Attribute
n: The number of partitions of attribute A
|S_i|: The number of cases in partition i
|S|: The number of cases in S
Meanwhile, the entropy value is calculated as follows:
$$\mathrm{Entropy}(S) = \sum_{i=1}^{n} -p_i \log_2 p_i$$
Where:
S: The set of cases
n: The number of partitions of S (one per class label)
p_i: The proportion of S_i to S
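As a worked illustration of the two formulas (our own calculation, not a result reported in the study), take the five example rows of Table 4, which contain 2 Worth It and 3 Not Worth It cases, and split them on Salary (Low: 2 cases, both Not Worth It; High: 3 cases, 2 Worth It and 1 Not Worth It):

```latex
\begin{aligned}
\mathrm{Entropy}(S) &= -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} \approx 0.971 \\
\mathrm{Entropy}(S_{\mathrm{Low}}) &= -\tfrac{2}{2}\log_2\tfrac{2}{2} = 0 \\
\mathrm{Entropy}(S_{\mathrm{High}}) &= -\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3} \approx 0.918 \\
\mathrm{Gain}(S, \mathrm{Salary}) &= 0.971 - \left(\tfrac{2}{5}\cdot 0 + \tfrac{3}{5}\cdot 0.918\right) \approx 0.420
\end{aligned}
```

The same calculation on these five rows gives about 0.322 for Age, 0.171 for Marital Status, and 0.020 for Other Installments, so Salary would be chosen as the root of this small example tree.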
2.3. Naive Bayes Algorithm
The Naive Bayes algorithm learns from the database records by calculating the probability of each variable conditioned on the class variable [13]. As a result, we can predict, for example, whether a person belongs to a certain group based on the variables attached to that person. Naive Bayes can also analyze which variables are most influential, in the form of probabilities. Naive Bayes is a simple probability-based prediction technique that applies Bayes' theorem with a strong (naive) assumption of independence between the variables. The Naive Bayes process consists of the following stages:
a. Count the number of classes/labels.
b. Count the number of cases per class.
c. Multiply the probabilities of all variables for each class.
d. Compare the results per class.
The Naive Bayes formula is as follows:
$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
4. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 8, No. 6, December 2018 : 4763 - 4771
4766
Where:
x: Data with an unknown class
c: Hypothesis that the data belong to a specific class
P(c | x): Probability of hypothesis c given the data x (posterior probability)
P(c): Probability of hypothesis c (prior probability)
P(x | c): Probability of x given hypothesis c (likelihood)
P(x): Probability of x (evidence)
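The four stages above can be sketched in Python for categorical attributes such as those in Table 4. This is a minimal sketch under our own assumptions, not the study's Weka configuration: the function names are hypothetical, and add-one (Laplace) smoothing is our addition so that an attribute value never seen with a class does not force the whole product to zero. Because P(x) is the same for every class, the sketch compares only the numerators P(c) multiplied by each P(x_a | c).

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, attrs, target):
    """Stages a-b: count the classes and, per class, how often each attribute value occurs."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)              # (attribute, class) -> value counts
    domains = {a: {r[a] for r in rows} for a in attrs}
    for r in rows:
        for a in attrs:
            value_counts[(a, r[target])][r[a]] += 1
    return class_counts, value_counts, domains

def predict(model, x, attrs, alpha=1.0):
    """Stages c-d: multiply P(c) by every P(x_a | c), then pick the class with the largest product."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                          # prior P(c)
        for a in attrs:
            count = value_counts[(a, c)][x[a]]
            # likelihood P(x_a | c); alpha = 1 is add-one (Laplace) smoothing, our assumption
            score *= (count + alpha) / (n_c + alpha * len(domains[a]))
        scores[c] = score                            # proportional to P(c | x); P(x) is omitted
    return max(scores, key=scores.get), scores

# The five categorized examples of Table 4.
ATTRS = ["Age", "Salary", "OtherInstallments", "MaritalStatus"]
TABLE_4 = [
    ("Young", "Low", "No", "Married", "Not Worth It"),
    ("Young", "High", "Yes", "Married", "Not Worth It"),
    ("Young", "High", "Yes", "Married", "Worth It"),
    ("Old", "High", "No", "Married", "Worth It"),
    ("Young", "Low", "Yes", "Single", "Not Worth It"),
]
rows = [dict(zip(ATTRS + ["Worthiness"], t)) for t in TABLE_4]
model = train_naive_bayes(rows, ATTRS, "Worthiness")
applicant = {"Age": "Old", "Salary": "High", "OtherInstallments": "No", "MaritalStatus": "Married"}
label, scores = predict(model, applicant, ATTRS)
print(label)  # Worth It
```

For this hypothetical applicant, the sketch scores Worth It at 0.05625 and Not Worth It at 0.01152, so it predicts Worth It.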
2.4. Weka Tools
Weka is a collection of machine learning algorithms for data mining tasks. Weka stands for Waikato Environment for Knowledge Analysis and was developed at the University of Waikato, New Zealand. It contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization [14]. The Weka workflow is shown in Figure 1.
Figure 1. Weka Flow
2.5. Data Set
The data used in this research were collected from a leasing company located in the Cikupa-Tangerang area, Banten Province. A total of 560 records were collected; each instance contains five attributes, namely age, marital status, salary, other installments, and worthiness, as presented in Table 1. The worthiness attribute is the target variable (label). Some sample instances are shown in Table 2.
Table 1. Data Set Attributes
No. | Attribute | Attribute Value
1 | Age (years) | 23, 40, 50, and so on
2 | Salary (Rupiah) | 1 million, 4 million, and so on
3 | Other Installments | Yes, No
4 | Marital Status | Married, Single
5 | Worthiness | Worth It, Not Worth It
Table 2. Example of Data Set Attribute Values
No. | Age | Salary | Other Installments | Marital Status | Worthiness
1 | 21 | 4.400.000 IDR | No | Married | Not Worth It
2 | 23 | 10.600.000 IDR | Yes | Married | Not Worth It
3 | 43 | 14.000.000 IDR | Yes | Married | Worth It
4 | 54 | 13.000.000 IDR | No | Married | Worth It
5 | 25 | 4.700.000 IDR | Yes | Single | Not Worth It
Two of the four input attributes, age and salary, can take values over a wide range, which makes computation difficult. To deal with this problem, we categorize the values of both attributes as presented in Table 3. Table 4 shows the categorized versions of the examples in Table 2.
Table 3. Data Set Attribute Categorization
No. | Attribute | Attribute Value | Attribute Categorization
1 | Age (years) | 23, 40, 50, and so on | Age < 45: Young; Age > 45: Old
2 | Salary (Rupiah) | 1 million, 4 million, and so on | < 5 million: Low; 5–10 million: Middle; > 10 million: High
3 | Other Installments | Yes, No | Yes, No
4 | Marital Status | Married, Single | Married, Single
5 | Worthiness | Worth It, Not Worth It | Worth It, Not Worth It
Table 4. Example of Categorized Data Set
No. | Age | Salary | Other Installments | Marital Status | Worthiness
1 | Young | Low | No | Married | Not Worth It
2 | Young | High | Yes | Married | Not Worth It
3 | Young | High | Yes | Married | Worth It
4 | Old | High | No | Married | Worth It
5 | Young | Low | Yes | Single | Not Worth It
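The categorization in Table 3 can be expressed as two small Python functions; applied to the raw values of Table 2, they reproduce the Age and Salary columns of Table 4. Table 3 does not say where an age of exactly 45 falls, or whether exactly 5 or 10 million counts as Middle, so the boundary handling below is our assumption, and the function names are hypothetical.

```python
def parse_idr(text):
    """Turn a salary written like "4.400.000 IDR" (dots as thousand separators) into an int."""
    return int(text.replace("IDR", "").strip().replace(".", ""))

def categorize_age(age_years):
    # Table 3: below 45 -> Young, above 45 -> Old; treating exactly 45 as Old is our assumption.
    return "Young" if age_years < 45 else "Old"

def categorize_salary(salary_idr):
    # Table 3: < 5 million Low, 5-10 million Middle, > 10 million High.
    # Counting both 5 and 10 million as Middle is our reading of the table.
    if salary_idr < 5_000_000:
        return "Low"
    if salary_idr <= 10_000_000:
        return "Middle"
    return "High"

# Raw (age, salary) values from Table 2.
TABLE_2 = [(21, "4.400.000 IDR"), (23, "10.600.000 IDR"), (43, "14.000.000 IDR"),
           (54, "13.000.000 IDR"), (25, "4.700.000 IDR")]
categorized = [(categorize_age(age), categorize_salary(parse_idr(salary))) for age, salary in TABLE_2]
print(categorized)
# [('Young', 'Low'), ('Young', 'High'), ('Young', 'High'), ('Old', 'High'), ('Young', 'Low')]
```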
2.6. Experiment Scenario
The experiment scenario consists of two main steps. The first step is to obtain the best model for each algorithm, and the second is to compare the two best models obtained. The details of the experiment stages and scenario are illustrated in Figure 2.
Figure 2. The Experiment Scenario
The collected data are not yet ready to be processed by the algorithms, since they contain biases and ambiguities, so data preprocessing is needed. In this step, we perform data cleaning by ignoring incomplete data. The next preprocessing step is data transformation, which converts the data into a format compatible with the Weka tools. Data splitting is then applied to divide the data into two parts: training data and testing data. In this case, we use 80% of the data for training and the rest for testing. The same training data are used to train both algorithms, and the resulting models are tested on the same testing data. For each algorithm, we perform twenty experiment runs to obtain its best model. The two best models are then compared to evaluate their performance and determine the better of C4.5 and Naive Bayes.
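The percentage split and the repeated runs described above can be sketched as follows. The study does not say how the twenty runs differ from one another; this sketch assumes each run reshuffles the data with a different random seed, and `fit_and_score` is a hypothetical callback standing in for training a model on the training part and returning its accuracy on the testing part. The same function also covers the 60%, 70%, and 80% training fractions used in Section 3.

```python
import random

def percentage_split(records, train_fraction, seed):
    """Shuffle a copy of the records with a fixed seed, then cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def best_of_runs(records, train_fraction, fit_and_score, runs=20):
    """Repeat the split `runs` times (one seed per run) and keep the highest score and its seed."""
    best_score, best_seed = None, None
    for seed in range(runs):
        train, test = percentage_split(records, train_fraction, seed)
        score = fit_and_score(train, test)   # hypothetical: train a model, return test accuracy
        if best_score is None or score > best_score:
            best_score, best_seed = score, seed
    return best_score, best_seed

records = list(range(100))                   # placeholder records
train, test = percentage_split(records, 0.8, seed=0)
print(len(train), len(test))                 # 80 20
```

Returning the seed alongside the score keeps the selected run reproducible.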
2.7. Data Preprocessing
Data preprocessing is required to improve the quality of the data by removing unwanted data from the original data [15]. It is important because raw data that contain missing values, noise, and inconsistencies would otherwise yield low-quality data. In this study, we perform data preprocessing as follows:
a. Data Cleaning
Data cleaning removes noise in the form of missing values, inconsistent data, and redundant data. All of the attributes above are then checked so that they contain relevant, non-missing, and non-redundant values; these three requirements are prerequisites in data mining for obtaining a clean dataset to use in the data mining stage. One missing value was found in this dataset, and the record containing it was deleted.
b. Data Transformation
At this stage, the data are converted into a form appropriate for data mining. In this study, the data stored in Microsoft Excel are converted into a CSV (Comma Separated Values) file that can be processed with the Weka tools.
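The two preprocessing steps can be sketched with Python's standard `csv` module. This is an illustrative sketch, not the study's actual Excel-to-CSV conversion: the column names are hypothetical (taken from Table 4 with spaces removed), and only the missing-value rule described above is implemented; checks for inconsistent or redundant values would be project-specific.

```python
import csv
import io

# Hypothetical column names, following Table 4.
FIELDS = ["Age", "Salary", "OtherInstallments", "MaritalStatus", "Worthiness"]

def clean(records):
    """Data cleaning: drop every record that has a missing (None or empty) value."""
    return [r for r in records if all(r.get(f) not in (None, "") for f in FIELDS)]

def to_csv(records):
    """Data transformation: write the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

sample = [
    {"Age": "Young", "Salary": "Low", "OtherInstallments": "No",
     "MaritalStatus": "Married", "Worthiness": "Not Worth It"},
    {"Age": "Old", "Salary": None, "OtherInstallments": "No",        # missing salary
     "MaritalStatus": "Married", "Worthiness": "Worth It"},
]
csv_text = to_csv(clean(sample))
print(csv_text)
```

To write a file instead of a string, the same `DictWriter` can be given a file object opened with `newline=""`.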
2.8. Evaluation
To evaluate the performance of both algorithms, we use common data mining criteria, i.e. precision, recall, and accuracy. These parameters are calculated from a confusion matrix, which contains information about the actual and predicted classes produced by a classification system [16]. The correct classifications lie along the diagonal from the north-west corner to the south-east corner and are called True Positives (TP) and True Negatives (TN), while the other cells are called False Positives (FP) and False Negatives (FN) [17]. In this study, the likely cases are considered positive cases, while the unlikely and probable cases are negative cases. The definitions of these parameters are as follows:
a. True positives (TP) are correctly classified yes cases.
b. False positives (FP) are incorrectly classified no cases.
c. True negatives (TN) are correctly classified no cases.
d. False negatives (FN) are incorrectly classified yes cases.
The TP, TN, FP, and FN values recorded in the confusion matrix can then be used to evaluate the performance of the prediction model. The definitions and expressions of the metrics are as follows [18]:
a. Recall is the average per-class effectiveness of a classifier in identifying class labels.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
b. Precision is the ability of a classifier to determine the positive labels, using a one-versus-all approach.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
c. Accuracy is the ratio of correct classifications to the total number of classifications.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
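The confusion-matrix counts and the three formulas above can be computed as in the following sketch. Treating "Worth It" as the positive class is our assumption for illustration, the function names are hypothetical, and the toy labels are invented rather than taken from the study. A metric whose denominator is zero is returned as 0.0 here, which is one common convention.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN, FN, treating `positive` as the positive class (one-versus-all)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, tn, fn

def ratio(num, den):
    """Safe division: 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def metrics(tp, fp, tn, fn):
    return {
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }

# Invented toy labels, for illustration only.
actual = ["Worth It", "Worth It", "Not Worth It", "Not Worth It", "Worth It"]
predicted = ["Worth It", "Not Worth It", "Not Worth It", "Worth It", "Worth It"]
counts = confusion_counts(actual, predicted, positive="Worth It")
print(counts)            # (2, 1, 1, 1)
print(metrics(*counts))
```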
3. RESULTS AND DISCUSSION
This section presents the experimental results and analysis of this study, which uses two classifiers, C4.5 and Naive Bayes. Three experiment scenarios based on percentage data splitting are performed for each algorithm. The first experiment uses 60% of the data for training and 40% for testing, the second uses 70% for training and 30% for testing, and the third uses 80% for training and 20% for testing. For each method, the experiment that provides the highest performance values is used as the model for finding the best method by re-testing it on the provided testing data. Table 5 presents the average performance values of each C4.5 experiment scenario at the model testing stage, while Table 6 shows the results of Naive Bayes. Based on the achieved accuracy, the first experiment
scenario is the best for both algorithms. In the first scenario, the C4.5 accuracy is 82.59%, whereas the Naive Bayes accuracy is 80.35%.
Table 5. C4.5 Algorithm Test Performance
Experiment (training:testing) | Accuracy | Precision | Recall
1 (60:40) | 82.59% | 86.77% | 82.03%
2 (70:30) | 80.37% | 85.10% | 80.80%
3 (80:20) | 80.37% | 87.50% | 80%
Table 6. Naive Bayes Algorithm Test Performance
Experiment (training:testing) | Accuracy | Precision | Recall
1 (60:40) | 80.35% | 80.16% | 82.90%
2 (70:30) | 77.38% | 78.72% | 80.43%
3 (80:20) | 77.68% | 81.25% | 81.25%
The next stage of the experiment compares the best models obtained from the experiment scenarios for both algorithms. The two models are applied to the provided testing data to determine which algorithm is more suitable for this case study. The results of this comparison stage are presented in Table 7, which shows that the C4.5 algorithm is superior to the Naive Bayes algorithm, with an accuracy of 83.33% compared with 80.67% for Naive Bayes.
Table 7. Comparison of the C4.5 Best Model and the Naive Bayes Best Model at the Testing Stage
Criteria | C4.5 Algorithm | Naive Bayes Algorithm
Accuracy | 83.33% | 80.67%
Precision | 89.16% | 80.72%
Recall | 82.22% | 83.75%
To validate the results above, we perform a further experiment based on a cross-validation evaluation scenario. Three values of k are used, i.e. 5-fold, 10-fold, and 20-fold, and each is applied to both C4.5 and Naïve Bayes. The results are presented in Table 8 (C4.5) and Table 9 (Naïve Bayes). The cross-validation experiment confirms that, in this case, C4.5 performs better than Naïve Bayes: for every k, C4.5 achieves higher accuracy. The results also show a different performance pattern for the two algorithms: C4.5 gives slightly better accuracy for smaller k, whereas Naïve Bayes gives slightly better accuracy for larger k.
Table 8. C4.5 Cross Validation Scenario Performance
k-fold | Precision | Recall | Accuracy
5-fold | 80.48% | 83.07% | 81.58%
10-fold | 80.73% | 83.07% | 81.56%
20-fold | 81.17% | 83.06% | 81.50%
Table 9. Naive Bayes Cross Validation Scenario Performance
k-fold | Precision | Recall | Accuracy
5-fold | 76.47% | 84.86% | 80.39%
10-fold | 76.73% | 84.87% | 80.41%
20-fold | 77.25% | 84.85% | 80.41%
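The k-fold scenario can be sketched as follows, as an illustration under our own assumptions rather than the study's Weka setup. Each fold serves once as the testing data while the remaining k - 1 folds train the model, and the k scores are averaged. Unlike a stratified variant, which keeps class proportions similar in every fold, this sketch deals shuffled indices into folds at random; `fit_and_score` is the same kind of hypothetical callback as above.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them round-robin into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(records, k, fit_and_score, seed=0):
    """Use each fold once as the test set and the other k-1 folds as training data;
    return the mean of the k fold scores."""
    scores = []
    for test_idx in k_fold_indices(len(records), k, seed):
        held_out = set(test_idx)
        train = [r for j, r in enumerate(records) if j not in held_out]
        test = [records[j] for j in test_idx]
        scores.append(fit_and_score(train, test))   # hypothetical: train, then score on test
    return sum(scores) / len(scores)

records = list(range(100))                          # placeholder records
for k in (5, 10, 20):                               # the fold counts used in Tables 8 and 9
    # Dummy scorer that reports the training-set size, just to show the fold mechanics.
    print(k, cross_validate(records, k, lambda train, test: len(train)))
```

With 100 placeholder records, the dummy scorer reports average training sizes of 80, 90, and 95 for k = 5, 10, and 20, which shows why a larger k trains each model on more of the data.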
The superiority of C4.5 over Naive Bayes can be understood since all of the input variables are independent of each other, so C4.5 is more suitable for this characteristic of the data. On the other hand, the Naive Bayes algorithm is based on the conditional probability of the input variables, so its advantages are of less use in this case. Another implication of the results is that the customer leasing application tends to fall into the category of recommender applications rather than classification.
4. CONCLUSION AND FUTURE STUDY
In this study, the C4.5 and Naive Bayes algorithms were implemented on a customer credit dataset to predict potential future risk. Based on the results of both types of experiment scenarios, the C4.5 algorithm achieves better performance. The results indicate that C4.5, with its recommender-system characteristics, is more suitable than Naive Bayes, which works on the conditional probability of the input variables. In the C4.5 models, salary is the most influential attribute, as shown by its significantly higher information gain compared with the other input variables. The dominant influence of the salary attribute also appears in every experiment scenario, where it is always selected as the root node of the tree. In future studies, we will explore opportunities to apply other techniques in this domain. We will also investigate other real applications that remain open to exploration, such as customer care, sales recommendation, and the quickly growing field of microfinance.
REFERENCES
[1] Karim M, Rahman RM. Decision Tree and Naïve Bayes Algorithm for Classification and Generation of Actionable
Knowledge for Direct Marketing. J Softw Eng Appl 2013; 06: 196–206.
[2] Dimitoglou G, Adams JA, et al. Comparison of the C4.5 and a Naive Bayes Classifier for the Prediction of Lung Cancer Survivability.
[3] Arifin MF, Fitrianah D. Penerapan Algoritma Klasifikasi C4.5 Dalam Rekomendasi Penerimaan Mitra Penjualan
Studi Kasus : PT Atria Artha Persada. InComTech 2018; 8: 87–102.
[4] Jafar Hamid A, Ahmed TM. Developing Prediction Model of Loan Risk in Banks Using Data Mining. Mach Learn
Appl An Int J 2016; 3: 1–9.
[5] Krichene A. Using a naive Bayesian classifier methodology for loan risk assessment. J Econ Financ Adm Sci 2017;
22: 3–24.
[6] Buulolo E, Silalahi N, Fadlina, et al. C4.5 Algorithm To Predict the Impact of the Earthquake. Int J Eng Res
Technol 2017; 6: 10–15.
[7] Poongodi S, Babu MR. Prediction of Crop Production using Improved C4.5 with ANFIS Classifier. 10.
[8] Thariqa P, Sitanggang IS, Syaufina L. Comparative Analysis of Spatial Decision Tree Algorithms for Burned Area of Peatland in Rokan Hilir Riau. Telkomnika (Telecommunication Comput Electron Control) 2016; 14: 684–691.
[9] Nikam SS. A Comparative Study of Classification Techniques in Data Mining Algorithms. Orient J Comput Sci
Technol 2015; 8: 13–19.
[10] Basri Hasanuddin Z, Syarif S. Zakah Management System using Approach Classification. Telkomnika (Telecommunication Comput Electron Control) 2017; 15: 1852–1857.
[11] Susanti AR, Djatna T, Kusuma WA. Twitter's Sentiment Analysis on GSM Services using Multinomial Naïve Bayes. Telkomnika (Telecommunication Comput Electron Control) 2017; 15: 1354.
[12] Larose DT. Discovering Knowledge in Data: An Introduction to Data Mining. John Wiley & Sons, Inc., 2015.
[13] Patil TR, Sherekar MS. Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification; 6.
[14] University of Waikato. Weka 3: Data Mining Software in Java.
[15] Ȧ SR, Sonika Ȧ. Effectiveness of Data Preprocessing for Data Mining. 2014; 4: 3480–3483.
[16] Santra AK, Christy CJ. Genetic Algorithm and Confusion Matrix for Document Clustering. Int J Comput Sci 2012; 9: 322–328.
[17] Sadikin M, Fanany MI, Basaruddin T. A New Data Representation Based on Training Data Characteristics to
Extract Drug Name Entity in Medical Text. 2016.
[18] Mehdiyev N, Enke D, Fettke P, et al. Evaluating Forecasting Methods by Considering Different Accuracy
Measures. Procedia Comput Sci 2016; 95: 264–271.
BIOGRAPHIES OF AUTHORS
Mujiono Sadikin is a faculty member of the Faculty of Computer Science, Universitas Mercu Buana, Jakarta. He received his doctoral degree from Universitas Indonesia, Jakarta, in 2017. His research areas are Data Mining, Machine Learning, and IT Governance. His experience includes serving as team leader of IT Governance and Procedure preparation for the Directorate of Land & Transportation, Ministry of Transportation, and as team leader of IT Audit and Assessment at Universitas Mercu Buana, among others. Since 2012 he has led the IT Directorate of Universitas Mercu Buana as its Director.
Fahri Alfiandi is a student in the Faculty of Computer Science, Universitas Mercu Buana, Indonesia. He was born in Jakarta on December 16th, 1995. He is interested in data mining, algorithm analysis, and programming.