Advances in data storage are driving rapid growth in large-scale datasets. Using every available feature increases time and space complexity and can degrade performance. Feature selection is therefore a fundamental stage of data preprocessing: it removes redundant and irrelevant features to reduce dimensionality and improve classification accuracy. Numerous optimization algorithms have been employed to handle feature selection (FS) problems, and they outperform conventional FS techniques. However, no single metaheuristic FS method outperforms all other optimization algorithms across many datasets. This motivated our study to combine the advantages of various optimization techniques into a powerful method that outperforms others on datasets from different domains. In this article, a novel combined method, GASI, is developed from swarm intelligence (SI) based feature selection techniques and genetic algorithms (GA); it uses a multi-objective fitness function to seek the optimal subset of features. To assess the proposed approach, seven datasets collected from the UCI repository were used to test the new feature selection technique. The experimental results demonstrate that GASI outperforms many powerful SI-based feature selection techniques studied, obtaining a better average fitness value and improved classification performance.
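A multi-objective fitness of the kind the abstract describes can be sketched as a weighted trade-off between classification error and subset size. The weight `alpha`, the penalty form, and the function name below are illustrative assumptions, not GASI's exact formulation:

```python
def fitness(mask, error_rate, alpha=0.9):
    """Lower is better: alpha weights the classifier's error rate,
    (1 - alpha) weights the fraction of features kept."""
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # an empty subset cannot classify anything
    return alpha * error_rate + (1 - alpha) * (n_selected / len(mask))

# A smaller subset can win despite a slightly higher error rate:
full = fitness([1] * 10, error_rate=0.10)                         # 0.19
small = fitness([1, 0, 1, 0, 0, 0, 0, 0, 0, 0], error_rate=0.12)  # 0.128
```

With these numbers the 2-feature subset scores 0.128 against 0.19 for the full set, so the search prefers it even though its raw error is higher.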
In this research, a hybrid wrapper model is proposed to identify an informative gene subset from gene expression data. To balance exploration and exploitation, a hybrid of the popular meta-heuristic spider monkey optimizer (SMO) and simulated annealing (SA) is applied. In the proposed model, ReliefF is used as a filter to obtain a relevant gene subset from the dataset, removing noise and outliers before the data are fed to the SMO wrapper. To enhance solution quality, simulated annealing is deployed as a local search alongside SMO in the second phase, guiding the search toward an optimal feature subset. To evaluate the proposed model, a support vector machine (SVM) is used as the fitness function to recognize the most informative biomarker genes from the cancer datasets along with University of California, Irvine (UCI) datasets. To further evaluate the model, four different classifiers (SVM, naive Bayes (NB), decision tree (DT), and k-nearest neighbors (KNN)) are used. The experimental results and analysis show that ReliefF-SMO-SA-SVM performs better than its state-of-the-art counterparts; on the cancer datasets, the model reaches a maximum accuracy of 99.45%.
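The SA refinement phase described above can be sketched as a bit-flip local search with a standard Metropolis acceptance rule. The cooling schedule, step count, and function name are assumptions for illustration; in the actual model, `cost` would wrap the SVM fitness:

```python
import math
import random

def sa_refine(mask, cost, steps=200, t0=1.0, cooling=0.95, seed=0):
    """Simulated-annealing local search over a binary feature mask:
    flip one bit per step, always accept improvements, and accept
    worse moves with probability exp(-delta / T) under a geometric
    cooling schedule. `cost` is any callable to be minimised."""
    rng = random.Random(seed)
    current = list(mask)
    best = list(mask)
    c_cur = c_best = cost(current)
    temp = t0
    for _ in range(steps):
        cand = list(current)
        i = rng.randrange(len(cand))
        cand[i] ^= 1                      # flip one feature in or out
        c_cand = cost(cand)
        delta = c_cand - c_cur
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, c_cur = cand, c_cand
            if c_cur < c_best:
                best, c_best = list(current), c_cur
        temp *= cooling                   # geometric cooling
    return best, c_best
```

The early high-temperature steps accept some worsening flips (exploration); as the temperature decays the search degenerates to hill climbing (exploitation), which is exactly the balance the hybrid aims for.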
Unsupervised Feature Selection Based on the Distribution of Features Attribut...Waqas Tariq
Since dealing with high-dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of data and simplify analysis in applications such as text categorization, signal processing, image retrieval, and gene expression analysis. Among feature reduction techniques, feature selection is one of the most popular because it preserves the original features. However, most current feature selection methods do not perform well on imbalanced datasets, which are pervasive in real-world applications. In this paper, we propose a new unsupervised feature selection method suited to imbalanced datasets, which removes redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods were implemented and compared. Experimental results on several imbalanced datasets derived from the UCI repository illustrate the effectiveness of the proposed method against the compared methods in terms of both accuracy and the number of selected features.
A MODIFIED BINARY PSO BASED FEATURE SELECTION FOR AUTOMATIC LESION DETECTION ...ijcsit
This paper presents an effective feature selection method that can be applied to build a computer-aided diagnosis system for breast cancer, in order to discriminate between healthy, benign, and malignant parenchyma. Determining the optimal feature set from a large set of original features is an important preprocessing step that removes irrelevant and redundant features, improving computational efficiency and classification accuracy while simplifying the classifier structure. A modified binary particle swarm optimized feature selection method (MBPSO) is proposed in which the k-nearest neighbour algorithm with leave-one-out cross validation serves as the fitness function. Digital mammograms obtained from the Regional Cancer Centre, Thiruvananthapuram, and mammograms from the web-accessible mini-MIAS database were used as the dataset for this experiment. Regions of interest in the mammograms are automatically detected and segmented, and a total of 117 shape, texture, and histogram features are extracted from the ROIs. Significant features are selected using the proposed feature selection method, and classification is performed using feed-forward artificial neural networks with back-propagation learning. Receiver operating characteristic (ROC) curves and confusion matrices are used to evaluate performance. Experimental results show that the modified binary PSO feature selection method not only obtains better classification accuracy but also simplifies the classification process compared to the full set of features; its performance is found to be on par with other widely used feature selection techniques.
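The core update of a binary PSO, which MBPSO modifies, maps real-valued velocities through a sigmoid to re-sample each bit. This is the standard BPSO step, not the paper's specific modification; the parameter values are typical defaults:

```python
import math
import random

def bpso_step(position, velocity, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One binary-PSO update: velocities stay real-valued, each bit is
    re-sampled with probability sigmoid(velocity)."""
    rng = rng or random.Random(0)
    new_v, new_x = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        # pull toward the particle's personal best and the swarm's global best
        v = w * v + c1 * rng.random() * (pb - x) + c2 * rng.random() * (gb - x)
        s = 1.0 / (1.0 + math.exp(-v))    # sigmoid transfer function
        new_v.append(v)
        new_x.append(1 if rng.random() < s else 0)
    return new_x, new_v
```

In a feature selection setting, each bit marks whether a feature is kept, and the fitness evaluated at each position would be the KNN leave-one-out accuracy described above.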
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
Large amounts of data are stored and manipulated using various database technologies, and processing all the attributes for a particular purpose is a difficult task. To avoid this difficulty, a feature selection process is applied. In this paper, eight benchmark datasets are collected from the UCI repository. Feature selection is carried out using a fuzzy-entropy-based relevance measure algorithm under three selection strategies: mean selection, half selection, and a neural network for threshold selection. After the features are selected, they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1, and Ant-miner classification methodologies. The test results show that the neural network threshold selection strategy works well in selecting features, and that the Ant-miner methodology performs best, achieving better accuracy with the selected features than with the original dataset. The experimental results clearly show that Ant-miner is superior to the other classifiers; thus, the proposed Ant-miner algorithm could be a suitable method for producing good results with fewer features than the original datasets.
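The fuzzy-entropy relevance idea can be illustrated with the classical De Luca-Termini fuzzy entropy over a feature's membership values; the paper's exact relevance measure may differ:

```python
import math

def fuzzy_entropy(memberships):
    """De Luca-Termini fuzzy entropy of membership values in [0, 1].
    Entropy peaks at mu = 0.5 (maximal fuzziness) and vanishes at 0 or 1;
    a feature whose memberships are crisp (low entropy) is more
    discriminative, which is the basis of entropy-based relevance."""
    h = 0.0
    for mu in memberships:
        for p in (mu, 1.0 - mu):
            if p > 0.0:
                h -= p * math.log(p)
    return h / len(memberships)
```

A threshold strategy then keeps features whose relevance passes a cutoff; the mean and half strategies in the abstract are fixed cutoffs, while the neural network learns one.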
Integrated bio-search approaches with multi-objective algorithms for optimiza...TELKOMNIKA JOURNAL
Optimal selection of features is difficult but crucial, particularly for the task of classification. Traditional methods select features independently and can generate collections of irrelevant features, which degrades classification accuracy. The goal of this paper is to leverage the potential of bio-inspired search algorithms, together with a wrapper, in optimizing the multi-objective algorithms ENORA and NSGA-II to generate an optimal set of features. The main steps are to form combinations of ENORA and NSGA-II with suitable bio-search algorithms in which multiple subset generation is implemented, and then to validate the optimal feature set by conducting a subset evaluation. Eight comparison datasets of various sizes were deliberately selected for testing. Results show that combining the multi-objective algorithms ENORA and NSGA-II with the selected bio-inspired search algorithm is a promising way to achieve a better optimal solution (i.e., the best features with higher classification accuracy) for the selected datasets. This finding implies that bio-inspired wrapper/filter search algorithms can boost the efficiency of ENORA and NSGA-II for the task of selecting and classifying features.
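Both ENORA and NSGA-II rank candidate feature subsets by Pareto dominance over their objectives (for feature selection, typically classification error and subset size, both minimised). The comparison itself is compact:

```python
def dominates(a, b):
    """Pareto dominance for minimisation objectives, e.g. tuples of
    (classification error, number of selected features). `a` dominates `b`
    if it is no worse in every objective and strictly better in at least
    one -- the relation NSGA-II's non-dominated sorting is built on."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
```

A subset with 5 features and 10% error dominates one with 7 features and 20% error, but neither of two solutions that trade error against size dominates the other; those incomparable solutions form the Pareto front the algorithms return.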
A chi-square-SVM based pedagogical rule extraction method for microarray data...IJAAS Team
The support vector machine (SVM) is currently an efficient classification technique due to its ability to capture nonlinearities in diagnostic systems, but it does not reveal the knowledge learnt during training. In machine learning applications such as bioinformatics, it is important to understand how a decision is reached. A decision tree, on the other hand, has good comprehensibility; the process of converting an incomprehensible model into an understandable one is often called rule extraction. In this paper we propose an approach for extracting rules from an SVM for microarray data by combining the merits of both the SVM and the decision tree. The proposed approach consists of three steps: SVM-CHI-SQUARE is employed to reduce the feature set; the dataset with reduced features is used to obtain an SVM model and generate synthetic data; and a Classification and Regression Tree (CART) is used to generate rules in the last phase. We use the breast masses dataset from the UCI repository, where comprehensibility is a key requirement. The experimental results show that, with the reduced-feature dataset, the proposed approach extracts shorter rules, thereby improving the comprehensibility of the system. We obtained an accuracy of 93.53%, sensitivity of 89.58%, specificity of 96.70%, and a training time of 3.195 seconds. A comparative analysis is carried out with other algorithms.
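The chi-square reduction step scores each feature by the Pearson chi-square statistic of its contingency table against the class labels. A minimal sketch, assuming a discrete feature/class count table (rows: feature values, columns: classes):

```python
def chi2_stat(table):
    """Pearson chi-square statistic of a contingency table. Larger values
    mean the feature's distribution departs more from independence with
    the class, i.e. the feature is a better candidate to keep."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat
```

A feature independent of the class scores 0; a feature that perfectly splits the classes scores maximally, so ranking by this statistic and keeping the top-k features performs the reduction the abstract describes.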
Classification problems with high-dimensional data and a small number of observations are becoming common, particularly with microarray data. Over the last two decades, many efficient classification models and feature selection (FS) algorithms (also referred to as FS techniques) have been proposed to achieve higher prediction accuracy. However, in high-dimensional data, the output of an FS algorithm, and hence its prediction accuracy, can be unstable over variations in the training set. In this paper we present a new evaluation measure, the Q-statistic, that incorporates the stability of the selected feature subset in addition to prediction accuracy. We then propose Booster, a procedure that boosts the Q-statistic value of the FS algorithm it is applied to. Studies on synthetic data and 14 microarray datasets show that Booster improves not only the Q-statistic but also the prediction accuracy of the applied algorithm.
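The Q-statistic combines prediction accuracy with selection stability. One common ingredient for quantifying stability is the average pairwise similarity of the subsets selected across resampled training sets; the Jaccard version below is only an illustration, not the paper's exact Q-statistic:

```python
def subset_stability(subsets):
    """Average pairwise Jaccard similarity of feature subsets (lists of
    feature indices) selected on different resamples of the training set.
    1.0 means the FS algorithm always picks the same subset; values near
    0 mean the selection is unstable."""
    sets = [set(s) for s in subsets]
    total, pairs = 0.0, 0
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            union = sets[i] | sets[j]
            total += len(sets[i] & sets[j]) / len(union) if union else 1.0
            pairs += 1
    return total / pairs if pairs else 1.0
```

An FS method that reaches the same accuracy with a higher stability score is preferable, which is the trade-off a combined accuracy/stability measure rewards.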
A new model for iris data set classification based on linear support vector m...IJECEIAES
Data mining is known as the process of discovering patterns in large amounts of data, a form of knowledge discovery. Classification is a data analysis task that extracts a model describing important data classes. One of the outstanding classification methods in data mining is the support vector machine (SVM), which is capable of predicting outcomes and is often more effective than other classification methods. The SVM is a well-known supervised machine learning technique that has been applied successfully to a variety of problems (regression, classification, and clustering) in diverse domains such as gene expression analysis and web text mining. In this study, we propose a new model for classifying the iris dataset using an SVM classifier, with a genetic algorithm to optimize the C and gamma parameters of the linear SVM; in addition, principal component analysis (PCA) is used for feature reduction.
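Tuning C and gamma with a GA, as described above, can be sketched as a small real-coded GA with tournament selection, blend crossover, and Gaussian mutation. In practice `cost` would be the cross-validated SVM error; all names and parameter values here are illustrative assumptions:

```python
import random

def ga_tune(cost, bounds, pop_size=10, gens=30, seed=0):
    """Minimise `cost` over a box given by `bounds`, e.g.
    [(C_min, C_max), (gamma_min, gamma_max)]. Sketch of a real-coded GA:
    elitism, binary tournament selection, midpoint crossover, and a
    clipped Gaussian mutation on one coordinate."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(gens):
        new_pop = [list(best)]                         # keep the elite
        while len(new_pop) < pop_size:
            p1, p2 = (min(rng.sample(pop, 2), key=cost) for _ in range(2))
            child = [(a + b) / 2 for a, b in zip(p1, p2)]   # blend crossover
            i = rng.randrange(dim)
            lo, hi = bounds[i]
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0, (hi - lo) * 0.1)))
            new_pop.append(child)
        pop = new_pop
        best = min(pop + [best], key=cost)
    return best
```

Replacing the toy `cost` with k-fold SVM error on the (PCA-reduced) iris features yields the parameter search the abstract outlines.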
Hybrid filtering methods for feature selection in high-dimensional cancer dataIJECEIAES
Statisticians in both academia and industry have encountered problems with high-dimensional data: the rapid increase in features has caused the feature count to outstrip the instance count. Several established methods exist for selecting features from massive amounts of breast cancer data; even so, overfitting continues to be a problem, and choosing important features with minimal loss across different sample sizes remains an area with room for development. As a result, the feature selection technique is crucial for dealing with high-dimensional data classification issues. This paper proposes a new architecture for high-dimensional breast cancer data using filtering techniques and a logistic regression model. Essential features are filtered out using a combination of hybrid chi-square and hybrid information gain (hybrid IG), with logistic regression as the classifier. The results showed that hybrid IG performed best for high-dimensional breast and prostate cancer data. The top 50 and 22 features outperformed the other configurations, with the highest classification accuracies of 86.96% and 82.61%, respectively, after integrating hybrid information gain with the logistic function (hybrid IG+LR) at a sample size of 75. In the future, multiclass classification of multidimensional medical data will be evaluated using data from a different domain.
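The information-gain filter at the heart of hybrid IG scores a discrete feature as the reduction in class entropy after conditioning on the feature. A generic sketch, not the paper's specific hybrid:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(feature) = H(labels) - H(labels | feature): class entropy minus
    the weighted entropy of the labels within each feature value. Ranking
    features by this score and keeping the top-k performs the filtering."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional
```

A feature that perfectly predicts the class has IG equal to the full class entropy; an irrelevant one has IG 0, so the "top 50" and "top 22" configurations in the abstract correspond to cutoffs on this ranking.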
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Breast cancer is the leading cause of mortality among women in developed countries and the second most prominent cause of cancer mortality in women worldwide. In recent decades, the prevalence of breast cancer among women has risen dramatically. This paper discusses several data analysis methods used to detect breast cancer early; diagnosis distinguishes benign from malignant breast lumps. We tackled this disease analysis using data processing tools. Data mining is an important step of knowledge discovery in which intelligent methods are used to detect patterns. Several clinical breast cancer studies have been conducted using soft computing and machine learning techniques; some algorithms are simpler or more comprehensive than others. This research focuses on genetic programming and machine learning algorithms to reliably identify benign and malignant breast cancer, with the aim of optimising the testing algorithm. We used genetic programming methods to choose the classifiers' best features and parameter values. We analyse the Wisconsin breast cancer data available from the U.C.I. machine learning repository. In this experiment, we compare four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) performs better than the I.B.K. and B.F. Tree methods, with 97.71% accuracy.
An overlapping conscious relief-based feature subset selection method - IJECEIAES
Feature selection is considered a fundamental preprocessing step in various data mining and machine learning based works. The quality of features is essential to achieve good classification performance and a better data analysis experience. Among feature selection methods, distance-based methods are gaining popularity because of their ability to capture feature interdependency and relevancy with the endpoints. However, most distance-based methods only rank the features and ignore class overlapping issues; features with class-overlapping data act as an obstacle during classification. Therefore, the objective of this research work is to propose a method named overlapping conscious MultiSURF (OMsurf) to handle data overlapping and select a subset of informative features while discarding noisy ones. Experimental results over 20 benchmark datasets demonstrate the superiority of OMsurf over six existing state-of-the-art methods.
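Distance-based rankers of this family descend from the Relief algorithm. As an illustration of the underlying idea only (not the OMsurf method itself), here is a minimal Relief-style weighting in plain Python: each instance pushes a feature's weight up by its distance to the nearest *miss* (different class) and down by its distance to the nearest *hit* (same class), so features that separate classes accumulate high weights.

```python
import math

def relief_weights(X, y):
    """Minimal Relief-style feature weighting (1 nearest hit / 1 nearest miss).
    W[f] grows when feature f differs strongly from the nearest miss and
    shrinks when it varies within a class (nearest hit). Illustrative only."""
    n, d = len(X), len(X[0])
    W = [0.0] * d
    for i in range(n):
        hit = miss = None
        hit_d = miss_d = math.inf
        for j in range(n):
            if j == i:
                continue
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in range(d))
            if y[j] == y[i] and dist < hit_d:
                hit, hit_d = X[j], dist
            elif y[j] != y[i] and dist < miss_d:
                miss, miss_d = X[j], dist
        for f in range(d):
            W[f] += abs(X[i][f] - miss[f]) - abs(X[i][f] - hit[f])
    return W

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [1.1, 0.8], [0.9, 0.1]]
y = [0, 0, 0, 1, 1, 1]
w = relief_weights(X, y)
print(w[0] > w[1])  # the informative feature scores higher
```

MultiSURF-family methods refine this scheme with adaptive neighborhoods and multiple neighbors; the update rule above is the common core.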
Correlation of artificial neural network classification and nfrs attribute fi... - eSAT Journals
Abstract
About 5 to 15% of women of reproductive age face Polycystic Ovarian Syndrome (PCOS), a multifaceted, heterogeneous and complex disease. Polycystic ovaries, chronic anovulation and hyperandrogenism cause long-term consequences such as endometrial hyperplasia, type 2 diabetes mellitus and coronary disease; insulin resistance together with hypertension, abdominal obesity, dyslipidemia and hyperinsulinemia constitutes the metabolic syndrome (frequent metabolic traits). These factors underlie the common condition of anovulatory infertility. Computer-based information along with advanced data mining techniques is used for appropriate results. Classification is a classic data mining task, with roots in machine learning; Naïve Bayesian, artificial neural network, decision tree and support vector machines are classification techniques in data mining. Feature selection methods involve generation of subsets, evaluation of each subset, criteria for stopping the search, and validation procedures. The characteristics of the search method used are important with respect to the time efficiency of feature selection. PCA (principal component analysis), information gain subset evaluation, fuzzy rough set evaluation and correlation-based feature selection (CFS) are some of the feature selection techniques, while greedy first search, ranker, etc. are search algorithms used in feature selection. In this paper, a new algorithm based on fuzzy neural subset evaluation and an artificial neural network is proposed, which reduces the need to perform classification and feature selection separately. This algorithm combines neural fuzzy rough subset evaluation and an artificial neural network for better performance than doing the tasks separately.
Keywords: ANN, SVM, PCA, CFS
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni... - theijes
Feature selection is considered a problem of global combinatorial optimization in machine learning, which reduces the number of features and removes irrelevant, noisy and redundant data. However, identification of useful features from hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data becomes even more challenging owing to the high dimensionality of features, the multiclass categories involved and the usually small sample size. In order to improve prediction accuracy and to avoid incomprehensibility due to the number of features, different feature selection techniques can be implemented. This survey classifies and analyzes different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three groups: filter, wrapper and hybrid.
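The filter/wrapper distinction can be made concrete with a toy wrapper: greedy forward selection scored by leave-one-out 1-NN accuracy. This is an illustrative sketch only; the survey prescribes no particular search or classifier.

```python
def knn_accuracy(X, y, feats):
    """Leave-one-out 1-NN accuracy using only the selected feature indices."""
    correct = 0
    for i in range(len(X)):
        best, best_d = None, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
            if d < best_d:
                best, best_d = y[j], d
        correct += best == y[i]
    return correct / len(X)

def forward_select(X, y):
    """Wrapper FS: greedily add the feature that most improves accuracy."""
    selected, best_acc = [], 0.0
    remaining = set(range(len(X[0])))
    improved = True
    while improved and remaining:
        improved = False
        for f in sorted(remaining):
            acc = knn_accuracy(X, y, selected + [f])
            if acc > best_acc:
                best_acc, best_f, improved = acc, f, True
        if improved:
            selected.append(best_f)
            remaining.remove(best_f)
    return selected, best_acc

# Toy data: feature 0 is informative, features 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [0.1, 3.0, 9.0], [0.9, 4.0, 2.0],
     [1.0, 1.0, 8.0], [0.2, 2.0, 3.0], [1.1, 5.0, 7.0]]
y = [0, 0, 1, 1, 0, 1]
selected, best_acc = forward_select(X, y)
print(selected, best_acc)  # [0] 1.0
```

A filter, by contrast, would score each feature independently of any classifier (e.g. by correlation or information gain) and keep the top-ranked ones; a hybrid runs a cheap filter first and a wrapper like this on the survivors.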
Evolving Efficient Clustering and Classification Patterns in Lymphography Dat... - ijsc
Data mining refers to the process of retrieving knowledge by discovering novel and relevant patterns from large datasets. Clustering and classification are two distinct phases in data mining that work to provide an established, proven structure from a voluminous collection of facts. A dominant area of modern-day research in the field of medical investigations includes disease prediction and malady categorization. In this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering techniques and compare the performance of classification algorithms on the clinical data. Feature selection is a supervised method that attempts to select a subset of the predictor features based on the information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms in detecting clusters of patient records and predictor attributes, and the performance of sixteen classification algorithms on the Lymphography dataset that enable the classifier to accurately perform multi-class categorization of medical data. Our work asserts that the Random Tree algorithm and Quinlan's C4.5 algorithm give 100 percent classification accuracy with all the predictor features and also with the feature subset selected by the Fisher Filtering feature selection algorithm. It is also stated here that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm offers increased clustering accuracy in less computation time.
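The DBSCAN algorithm mentioned above groups density-reachable points into clusters and flags sparse points as noise. A compact sketch on toy 2-D data (not the Lymphography set) shows the core mechanics: a point with at least `min_pts` neighbors within `eps` seeds a cluster that expands through other core points.

```python
def region(points, i, eps):
    """Indices of points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: -1 marks noise, non-negative ints mark clusters."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1                  # noise (may become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:                        # expand the cluster
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # claim border point, don't expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region(points, j, eps)
            if len(nb) >= min_pts:          # only core points spread further
                queue.extend(nb)
    return labels

# Two tight blobs plus one outlier.
points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
labels = dbscan(points, eps=0.5, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not fixed in advance, and the outlier is explicitly labeled as noise rather than forced into a cluster.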
New hybrid ensemble method for anomaly detection in data science - IJECEIAES
Anomaly detection is a significant research area in data science. Anomaly detection is used to find unusual points or uncommon events in data streams. It is gaining popularity not only in the business world but also in many other fields, such as cyber security, fraud detection for financial systems, and healthcare. Detecting anomalies can be useful for finding new knowledge in the data. This study aims to build an effective model to protect the data from these anomalies. We propose a new hybrid ensemble machine learning method that combines the predictions of two methodologies, isolation forest with k-means and random forest, using majority voting. Several available datasets, including KDD Cup-99, Credit Card, Wisconsin Prognosis Breast Cancer (WPBC), Forest Cover, and Pima, were used to evaluate the proposed method. The experimental results show that our proposed model gives the best performance in terms of receiver operating characteristic, accuracy, precision, and recall, and is more efficient in detecting anomalies than other approaches. The highest accuracy rate achieved is 99.9%, compared to 97% without the voting method.
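The combining step of such an ensemble is simple to state in code. The sketch below shows majority voting over per-detector anomaly flags; the three detector outputs are hypothetical stand-ins, not results from the paper (training the underlying isolation forest, k-means, and random forest models is omitted).

```python
from collections import Counter

def majority_vote(*prediction_lists):
    """Combine per-detector anomaly flags (1 = anomaly, 0 = normal)
    by taking the most common vote for each data point."""
    combined = []
    for votes in zip(*prediction_lists):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical flags from three detectors over five points.
iforest     = [1, 0, 0, 1, 0]
kmeans_dist = [1, 0, 1, 1, 0]
rforest     = [0, 0, 0, 1, 1]
print(majority_vote(iforest, kmeans_dist, rforest))  # [1, 0, 0, 1, 0]
```

A point is flagged only when detectors agree, which is what suppresses the individual models' false positives and lifts precision in a voting ensemble.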
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ... - rahulmonikasharma
Classification problems in high-dimensional data with a small number of observations have become increasingly common, especially in microarray data. The growing amount of text data on internet sites also affects clustering analysis: text clustering is a useful technique for partitioning large amounts of data into clusters, and the main problem affecting it is the presence of uninformative and sparse features in text documents. A broad class of boosting algorithms can be viewed as performing coordinate-wise gradient descent to minimize some potential function of the margins of a data set. This paper proposes a novel evaluation measure, Q-statistic, that incorporates the stability of the selected feature set in addition to the prediction accuracy, and then proposes a Booster of an FS algorithm that enhances the value of the Q-statistic of the algorithm applied.
An Ensemble of Filters and Wrappers for Microarray Data Classification - mlaij
The development of microarray technology has supplied a large volume of data to many fields. Gene microarray analysis and classification have demonstrated an effective way for the diagnosis of diseases and cancers. Inasmuch as the data obtained from microarray technology are very noisy and also have thousands of features, feature selection plays an important role in removing irrelevant and redundant features and also in reducing computational complexity. There are two important approaches for gene selection in microarray data analysis, the filters and the wrappers. To select a concise subset of informative genes, we introduce a hybrid feature selection which combines the two approaches: candidate features are first selected from the original set via several effective filters, and the candidate feature set is then further refined by more accurate wrappers. Thus, we can take advantage of both the filters and the wrappers. Experimental results based on 11 microarray datasets show that our mechanism is effective with a smaller feature set. Moreover, these feature subsets can be obtained in a reasonable time.
Improving the performance of Intrusion detection systems - yasmen essam
Intrusion detection systems (IDS) are widely studied by researchers nowadays due to the dramatic growth in network-based technologies. Policy violations and unauthorized access are in turn increasing, which makes intrusion detection systems of great importance. Existing approaches to improve intrusion detection systems focus on feature selection or reduction, since some features are irrelevant or redundant and removing them improves the accuracy as well as the learning time.
Bibliometric analysis highlighting the role of women in addressing climate ch... - IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have succeeded in playing a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results considered contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. This study's results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and offer recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter... - IJECEIAES
The active and reactive load changes have a significant impact on voltage and frequency. In this paper, in order to stabilize the microgrid (MG) against load variations in islanding mode, the active and reactive power of all distributed generators (DGs), including energy storage (battery), diesel generator, and micro-turbine, are controlled. The micro-turbine generator is connected to the MG through a three-phase to three-phase matrix converter, and the droop control method is applied for controlling the voltage and frequency of the MG. In addition, a method is introduced for voltage and frequency control of micro-turbines in the transition from grid-connected mode to islanding mode. A novel switching strategy of the matrix converter is used for converting the high-frequency output voltage of the micro-turbine to the grid-side frequency of the utility system. Moreover, using the switching strategy, the low-order harmonics in the output current and voltage are not produced, and consequently, the size of the output filter would be reduced. In fact, the suggested control strategy is load-independent and has no frequency conversion restrictions. The proposed approach for voltage and frequency regulation demonstrates exceptional performance and favorable response across various load alteration scenarios. The suggested strategy is examined in several scenarios in the MG test systems, and the simulation results are addressed.
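The droop control method referenced above sets local frequency and voltage references that sag linearly with active and reactive power, respectively, so DGs share load without communication. A minimal sketch of the conventional P-f / Q-V droop law; the gains and per-unit quantities are illustrative, not the paper's actual parameters.

```python
def droop_setpoints(P, Q, f0=50.0, V0=1.0, kp=0.5, kq=0.05):
    """Conventional P-f / Q-V droop: the frequency reference falls with
    active power P and the voltage reference falls with reactive power Q
    (P, Q in per unit; kp, kq are illustrative droop gains)."""
    return f0 - kp * P, V0 - kq * Q

# No load -> nominal setpoints; loaded -> both references sag.
print(droop_setpoints(0.0, 0.0))   # (50.0, 1.0)
print(droop_setpoints(1.0, 0.5))   # lower frequency and voltage
```

Because every DG applies the same linear law, units with larger droop gains automatically take a smaller share of a load change, which is what stabilizes an islanded MG without a central controller.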
Similar to Enhancing feature selection with a novel hybrid approach incorporating genetic algorithms and swarm intelligence techniques
Classification problems in high-dimensional data with a small number of observations are becoming common, especially in microarray data. Over the last two decades, many efficient classification models and feature selection (FS) algorithms, also referred to as FS techniques, have been proposed for higher prediction accuracy. However, the outcome of an FS algorithm in terms of prediction accuracy can be unstable over variations in the training set, particularly in high-dimensional data. In this paper we present a new evaluation measure, Q-statistic, that includes the stability of the selected feature subset in addition to prediction accuracy. We then propose a Booster of an FS algorithm that boosts the value of the Q-statistic of the algorithm applied. Studies on synthetic data and 14 microarray data sets show that Booster boosts not only the value of the Q-statistic but also the prediction accuracy of the algorithm applied.
A new model for iris data set classification based on linear support vector m... - IJECEIAES
Data mining is known as the process of detecting patterns in large amounts of data, as part of knowledge discovery. Classification is a data analysis task that extracts a model describing important data classes. One of the outstanding classification methods in data mining is the support vector machine (SVM): it is capable of predicting outcomes and is often more effective than other classification methods. SVM is a well-known supervised machine learning technique that has been applied successfully to a variety of problems, including regression, classification, and clustering, in diverse domains such as gene expression and web text mining. In this study, we propose a new model for classifying the iris data set using an SVM classifier and a genetic algorithm to optimize the c and gamma parameters of the linear SVM; in addition, the principal components analysis (PCA) algorithm was used for feature reduction.
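A GA for tuning (C, gamma) can be sketched without any SVM library: evolve a population of parameter pairs, select the fittest, recombine and mutate. In the sketch below the fitness function is a hypothetical stand-in for cross-validated SVM accuracy (peaking near C = 10, gamma = 0.1); in practice it would train and score an SVM with those parameters.

```python
import random

random.seed(0)

def fitness(C, gamma):
    """Stand-in for cross-validated SVM accuracy; peaks near C=10, gamma=0.1.
    A real run would replace this with an SVM train/validate cycle."""
    return 1.0 / (1.0 + (C - 10.0) ** 2 / 100.0 + (gamma - 0.1) ** 2 * 100.0)

def ga_tune(pop_size=30, generations=60):
    # Random initial population of (C, gamma) pairs within typical ranges.
    pop = [(random.uniform(0.1, 100.0), random.uniform(0.001, 1.0))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(*p), reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            (c1, g1), (c2, g2) = random.sample(parents, 2)
            C, g = (c1 + c2) / 2.0, (g1 + g2) / 2.0   # arithmetic crossover
            if random.random() < 0.2:                 # multiplicative mutation
                C *= random.uniform(0.5, 2.0)
                g *= random.uniform(0.5, 2.0)
            children.append((C, g))
        pop = parents + children
    return max(pop, key=lambda p: fitness(*p))

best_C, best_gamma = ga_tune()
print(best_C, best_gamma)  # should land near C=10, gamma=0.1
```

Keeping the parents in the next generation makes the search elitist, so the best pair found is never lost between generations.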
Hybrid filtering methods for feature selection in high-dimensional cancer data - IJECEIAES
Statisticians in both academia and industry have encountered problems with high-dimensional data. The rapid feature increase has caused the feature count to outstrip the instance count. There are several established methods for selecting features from massive amounts of breast cancer data. Even so, overfitting continues to be a problem, and choosing important features with minimum loss at a different sample size is another area with room for development. As a result, the feature selection technique is crucial for dealing with high-dimensional data classification issues. This paper proposes a new architecture for high-dimensional breast cancer data using filtering techniques and a logistic regression model. Essential features are filtered out using a combination of hybrid chi-square and hybrid information gain (hybrid IG), with logistic regression as the classifier. The results showed that hybrid IG performed the best for high-dimensional breast and prostate cancer data. The top 50 and 22 features outperformed the other configurations, with the highest classification accuracies of 86.96% and 82.61%, respectively, after integrating the hybrid information gain and logistic function (hybrid IG+LR) with a sample size of 75. In the future, multiclass classification of multidimensional medical data is to be evaluated using data from a different domain.
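The information-gain filter used in such a pipeline follows directly from its definition, IG(f) = H(y) - H(y | f). A minimal discrete-feature sketch (toy data, not the paper's cancer sets):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(y) of a label list, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG = H(y) - H(y | feature), for a discrete feature."""
    total, n, cond = entropy(labels), len(labels), 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    return total - cond

# Toy discrete data: feature A predicts y perfectly, feature B barely helps.
A = [0, 0, 1, 1, 0, 1]
B = [0, 1, 0, 1, 0, 1]
y = [0, 0, 1, 1, 0, 1]
print(information_gain(A, y))  # 1.0 (all of H(y) explained)
print(information_gain(B, y))  # close to 0
```

A filter of this kind ranks every feature by such a score and keeps only the top-k (e.g. the top 50 or 22 as in the paper) before the classifier ever sees the data.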
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Β
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Β
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine
learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data
set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Β
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second
most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast
cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast
cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data
processing tools, we tackled this disease analysis. Data mining is an important step of library discovery
where intelligent methods are used to detect patterns. Several clinical breast cancer studies were
conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier,
easier, or more comprehensive than others. This research is focused on genetic programming and machine
learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise
the testing algorithm. We used genetic programming methods to choose classification machines' best
features and parameter values. Data mining is an important step of library discovery where intelligent
methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data
set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A
comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F.
Tree processes, i.e. 97.71%.
SVM &GA-CLUSTERING BASED FEATURE SELECTION APPROACH FOR BREAST CANCER DETECTIONijscai
Β
Mortality leading among women in developed countries is breast cancer. Breast cancer is women's second most prominent cause of cancer mortality worldwide. In recent decades, women's high prevalence of breast cancer has risen dramatically. This paper discussed several data analysis methods used to detect breast
cancer early. Breast cancer diagnosis distinguishes benign and malignant breast lumps. Using data processing tools, we tackled this disease analysis. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. Several clinical breast cancer studies were conducted using soft computing and machine learning techniques. Sometimes their algorithms are easier, easier, or more comprehensive than others. This research is focused on genetic programming and machine learning algorithms to reliably identify benign and malignant breast cancer. This study aimed to optimise
the testing algorithm. We used genetic programming methods to choose classification machines' best features and parameter values. Data mining is an important step of library discovery where intelligent methods are used to detect patterns. We are analysing data accessible from the U.C.I. deep-learning data
set in Wisconsin. In this experiment, we equate four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) is better than I.B.K. and B.F. Tree processes, i.e. 97.71%
The International Journal of Engineering and Science (The IJES)theijes
Β
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
An overlapping conscious relief-based feature subset selection methodIJECEIAES
Β
Feature selection is considered as a fundamental prepossessing step in various data mining and machine learning based works. The quality of features is essential to achieve good classification performance and to have better data analysis experience. Among several feature selection methods, distance-based methods are gaining popularity because of their eligibility in capturing feature interdependency and relevancy with the endpoints. However, most of the distance-based methods only rank the features and ignore the class overlapping issues. Features with class overlapping data work as an obstacle during classification. Therefore, the objective of this research work is to propose a method named overlapping conscious MultiSURF (OMsurf) to handle data overlapping and select a subset of informative features discarding the noisy ones. Experimental results over 20 benchmark dataset demonstrates the superiority of OMsurf over six existing state-of-the-art methods.
Correlation of artificial neural network classification and nfrs attribute fi...eSAT Journals
Β
Abstract
Mostly 5 to 15% of the women in the stage of reproduction face the disease called Polycystic Ovarian Syndrome (PCOS) which is the multifaceted, heterogeneous and complex. The long term consequences diseases like endometrial hyperplasia, type 2 diabetes mellitus and coronary disease are caused by the polycystic ovaries, chronic anovulation and hyperandrogenism are characterized with the resistance of insulin and the hypertension, abdominal obesity and dyslipidemia and hyperinsulinemia are called as Metabolic syndrome (frequent metabolic traits) The above cause the common disease called Anovulatory infertility. Computer based information along with advanced Data mining techniques are used for appropriate results. Classification is a classic data mining task, with roots in machine learning. NaΓ―ve Bayesian, Artificial Neural Network, Decision Tree, Support Vector Machines are the classification tasks in the data mining. Feature selection methods involve generation of the subset, evaluation of each subset, criteria for stopping the search and validation procedures. The characteristics of the search method used are important with respect to the time efficiency of the feature selection methods. PCA (Principle Component Analysis), Information gain Subset Evaluation, Fuzzy rough set evaluation, Correlation based Feature Selection (CFS) are some of the feature selection techniques, greedy first search, ranker etc are the search algorithms that are used in the feature selection. In this paper, a new algorithm which is based on Fuzzy neural subset evaluation and artificial neural network is proposed which reduces the task of classification and feature selection separately. This algorithm combines the neural fuzzy rough subset evaluation and artificial neural network together for the better performance than doing the tasks separately.
Keywords: ANN, SVM, PCA, CFS
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni...theijes
Β
Feature selection is considered as a problem of global combinatorial optimization in machine learning, which reduces the number of features, removes irrelevant, noisy and redundant data. However, identification of useful features from hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data becomes even more challenging owing to the high dimensionality of features, multiclass categories involved and the usually small sample size. In order to improve the prediction accuracy and to avoid incomprehensibility due to the number of features different feature selection techniques can be implemented. This survey classifies and analyzes different approaches, aiming to not only provide a comprehensive presentation but also discuss challenges and various performance parameters. The techniques are generally classified into three; filter, wrapper and hybrid.
Evolving Efficient Clustering and Classification Patterns in Lymphography Dat...ijsc
Β
Data mining refers to the process of retrieving knowledge by discovering novel and relative patterns from large datasets. Clustering and Classification are two distinct phases in data mining that work to provide an established, proven structure from a voluminous collection of facts. A dominant area of modern-day research in the field of medical investigations includes disease prediction and malady categorization. In this paper, our focus is to analyze clusters of patient records obtained via unsupervised clustering techniques and compare the performance of classification algorithms on the clinical data. Feature selection is a supervised method that attempts to select a subset of the predictor features based on the information gain. The Lymphography dataset comprises of 18 predictor attributes and 148 instances with the class label having four distinct values. This paper highlights the accuracy of eight clustering algorithms in detecting clusters of patient records and predictor attributes and highlights the performance of sixteen classification algorithms on the Lymphography dataset that enables the classifier to accurately perform multi-class categorization of medical data. Our work asserts the fact that the Random Tree algorithm and the Quinlanβs C4.5 algorithm give 100 percent classification accuracy with all the predictor features and also with the feature subset selected by the Fisher Filtering feature selection algorithm.. It is also stated here that the Density Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm offers increased clustering accuracy in less computation time.
New hybrid ensemble method for anomaly detection in data science IJECEIAES
Anomaly detection is a significant research area in data science. Anomaly detection is used to find unusual points or uncommon events in data streams. It is gaining popularity not only in the business world but also in various other fields, such as cyber security, fraud detection for financial systems, and healthcare. Detecting anomalies can be useful for finding new knowledge in the data. This study aims to build an effective model to protect data from these anomalies. We propose a new hybrid ensemble machine learning method that combines the predictions of two methodologies, isolation forest with k-means and random forest, using majority voting. Several available datasets, including KDD Cup-99, Credit Card, Wisconsin Prognosis Breast Cancer (WPBC), Forest Cover, and Pima, were used to evaluate the proposed method. The experimental results show that our proposed model achieves the best results in terms of receiver operating characteristic performance, accuracy, precision, and recall. Our approach is more efficient in detecting anomalies than other approaches. The highest accuracy rate achieved is 99.9%, compared to 97% without the voting method.
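A minimal sketch of the voting idea, assuming scikit-learn and well-separated synthetic data: three detectors (an isolation forest, a k-means distance rule, and a random forest) each vote, and a point is flagged when at least two agree. The thresholds and parameters here are illustrative, not the paper's.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.cluster import KMeans

# Synthetic stream: 300 normal points around 0, 30 anomalies around 6.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(300, 4)), rng.normal(6, 1, size=(30, 4))])
y = np.r_[np.zeros(300, dtype=int), np.ones(30, dtype=int)]

# Vote 1: isolation forest (unsupervised); predict() == -1 marks an anomaly.
iso = IsolationForest(contamination=0.1, random_state=0).fit(X)
vote_iso = (iso.predict(X) == -1).astype(int)

# Vote 2: k-means; flag the 10% of points farthest from their nearest centroid.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
dist = km.transform(X).min(axis=1)
vote_km = (dist > np.quantile(dist, 0.9)).astype(int)

# Vote 3: supervised random forest trained on the labeled data.
rf = RandomForestClassifier(random_state=0).fit(X, y)
vote_rf = rf.predict(X)

# Majority vote: a point is an anomaly when at least two detectors agree.
final = ((vote_iso + vote_km + vote_rf) >= 2).astype(int)
accuracy = (final == y).mean()
```

On this synthetic data the majority vote should recover most of the injected anomalies; on real datasets the individual detectors and thresholds would be tuned per domain.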
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...rahulmonikasharma
Classification problems in high-dimensional data with a small number of observations have become more common, especially in microarray data. The increasing amount of text data on internet sites also affects clustering analysis: text clustering is a useful technique for partitioning a huge amount of data into clusters, but the most important drawback affecting it is the presence of uninformative and sparse features in text documents. A broad class of boosting algorithms can be viewed as performing coordinate-wise gradient descent to minimize some potential function of the margins of a data set. This paper proposes a novel evaluation measure, the Q-statistic, that incorporates the stability of the selected feature set in addition to the prediction accuracy. We then propose a Booster for an FS algorithm that enhances the value of the Q-statistic of the algorithm it is applied to.
An Ensemble of Filters and Wrappers for Microarray Data Classification mlaij
The development of microarray technology has supplied a large volume of data to many fields. Gene microarray analysis and classification have demonstrated an effective way for the effective diagnosis of diseases and cancers. Inasmuch as the data obtained from microarray technology is very noisy and also has thousands of features, feature selection plays an important role in removing irrelevant and redundant features and also reducing computational complexity. There are two important approaches for gene selection in microarray data analysis, the filters and the wrappers. To select a concise subset of informative genes, we introduce a hybrid feature selection which combines the two approaches. Candidate features are first selected from the original set via several effective filters. The candidate feature set is further refined by more accurate wrappers. Thus, we can take advantage of both the filters and the wrappers. Experimental results based on 11 microarray datasets show that our mechanism can be effective with a smaller feature set. Moreover, these feature subsets can be obtained in a reasonable time.
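A toy version of this filter-then-wrapper pipeline, assuming scikit-learn, a mutual-information filter, and a greedy forward wrapper around a k-NN classifier (all illustrative choices, not the paper's exact components), could be sketched as:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic "microarray": 150 samples, 50 features, two informative "genes".
rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=150)
X = rng.normal(size=(150, 50))
X[:, 3] += 2.0 * y
X[:, 7] += 2.0 * y

# Stage 1 (filter): keep the 10 features with the highest mutual information.
mi = mutual_info_classif(X, y, random_state=0)
candidates = list(np.argsort(mi)[::-1][:10])

# Stage 2 (wrapper): greedy forward selection scored by cross-validated accuracy.
clf = KNeighborsClassifier(n_neighbors=5)
selected, best = [], 0.0
improved = True
while improved:
    improved = False
    for f in candidates:
        if f in selected:
            continue
        acc = cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
        if acc > best:
            best, best_f, improved = acc, f, True
    if improved:
        selected.append(best_f)
```

The cheap filter shrinks the search space so the expensive wrapper only evaluates a handful of candidate subsets, which is the efficiency argument these hybrid methods make.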
Improving the performance of Intrusion detection systemsyasmen essam
Intrusion detection systems (IDS) are widely studied by researchers nowadays due to the dramatic growth in network-based technologies. Policy violations and unauthorized access are in turn increasing, which makes intrusion detection systems of great importance. Existing approaches to improving intrusion detection systems focus on feature selection or reduction, since some features are irrelevant or redundant and their removal improves accuracy as well as learning time.
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption has increased quickly, contributing to climate change that is evident in unusual flooding, droughts, and global warming. Over the past ten years, women's involvement in society has grown dramatically, and they have succeeded in playing a noticeable role in reducing climate change. A bibliometric analysis of data from the last ten years has been carried out to examine the role of women in addressing climate change. The analysis's findings are discussed in relation to the sustainable development goals (SDGs), particularly SDG 7 and SDG 13. The results considered contributions made by women in various sectors while taking geographic dispersion into account. The bibliometric analysis delves into topics including women's leadership in environmental groups, their involvement in policymaking, their contributions to sustainable development projects, and the influence of gender diversity on attempts to mitigate climate change. This study's results highlight how women have influenced policies and actions related to climate change, point out areas of research deficiency, and offer recommendations on how to increase the role of women in addressing climate change and achieving sustainability. To achieve more successful results, this initiative aims to highlight the significance of gender equality and encourage inclusivity in climate change decision-making processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage and frequency. In this paper, in order to stabilize the microgrid (MG) against load variations in islanding mode, the active and reactive power of all distributed generators (DGs), including energy storage (battery), diesel generator, and micro-turbine, are controlled. The micro-turbine generator is connected to the MG through a three-phase to three-phase matrix converter, and the droop control method is applied for controlling the voltage and frequency of the MG. In addition, a method is introduced for voltage and frequency control of micro-turbines in the transition from grid-connected mode to islanding mode. A novel switching strategy of the matrix converter is used for converting the high-frequency output voltage of the micro-turbine to the grid-side frequency of the utility system. Moreover, using the switching strategy, the low-order harmonics in the output current and voltage are not produced, and consequently, the size of the output filter is reduced. In fact, the suggested control strategy is load-independent and has no frequency conversion restrictions. The proposed approach for voltage and frequency regulation demonstrates exceptional performance and favorable response across various load alteration scenarios. The suggested strategy is examined in several scenarios in the MG test systems, and the simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their performance, enhancing safety, and prolonging their lifespan across various applications, such as electric vehicles and renewable energy systems. This article introduces an innovative nonlinear methodology for system identification of a Li-ion battery, employing a nonlinear autoregressive with exogenous inputs (NARX) model. The proposed approach integrates the benefits of nonlinear modeling with the adaptability of the NARX structure, facilitating a more comprehensive representation of the intricate electrochemical processes within the battery. Experimental data collected from a Li-ion battery operating under diverse scenarios are employed to validate the effectiveness of the proposed methodology. The identified NARX model exhibits superior accuracy in predicting the battery's behavior compared to traditional linear models. This study underscores the importance of accounting for nonlinearities in battery modeling, providing insights into the intricate relationships between state-of-charge, voltage, and current under dynamic conditions.
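The core idea of a NARX model is to regress the next output on lagged outputs and lagged exogenous inputs through a nonlinear map. The sketch below illustrates that framing; the toy dynamics, the two-lag structure, and the small MLP are assumptions for illustration, not the article's identified battery model:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy "battery": the output responds nonlinearly to its own past and a past input.
rng = np.random.default_rng(3)
n = 500
u = rng.uniform(-1, 1, size=n)          # exogenous input (e.g., current)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t-1] - 0.2 * y[t-2] + 0.5 * np.tanh(u[t-1]) + 0.01 * rng.normal()

# NARX regression matrix: two lagged outputs and two lagged inputs per row.
lags = 2
X = np.array([np.r_[y[t-lags:t][::-1], u[t-lags:t][::-1]] for t in range(lags, n)])
target = y[lags:]

# A small MLP stands in for the nonlinear map; any nonlinear regressor works here.
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
model.fit(X, target)
r2 = model.score(X, target)
```

Because the regressor sees lagged outputs as features, the same structure captures memory effects that a purely static input-output model would miss.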
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy. They bring relevant advantages compared to the traditional grid and significant interest from the research community. Assessing the field's evolution is essential to propose guidelines for facing new and future smart grid challenges. In addition, knowing the main technologies involved in the deployment of smart grids (SGs) is important to highlight possible shortcomings that can be mitigated by developing new tools. This paper contributes to the research trends mentioned above by focusing on two objectives. First, a bibliometric analysis is presented to give an overview of the current research level about smart grid deployment. Second, a survey of the main technological approaches used for smart grid implementation and their contributions are highlighted. To that effect, we searched the Web of Science (WoS) and the Scopus databases. We obtained 5,663 documents from WoS and 7,215 from Scopus on smart grid implementation or deployment. With the extraction limitation in the Scopus database, 5,872 of the 7,215 documents were extracted using a multi-step process. These two datasets have been analyzed using a bibliometric tool called bibliometrix. The main outputs are presented with some recommendations for future research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems associated with power systems is the islanding condition, which must be rapidly and properly detected to prevent any negative consequences on the system's protection, stability, and security. This paper offers a thorough overview of several islanding detection strategies, which are divided into two categories: classic approaches, including local and remote approaches, and modern techniques, including techniques based on signal processing and computational intelligence. Additionally, each approach is compared and assessed based on several factors, including implementation costs, non-detected zones, declining power quality, and response times, using the analytical hierarchy process (AHP). The multi-criteria decision-making analysis, based on the comparison of all criteria together, gives overall weights of 24.7% for passive methods, 7.8% for active methods, 5.6% for hybrid methods, 14.5% for remote methods, 26.6% for signal processing-based methods, and 20.8% for computational intelligence-based methods. Thus, it can be seen from the total weights that hybrid approaches are the least suitable to be chosen, while signal processing-based methods are the most appropriate islanding detection method to be selected and implemented in power systems with respect to the aforementioned factors. Using Expert Choice software, the proposed hierarchy model is studied and examined.
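In AHP, criterion weights are obtained as the principal eigenvector of a reciprocal pairwise comparison matrix, followed by a consistency ratio (CR) check. The 3x3 judgment matrix below is a made-up example, not the paper's comparison data:

```python
import numpy as np

# Reciprocal pairwise comparison matrix for three criteria (illustrative
# judgments): criterion 1 moderately dominates 2, strongly dominates 3.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
])

# Priority weights: principal eigenvector, normalized to sum to one.
vals, vecs = np.linalg.eig(A)
k = np.argmax(vals.real)
weights = vecs[:, k].real
weights = weights / weights.sum()

# Consistency check: CI = (lambda_max - n) / (n - 1); RI = 0.58 for n = 3.
lambda_max = vals.real.max()
ci = (lambda_max - 3) / (3 - 1)
cr = ci / 0.58  # judgments are usually accepted when CR < 0.1
```

The same eigenvector computation, applied level by level up the hierarchy, yields the aggregate method weights the abstract reports.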
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by environmental factors. This variability hampers the control and utilization of solar cells' peak output. In this study, a single-stage grid-connected PV system is designed to enhance power quality. Our approach employs fuzzy logic in the direct power control (DPC) of a three-phase voltage source inverter (VSI), enabling seamless integration of the PV connected to the grid. Additionally, a fuzzy logic-based maximum power point tracking (MPPT) controller is adopted, which outperforms traditional methods like incremental conductance (INC) in enhancing solar cell efficiency and minimizing the response time. Moreover, the inverter's real-time active and reactive power is directly managed to achieve a unity power factor (UPF). The system's performance is assessed through MATLAB/Simulink implementation, showing marked improvement over conventional methods, particularly in steady-state and varying weather conditions. For solar irradiances of 500 and 1,000 W/m², the results show that the proposed method reduces the total harmonic distortion (THD) of the injected current to the grid by approximately 46% and 38% compared to conventional methods, respectively. Furthermore, we compare the simulation results with IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that caters to the future needs of society, owing to their renewable, inexhaustible, and cost-free nature. The power output of these systems relies on solar cell radiation and temperature. In order to mitigate the dependence on atmospheric conditions and enhance power tracking, a conventional approach has been improved by integrating various methods. To optimize the generation of electricity from solar systems, the maximum power point tracking (MPPT) technique is employed. To overcome limitations such as steady-state voltage oscillations and improve transient response, two traditional MPPT methods, namely the fuzzy logic controller (FLC) and perturb and observe (P&O), have been modified. This research paper aims to simulate and validate the step size of the proposed modified P&O and FLC techniques within the MPPT algorithm using MATLAB/Simulink for efficient power tracking in photovoltaic systems.
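The P&O loop mentioned above follows one simple rule: keep perturbing the operating voltage in the same direction while power rises, and reverse when it falls. A minimal sketch on a made-up single-peak P-V curve (the peak at 17 V / 60 W and the step size are illustrative numbers, not from the paper):

```python
# Toy single-peak P-V curve; the numbers (peak at 17 V, 60 W) are illustrative.
def power(v):
    return max(0.0, -0.5 * (v - 17.0) ** 2 + 60.0)

def perturb_and_observe(v0, step=0.5, iters=200):
    """Classic P&O: keep stepping while power rises, reverse when it falls."""
    v = v0
    p_prev = power(v)
    direction = 1.0
    for _ in range(iters):
        v += direction * step
        p = power(v)
        if p < p_prev:           # power dropped: reverse the perturbation
            direction = -direction
        p_prev = p
    return v

v_mpp = perturb_and_observe(5.0)
```

The fixed step is exactly what causes the steady-state oscillation around the peak that the abstract sets out to reduce; the modified methods adapt the step size instead.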
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for robot hands is always an attractive topic in the research community. This is a challenging problem because robot manipulators are complex nonlinear systems and are often subject to fluctuations in loads and external disturbances. This article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller ensures that the positions of the joints track the desired trajectory, synchronizes the errors, and significantly reduces chattering. First, the synchronous tracking errors and synchronous sliding surfaces are presented. Second, the synchronous tracking error dynamics are determined. Third, a robust adaptive control law is designed, the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy logic. The built algorithm ensures that the tracking and approximation errors are uniformly ultimately bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results. Simulation and experimental results show that the proposed controller is effective, with small synchronous tracking errors, and the chattering phenomenon is significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students' learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded-system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving solar cell system status parameters (solar irradiance, temperature, and humidity) is a critical issue in enhancing their efficiency. Hence, in the present article an improved smart prototype of an internet of things (IoT) technique based on an embedded system using the NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions in Egypt (the cities of Luxor, Cairo, and El-Beheira) were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitored data of solar irradiance, temperature, and humidity were visualized live directly in Ubidots through the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m², respectively, during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system make it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly favor Luxor and Cairo as suitable places to build a solar cell system station rather than El-Beheira.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threats to data security in IoT operations. Therefore, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to misclassification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with related works, the experimental outcome shows that the model performs well on a benchmark dataset, accomplishing an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research develops an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real-time to the internet of things (IoT) broker Eclipse Mosquitto using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory network (LSTM) method and displayed in a web application using an application programming interface (API) service. The experiments produced 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and the other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model's predicted values on the test data. The best results are obtained using a two-layer LSTM configuration model, each layer with 60 neurons and a lookback setting of 6. This model produces an R² value of 0.934, with a root mean square error (RMSE) of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R² value was also evaluated for each attribute used as input, with values between 0.590 and 0.845.
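The lookback framing used before training such an LSTM (each input is a window of the previous six readings, the target is the next one) can be sketched in a few lines; the sine series stands in for sensor data and is an assumption of the example:

```python
import numpy as np

def make_supervised(series, lookback=6):
    """Frame a series as (samples, lookback) windows and next-step targets."""
    X, y = [], []
    for t in range(lookback, len(series)):
        X.append(series[t - lookback:t])   # the previous `lookback` readings
        y.append(series[t])                # the value to predict
    return np.array(X), np.array(y)

temps = np.sin(np.linspace(0, 20, 120))    # stand-in for incubator sensor data
X, y = make_supervised(temps, lookback=6)
```

The resulting `X` would then be reshaped to (samples, lookback, features) and fed to the LSTM layers the abstract describes.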
A review on internet of things-based stingless bee's honey production with im...IJECEIAES
Honey is produced exclusively by honeybees and stingless bees, both of which are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey and for having a unique flavor profile. A key problem identified is that many stingless bee colonies collapse due to weather, temperature, and environment. It is critical to understand the relationship between the production of stingless bee honey and environmental conditions in order to improve honey production. Thus, this paper presents a review of stingless bee honey production and prediction modeling. About 54 previous studies have been analyzed and compared to identify the research gaps, and a framework for modeling the prediction of stingless bee honey is derived. The result presents the comparison and analysis of internet of things (IoT) monitoring systems, honey production estimation, convolutional neural networks (CNNs), and automatic identification methods for bee species. Based on image detection methods, the top three in efficiency are CNN at 98.67%, densely connected convolutional networks with YOLO v3 at 97.7%, and DenseNet201 convolutional networks at 99.81%. This study is significant in assisting researchers in developing a model for predicting stingless bee honey output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero...IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society, including interactions, financial activity, and global security, such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are long-standing IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. This system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using the compact Contiki/Cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on the detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced the packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJ at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbersIJECEIAES
In real-world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers provide a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can be different from the non-membership value, in which case we can model the data using intuitionistic fuzzy numbers, as they provide flexibility by defining both a membership and a non-membership function. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which is a generalization of the polygonal fuzzy numbers previously found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by an intuitionistic polygonal fuzzy number with n edges. This method is given in a simple tableau formulation and then applied to numerical examples for clarity.
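A polygonal fuzzy number is defined by a piecewise-linear membership function through a list of (point, degree) vertices; the intuitionistic variant pairs this with a separate non-membership function. A minimal membership sketch (the vertices below are illustrative, not from the article):

```python
import numpy as np

def polygonal_membership(x, vertices):
    """Piecewise-linear (polygonal) membership: `vertices` is a list of
    (point, degree) pairs sorted by point; membership is 0 outside the support."""
    pts = [p for p, _ in vertices]
    deg = [d for _, d in vertices]
    return float(np.interp(x, pts, deg, left=0.0, right=0.0))

# A polygonal fuzzy number with a flat top between 2 and 3 (illustrative).
verts = [(0, 0.0), (1, 0.5), (2, 1.0), (3, 1.0), (4, 0.5), (5, 0.0)]
mu = polygonal_membership(2.5, verts)  # full membership on the flat top
```

Adding more vertices (edges) is what lets the polygonal form approximate an arbitrary fuzzy number, which is the approximation step the modified simplex method relies on.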
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well time-domain signals work in the detection of epileptic signals using the intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT), and Temple University Hospital (TUH) EEG databases. To test the viability of this approach, two types of experiments were carried out: binary classification (preictal, ictal, and postictal, each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using the CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7%, and the highest accuracy of 94.02% was obtained for classification in the TUH EEG database when comparing the interictal stage to the preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Β
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Β
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Forklift Classes Overview by Intella PartsIntella Parts
Β
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Β
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Event Management System Vb Net Project Report.pdfKamal Acharya
Β
In present era, the scopes of information technology growing with a very fast .We do not see any are untouched from this industry. The scope of information technology has become wider includes: Business and industry. Household Business, Communication, Education, Entertainment, Science, Medicine, Engineering, Distance Learning, Weather Forecasting. Carrier Searching and so on.
My project named βEvent Management Systemβ is software that store and maintained all events coordinated in college. It also helpful to print related reports. My project will help to record the events coordinated by faculties with their Name, Event subject, date & details in an efficient & effective ways.
In my system we have to make a system by which a user can record all events coordinated by a particular faculty. In our proposed system some more featured are added which differs it from the existing system such as security.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Democratizing Fuzzing at Scale by Abhishek Aryaabh.arya
Β
Presented at NUS: Fuzzing and Software Security Summer School 2024
This keynote talks about the democratization of fuzzing at scale, highlighting the collaboration between open source communities, academia, and industry to advance the field of fuzzing. It delves into the history of fuzzing, the development of scalable fuzzing platforms, and the empowerment of community-driven research. The talk will further discuss recent advancements leveraging AI/ML and offer insights into the future evolution of the fuzzing landscape.
Courier management system project report.pdfKamal Acharya
Β
It is now-a-days very important for the people to send or receive articles like imported furniture, electronic items, gifts, business goods and the like. People depend vastly on different transport systems which mostly use the manual way of receiving and delivering the articles. There is no way to track the articles till they are received and there is no way to let the customer know what happened in transit, once he booked some articles. In such a situation, we need a system which completely computerizes the cargo activities including time to time tracking of the articles sent. This need is fulfilled by Courier Management System software which is online software for the cargo management people that enables them to receive the goods from a source and send them to a required destination and track their status from time to time.
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfKamal Acharya
Β
The College Bus Management system is completely developed by Visual Basic .NET Version. The application is connect with most secured database language MS SQL Server. The application is develop by using best combination of front-end and back-end languages. The application is totally design like flat user interface. This flat user interface is more attractive user interface in 2017. The application is gives more important to the system functionality. The application is to manage the studentβs details, driverβs details, bus details, bus route details, bus fees details and more. The application has only one unit for admin. The admin can manage the entire application. The admin can login into the application by using username and password of the admin. The application is develop for big and small colleges. It is more user friendly for non-computer person. Even they can easily learn how to manage the application within hours. The application is more secure by the admin. The system will give an effective output for the VB.Net and SQL Server given as input to the system. The compiled java program given as input to the system, after scanning the program will generate different reports. The application generates the report for users. The admin can view and download the report of the data. The application deliver the excel format reports. Because, excel formatted reports is very easy to understand the income and expense of the college bus. This application is mainly develop for windows operating system users. In 2017, 73% of people enterprises are using windows operating system. So the application will easily install for all the windows operating system users. The application-developed size is very low. The application consumes very low space in disk. Therefore, the user can allocate very minimum local disk space for this application.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologistβs survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Vaccine management system project report documentation..pdfKamal Acharya
Β
The Division of Vaccine and Immunization is facing increasing difficulty monitoring vaccines and other commodities distribution once they have been distributed from the national stores. With the introduction of new vaccines, more challenges have been anticipated with this additions posing serious threat to the already over strained vaccine supply chain system in Kenya.
Vaccine management system project report documentation..pdf
Β
Enhancing feature selection with a novel hybrid approach incorporating genetic algorithms and swarm intelligence techniques
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 14, No. 1, February 2024, pp. 944-959
ISSN: 2088-8708, DOI: 10.11591/ijece.v14i1.pp944-959
Journal homepage: http://ijece.iaescore.com
Enhancing feature selection with a novel hybrid approach incorporating genetic algorithms and swarm intelligence techniques
Salsabila Benghazouani¹, Said Nouh¹, Abdelali Zakrani², Ihsane Haloum³, Mostafa Jebbar⁴

¹ Department of Mathematics and Computer Science, Faculty of Sciences Ben M'Sick, Hassan II University, Casablanca, Morocco
² Department of Computer Science Engineering, ENSAM, Hassan II University, Casablanca, Morocco
³ Department of Immunogenetics and Human Pathologies, Faculty of Medicine and Pharmacy, Hassan II University, Casablanca, Morocco
⁴ Department of Mathematics and Computer Science, EST, Hassan II University, Casablanca, Morocco
Article history: Received Oct 20, 2022; Revised Sep 2, 2023; Accepted Sep 15, 2023

ABSTRACT
Computing advances in data storage are leading to rapid growth in large-scale datasets. Using all features increases temporal/spatial complexity and negatively influences performance. Feature selection is a fundamental stage in data preprocessing that removes redundant and irrelevant features to minimize the number of features and enhance classification accuracy. Numerous optimization algorithms have been employed to handle feature selection (FS) problems, and they outperform conventional FS techniques. However, no single metaheuristic FS method outperforms all other optimization algorithms across many datasets. This motivated our study to incorporate the advantages of various optimization techniques to obtain a powerful technique that outperforms other methods on many datasets from different domains. In this article, a novel combined method, GASI, is developed using swarm intelligence (SI) based feature selection techniques and genetic algorithms (GA), employing a multi-objective fitness function to seek the optimal subset of features. To assess the performance of the proposed approach, seven datasets collected from the UCI repository were used to test the newly established feature selection technique. The experimental results demonstrate that the suggested method GASI outperforms many powerful SI-based feature selection techniques studied: GASI obtains a better average fitness value and improves classification performance.
Keywords: Feature selection; Genetic algorithms; Machine learning; Multi-objective optimization; Swarm intelligence
This is an open access article under the CC BY-SA license.
Corresponding Author:
Salsabila Benghazouani
Department of Mathematics and Computer Science, Faculty of Sciences Ben M'Sick, Hassan II University
Casablanca, Morocco
Email: benghazouani.salsabila239@gmail.com
1. INTRODUCTION
Feature selection plays a crucial role in the preprocessing phase of machine learning: it eliminates irrelevant and redundant features (noisy attributes), which increases the performance of a classifier and reduces the computational complexity [1]. In the healthcare sector, feature identification and selection play a vital role in enhancing accuracy in prediction, classification, and detection systems. This crucial preprocessing step not only enables dimensionality reduction but also permits a better understanding of pathologies [2]. In an exhaustive search, the number of possible combinations to determine the most relevant and non-redundant features is 2^n, where n represents the number of features (an NP-complete problem)
[3]. Numerous feature selection algorithms have been suggested in the existing literature, and they are generally classified into three groups: filter algorithms, wrapper algorithms, and integrated algorithms [4]. The filter approach is independent of the learning algorithm and uses information-theoretic measures to assess and rank the features [5]. The advantages of this method are its computational efficiency and its robustness against overfitting [6]. By contrast, wrapper approaches employ a learning algorithm to assess subsets of features, which yields high classification accuracy but requires considerable computation time [7]. Integrated approaches combine the advantages of filter and wrapper methods: they incorporate variable selection into the learning process, which allows a compromise to be reached between computational cost and model performance [8]. The fundamental limitation of the filter technique is that features are chosen autonomously, without involving the machine learning classifier, whereas the wrapper technique selects features using an optimization algorithm that works directly with the classifier [9]. Compared to the standard exhaustive search, optimization algorithms offer the advantage of efficiently selecting the optimal subset of features in a reasonable time.
Numerous optimization methods have recently been utilized to tackle feature selection (FS) problems, and they significantly outperform more traditional FS techniques. However, no meta-heuristic FS approach surpasses all other optimization algorithms across many datasets. For example, Rostami et al. [10] compared the performance of different swarm intelligence (SI) based feature selection methods on several datasets. The findings indicate that, with the support vector machine (SVM) classifier on the colon dataset, the cuckoo optimization algorithm (COA) outperforms the particle swarm optimization (PSO) method; however, on the isolated letter speech recognition (ISOLET) dataset, the PSO-based method performs better than COA. In study [11], an improved salp swarm algorithm (ISSA) is developed and compared to other swarm techniques. The findings revealed that, when employing the k-nearest neighbor (KNN) classifier on the Waveform dataset, the SSA-based method exhibited superior performance in comparison to PSO; conversely, on the Parkinson's dataset, the PSO-based method surpassed the SSA-based method.
This paper seeks to remedy these limitations by proposing a powerful feature selection approach based on a genetic algorithm (GA) that combines the advantages of various swarm intelligence (SI)-based feature selection techniques. The objective is to efficiently use helpful information from various SI-based feature selection techniques to obtain a better average fitness value and higher classification performance than other optimization algorithms on many datasets from different fields. The suggested feature selection approach has been applied to seven publicly available datasets from the UCI repository (colon, breast cancer Wisconsin, heart, arrhythmia, sonar, ionosphere, waveform), on which the potency of the suggested method was tested. The remainder of this article is structured as follows: section 2 describes the literature survey and related works; section 3 details the proposed feature selection method GASI; experimental results and discussion are shown in section 4, followed by the conclusion and future perspectives in section 5.
2. LITERATURE SURVEY AND RELATED WORKS
A crucial issue for machine learning tasks is the large dimensionality of a data set with huge feature
spaces and a limited number of samples [12]. Dimensionality reduction is a technique to tackle this issue by
removing redundant and noisy features. This improves the classifier's performance and reduces its complexity
in terms of computation and memory space. Dimensionality reduction approaches are typically categorized into
two groups: feature selection and feature extraction. A reduction based on a data transformation is called a
feature extraction, which replaces the initial data set with a new reduced one built from the initial set of features.
A feature selection-based reduction chooses the most pertinent features from the dataset. In the following
subsection, we briefly describe some feature selection approaches: filter, wrapper, and embedded.
Filter techniques use statistical performance measures to evaluate features and select the best-ranking ones; these approaches do not depend on the learning algorithm [5]. Filter methods fall into two groups: univariate and multivariate. Univariate approaches assess the pertinence of each attribute to the target class using an assessment criterion such as mutual information (MI), information gain (IG), or the Gini index (GI) [13]. This approach does not consider interactions between features [3] and is prone to getting stuck in a local optimum [14]. Multivariate methods take the dependencies between features into consideration, which allows the elimination of irrelevant and redundant variables. Among the multivariate methods are the maximum relevance minimum redundancy (MRMR) approach [15] and the relevance redundancy feature selection (RRFS) method [16].
The wrapper method is based on the learning algorithm to assess the variables to choose an optimum
subset of characteristics with high classification precision [3]. Although this method uses a classifier and
considers the interactions between variables, it remains computationally expensive [17]. Generally, a
cross-validation mechanism is often used to reduce time complexity and avoid overfitting problems [18].
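A minimal wrapper loop can be sketched as follows, assuming (purely for illustration) a 1-NN classifier evaluated with leave-one-out cross-validation and an exhaustive scan of all k-feature subsets; real wrappers replace the exhaustive scan with a heuristic or metaheuristic search:

```python
from itertools import combinations

def knn_loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-NN classifier restricted to the
    features in `subset` (the wrapper's evaluation step)."""
    correct = 0
    for i in range(len(X)):
        best_dist, pred = float("inf"), None
        for j in range(len(X)):
            if i == j:
                continue
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in subset)
            if dist < best_dist:
                best_dist, pred = dist, y[j]
        correct += pred == y[i]
    return correct / len(X)

def wrapper_search(X, y, n_features, k):
    """Score every k-feature subset with the classifier; keep the best."""
    return max(combinations(range(n_features), k),
               key=lambda s: knn_loo_accuracy(X, y, s))
```

On a toy dataset where only feature 0 separates the classes, `wrapper_search` returns the subset `(0,)`.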
The embedded approach differs from the other feature selection approaches. On the one hand, filter procedures do not employ a learning algorithm. On the other hand, wrapper approaches use a learning machine to assess the quality of feature subsets, independent of knowledge about the specific structure of the classification or regression function [19]. The embedded approach, by contrast, integrates feature selection into the training process: it uses a machine learning method to seek the best subset of features while ensuring a balance between computational cost and model performance [8]. Among the learning algorithms that use this concept are decision trees (DT), support vector machines (SVM), and AdaBoost. For example, a DT is a tree-based classifier with several nodes and leaves. Each leaf is a class label, while each node represents a particular feature. The relevance of a feature is determined by its location in the DT. Therefore, in DT-based embedded approaches, the tree is first generated using an ensemble of models, and subsequently, the features engaged in the classification are chosen as the definitive subset of features [20].
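The tree-based embedded idea can be illustrated with a one-node decision stump: the Gini impurity decrease of the best split on each feature serves as its importance score. This is a simplified stand-in for a full decision tree, not the procedure of [20]:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def stump_importance(X, y, f):
    """Impurity decrease of the best single split on feature f
    (a one-node stand-in for tree-based embedded ranking)."""
    best = 0.0
    values = sorted({row[f] for row in X})
    for t in values[:-1]:          # both children are always non-empty
        left = [y[i] for i, row in enumerate(X) if row[f] <= t]
        right = [y[i] for i, row in enumerate(X) if row[f] > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        best = max(best, gini(y) - child)
    return best
```

A feature whose best split separates the classes perfectly earns the full parent impurity (0.5 for a balanced binary problem), while a noisy feature earns much less.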
Feature selection is considered an NP-hard problem because the number of feasible subsets of variables increases exponentially with the number of predictors [3]. Metaheuristics are approximate search methods often used for NP-hard problems, as they can achieve satisfactory (near-optimal) solutions in a short time [6]. Several feature selection methods use metaheuristics to escape local optima and decrease computational complexity in high-dimensional datasets [21]. A metaheuristic typically starts from a randomly generated initial population; a fitness function then assesses the performance of the individual solutions, and a new population is created if no termination criterion is satisfied. This process is iterated until one of the stopping criteria is fulfilled [22].
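The generic loop just described can be sketched as a schematic skeleton; `init_population`, `evaluate`, and `evolve` are hypothetical callables that each concrete metaheuristic fills in with its own operators:

```python
def metaheuristic(init_population, evaluate, evolve, max_iters):
    """Generic population-based search loop: build an initial population,
    assess each solution with the fitness function, and create new
    populations until the termination criterion (here an iteration
    budget) is met.  Lower fitness is better."""
    population = init_population()
    best = min(population, key=evaluate)
    for _ in range(max_iters):
        population = evolve(population)
        candidate = min(population, key=evaluate)
        if evaluate(candidate) < evaluate(best):
            best = candidate
    return best

# Toy usage: minimize (x - 3)^2 with a trivial "evolve" that shifts
# every solution by 0.5 each generation; the best solution ever seen
# is retained even after the population walks past the optimum.
best = metaheuristic(lambda: [0.0],
                     lambda x: (x - 3.0) ** 2,
                     lambda pop: [x + 0.5 for x in pop],
                     max_iters=10)
```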
Genetic algorithm (GA) is an evolutionary computation algorithm that draws inspiration from the Darwinian evolution of biological populations. This well-known approach imitates the mechanism of natural selection, where the fittest individuals are selected to produce the next generation; in other words, a GA operates by producing child chromosomes from their parents' chromosomes. Genetic operators, including crossover and mutation, are among the most crucial components of GAs and play a major part in exploring the search space to discover novel solutions. The crossover operators search for new solutions using information already present in the population, while the mutation operators create new information by altering part of it. In GAs, crossover operators are typically applied considerably more frequently than mutation operators, although during the search procedure the mutation operators assist in escaping local optima. Genetic algorithms have successfully shown their high ability to solve optimization problems, including feature selection problems [7], and numerous authors have suggested several GA variants to solve the feature selection problem [4], [23]–[25]. In 2016, Cerrada et al. [26] showed that GA could effectively achieve optimal global solutions for problems with large search spaces.
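The two genetic operators can be sketched for the binary chromosomes used in feature selection; single-point crossover and bit-flip mutation are one common choice among many variants:

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: exchange the tails after a random cut."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def mutate(chromosome, rate, rng):
    """Bit-flip mutation: invert each gene with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]
```

Crossover recombines material already in the population (each child keeps a prefix of one parent and a suffix of the other), while mutation injects new information by flipping individual bits.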
PSO is a robust optimization approach based on SI, developed by Kennedy and Eberhart [27]. The approach is founded on the collective behavior of schooling fish and flocking birds. PSO is utilized in various machine learning and feature selection applications. For instance, in [28], a multi-objective feature selection strategy using gray wolf optimization (GWO) and a PSO variant derived from Newton's law is created to reduce the number of chosen features and the rate of classification errors. In [29], a comparison of classification accuracy between PSO and a hybrid method that employs the Harris hawks optimization algorithm (HHO) for optimizing SVM is performed. In [30], a hybrid meta-heuristic approach combining PSO and adaptive GA operators is introduced; this approach aims to optimize feature selection in machine learning models specifically designed to detect instances of tax avoidance. In [31], a thorough examination is conducted on current classification methods and gene selection techniques; the paper specifically emphasizes the effectiveness of emerging methods, such as SI algorithms, in feature selection and classification for high-dimensional microarray data. In [32], a proposed system achieves automatic classification and detection of different pest attacks and plant infections by combining a radial basis probabilistic network (RBPN) with a genetic algorithm-based particle swarm optimization (GA-based PSO) method. In [33], a feature selection technique derived from PSO that incorporates multiple classifiers is proposed; it utilizes adaptive parameters and strategies to tackle large-scale feature selection problems, with the aim of improving classification accuracy and reducing computational complexity.
Differential evolution (DE) is a stochastic, population-based search algorithm related to swarm intelligence, proposed by Storn and Price [34]. This optimization method was first developed to solve the Chebyshev polynomial fitting problem, but it has also been demonstrated effective in solving complex optimization problems [35]. Zhang et al. [36] presented a multi-objective feature selection method based on differential evolution and defined a mutation operator to evade local optima. Li et al. [37] proposed a novel large-scale multi-objective cooperative co-evolution method for feature selection to search efficiently for subsets of optimal features.
The cuckoo optimization algorithm (COA) is a recent evolutionary optimization approach based on swarm intelligence, derived from the life of the cuckoo bird [38]. Its principle is inspired by the nesting and egg-laying behavior of the cuckoo to overcome optimization issues and find the global optimum [39]. In [40], a combination of a neural network and the cuckoo search algorithm is deployed for feature selection in heart disease classification. The firefly algorithm (FA) is an excellent instance of SI, in which underperforming entities collaborate to generate high-performance solutions. In [41], Yang introduced the FA, whose basic notion is based on the optical communication between fireflies.
The salp swarm algorithm (SSA) is a recently developed SI-based algorithm that imitates the behavior of sea salps [42]. The SSA demonstrated high performance when evaluated on various optimization problems. In [43], a new combination of SSA and chaos theory is suggested to enhance feature selection accuracy. In [44], a dynamic salp swarm approach for feature selection is used to resolve the local optimum issue of SSA and to strike a balance between exploitation and exploration.
The Jaya algorithm (JA) is a recently introduced population-based meta-heuristic algorithm. Rao presented it in 2016 to handle constrained and unconstrained optimization problems. In [45], a novel hybrid feature selection approach incorporating the binary JA is developed for the classification of microarray data, seeking the optimum subset of features.
The flower pollination algorithm (FPA) is a meta-heuristic optimization technique that centers
around the pollination process found in flowering plants. It was introduced by Yang in 2012 [46]. The
primary goal of a flower is essentially to reproduce through the process of pollination, which involves the
transfer of pollen and is frequently aided by pollinators such as birds and insects.
3. THE PROPOSED FEATURE SELECTION METHOD GASI
This section proposes a novel feature selection approach by integrating genetic algorithms and
swarm intelligence-based feature selection techniques incorporating particle swarm optimization, differential
evolution, cuckoo optimization algorithm, firefly algorithm, salp swarm algorithm, Jaya algorithm, flower
pollination algorithm and other feature selection methods such as SelectFromModel and recursive feature
elimination (RFE). The proposed approach GASI is founded on two principal pillars. The first builds an initial smart population composed of the valuable results of swarm intelligence-based feature selection algorithms (PSO, DE, COA, FA, SSA, JA, FPA, SelectFromModel, and RFE) that aim to discover the optimal subset of features. The second introduces this intelligent population to the genetic algorithm in order to search for a better subset of features that contains a smaller number of features and improves the classification performance. The architecture of the suggested feature selection approach GASI is illustrated in Figure 1.
In this framework, as shown in Figure 1, several SI-based feature selection techniques are first applied to a dataset. An intelligent population composed of the feature subsets produced by these techniques is then fed to the GA in the second step. The GA starts with this population and attempts to converge to the optimal subset of features by employing genetic operators. Each individual in the current population is evaluated with a specified fitness function, and a new population is produced using the genetic operations of selection, crossover, and mutation. This method is developed to maximize the classification accuracy and reduce the size of the feature subset. The following subsections describe the proposed GA method.
3.1. Encoding of individuals
In this context, individuals are represented using binary arrays consisting of n bits, where n
corresponds to the number of features in the original dataset. A bit with a value of 1 in this array indicates the
inclusion of the corresponding feature in the subset, while a bit with a value of 0 signifies the exclusion of
that feature. This binary encoding method serves as an efficient means of representing feature subsets,
enabling algorithms to make decisions about which features to include or exclude during various data
analysis and optimization processes. It is a fundamental approach in feature selection and dimensionality
reduction tasks.
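This encoding can be sketched in a few lines; `decode` and `encode` are hypothetical helper names used here for illustration:

```python
def decode(chromosome):
    """Map an n-bit vector to the indices of the selected features."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]

def encode(selected, n_features):
    """Inverse mapping: a set of feature indices -> n-bit chromosome."""
    chosen = set(selected)
    return [1 if i in chosen else 0 for i in range(n_features)]
```

For example, the chromosome [1, 0, 1, 1, 0] selects features 0, 2, and 3 out of five.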
3.2. Smart population with SI-based feature selection
Instead of creating an initial population with a predetermined number of randomly generated
individuals, we take advantage of the best solutions obtained by many powerful SI-based feature selection
approaches. For that purpose, an intelligent population is constructed from the optimal subsets of features
produced by the different SI-based feature selection approaches, with additional randomly generated
individuals to keep the diversity of the next generation. This intelligent collection of features is fed into the genetic algorithm as the initial population to search for the optimal subset of features that maximizes classification performance and minimizes the number of selected features.
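The construction of such a smart initial population can be sketched as follows, assuming (for illustration) that each SI-based selector returns its best subset as a set of feature indices:

```python
import random

def smart_population(si_subsets, n_features, pop_size, rng):
    """Build the GA's initial population: encode each subset returned by
    an SI-based selector as a bit vector, then pad with random
    individuals to preserve diversity."""
    population = [[1 if f in subset else 0 for f in range(n_features)]
                  for subset in si_subsets]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(n_features)])
    return population
```

The first individuals are the SI results themselves; the remainder are random bit vectors, so the GA starts from good solutions without losing exploration.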
Figure 1. Flowchart of the proposed method GASI
3.3. Fitness function
The fitness function operates by simultaneously considering two distinct objectives: the
enhancement of classification accuracy and the reduction of the number of selected features. To convert this
function into a minimization problem, we introduce weights for each of these objectives. These weights
enable the amalgamation of these criteria into a unified representation of the fitness function. Consequently,
the fitness function can be articulated in the following manner:
πππ πΉ(π) = πΌ. πΈππ(π) + π½. ππΉπ(π) (1)
Equation (1) defines the fitness function that assigns a fitness value to each individual, where X is a binary vector representing a selected subset of features. The weights α and β are assigned to the two objectives, the classification error and the proportion of features selected, respectively, and satisfy the conditions in (2) and (3).
β = 1 − α (2)
α ∈ [0, 1] (3)
Enhancing feature selection with a novel hybrid approach incorporating … (Salsabila Benghazouani)
The parameters α and β, where α ∈ [0, 1] and β = 1 − α, regulate the relative importance of classification accuracy and feature reduction. The values of α and β used in previous studies [47], [48] are also employed in the current experiments, with α set to 0.99. Equation (4) defines the classification error rate Err(X), which must be minimized and is the complement of the classifier accuracy defined in (5), while NFs(X) in (6) represents the proportion of selected predictors, where D is the size of the individual and x_i is a binary variable (7) indicating whether feature i is present in the selected individual.
Err = 1 − Accuracy (4)

Accuracy = (TP + TN) / (TP + TN + FP + FN) (5)

0 ≤ NFs = (1/D) Σ_{i=1}^{D} x_i ≤ 1 (6)

x_i ∈ {0, 1} (7)
where TP, TN, FP, and FN correspond to the number of true positives, true negatives, false positives, and
false negatives, respectively. The individuals are ranked according to their fitness value, and the chosen
number of individuals with the lowest fitness value are considered the parents of the next generation.
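As a concrete illustration, the weighted fitness of (1)-(3) and (6), with α = 0.99 as in the experiments, can be sketched in a few lines (the function name is illustrative):

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Multi-objective fitness from Eq. (1): a weighted sum of the
    classification error and the fraction of retained features.
    Lower is better, so the GA minimizes this value."""
    beta = 1.0 - alpha          # Eq. (2)
    nfs = n_selected / n_total  # Eq. (6)
    return alpha * error_rate + beta * nfs

# Example: 95% accuracy (error 0.05) with 10 of 60 features kept.
f = fitness(error_rate=0.05, n_selected=10, n_total=60)
print(round(f, 6))   # 0.051167
```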
3.4. Genetic operators
3.4.1. Selection
Once a population of chromosomes has been created, GA will search for a few pairs of parent
chromosomes to apply a crossover operation. To select the parent chromosomes, we employed the Roulette
selection approach [49]. Every chromosome receives a portion of the roulette wheel based on its fitness. The wheel is rotated after the chromosomes have been placed on it, and when it stops, a random pointer indicates the chosen chromosome. The best chromosomes occupy a large portion of the wheel and therefore have a high probability of being selected. The probability of selecting individual x_i is proportional to its fitness and is given by (8):
Prob(x_i) = fitness(x_i) / Σ_{j=1}^{n} fitness(x_j) (8)
where fitness(x_i) is the fitness value of individual x_i. An elitist strategy guarantees that the best individual is systematically carried over to the next generation without crossover or mutation, which is essential to maintain the steady convergence of the genetic algorithm. Tournament selection, where randomly chosen individuals compete for survival, allows individuals with lower fitness values to contribute to the next generation, promoting diversity.
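A sketch of roulette-wheel selection per (8); because GASI minimizes the fitness value, the selection weights are assumed here to be inverse fitness values so that better (lower-fitness) individuals occupy larger slices of the wheel:

```python
import random

def roulette_select(population, weights, rng=random):
    """Roulette-wheel (fitness-proportionate) selection, Eq. (8):
    each individual occupies a slice of the wheel proportional to
    its selection weight."""
    total = sum(weights)
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for individual, w in zip(population, weights):
        cumulative += w
        if pick <= cumulative:
            return individual
    return population[-1]   # guard against floating-point drift

pop = ["A", "B", "C"]
fitness_values = [0.05, 0.10, 0.50]          # lower is better
weights = [1.0 / f for f in fitness_values]  # assumed inversion for minimization
parent = roulette_select(pop, weights)
```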
3.4.2. Crossover
Crossover is applied, with a specified probability Pc, to each pair of chromosomes selected by the method above. A high probability Pc increases the rate at which new individuals appear in the population. Crossover selects a random point on the chromosome where the exchange of the parents' parts occurs. This process produces new offspring that combine parts of both parents around the selected exchange point, as shown in Figure 2.
Figure 2. One point crossover
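The one-point crossover of Figure 2 can be sketched as:

```python
import random

def one_point_crossover(parent1, parent2, rng=random):
    """Exchange the tails of two parents at a random cut point,
    producing two offspring (Figure 2)."""
    point = rng.randint(1, len(parent1) - 1)   # cut strictly inside the chromosome
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

p1 = [1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0]
c1, c2 = one_point_crossover(p1, p2)
```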
3.4.3. Mutation
The term "mutation" refers to the random change in the value of a gene on a chromosome. Mutation acts as background noise that prevents evolution from stagnating: it extends the exploration of the search space and helps ensure that the global optimum can be reached, so this operator avoids premature convergence to local optima. The technique used here is uniform mutation, in which each bit of a chromosome has a low probability Pm of being flipped.
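A sketch of uniform mutation with a per-bit flip probability Pm:

```python
import random

def uniform_mutation(chromosome, pm=0.01, rng=random):
    """Flip each bit independently with a small probability Pm,
    keeping the search from stalling in a local optimum."""
    return [1 - bit if rng.random() < pm else bit for bit in chromosome]

original = [0] * 1000
mutated = uniform_mutation(original, pm=0.01)
# With Pm = 0.01, roughly 10 of the 1000 bits are expected to flip.
```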
4. RESULTS AND DISCUSSION
This section aims to assess the proposed genetic algorithm in terms of fitness values, feature space
reduction, and prediction accuracy. The proposed method GASI uses an initial population based on all the
feature selection techniques chosen in this paper (FPA, JA, SSA, DE, COA, FA, PSO, RFE, and
SelectFromModel). We have also compared these results with other SI-based feature selection techniques,
which would allow a robust empirical study. It is worth noting that, even though the genetic algorithm's computational cost is higher than that of the other feature selection approaches, it surpasses them in terms of fitness value and accuracy. The rest of this section describes the employed datasets, the classifiers used, the evaluated approaches, the evaluation measures, and the results and discussion in the following subsections.
4.1. Datasets
Based on the World Health Organization (WHO), heart disorders and cancer are the two leading
causes of mortality in developing and under-developed countries. Breast, lung, colon, and rectum tumors
remain the most commonly diagnosed cancers worldwide [50]. For these reasons, we tested the proposed
method on seven well-known datasets to evaluate its performance. These datasets are colon, breast cancer
Wisconsin, heart, arrhythmia, sonar, ionosphere, and waveform collected from the UCI repository. The
description of the datasets is given in Table 1.
Table 1. Description of the seven studied datasets
Dataset No. of features No. of instances No. of classes
Colon cancer 2,000 62 2
Breast cancer Wisconsin 32 569 2
Heart 13 270 2
Arrhythmia 279 452 16
Sonar 60 208 2
Ionosphere 34 351 2
Waveform 21 5,000 3
4.2. Classifier description
As learning algorithms, we used two popular classifiers, namely logistic regression (LR) and
AdaBoost (AB), to evaluate the proposed method. LR is a method for predicting a dichotomous dependent
variable. This approach finds the best fitting model that describes the association between the attributes of the
dependent variable and a set of independent variables [51]. AdaBoost, an abbreviation for adaptive boosting,
is a meta-algorithm for machine learning proposed by Freund and Schapire [52]. An AdaBoost classifier is a meta-estimator that first fits a base classifier on the dataset and then fits additional copies of it on the same dataset. At each step, the weights of the misclassified samples are increased so that subsequent classifiers focus more on the difficult cases.
4.3. The evaluated methods
In the experiments, we utilized SI-based feature selection approaches, specifically FPA, JA, SSA,
DE, COA, FA, and PSO defined in section 2. Additionally, we incorporated two more selection methods,
SelectFromModel and recursive feature elimination (RFE), which will be described subsequently. This
diverse set of feature selection methods was chosen to comprehensively explore their effectiveness within the
experimental context, enabling a thorough examination of their impact on the overall results. The inclusion of
these various methods provides a robust foundation for assessing the role of feature selection in the study's
outcomes.
SelectFromModel is a feature selection technique for extracting essential and relevant features. It removes features whose importance values fall below a given threshold. This method works with any estimator that exposes feature importances or coefficients [53].
RFE is a feature selection technique compatible with various learning algorithms such as SVMs and Lasso. Its primary function is to iteratively reduce the number of features by recursively eliminating those with low weights or importance scores. RFE is particularly useful for optimizing model performance by focusing on the most relevant features in a dataset [54].
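For reference, both selectors are available in scikit-learn; below is a minimal sketch on the breast cancer Wisconsin data. The default importance threshold and the number of features to keep are illustrative choices, not the paper's settings:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)   # 30 features

estimator = LogisticRegression(max_iter=5000)

# SelectFromModel: drop features whose |coefficient| falls below the
# (default, mean-based) threshold.
sfm = SelectFromModel(estimator).fit(X, y)
X_sfm = sfm.transform(X)

# RFE: recursively refit and discard the weakest features until 10 remain.
rfe = RFE(estimator, n_features_to_select=10).fit(X, y)
X_rfe = rfe.transform(X)

print(X.shape[1], X_sfm.shape[1], X_rfe.shape[1])
```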
Table 2 displays the average classification accuracy (in %), over ten trials, of SelectFromModel and RFE using the logistic regression and AdaBoost classifiers. The results indicate that on the colon cancer, breast cancer, and waveform datasets, SelectFromModel slightly improves classification performance compared to RFE, while the latter achieves higher classification accuracy on the remaining datasets.
Table 2. A comparison between SelectFromModel and RFE in terms of classification accuracy using the
logistic regression and AdaBoost classifiers
Dataset Classifier Feature selection methods
SelectFromModel RFE
Colon cancer logistic regression 85.78 85.78
AdaBoost 86.84 85.78
Breast cancer logistic regression 96.30 95.90
AdaBoost 95.70 95.36
Heart logistic regression 83.95 86.41
AdaBoost 66.66 74.07
Arrhythmia logistic regression 65.44 66.91
AdaBoost 63.97 63.97
Sonar logistic regression 87.30 84.12
AdaBoost 87.30 88.88
Ionosphere logistic regression 84.90 93.39
AdaBoost 93.39 89.62
Waveform logistic regression 85.79 85.06
AdaBoost 83.26 82.26
4.4. Proposed method GASI parameters
As the parameters significantly impact the efficiency of the genetic algorithms, they should be
chosen carefully to obtain the highest performance. Table 3 presents the parameters employed in GASI
evaluation. The mentioned values were determined empirically through several experiments of the proposed
approach.
Table 3. Common parameters for the proposed method
Parameter Value
Population size 30
Number of generations 100
Pc (crossover probability) 0.8
Pm (mutation probability) 0.01
Weight of the classification error Ξ± 0.99
Weight of the number of selected predictors Ξ² 0.01
Elitist strategy The best individual goes to the next iteration
Ns_max (maximum number of runs) 10
4.4.1. Evaluation measures
The effectiveness of the proposed strategy GASI was evaluated in terms of the average classification accuracy, the minimum number and rate of remaining features as in (12)-(14), and the average, best, and worst fitness values as in (9)-(11). The proposed strategy is then compared with other metaheuristic algorithms using these measures. The mathematical formulas of the evaluation measures are given in (9) to (14).
Average fitness value:

Avgf = (1/Ns_max) Σ_{i=1}^{Ns_max} g_i* (9)
Optimal (best) fitness value:

Bestf = min{ g_i* : i = 1, 2, ..., Ns_max } (10)
Worst fitness value:

Maxf = max{ g_i* : i = 1, 2, ..., Ns_max } (11)
Average accuracy:

AvgAcc = (1/Ns_max) Σ_{i=1}^{Ns_max} Acc_i* (12)
Average number of selected features:
AvgNSF = (1/Ns_max) Σ_{i=1}^{Ns_max} length(x)_i* (13)
Rate of the remaining features:

RemainFR = (1/Ns_max) Σ_{i=1}^{Ns_max} ( length(x)_i* / TN ) (14)
where Ns_max is the maximum number of runs and g_i* represents the best fitness score attained at the i-th run. Acc_i* is the best classifier accuracy achieved at the i-th run, length(x)_i* indicates the number of features that have been selected, and TN denotes the total number of features in the given dataset.
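The run-level aggregation of (9)-(14) can be sketched as follows; the function name is illustrative, the dictionary keys follow the tables' column labels, and the numbers are made up:

```python
import numpy as np

def summarize_runs(best_fitness, best_accuracy, n_selected, n_total):
    """Aggregate the per-run bests of Ns_max independent runs into
    the evaluation measures of Eqs. (9)-(14)."""
    best_fitness = np.asarray(best_fitness, dtype=float)
    n_selected = np.asarray(n_selected, dtype=float)
    return {
        "Avgf": best_fitness.mean(),                 # Eq. (9)
        "Bestf": best_fitness.min(),                 # Eq. (10)
        "Maxf": best_fitness.max(),                  # Eq. (11)
        "AvgAcc": float(np.mean(best_accuracy)),     # Eq. (12)
        "AvgNSF": n_selected.mean(),                 # Eq. (13)
        "RemainFR": (n_selected / n_total).mean(),   # Eq. (14)
    }

stats = summarize_runs(
    best_fitness=[0.05, 0.03, 0.04],
    best_accuracy=[0.95, 0.97, 0.96],
    n_selected=[12, 10, 14],
    n_total=60,
)
print(round(stats["Avgf"], 6), round(stats["RemainFR"], 6))   # 0.04 0.2
```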
4.5. Experimental evaluation
In this subsection, the performance of the suggested method is evaluated and compared to other
powerful competitors. The results are presented in terms of fitness values, remaining feature rates, and
classification accuracy. Each feature selection approach is executed ten times in each experiment, and the
average of these different runs is utilized for comparing the different approaches. In addition, each dataset is
normalized and randomly divided into a training set (70% of the dataset) and a testing set (30%). All these
approaches are executed using Python on an Intel Core i7 CPU with 16 GB of RAM.
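The evaluation protocol above can be sketched as follows; since the paper does not specify the normalization scheme, min-max scaling is assumed here, and the data are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))       # placeholder data
y = rng.integers(0, 2, size=100)

# 70/30 random split, as in the experimental protocol.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Normalize using statistics from the training set only,
# to avoid leaking information from the test set.
scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
print(X_train.shape, X_test.shape)   # (70, 10) (30, 10)
```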
Table 4 displays the average classification accuracy and the rate of remaining features (in %) over ten runs of the suggested approach GASI and the different SI-based feature selection approaches (i.e., FPA, JA, SSA, DE, COA, FA, and PSO) using the LR and AB classifiers; the best results are indicated in bold. The results show that the suggested approach consistently outperforms the other SI-based feature selection techniques, selecting fewer features in most datasets while increasing the classification performance. For instance, on the colon cancer dataset with the LR classifier, the GASI method achieved a classification accuracy of 98.94%, whereas the FPA, JA, SSA, DE, COA, FA, and PSO approaches reached 94.37, 96.31, 97.89, 94.73, 95.25, 94.73, and 95.25, respectively. In addition, the AdaBoost classifier raised the classification accuracy of the suggested approach GASI to 100%, while the accuracy of the FPA, JA, SSA, DE, COA, FA, and PSO methods was 92.62, 90.52, 95.25, 93.15, 99.47, 95.25, and 95.26, respectively. Table 4 also reports the number of selected features. The results show that all methods significantly reduce the dimensionality by selecting only a small part of the original features. For instance, with the LR classifier, the GASI approach performs better than the other SI-based methods on the colon cancer and sonar datasets, retaining feature rates of only 0.2746 and 0.2694, respectively. Furthermore, the PSO method achieved average feature rates of 0.2343 and 0.3788 on the breast cancer and arrhythmia datasets, respectively, while the FPA method achieved an average rate of 0.3615 on the heart dataset, compared to the JA method's average rate of 0.2411 on the ionosphere dataset and the DE method's average rate of 0.7190 on the waveform dataset.
Table 5 presents the evaluated results in terms of the average, the best (minimum), and the worst
(maximum) fitness values. The results reveal that the proposed method, GASI, performed better in all
datasets than other SI-based feature selection algorithms and delivered the smallest average fitness function
value. For example, on the colon cancer dataset with the LR classifier, the GASI approach achieved an average fitness value of 0.0131, whereas for the FPA, JA, SSA, DE, COA, FA, and PSO methods these values were 0.0569, 0.0249, 0.0255, 0.0560, 0.0512, 0.0568, and 0.0507, respectively.
Using the AdaBoost classifier, the mean fitness value is 0.0037 for the GASI method, whereas the values for the other methods (FPA, JA, SSA, DE, COA, FA, and PSO) were 0.0777, 0.0253, 0.0464, 0.0724, 0.0099, 0.0516, and 0.0044, respectively.
Figures 3 and 4 illustrate the mean classification accuracy over all datasets for the LR and AdaBoost classifiers, respectively. From these results, we can observe that, with both classifiers, the proposed method
GASI obtained the highest average classification accuracy. The results in Figure 3 show that the GASI
method achieved an average classification accuracy of 92.52% which ranked first with a margin of 2.89%
compared to the JA approach, which achieved the second-best average classification accuracy. The FA
method scored third with a margin of 3.54% compared to the best method. Furthermore, according to the
results in Figure 4, on the AB classifier, the suggested approach GASI obtained the first place with an
average classification accuracy of 91.53% with a margin of 1.81% compared to the COA method, which
achieved the second-best average classification accuracy. In contrast, the DE approach secured the third
position with an average classification accuracy of 89.12%.
Figure 5 provides an in-depth analysis of average fitness values across various datasets, with
Figures 5(a) to 5(g) offering specific comparisons of mean fitness values for datasets such as colon cancer,
breast cancer, heart, sonar, ionosphere, waveform, and arrhythmia. The results consistently affirm the
superior performance of our proposed GASI approach when compared to other SI-based feature selection
methods. This superiority is consistently demonstrated by GASI achieving the smallest average fitness value
during evaluations conducted using both logistic regression and AdaBoost classifiers. These findings
accentuate GASIβs effectiveness in enhancing feature selection and its potential for wide-ranging applications
across diverse datasets and machine learning algorithms.
Table 4. Average classification accuracy and remaining feature rates of the different feature selection
approaches with logistic regression and AdaBoost classifier. The best results are marked in bold
Dataset N_features Method Classifier
Logistic regression AdaBoost
AvgAcc (%) AvgNSF RemainFR AvgAcc (%) AvgNSF RemainFR
Colon cancer 2,000 GASI 98.94 549.2 0.2746 100 750.8 0.3754
FPA 94.37 961.1 0.4805 92.62 966.6 0.4833
JA 96.31 815.3 0.4076 90.52 911.5 0.4557
SSA 97.89 950.9 0.4754 95.25 958 0.479
DE 94.73 779.6 0.3898 93.15 938.7 0.4693
COA 95.25 864.3 0.4321 99.47 944.8 0.4724
FA 94.73 941.3 0.4706 95.25 955.8 0.4779
PSO 95.25 765 0.3825 95.26 887.2 0.4436
Breast cancer 32 GASI 99.79 11.6 0.3625 100 8.9 0.2781
FPA 96.23 12.1 0.3781 98.98 14.7 0.4593
JA 97.17 7.8 0.2437 97.24 11.9 0.3718
SSA 95.90 13.6 0.4468 97.37 14 0.4375
DE 97.24 11.7 0.3656 99.45 11.9 0.3718
COA 96.70 10.5 0.3281 98.18 11 0.3437
FA 98.11 11.5 0.3593 99.32 11.6 0.3625
PSO 96.50 7.5 0.2343 97.24 12 0.375
Heart 13 GASI 91.35 7 0.5384 91.35 10 0.7692
FPA 83.95 4.7 0.3615 89.13 8.1 0.6230
JA 90.37 9.5 0.7307 89.87 5.7 0.4384
SSA 89.01 5.6 0.4307 87.77 6.3 0.4846
DE 89.13 8.7 0.6692 89.25 5.6 0.4307
COA 90.12 5.3 0.4076 89.13 5.4 0.4153
FA 91.11 5.2 0.4 87.65 3 0.2307
PSO 85.18 6.4 0.4923 89.38 7.3 0.5615
Arrhythmia 279 GASI 78.67 130 0.4666 69.11 105.3 0.3774
FPA 68.67 130.4 0.4673 66.02 129.5 0.4641
JA 69.63 110.3 0.3953 64.63 72.7 0.2605
SSA 68.60 134.8 0.4831 63.89 121.5 0.4354
DE 70.00 109.4 0.3921 65.44 76.7 0.2749
COA 69.41 123.3 0.4419 66.98 101.5 0.3637
FA 69.11 136.1 0.4878 67.86 129.9 0.4655
PSO 69.77 105.7 0.3788 66.17 79.9 0.2863
Sonar 60 GASI 93.96 15.9 0.2694 98.41 28 0.4745
FPA 90.63 28.3 0.4796 93.17 29.5 0.5
JA 92.85 19.3 0.3271 93.65 26.3 0.4457
SSA 92.69 25.9 0.4389 91.58 28 0.4745
DE 89.52 19.7 0.3338 96.66 26.8 0.4542
COA 88.25 25.2 0.4271 95.39 28.1 0.4762
FA 89.52 26.2 0.4440 94.70 28.88 0.4406
PSO 92.38 22.7 0.3847 94.44 25.1 0.4254
Ionosphere 34 GASI 96.22 11 0.3235 96.22 13.5 0.3979
FPA 93.11 15.9 0.4676 93.01 14.2 0.4176
JA 93.58 8.2 0.2411 94.81 11.8 0.3470
SSA 90.37 15.9 0.4676 95.37 14.6 0.4294
DE 91.50 11.5 0.3382 94.90 12.1 0.3558
COA 93.67 13.9 0.4088 94.15 13.7 0.4029
FA 93.77 12.6 0.3705 93.86 13.1 0.3852
PSO 93.30 13 0.3823 94.71 12.4 0.3647
Waveform 21 GASI 88.73 17 0.8095 85.64 16.5 0.7857
FPA 86.07 15.6 0.7428 84.39 15.5 0.7380
JA 87.53 17 0.8095 85.20 15.3 0.7285
SSA 86.22 15.8 0.7523 84.39 14.9 0.7095
DE 87.05 15.1 0.7190 84.96 14.8 0.7047
COA 87.27 16.6 0.7904 84.72 14.9 0.7095
FA 86.51 17.3 0.8238 85.11 15.6 0.7428
PSO 87.03 16.2 0.7714 85.46 15.2 0.7238
Table 5. Average, best, and worst fitness values of the different feature selection methods using logistic
regression and AdaBoost classifier. The best results of fitness values are indicated in bold
Dataset Method Classifier
Logistic regression AdaBoost
Avgf Bestf Maxf Avgf Bestf Maxf
Colon cancer GASI 0.0131 0.0009 0.0529 0.0037 0.0032 0.0040
FPA 0.0569 0.0566 0.0570 0.0777 0.0567 0.1089
JA 0.0249 0.0039 0.0562 0.0253 0.0044 0.0566
SSA 0.0255 0.0047 0.0568 0.0464 0.0048 0.0569
DE 0.0560 0.0558 0.0564 0.0724 0.0047 0.1610
COA 0.0512 0.0046 0.0564 0.0099 0.0046 0.0567
FA 0.0568 0.0567 0.0569 0.0516 0.0047 0.1088
PSO 0.0507 0.0037 0.0560 0.0044 0.0034 0.0047
Breast cancer GASI 0.0058 0.0033 0.0099 0.0029 0.0026 0.0030
FPA 0.0412 0.0312 0.0441 0.0148 0.0106 0.0235
JA 0.0119 0.0089 0.0156 0.0086 0.0033 0.0166
SSA 0.0452 0.0322 0.0508 0.0305 0.0239 0.0382
DE 0.0311 0.0295 0.0352 0.0092 0.0050 0.0109
COA 0.0360 0.0319 0.0372 0.0216 0.0172 0.0235
FA 0.0224 0.0182 0.0239 0.0105 0.0093 0.0113
PSO 0.0290 0.0282 0.0299 0.0073 0.0033 0.0229
Heart GASI 0.0909 0.0909 0.0909 0.0932 0.0932 0.0932
FPA 0.1625 0.1611 0.1635 0.1137 0.1031 0.1268
JA 0.1026 0.0940 0.1252 0.1046 0.1023 0.1245
SSA 0.1130 0.1008 0.1268 0.1258 0.1023 0.1512
DE 0.1142 0.1054 0.1283 0.1106 0.1008 0.1268
COA 0.1018 0.1016 0.1023 0.1117 0.1039 0.1268
FA 0.0920 0.0894 0.1023 0.1245 0.1245 0.1245
PSO 0.1515 0.1512 0.1528 0.1107 0.1031 0.1268
Arrhythmia GASI 0.2157 0.2157 0.2157 0.3095 0.3079 0.3102
FPA 0.3147 0.3030 0.3179 0.3409 0.2960 0.3684
JA 0.3045 0.3019 0.3100 0.3527 0.3514 0.3589
SSA 0.3156 0.3034 0.3183 0.3617 0.3608 0.3680
DE 0.3009 0.2950 0.3036 0.3448 0.3020 0.3588
COA 0.3072 0.3027 0.3170 0.3304 0.3170 0.3386
FA 0.3106 0.3104 0.3109 0.3227 0.3106 0.3252
PSO 0.3029 0.2946 0.3167 0.3377 0.3154 0.3597
Sonar GASI 0.0624 0.0498 0.0655 0.0204 0.0204 0.0204
FPA 0.0975 0.0828 0.1145 0.0725 0.0515 0.0845
JA 0.0739 0.0508 0.0821 0.0673 0.0520 0.0824
SSA 0.0766 0.0667 0.0986 0.0880 0.0674 0.1149
DE 0.1070 0.0978 0.1133 0.0375 0.0201 0.0523
COA 0.1205 0.1130 0.1301 0.0505 0.0356 0.0677
FA 0.1081 0.0988 0.1142 0.0572 0.0371 0.0829
PSO 0.0792 0.0501 0.0985 0.0592 0.0507 0.0681
Ionosphere GASI 0.0405 0.0405 0.0405 0.0488 0.0423 0.0505
FPA 0.0728 0.0607 0.0800 0.0732 0.0689 0.0800
JA 0.0659 0.0595 0.0767 0.0548 0.0414 0.0598
SSA 0.0999 0.0890 0.1077 0.0500 0.0405 0.0604
DE 0.0874 0.0677 0.1062 0.0539 0.0405 0.0595
COA 0.0666 0.0598 0.0785 0.0619 0.0508 0.0697
FA 0.0653 0.0589 0.0700 0.0645 0.0589 0.0703
PSO 0.0701 0.0683 0.0773 0.0559 0.0408 0.0607
Waveform GASI 0.1196 0.1196 0.1196 0.1500 0.1459 0.1506
FPA 0.1452 0.1405 0.1416 0.1618 0.1559 0.1688
JA 0.1315 0.1308 0.1330 0.1537 0.1510 0.1579
SSA 0.1438 0.1363 0.1512 0.1616 0.1511 0.1751
DE 0.1352 0.1351 0.1363 0.1559 0.1543 0.1605
COA 0.1338 0.1323 0.1361 0.1583 0.1554 0.1615
FA 0.1416 0.1405 0.1462 0.1548 0.1528 0.1570
PSO 0.1360 0.1333 0.1382 0.1511 0.1445 0.1597
4.6. Discussion
In this section, we delve into the main arguments that robustly demonstrate the performance of the
suggested approach. These critical points not only provide a comprehensive understanding of the approach's
effectiveness but also underscore its potential benefits. Through a detailed exploration of these arguments, we
aim to establish the approach as a compelling and viable solution within its intended domain, offering
valuable insights and outcomes for further consideration.
- A machine learning task requires an efficient feature selection approach that can choose the optimal number of features and obtain better performance. Using a wide range of features increases the
probability of selecting irrelevant and redundant attributes, which negatively affects the model's performance, while an overly aggressive reduction of the number of features risks losing the original information in the dataset. In this paper, the proposed multi-objective fitness function allows both reducing the number of features and minimizing the classification error. Consequently, the features selected from the cancer datasets carry the maximum information for diagnostic or predictive tasks.
- The main goal of the suggested method is to take advantage of the best solutions obtained by many
different SI-based feature selection approaches. This approach uses a genetic algorithm with a different
strategy to develop a powerful feature selection technique that finds the best subset of features in many
data sets from different fields. This strategy is based on an initial intelligent population composed of the
best solutions obtained by the different SI-based feature selection approaches. In addition, the genetic operators (selection, crossover, and mutation) preserve the diversity of each generation to enhance the quality of the search space exploration and avoid the local optimum problem.
- The temporal complexity is not a real obstacle because feature selection is performed before the model is deployed. This preliminary stage is not repeated with each use of the machine learning model.
Figure 3. Average classification accuracy over all datasets on the Logistic regression classifier
Figure 4. Average classification accuracy over all datasets on the AdaBoost classifier
(a) (b)
(c) (d)
(e) (f)
(g)
Figure 5. Average fitness values of feature selection methods across diverse datasets (a) colon cancer,
(b) breast cancer, (c) heart, (d) sonar, (e) ionosphere, (f) waveform, and (g) arrhythmia
5. CONCLUSION
With the massive amounts of digital data of various types and the exponential growth of artificial
intelligence-based applications, the size of the data is increasing, leading to massive databases with a large
number of features, especially in the medical field. At the same time, data mining and machine learning tasks
require fast speed and greater accuracy. Over the past few years, numerous meta-heuristic methods have been
developed to reduce the size of the dataset by eliminating redundant and irrelevant features that represent
noise for the model. This paper suggests a novel powerful feature selection method, which uses a strategy
that combines many SI-based (i.e., FPA, JA, SSA, DE, COA, FA, and PSO) feature selection approaches
and employs a genetic algorithm that uses a multi-objective fitness function to discover the optimal subset of
features in many data sets from different areas. This approach is applied to seven well-known datasets from
the UCI repository for feature selection. The results obtained were compared with many powerful different
SI-based feature selection approaches, and the experiments show that our method obtained better solutions in
terms of fitness value and classification accuracy. Day by day, world health is affected by numerous invasive
pathologies, especially heart disorders and cancer. This study shows the necessity of raising healthcare
professionalsβ awareness about the efficient use of powerful feature selection techniques that may be
successfully applied to medical databases for detecting, classifying, and predicting diseases. For future work,
the suggested technique can be employed in high-dimensional datasets, and it can be combined with other
metaheuristic techniques to more effectively improve the exploration of the searching space and accelerate
convergence. Moreover, the suggested approach can also be used to solve various real-world problems.
REFERENCES
[1] J. Cai, J. Luo, S. Wang, and S. Yang, "Feature selection in machine learning: a new perspective," Neurocomputing, vol. 300, pp. 70–79, Jul. 2018, doi: 10.1016/j.neucom.2017.11.077.
[2] B. Remeseiro and V. Bolon-Canedo, "A review of feature selection methods in medical applications," Computers in Biology and Medicine, vol. 112, Sep. 2019, doi: 10.1016/j.compbiomed.2019.103375.
[3] Z. Hu, Y. Bao, T. Xiong, and R. Chiong, "Hybrid filter-wrapper feature selection for short-term load forecasting," Engineering Applications of Artificial Intelligence, vol. 40, pp. 17–27, Apr. 2015, doi: 10.1016/j.engappai.2014.12.014.
[4] R. Guha et al., "Deluge based genetic algorithm for feature selection," Evolutionary Intelligence, vol. 14, no. 2, pp. 357–367, Jun. 2021, doi: 10.1007/s12065-019-00218-5.
[5] M. Labani, P. Moradi, F. Ahmadizar, and M. Jalili, "A novel multivariate filter method for feature selection in text classification problems," Engineering Applications of Artificial Intelligence, vol. 70, pp. 25–37, Apr. 2018, doi: 10.1016/j.engappai.2017.12.014.
[6] J. Zhang, Y. Xiong, and S. Min, "A new hybrid filter/wrapper algorithm for feature selection in classification," Analytica Chimica Acta, vol. 1080, pp. 43–54, Nov. 2019, doi: 10.1016/j.aca.2019.06.054.
[7] N. El Aboudi and L. Benhlima, "Review on wrapper feature selection approaches," in 2016 International Conference on Engineering and MIS (ICEMIS), Sep. 2016, pp. 1–5, doi: 10.1109/ICEMIS.2016.7745366.
[8] Y. Fu, X. Liu, S. Sarkar, and T. Wu, "Gaussian mixture model with feature selection: An embedded approach," Computers & Industrial Engineering, vol. 152, p. 107000, Feb. 2021, doi: 10.1016/j.cie.2020.107000.
[9] H. Faris et al., "Time-varying hierarchical chains of salps with random weight networks for feature selection," Expert Systems with Applications, vol. 140, Feb. 2020, doi: 10.1016/j.eswa.2019.112898.
[10] M. Rostami, K. Berahmand, E. Nasiri, and S. Forouzandeh, "Review of swarm intelligence-based feature selection methods," Engineering Applications of Artificial Intelligence, vol. 100, Apr. 2021, doi: 10.1016/j.engappai.2021.104210.
[11] A. E. Hegazy, M. A. Makhlouf, and G. S. El-Tawel, "Improved salp swarm algorithm for feature selection," Journal of King Saud University-Computer and Information Sciences, vol. 32, no. 3, pp. 335–344, Mar. 2020, doi: 10.1016/j.jksuci.2018.06.003.
[12] S. Forouzandeh, K. Berahmand, and M. Rostami, "Presentation of a recommender system with ensemble learning and graph embedding: a case on MovieLens," Multimedia Tools and Applications, vol. 80, no. 5, pp. 7805–7832, Feb. 2021, doi: 10.1007/s11042-020-09949-5.
[13] L. E. Raileanu and K. Stoffel, "Theoretical comparison between the Gini index and information gain criteria," Annals of Mathematics and Artificial Intelligence, vol. 41, no. 1, pp. 77–93, May 2004, doi: 10.1023/B:AMAI.0000018580.96245.c6.
[14] J.-H. Cheng, D.-W. Sun, and H. Pu, "Combining the genetic algorithm and successive projection algorithm for the selection of feature wavelengths to evaluate exudative characteristics in frozen-thawed fish muscle," Food Chemistry, vol. 197, pp. 855–863, Apr. 2016, doi: 10.1016/j.foodchem.2015.11.019.
[15] P. Hanchuan, L. Fuhui, and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, Aug. 2005, doi: 10.1109/TPAMI.2005.159.
[16] A. J. Ferreira and M. A. T. Figueiredo, "An unsupervised approach to feature discretization and selection," Pattern Recognition, vol. 45, no. 9, pp. 3048–3060, Sep. 2012, doi: 10.1016/j.patcog.2011.12.008.
[17] N. D. Cilia, C. De Stefano, F. Fontanella, and A. Scotto di Freca, "Variable-length representation for EC-based feature selection in high-dimensional data," in Applications of Evolutionary Computation, Springer International Publishing, 2019, pp. 325–340.
[18] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artificial Intelligence, vol. 97, no. 1–2, pp. 273–324, Dec. 1997, doi: 10.1016/S0004-3702(97)00043-X.
[19] T. N. Lal, O. Chapelle, J. Weston, and A. Elisseeff, "Embedded methods," in Feature Extraction, Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 137–165.
[20] S. Tabakhi, P. Moradi, and F. Akhlaghian, "An unsupervised feature selection algorithm based on ant colony optimization," Engineering Applications of Artificial Intelligence, vol. 32, pp. 112–123, Jun. 2014, doi: 10.1016/j.engappai.2014.03.007.
[21] S. Barak, J. H. Dahooie, and T. Tichý, "Wrapper ANFIS-ICA method to do stock market timing and feature selection on the basis of Japanese Candlestick," Expert Systems with Applications, vol. 42, no. 23, pp. 9221–9235, Dec. 2015, doi: 10.1016/j.eswa.2015.08.010.
[22] C. Wang, H. Pan, and Y. Su, "A many-objective evolutionary algorithm with diversity-first based environmental selection,"
[53] Y. Liang, S. Zhang, H. Qiao, and Y. Yao, “iPromoter-ET: Identifying promoters and their strength by extremely randomized
trees-based feature selection,” Analytical Biochemistry, vol. 630, Oct. 2021, doi: 10.1016/j.ab.2021.114335.
[54] C. Peng, X. Wu, W. Yuan, X. Zhang, Y. Zhang, and Y. Li, “MGRFE: multilayer recursive feature elimination based on an
embedded genetic algorithm for cancer classification,” IEEE/ACM Transactions on Computational Biology and Bioinformatics,
vol. 18, no. 2, pp. 621–632, Mar. 2021, doi: 10.1109/TCBB.2019.2921961.
BIOGRAPHIES OF AUTHORS
Salsabila Benghazouani obtained a master’s degree in computer engineering
from the Faculty of Science and Technology, Mohammedia, Morocco, in 2006, and a master’s
degree in data science from the Faculty of Sciences Ben M’Sick, Hassan II University of
Casablanca, Morocco, in 2021. She is currently preparing her Ph.D. in computer science at the
TIM Lab, Faculty of Sciences Ben M’Sick, Hassan II University, Casablanca, Morocco. Her
research focuses on artificial intelligence, machine learning, and deep learning. She can be
contacted at email: benghazouani.salsabila239@gmail.com.
Said Nouh obtained his Ph.D. in computer science from the National School of
Computer Science and Systems Analysis (ENSIAS), Rabat, Morocco, in 2014. He is currently
a professor (Higher Degree Research, HDR) at the Faculty of Sciences Ben M’Sick, Hassan II
University, Casablanca, Morocco. His current research interests are artificial intelligence,
machine learning, deep learning, telecommunications, and information and coding theory. He
can be contacted at email: said.nouh@univh2m.ma.
Abdelali Zakrani obtained his B.Sc. and DESA (M.Sc.) degrees in computer
science from Hassan II University, Casablanca, Morocco, in 2003 and 2005, respectively, and
the Ph.D. degree in the same field from Mohammed V University, Rabat, Morocco, in 2012.
His current research interests are artificial neural networks, data mining, and software
engineering. He can be contacted at email: abdelali.zakrani@univh2c.ma.
Ihsane Haloum is currently a Ph.D. student, affiliated with the Laboratory of
Immuno-genetics and Human Pathologies, at the Faculty of Medicine and Pharmacy of
Casablanca, Hassan II University of Casablanca, Morocco. She received her M.Sc. degree in
medical biotechnologies, in 2021, from the Faculty of Medicine and Pharmacy of Rabat,
Morocco. Her research topics are related to immunogenetics, oncology, and artificial
intelligence. She can be contacted at email: ihsane.haloum@gmail.com.
Mostafa Jebbar holds a Ph.D. in computer science. He is currently an associate
professor at the Superior School of Technology, Hassan II University, Casablanca, Morocco.
His current research interests are software architecture, RM-ODP, SOA, and cloud computing.
He can be contacted at email: mostafajebbar@gmail.com.