New Feature Selection Model Based Ensemble Rule Classifiers Method for Datase...ijaia
Feature selection and classification are essential processes when dealing with large data sets that
comprise a large number of input attributes. Many search methods and classifiers have been used to find
the optimal number of attributes. The aim of this paper is to find the optimal set of attributes and
improve classification accuracy by adopting an ensemble rule classifiers method. The research process
involves two phases: finding the optimal set of attributes, and applying the ensemble classifiers method
to the classification task. Results are reported in terms of accuracy, the number of selected attributes,
and the number of rules generated. Six datasets were used for the experiment. The final output is an
optimal set of attributes combined with the ensemble rule classifiers method. Experimental results on
public real-world datasets demonstrate that the ensemble rule classifiers method consistently improves
classification accuracy on the selected datasets. Significant improvement in accuracy and an optimal set
of selected attributes are achieved by adopting the ensemble rule classifiers method.
An experimental study on hypothyroid using rotation forestIJDKP
This paper focuses on hypothyroidism, a medical condition caused by an underactive thyroid gland. The
dataset used for the study is taken from the UCI repository. Classification of this thyroid disease is a
considerable task. An experimental study is carried out using rotation forest together with feature
selection methods to achieve better accuracy. An important step toward good accuracy is pre-processing,
so two feature selection techniques are used here: a filter method, correlation-based feature subset
selection, and a wrapper method, which helped remove irrelevant and useless features from the data set.
Fourteen different machine learning algorithms were tested on the hypothyroid data set using rotation
forest, which successfully yielded improved results.
ATTRIBUTE REDUCTION-BASED ENSEMBLE RULE CLASSIFIERS METHOD FOR DATASET CLASSI...csandit
Attribute reduction and classification are essential processes when dealing with large data sets
that comprise a large number of input attributes. Many search methods and classifiers have been
used to find the optimal number of attributes. The aim of this paper is to find the optimal set of
attributes and improve classification accuracy by adopting an ensemble rule classifiers method.
The research process involves two phases: finding the optimal set of attributes, and applying the
ensemble classifiers method to the classification task. Results are reported in terms of accuracy,
the number of selected attributes, and the number of rules generated. Six datasets were used for
the experiment. The final output is an optimal set of attributes combined with the ensemble rule
classifiers method. Experimental results on public real-world datasets demonstrate that the
ensemble rule classifiers method consistently improves classification accuracy on the selected
datasets. Significant improvement in accuracy and an optimal set of selected attributes are
achieved by adopting the ensemble rule classifiers method.
This document presents a method for selecting optimal views for materialization in a data warehouse using a genetic algorithm. It discusses how genetic algorithms work by mimicking natural selection. The method represents views in a multidimensional lattice and uses chromosomes to encode potential view materialization subsets. It calculates a fitness score for each chromosome based on the total attribute frequency of selected views, with higher scores indicating better solutions. The genetic algorithm is run over generations to iteratively improve the selected view materialization subset based on this fitness function and optimize for reducing query response times within space constraints.
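The chromosome encoding and fitness scheme described above can be sketched as follows. The view frequencies, storage costs, and space limit below are invented for illustration and are not the paper's actual lattice or data; the fitness rule (total attribute frequency of selected views, subject to a space constraint) follows the summary's description.

```python
import random

# Hypothetical example: 6 candidate views, each with a total attribute
# frequency (benefit) and a storage cost; at most SPACE units may be used.
FREQ = [40, 25, 60, 15, 30, 50]
COST = [3, 2, 5, 1, 2, 4]
SPACE = 8

def fitness(chrom):
    """Total attribute frequency of the selected views; 0 if over the space limit."""
    cost = sum(c for bit, c in zip(chrom, COST) if bit)
    if cost > SPACE:
        return 0
    return sum(f for bit, f in zip(chrom, FREQ) if bit)

def evolve(pop_size=30, generations=60, seed=1):
    """Evolve bit-vector chromosomes; each bit marks a view as materialized."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in FREQ] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(FREQ))       # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:                  # mutation: flip one random bit
                i = rng.randrange(len(FREQ))
                child[i] ^= 1
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return best, fitness(best)

best, score = evolve()
```

The key design point is that the space constraint is enforced through the fitness function (infeasible chromosomes score zero), so selection pressure alone steers the population toward feasible materialization subsets.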
Volume 14 issue 03 march 2014_ijcsms_march14_10_14_rahulDeepak Agarwal
1) The document presents a hybrid approach for feature subset selection that combines artificial bee colony and particle swarm optimization algorithms.
2) It applies this approach to three datasets from a public repository to select optimal feature subsets and compares the classification accuracy to other algorithms.
3) The results show the proposed hybrid approach achieves better classification accuracy on all three datasets compared to using artificial bee colony or random selection alone.
Diagnosis of health conditions is a very challenging task for every human being, because life is directly related to health.
Data-mining-based classification is one of the important applications for classifying data. In this
research work, we have used various classification techniques for classifying thyroid data. CART gives the highest
accuracy, 99.47%, as the best model. Feature selection plays a very important role in making a model computationally
efficient and increasing its performance. This research work focuses on the Info Gain and Gain Ratio feature selection
techniques to remove irrelevant features from the original data set and improve the computational performance of the
model. We applied both feature selection techniques to the best model, i.e. CART. Our proposed CART-Info Gain and
CART-Gain Ratio give 99.47% and 99.20% accuracy with 25 and 3 features, respectively.
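The Info Gain criterion used above can be illustrated with a minimal sketch. The toy feature and label vectors are invented for demonstration and are not the thyroid data; the computation itself is the standard entropy-reduction definition.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    split = {}
    for f, y in zip(feature, labels):
        split.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Toy data: the first feature perfectly predicts the label, the second is noise.
labels     = ["sick", "sick", "well", "well"]
perfect    = ["high", "high", "low", "low"]
irrelevant = ["a", "b", "a", "b"]

info_gain(perfect, labels)     # 1.0 bit: the feature fully determines the label
info_gain(irrelevant, labels)  # 0.0 bits: the feature tells us nothing
```

Gain Ratio differs only in dividing this quantity by the split's intrinsic entropy, which penalizes features with many distinct values.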
Filter Based Approach for Genomic Feature Set Selection (FBA-GFS)IJCSEA Journal
Feature selection is an effective method used in text categorization for sorting a set of documents into a certain number of predefined categories. It is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant terms from the corpus. A genome contains the total amount of genetic information in the chromosomes of an organism, including its genes and DNA sequences. In this paper, a hierarchical clustering technique is used to categorize the features from the genome documents. A framework is proposed for genomic feature set selection. Filter-based feature selection methods, such as chi-square (χ2) statistics and CHIR statistics, are used to select the feature set. The selected feature set is verified using the F-measure and is biologically validated for relevance using the BLAST tool.
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
Large amounts of data have been stored and manipulated using various database
technologies. Processing all the attributes for a particular purpose is a difficult task. To avoid such
difficulties, a feature selection process is applied. In this paper, we collect eight various benchmark
datasets from the UCI repository. Feature selection is carried out using a fuzzy-entropy-based
relevance measure algorithm and follows three selection strategies: the mean selection strategy, the half
selection strategy, and a neural network for the threshold selection strategy. After the features are selected,
they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1, and Ant-miner
classification methodologies. The test results show that the neural network threshold selection
strategy works well in selecting features, and the Ant-miner methodology works best in producing better
accuracy with the selected features than processing the original dataset. The obtained results of this
experiment clearly show that Ant-miner is superior to the other classifiers. Thus, the proposed Ant-miner
algorithm could be a more suitable method for producing good results with fewer features than
the original datasets.
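The mean and half selection strategies described above can be sketched as follows. The relevance scores are invented for illustration, and the fuzzy-entropy relevance measure itself is not reproduced here; only the thresholding step is shown.

```python
def mean_selection(scores):
    """Keep features whose relevance score exceeds the mean score."""
    mean = sum(scores.values()) / len(scores)
    return [f for f, s in scores.items() if s > mean]

def half_selection(scores):
    """Keep the top half of the features, ranked by relevance score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: len(ranked) // 2]

# Hypothetical relevance scores for six features.
scores = {"f1": 0.9, "f2": 0.1, "f3": 0.7, "f4": 0.2, "f5": 0.8, "f6": 0.3}
mean_selection(scores)  # ['f1', 'f3', 'f5']  (mean = 0.5)
half_selection(scores)  # ['f1', 'f5', 'f3']
```

The third strategy in the paper replaces these fixed cut-offs with a neural network that learns the threshold, which is why it can adapt the number of retained features per dataset.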
Neighborhood search methods with moth optimization algorithm as a wrapper met...IJECEIAES
Feature selection methods are used to select a subset of features from data, so that only the useful information is mined from the samples; this improves accuracy and the computational efficiency of the learning model. The Moth-Flame Optimization (MFO) algorithm is a population-based approach that simulates the behavior of real moths in nature. One drawback of the MFO algorithm is that the solutions move toward the best solution, so it can easily become stuck in local optima, as we investigate in this paper. We therefore propose an MFO algorithm combined with a neighborhood search method for feature selection problems, in order to prevent the MFO algorithm from getting trapped in local optima and to help avoid premature convergence. The neighborhood search method is applied after a predefined number of unimproved iterations (the number of tries that fail to improve the current solution). As a result, the proposed algorithm shows good performance when compared with the original MFO algorithm and with state-of-the-art approaches.
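The stagnation trigger described above can be sketched generically. The bit-flip neighborhood, the random perturbation standing in for the MFO move, and the toy objective are all illustrative assumptions, not the paper's actual MFO implementation.

```python
import random

def neighborhood_search(solution, score, rng):
    """Try flipping each bit once; return the first strictly better neighbor."""
    best_score = score(solution)
    for i in rng.sample(range(len(solution)), len(solution)):
        cand = solution.copy()
        cand[i] ^= 1
        if score(cand) > best_score:
            return cand
    return solution

def optimize(score, n_bits=10, iters=200, patience=15, seed=0):
    """Keep a best bit-vector; escape via neighborhood search when stalled."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(n_bits)]
    stall = 0
    for _ in range(iters):
        # Stand-in for the MFO move: random perturbation of the current best.
        cand = [b if rng.random() < 0.7 else rng.randint(0, 1) for b in best]
        if score(cand) > score(best):
            best, stall = cand, 0
        else:
            stall += 1
        if stall >= patience:          # stagnation: apply the neighborhood search
            best = neighborhood_search(best, score, rng)
            stall = 0
    return best

# Toy objective: reward selecting feature 0, penalize every other selected bit.
score = lambda s: 5 * s[0] - sum(s[1:])
best = optimize(score)
```

The essential idea is the `patience` counter: the local search only runs after a fixed number of unimproved iterations, so the extra cost is paid exactly when the population-based step has stopped making progress.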
Applying Genetic Algorithms to Information Retrieval Using Vector Space ModelIJCSEA Journal
Genetic algorithms are usually used in information retrieval (IR) systems to enhance the retrieval process and to increase the efficiency of optimal information retrieval, in order to meet users' needs and help them find exactly what they want among the growing number of available documents. The improvement of adaptive genetic algorithms helps retrieve the information needed by the user accurately, improves retrieval of relevant files, and excludes irrelevant files. In this study, the researcher explored the problems embedded in this process, attempted to find solutions such as the choice of mutation probability and fitness function, and chose the Cranfield English corpus test collection on mathematics. This collection was compiled by Cyril Cleverdon and used at the University of Cranfield in 1960, containing 1400 documents and 225 queries for simulation purposes. The researcher also used cosine similarity and Jaccard's coefficient to compute similarity between the query and documents, and used two proposed adaptive fitness functions, mutation operators, as well as adaptive crossover. The process aimed at evaluating the effectiveness of the results according to the measures of precision and recall. Finally, the study concluded that we might obtain several improvements when using adaptive genetic algorithms.
Performance Evaluation of Different Data Mining Classification Algorithm and ...IOSR Journals
This document evaluates the performance of different data mining classification algorithms and predictive analysis. It applies algorithms like decision trees, naive Bayes, k-nearest neighbor, neural networks, and support vector machines to datasets like Iris, liver disorder, and E. coli. The results show that neural networks and k-nearest neighbor achieved the best performance on these datasets, with accuracy rates up to 97.33% for Iris classification. Feature selection techniques like removing zero-weighted attributes were also found to improve some algorithm performance. Predictive analysis experiments found that neural networks and k-nearest neighbor were most accurate at predicting new class labels.
Applying genetic algorithms to information retrieval using vector space modelIJCSEA Journal
The document describes a study that applied genetic algorithms to information retrieval using the vector space model. The study used an adaptive genetic algorithm approach with two proposed fitness functions (cosine and Jaccard's), adaptive crossover and mutation probabilities. Experimental results on a test corpus showed improvements in precision and recall compared to traditional approaches, with the proposed cosine fitness function performing best. Precision generally decreased as recall increased. The modifications made to the genetic algorithm and fitness functions led to better weighting of query terms and improved results.
IRJET-Clustering Techniques for Mushroom DatasetIRJET Journal
The document evaluates the performance of different clustering algorithms (Expectation Maximization, Farthest First, and K-means) on a mushroom dataset from the UCI machine learning repository. The algorithms are compared based on the number of correctly clustered instances and the time taken to build the model. The mushroom dataset contains 8124 instances with 22 attributes, classified as edible or poisonous mushrooms. The goal is to group similar mushrooms together using the different clustering techniques in the data mining tool WEKA.
ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO...cscpconf
Optimization problems are dominantly solved using computational intelligence. One of
the issues that can be addressed in this context is attribute subset selection
evaluation. This paper presents a computational intelligence technique for solving the
optimization problem using a proposed model called Modified Genetic Search Algorithms
(MGSA), which avoids bad local search spaces using merit and scaled fitness variables, detecting
and deleting bad candidate chromosomes, thereby reducing the number of individual
chromosomes in the search space and the subsequent iterations in later generations. This paper aims
to show that rotation forest ensembles are useful in the feature selection method. The base
classifier is the multinomial logistic regression method, integrated with Haar wavelets as the projection
filter, and the ranks of each feature are reproduced with the 10-fold cross-validation method. It also
discusses the main findings and concludes with promising results for the proposed model. It
explores the combination of MGSA for optimization with naïve Bayes classification. The result
obtained using the proposed MGSA model is validated mathematically using Principal Component
Analysis. The goal is to improve the accuracy and quality of diagnosis of breast cancer disease
with robust machine learning algorithms. Compared to other works in the literature survey,
the experimental results achieved in this paper show better results with statistical inference.
Network Based Intrusion Detection System using Filter Based Feature Selection...IRJET Journal
This document proposes a mutual information-based feature selection algorithm to select optimal features for network intrusion detection classification. The algorithm aims to handle dependent data features better than previous methods. It evaluates the effectiveness of the algorithm on network intrusion detection cases. Most previous methods suffer from low detection rates and high false alarm rates. The proposed approach uses feature selection, filtering, clustering, and clustering ensemble techniques in a hybrid data mining method to achieve high accuracy for intrusion detection systems.
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
This document summarizes a research paper that proposes a new Particle Swarm Optimization (PSO) based K-Prototype clustering algorithm to cluster mixed numeric and categorical data. It begins with background information on clustering algorithms like K-Means, K-Modes, and K-Prototype. It then describes the K-Prototype algorithm, PSO, and discrete binary PSO. Related work integrating PSO with other clustering algorithms is also reviewed. The proposed approach uses binary PSO to select improved initial prototypes for K-Prototype clustering in order to obtain better clustering results than traditional K-Prototype and avoid local optima.
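The discrete binary PSO mentioned above can be sketched as follows. The sigmoid transfer rule is the standard textbook form for binary PSO, and the velocity values below are illustrative; in the paper's approach, the resulting bit vectors mark which data points serve as initial prototypes for K-Prototype.

```python
import math
import random

def sigmoid(v):
    """S-shaped transfer function mapping a velocity to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

def binary_update(velocity, rng):
    """Binary PSO position update: each bit is 1 with probability sigmoid(v)."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]

rng = random.Random(0)
# Strongly positive velocities almost surely yield 1s, strongly negative 0s.
binary_update([10.0, -10.0, 10.0, -10.0], rng)  # [1, 0, 1, 0]
```

Velocities themselves are updated with the usual PSO rule (inertia plus attraction toward personal and global bests); only the position step changes in the binary variant.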
An unsupervised feature selection algorithm with feature ranking for maximizi...Asir Singh
Prediction plays a vital role in decision making. Correct prediction leads to the right decisions, saving lives, energy,
effort, money, and time. Right decisions prevent physical and material losses, and prediction is practiced in all fields, including medicine,
finance, environmental studies, engineering, and emerging technologies. Prediction is carried out by a model called a classifier. The
predictive accuracy of the classifier depends highly on the training datasets used to train it. Irrelevant and
redundant features in the training dataset reduce the accuracy of the classifier; hence, they must be
removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm,
namely unsupervised learning with ranking based feature selection (FSULR). It removes redundant features by clustering and eliminates
irrelevant features by statistical measures to select the most significant features from the training dataset. The performance of the
proposed algorithm is compared with seven other feature selection algorithms using well-known classifiers, namely naive Bayes (NB),
instance-based (IB1), and the tree-based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for
classifiers.
BINARY SINE COSINE ALGORITHMS FOR FEATURE SELECTION FROM MEDICAL DATAacijjournal
A well-constructed classification model depends highly on the input feature subsets from a dataset, which may contain redundant, irrelevant, or noisy features. This challenge can be worse when dealing with medical datasets. The main aim of feature selection as a pre-processing task is to eliminate these features and select the most effective ones. In the literature, metaheuristic algorithms show successful performance in finding optimal feature subsets. In this paper, two binary metaheuristic algorithms, named S-shaped binary Sine Cosine Algorithm (SBSCA) and V-shaped binary Sine Cosine Algorithm (VBSCA), are proposed for feature selection from medical data. In these algorithms, the search space remains continuous, while a binary position vector is generated for each solution by two transfer functions, S-shaped and V-shaped. The proposed algorithms are compared with four recent binary optimization algorithms on five medical datasets from the UCI repository. The experimental results confirm that both binary SCA variants enhance classification accuracy on these medical datasets compared to the four other algorithms.
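The two transfer functions can be sketched as follows. The specific formulas shown (a sigmoid for the S-shape and |tanh| for the V-shape) are common choices in the binarization literature and are assumptions here, not necessarily the exact variants used in the paper; the key contrast is that an S-shaped function sets a bit's value directly, while a V-shaped function decides whether to flip the current bit.

```python
import math
import random

def s_shaped(v):
    """S-shaped transfer: probability that the bit becomes 1."""
    return 1.0 / (1.0 + math.exp(-v))

def v_shaped(v):
    """V-shaped transfer: probability that the current bit is flipped."""
    return abs(math.tanh(v))

def binarize(position, transfer, current_bits, rng):
    """Map a continuous position vector to a binary one via a transfer function."""
    bits = []
    for v, b in zip(position, current_bits):
        p = transfer(v)
        if transfer is s_shaped:
            bits.append(1 if rng.random() < p else 0)      # set bit from probability
        else:
            bits.append(1 - b if rng.random() < p else b)  # flip bit with probability
    return bits

rng = random.Random(0)
binarize([2.0, -2.0, 0.5], s_shaped, [0, 0, 0], rng)  # [1, 0, 1]
```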
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
This paper presents a hybrid data mining approach based on supervised and unsupervised learning to identify the closest data patterns in the database. This technique makes it possible to achieve the maximum accuracy rate with minimal complexity. The proposed algorithm is compared with traditional clustering and classification algorithms and is also implemented with multidimensional datasets. The implementation results show better prediction accuracy and reliability.
Leave one out cross validated Hybrid Model of Genetic Algorithm and Naïve Bay...IJERA Editor
This document presents a new hybrid model for feature selection and classification that combines genetic algorithm and naive Bayes. The proposed method first uses a binary coded genetic algorithm to select a reduced subset of important features from datasets. It then applies a naive Bayes classification method to evaluate the selected features and identify the subset that achieves the highest classification accuracy. The performance of the proposed hybrid model is evaluated on eight datasets and compared to recent publications, finding it achieves satisfactory or higher classification accuracy using fewer features.
Feature selection using modified particle swarm optimisation for face recogni...eSAT Journals
Abstract
One of the major influential factors affecting classification accuracy is the selection of the right features. Not all features play a vital role in classification. Many of the features in the dataset may be redundant and irrelevant, which increases the computational cost and may reduce the classification rate. In this paper, we use DCT (Discrete Cosine Transform) coefficients as features for a face recognition application. The coefficients are optimally selected based on a modified PSO algorithm, in which the choice of coefficients incorporates the average of the mean normalized standard deviations of the various classes and gives more weight to the lower-indexed DCT coefficients. The algorithm is tested on the ORL database. A recognition rate of 97% is obtained. The average number of features selected is about 40 percent for a 10 × 10 input. The modified PSO took about 50 iterations to converge. These performance figures are found to be better than some of the work reported in the literature.
Keywords: Particle swarm optimization, Discrete cosine transform, feature extraction, feature selection, face recognition, classification rate.
An Automatic Clustering Technique for Optimal ClustersIJCSEA Journal
This document presents a new automatic clustering algorithm called Automatic Merging for Optimal Clusters (AMOC). AMOC is a two-phase iterative extension of k-means clustering that aims to automatically determine the optimal number of clusters for a given dataset. In the first phase, AMOC initializes a large number of clusters k using k-means. In the second phase, it iteratively merges the lowest probability cluster with its closest neighbor, recomputing metrics each time to evaluate if the merge improved clustering quality. The algorithm stops merging once no improvements are found. Experimental results on synthetic and real datasets show AMOC finds nearly optimal cluster structures in terms of number, compactness and separation of clusters.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is a widely used technique in the data mining domain, where scalability and efficiency are immediate problems for classification algorithms on large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper, attribute-oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy knowledge and the HeightBalancePriority algorithm for the construction of a decision tree along with multi-level mining. The assignment of priorities to attributes is done by evaluating information entropy at different levels of abstraction for building the decision tree using the HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by the C4.5 classifier for an education dataset, and the results are compared with the proposed approach.
Sca a sine cosine algorithm for solving optimization problemslaxmanLaxman03209
The document proposes a new population-based optimization algorithm called the Sine Cosine Algorithm (SCA) for solving optimization problems. SCA creates multiple random initial solutions and uses sine and cosine functions to fluctuate the solutions outward or toward the best solution, emphasizing exploration and exploitation. The performance of SCA is evaluated on test functions, qualitative metrics, and by optimizing the cross-section of an aircraft wing, showing it can effectively explore, avoid local optima, converge to the global optimum, and solve real problems with constraints.
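The sine/cosine position update at the heart of SCA can be sketched as follows. The parameter schedule (a = 2 with a linearly decreasing r1) follows the standard description of the algorithm, and the sphere objective, bounds, and population size are illustrative assumptions.

```python
import math
import random

def sca_minimize(obj, dim=2, agents=20, iters=200, a=2.0, seed=3):
    """Minimal Sine Cosine Algorithm sketch for continuous minimization."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(agents)]
    best = min(pop, key=obj)[:]
    for t in range(iters):
        r1 = a - t * (a / iters)            # shrinks: exploration -> exploitation
        for x in pop:
            for j in range(dim):
                r2 = rng.uniform(0, 2 * math.pi)
                r3 = rng.uniform(0, 2)
                r4 = rng.random()
                # Fluctuate outward or toward the best-known solution.
                step = r1 * (math.sin(r2) if r4 < 0.5 else math.cos(r2))
                x[j] = x[j] + step * abs(r3 * best[j] - x[j])
        cand = min(pop, key=obj)
        if obj(cand) < obj(best):
            best = cand[:]
    return best

sphere = lambda x: sum(v * v for v in x)   # toy objective, minimum at the origin
best = sca_minimize(sphere)
```

Because r1 decays over iterations, early updates make large oscillations around the best solution (exploration) while late updates shrink toward it (exploitation), which is the balance the abstract describes.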
In this research, a hybrid wrapper model is proposed to identify the best gene subset from gene expression data. To balance exploration and exploitation, a hybrid model combining a popular meta-heuristic, the spider monkey optimizer (SMO), with simulated annealing (SA) is applied. In the proposed model, ReliefF is used as a filter to obtain the relevant gene subset from the dataset by removing noise and outliers before the data is fed to the SMO wrapper. To enhance solution quality, simulated annealing is deployed as a local search with SMO in the second phase, guiding the detection of the optimal feature subset. To evaluate the performance of the proposed model, a support vector machine (SVM) is used as the fitness function to recognize the most informative biomarker genes from cancer datasets along with University of California, Irvine (UCI) datasets. To further evaluate the model, four different classifiers (SVM, naïve Bayes (NB), decision tree (DT), and k-nearest neighbors (KNN)) are used. The experimental results and analysis show that ReliefF-SMO-SA-SVM performs better than its state-of-the-art counterparts. On the cancer datasets, the model achieves accuracy of up to 99.45%.
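The simulated-annealing acceptance rule that lets such a wrapper escape local optima can be sketched generically. This is the standard maximisation form with illustrative names, not the ReliefF-SMO-SA-SVM implementation:

```python
import math
import random

def sa_accept(current_fitness, candidate_fitness, temperature):
    """Simulated-annealing acceptance criterion (maximisation).

    Improvements are always accepted; worse candidates are accepted
    with probability exp(-delta / T), so exploration shrinks as the
    temperature is cooled toward zero.
    """
    if candidate_fitness >= current_fitness:
        return True
    delta = current_fitness - candidate_fitness
    return random.random() < math.exp(-delta / temperature)
```

At high temperature almost any move is accepted; near zero temperature only improvements pass, which is what turns the wrapper's random neighbourhood moves into a converging local search.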
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
The data mining environment produces a large amount of data that needs to be analyzed. Using traditional databases and architectures, it has become difficult to process, manage, and analyze such data. To gain knowledge from Big Data, a proper architecture must be understood. Classification is an important data mining technique with broad applications, used in nearly every field of life to classify items according to their features with respect to a predefined set of classes. This paper sheds light on various classification algorithms, including J48, C4.5, and Naive Bayes, using a large dataset.
AUTOMATED TEST CASE GENERATION AND OPTIMIZATION: A COMPARATIVE REVIEWijcsit
Software testing is a primary phase of software development, carried out by running sequences of test inputs and comparing the results against expected outputs. Population-based evolutionary algorithms are among the most popular techniques in this computational field. The test case generation process identifies test cases within available resources and also identifies critical domain requirements. The Bee Colony Algorithm (BCA), modeled on the population-based, evolutionary behavior of bees, has gained superiority over other algorithms in the field. The Harmony Search (HS) algorithm is based on the refinement process of music: as musicians compose harmonies from different possible combinations of pitches, the pitches are stored in harmony memory, and optimization proceeds by adjusting the input pitches to generate the perfect harmony. Particle Swarm Optimization (PSO) is an intelligence-based meta-heuristic in which each particle locates a food source at a different position; the particles search for better food-source positions in the hope of better results. In this paper, the roles of Artificial Bee Colony, particle swarm optimization, and harmony search algorithms in generating random test data, and in optimizing that test data, are analyzed. Test case generation and optimization through the bee colony, PSO, and harmony search (HS) algorithms are applied to a case study, the withdrawal operation in a bank ATM, and it is observed that these algorithms are able to generate suitable automated test cases and test data. The paper further details and compares the HS, PSO, and Bee Colony (BC) optimization methods used for test case and test data generation and optimization.
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Editor IJCATR
Nowadays, the automotive industry has attracted consumers' attention, and product quality is considered an essential element in today's competitive markets. Security and comfort are the main criteria and parameters for selecting a car. Therefore, the standard CAR dataset, involving six features and 1728 instances, has been used. In this paper, the aim is to select the car with the best characteristics by using intelligent algorithms (Random Forest, J48, SVM, Naive Bayes) and combining these algorithms with ensemble classifiers such as Bagging and AdaBoostM1. Both the speed and the accuracy of the intelligent algorithms in identifying the best car are taken into account.
Artificial bee colony with fcm for data clusteringAlie Banyuripan
This document summarizes a research paper that proposes a modified artificial bee colony (ABC) algorithm for data clustering. The algorithm integrates a fuzzy C-means (FCM) operator into the ABC algorithm. Specifically:
- The ABC algorithm is modified to incorporate an FCM operator, which introduces scout bees based on a fuzzy function rather than randomly.
- In each cycle, after the employed bee and onlooker bee phases, the FCM operator generates a new solution for the scout bees based on previous solutions to improve optimization.
- The algorithm was tested on datasets from a machine learning repository and showed improved clustering quality compared to existing algorithms.
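The membership computation underlying an FCM operator like the one above can be sketched with the standard fuzzy C-means membership formula (a generic illustration with illustrative names, not the paper's modified ABC code):

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-means membership degrees of one point to each cluster centre.

    u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_i is the distance to
    centre i and m > 1 is the fuzzifier.
    """
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    d = [dist(point, c) for c in centers]
    # a point sitting exactly on a centre gets crisp membership
    if any(di == 0.0 for di in d):
        return [1.0 if di == 0.0 else 0.0 for di in d]
    expo = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[k]) ** expo for k in range(len(d)))
            for i in range(len(d))]
```

The memberships always sum to one, and a point equidistant from two centres is split evenly between them, which is the fuzzy behaviour the scout-bee operator exploits.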
Neighborhood search methods with moth optimization algorithm as a wrapper met...IJECEIAES
Feature selection methods select a subset of features from data so that only the useful information is mined from the samples, improving accuracy and the computational efficiency of the learning model. The Moth-Flame Optimization (MFO) algorithm is a population-based approach that simulates the behavior of real moths in nature. One drawback of MFO, as we investigate in this paper, is that the solutions move toward the best solution and can easily get stuck in local optima. We therefore propose an MFO algorithm combined with a neighborhood search method for feature selection problems, in order to keep MFO from getting trapped in local optima and to avoid premature convergence. The neighborhood search method is applied after a predefined number of unimproved iterations (failed attempts to improve the current solution). As a result, the proposed algorithm shows good performance when compared with the original MFO algorithm and with state-of-the-art approaches.
Applying Genetic Algorithms to Information Retrieval Using Vector Space ModelIJCSEA Journal
Genetic algorithms are often used in information retrieval (IR) systems to enhance the retrieval process and to increase the efficiency of optimal retrieval, helping users find exactly what they want among the growing volume of available information. Improved adaptive genetic algorithms help retrieve the information the user needs accurately, increasing the relevant files retrieved and excluding irrelevant files. In this study, the researcher explored the problems embedded in this process and attempted solutions such as the choice of mutation probability and fitness function. The Cranfield English Corpus test collection, compiled by Cyril Cleverdon and used at Cranfield in the 1960s and containing 1400 documents and 225 queries, was chosen for the simulations. The researcher used cosine similarity and Jaccard's coefficient to compute similarity between queries and documents, together with two proposed adaptive fitness functions, adaptive mutation operators, and adaptive crossover. The process evaluated the effectiveness of the results by the measures of precision and recall. The study concluded that several improvements can be obtained by using adaptive genetic algorithms.
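The two similarity measures used in this line of work, cosine and Jaccard's coefficient over weighted term vectors, can be sketched as follows (a generic vector-space illustration, not the study's implementation):

```python
def cosine_sim(q, d):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(qi * di for qi, di in zip(q, d))
    nq = sum(qi * qi for qi in q) ** 0.5
    nd = sum(di * di for di in d) ** 0.5
    return dot / (nq * nd) if nq and nd else 0.0

def jaccard_sim(q, d):
    """Jaccard (Tanimoto) coefficient for weighted term vectors."""
    dot = sum(qi * di for qi, di in zip(q, d))
    denom = sum(qi * qi for qi in q) + sum(di * di for di in d) - dot
    return dot / denom if denom else 0.0
```

Either score can serve directly as a GA fitness function: a chromosome encoding query-term weights is evaluated by the similarity between the weighted query and the known relevant documents.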
Performance Evaluation of Different Data Mining Classification Algorithm and ...IOSR Journals
This document evaluates the performance of different data mining classification algorithms and predictive analysis. It applies algorithms like decision trees, naive Bayes, k-nearest neighbor, neural networks, and support vector machines to datasets like Iris, liver disorder, and E. coli. The results show that neural networks and k-nearest neighbor achieved the best performance on these datasets, with accuracy rates up to 97.33% for Iris classification. Feature selection techniques like removing zero-weighted attributes were also found to improve some algorithm performance. Predictive analysis experiments found that neural networks and k-nearest neighbor were most accurate at predicting new class labels.
Applying genetic algorithms to information retrieval using vector space modelIJCSEA Journal
The document describes a study that applied genetic algorithms to information retrieval using the vector space model. The study used an adaptive genetic algorithm approach with two proposed fitness functions (cosine and Jaccard's), adaptive crossover and mutation probabilities. Experimental results on a test corpus showed improvements in precision and recall compared to traditional approaches, with the proposed cosine fitness function performing best. Precision generally decreased as recall increased. The modifications made to the genetic algorithm and fitness functions led to better weighting of query terms and improved results.
IRJET-Clustering Techniques for Mushroom DatasetIRJET Journal
The document evaluates the performance of different clustering algorithms (Expectation Maximization, Farthest Fast, and K-means) on a mushroom dataset from the UCI machine learning repository. The algorithms are compared based on the number of correctly clustered instances and time taken to build the model. The mushroom dataset contains 8124 instances with 22 attributes classified as edible or poisonous mushrooms. The goal is to group similar mushrooms together using the different clustering techniques in the data mining tool WEKA.
ENHANCED BREAST CANCER RECOGNITION BASED ON ROTATION FOREST FEATURE SELECTIO...cscpconf
Optimization problems are predominantly solved using computational intelligence. One issue that can be addressed in this context is attribute subset selection evaluation. This paper presents a computational intelligence technique for solving the optimization problem using a proposed model called the Modified Genetic Search Algorithm (MGSA), which avoids bad local search space using merit and scaled fitness variables, detecting and deleting bad candidate chromosomes and thereby reducing the number of individual chromosomes in the search space and in subsequent generations. The paper aims to show that Rotation Forest ensembles are useful in the feature selection method. The base classifier is multinomial logistic regression integrated with Haar wavelets as the projection filter, reproducing the rank of each feature with 10-fold cross-validation. The paper also discusses the main findings and concludes with promising results for the proposed model. It explores the combination of MGSA for optimization with Naïve Bayes classification. The result obtained using the proposed MGSA model is validated mathematically using Principal Component Analysis. The goal is to improve the accuracy and quality of breast cancer diagnosis with robust machine learning algorithms. Compared with other works in the literature survey, the experimental results achieved in this paper are better, with statistical inference.
Network Based Intrusion Detection System using Filter Based Feature Selection...IRJET Journal
This document proposes a mutual information-based feature selection algorithm to select optimal features for network intrusion detection classification. The algorithm aims to handle dependent data features better than previous methods, and its effectiveness is evaluated on network intrusion detection cases. Most previous methods suffer from low detection rates and high false alarm rates. The proposed approach uses feature selection, filtering, clustering, and clustering ensemble techniques in a hybrid data mining method to achieve high accuracy for intrusion detection systems.
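The mutual information score that such filter methods rank features by can be sketched for discrete features (the generic definition, not the proposed selection algorithm itself):

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (bits) between two discrete sequences,
    e.g. a categorical feature x and a class label y.

    I(X;Y) = sum_{a,b} p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
    """
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())
```

A feature identical to the label scores its full entropy; a feature independent of the label scores zero, so sorting features by this quantity yields a simple filter ranking for intrusion-detection attributes.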
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
This document summarizes a research paper that proposes a new Particle Swarm Optimization (PSO) based K-Prototype clustering algorithm to cluster mixed numeric and categorical data. It begins with background information on clustering algorithms like K-Means, K-Modes, and K-Prototype. It then describes the K-Prototype algorithm, PSO, and discrete binary PSO. Related work integrating PSO with other clustering algorithms is also reviewed. The proposed approach uses binary PSO to select improved initial prototypes for K-Prototype clustering in order to obtain better clustering results than traditional K-Prototype and avoid local optima.
An unsupervised feature selection algorithm with feature ranking for maximizi...Asir Singh
Prediction plays a vital role in decision making. Correct prediction leads to the right decisions, saving life, energy, effort, money, and time. The right decision prevents physical and material losses, and prediction is practiced in all fields, including medicine, finance, environmental studies, engineering, and emerging technologies. Prediction is carried out by a model called a classifier. The predictive accuracy of the classifier depends strongly on the training datasets used to train it. Irrelevant and redundant features in the training dataset reduce the accuracy of the classifier, so they must be removed from the training dataset through the process known as feature selection. This paper proposes a feature selection algorithm, unsupervised learning with ranking-based feature selection (FSULR). It removes redundant features by clustering and eliminates irrelevant features by statistical measures to select the most significant features from the training dataset. The performance of the proposed algorithm is compared with seven other feature selection algorithms using the well-known classifiers naive Bayes (NB), instance-based (IB1), and tree-based J48. Experimental results show that the proposed algorithm yields better prediction accuracy for the classifiers.
BINARY SINE COSINE ALGORITHMS FOR FEATURE SELECTION FROM MEDICAL DATAacijjournal
A well-constructed classification model depends strongly on the input feature subsets from a dataset, which may contain redundant, irrelevant, or noisy features. This challenge can be even greater with medical datasets. The main aim of feature selection as a pre-processing task is to eliminate such features and select the most effective ones. In the literature, metaheuristic algorithms have shown successful performance in finding optimal feature subsets. In this paper, two binary metaheuristic algorithms, the S-shaped binary Sine Cosine Algorithm (SBSCA) and the V-shaped binary Sine Cosine Algorithm (VBSCA), are proposed for feature selection from medical data. In these algorithms the search space remains continuous, while a binary position vector is generated for each solution by one of two transfer functions, S-shaped or V-shaped. The proposed algorithms are compared with four recent binary optimization algorithms on five medical datasets from the UCI repository. The experimental results confirm that both binary SCA variants enhance classification accuracy on these medical datasets compared with the four other algorithms.
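The S-shaped and V-shaped transfer functions that map a continuous position to a binary feature mask can be sketched as follows (the standard transfer-function forms with illustrative names, not the authors' code):

```python
import math
import random

def s_shaped(v):
    """S-shaped (sigmoid) transfer function, mapping a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def v_shaped(v):
    """V-shaped transfer function |tanh(v)|, mapping a real value to [0, 1)."""
    return abs(math.tanh(v))

def binarize(position, transfer=s_shaped, rng=random.random):
    """Turn a continuous position vector into a 0/1 feature-selection mask:
    each bit is set when a uniform draw falls below the transfer value."""
    return [1 if rng() < transfer(v) else 0 for v in position]
```

The continuous SCA update runs unchanged; only the fitness evaluation sees the binarized mask, with each 1 marking a selected feature.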
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
This paper presents a hybrid data mining approach based on supervised and unsupervised learning to identify the closest data patterns in the database. The technique achieves the maximum accuracy rate with minimal complexity. The proposed algorithm is compared with traditional clustering and classification algorithms and is also implemented on multidimensional datasets. The implementation results show better prediction accuracy and reliability.
Leave one out cross validated Hybrid Model of Genetic Algorithm and Naïve Bay...IJERA Editor
This document presents a new hybrid model for feature selection and classification that combines genetic algorithm and naive Bayes. The proposed method first uses a binary coded genetic algorithm to select a reduced subset of important features from datasets. It then applies a naive Bayes classification method to evaluate the selected features and identify the subset that achieves the highest classification accuracy. The performance of the proposed hybrid model is evaluated on eight datasets and compared to recent publications, finding it achieves satisfactory or higher classification accuracy using fewer features.
Feature selection using modified particle swarm optimisation for face recogni...eSAT Journals
One of the major factors influencing the classification rate is the selection of the right features. Not all features play a vital role in classification; many features in the dataset may be redundant or irrelevant, which increases computational cost and may reduce the classification rate. In this paper, we use DCT (discrete cosine transform) coefficients as features for a face recognition application. The coefficients are optimally selected by a modified PSO algorithm, in which the choice of coefficients incorporates the average of the mean normalized standard deviations of the various classes and gives more weight to the lower-indexed DCT coefficients. The algorithm is tested on the ORL database. A recognition rate of 97% is obtained, with about 40 percent of features selected on average for a 10 × 10 input. The modified PSO took about 50 iterations to converge. These performance figures are better than some of the work reported in the literature.
Keywords: Particle swarm optimization, Discrete cosine transform, feature extraction, feature selection, face recognition, classification rate.
An Automatic Clustering Technique for Optimal ClustersIJCSEA Journal
This document presents a new automatic clustering algorithm called Automatic Merging for Optimal Clusters (AMOC). AMOC is a two-phase iterative extension of k-means clustering that aims to automatically determine the optimal number of clusters for a given dataset. In the first phase, AMOC initializes a large number of clusters k using k-means. In the second phase, it iteratively merges the lowest probability cluster with its closest neighbor, recomputing metrics each time to evaluate if the merge improved clustering quality. The algorithm stops merging once no improvements are found. Experimental results on synthetic and real datasets show AMOC finds nearly optimal cluster structures in terms of number, compactness and separation of clusters.
Improved feature selection using a hybrid side-blotched lizard algorithm and ...IJECEIAES
Feature selection entails choosing, from a wide collection of original features, the significant features that are essential for predicting test data with a classifier. It is commonly used in applications such as bioinformatics, data mining, and the analysis of written texts, where a dataset may contain tens or hundreds of thousands of features, making such a large feature set difficult to analyze. Removing irrelevant features improves predictor performance, making it more accurate and cost-effective. In this research, a novel hybrid technique is presented for feature selection that aims to enhance classification accuracy: a hybrid binary version of the side-blotched lizard algorithm (SBLA) with a genetic algorithm (GA), namely SBLAGA, which combines the strengths of both algorithms. We use a sigmoid function to map the continuous variable values to binary ones and evaluate the proposed algorithm on twenty-three standard benchmark datasets. Average classification accuracy, average number of selected features, and average fitness value were the evaluation criteria. According to the experimental results, SBLAGA demonstrated superior performance to SBLA and GA with regard to these criteria. We further compare SBLAGA with four wrapper feature selection methods widely used in the literature and find it to be more efficient.
The improved k means with particle swarm optimizationAlexander Decker
This document summarizes a research paper that proposes an improved K-means clustering algorithm using particle swarm optimization. It begins with an introduction to data clustering and types of clustering algorithms. It then discusses K-means clustering and some of its drawbacks. Particle swarm optimization is introduced as an optimization technique inspired by swarm behavior in nature. The proposed algorithm uses particle swarm optimization to select better initial cluster centroids for K-means clustering in order to overcome some limitations of standard K-means. The algorithm works in two phases - the first uses particle swarm optimization and the second performs K-means clustering using the outputs from the first phase.
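The PSO velocity and position update that drives this kind of hybrid can be sketched per particle (the canonical PSO form with illustrative inertia and acceleration coefficients, not the paper's implementation):

```python
import random

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO update for a single particle, dimension by dimension.

    Velocity blends inertia (w) with attraction toward the particle's
    personal best (c1) and the swarm's global best (c2); the new
    position is the old position plus the new velocity.
    """
    new_vel = [w * v
               + c1 * random.random() * (pb - x)
               + c2 * random.random() * (gb - x)
               for x, v, pb, gb in zip(pos, vel, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

In a PSO/K-means hybrid, each particle typically encodes a candidate set of cluster centroids, so this update refines the initial centroids before K-means takes over.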
A hybrid approach from ant colony optimization and K-neare.pdfRayhanaKarar
This document proposes a hybrid approach using ant colony optimization and K-nearest neighbor for feature selection and classification. The approach uses multiple groups of ants, with each group selecting features using a different search criterion. A fitness function composed of heuristic and pheromone values is used to evaluate candidate features. The heuristic value represents the class separability of the feature, while the pheromone value is based on classification accuracy. Sequential forward selection is applied to iteratively select features that improve accuracy. The approach is tested on medical datasets, achieving promising results by increasing accuracy with fewer selected features.
Cuckoo algorithm with great deluge local-search for feature selection problemsIJECEIAES
The feature selection problem is concerned with searching a dataset for a set of features that reduces training time and enhances the accuracy of a classification method. Feature selection algorithms are therefore proposed to choose important features from large and complex datasets. The cuckoo search (CS) algorithm is a nature-inspired optimization algorithm widely implemented to find the optimum solution for a specified problem. In this work, the cuckoo search algorithm is hybridized with a local search algorithm to find a satisfactory solution to the feature selection problem. The great deluge (GD) algorithm is an iterative search procedure that can accept some worse moves in order to find better solutions, and it increases the exploitation ability of CS. A comparison is provided to examine the performance of the proposed method against the original CS algorithm. As a result, on the UCI datasets the proposed algorithm outperforms the original algorithm and produces results comparable with some from the literature.
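The great deluge acceptance rule, a rising "water level" that any accepted candidate must stay above, can be sketched generically (a maximisation form with an illustrative linear rain speed; not the hybrid CS-GD code):

```python
def great_deluge(initial, fitness, neighbour, iterations, rain_speed=0.01):
    """Great Deluge local search (maximisation form).

    Any candidate whose fitness is at least the current water level is
    accepted, even if it is worse than the current solution; the level
    starts at the initial fitness and rises by rain_speed each
    iteration, gradually tightening acceptance toward pure hill
    climbing.
    """
    best = current = initial
    level = fitness(initial)
    for _ in range(iterations):
        cand = neighbour(current)
        if fitness(cand) >= level:              # may accept a worse move
            current = cand
            if fitness(current) > fitness(best):
                best = current
        level += rain_speed                     # the water rises
    return best
```

Unlike simulated annealing, acceptance here is deterministic given the level, which makes the single rain-speed parameter the only thing to tune.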
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
Feature selection is one of the most common and critical tasks in database classification. It reduces computational cost by removing insignificant and unwanted features, which in turn makes the diagnosis process accurate and comprehensible. This paper presents the measurement of feature relevance based on fuzzy entropy, tested with a radial basis function (RBF) network, Bagging (bootstrap aggregating), Boosting, and stacking on datasets from various fields. Twenty benchmark datasets available in the UCI Machine Learning Repository and KDD have been used for this work. The accuracy obtained from these classification processes shows that the proposed method is capable of producing good and accurate results with fewer features than the original datasets.
1. The document presents a hybrid algorithm that combines Kernelized Fuzzy C-Means (KFCM), Hybrid Ant Colony Optimization (HACO), and Fuzzy Adaptive Particle Swarm Optimization (FAPSO) to improve clustering of electrocardiogram (ECG) beat data.
2. The algorithm maps data into a higher-dimensional space using kernel functions to make clusters more linearly separable, and addresses KFCM's sensitivity to initialization and its tendency to get trapped in local minima.
3. It uses HACO to optimize cluster centers and membership degrees, and FAPSO to evaluate fitness values and optimize weight vectors, forming usable clusters for applications like ECG classification.
Single parent mating in genetic algorithm for real robotic system identificationIAESIJAI
System identification (SI) is a method of determining a mathematical model
for a system given a set of input-output data. A representation is made using
a mathematical model based on certain specified assumptions. In SI, model
structure selection is a step where a model structure perceived as an adequate
system representation is selected. A typical rule is that the final model must
have a good balance between parsimony and accuracy. As a popular search
method, genetic algorithm (GA) is used for selecting a model structure.
However, the optimality of the final model depends much on the
effectiveness of GA operators. This paper presents a mating technique
named single parent mating (SPM) in GA for use in a real robotic SI. This
technique is based on the chromosome structure of the parents such that a
single parent is sufficient in achieving mating that eases the search for the
optimal model. The results show that using three different objective
functions (Akaike information criterion, Bayesian information criterion and
parameter magnitude–based information criterion 2) respectively, GA with
the mating technique is able to find more optimal models than without the
mating technique. Validations show that the selected models using the
mating technique are acceptable.
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINEaciijournal
This document presents a new decision tree algorithm for data mining in medicine. It proposes using a modified C4.5 algorithm that selects either one or two attributes as splitting criteria at each node, choosing the option with the highest information gain ratio. The algorithm is tested on breast cancer and dermatology datasets, achieving equivalent or higher accuracy compared to standard C4.5. The results indicate the proposed algorithm improves classification performance and reduces tree depth compared to C4.5. However, further testing is still needed on more complex medical datasets before generalizing the findings. The algorithm may help increase diagnostic precision if incorporated into medical decision support systems.
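The one-versus-two attribute split choice described above rests on the information gain ratio. A minimal sketch of that computation, on an illustrative XOR-style toy dataset (not data from the paper), is:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attrs):
    """Gain ratio of splitting on one attribute or on a pair: each distinct
    combination of the chosen attributes' values forms one branch."""
    n = len(rows)
    keys = [tuple(r[a] for a in attrs) for r in rows]
    branches = {}
    for k, lab in zip(keys, labels):
        branches.setdefault(k, []).append(lab)
    split_entropy = sum(len(b) / n * entropy(b) for b in branches.values())
    gain = entropy(labels) - split_entropy
    split_info = entropy(keys)        # intrinsic information of the partition
    return gain / split_info if split_info else 0.0

# Toy XOR data (illustrative): neither attribute alone is informative,
# but the pair separates the classes perfectly.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
labels = [a ^ b for a, b in rows]

print(gain_ratio(rows, labels, (0,)))     # 0.0 (single attribute)
print(gain_ratio(rows, labels, (0, 1)))   # 0.5 (pair of attributes)
```

This XOR case is exactly the situation where a two-attribute split criterion can beat standard C4.5: each attribute alone has zero gain, while the pair has the maximal gain ratio.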
Advanced Computational Intelligence: An International Journal (ACII)aciijournal
Today, an enormous amount of data is collected in medical databases. These databases may contain valuable information encapsulated in nontrivial relationships among symptoms and diagnoses. Extracting such dependencies from historical data is much easier to do using medical systems, and the resulting knowledge can be used in future medical decision making. In this paper, a new algorithm based on C4.5 to mine data for medicine applications is proposed and then evaluated against two datasets and the C4.5 algorithm in terms of accuracy.
The document discusses using genetic algorithms for feature selection and the Weka tool for classification of an ovarian cancer dataset. It begins by introducing data mining and classification techniques. It then describes using a genetic algorithm to select relevant features from the ovarian cancer dataset, reducing it from over 15,000 attributes to 20. These selected features are input into Weka, which applies additional data preprocessing before using various classifiers. The Simple Logistic classifier achieved the highest accuracy rate of 100%. In conclusion, genetic algorithms help reduce data dimensionality for classification, and Weka is useful for evaluating different machine learning algorithms on medical datasets.
The document describes a proposed fast clustering-based feature subset selection (FAST) algorithm for high-dimensional data. The FAST algorithm works in two steps: 1) clustering features using minimum spanning tree methods, and 2) selecting the most representative feature from each cluster. This identifies useful and independent features efficiently. Experimental results on 35 real-world datasets demonstrate that FAST produces smaller feature subsets and improves classifier performance compared to other feature selection algorithms.
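The two steps of FAST can be sketched as follows. For brevity this sketch measures feature relatedness with absolute Pearson correlation rather than the symmetric uncertainty used by FAST, and the data, the cut threshold and the helper names are illustrative assumptions.

```python
import math

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def fast_like(features, target, cut=0.5):
    """Step 1: cluster features via a minimum spanning tree on 1-|corr|
    distances, cutting edges longer than `cut`.  Step 2: keep the feature
    most correlated with the target from each resulting cluster."""
    m = len(features)
    dist = [[1 - abs(corr(features[i], features[j])) for j in range(m)]
            for i in range(m)]
    in_tree, edges = {0}, []          # Prim's algorithm for the MST
    while len(in_tree) < m:
        d, i, j = min((dist[i][j], i, j) for i in in_tree
                      for j in range(m) if j not in in_tree)
        in_tree.add(j)
        edges.append((d, i, j))
    parent = list(range(m))           # union-find over the kept edges
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for d, i, j in edges:
        if d <= cut:
            parent[find(i)] = find(j)
    clusters = {}
    for f in range(m):
        clusters.setdefault(find(f), []).append(f)
    return [max(c, key=lambda f: abs(corr(features[f], target)))
            for c in clusters.values()]

# Features 0 and 1 are redundant copies; feature 2 is independent.
features = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 1, 4, 2, 3]]
target = [1, 2, 3, 4, 5]
print(fast_like(features, target))
```

The redundant pair collapses into one cluster and contributes a single representative, which is how FAST achieves small subsets of mutually independent features.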
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by concurrently optimizing multiple objectives, namely automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent Jaya optimization algorithm as its underlying optimization strategy. This evolutionary technique aims to attain the global best solution rather than a local best solution in larger datasets. The exploration and exploitation imposed on the proposed work result in detecting the number of clusters automatically, finding the appropriate partitioning present in the data sets, and reaching near-optimal values on the CVI frontiers. Twelve datasets of different intricacy are used to validate the performance of the proposed algorithm. The experiments show that the theoretical advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance benefits.
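Jaya's parameter-free update rule (move toward the current best solution, away from the current worst) can be sketched as below on an illustrative sphere objective; the bounds, population size and objective are assumptions for demonstration, not the paper's clustering setup.

```python
import random

random.seed(0)

def jaya(f, dim, lo, hi, pop_size=20, iters=200):
    """Jaya update: each solution moves toward the current best and away
    from the current worst; there are no algorithm-specific tuning
    parameters beyond population size and iteration count."""
    pop = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(pop_size)]
    for _ in range(iters):
        scores = [f(x) for x in pop]
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for k, x in enumerate(pop):
            r1, r2 = random.random(), random.random()
            # Canonical update: x' = x + r1*(best - |x|) - r2*(worst - |x|)
            cand = [min(hi, max(lo, xi + r1 * (b - abs(xi))
                                    - r2 * (w - abs(xi))))
                    for xi, b, w in zip(x, best, worst)]
            if f(cand) < f(x):        # greedy replacement (minimization)
                pop[k] = cand
    return min(pop, key=f)

sphere = lambda x: sum(v * v for v in x)   # illustrative objective
best = jaya(sphere, dim=2, lo=-5.0, hi=5.0)
print(best, sphere(best))
```

In the paper's setting the objective would be a cluster validity index over candidate partitionings rather than this toy function; the update rule itself is unchanged.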
An efficient and powerful advanced algorithm for solving real coded numerica...IOSR Journals
This document discusses an artificial bee colony algorithm with crossover operators for solving numerical optimization problems. The algorithm is based on the intelligent behavior of honey bee swarms. It introduces crossover operations between individual food source positions to generate new offspring. The offspring replace parents if they have better fitness. The algorithm is tested on standard benchmark functions like Griewank and Rosenbrock and results compared to an X-ABC algorithm. Results show the ABC with crossover performs better with fewer parameters.
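The two benchmark functions named above have standard closed forms; as a reference sketch:

```python
import math

def rosenbrock(x):
    """Rosenbrock valley: global minimum 0 at (1, ..., 1)."""
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

def griewank(x):
    """Griewank function: global minimum 0 at the origin."""
    s = sum(v * v for v in x) / 4000
    p = math.prod(math.cos(v / math.sqrt(i + 1)) for i, v in enumerate(x))
    return s - p + 1

print(rosenbrock([1.0, 1.0, 1.0]))   # 0.0
print(griewank([0.0, 0.0]))          # 0.0
```

Rosenbrock's narrow curved valley tests exploitation, while Griewank's many regularly spaced local minima test exploration, which is why the pair is a common yardstick for swarm-based optimizers such as ABC.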
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksIJERD Editor
Distributed Denial of Service (DDoS) attacks have become a massive threat to the Internet. The traditional architecture of the Internet is vulnerable to attacks like DDoS. An attacker first acquires an army of zombies, and then instructs that army when to start an attack and whom to attack. In this paper, the different techniques used to perform DDoS attacks, the tools used to launch them, and countermeasures to detect attackers and eliminate Bandwidth Distributed Denial of Service (B-DDoS) attacks are reviewed. DDoS attacks are carried out using various flooding techniques.
The main purpose of this paper is to design an architecture which can reduce Bandwidth Distributed Denial of Service attacks and keep the victim site or server available for normal users by eliminating the zombie machines. Our primary focus is to discuss how normal machines turn into zombies (bots), how an attack is initiated, the DDoS attack procedure, and how an organization can save its server from becoming a DDoS victim. To demonstrate this, we implemented a simulated environment with Cisco switches, routers, a firewall, some virtual machines and some attack tools to display a real DDoS attack. By using time scheduling, resource limiting, system logs, access control lists and a Modular Policy Framework, we stopped the attack and identified the attacker (bot) machines.
Hearing loss is one of the most common human impairments. It is estimated that by year 2015 more
than 700 million people will suffer mild deafness. Most can be helped by hearing aid devices depending on the
severity of their hearing loss. This paper describes the implementation and characterization details of a dual
channel transmitter front end (TFE) for digital hearing aid (DHA) applications that use novel micro
electromechanical- systems (MEMS) audio transducers and ultra-low power-scalable analog-to-digital
converters (ADCs), which enable a very-low form factor, energy-efficient implementation for next-generation
DHA. The contribution of the design is the implementation of the dual channel MEMS microphones and power-scalable ADC system.
Influence of tensile behaviour of slab on the structural Behaviour of shear c...IJERD Editor
A composite beam is composed of a steel beam and a slab connected by means of shear connectors
like studs installed on the top flange of the steel beam to form a structure behaving monolithically. This study
analyzes the effects of the tensile behavior of the slab on the structural behavior of the shear connection like slip
stiffness and maximum shear force in composite beams subjected to hogging moment. The results show that the
shear studs located in the crack-concentration zones due to large hogging moments sustain significantly smaller
shear force and slip stiffness than the other zones. Moreover, the reduction of the slip stiffness in the shear
connection appears also to be closely related to the change in the tensile strain of rebar according to the increase
of the load. Further experimental and analytical studies shall be conducted considering variables such as the
reinforcement ratio and the arrangement of shear connectors to achieve efficient design of the shear connection
in composite beams subjected to hogging moment.
Gold prospecting using Remote Sensing ‘A case study of Sudan’IJERD Editor
Gold has been extracted from northeast Africa for more than 5000 years, and this may be the first place where the metal was extracted. The Arabian-Nubian Shield (ANS) is an exposure of Precambrian crystalline rocks on the flanks of the Red Sea. The crystalline rocks are mostly Neoproterozoic in age. The ANS spans the nations of Israel, Jordan, Egypt, Saudi Arabia, Sudan, Eritrea, Ethiopia, Yemen, and Somalia. The Arabian-Nubian Shield consists of juvenile continental crust that formed between 900 and 550 Ma, when intra-oceanic arcs were welded together along ophiolite-decorated sutures. Primary Au mineralization probably developed in association with the growth of the intra-oceanic arcs and the evolution of back-arcs. Multiple episodes of deformation have obscured the primary metallogenic setting, but at least some of the deposits preserve evidence that they originated as sea-floor massive sulphide deposits.
The Red Sea Hills Region is a vast span of rugged, harsh and inhospitable sector of the Earth with
inimical moon-like terrain, nevertheless since ancient times it is famed to be an abode of gold and was a major
source of wealth for the Pharaohs of ancient Egypt. The Pharaohs old workings have been periodically
rediscovered through time. Recent endeavours by the Geological Research Authority of Sudan led to the
discovery of a score of occurrences with gold and massive sulphide mineralizations. In the nineties of the
previous century the Geological Research Authority of Sudan (GRAS) in cooperation with BRGM utilized
satellite data of Landsat TM using spectral ratio technique to map possible mineralized zones in the Red Sea
Hills of Sudan. The outcome of the study mapped a gossan-type gold mineralization. The band ratio technique was applied to the Arbaat area and a signature of an alteration zone was detected. Alteration zones are commonly associated with mineralization. A field check confirmed the existence of a stockwork of gold-bearing quartz in the alteration zone. Another type of gold mineralization discovered using remote sensing is the gold associated with metachert in the Atmur Desert.
Reducing Corrosion Rate by Welding DesignIJERD Editor
This document summarizes a study on reducing corrosion rates in steel through welding design. The researchers tested different welding groove designs (X, V, 1/2X, 1/2V) and preheating temperatures (400°C, 500°C, 600°C) on ferritic malleable iron samples. Testing found that X and V groove designs with 500°C and 600°C preheating had corrosion rates of 0.5-0.69% weight loss after 14 days, compared to 0.57-0.76% for 400°C preheating. Higher preheating reduced residual stresses which decreased corrosion. Residual stresses were 1.7 MPa for optimal X groove and 600°C
Router 1X3 – RTL Design and VerificationIJERD Editor
Routing is the process of moving a packet of data from source to destination; it enables messages to pass from one computer to another and eventually reach the target machine. A router is a networking device that forwards data packets between computer networks. It is connected to two or more data lines from different networks (as opposed to a network switch, which connects data lines from one single network). This paper mainly emphasizes the study of the router device, its top-level architecture, and how the various sub-modules of the router, i.e. Register, FIFO, FSM and Synchronizer, are synthesized, simulated and finally connected to its top module.
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...IJERD Editor
This paper presents a component within the flexible ac-transmission system (FACTS) family, called
distributed power-flow controller (DPFC). The DPFC is derived from the unified power-flow controller (UPFC)
with an eliminated common dc link. The DPFC has the same control capabilities as the UPFC, which comprise
the adjustment of the line impedance, the transmission angle, and the bus voltage. The active power exchange
between the shunt and series converters, which is through the common dc link in the UPFC, is now through the
transmission lines at the third-harmonic frequency. The DPFC employs multiple small-size single-phase converters, which reduces the cost of equipment, requires no voltage isolation between phases, and increases redundancy and thereby reliability. The principle and analysis of the DPFC are presented in this paper and the corresponding
simulation results that are carried out on a scaled prototype are also shown.
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRIJERD Editor
Power quality has become an increasingly pivotal issue for industrial electricity consumers in recent times. Modern industries employ sensitive power electronic equipment, control devices and non-linear loads as part of automated processes to increase energy efficiency and productivity. Voltage disturbances are the most common power quality problem, owing to the growing use of large numbers of sophisticated and sensitive electronic equipment in industrial systems. This paper discusses the design and simulation of a dynamic voltage restorer (DVR) for improving power quality and reducing the harmonic distortion of sensitive loads. Power quality problems occur as non-standard voltage, current and frequency, and electronic devices are very sensitive loads. In power systems, voltage sag,
swell, flicker and harmonics are some of the problem to the sensitive load. The compensation capability
of a DVR depends primarily on the maximum voltage injection ability and the amount of stored
energy available within the restorer. This device is connected in series with the distribution feeder at
medium voltage. A fuzzy logic control is used to produce the gate pulses for control circuit of DVR and the
circuit is simulated by using MATLAB/SIMULINK software.
Study on the Fused Deposition Modelling In Additive ManufacturingIJERD Editor
Additive manufacturing, also popularly known as 3-D printing, is a process where a product is created in a succession of layers. It is based on a novel materials-incremental manufacturing philosophy. Unlike conventional manufacturing processes, where material is removed from a given workpiece to derive the final shape of a product, 3-D printing builds the product from scratch, obviating the necessity to cut away material and preventing wastage of raw materials. Commonly used raw materials for the process are ABS plastic, PLA and nylon; recently the use of gold, bronze and wood has also been implemented. Complexity adds no cost to this process, as an object of any shape and size can be manufactured.
Spyware triggering system by particular string valueIJERD Editor
This computer program can be used for good or bad purposes, in hacking or in general use. We can say it is the next step for hacking techniques such as keyloggers and spyware. In this system, once the user or hacker stores a particular string as input, the software continually compares the user's typing activity with that stored string, and if it matches, it launches the spyware program.
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...IJERD Editor
This paper presents a blind steganalysis technique to effectively attack the JPEG steganographic schemes Jsteg, F5, Outguess and DWT-based. The proposed method exploits the correlations between block-DCT coefficients from intra-block and inter-block relations, and the statistical moments of characteristic functions of the test image are selected as features. The features are extracted from the BDCT JPEG 2-array. A Support Vector Machine with cross-validation is implemented for the classification. The proposed scheme gives improved outcomes in attacking.
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeIJERD Editor
Data over the cloud is transferred or transmitted between servers and users. Privacy of that data is very important, as it contains personal information; if the data is hacked, it can be used to defame a person socially. Delays also occur during data transmission, e.g. in mobile communication where bandwidth is low. Hence compression algorithms are proposed for fast and efficient transmission, encryption is used for security purposes, and blurring is used to provide an additional layer of security. These algorithms are hybridized to achieve robust and efficient security and transmission over a cloud storage system.
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...IJERD Editor
A thorough review of existing literature indicates that the Buckley-Leverett equation only analyzes
waterflood practices directly without any adjustments on real reservoir scenarios. By doing so, quite a number
of errors are introduced into these analyses. Also, for most waterflood scenarios, a radial investigation is more
appropriate than a simplified linear system. This study investigates the adoption of the Buckley-Leverett equation to estimate the radius of invasion of the displacing fluid during waterflooding. The model is also adopted for a microbial flood, and a comparative analysis is conducted for both waterflooding and microbial flooding. The analysis not only succeeds in determining the radial distance of the leading edge of water during the flooding process, but also gives a clearer understanding of the applicability of microbes to enhance oil production through in-situ production of bio-products like biosurfactants, biogenic gases, bio-acids, etc.
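For reference, the classical linear Buckley-Leverett relations the study adapts (neglecting gravity and capillary pressure), together with a radial frontal-advance form of the kind the abstract describes, can be sketched as:

```latex
% Fractional flow of water (gravity and capillary pressure neglected):
f_w = \frac{1}{1 + \dfrac{k_{ro}}{k_{rw}} \cdot \dfrac{\mu_w}{\mu_o}}

% Linear (classical) frontal advance of the saturation S_w:
x_{S_w} = \frac{q_t\, t}{A\,\phi} \left( \frac{\partial f_w}{\partial S_w} \right)

% Radial form of the kind adopted here (r measured from the injector):
r_{S_w}^{2} = \frac{q_t\, t}{\pi h\,\phi} \left( \frac{\partial f_w}{\partial S_w} \right)
```

Here q_t is the total injection rate, A the cross-sectional area, h the formation thickness and phi the porosity; the exact radial adjustment used in the paper may differ from this textbook sketch.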
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraIJERD Editor
Gesture gaming is a method by which users having a laptop/pc/x-box play games using natural or
bodily gestures. This paper presents a way of playing free flash games on the internet using an ordinary webcam
with the help of open source technologies. Emphasis in human activity recognition is given on the pose
estimation and the consistency in the pose of the player. These are estimated with the help of an ordinary web camera with resolutions ranging from VGA to 20 megapixels. Our work involved giving a 10-second documentary to
the user on how to play a particular game using gestures and what are the various kinds of gestures that can be
performed in front of the system. The initial inputs of the RGB values for the gesture component is obtained by
instructing the user to place his component in a red box in about 10 seconds after the short documentary before
the game is finished. Later the system opens the concerned game on the internet on popular flash game sites like
miniclip, games arcade, GameStop etc and loads the game clicking at various places and brings the state to a
place where the user is to perform only gestures to start playing the game. At any point of time the user can call
off the game by hitting the esc key and the program will release all of the controls and return to the desktop. It
was noted that the results obtained using an ordinary webcam matched that of the Kinect and the users could
relive the gaming experience of the free flash games on the net. Therefore effective in game advertising could
also be achieved thus resulting in a disruptive growth to the advertising firms.
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...IJERD Editor
The LLC resonant frequency converter is basically a combination of series and parallel resonant circuits. The LCC resonant converter has the disadvantage that, although it has two resonant frequencies, the lower resonant frequency lies in the ZCS region [5], so for this application we are not able to design the converter to work at this resonant frequency. The LLC resonant converter has existed for a very long time, but because of its unknown characteristics it was used as a series resonant converter with a basically passive (resistive) load. Here, it was designed to operate at a switching frequency higher than the resonant frequency of the series resonant tank of Lr and Cr, where the converter acts very similarly to a series resonant converter. The benefit of the LLC resonant converter is a narrow switching frequency range with light load [6]. The control circuit plays a very important role; the 555 timer used here provides a clean square wave, as the control circuit introduces no slew rate, which keeps the square wave sharp. The dead-band circuit provides an exclusive dead band in microseconds so as to avoid the simultaneous firing of the two pairs of IGBTs when one pair switches off and the other on within the slightest period of time. An isolator circuit is associated with every circuit used, because it acts as a driver, and isolation for each IGBT is provided by one exclusive transformer supply [3]. The IGBTs are fired using the appropriate signals from the previous boards, and finally a high-frequency rectifier circuit with a filtering capacitor is used to obtain a clean DC waveform. The basic goal of this particular analysis is to observe the waveforms and characteristics of converters with differently positioned passive elements in the form of tank circuits.
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...IJERD Editor
The LLC resonant frequency converter is basically a combination of series and parallel resonant circuits. The LCC resonant converter has the disadvantage that, although it has two resonant frequencies, the lower resonant frequency lies in the ZCS region [5], so for this application we are not able to design the converter to work at this resonant frequency. The LLC resonant converter has existed for a very long time, but because of its unknown characteristics it was used as a series resonant converter with a basically passive (resistive) load. Here, it was designed to operate at a switching frequency higher than the resonant frequency of the series resonant tank of Lr and Cr, where the converter acts very similarly to a series resonant converter. The benefit of the LLC resonant converter is a narrow switching frequency range with light load [6]. The control circuit plays a very important role; the 555 timer used here provides a clean square wave, as the control circuit introduces no slew rate, which keeps the square wave sharp. The dead-band circuit provides an exclusive dead band in microseconds so as to avoid the simultaneous firing of the two pairs of IGBTs when one pair switches off and the other on within the slightest period of time. An isolator circuit is associated with every circuit used, because it acts as a driver, and isolation for each IGBT is provided by one exclusive transformer supply [3]. The IGBTs are fired using the appropriate signals from the previous boards, and finally a high-frequency rectifier circuit with a filtering capacitor is used to obtain a clean DC waveform. The basic goal of this particular analysis is to observe the waveforms and characteristics of converters with differently positioned passive elements in the form of tank circuits. The supporting simulation is done using the PSIM 6.0 software tool.
An amateur radio operator, also known as a HAM, communicates with other HAMs through radio waves. Wireless communication in which the Moon is used as a natural satellite is called Moon-bounce or EME (Earth-Moon-Earth). Long-distance communication (DXing) using Very High Frequency (VHF) amateur HAM radio used to be difficult, but even with a modest setup comprising a good transceiver, power amplifier and high-gain antenna with high directivity, VHF DXing is possible. Generally a 2X11 YAGI antenna along with a rotor to set the horizontal and vertical angles is used. Moon-tracking software gives the exact location, the visibility of the Moon at both stations and other vital data to acquire the real-time position of the Moon.
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...IJERD Editor
Simple Sequence Repeats (SSRs), also known as microsatellites, have been extensively used as molecular markers due to their abundance and high degree of polymorphism. The nucleotide sequences of polymorphic forms of the same gene should be 99.9% identical, so extracting microsatellites from the gene is crucial. When microsatellite repeat counts are compared, a large difference may indicate a disorder. The Y chromosome likely contains 50 to 60 genes that provide instructions for making proteins. Because only males have the Y chromosome, the genes on this chromosome tend to be involved in male sex determination and development. Several microsatellite extractors exist, but they fail to extract microsatellites on large data sets of gigabytes and terabytes in size. The proposed tool, "MS-Extractor: An Innovative Approach to Extract Microsatellites on 'Y' Chromosome", can extract both perfect and imperfect microsatellites from large data sets of the human 'Y' genome. The proposed system uses string matching with a sliding-window approach to locate microsatellites and extract them.
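The sliding-window string-matching idea can be sketched as follows; the function, its parameters and the example sequence are illustrative assumptions (the actual MS-Extractor also handles imperfect repeats and giga-scale inputs).

```python
def find_perfect_ssrs(seq, motif_len=2, min_repeats=4):
    """Slide a window over the sequence; report runs where a motif of
    `motif_len` bases repeats at least `min_repeats` times consecutively.
    Returns (start_index, motif, repeat_count) triples."""
    hits, i = [], 0
    while i + motif_len * min_repeats <= len(seq):
        motif = seq[i:i + motif_len]
        count, j = 1, i + motif_len
        while seq[j:j + motif_len] == motif:
            count += 1
            j += motif_len
        if count >= min_repeats:
            hits.append((i, motif, count))
            i = j                     # jump past the reported run
        else:
            i += 1                    # slide the window by one base
    return hits

# (AC) repeated five times starting at position 3 of this toy sequence.
print(find_perfect_ssrs("ATGACACACACACGGTTAGTAGTAGTAGCC"))  # → [(3, 'AC', 5)]
```

Jumping past a reported run keeps the scan linear in the sequence length, which matters once inputs grow to the sizes the abstract mentions.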
Importance of Measurements in Smart GridIJERD Editor
Driven by the need for reliable supply, independence from fossil fuels, and the capability to provide clean energy at a fixed and lower cost, the existing power grid structure is transforming into the Smart Grid. The
development of a smart energy distribution grid is a current goal of many nations. A Smart Grid should have
new capabilities such as self-healing, high reliability, energy management, and real-time pricing. This new era
of smart future grid will lead to major changes in existing technologies at generation, transmission and
distribution levels. The incorporation of renewable energy resources and distribution generators in the existing
grid will increase the complexity, optimization problems and instability of the system. This will lead to a
paradigm shift in the instrumentation and control requirements for Smart Grids for high quality, stable and
reliable electricity supply of power. The monitoring of the grid system state and stability relies on the
availability of reliable measurement of data. In this paper, the measurement areas that highlight new measurement challenges, the development of Smart Meters, and the critical parameters of electric energy to be monitored for improving the reliability of power systems are discussed.
Study of Macro level Properties of SCC using GGBS and Lime stone powderIJERD Editor
The document summarizes a study on the use of ground granulated blast furnace slag (GGBS) and limestone powder to replace cement in self-compacting concrete (SCC). Tests were conducted on SCC mixes with 0-50% replacement of cement with GGBS and 0-20% replacement with limestone powder. The results showed that replacing 30% of cement with GGBS and 15% with limestone powder produced SCC with the highest compressive strength of 46MPa, meeting fresh property requirements. The study concluded that this ternary blend of cement, GGBS and limestone powder can improve SCC properties while reducing costs.
Welcome to International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 6, Issue 4 (March 2013), PP. 114-121
A Hybrid Classifier based Rule Extraction System
Tulips Angel Thankachan¹, Dr. Kumudha Raimond²
¹M.Tech Student, Department of Computer Science & Engineering, Karunya University, Coimbatore.
²Professor, Department of Computer Science & Engineering, Karunya University, Coimbatore.
Abstract:- Classification is one of the common tasks performed in knowledge discovery and machine learning. Different learning algorithms can be applied to induce various forms of classification knowledge. This type of learning process results in a classification system called a classifier. The measure used to evaluate such systems is classification accuracy, and the accuracy of individual classifiers can be improved by combining classifiers. This paper proposes a hybrid classifier model called DTABC (Decision Tree and Artificial Bee Colony Algorithm) to improve classification accuracy irrespective of the size, dimensionality, class distribution and domain of the dataset. The proposed system comprises two phases: in the first phase, rules are extracted from the training dataset using the C4.5 decision tree algorithm; in the second phase, the ABC algorithm is applied over the C4.5-produced rules to produce more optimized rules. The proposed system has been compared with C4.5, DTGA and Naïve Bayes. The results show that the proposed hybrid classifier provides better classification accuracy.
Keywords:- artificial bee colony algorithm, classification, C4.5, hybrid classifier, fitness function, accuracy.
I. INTRODUCTION
Nowadays, the volume of digital data is increasing tremendously, so the amount of raw data to be processed
is also increasing [1]. Data mining (DM) is one of the methods used to customize and manipulate data
according to our needs. The knowledge extracted by DM should be readable, accurate and easy to understand;
the process is also called knowledge discovery. It is applicable in areas such as parallel computing,
artificial intelligence and databases. Various types of algorithms, such as evolutionary algorithms (EA) [2],
classification algorithms and clustering algorithms, can be applied in DM. Genetic algorithms (GA) [3], the
artificial bee colony (ABC) algorithm [4], ant colony optimization (ACO) [5] and particle swarm
optimization (PSO) [6] are some of the EAs used in DM.
Classification involves assigning a decision class label to a set of unclassified objects described by a
fixed set of attributes. Different learning algorithms can be applied to induce various forms of
classification knowledge from the provided learning examples, and this knowledge can subsequently be used
to classify new objects. In this sense, the learning process results in a classification system called a
classifier, whose job is to categorize datasets into various classes based on the features of the dataset.
A typical measure used to evaluate such systems is classification accuracy. Classification algorithms have
been applied to various learning tasks in different domains such as financial classification, manufacturing
control, chemical process control and robot control.
Many methods are applied in DM for classification and rule extraction [16]. One approach is GA, which
generates a new best solution from a set of possible solutions and replaces the worst solutions with the
best ones; a representative GA application is the oil-collecting vehicle routing problem [7]. Another
method used in DM is PSO [1], which simulates the coordinated motion of flocks of birds. The main benefit
of PSO is that it reduces complexity and speeds up the DM process [8]; a representative PSO application is
mining breast cancer patterns [6]. A further method is ACO, which models the social behavior of ant
colonies; an example ACO application is the prediction of protein activity [5]. One major approach
combines GA and ACO to predict the postsynaptic activity of proteins: the hybrid algorithm takes
advantage of both methods, with each complementing the other for feature selection in protein function
prediction. A hybrid PSO/ACO algorithm has also been proposed for discovering classification rules. Yet
another hybrid model for classification and rule extraction is the Decision Tree and Genetic Algorithm
(DTGA) [12]. In this model the primary rules are generated by C4.5 and given as input to GA, which
replaces the worst rules with the best ones to produce a more optimized rule set, providing better
classification accuracy than the individual classifiers. However, GA is an early algorithm, and more
efficient algorithms are available to overcome its limitations. As ABC is a more recent optimization algorithm,
it is used in this work along with the C4.5 algorithm to improve the classification accuracy of the dataset
irrespective of its domain, size and dimensionality.
The rest of this paper is organized as follows: Section 2 covers the proposed system and the working of
the optimization algorithm. Section 3 explains the other classifiers used for comparing the performance of
the proposed work. Section 4 presents the experimental design and results. Conclusions are given in
Section 5.
II. DTABC
A survey of different machine learning systems shows that some classification algorithms solve only
specific problems. In the hybrid DTGA model [12], GA is an early optimization algorithm, so it does not
provide the best possible classification accuracy or the most optimized rules. In this study a new hybrid
model is proposed that uses a more recent optimization algorithm to provide more optimized rules and
better classification accuracy, irrespective of the size, domain and dimensionality of the input dataset.
DTABC consists of three phases, as shown in figure 1. In the first phase, datasets from the UCI
(University of California at Irvine) repository are given as input to C4.5, which classifies each dataset
based on its classes. In the second phase, rules in IF-THEN form are extracted. In the third phase, the
IF-THEN clauses are eliminated and the ABC algorithm is applied over the rules to produce a more
optimized rule set, which in turn increases the classification accuracy of the proposed system.
A. C4.5: A Decision Tree Classifier
C4.5 is a statistical classifier used to generate decision trees [12]. It was developed by Ross Quinlan
and overcomes the limitations of Quinlan's earlier ID3 algorithm. The C4.5 classifier is mainly used for
classification. It builds a directed tree consisting of nodes: a single node called the "root" along with
various leaf nodes. The root node has no incoming edges but one or more outgoing edges, while every other
node has exactly one incoming edge and may or may not have outgoing edges. Information gain indicates how
informative each node is, and the positions of the nodes are decided based on their information gain.
Instances are therefore classified by navigating from the most informative node down to a leaf. Pruning
strategies are applied to stop the growth of the tree.
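As a concrete illustration of how information gain decides node placement, the following sketch computes the entropy reduction obtained by splitting on one attribute. This is not the authors' implementation; the helper names and the toy outlook data are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Reduction in entropy obtained by splitting rows on one attribute."""
    total = len(labels)
    base = entropy(labels)
    # group the labels by the value of the chosen attribute
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return base - remainder

# toy data: outlook value vs. play decision
rows = [("sunny",), ("sunny",), ("overcast",), ("rain",)]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, 0, labels))  # 1.0: outlook separates the classes perfectly
```

Here splitting on outlook removes all uncertainty about the class, so the gain equals the full entropy of the label distribution; C4.5 would place such an attribute at the root.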
B. ABC Algorithm
ABC is one of the most recent optimization algorithms, introduced by Dervis Karaboga in 2005 [4]. It is a
swarm intelligence-based algorithm modeled on the intelligent foraging behaviour of honey bees, and it
provides a population-based search procedure. The aim of the bees is to discover the food sources with the
highest nectar amounts. The employed and onlooker bees fly around the multidimensional search space,
choosing food sources based on their own experience and that of their neighbour bees, while the scout bees
fly around and find food positions randomly, without using experience. If a new source has a higher nectar
amount, the bee replaces the previous source with the new one in its memory. The algorithm thus combines
local and global search methods to balance exploitation and exploration. The ABC algorithm is conceptually
simple and easy to implement, and it has been applied to various problems such as digital IIR filter
design [10], protein tertiary structure prediction [9] and the training of artificial neural networks [11].
The foraging selection of bees leads to the emergence of the collective intelligence of honey bee swarms.
It consists of three essential components: unemployed foragers, employed foragers and food sources.
1) Unemployed foragers: scouts and onlookers are the two types of unemployed foragers. Their main aim is
to explore and exploit food sources. At the initial stage of the ABC algorithm an unemployed forager has
two choices: it becomes either a scout or an onlooker.
2) Employed foragers: the employed foragers store the food source information and use it according to a
calculated probability. When a food source has been exhausted, its employed forager becomes a scout bee.
3) Food sources: the food sources represent the positions of candidate solutions. Their quality is
assessed by calculating the fitness function.
The colony of artificial bees in this algorithm is divided into three groups: employed bees, onlooker bees
and scout bees. The numbers of onlooker and employed bees are equal, and their sum is the colony size. The
overall flowchart of the ABC algorithm is shown in figure 2 [16]. Initially the algorithm generates an
initial population by randomly selecting food source positions, and the nectar amount of this newly
generated population is calculated using equation 2. The population is then subjected to repeated cycles
of the search processes of the three groups. Using the search expression shown in equation 1, each
employed bee produces a modification of its position and calculates the new nectar amount. If the new
nectar amount is better than the previous one, the employed bee memorizes the new position; otherwise it
keeps the previous one. After all the employed bees complete their search, they share their information,
such as the nectar amounts and positions of the food sources, with the onlooker bees.
The onlooker bees evaluate the nectar amount information from the employed bees and start their own
search, choosing food positions according to the probability given in equation 3. If the new nectar amount
is better than the previous one, they memorize the new position and discard the previous one. The search
process terminates when the termination criterion is satisfied.
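The cycle described above (employed-bee local search, fitness-proportional onlooker selection, scout re-initialisation) can be sketched as a minimal ABC loop. This is an illustrative sketch, not the paper's implementation: the objective function, population size, scout limit and the simplification of recomputing selection probabilities once per cycle are all assumptions:

```python
import random

def abc_minimize(f, dim, sn=10, limit=20, cycles=100, low=-5.0, high=5.0):
    """Minimal ABC loop: employed-bee search, roulette-wheel onlookers, scouts."""
    rand = random.Random(0)  # fixed seed so the sketch is reproducible
    foods = [[rand.uniform(low, high) for _ in range(dim)] for _ in range(sn)]
    trials = [0] * sn

    def fitness(x):
        # common ABC transform of an objective value into a nectar amount
        fx = f(x)
        return 1.0 / (1.0 + fx) if fx >= 0 else 1.0 + abs(fx)

    def local_search(i):
        # equation-1 style perturbation of one randomly chosen parameter
        j = rand.randrange(dim)
        k = rand.choice([t for t in range(sn) if t != i])
        v = list(foods[i])
        v[j] += rand.uniform(-1.0, 1.0) * (foods[i][j] - foods[k][j])
        if fitness(v) > fitness(foods[i]):  # greedy replacement
            foods[i], trials[i] = v, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(sn):                 # employed bees
            local_search(i)
        fits = [fitness(x) for x in foods]  # probabilities fixed for this cycle
        total = sum(fits)
        for _ in range(sn):                 # onlooker bees: roulette-wheel choice
            r, acc = rand.uniform(0.0, total), 0.0
            for i, ft in enumerate(fits):
                acc += ft
                if acc >= r:
                    local_search(i)
                    break
        for i in range(sn):                 # scouts replace exhausted sources
            if trials[i] > limit:
                foods[i] = [rand.uniform(low, high) for _ in range(dim)]
                trials[i] = 0
    return min(foods, key=f)

best = abc_minimize(lambda x: sum(v * v for v in x), dim=2)
print(best)  # a point close to the minimum of the sphere function at the origin
```

In DTABC the "positions" being searched are discretized rule vectors rather than real-valued points, but the cycle structure is the same.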
C. Search Equation
In the ABC algorithm, the bees perform a local search to find new candidate food positions:

v_ij = x_ij + Φ_ij (x_ij − x_kj)                (1)
Equation 1 shows the local search strategy of the ABC algorithm. v_ij is the new food position; k and j
are randomly chosen and distinct, where k ∈ {1, 2, …, SN} and j ∈ {1, 2, …, D}. x_ij and x_kj are the old
food source positions, and the difference between these two positions is the distance from one food source
to the other. D is the number of optimization parameters and SN is the number of employed bees. Φ_ij is a
random number in [-1, 1].
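Equation 1 can be sketched directly in code; the function name is invented, and when two food sources are present the partner k is forced, which makes the example deterministic:

```python
import random

def candidate_position(x, i, j, phi=None):
    """One local-search step of ABC (equation 1):
    v_ij = x_ij + phi_ij * (x_ij - x_kj), with k != i chosen at random."""
    sn = len(x)
    k = random.choice([idx for idx in range(sn) if idx != i])
    if phi is None:
        phi = random.uniform(-1.0, 1.0)  # phi_ij in [-1, 1]
    v = list(x[i])                       # copy the old food source
    v[j] = x[i][j] + phi * (x[i][j] - x[k][j])
    return v

# two food sources, two parameters; perturb source 0 on parameter 1
sources = [[0.5, 1.0], [0.2, 3.0]]
print(candidate_position(sources, 0, 1, phi=0.5))  # [0.5, 0.0]
```

With phi = 0.5 the new value is 1.0 + 0.5 × (1.0 − 3.0) = 0.0; only the chosen parameter j changes.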
D. Nectar Amount
The nectar amount is also called the fitness function; it is the objective function. The fitness formula
used in the ABC algorithm is shown in equation 2:

f(r_i) = (n − m) / (n + m) + n / (m + k)                (2)

where r_i is the i-th rule, n is the number of training examples satisfying all the conditions in the
antecedent (A) as well as the consequent (C) of the rule r_i, m is the number of training examples which
satisfy all the conditions in the antecedent (A) but not the consequent (C) of the rule r_i, and k is a
predefined positive constant; here k = 4.
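Equation 2 translates to a one-line function; the function name and the example counts are invented for illustration:

```python
def rule_fitness(n, m, k=4):
    """Fitness of a rule (equation 2): n = examples matching antecedent
    and consequent, m = examples matching the antecedent only,
    k = predefined positive constant (k = 4 in this paper)."""
    return (n - m) / (n + m) + n / (m + k)

# a rule that covers 8 examples correctly and 2 incorrectly:
# (8-2)/(8+2) + 8/(2+4) = 0.6 + 1.333...
print(rule_fitness(8, 2))
```

The first term rewards precision (few antecedent-only matches) and the second rewards coverage, with k damping the reward for rules supported by very few examples.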
E. Probability Equation
This is one of the important functions supporting the algorithm [16]. It is shown in equation 3:

P_i = fit_i / Σ_{n=1}^{SN} fit_n                (3)

where P_i is the probability value associated with the i-th food source, fit_i represents the nectar
amount of the i-th food source, and SN is the number of food sources, which is equal to the number of
employed bees.
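Equation 3 is the fitness-proportional (roulette-wheel) selection probability used by the onlooker bees; a sketch with an invented function name and illustrative nectar amounts:

```python
def selection_probabilities(fitnesses):
    """Equation 3: P_i = fit_i / sum of fit_n over all SN food sources."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

# three food sources; the richest one is twice as likely to be chosen
print(selection_probabilities([2.0, 1.0, 1.0]))  # [0.5, 0.25, 0.25]
```

An onlooker bee then picks a source by drawing from this distribution, so better sources attract proportionally more search effort.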
F. Classification Accuracy
The overall classification accuracy explains the overall performance of the system. It is calculated using
equation 4:

F = (Number of training examples correctly classified / Total number of training examples) × 100    (4)
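Equation 4 in code form (the function name and the toy label vectors are invented for illustration):

```python
def classification_accuracy(predicted, actual):
    """Equation 4: percentage of examples whose predicted class
    matches the true class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual) * 100

# 3 of 4 predictions match the true labels
print(classification_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 75.0
```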
III. OTHER CLASSIFIERS
To compare the classification accuracy of the proposed DTABC system, three other classifiers are used:
C4.5, DTGA (Decision Tree and Genetic Algorithm) [12] and Naïve Bayes.
A. DTGA
DTGA is a hybrid classifier used to improve classification accuracy. It consists of three phases [12]. In
the first phase, the UCI dataset is given as input to C4.5 to generate rules in IF-THEN form. In the next
phase, discretized rules are produced by eliminating the IF-THEN clause. In the last phase, GA is applied
over the discretized rules to generate a more optimized rule set. The overall classification accuracy of
the system is calculated using equation 4.
B. Naive Bayes
The naive Bayesian classifier is a statistical classifier based on Bayes' theorem and the maximum a
posteriori hypothesis, with strong (naive) independence assumptions between attributes.

Bayes' rule:  P(H | E) = P(E | H) P(H) / P(E)                (5)

The essential idea of Bayes' rule is that the probability of a hypothesis or event (H) can be updated
based on some evidence (E) that is observed. From Bayes' rule, we have:
(1) The prior probability of H, P(H): the probability of the event before the evidence is observed.
(2) The posterior probability of H, P(H | E): the probability of the event after the evidence is observed.
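Equation 5 can be sketched as a direct computation of the posterior from the prior, the likelihood and the evidence probability; the function name and the numbers are invented for illustration:

```python
def posterior(prior_h, likelihood_e_given_h, evidence_e):
    """Bayes' rule (equation 5): P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood_e_given_h * prior_h / evidence_e

# prior P(H) = 0.3, likelihood P(E|H) = 0.8, evidence P(E) = 0.4
print(posterior(0.3, 0.8, 0.4))  # posterior P(H|E) = 0.6
```

In the naive Bayesian classifier, P(E | H) for a multi-attribute example is taken as the product of the per-attribute likelihoods, which is exactly the naive independence assumption mentioned above.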
IV. EXPERIMENTAL DESIGN AND RESULTS
A. Experimental Study
This subsection describes the implementation details of the proposed work. In total, four algorithms,
C4.5, DTGA, Naive Bayes and DTABC, are used for the comparison. C4.5 and Naive Bayes are implemented using
the WEKA (Waikato Environment for Knowledge Analysis, version 3.4.2) DM tool. DTGA and DTABC are hybrid
classifiers: DTGA is composed of C4.5 and GA, and DTABC is composed of C4.5 and ABC. GA and ABC are
implemented in Java 1.4.1 on an Intel Core i3 machine running Windows 7, and all experiments are performed
on the same machine. The algorithms are tested over 5 benchmark datasets from the UCI repository.
Table 1 gives the features of the datasets used. All the algorithms are evaluated using these datasets irrespective
of domain, size, dimensionality and class distribution.
Table 1: Summary of UCI datasets

Dataset          Number of Non-Target Attributes   Number of Classes   Number of Examples
Contact-Lenses                  4                          3                   24
Diabetes                        8                          2                  768
Glass                           9                          7                  214
Heart-Statlog                  13                          2                  270
Weather                         4                          2                   14
B. Experimental Analysis
1) Description of Datasets
As an example, the weather dataset has been taken as input to DTABC. It has 14 instances and contains 4
non-target attributes and 1 target attribute. The weather dataset is shown in table 4. There are two types
of attributes in the dataset, target and non-target; the attributes and their values are shown in tables 2
and 3.
Table 2: Non-target attributes and their values

Attribute     Attribute values
Outlook       Sunny, Overcast, Rain
Humidity      Continuous
Temperature   Continuous
Windy         True, False

Table 3: Target attribute and its values

Attribute          Attribute values
Playing-Decision   Yes, No
Table 4: Original weather dataset

Day   Outlook    Humidity   Temperature   Windy   Playing-Decision
1     Sunny         85          46        False         No
2     Sunny         88          45        True          No
3     Overcast      82          42        False         Yes
4     Rain          94          25        False         Yes
5     Rain          70          20        False         Yes
6     Rain          65          15        True          No
7     Overcast      66          14        True          Yes
8     Sunny         85          28        False         No
9     Sunny         70          15        False         Yes
10    Rain          72          36        False         Yes
11    Sunny         65          25        True          Yes
12    Overcast      90          22        True          Yes
13    Overcast      75          41        False         Yes
14    Rain          80          21        True          No
2) Phase 1
The attributes are discretized to reduce the range. The basic conditions used for discretizing the dataset
are given in table 5. The discretized dataset corresponding to the original dataset is shown in table 6.
Table 5: Conditions used for discretizing the dataset

Attribute          Basic conditions to discretize
Playing-Decision   Play (1): yes; Don't (0): no
Outlook            Sunny (1); Overcast (2); Rain (3)
Humidity           High (1): >= 75; Normal (2): < 75
Temperature        Hot (1): > 36; Mild (2): 20 < t <= 36; Cool (3): <= 20
Windy              True (1); False (2)
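Under the table-5 conditions, a single weather record can be discretized as follows. This is a sketch, not the authors' code; the function name is invented, and the exact mild-temperature boundary is an assumption chosen to be consistent with the discretized dataset in table 6:

```python
def discretize_row(outlook, humidity, temperature, windy, decision):
    """Apply the table-5 discretization conditions to one weather record."""
    out = {"sunny": 1, "overcast": 2, "rain": 3}[outlook]
    hum = 1 if humidity >= 75 else 2          # high / normal
    if temperature > 36:
        temp = 1                              # hot
    elif temperature > 20:
        temp = 2                              # mild
    else:
        temp = 3                              # cool
    wind = 1 if windy else 2
    play = 1 if decision == "yes" else 0
    return (out, hum, temp, wind, play)

# day 1 of the weather dataset: sunny, humidity 85, temperature 46, not windy, no
print(discretize_row("sunny", 85, 46, False, "no"))  # (1, 1, 1, 2, 0)
```

Applying this to every row of table 4 reproduces the discretized dataset shown in table 6.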
Table 6: Discretized form of the weather dataset

Day   Outlook   Humidity   Temperature   Windy   Playing-Decision
1        1         1            1          2            0
2        1         1            1          1            0
3        2         1            1          2            1
4        3         1            2          2            1
5        3         2            3          2            1
6        3         2            3          1            0
7        2         2            3          1            1
8        1         1            2          2            0
9        1         2            3          2            1
10       3         2            2          2            1
11       1         2            2          1            1
12       2         1            2          1            1
13       2         2            1          2            1
14       3         1            2          1            0
3) Phase 2
From the discretized dataset, five primary rules in IF-THEN form are generated by C4.5:
Rule 1: IF (Outlook = sunny) AND (Humidity <= 75) THEN (Playing-Decision = Yes).
Rule 2: IF (Outlook = sunny) AND (Humidity > 75) THEN (Playing-Decision = No).
Rule 3: IF (Outlook = overcast) THEN (Playing-Decision = Yes).
Rule 4: IF (Outlook = rainy) AND (Windy = True) THEN (Playing-Decision = No).
Rule 5: IF (Outlook = rainy) AND (Windy = False) THEN (Playing-Decision = Yes).
As C4.5 is a decision tree classifier, the tree generated by the classification is shown in figure 3.
The next step is to eliminate the IF-THEN clauses of the C4.5-generated rules, i.e., to discretize them
(table 7) so that the ABC algorithm can be applied to obtain a more optimized rule set.
Table 7: Discretized form of the C4.5-generated rules

         Outlook   Humidity   Temperature   Windy   Playing-Decision
Rule 1      1         2            *          *            1
Rule 2      1         1            *          *            0
Rule 3      2         *            *          *            1
Rule 4      3         *            *          1            0
Rule 5      3         *            *          2            1

('*' denotes a don't-care condition, i.e., the attribute has no bearing on that rule)
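Discretized rules like those in table 7 can be applied to an example with simple wildcard matching, where '*' conditions always match. This is an illustrative sketch, not the authors' code; the function names are invented:

```python
def rule_matches(conditions, example):
    """A discretized rule matches an example when every non-wildcard
    condition ('*' is don't-care) equals the example's attribute value."""
    return all(c == "*" or c == e for c, e in zip(conditions, example))

def classify(rules, example):
    """Return the class label of the first matching rule, else None."""
    for *conditions, label in rules:
        if rule_matches(conditions, example):
            return label
    return None

# the rules of table 7: (outlook, humidity, temperature, windy) -> decision
rules = [(1, 2, "*", "*", 1),
         (1, 1, "*", "*", 0),
         (2, "*", "*", "*", 1),
         (3, "*", "*", 1, 0),
         (3, "*", "*", 2, 1)]

# day 1 in discretized form: sunny, high humidity, hot, not windy
print(classify(rules, (1, 1, 1, 2)))  # 0: don't play
```

The same matching is what makes the rule fitness of equation 2 computable: n and m are counted by testing each rule's antecedent and consequent against every training example.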
4) Phase 3
The next step is to apply the ABC algorithm over the discretized rule set. As ABC is an optimization
algorithm, it generates a more optimized rule set, shown in table 8.
Table 8: Rule set generated by the ABC algorithm

         Outlook   Humidity   Temperature   Windy   Playing-Decision
Rule 1      3         *            *          *            1
Rule 2      *         2            *          *            1
Rule 3      3         *            *          2            1
Rule 4      1         1            *          *            0
Rule 5      3         *            *          1            0
C. Comparison
To analyze the performance of the proposed hybrid system DTABC, its accuracy is compared with that of
three other systems: C4.5, Naïve Bayes and DTGA. Of the four classifiers, DTABC and DTGA are hybrid
classifiers, while C4.5 and Naïve Bayes are single classifiers. The experiments are performed over 5
datasets of different domains and sizes. The accuracy of each classifier on each dataset is shown in
table 9.
Table 9: Comparative performance of C4.5, Naïve Bayes, DTGA and DTABC on the UCI datasets
(classification accuracy in %)

Dataset          C4.5      Naïve Bayes   DTGA      DTABC
Contact-Lenses   83.3333     70.8333     86.6667   93.33
Diabetes         73.8281     76.3021     76.7813   83.57
Glass            66.8224     48.5981     69.4953   74.84
Heart-Statlog    76.6667     64.2587     79.7333   85.87
Weather          64.2857     59.4286     66.8571   72
The graph which represents the comparative performance of the classifiers is shown in figure 4.
Figure 4 Comparative performance of classification accuracies (%).
From table 9, it is clear that the proposed DTABC system provides better classification accuracy than
C4.5, Naïve Bayes and DTGA.
V. CONCLUSIONS
From the survey of hybrid classifiers [12, 13, 14] and the proposed approach, it is clear that these works
aim to improve classification accuracy irrespective of domain, size and dimensionality. The proposed
approach has been evaluated over 5 datasets of different sizes and domains. The analysis shows that DTABC
produces an optimized rule set, which is reflected in its better classification accuracy compared with the
other classifiers taken for comparison. As DTGA is also a hybrid classifier but performs worse than DTABC,
the ABC algorithm is a better optimization algorithm than GA for producing a more optimized rule set
irrespective of domain, size and dimensionality.
REFERENCES
[1]. Sousa, T., Silva, A., & Neves, A. (2004). Particle swarm based data mining algorithms for
classification tasks. Parallel Computing, 30(5-6), 767-783.
[2]. Tan, K. C., Teoh, E. J., Yu, Q., & Goh, K. C. (2009). A hybrid evolutionary algorithm for attribute
selection in data mining. Expert Systems with Applications, 36(4), 8616-8630.
[3]. Sumida, B. H., Houston, A. I., McNamara, J. M., & Hamilton, W. D. (1990). Genetic algorithms and
evolution. Journal of Theoretical Biology, 147(1), 59-84.
[4]. Karaboga, D., & Basturk, B. (2007). A powerful and efficient algorithm for numerical function
optimization: artificial bee colony (ABC) algorithm. Journal of Global Optimization, 39(3), 459-471.
[5]. Nemati, S., Basiri, M. E., Ghasem-Aghaee, N., & Aghdam, M. H. (2009). A novel ACO-GA hybrid
algorithm for feature selection in protein function prediction. Expert Systems with Applications, 36(10),
12086-12094.
[6]. Kennedy, J. (2006). Swarm Intelligence (pp. 187-219).
[7]. Santos, H. G., Ochi, L. S., Marinho, E. H., & Drummond, L. M. A. (2006). Combining an evolutionary
algorithm with data mining to solve a single-vehicle routing problem. Neurocomputing, 70(1-3), 70-77.
[8]. Yeh, W.-C., Chang, W.-W., & Chung, Y. Y. (2009). A new hybrid approach for mining breast cancer
pattern using discrete particle swarm optimization and statistical method. Expert Systems with
Applications, 36(4), 8204-8211.
[9]. Bahamish, H. A. A., Abdullah, R., & Salam, R. A. (2009). Protein tertiary structure prediction using
artificial bee colony algorithm. In Third Asia International Conference on Modelling & Simulation
(AMS '09).
[10]. Karaboga, N. (2009). A new design method based on artificial bee colony algorithm for digital IIR
filters. Journal of the Franklin Institute, 346(4), 328-348.
[11]. Karaboga, D., & Akay, B. (2005). An artificial bee colony (ABC) algorithm on training artificial
neural networks (Technical Report TR06). Erciyes University, Engineering Faculty, Computer Engineering
Department.
[12]. Sarkar, B. K., Sana, S. S., & Chaudhari, K. (2012). A genetic algorithm based rule extraction
system. Applied Soft Computing, 12, 238-254.
[13]. Sarkar, B. K., Sana, S. S., & Choudhury, K. S. (2010). Accuracy-based learning classification
system. International Journal of Information and Decision Sciences, 2(1), 68-85.
[14]. Sarkar, B. K., & Sana, S. S. (2009). A hybrid approach to design efficient learning classifiers.
Computers and Mathematics with Applications, 58, 65-73.
[15]. Langseth, H., & Nielsen, T. D. (2006). Classification using hierarchical naive Bayes models.
Machine Learning, 63(2), 135-159.
[16]. Shukran, M. A. M., Chung, Y. Y., Yeh, W.-C., Wahid, N., & Zaidi, A. M. A. (2011). Artificial bee
colony based data mining algorithms for classification tasks. Modern Applied Science, 5(4).