Feature selection in high-dimensional datasets is considered a complex and time-consuming problem. To enhance classification accuracy and reduce execution time, Parallel Evolutionary Algorithms (PEAs) can be used. In this paper, we review the most recent works that apply PEAs to feature selection in large datasets. We classify the algorithms in these papers into four main classes: Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Scatter Search (SS), and Ant Colony Optimization (ACO). Accuracy is adopted as the measure for comparing the efficiency of these PEAs. Parallel Genetic Algorithms (PGAs) emerge as the most suitable algorithms for feature selection in large datasets, since they achieve the highest accuracy. On the other hand, we found that parallel ACO is time-consuming and less accurate compared with the other PEAs.
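Schemes like the PGAs surveyed here typically encode a candidate feature subset as a binary chromosome and evaluate the population's fitness (classification accuracy) in parallel. A minimal sketch under toy assumptions: the fitness function and the `RELEVANT` feature set are hypothetical stand-ins for a real classifier, and the GA parameters are illustrative.

```python
import random
from concurrent.futures import ThreadPoolExecutor

random.seed(0)
N_FEATURES = 20
RELEVANT = {2, 5, 11}  # hypothetical ground truth: only these features matter

def fitness(mask):
    """Stand-in for classifier accuracy: reward relevant features, penalize size."""
    return sum(mask[i] for i in RELEVANT) - 0.01 * sum(mask)

def crossover(a, b):
    cut = random.randrange(1, N_FEATURES)
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.05):
    return [1 - g if random.random() < rate else g for g in mask]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(30)]
with ThreadPoolExecutor() as pool:  # fitness of the whole population in parallel
    for _ in range(40):
        scores = list(pool.map(fitness, pop))
        ranked = [m for _, m in sorted(zip(scores, pop), reverse=True)]
        elite = ranked[: len(pop) // 2]               # keep the best half
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in elite]                # refill with offspring

best = max(pop, key=fitness)
print(sorted(i for i, g in enumerate(best) if g))
```

Since the elite half survives unchanged each generation, the best fitness is monotone non-decreasing; the parallelism only speeds up the evaluation step, it does not change the search.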
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Extended PSO algorithm for improvement problems of the k-means clustering algorithm - IJMIT JOURNAL
Clustering is an unsupervised process and one of the most common data mining techniques. Its purpose is to group similar data together, so that instances within a cluster are as similar to each other as possible and as different as possible from instances in other clusters. In this paper we focus on partitional k-means clustering, which, thanks to its ease of implementation and high speed on large data sets, remains very popular among clustering algorithms after more than 30 years. To address the problem of k-means becoming trapped in local optima, we propose an extended PSO algorithm named ECPSO. The new algorithm is able to escape local optima and, with high probability, produces the problem's optimal answer. The experimental results show that the proposed algorithm performs better than other clustering algorithms, especially on two indices: clustering accuracy and clustering quality.
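Hybrids of this kind usually encode a full set of k centroids as a single PSO particle and score it with the k-means objective; ECPSO itself is not reproduced here, but the encoding and the objective can be sketched as follows (the data and centroid values are made up for illustration):

```python
# toy 1-D data with two obvious groups
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]

def sse(centroids):
    """k-means objective: total squared distance to the nearest centroid."""
    return sum(min((x - c) ** 2 for c in centroids) for x in data)

good = [1.0, 5.0]   # a particle encoding k = 2 centroids
stuck = [3.0, 3.0]  # both centroids collapsed: a poor, local-optimum-like state
print(sse(good), sse(stuck))
```

A swarm of such centroid-set particles, moved by standard PSO velocity updates, can jump out of configurations like `stuck` that would trap a single k-means run.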
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT... - IJDKP
Many applications of automatic document classification require learning accurately with little training data. The semi-supervised classification technique uses labeled and unlabeled data for training. This technique has been shown to be effective in some cases; however, the use of unlabeled data is not always beneficial. On the other hand, the emergence of web technologies has given rise to the collaborative development of ontologies. In this paper, we propose the use of ontologies to improve the accuracy and efficiency of semi-supervised document classification. We used support vector machines, one of the most effective algorithms studied for text. Our algorithm enhances the performance of transductive support vector machines through the use of ontologies. We report experimental results applying our algorithm to three different datasets. Our experiments show an accuracy improvement of 4% on average, and up to 20%, in comparison with the traditional semi-supervised model.
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
Clustering is one of the most important applications of data mining and has attracted the attention of researchers in statistics and machine learning. It is used in many applications such as information retrieval, image processing, and social network analytics, and it helps the user understand the similarity and dissimilarity between objects. Cluster analysis lets users understand complex and large data sets more clearly. Various researchers have analyzed different types of clustering algorithms. K-means is the most popular partitioning-based algorithm, as it provides good results through accurate calculation on numerical data; however, K-means works well only on numerical data, while big data is a combination of numerical and categorical data. The K-prototype algorithm handles numerical as well as categorical data by combining distances calculated on the numeric and categorical attributes. With the growth of data from social networking websites, business transactions, scientific computation, and so on, there are vast collections of structured, semi-structured, and unstructured data, so K-prototype needs optimization to analyze these varieties of data efficiently. In this work, the K-prototype algorithm is implemented on MapReduce. Experiments show that K-prototype on MapReduce gives better performance on multiple nodes than on a single node; CPU execution time and speedup are used as evaluation metrics. An intelligent splitter is also proposed, which splits mixed big data into numerical and categorical parts. Comparison with traditional algorithms shows that the proposed algorithm works better for large-scale data.
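The mixed-type dissimilarity at the heart of k-prototypes combines squared Euclidean distance on numeric attributes with a weighted count of categorical mismatches. A minimal sketch; the records, attribute indices, and weight `gamma` are illustrative, and the paper's MapReduce implementation and splitter are not reproduced.

```python
def kprototype_distance(x, y, numeric_idx, categorical_idx, gamma=1.0):
    """K-prototype dissimilarity: squared Euclidean distance on numeric
    attributes plus gamma times the number of mismatched categorical ones."""
    num = sum((x[i] - y[i]) ** 2 for i in numeric_idx)
    cat = sum(x[i] != y[i] for i in categorical_idx)
    return num + gamma * cat

# hypothetical mixed records: (age, income, city, plan)
a = (34, 52.0, "delhi", "gold")
b = (30, 50.0, "delhi", "silver")
print(kprototype_distance(a, b, numeric_idx=(0, 1), categorical_idx=(2, 3)))
# (34-30)^2 + (52-50)^2 + gamma * 1 mismatch = 16 + 4 + 1 = 21.0
```

In a MapReduce setting the mapper would assign each record to its nearest prototype with this distance, and the reducer would recompute prototypes (means for numeric attributes, modes for categorical ones).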
Using particle swarm optimization to solve test functions problems - riyaniaes
In this paper, benchmark functions are used to evaluate and check the particle swarm optimization (PSO) algorithm. The functions used are two-dimensional but were selected with different difficulty levels and different models. To demonstrate the capability of PSO, it is compared with a genetic algorithm (GA); the two algorithms are compared in terms of objective function values and standard deviation. Multiple runs were taken to obtain convincing results, the parameters were chosen carefully, and the MATLAB software was used. The results suggest that PSO can solve engineering problems of different dimensions and outperforms the alternative in accuracy and speed of convergence.
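A minimal PSO of the kind evaluated on such benchmarks can be sketched in a few lines. The 2-D sphere function stands in for the paper's test functions, and the inertia and acceleration coefficients are common textbook values, not the paper's settings:

```python
import random

random.seed(42)

def sphere(p):
    """2-D sphere benchmark; global minimum 0 at (0, 0)."""
    return p[0] ** 2 + p[1] ** 2

DIM, SWARM, ITERS = 2, 20, 200
pos = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(SWARM)]
vel = [[0.0] * DIM for _ in range(SWARM)]
pbest = [p[:] for p in pos]               # each particle's best position so far
gbest = min(pos, key=sphere)[:]           # the swarm's best position so far

for _ in range(ITERS):
    for i in range(SWARM):
        for d in range(DIM):
            vel[i][d] = (0.72 * vel[i][d]                                     # inertia
                         + 1.49 * random.random() * (pbest[i][d] - pos[i][d])  # cognitive
                         + 1.49 * random.random() * (gbest[d] - pos[i][d]))    # social
            pos[i][d] += vel[i][d]
        if sphere(pos[i]) < sphere(pbest[i]):
            pbest[i] = pos[i][:]
            if sphere(pos[i]) < sphere(gbest):
                gbest = pos[i][:]

print(sphere(gbest))  # should be very close to the known optimum of 0
```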
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR... - csandit
Software testing is a primary phase performed during software development; it is carried out by running sequences of test inputs and comparing against expected outputs. The Harmony Search (HS) algorithm is based on the improvisation process of music. In comparison with other algorithms, HS has gained popularity and shown superiority in the field of evolutionary computation. When musicians compose a harmony from different possible combinations of pitches, the pitches are stored in the harmony memory, and optimization is done by adjusting the input pitches to generate a better harmony. The test case generation process is used to identify test cases within resource limits and to identify critical domain requirements. In this paper, the role of the Harmony Search meta-heuristic is analyzed in generating random test data and optimizing those test data. Test data are generated and optimized for a case study, a withdrawal task in a bank ATM, using Harmony Search. It is observed that the algorithm generates suitable test cases and test data; the paper also gives brief details about the Harmony Search method as used for test data generation and optimization.
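The improvisation loop described above (harmony memory, memory consideration, pitch adjustment, replacement of the worst harmony) can be sketched as follows; the objective, parameter values, and search range are illustrative, not the paper's ATM case study:

```python
import random

random.seed(7)

def cost(x):
    """Toy objective standing in for a test-data fitness: distance from 3."""
    return abs(x - 3.0)

HMS, HMCR, PAR, BW, ITERS = 10, 0.9, 0.3, 0.5, 500
memory = [random.uniform(-10, 10) for _ in range(HMS)]   # harmony memory

for _ in range(ITERS):
    if random.random() < HMCR:                 # memory consideration
        new = random.choice(memory)
        if random.random() < PAR:              # pitch adjustment within bandwidth
            new += random.uniform(-BW, BW)
    else:                                      # a completely random pitch
        new = random.uniform(-10, 10)
    worst = max(range(HMS), key=lambda i: cost(memory[i]))
    if cost(new) < cost(memory[worst]):        # replace the worst harmony
        memory[worst] = new

best = min(memory, key=cost)
print(best)  # should approach the optimum at 3.0
```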
A Genetic Algorithm on Optimization Test Functions - IJMERJOURNAL
ABSTRACT: Genetic Algorithms (GAs) have become increasingly useful over the years for solving combinatorial problems. Though they are generally accepted to be good performers among metaheuristic algorithms, most works have concentrated on the application of the GAs rather than the theoretical justifications. In this paper, we examine and justify the suitability of Genetic Algorithms in solving complex, multi-variable and multi-modal optimization problems. To achieve this, a simple Genetic Algorithm was used to solve four standard complicated optimization test functions, namely Rosenbrock, Schwefel, Rastrigin and Shubert functions. These functions are benchmarks to test the quality of an optimization procedure towards a global optimum. We show that the method has a quicker convergence to the global optima and that the optimal values for the Rosenbrock, Rastrigin, Schwefel and Shubert functions are zero (0), zero (0), -418.9829 and -14.5080 respectively
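The claimed optima for these benchmarks can be checked directly from their standard definitions; note that the -418.9829 figure for Schwefel is the per-dimension optimum of the -x·sin(√|x|) form of the function:

```python
import math

def rosenbrock(x, y):
    """2-D Rosenbrock; global minimum 0 at (1, 1)."""
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rastrigin(xs):
    """Rastrigin; global minimum 0 at the origin."""
    return 10 * len(xs) + sum(x * x - 10 * math.cos(2 * math.pi * x) for x in xs)

def schwefel_1d(x):
    """One dimension of Schwefel in the -x*sin(sqrt(|x|)) form."""
    return -x * math.sin(math.sqrt(abs(x)))

print(rosenbrock(1.0, 1.0))     # 0.0
print(rastrigin([0.0, 0.0]))    # 0.0
print(schwefel_1d(420.968746))  # about -418.9829 per dimension
```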
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm - aciijournal
In this paper we attempt to solve the automatic clustering problem by concurrently optimizing multiple objectives, namely automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent optimization algorithm Jaya as its underlying optimization strategy. This evolutionary technique aims to attain the global best solution rather than a local best, even on larger datasets. The exploration and exploitation imposed on the proposed method detect the number of clusters automatically, find appropriate partitionings of the data sets, and reach near-optimal values on the CVI frontiers. Twelve datasets of differing complexity are used to validate the performance of the proposed algorithm. The experiments show that the theoretical advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance benefits.
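Jaya's distinguishing update rule moves each solution toward the current best and away from the current worst, with no algorithm-specific parameters. A minimal single-objective sketch; the sphere objective and all sizes are illustrative, and the paper's multi-objective CVI setup is not reproduced:

```python
import random

random.seed(3)

def cost(x):
    """Toy single objective (sphere); the paper optimizes CVIs instead."""
    return sum(v * v for v in x)

POP, DIM, ITERS = 12, 3, 200
pop = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(POP)]

for _ in range(ITERS):
    best = min(pop, key=cost)
    worst = max(pop, key=cost)
    for i, x in enumerate(pop):
        # Jaya rule: move toward the best and away from the worst solution
        cand = [x[j]
                + random.random() * (best[j] - abs(x[j]))
                - random.random() * (worst[j] - abs(x[j]))
                for j in range(DIM)]
        if cost(cand) < cost(x):   # greedy acceptance keeps only improvements
            pop[i] = cand

best_found = min(pop, key=cost)
print(cost(best_found))  # decreases toward 0
```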
A H-K clustering algorithm for high dimensional data using ensemble learning - ijitcs
Advances over the traditional clustering algorithms address problems such as the curse of dimensionality and the sparsity of data with many attributes. The traditional H-K clustering algorithm resolves the randomness and a-priori choice of the initial centers in the K-means clustering algorithm, but when applied to high-dimensional data it suffers from the dimensional-disaster problem due to its high computational complexity. Advanced clustering algorithms such as subspace and ensemble clustering improve performance on high-dimensional datasets from different aspects and to different extents, yet each improves performance from only a single perspective. The objective of the proposed model is to improve on traditional H-K clustering and overcome its limitations, namely high computational complexity and poor accuracy on high-dimensional data, by combining three different approaches: a subspace clustering algorithm and an ensemble clustering algorithm together with the H-K clustering algorithm.
BINARY SINE COSINE ALGORITHMS FOR FEATURE SELECTION FROM MEDICAL DATA - acijjournal
A well-constructed classification model depends heavily on the input feature subset of a dataset, which may contain redundant, irrelevant, or noisy features. This challenge can be worse when dealing with medical datasets. The main aim of feature selection as a pre-processing task is to eliminate these features and select the most effective ones. In the literature, metaheuristic algorithms have shown successful performance in finding optimal feature subsets. In this paper, two binary metaheuristic algorithms named the S-shaped binary Sine Cosine Algorithm (SBSCA) and the V-shaped binary Sine Cosine Algorithm (VBSCA) are proposed for feature selection from medical data. In these algorithms, the search space remains continuous, while a binary position vector is generated for each solution by two transfer functions, S-shaped and V-shaped. The proposed algorithms are compared with four recent binary optimization algorithms on five medical datasets from the UCI repository. The experimental results confirm that both bSCA variants enhance classification accuracy on these medical datasets compared with the four other algorithms.
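The S-shaped and V-shaped transfer functions map a continuous search value to a probability used to produce a binary feature mask. A sketch of the two families; the sample vector is illustrative, and in V-shaped variants the probability is often used to flip the current bit rather than to set it, a detail omitted here:

```python
import math
import random

random.seed(5)

def s_shaped(v):
    """S-shaped transfer: sigmoid of the continuous position value."""
    return 1.0 / (1.0 + math.exp(-v))

def v_shaped(v):
    """V-shaped transfer: |tanh(v)|, symmetric around zero."""
    return abs(math.tanh(v))

def binarize(x, transfer):
    """Map a continuous solution to a binary feature mask, bit by bit."""
    return [1 if random.random() < transfer(v) else 0 for v in x]

x = [-2.0, 0.0, 3.0]
print(binarize(x, s_shaped), binarize(x, v_shaped))
```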
An Automatic Clustering Technique for Optimal Clusters - IJCSEA Journal
This paper proposes a simple, automatic and efficient clustering algorithm, namely, Automatic Merging for Optimal Clusters (AMOC) which aims to generate nearly optimal clusters for the given datasets automatically. The AMOC is an extension to standard k-means with a two phase iterative procedure combining certain validation techniques in order to find optimal clusters with automation of merging of clusters. Experiments on both synthetic and real data have proved that the proposed algorithm finds nearly optimal clustering structures in terms of number of clusters, compactness and separation.
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex... - CSCJournals
Biclusters are needed to analyze gene expression patterns: comparing rows of an expression profile analyzes genes, while comparing columns of the gene expression matrix analyzes the expression profiles of samples. In biclustering we therefore need to cluster both genes and samples. The algorithm presented in this paper is based on a two-way clustering approach in which the genes and samples are clustered using parallel fuzzy C-means with the Message Passing Interface; we call it MFCM. MFCM clusters genes and samples by maximizing the membership function values over the data set. It is a parallelized rework of a fuzzy two-way clustering algorithm for microarray gene expression data [9], designed to study the efficiency and parallelization improvement of the algorithm. The algorithm uses a gene entropy measure to filter the clustered data and find biclusters, and it is able to obtain highly correlated biclusters of the gene expression dataset.
Applying genetic algorithms to information retrieval using vector space model - IJCSEA Journal
Genetic algorithms are often used in information retrieval systems (IRSs) to enhance the retrieval process and to increase the efficiency of retrieval, so as to meet users' needs and help them find exactly what they want among the growing volumes of available information. Improved adaptive genetic algorithms help retrieve the information a user needs accurately, reducing the number of irrelevant files retrieved and excluding them from the results. In this study, the researcher explored the problems embedded in this process and attempted to find solutions, such as the choice of mutation probability and fitness function. The Cranfield English corpus test collection on mathematics was chosen; the collection was compiled by Cyril Cleverdon and used at the University of Cranfield in 1960, and it contains 1400 documents and 225 queries for simulation purposes. The researcher also used cosine similarity and the Jaccard coefficient to compute the similarity between queries and documents, and used two proposed adaptive fitness functions, mutation operators, and adaptive crossover. The process aimed at evaluating the effectiveness of the results according to the measures of precision and recall. Finally, the study concluded that adaptive genetic algorithms may yield several improvements.
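The two similarity measures mentioned, cosine and Jaccard, can be sketched over term-weight vectors as follows (the query and document vectors are made-up examples):

```python
import math

def cosine(q, d):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(q[t] * d.get(t, 0.0) for t in q)
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def jaccard(q, d):
    """Jaccard coefficient on the sets of terms."""
    a, b = set(q), set(d)
    return len(a & b) / len(a | b) if a | b else 0.0

query = {"genetic": 1.0, "retrieval": 1.0}
doc = {"genetic": 2.0, "algorithm": 1.0, "retrieval": 1.0}
print(cosine(query, doc), jaccard(query, doc))
# cosine = 3 / (sqrt(2) * sqrt(6)) ~= 0.866, jaccard = 2/3
```

In a GA-based IRS, scores like these feed the fitness function that ranks candidate queries or document orderings.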
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ... - aciijournal
The present work considers minimizing a bi-criteria function, a weighted sum of makespan and total completion time, for a multiprocessor task scheduling problem. The genetic algorithm is an appealing choice for many NP-hard problems, including multiprocessor task scheduling. The performance of a genetic algorithm depends on the quality of its initial solution, as a good initial solution yields better results. Hybrid genetic algorithms (HGAs) based on different list scheduling heuristics have been proposed and developed for the problem. A computational analysis using a defined performance index has been conducted on standard task scheduling problems to evaluate the performance of the proposed HGAs. The analysis shows that ETF-GA is quite efficient and the best among the heuristic-based hybrid genetic algorithms in terms of solution quality, especially for large and complex problems.
Feature selection using modified particle swarm optimisation for face recogni... - eSAT Journals
Abstract
One of the major factors influencing classification accuracy is the selection of the right features. Not all features play a vital role in classification; many features in a dataset may be redundant or irrelevant, which increases computational cost and may reduce the classification rate. In this paper, we use DCT (Discrete Cosine Transform) coefficients as features for a face recognition application. The coefficients are optimally selected by a modified PSO algorithm, in which the choice of coefficients incorporates the average of the mean normalized standard deviations of the various classes and gives more weight to the lower-indexed DCT coefficients. The algorithm is tested on the ORL database. A recognition rate of 97% is obtained; the average number of features selected is about 40 percent for a 10 × 10 input, and the modified PSO took about 50 iterations to converge. These figures compare favourably with some of the work reported in the literature.
Keywords: Particle swarm optimization, Discrete cosine transform, feature extraction, feature selection, face recognition, classification rate.
The pertinent single-attribute-based classifier for small datasets classific... - IJECEIAES
Classifying a dataset with machine learning algorithms can be a big challenge when the target is a small dataset. The OneR classifier can be used in such cases thanks to its simplicity and efficiency. In this paper, we reveal the power of a single attribute by introducing the pertinent single-attribute-based heterogeneity-ratio classifier (SAB-HR), which uses one pertinent attribute to classify small datasets. SAB-HR applies a feature selection method that uses the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute among the attributes in the set. Our empirical results on 12 benchmark datasets from the UCI machine learning repository show that the SAB-HR classifier significantly outperforms the classical OneR classifier on small datasets. In addition, using the H-Ratio as the feature selection criterion for choosing the single attribute is more effective than traditional criteria such as Information Gain (IG) and Gain Ratio (GR).
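Classical OneR, which SAB-HR is compared against, builds a one-level rule per attribute and keeps the attribute whose rule misclassifies the fewest rows; the H-Ratio criterion itself is specific to the paper and not reproduced here. A minimal sketch on made-up weather-style data:

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """OneR: for each attribute build a rule (value -> majority class) and
    keep the attribute whose rule has the lowest training error."""
    n_attrs = len(rows[0])
    best_attr, best_rule, best_err = None, None, len(labels) + 1
    for a in range(n_attrs):
        buckets = defaultdict(Counter)
        for row, y in zip(rows, labels):
            buckets[row[a]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in buckets.items()}
        err = sum(y != rule[row[a]] for row, y in zip(rows, labels))
        if err < best_err:
            best_attr, best_rule, best_err = a, rule, err
    return best_attr, best_rule

# hypothetical rows: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["no", "no", "yes", "yes"]
attr, rule = one_r(rows, labels)
print(attr, rule)  # attribute 0 predicts perfectly: sunny -> no, rain -> yes
```

SAB-HR keeps this single-attribute structure but, per the abstract, picks the attribute with the H-Ratio measure instead of raw training error.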
DGBSA: A BATCH JOB SCHEDULING ALGORITHM WITH GA WITH REGARD TO THE THRESHOLD ... - IJCSEA Journal
In this paper we present a scheduler for batch jobs that uses a GA together with a threshold detector. The proposed algorithm schedules batches of independent jobs with a new technique so that the schedule can be optimized: a threshold detector selects jobs, and the processing resources then process the selected batch jobs by priority. The ordering of tasks within each batch is determined using the DGBSA algorithm. Building on previous work, we add specific parameters to the fitness functions of earlier algorithms to develop the optimized fitness function used in the proposed algorithm. According to our assessment, DGBSA performs better than comparable algorithms; the effective parameters used in the proposed algorithm reduce the total wasted time compared with previous algorithms, and the algorithm improves on earlier problems in batch processing with a new technique.
Grid computing can involve many computational tasks, which require trustworthy computational nodes. Load balancing in grid computing is a technique that optimizes the overall process of assigning computational tasks to processing nodes. Grid computing is a form of distributed computing, but it differs from conventional distributed computing in that it tends to be heterogeneous, more loosely coupled, and geographically dispersed. Optimizing this process means maximizing overall resource utilization, with a balanced load on each processing unit, while decreasing the overall time. Evolutionary algorithms such as genetic algorithms have been studied for implementing load balancing across grid networks, but these genetic algorithms are quite slow when large numbers of tasks need to be processed. In this paper we give a novel parallel genetic algorithm approach for enhancing the overall performance and optimization of the load balancing process across the grid nodes.
Presenting an Algorithm for Tasks Scheduling in Grid Environment along with I... - Editor IJCATR
Nowadays we face huge amounts of data: with the expansion of computing technology and sensors, terabytes of data are produced. To respond to this demand, grid computing is considered one of the most important research fields. Grid technology and concepts were used to provide resource sharing between scientific units, with the purpose of using the resources of a grid environment to solve complex problems. In this paper, a new algorithm based on a Mamdani fuzzy system is proposed for task scheduling in a computing grid. The Mamdani fuzzy algorithm is a technique that measures criteria using membership functions; in this paper, the considered criterion is response time. The results of the proposed algorithm implemented on grid systems indicate the superiority of the proposed method in terms of validation criteria for scheduling algorithms, such as task completion time; efficiency also increases considerably.
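Mamdani systems grade each criterion with membership functions and combine rule antecedents, commonly with min. A minimal sketch; the "medium response time" term, its breakpoints, and the rule are hypothetical, not the paper's rule base:

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# hypothetical linguistic term for a task's expected response time (ms)
def medium(t):
    return triangular(t, 20, 50, 80)

def rule_strength(load_mu, time_mu):
    """Mamdani-style firing strength: min over the antecedent memberships."""
    return min(load_mu, time_mu)

print(medium(50), medium(35), rule_strength(0.8, medium(35)))
```

A full Mamdani scheduler would fire several such rules, clip or scale each rule's output fuzzy set by its strength, aggregate, and defuzzify (for example by centroid) to get a priority for each task.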
Parallel and distributed genetic algorithm with multiple objectives to impro... - khalil IBRAHIM
We argue that the timetabling problem reflects the problem of scheduling university courses: a range of time periods and a group of instructors must be assigned to a set of lectures so that a set of hard constraints is satisfied and the cost of violating soft constraints is reduced. This is an NP-hard problem, which means, informally, that the operations necessary to solve it grow exponentially with the size of the problem. Constructing a timetable is among the most complicated problems facing many universities, and it grows harder with the size of the university's data and the overlap of disciplines between colleges. When a traditional evolutionary algorithm (EA) is unable to provide satisfactory results, a distributed EA (dEA), which deploys the population on distributed systems, offers an opportunity to solve extremely high-dimensional problems through distributed coevolution using a divide-and-conquer mechanism. Further, the distributed environment allows a dEA to maintain population diversity, thereby avoiding local optima and facilitating multi-objective search. By employing different distribution models to parallelize the processing of EAs, we designed a genetic algorithm suitable for the university environment and the constraints it faces when building a timetable for lectures.
Nature Inspired Models And The Semantic Web (Stefan Ceriu)
In this paper we present a series of nature inspired models used as alternative solutions for Semantic Web concerns. Some of the methods presented in this article perform better than classic algorithms by enhancing response time and computational costs. Others are just proof of concept, first steps towards new techniques that will improve their respective field. The intricate nature of the Semantic Web urges the need for faster, more intelligent algorithms and nature inspired models have been proven to be more than suitable for such complex tasks.
A Genetic Algorithm on Optimization Test Functions (IJMERJOURNAL)
ABSTRACT: Genetic Algorithms (GAs) have become increasingly useful over the years for solving combinatorial problems. Though they are generally accepted to be good performers among metaheuristic algorithms, most works have concentrated on the application of GAs rather than their theoretical justification. In this paper, we examine and justify the suitability of Genetic Algorithms for solving complex, multi-variable and multi-modal optimization problems. To achieve this, a simple Genetic Algorithm was used to solve four standard complicated optimization test functions, namely the Rosenbrock, Schwefel, Rastrigin and Shubert functions. These functions are benchmarks for testing the quality of an optimization procedure at reaching a global optimum. We show that the method converges quickly to the global optima and that the optimal values for the Rosenbrock, Rastrigin, Schwefel and Shubert functions are zero (0), zero (0), -418.9829 and -14.5080 respectively.
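Two of the benchmark functions named above have simple closed forms that are easy to verify at their known optima; a short sketch (our definitions, not the paper's code) shows why they make good test cases: Rosenbrock has a narrow curved valley, while Rastrigin is densely multimodal.

```python
import math

def rosenbrock(x, y):
    """Rosenbrock 'banana' function; global minimum 0 at (1, 1)."""
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rastrigin(xs):
    """Rastrigin function; highly multimodal, global minimum 0 at the origin."""
    return 10 * len(xs) + sum(x ** 2 - 10 * math.cos(2 * math.pi * x)
                              for x in xs)

print(rosenbrock(1.0, 1.0))   # 0.0 at the global optimum
print(rastrigin([0.0, 0.0]))  # 0.0 at the global optimum
```

A GA's fitness function for minimization would simply be the negative (or reciprocal-style transform) of these values; the many local minima of Rastrigin are what stress-test the algorithm's escape behavior.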
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm (aciijournal)
In this paper we attempt to solve an automatic clustering problem by concurrently optimizing multiple objectives such as automatic k-determination and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent optimization algorithm Jaya as its underlying optimization stratagem. This evolutionary technique aims to attain a global best solution rather than a local best solution in larger datasets. The exploration and exploitation imposed on the proposed work result in detecting the number of clusters automatically, finding the appropriate partitioning present in the data sets, and reaching near-optimal values on the CVI frontiers. Twelve datasets of different intricacy are used to endorse the performance of the proposed algorithm. The experiments lay bare that the theoretical advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance paybacks.
A h k clustering algorithm for high dimensional data using ensemble learning (ijitcs)
Advances made to traditional clustering algorithms solve various problems such as the curse of
dimensionality and the sparsity of data across multiple attributes. The traditional H-K clustering algorithm can
resolve the randomness and a priori choice of the initial centers in the K-means clustering algorithm. But when
applied to high-dimensional data, it suffers from the dimensional-disaster problem due to its high computational
complexity. Advanced clustering algorithms such as subspace and ensemble clustering
improve the performance of clustering high-dimensional datasets from different aspects and to different extents,
yet each improves performance from only a single perspective. The objective of the
proposed model is to improve the performance of traditional H-K clustering and overcome its limitations,
such as high computational complexity and poor accuracy on high-dimensional data, by combining
three different approaches: subspace clustering and ensemble clustering with H-K clustering.
BINARY SINE COSINE ALGORITHMS FOR FEATURE SELECTION FROM MEDICAL DATA (acijjournal)
A well-constructed classification model depends highly on the input feature subsets of a dataset, which may contain redundant, irrelevant, or noisy features. This challenge can be even worse when dealing with medical datasets. The main aim of feature selection as a pre-processing task is to eliminate these features and select the most effective ones. In the literature, metaheuristic algorithms show successful performance in finding optimal feature subsets. In this paper, two binary metaheuristic algorithms, named the S-shaped binary Sine Cosine Algorithm (SBSCA) and the V-shaped binary Sine Cosine Algorithm (VBSCA), are proposed for feature selection from medical data. In these algorithms the search space remains continuous, while a binary position vector is generated for each solution by two transfer functions, S-shaped and V-shaped. The proposed algorithms are compared with four recent binary optimization algorithms on five medical datasets from the UCI repository. The experimental results confirm that both bSCA variants enhance classification accuracy on these medical datasets compared with the four other algorithms.
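The transfer functions mentioned above map a continuous particle position to a bit-flip probability, which is how a continuous optimizer selects a discrete feature subset. The sketch below uses one common choice for each shape (sigmoid for S-shaped, |tanh| for V-shaped); these are illustrative forms, not necessarily the exact ones used in the paper.

```python
import math
import random

def s_shaped(x):
    """S-shaped transfer: maps a continuous position to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x):
    """One common V-shaped transfer: |tanh(x)|, also in [0, 1)."""
    return abs(math.tanh(x))

def binarize(position, transfer, rng):
    """Turn a continuous position vector into a binary feature mask:
    bit i is 1 (feature selected) with probability transfer(position[i])."""
    return [1 if rng.random() < transfer(x) else 0 for x in position]

rng = random.Random(42)
pos = [2.5, -2.5, 0.0]       # one continuous "solution" over 3 features
print(binarize(pos, s_shaped, rng))
```

With the S-shaped form, a strongly positive coordinate almost always selects its feature and a strongly negative one almost never does, so the continuous search space carries over directly to feature subsets.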
An Automatic Clustering Technique for Optimal Clusters (IJCSEA Journal)
This paper proposes a simple, automatic and efficient clustering algorithm, namely, Automatic Merging for Optimal Clusters (AMOC) which aims to generate nearly optimal clusters for the given datasets automatically. The AMOC is an extension to standard k-means with a two phase iterative procedure combining certain validation techniques in order to find optimal clusters with automation of merging of clusters. Experiments on both synthetic and real data have proved that the proposed algorithm finds nearly optimal clustering structures in terms of number of clusters, compactness and separation.
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex... (CSCJournals)
Biclusters are required for analyzing gene expression patterns: genes are compared via the rows of expression profiles, and the expression profiles of samples via the columns of the gene expression matrix. In the process of biclustering we need to cluster both genes and samples. The algorithm presented in this paper is based on the two-way clustering approach, in which genes and samples are clustered using parallel fuzzy C-means clustering with the message passing interface; we call it MFCM. MFCM is applied to cluster genes and samples so as to maximize the membership function values of the data set. It is a parallelized rework of a fuzzy two-way clustering algorithm for microarray gene expression data [9], intended to study the efficiency and parallelization improvement of the algorithm. The algorithm uses a gene entropy measure to filter the clustered data and find biclusters, and is able to obtain highly correlated biclusters of the gene expression dataset.
Applying genetic algorithms to information retrieval using vector space model (IJCSEA Journal)
Genetic algorithms are often used in information retrieval (IR) systems to enhance the retrieval process and to increase the efficiency of optimal information retrieval, helping users find exactly what they want among the growing volume of available information. The improvement of adaptive genetic algorithms helps to retrieve the information the user needs accurately, reducing the retrieval of irrelevant files and excluding them from the results. In this study, the researcher explored the problems embedded in this process, attempted solutions such as the choice of mutation probability and fitness function, and chose the Cranfield English Corpus test collection on mathematics. The collection was compiled by Cyril Cleverdon and used at the University of Cranfield in 1960, containing 1400 documents and 225 queries for simulation purposes. The researcher also used cosine similarity and Jaccard coefficients to compute the similarity between the query and documents, and used two proposed adaptive fitness functions, mutation operators, and adaptive crossover. The process aimed at evaluating the effectiveness of results according to precision and recall. Finally, the study concluded that adaptive genetic algorithms may yield several improvements.
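The two similarity measures named in the abstract have standard definitions in the vector space model; a short sketch (ours, for illustration) shows both, scoring a query against a document:

```python
import math

def cosine_similarity(q, d):
    """Cosine of the angle between query and document term-weight vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

def jaccard(q_terms, d_terms):
    """Jaccard coefficient on the sets of terms: |intersection| / |union|."""
    q, d = set(q_terms), set(d_terms)
    return len(q & d) / len(q | d) if q | d else 0.0

print(cosine_similarity([1, 1, 0], [1, 1, 1]))                     # ~0.816
print(jaccard(["vector", "space"], ["vector", "space", "model"]))  # ~0.667
```

In a GA-based IR setup, scores like these typically feed the fitness function, so a chromosome (e.g. a set of query term weights) is rewarded when it ranks relevant documents above irrelevant ones.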
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ... (aciijournal)
The present work considers the minimization of a bi-criteria objective comprising the weighted sum of makespan and total completion time for a multiprocessor task scheduling problem. The genetic algorithm is an appealing choice for NP-hard problems such as multiprocessor task scheduling. The performance of a genetic algorithm depends on the quality of the initial solution, as a good initial solution yields better results. Different list-scheduling-heuristic-based hybrid genetic algorithms (HGAs) have been proposed and developed for the problem. Computational analysis using a defined performance index has been conducted on standard task scheduling problems to evaluate the performance of the proposed HGAs. The analysis shows that the ETF-GA is quite efficient and the best among the heuristic-based hybrid genetic algorithms in terms of solution quality, especially for large and complex problems.
Feature selection using modified particle swarm optimisation for face recogni... (eSAT Journals)
Abstract
One of the major factors influencing classification accuracy is the selection of the right features. Not all features play a vital role in classification; many features in a dataset may be redundant or irrelevant, which increases the computational cost and may reduce the classification rate. In this paper, we use DCT (Discrete Cosine Transform) coefficients as features for a face recognition application. The coefficients are optimally selected by a modified PSO algorithm, in which the choice of coefficients incorporates the average of the mean normalized standard deviations of the various classes and gives more weight to the lower-indexed DCT coefficients. The algorithm is tested on the ORL database. A recognition rate of 97% is obtained, with about 40 percent of features selected on average for a 10 × 10 input. The modified PSO took about 50 iterations to converge. These performance figures are better than some of the work reported in the literature.
Keywords: Particle swarm optimization, Discrete cosine transform, feature extraction, feature selection, face recognition, classification rate.
The pertinent single-attribute-based classifier for small datasets classific... (IJECEIAES)
Classifying a dataset using machine learning algorithms can be a big challenge when the target is a small dataset. The OneR classifier can be used in such cases due to its simplicity and efficiency. In this paper, we reveal the power of a single attribute by introducing the pertinent single-attribute-based-heterogeneity-ratio classifier (SAB-HR), which uses a pertinent attribute to classify small datasets. The SAB-HR applies a feature selection method based on the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute among the attributes in the set. Our empirical results on 12 benchmark datasets from the UCI machine learning repository show that the SAB-HR classifier significantly outperforms the classical OneR classifier for small datasets. In addition, using the H-Ratio as a feature selection criterion for selecting the single attribute is more effective than traditional criteria such as Information Gain (IG) and Gain Ratio (GR).
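The baseline the SAB-HR is compared against, OneR, is simple enough to sketch in a few lines: for each attribute, build a rule mapping each attribute value to its majority class, then keep the single attribute whose rule makes the fewest training errors. This is a generic illustration of classic OneR, not the paper's SAB-HR or H-Ratio criterion.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Classic OneR: for each attribute, map each of its values to the
    majority class; keep the single attribute whose rule errs least."""
    best = None
    for a in range(len(rows[0])):
        value_classes = defaultdict(Counter)
        for row, y in zip(rows, labels):
            value_classes[row[a]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in value_classes.items()}
        errors = sum(rule[row[a]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best  # (attribute index, value -> class rule, training errors)

# Toy data: attribute 0 separates the classes perfectly, attribute 1 does not.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
attr, rule, errs = one_r(rows, labels)
print(attr, errs)  # attribute 0 with 0 training errors
```

A measure like the H-Ratio would replace the raw error count as the criterion for choosing the single attribute; the rule structure itself stays the same.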
DGBSA : A BATCH JOB SCHEDULING ALGORITHM WITH GA WITH REGARD TO THE THRESHOLD ... (IJCSEA Journal)
In this paper, we provide a scheduler for batch jobs using a GA with a threshold detector. The proposed algorithm schedules batches of independent jobs with a new technique so that the schedule can be optimized: a threshold detector first selects jobs, and the processing resources then process the selected batch jobs by priority. The hierarchy of tasks in each batch is determined using the DGBSA algorithm. Building on previous work, we add specific parameters to the fitness functions of earlier algorithms to develop the optimized fitness function used in the proposed algorithm. According to our assessment, DGBSA performs better than similar algorithms: the effective parameters used in the proposed algorithm reduce the total wasted time compared with previous algorithms, and the algorithm improves on previous problems in batch processing with a new technique.
Grid computing can involve many computational tasks, which require trustworthy computational nodes. Load balancing in grid computing is a technique that optimizes the overall process of assigning computational tasks to processing nodes. Grid computing is a form of distributed computing, but it differs from conventional distributed computing in that it tends to be heterogeneous, more loosely coupled, and geographically dispersed. Optimizing this process means maximizing overall resource utilization with a balanced load on each processing unit while decreasing the overall time. Evolutionary algorithms such as genetic algorithms have been studied for implementing load balancing across grid networks, but the problem with these genetic algorithms is that they are quite slow when a large number of tasks must be processed. In this paper we give a novel approach based on parallel genetic algorithms for enhancing the overall performance and optimization of managing load balancing across grid nodes.
Presenting an Algorithm for Tasks Scheduling in Grid Environment along with I... (Editor IJCATR)
Nowadays, humans face huge amounts of data: with the expansion of computer technology and sensing devices, terabytes of data are
produced. To respond to this demand, grid computing is considered one of the most important research fields. Grid technology
and concepts were used to provide resource sharing between scientific units, the purpose being to use the resources of a grid environment
to solve complex problems.
In this paper, a new algorithm based on the Mamdani fuzzy system is proposed for task scheduling in a computing grid. The Mamdani
fuzzy approach is a technique that measures criteria using membership functions; in this paper, the considered criterion is response
time. The results of the proposed algorithm implemented on grid systems indicate the superiority of the proposed method in terms of the validation
criteria of scheduling algorithms, such as task completion time, and its efficiency increases considerably.
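The membership functions at the heart of a Mamdani fuzzy system are typically simple piecewise-linear shapes. The sketch below fuzzifies a response time against two hypothetical triangular sets, "low" and "high"; the set boundaries are made-up values for illustration, not taken from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b,
    a standard building block of Mamdani-style fuzzy rule systems."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Fuzzify a response time of 40 ms against hypothetical "low"/"high" sets.
low = triangular(40, 0, 25, 60)      # degree to which 40 ms is "low"
high = triangular(40, 30, 70, 100)   # degree to which 40 ms is "high"
print(round(low, 3), round(high, 3))
```

A Mamdani rule base then combines such degrees (e.g. "if response time is low then priority is high") and defuzzifies the aggregate to rank candidate nodes.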
Job Scheduling on the Grid Environment using Max-Min Firefly Algorithm (Editor IJCATR)
Grid computing is the next generation of distributed systems; its goal is to create a powerful, large, autonomous virtual computer
out of countless heterogeneous resources for the purpose of resource sharing. Scheduling is one
of the main steps in exploiting the capabilities of emerging computing systems such as the grid. Scheduling jobs on computational
grids with heterogeneous resources is known to be an NP-complete problem. Grid resources belong to different management domains,
each applying different management policies. Since the nature of the grid is heterogeneous and dynamic, techniques used in
traditional systems cannot be applied to grid scheduling, and new methods must be found. This paper proposes a new algorithm
that combines the firefly algorithm with the Max-Min algorithm for scheduling jobs on the grid. The firefly algorithm is a
swarm-based technique inspired by the social behavior of fireflies in nature: fireflies move in the problem's search space
to find optimal or near-optimal solutions. The goals of this paper are to minimize the makespan and the flowtime of completing jobs
simultaneously. Experiments and simulation results show that the proposed method is more efficient than the
compared algorithms.
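The Max-Min heuristic the abstract combines with the firefly algorithm has a simple core: at each step, among the unscheduled jobs, pick the one whose best (minimum) completion time is largest and assign it to the machine achieving that minimum. The sketch below is our illustration on identical machines (where Max-Min reduces to scheduling the longest remaining job on the least-loaded machine), not the paper's grid model with heterogeneous resources.

```python
def max_min_schedule(job_lengths, n_machines):
    """Max-Min heuristic sketch: repeatedly pick the unscheduled job whose
    minimum completion time is largest, and place it on the machine that
    achieves that minimum. Returns the assignment and the makespan."""
    loads = [0] * n_machines
    assignment = {}
    remaining = set(range(len(job_lengths)))
    while remaining:
        def best(j):
            # Best completion time for job j and the machine achieving it.
            m = min(range(n_machines), key=lambda i: loads[i] + job_lengths[j])
            return loads[m] + job_lengths[j], m
        j = max(remaining, key=lambda j: best(j)[0])  # Max over the Mins
        _, m = best(j)
        loads[m] += job_lengths[j]
        assignment[j] = m
        remaining.discard(j)
    return assignment, max(loads)  # makespan = heaviest machine load

assignment, makespan = max_min_schedule([7, 5, 3, 2, 2], 2)
print(makespan)  # 10 (e.g. loads 9 and 10 across the two machines)
```

Favoring big jobs early tends to keep machine loads balanced at the end, which is exactly the makespan objective the firefly search then refines.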
Novel Ensemble Tree for Fast Prediction on Data Streams (IJERA Editor)
Data streams are sequential sets of data records. When data arrives at high speed and continuously, predicting
the class in a timely manner is essential. Ensemble modeling techniques are currently growing
rapidly in data stream classification. Ensemble learning is widely accepted because of its ability to manage huge
volumes of streaming data and to handle concept drift. Prior work mostly focused on the accuracy of the ensemble
model; prediction efficiency has not received much attention, since existing ensemble models predict in linear time,
which is enough for small applications and for models that integrate only a few classifiers. Real-time applications,
however, involve huge data streams, so base classifiers are required to recognize dissimilar models and build a
high-grade ensemble. To address these challenges we developed the Ensemble Tree, a height-balanced tree index
over base classifiers for quick prediction on data streams with ensemble modeling techniques. The Ensemble
Tree manages ensembles as spatial databases and uses an R-tree-like structure to achieve sub-linear prediction
time.
Performance Comparison of Machine Learning Algorithms (Dinusha Dilanka)
In this paper we compare the performance of two classification algorithms. It is useful to differentiate algorithms based on computational performance rather than classification accuracy alone: although classification accuracy between the algorithms is similar, computational performance can differ significantly and can affect the final results. The objective of this paper is therefore to perform a comparative analysis of two machine learning algorithms, namely K-Nearest Neighbor classification and Logistic Regression. We consider a large dataset of 7981 data points and 112 features, and examine the performance of the above-mentioned algorithms by estimating the processing time and accuracy of the different techniques on the collected data set, using 60% for training and the remaining 40% for testing. The paper is organized as follows. Section I includes the introduction and background analysis of the research; Section II, the problem statement. Section III briefly describes our application, the data analysis process, the testing environment, and the methodology of our analysis. Section IV comprises the results of the two algorithms. Finally, the paper concludes with a discussion of future research directions that address the problems in the current research methodology.
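The computational-cost asymmetry this comparison measures has a simple source: KNN defers all work to prediction time, scanning every training point per query, while logistic regression pays its cost in training and predicts with one dot product. A minimal KNN classifier (our illustration, not the paper's implementation) makes the per-query scan explicit:

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """k-nearest-neighbour majority vote using squared Euclidean distance.
    Note every prediction visits all training rows: O(n * d) per query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, [0.5, 0.5]))  # "a"
print(knn_predict(train_X, train_y, [5.5, 5.5]))  # "b"
```

With 7981 points and 112 features, that per-query scan is exactly where KNN's processing time diverges from logistic regression's, even when their accuracies are close.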
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT... (IAEME Publication)
Close-range photogrammetry network design refers to the process of placing a set of
cameras so as to achieve photogrammetric tasks. The main objective of this paper is to find
the best locations for two or three camera stations. Genetic algorithm optimization and Particle
Swarm Optimization are developed to determine the optimal camera stations for computing three-dimensional
coordinates. In this research, a mathematical model representing genetic algorithm
optimization and Particle Swarm Optimization for the close-range photogrammetry network is
developed. The paper also gives the sequence of field operations and computational steps for this
task. A test field is included to reinforce the theoretical aspects.
Survey on Efficient Techniques of Text Mining (vivatechijri)
In the current era, with the advancement of technology, more and more data is available in digital
form, most of it (approximately 85%) in unstructured textual form. It has therefore become essential to
develop better techniques and algorithms to extract useful and interesting information from this large amount of
textual data. Text mining is the process of extracting useful data from unstructured text. Each algorithm used for text
mining has advantages and disadvantages, and the issues in the field of text mining that affect the accuracy
and relevance of the results are identified.
A time efficient and accurate retrieval of range aggregate queries using fuzz... (IJECEIAES)
Massive growth in big data makes it difficult to analyse and retrieve useful information from the set of available data. Existing approaches cannot guarantee efficient retrieval of data from the database. In existing work, stratified sampling is used to partition the tables in terms of stratification variables; however, the k-means clustering algorithm cannot guarantee efficient retrieval, since choosing centroids in a large volume of data is difficult, and limited knowledge about the stratification variable may lead to less efficient partitioning of the tables. These problems are overcome in the proposed methodology by introducing FCM clustering instead of k-means clustering, which can cluster a large volume of data items that are similar in nature, and by introducing a post-stratification approach that leads to an efficient selection of the stratification variable. This methodology yields an efficient retrieval process for user queries in less time and with more accuracy.
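The key difference between k-means and the FCM clustering proposed above is that FCM assigns each point a graded membership in every cluster instead of a hard label. A minimal sketch of the standard fuzzy C-means membership formula for one 1-D point (our illustration, not the paper's system) shows the effect:

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-means membership of one point in each cluster:
    u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), so closer centers get
    higher, but never exclusive, membership; the u_i sum to 1."""
    dists = [abs(point - c) for c in centers]
    if any(d == 0 for d in dists):          # point coincides with a center
        return [1.0 if d == 0 else 0.0 for d in dists]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in dists) for di in dists]

u = fcm_memberships(2.0, [1.0, 9.0])
print([round(x, 3) for x in u])  # heavily weighted toward the nearer center
```

Because memberships are graded, a point near a cluster boundary contributes to several partitions at once, which is the property that softens the hard-partitioning problems of k-means described in the abstract.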
Similar to Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Datasets (20)
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
Parallel Evolutionary Algorithms for Feature Selection in High Dimensional Datasets
Safa Adi (Computer and IT Department, Palestine Polytechnic University, Palestine, safa_adi@ppu.edu)
Mohammed Aldasht (Computer Engineering Department, Palestine Polytechnic University, Palestine, mohammed@ppu.edu)
Abstract—Feature selection in high-dimensional datasets is considered a complex and time-consuming problem. To enhance the accuracy of classification and reduce the execution time, Parallel Evolutionary Algorithms (PEAs) can be used. In this paper, we review the most recent works that apply PEAs to feature selection in large datasets. We classify the algorithms in these papers into four main classes: Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Scatter Search (SS), and Ant Colony Optimization (ACO). Accuracy is adopted as the measure for comparing the efficiency of these PEAs. Parallel Genetic Algorithms (PGAs) appear to be the most suitable algorithms for feature selection in large datasets, since they achieve the highest accuracy. On the other hand, we found that parallel ACO is time-consuming and less accurate compared with the other PEAs.
Index Terms: Evolutionary algorithms, parallel computing, classification, feature selection, high-dimensional datasets.
I. INTRODUCTION
Nowadays, many disciplines have to deal with high-dimensional datasets that involve a huge number of features, so data preprocessing methods and data reduction models are needed to simplify the input data.
There are two main types of data reduction models [1]. The first comprises instance selection and instance generation processes, which work at the instance level (i.e., they select a representative portion of the data that can fulfill a data mining task as if the whole data were used) [14]. The second comprises feature selection and feature extraction models, which work at the level of characteristics; these models attempt to reduce a dataset by removing noisy, irrelevant, or redundant features. Feature selection is a necessary preprocessing step in analyzing big datasets: it often leads to smaller data that make classifier training better and faster [3].
Feature selection is a hard problem on big datasets. To make classification faster and more accurate, we need to select the subset of features that are discriminative. Evolutionary algorithms such as genetic algorithms, swarm intelligence optimization, and ant colony optimization can be effective for this problem, but they require a huge amount of computation (long execution time) and memory. To overcome these weaknesses, parallel computing can be used.
In this survey, we review a set of papers about parallel evolutionary algorithms used for feature selection in large datasets. Furthermore, we compare the performance of the different algorithms and environments.
The rest of the paper is organized as follows: Section 2 gives background on feature selection approaches and parallel architectures in general. Section 3 presents parallel evolutionary algorithms. Section 4 discusses and reviews papers that tackle the feature selection problem using parallel computing. Section 5 contains the summary of the survey, and the last section is the conclusion and future work.
II. BACKGROUND
In general, there are three classes of feature selection methods: filter-based, wrapper, and embedded. The filter approach analyzes the features statistically and ignores the classifier [18]. Most filter-based methods perform two operations, ranking and subset selection; in some cases these two operations are performed sequentially (first the ranking, then the selection), while in other cases only the selection is carried out. These methods are effective in terms of execution time. However, filter methods sometimes select redundant variables, since they do not consider the relationships between variables; therefore, they are mainly used as a pre-processing step. In the wrapper model [15], the feature selection process depends on the performance of a specific classifier; its disadvantages are long run time and overfitting. The last
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 3, March 2018
181 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
method for feature selection is the embedded approach. In this method, the feature selection process and the learning algorithm (tuning its parameters) are combined with each other [6, 15].
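As a minimal illustration of the filter approach's ranking step, the sketch below scores each feature by its absolute Pearson correlation with the class label and keeps the top-k. Real filter methods use a variety of statistical scores (chi-square, information gain, etc.), so this is an illustrative toy under assumed data, not a method from the surveyed papers.

```python
# Minimal filter-style feature ranking: score each feature by its absolute
# Pearson correlation with the class label, then keep the top-k features.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_features(data, labels, k):
    """data: list of rows; labels: class label per row; returns top-k column indices."""
    n_features = len(data[0])
    scores = []
    for j in range(n_features):
        col = [row[j] for row in data]
        scores.append((abs(pearson(col, labels)), j))
    scores.sort(reverse=True)            # highest-scoring features first
    return [j for _, j in scores[:k]]

data = [[1, 9, 0], [2, 8, 1], [3, 7, 0], [4, 6, 1]]
labels = [0, 0, 1, 1]
print(rank_features(data, labels, 2))    # columns 0 and 1 track the label; 2 does not
```

Because the ranking ignores the classifier entirely, it is fast but can keep redundant features, which is exactly the weakness noted above.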
The selection of an optimal feature subset is an optimization problem that has proved to be NP-hard, complex, and time-consuming [13]. Two major approaches are traditionally used to tackle NP-hard problems, as seen in Figure 1: exact methods and metaheuristics. Exact methods allow an exact solution to be found, but this approach is impractical for real-world problems because it is extremely time-consuming. Metaheuristics, on the other hand, are used for solving complex, real-world problems because they provide suboptimal (sometimes optimal) solutions in reasonable time [2, 11, 13].
As seen in Figure 1, metaheuristics are divided into two categories [13]:
• Trajectory-based (exploitation-oriented) methods: the well-known metaheuristic families based on the manipulation of a single solution. These include Simulated Annealing (SA), Tabu Search (TS), Iterated Local Search (ILS), Variable Neighborhood Search (VNS), and Greedy Randomized Adaptive Search Procedures (GRASP).
• Population-based (exploration-oriented) methods: the well-known metaheuristic families based on the manipulation of a population of solutions. These include PSO, ACO, SS, Evolutionary Algorithms (EAs), Differential Evolution (DE), Evolution Strategies (ES), and Estimation of Distribution Algorithms (EDA).
Fig. 1. Approaches for handling NP-hard problems
Metaheuristic algorithms have proved to be suitable tools for solving feature selection accurately and efficiently for large dimensions in big datasets [2]. The main problems when dealing with big datasets are: first, execution time, because the complexity of metaheuristic methods for feature selection is at least O(n²·D), where n is the number of instances and D is the number of features; second, memory consumption, since most feature selection methods need to store the whole dataset in memory. Therefore, researchers try to parallelize the sequential metaheuristics to improve their efficiency for feature selection on large datasets. There are many programming models and paradigms, such as MapReduce (Hadoop, Spark), MPI, OpenMP, and CUDA [1, 6, 13]. Parallel computing can be organized by process interaction (shared memory, message passing) or by problem decomposition (task or data parallelism) [6].
Parallel computing is a good solution for these problems, since many calculations are carried out simultaneously on tasks and/or data [6]. Population-based metaheuristics are naturally prone to parallelization, since most of their variation operators can easily be undertaken in parallel [2, 13]. Parallel implementations of metaheuristics are an effective alternative for speeding up sequential metaheuristics by reducing the search time for solutions of optimization problems. Furthermore, they lead to more precise randomized algorithms and improve the quality of solutions [11]. As seen in Figure 2, the implementation of parallel metaheuristics is divided into two categories [13].
Fig. 2. Parallel implementation of metaheuristics
Parallel evolutionary algorithms are also used in applications other than feature selection, such as inferring phylogenies and traffic prediction. In [9], Santander et al. used MPI/OpenMP with a hybrid multiobjective evolutionary algorithm (fast non-dominated sorting genetic algorithm and firefly algorithm) for phylogenetic reconstruction (inferring evolutionary trees). In [10], Jiri et al. used a parallel multiobjective GA with OpenMP to make traffic prediction more accurate. A master-slave scheme of the GA was implemented on a multi-core parallel architecture. They reduced the computational time, but the approach was successful only for short-term traffic prediction.
III. OVERVIEW OF PARALLEL EVOLUTIONARY
ALGORITHMS FOR FEATURE SELECTION
Feature selection algorithms are used to find an optimal subset of relevant features in the data. In this section, we discuss parallel evolutionary algorithms that are used for the feature selection problem in large datasets. We illustrate the steps of six algorithms (PGA, PCHC, PPSO, PGPSO, PSS, and PACO).
A. Parallel Genetic Algorithm (PGA)
To increase the efficiency and reduce the execution time of the genetic algorithm (GA), researchers use the parallel GA. Algorithm 1 presents the parallel GA methodology with the master-slave model of parallel GA.
Algorithm 1 Parallel genetic algorithm [10]
Create initial population
Evaluate initial population
Create slaves
while not done do
    Start slave
    Wait for slave to finish
    Run mutation operator
end while
for i = 1 to slave iterations do
    Select individuals
    Run crossover operator
    Evaluate offspring
    if solution found then
        set done = True
    end if
end for
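As a concrete illustration of the master-slave idea, the sketch below runs selection, crossover, and mutation in a master loop while farming out the expensive fitness evaluations to a pool of workers. It is a toy, not the algorithm of [10]: the bit-mask chromosome, the hypothetical fitness function (agreement with a fixed "informative" mask standing in for classifier accuracy), and all parameters are assumptions, and a thread pool stands in for real slave processors.

```python
# Hypothetical master-slave GA sketch for feature selection: the master runs
# selection/crossover/mutation while chromosome fitness evaluations (the
# expensive step) are farmed out to worker threads.
import random
from concurrent.futures import ThreadPoolExecutor

TARGET = [1, 0, 1, 1, 0, 1, 0, 1]   # pretend "informative" feature mask

def fitness(chrom):                  # stand-in for classifier accuracy
    return sum(c == t for c, t in zip(chrom, TARGET)) / len(TARGET)

def master_slave_ga(pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=4) as slaves:
        for _ in range(generations):
            scores = list(slaves.map(fitness, pop))          # parallel evaluation
            ranked = [c for _, c in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[:pop_size // 2]                 # elitist selection
            children = []
            while len(children) < pop_size - len(parents):
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, len(TARGET))          # one-point crossover
                child = a[:cut] + b[cut:]
                if rng.random() < 0.2:                       # bit-flip mutation
                    i = rng.randrange(len(TARGET))
                    child[i] ^= 1
                children.append(child)
            pop = parents + children
    return max(pop, key=fitness)

best = master_slave_ga()
print(best, fitness(best))
```

A real implementation would use processes or MPI ranks instead of threads, since Python threads do not speed up CPU-bound fitness functions; the structure (master evolves, slaves evaluate) is the point here.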
B. Parallel CHC algorithm (PCHC)
A CHC is a non-traditional GA that combines a conservative selection strategy (always preserving the best individuals found so far) with a crossover operator that produces offspring at the maximum Hamming distance from their parents. The main processes of the CHC algorithm are [1]:
• Half-Uniform Crossover (HUX): produces two offspring that are maximally different from their two parents.
• Elitist selection: keeps the best solutions in each generation.
• Incest prevention: prevents two individuals from mating if the similarity between them is greater than a threshold.
• Restarting process: if the population stagnates, a new population is generated by choosing the best individuals.
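A minimal sketch of the two distinctive CHC operators described above, on bit-string individuals; the threshold value and the toy parents are illustrative assumptions.

```python
# Sketch of CHC's core operators: HUX swaps exactly half of the differing
# bits, and incest prevention blocks mating when the parents' Hamming
# distance falls below a threshold (i.e., they are too similar).
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def hux(a, b, rng):
    """Half-Uniform Crossover: swap half of the differing bit positions."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    rng.shuffle(diff)
    c1, c2 = a[:], b[:]
    for i in diff[: len(diff) // 2]:
        c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

def mate(a, b, threshold, rng):
    """Incest prevention: refuse to cross parents that are too similar."""
    if hamming(a, b) <= threshold:
        return None
    return hux(a, b, rng)

rng = random.Random(1)
p1 = [0, 0, 0, 0, 0, 0, 0, 0]
p2 = [1, 1, 1, 1, 1, 1, 1, 1]
c1, c2 = mate(p1, p2, threshold=2, rng=rng)
print(c1, c2, hamming(c1, p1), hamming(c2, p2))
```

With fully complementary parents, each offspring ends up at Hamming distance 4 from its own parent, i.e., maximally different from both, which is the HUX property the paper describes.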
C. Particle Swarm Optimization (PSO)
This subsection describes geometric particle swarm optimization (GPSO) and shows the algorithms used to parallelize PSO and GPSO.
1) Geometric Particle Swarm Optimization (GPSO):
GPSO is a recent version of PSO. The key feature of GPSO is the use of multi-parental recombination of solutions (particles). In the first phase, a random initialization of particles is created. Then the algorithm evaluates the particles to update the historical and social positions. Finally, a three-parent mask-based crossover (3PMBCX) moves the particles, as shown in Algorithm 2:
Algorithm 2 GPSO algorithm [2]
S: SwarmInitialization()
while not stop condition do
    for each particle i of the swarm S do
        evaluate(solution(xi))
        update(velocity equation (hi))
        update(global best solution (gi))
    end for
    for each particle i of the swarm S do
        xi: 3PMBCX((xi, wa), (gi, wb), (hi, wc))
        mutate(xi)
    end for
end while
Output: best solution found
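The 3PMBCX step can be sketched for binary particles as follows: each bit of the new position is copied from one of the three parents (current position, historical best, global best), chosen at random according to the weights wa, wb, wc. The weight values below are illustrative assumptions, not those used in [2].

```python
# Illustrative three-parent mask-based crossover (3PMBCX) for binary
# particles: each bit is copied from x (current), h (historical best), or
# g (global best), chosen with probabilities proportional to wa, wb, wc.
import random

def three_pmbcx(x, h, g, wa, wb, wc, rng):
    parents = (x, h, g)
    weights = (wa, wb, wc)
    return [parents[rng.choices((0, 1, 2), weights=weights)[0]][i]
            for i in range(len(x))]

rng = random.Random(42)
x = [0, 0, 0, 0]
h = [1, 1, 0, 0]
g = [1, 1, 1, 1]
child = three_pmbcx(x, h, g, 0.2, 0.3, 0.5, rng)
print(child)
```

Raising wc biases the new position toward the global best (exploitation), while raising wa keeps it near the current position (exploration); that trade-off is what the weight triple controls.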
2) Parallel Multi-Swarm Optimization (PMSO): Parallel multi-swarm optimization was presented in [2]; it is defined, in analogy with parallel GAs, as a pair (S, M), where S is a collection of swarms and M is a migration policy. Algorithm 3 depicts the parallel PSO methodology.
Algorithm 3 Multi-swarm optimization [2]
for each i ∈ 1, ..., m do in parallel
    initialize(Si)
    while not stop condition do
        iterate Si for n steps /* PSO evolution */
        for each Sj ∈ N(Si) do
            send particles in s(Si) to Sj
        end for
        for each Sj such that Si ∈ N(Sj) do
            receive particles from Sj
            replace particles in Si according to r
        end for
    end while
end for
Output: best solution ever found in the multi-swarm
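The migration policy M in the (S, M) pair above can be sketched as a simple ring topology: at each migration step, every swarm sends copies of its best particles to its neighbour, which replaces its worst ones. This is a toy illustration of one common policy, not the exact policy of [2].

```python
# Toy ring migration among sub-swarms: each swarm's best particles are
# copied to the next swarm in the ring, replacing that swarm's worst.
def migrate_ring(swarms, fitness, n_migrants=1):
    """swarms: list of particle lists; fitness: particle -> score (higher is better)."""
    # Record each swarm's best particles BEFORE any replacement happens.
    bests = [sorted(s, key=fitness, reverse=True)[:n_migrants] for s in swarms]
    for i, s in enumerate(swarms):
        incoming = bests[(i - 1) % len(swarms)]    # neighbour's best
        s.sort(key=fitness)                        # worst particles first
        s[:n_migrants] = [p[:] for p in incoming]  # replace worst with copies
    return swarms

fitness = sum                                      # toy objective: more 1-bits is better
swarms = [[[0, 0], [1, 0]], [[1, 1], [0, 1]]]
migrate_ring(swarms, fitness)
print(swarms)
```

Migration frequency and the replacement rule (here: overwrite the worst) are the policy knobs; too-frequent migration makes the islands behave like one big swarm and loses the diversity benefit.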
D. Parallel Scatter Search (PSS)
Scatter search is an evolutionary method that has been successfully applied to hard optimization problems. It uses strategies for search diversification and intensification that have proved effective in a variety of optimization problems; see Algorithm 4.
E. Parallel Ant Colony Optimization (PACO)
When dealing with a huge search space, parallel computing techniques are usually applied to improve efficiency. Parallel ACO algorithms can achieve high-quality solutions in reasonable execution times compared with sequential ACO [18]. Algorithm 5 presents the PACO methodology.
Algorithm 4 Parallel scatter search methodology [11]
Create Population (Pop, PopSize)
Generate ReferenceSet (RefSet, RefSetSize)
while Stopping Criterion 1 do
    while Stopping Criterion 2 do
        Select Subset (Subset, SubsetSize)
        for each processor r = 1 to n do in parallel
            Combine Solutions (SubSet, CurSol)
            Improve Solution (CurSol, ImpSol)
        end for
    end while
    Update ReferenceSet (RefSet)
end while
Algorithm 5 Parallel ant colony optimization methodology [18]
Generate Ants
Initialize N processors
Multicast to all slave processors N and the task ids of all slaves
for each slave do
    Send a number between 0 and N that identifies the task inside the program
end for
while not all slaves have sent back a solution do
    Wait for a solution
    if a slave returns a solution that is better than any solution received then
        Multicast this solution to all slaves
    end if
end while
Return the best solution
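As a toy illustration of the construction step each slave would execute in parallel, the sketch below builds one ant's feature subset by pheromone-weighted roulette selection without replacement. The pheromone values and subset size are assumptions for illustration only.

```python
# One ant constructing a feature subset: features with more accumulated
# pheromone are proportionally more likely to be picked; each feature is
# picked at most once (selection without replacement).
import random

def construct_subset(pheromone, subset_size, rng):
    remaining = list(range(len(pheromone)))
    chosen = []
    for _ in range(subset_size):
        weights = [pheromone[f] for f in remaining]
        f = rng.choices(remaining, weights=weights)[0]  # roulette selection
        chosen.append(f)
        remaining.remove(f)
    return sorted(chosen)

rng = random.Random(7)
pheromone = [0.05, 0.05, 5.0, 5.0, 0.05]   # features 2 and 3 heavily reinforced
subset = construct_subset(pheromone, 2, rng)
print(subset)
```

In a full PACO, each slave would run many such constructions, evaluate the resulting subsets with the wrapper classifier, and the master would reinforce pheromone on the best subsets found.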
IV. PARALLEL EVOLUTIONARY ALGORITHMS FOR
FEATURE SELECTION
We reviewed a set of research papers dealing with the feature selection problem for high-dimensional datasets in a parallel environment using parallel evolutionary algorithms. We discuss these studies in the following subsections.
A. Parallel GA
Liu et al. [5] used a parallel GA with a wrapper approach for selecting informative genes (features) in tissue classification. The main purpose was to find feature subsets with fewer elements and higher accuracy. The parallelization of the GA was performed by dividing the population into sub-populations and then running the GA on each sub-population, so the search for the optimal subset of genes can run on several CPUs/computers at the same time.
For evaluation, the Golub classifier was used. This classifier, introduced by the authors, depends on the sign of the result for classification: if the sign is positive, the sample x belongs to class 1; if it is negative, the sample x belongs to class 2. This classifier is applicable only to two-class datasets. The accuracy of the classifier was tested using the LOOCV (leave-one-out cross-validation) method. The results showed that using the parallel GA increased the accuracy and reduced the number of genes used for classification.
In [8], Zheng et al. theoretically analyzed the execution speed and solution quality of several parallel GA schemes and pointed to the best scheme of parallel GA for multi-core architectures. The paper considered the relationship between speed and parallel architecture along with solution quality.
They analyzed master-slave, synchronous island, asynchronous island, cellular, and hybrid (master-slave plus island) schemes of parallel GA, implemented with the Pthreads library on a multi-core parallel architecture.
To validate the theoretical analysis, experiments were performed. The hybrid scheme (master-slave with asynchronous islands) had the best overall performance on the multi-core architecture. The island scheme had the best execution time but the worst solution quality; to improve solution quality with the island model, it is better to decrease the number of islands. The asynchronous island scheme is faster than the synchronous one. The master-slave scheme had the best solution quality and the worst execution time.
Soufan et al. [15] developed a web-based tool called DWFS for feature selection on different problems. The tool follows a hybrid approach of wrapper and filter. First, the filter is used as preprocessing and selects the top-ranked features based on a tunable, predefined threshold. In the next step, a parallel GA based on the wrapper approach is applied to the selected features to search for a feature subset that increases the classifier accuracy. The parallel GA scheme was master-slave: the master node creates the initial population and runs the GA steps, while the slave (worker) nodes perform the fitness evaluation of each chromosome. The implementation ran on 64 cores.
For evaluation, they used three different classifiers (a Bayesian classifier, k-nearest neighbor, and a combination of them). The experimental results show that the DWFS tool provides many options to enhance feature selection in different biological and biomedical problems.
In [7], Pinho et al. presented a framework called ParJEColi (a Java-based library) for parallel evolutionary algorithms in bioinformatics applications. The aim of this platform is to make the parallel environment (multi-core, cluster, and grid) easy and transparent for users. The library adapts itself to the problem and the target parallel architecture. The user can easily configure the parallel model and the target architecture, since ParJEColi encapsulates the parallelization
concerns as features. The explicit steps are implemented through a simple GUI.
The experiments validating this framework were done on two biological datasets and several bioinformatics scenarios. The results indicate that the proposed framework improves both computational performance (decreased execution time) and solution quality.
B. Parallel CHC
In [1], Peralta et al. presented a parallel evolutionary algorithm, the CHC algorithm, using the MapReduce paradigm to select features in high-dimensional datasets and improve classification. The parallelization of the CHC algorithm is done using the MapReduce procedure (Hadoop implementation).
A cluster of 20 computing nodes was used, and each dataset was split into 512 map tasks. For evaluating the work, three classifiers were used: SVM (support vector machine), logistic regression, and a Bayesian classifier.
The results showed that the run time for classification increased as the number of features decreased, except for the Bayesian classifier. They explained this result as follows: if the number of blocks is less than the number of computing machines, some machines remain idle; and if the number of blocks is greater than the number of computing machines, the blocks may not be distributed efficiently.
They compared parallel CHC with the serial version and concluded that the accuracy of classification increased by using parallel CHC. Furthermore, the parallel version of CHC reduced the run time when the dataset is high-dimensional.
C. Parallel PSO
PSO is an efficient optimization technique used to solve the feature selection problem in high-dimensional datasets. In [4], Chen et al. used a parallel PSO algorithm to solve two problems at the same time, by creating an objective function that takes three variables into account (the selected features, the number of support vectors, and the average accuracy of the SVM), in order to maximize the generalization capability of the SVM classifier.
The proposed method, called PTVPSO-SVM (parallel time-variant particle swarm optimization with support vector machine), has two phases: 1) the parameter setting of the SVM and the feature selection are performed together; 2) the accuracy of the SVM is evaluated using the set of features and the optimal parameters from the first phase.
They used a parallel virtual machine (PVM) with 8 machines and 10-fold cross-validation. The results showed that they achieved the following aims: increased classification accuracy, reduced execution time compared with sequential PSO, an appropriate model of parameters, and selection of the most discriminative subset of features.
Feature selection can also be carried out based on rough set theory with a search algorithm, as in [3, 6]. In [6], Qian et al. proposed three parallel attribute reduction (feature selection) algorithms based on MapReduce on Hadoop. The first algorithm was built by constructing the proper (key, value) pairs from rough set theory and implementing the MapReduce functions. The second was realized through the parallel computation of equivalence classes and attribute significances. The last parallel algorithm was designed to acquire the core attributes and a reduct, parallelizing both data and tasks.
The experiments were performed on a cluster of 17 computing nodes. They considered the performance of the parallel algorithms but did not focus on classification accuracy, since the sequential and parallel algorithms give the same results. The results showed that the proposed parallel attribute reduction algorithms can deal with high-dimensional datasets efficiently and better than the sequential algorithms.
In [3], Adamczyk used rough set theory for attribute reduction and, to increase efficiency, implemented a parallel asynchronous PSO for this problem. The parallelization was done by assigning the complex function computations to slave cores, while the main core updates the particles and checks the convergence of the algorithm.
The experiments showed that the efficiency and speedup of the parallel PSO algorithm rise as the size of the dataset increases. The achieved accuracy was not astonishing, but it was better than that of the classical algorithms.
D. Parallel GPSO
In [2], Garcia-Nieto et al. parallelized a version of PSO called GPSO, which is suitable for the feature selection problem in high-dimensional datasets. The proposed method, called PMSO (parallel multi-swarm optimizer), runs a set of parallel sub-PSO algorithms forming an island model. A migration operation exchanges solutions between islands at a certain frequency. The aim of the fitness function is to increase the classification accuracy and reduce the number of selected genes (features).
They used the SVM (Support Vector Machine) classifier to assess the accuracy of the selected subset of features. In their experiments, they used a cluster of computers as a
parallel architecture. They found that the 8-swarm PMSO was the best choice for parallelization. The results pointed out that this algorithm was better than the sequential version and other methods in terms of performance and accuracy, while selecting few genes for each subset.
E. Parallel SS
In [11], Lopez et al. presented a parallel SS metaheuristic for solving the feature selection problem in classification. They proposed two methods for combining solutions in SS.
The first method, called GC (greedy combination), first adds the features common to the combined solutions; then, at each iteration, one of the remaining features is added to the new solution.
The second strategy, called RGC (reduced greedy combination), starts like GC, but in the next step it considers only the features that appear in solutions of good quality. The parallelization of SS is obtained by running these two methods (GC, RGC) at the same time on two processors, using different combination methods and parameter settings on each processor.
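The GC strategy just described can be sketched as follows; the toy quality function (rewarding a fixed "relevant" set while penalizing subset size) is an illustrative assumption, not the evaluator used in [11].

```python
# Sketch of the GC (greedy combination) strategy: start from the features
# common to the combined solutions, then greedily add remaining features one
# at a time as long as they improve the quality score.
def greedy_combine(sol_a, sol_b, quality):
    """sol_a, sol_b: sets of feature indices; quality: set -> score (higher is better)."""
    combined = set(sol_a) & set(sol_b)               # common features first
    remaining = (set(sol_a) | set(sol_b)) - combined
    improved = True
    while remaining and improved:
        improved = False
        best_f = max(remaining, key=lambda f: quality(combined | {f}))
        if quality(combined | {best_f}) > quality(combined):
            combined.add(best_f)                     # keep the improving feature
            remaining.remove(best_f)
            improved = True
    return combined

# Toy score: features 1, 2, 3 are "relevant"; larger subsets pay a small penalty.
quality = lambda s: len(s & {1, 2, 3}) - 0.1 * len(s)
print(greedy_combine({1, 2, 5}, {2, 3, 6}, quality))
```

RGC would differ only in restricting `remaining` to features drawn from good-quality solutions, which shrinks the candidate pool and speeds up the combination step.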
They compared the proposed parallel SS with sequential SS and GA. The results show that the quality of the solutions from parallel SS is better than that of the solutions obtained from sequential SS and GA. Also, parallel SS uses a smaller set of features for classification. The run time is the same for parallel and sequential SS.
F. Parallel ACO
This subsection shows how parallel ACO is used to solve the feature selection problem for classification in high-dimensional datasets.
In [17], Meena et al. implemented a parallel ACO to solve the feature selection problem for long documents. The parallelization was done using the MapReduce programming model (Hadoop), which automatically parallelizes the code and data and runs them on a cluster of computing nodes. The wrapper approach is used as the evaluation criterion, with a Bayesian classifier. Furthermore, the accuracy of the classifier was measured with these metrics: precision, recall, accuracy, and F-measure.
The enhanced algorithm (parallel ACO) was compared with ACO, enhanced ACO, and two feature selection methods, CHI (a statistical technique) and IG (information gain), using a Bayesian classifier in the evaluation process. The results showed that, for a given fixed quality of the solutions, the proposed algorithm could reduce the execution time, though without considering the solution quality. On the other hand, the accuracy of the classifier was increased using parallel
TABLE I
SUMMARY OF ALGORITHMS AND PROGRAMMING MODELS

Paper                    | Evolutionary algorithm | Parallel programming model
Peralta et al. [1]       | CHC (a type of GA)     | MapReduce
Garcia-Nieto et al. [2]  | GPSO                   | MALLBA
Adamczyk [3]             | PSO                    | Unknown
Chen et al. [4]          | PSO                    | PVM
Liu et al. [5]           | GA                     | Unknown
Lopez et al. [11]        | SS                     | Unknown
Soufan et al. [15]       | GA                     | MPI
Meena et al. [17]        | ACO                    | MapReduce
ACO compared with sequential ACO and the other feature selection methods.
In [12], Cano et al. parallelized an existing multi-objective ant programming model used as a classifier. This algorithm was applied to rule mining in high-dimensional datasets. The parallelization was done on the data, with each ant encoding a rule; each processor performs the same task on a different subset of the data at the same time. The implementation used GPUs, which are multi-core, parallel processor architectures, following the CUDA model.
For evaluation they used these metrics: true positives, false positives, true negatives, false negatives, sensitivity, and specificity. The results indicate that the efficiency of this model increases as the size of the dataset increases.
V. SUMMARY AND DISCUSSION
The summary of the papers that implemented the parallel
EA for solving the classification problem in high dimensional
datasets is reported in Table 1 and Table 2.
Many research papers [2, 3, 7, 8, 9, 10, 12], stated that
we can reduce the execution time and achieve acceptable
speed ups, when applying parallel evolutionary algorithms
on multiple processors. We noticed that they achieved a
reasonable speed up in many cases.
When comparing the accuracy of the parallel EAs in Table 2, it is important to note how many classifiers were used to measure the accuracy. Furthermore, we should consider the metrics that were used to evaluate the classifier. For example, the parallel PSO and its variants report the highest accuracy, but they were evaluated with only one metric, the success rate. This means that the parallel PSO cannot be considered the most accurate parallel EA on the basis of Table 2 alone.
On the other hand, the parallel GA and its variants report the lowest accuracy, but they were evaluated with two to five metrics. Based on these metrics, we can say that the parallel GA is the best parallel EA for feature selection in high dimensional datasets.

TABLE II
SUMMARY OF DATASETS, CLASSIFIERS, AND ACCURACY RESULTS

Paper                   | Dataset      | Classifier          | Classification metric  | Accuracy
Peralta et al. [1]      | Epsilon      | Bayesian            | AUC = (TPR+TNR)/2      | 0.71
                        |              | SVM                 |                        | 0.68
                        |              | Logistic Regression |                        | 0.70
                        | ECBDL14-ROS  | Bayesian            |                        | 0.67
                        |              | SVM                 |                        | 0.63
                        |              | Logistic Regression |                        | 0.63
Garcia-Nieto et al. [2] | Colon        | SVM                 | Success rate           | 0.85
                        | Lymp         |                     |                        | 0.97
                        | Leuk         |                     |                        | 0.98
                        | Lung         |                     |                        | 0.97
Adamczyk [3]            | 15 data sets | —                   | Success rate           | 0.70 (avg)
Chen et al. [4]         | 30 data sets | SVM                 | Success rate           | 0.87 (avg)
Liu et al. [5]          | Leukemia     | Golub               | Success rate           | 0.88
                        | Colon        |                     |                        | N/A
Lopez et al. [11]       | 12 data sets | Nearest Neighbor    | Success rate           | 0.86 (avg)
                        |              | Bayesian            |                        | 0.87 (avg)
                        |              | Decision Tree       |                        | 0.86 (avg)
Soufan et al. [15]      | 9 data sets  | K-Nearest Neighbor  | F1, PPV, GMean, ...    | 0.81 (avg GMean)
                        |              | Bayesian            |                        | 0.79 (avg GMean)
Meena et al. [17]       | 2 data sets  | Bayesian            | F-measure, recall, ... | 0.64 (avg)

International Journal of Computer Science and Information Security (IJCSIS), Vol. 16, No. 3, March 2018, ISSN 1947-5500, https://sites.google.com/site/ijcsis/
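The AUC measure reported by Peralta et al. [1] in Table II is the average of the true positive and true negative rates, computable directly from confusion-matrix counts. A minimal sketch (the function name is ours):

```python
def balanced_auc(tp, fn, tn, fp):
    # AUC approximation used in Table II: (TPR + TNR) / 2,
    # i.e. the mean of sensitivity and specificity.
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    return (tpr + tnr) / 2
```

Unlike the plain success rate, this measure is robust to class imbalance, which is why it gives a fairer picture on skewed datasets such as ECBDL14.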
VI. CONCLUSION
We reviewed different parallel EAs used to solve the feature selection problem in high dimensional datasets, adopting accuracy as the measure to compare the algorithms' performance.
The following points summarize our conclusions about the performance of the reviewed algorithms for feature selection:
• GA and its variants: based on the papers we reviewed, the parallel GA has the highest accuracy.
• PSO and its variants: the parallel PSO has the same accuracy as the sequential PSO.
• SS: the parallel SS gives better accuracy than the GA and the sequential SS.
• ACO: the parallel ACO gives less accurate results than the other parallel EAs.
It is noticeable that PGAs are the most suitable algorithms for feature selection in large datasets, since they achieved the highest accuracy. On the other hand, the PACO is time-consuming and less accurate compared with the other PEAs.
References
[1] D. Peralta, S. del Río, S. Ramírez-Gallego, I. Triguero, J. M. Benítez, F. Herrera. "Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach". Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2015, Article ID 246139, (2015).
[2] J. García-Nieto, E. Alba. "Parallel multi-swarm optimizer for gene selection in DNA microarrays". Applied Intelligence (2012) 37:255–266, DOI 10.1007/s10489-011-0325-9.
[3] M. Adamczyk. "Parallel Feature Selection Algorithm based on Rough Sets and Particle Swarm Optimization". DOI: 10.15439/2014F389, ACSIS, Vol. 2. IEEE, (2014).
[4] H.-L. Chen, B. Yang, S. Wang, G. Wang, D. Liu, H.-Z. Li, W. Liu. "Towards an optimal support vector machine classifier using a parallel particle swarm optimization strategy". Applied Mathematics and Computation 239, 180–197. Elsevier, (2014).
[5] J. Liu, H. Iba. ”Selecting Informative Genes with Parallel
Genetic Algorithms in Tissue Classification”. Genome
Informatics 12: 14-23, (2001).
[6] J. Qian, D. Miao, Z. Zhang, X. Yue. ”Parallel attribute
reduction algorithms using MapReduce”. Elsevier, Information
Sciences (2014).
[7] J. Pinho, L. Sobral, M. Rocha. "Parallel evolutionary computation in bioinformatics applications". Elsevier, Computer Methods and Programs in Biomedicine 110, 183–191, (2013).
[8] L. Zheng, Y. Lu, M. Ding, Y.Shen, M. Guo, S. Guo.
”Architecture-based performance evaluation of genetic
algorithms on multi/many core systems”. The 14th IEEE
International Conference on Computational Science and
Engineering. 978-0-7695-4477-9/11, (2011) IEEE DOI
10.1109/CSE.2011.65
[9] S. Santander-Jimenez, M. Vega-Rodríguez. "Parallel Multiobjective Metaheuristics for Inferring Phylogenies on Multicore Clusters". DOI 10.1109/TPDS.2014.2325828, IEEE (2014).
[10] J. Petrlik, L. Sekanina. ”Towards Robust and Accurate
Traffic Prediction Using Parallel Multiobjective Genetic
Algorithms and Support Vector Regression”. (2015) IEEE
18th International Conference on Intelligent Transportation
Systems.
[11] F. Lopez, M. Torres, B. Batista, J. Perez, J. Vega. "Solving feature selection problem by a parallel Scatter Search". Elsevier, European Journal of Operational Research 169, 477–489, (2006).
[12] A. Cano, J. Olmo, S. Ventura. ”Parallel Multi-Objective
Ant Programming for Classification Using GPUs”. Journal of
Parallel and Distributed Computing, November (2012).
[13] E. Alba, G. Luque, S. Nesmachnow. "Parallel metaheuristics: recent advances and new trends". Intl. Trans. in Op. Res. 20 (2013) 1–48, DOI: 10.1111/j.1475-3995.2012.00862.x.
[14] H. Liu, H. Motoda. "Instance Selection and Construction for Data Mining". Springer, Eds. (2001).
[15] O. Soufan, D. Kleftogiannis, P. Kalnis, V. Bajic. ”DWFS:
A wrapper Feature Selection Tool Based on a Parallel Genetic
Algorithm”. PLOS ONE, DOI: 10.1371/journal.pone.0117988,
February (2015).
[16] A. Gottlieb, G. S. Almasi. "Highly Parallel Computing". Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA (1989).
[17] M. Meena, K. R. Chandran, A. Karthik, A. Samuel. "A parallel ACO algorithm to select terms to categorise longer documents". Int. J. Computational Science and Engineering, Vol. 6, No. 4, (2001).
[18] A. Sameh, A. Ayman, N. Hasan. ”Parallel Ant Colony
Optimization”. International Journal of research and reviews
in Computer Science (IJRRCS), Vol.1, No. 2, June (2010).