The present survey provides the state of the art of research devoted to Evolutionary Algorithms (EAs) for clustering, illustrated with a diversity of evolutionary computation techniques. The survey provides a nomenclature that highlights aspects that are very important in the context of evolutionary data clustering. The paper examines the clustering trade-offs addressed by a wide range of Multi-Objective Evolutionary Algorithm (MOEA) methods. Finally, this study addresses the open challenges of MOEA design and data clustering, along with conclusions and recommendations for novice and experienced researchers, positioning the most promising paths for future research.
A Formal Machine Learning or Multi Objective Decision Making System for Deter... (Editor IJCATR)
Decision-making typically needs mechanisms to compromise among opposing criteria. Once multiple objectives are involved in machine learning, a vital step is to relate the weights of the individual objectives to system-level performance. Determining the weights of multiple objectives is an analysis process, and it has typically been treated as an optimization problem. However, our preliminary investigation has shown that existing methodologies for managing the weights of multiple objectives have some obvious limitations: the determination of weights is treated as a single problem, and a result based on such an optimization is limited, and can even be unreliable, when knowledge about the multiple objectives is incomplete, for example due to poor data. The constraints on weights are also discussed. Variable weights are natural in decision-making processes. Here, we develop a systematic methodology for determining variable weights of multiple objectives. The roles of weights in multi-objective decision making or machine learning are analyzed, and the weights are determined with the help of a standard neural network.
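The idea of context-dependent objective weights can be sketched as follows. This is a minimal illustration, not the paper's actual system: a single linear layer with a softmax stands in for the "standard neural network", and the function names, parameters, and weighted-sum scalarization are all assumptions for the sake of the example.

```python
import math

def softmax(zs):
    """Map raw scores to positive weights that sum to 1."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def context_weights(context, W, b):
    """One linear layer plus softmax: decision-context features -> objective
    weights. W is an (n_objectives x n_features) matrix, b a bias per
    objective. Different contexts thus yield different (variable) weights."""
    scores = [sum(wi * xi for wi, xi in zip(row, context)) + bi
              for row, bi in zip(W, b)]
    return softmax(scores)

def scalarize(objective_values, weights):
    """Weighted-sum scalarization of multiple objective values."""
    return sum(w * v for w, v in zip(weights, objective_values))
```

In practice the layer parameters would be trained against observed system-level performance; here they are free inputs so the mechanism itself stays visible.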
Engineering Research Publication
Best International Journals, High Impact Journals,
International Journal of Engineering & Technical Research
ISSN : 2321-0869 (O) 2454-4698 (P)
www.erpublication.org
The Optimization of choosing Investment in the capital markets using artifici... (inventionjournals)
Optimization is one of the crucial topics in the behavioural sciences. These days the use of metaheuristics has grown considerably in all fields. In this study, we look for an optimal selection from a portfolio of investment opportunities. We seek a selection logic using a meta-heuristic algorithm called artificial neural networks. The results showed that using an artificial neural network algorithm optimized decision-making and the selection of investment opportunities. The research is applied in its purpose and seeks to develop knowledge in a particular field.
Accounting for variance in machine learning benchmarks (Devansh16)
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameters choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice impact markedly the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator approaches better the ideal estimator at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements, on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
Data warehouses are structures holding large amounts of data collected from heterogeneous sources to be used in a decision support system. Data warehouse analysis identifies hidden, initially unexpected patterns, but this analysis requires great memory and computation cost. Data reduction methods have been proposed to make this analysis easier. In this paper, we present a hybrid approach based on Genetic Algorithms (GAs) as evolutionary algorithms and Multiple Correspondence Analysis (MCA) as a factor analysis method to conduct this reduction. Our approach identifies a reduced subset of dimensions p' from the initial set p, where p' < p, and proposes to find the fact profile that is closest to a reference. GAs identify the possible subsets, and the chi-square (Khi²) formula of the MCA evaluates the quality of each subset. The study is based on a distance measurement between the reference and the n fact profiles extracted from the warehouse.
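The chi-square distance the abstract relies on can be sketched as follows. The abstract does not give the formula explicitly, so this follows the standard correspondence-analysis definition: squared profile differences weighted by the inverse column masses (average profile). The profile encoding is an assumption for illustration.

```python
def chi2_distance(profile_a, profile_b, column_masses):
    """Chi-square distance between two row profiles, as used in
    correspondence analysis: squared coordinate differences, each
    weighted by the inverse of that column's mass."""
    return sum((a - b) ** 2 / c
               for a, b, c in zip(profile_a, profile_b, column_masses)) ** 0.5

def closest_profile(reference, profiles, column_masses):
    """Index of the fact profile closest to the reference profile."""
    return min(range(len(profiles)),
               key=lambda i: chi2_distance(reference, profiles[i],
                                           column_masses))
```

A GA-proposed dimension subset would then be scored by how close the resulting fact profiles sit to the reference under this distance.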
When deep learners change their mind: learning dynamics for active learning (Devansh16)
Abstract:
Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untrustworthy source to assess sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
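The label-dispersion measure described above can be sketched as follows; the paper's exact normalization may differ, but the common form is 1 minus the fraction of training checkpoints at which the network assigned the sample its most frequent label.

```python
from collections import Counter

def label_dispersion(label_history):
    """Dispersion for one unlabeled sample, given the label the network
    assigned it at each training checkpoint. 0 means the assignment never
    changed; values near 1 mean it changed often (high uncertainty)."""
    counts = Counter(label_history)
    modal_count = counts.most_common(1)[0][1]
    return 1.0 - modal_count / len(label_history)

def select_for_annotation(histories, budget):
    """Active-learning query step: pick the `budget` samples whose label
    assignments were least stable during training."""
    ranked = sorted(range(len(histories)),
                    key=lambda i: label_dispersion(histories[i]),
                    reverse=True)
    return ranked[:budget]
```

Tracking `label_history` only requires recording the argmax prediction for each unlabeled sample at a few checkpoints per epoch, so the measure is cheap relative to training itself.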
Extended PSO algorithm for improvement problems k-means clustering algorithm (IJMIT JOURNAL)
Clustering is an unsupervised process and one of the most common data mining techniques. The purpose of clustering is to group similar data together, so that instances are most similar to each other within a cluster and most different from instances in other clusters. In this paper we focus on partitional k-means clustering, which, due to its ease of implementation and high-speed performance on large data sets, is still very popular thirty years after its development. To address the problem of k-means becoming trapped in local optima, we propose an extended PSO algorithm named ECPSO. Our new algorithm is able to escape from local optima and, with high probability, produces the problem's optimal answer. The experimental results show that the proposed algorithm performs better than other clustering algorithms, especially on two indices: the accuracy of clustering and the quality of clustering.
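A PSO/k-means hybrid of this kind can be sketched as follows, assuming the common PSO-for-clustering encoding in which each particle holds k candidate centroids and fitness is the within-cluster sum of squared errors (SSE). The inertia and acceleration constants are illustrative defaults, not ECPSO's actual parameters.

```python
import random

def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sum((pi - ci) ** 2 for pi, ci in zip(p, c))
                   for c in centroids)
               for p in points)

def pso_kmeans(points, k, n_particles=8, iters=40, seed=0):
    """Each particle encodes k candidate centroids; standard PSO velocity
    and position updates pull particles toward their personal best and the
    global best under the SSE objective, giving the search a chance to
    escape the local optima that plain k-means gets stuck in."""
    rng = random.Random(seed)
    dim = len(points[0])
    pos = [[list(rng.choice(points)) for _ in range(k)]
           for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[list(c) for c in p] for p in pos]
    pbest_f = [sse(p, points) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest = [list(c) for c in pbest[g]]
    gbest_f = pbest_f[g]
    w, c1, c2 = 0.7, 1.5, 1.5          # inertia and acceleration constants
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = sse(pos[i], points)
            if f < pbest_f[i]:
                pbest[i] = [list(c) for c in pos[i]]
                pbest_f[i] = f
                if f < gbest_f:
                    gbest, gbest_f = [list(c) for c in pos[i]], f
    return gbest, gbest_f
```

A common refinement, which ECPSO may well include, is to run a few k-means iterations on the global best between PSO generations to sharpen the centroids locally.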
A Survey On Decision Tree Learning Algorithms for Knowledge Discovery (IJERA Editor)
Immense volumes of data are populated into repositories from various applications. In order to find desired information and knowledge in large datasets, data mining techniques are very helpful. Classification is one of the knowledge discovery techniques. Among classification methods, decision trees are very popular in the research community due to their simplicity and easy comprehensibility. This paper presents an updated review of recent developments in the field of decision trees.
Incremental learning from unbalanced data with concept class, concept drift a... (IJDKP)
Recently, stream data mining applications have drawn vital attention from several research communities. Stream data is a continuous form of data distinguished by its online nature. Traditionally, the machine learning area has developed learning algorithms under certain assumptions on the underlying distribution of the data, such as the data following a predetermined distribution. Such constraints on the problem domain led to the development of learning algorithms whose performance is theoretically verifiable. Real-world situations differ from this restricted model: applications usually suffer from problems such as unbalanced data distributions. Additionally, data drawn from non-stationary environments is also common in real-world applications, resulting in the "concept drift" associated with data stream examples. These issues have been addressed separately by researchers, and it is observed that the joint problem of class imbalance and concept drift has received relatively little research. If the final objective of intelligent machine learning techniques is to address a broad spectrum of real-world applications, then the necessity of a universal framework for learning from, and adapting to, environments where concept drift may occur and unbalanced data distributions are present can hardly be exaggerated. In this paper, we first present an overview of the issues observed in stream data mining scenarios, followed by a comprehensive review of recent research dealing with each of these issues.
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATION (ijaia)
Process Mining (PM) emerged from business process management but has recently been applied to
educational data and has been found to facilitate the understanding of the educational process.
Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on
the techniques of model discovery, conformance checking and extension of existing process models. We
present a systematic review of the recent and current status of research in the EPM domain, focusing on
application domains, techniques, tools and models, to highlight the use of EPM in comprehending and
improving educational processes.
Multi Label Spatial Semi Supervised Classification using Spatial Associative ... (cscpconf)
Multi-label spatial classification based on association rules with multi-objective genetic algorithms (MOGA), enriched by semi-supervised learning, is proposed in this paper to deal with the multiple-class-labels problem. We adapt problem transformation for multi-label classification. We use a hybrid evolutionary algorithm for optimization in the generation of spatial association rules, which addresses single labels. MOGA is used to combine the single labels into multi-labels under the conflicting objectives of predictive accuracy and comprehensibility. Semi-supervised learning is done through the process of rule-cover clustering. Finally, an associative classifier is built with a sorting mechanism. The algorithm is simulated and the results are compared with a MOGA-based associative classifier, which it outperforms.
REPRESENTATION OF UNCERTAIN DATA USING POSSIBILISTIC NETWORK MODELS (cscpconf)
Uncertainty is pervasive in real-world environments due to vagueness, which is associated with the difficulty of making sharp distinctions, and ambiguity, which is associated with situations in which the choice among several precise alternatives cannot be perfectly resolved. Analysis of large collections of uncertain data is a primary task in real-world applications, because data is incomplete, inaccurate, and inefficient to process. Uncertain data can be represented in various forms, such as data stream models, linkage models, and graphical models, which are simple, natural ways to process the data and produce optimized results through query processing. In this paper, we propose that an uncertain data model can be represented as a possibilistic data model, and vice versa, for processing uncertain data using various data models such as the possibilistic linkage model, data streams, and possibilistic graphs. This paper presents the representation and processing of the possibilistic linkage model through possible worlds with the use of a product-based operator.
A SYSTEM OF SERIAL COMPUTATION FOR CLASSIFIED RULES PREDICTION IN NONREGULAR ... (ijaia)
Objects or structures that are regular take uniform dimensions. Based on the concepts of regular models,
our previous research work has developed a system of a regular ontology that models learning structures
in a multiagent system for uniform pre-assessments in a learning environment. This regular ontology has
led to the modelling of a classified rules learning algorithm that predicts the actual number of rules needed
for inductive learning processes and decision making in a multiagent system. But not all processes or
models are regular. Thus, this paper presents a system of polynomial equations that can estimate and predict the required number of rules of a non-regular ontology model, given some defined parameters.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL (ijcsit)
Predicting student performance is a great concern to higher-education managements. This prediction helps to identify and improve students' performance. Several factors may improve this performance. In the present study, we employ data mining processes, particularly classification, to enhance the quality of the higher-education system. Recently, a new direction has been taken to improve classification accuracy by combining classifiers. In this paper, we design and evaluate a fast learning algorithm using an AdaBoost ensemble with a simple genetic algorithm, called "Ada-GA", where the genetic algorithm is demonstrated to successfully improve the accuracy of the combined classifier. The Ada-GA algorithm proved to be of considerable usefulness in identifying at-risk students early, especially in very large classes. This early prediction allows the instructor to provide appropriate advising to those students. The Ada-GA algorithm was implemented and tested on the ASSISTments dataset; the results showed that it successfully improved the detection accuracy and reduced the computational complexity.
An Heterogeneous Population-Based Genetic Algorithm for Data Clustering (ijeei-iaes)
As a primary data mining method for knowledge discovery, clustering is a technique for classifying a dataset into groups of similar objects. The most popular method for data clustering, k-means, suffers from the drawback of requiring the number of clusters and their initial centers, which should be provided by the user. In the literature, several methods have been proposed, in the form of k-means variants, genetic algorithms, or combinations of the two, for calculating the number of clusters and finding proper cluster centers. However, none of these solutions has provided satisfactory results, and determining the number of clusters and the initial centers remains the main challenge in clustering processes. In this paper we present an approach that automatically generates these parameters to achieve optimal clusters, using a modified genetic algorithm operating on varied individual structures and a new crossover operator. Experimental results show that our modified genetic algorithm is a more efficient alternative to the existing approaches.
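Variable-length individuals of this kind can be sketched as follows. The abstract specifies neither the fitness criterion nor the new crossover operator, so the per-cluster penalty and the cut-and-splice crossover below are illustrative stand-ins that merely show how individuals encoding different numbers of clusters can be compared and recombined.

```python
import random

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)),
                key=lambda j: sum((pi - ci) ** 2
                                  for pi, ci in zip(p, centroids[j])))
            for p in points]

def fitness(centroids, points, penalty=1.0):
    """Within-cluster SSE plus a per-cluster penalty, so individuals with
    different numbers of centroids remain comparable (the penalty is an
    illustrative stand-in for the paper's unspecified criterion)."""
    labels = assign(points, centroids)
    total = sum(sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[l]))
                for p, l in zip(points, labels))
    return total + penalty * len(centroids)

def crossover(a, b, rng):
    """Cut-and-splice crossover on two centroid lists: the child takes a
    prefix of one parent and a suffix of the other, so the number of
    clusters can change across generations."""
    i = rng.randrange(1, len(a) + 1)
    j = rng.randrange(0, len(b) + 1)
    return [list(c) for c in a[:i]] + [list(c) for c in b[j:]]
```

Without some penalty (or a validity index in its place), adding centroids can only lower SSE, so selection would always drift toward the largest individuals.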
The approaches used in the literature for solving combinatorial optimization problems have applied a specific methodology, or a specific combination of methodologies. However, less importance is attached to systematically modeling the solution for the given problem. Modeling helps in analyzing the various parts of the solution clearly, thereby identifying which part of the applied methodology or combination of methodologies is efficient or inefficient. In order to find how efficient the different parts of the applied methodology or methodologies are, it may be better to solve the given problem using the notion of hyper-heuristics: the different parts of the given problem are solved with many different methodologies realized, implemented, and benchmarked, enabling the best hybrid methodology to be chosen. A theoretical model or representation of the problem's solution may facilitate a clear proposal and realization of the different methodologies for the various parts of the solution. The literature reveals a need for a generic model that could be used to represent the solution of combinatorial optimization problems. Therefore, inspired by the basic problem-solving behavior exhibited by animals in their day-to-day life, a new bio-inspired hyper-heuristic generic model for solving combinatorial optimization problems is proposed. To demonstrate the application of the proposed generic model, a problem-specific model is derived that solves the web service selection/composition problem. This specialized model has been realized with a trip-planning case study, and the results are discussed.
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA (IJCI JOURNAL)
This paper proposes a new method that aims to reduce the size of a high-dimensional dataset by identifying and removing irrelevant and redundant features. Dataset reduction is important in machine learning and data mining. A measure of dependence is used to evaluate the relationship between each feature and the target concept, and between features, for irrelevant and redundant feature removal. The proposed work initially removes all the irrelevant features, and then a minimum spanning tree of the relevant features is constructed using Prim's algorithm. Splitting the minimum spanning tree based on the dependency between features leads to the generation of forests. A representative feature from each of the forests is taken to form the final feature subset.
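The MST-based pipeline above (Prim's algorithm over the features, splitting weak edges into forests, one representative per forest) can be sketched as follows. The abstract does not name its dependence measure, so absolute Pearson correlation stands in for it, the split threshold is illustrative, and the initial irrelevant-feature filter is omitted for brevity.

```python
def pearson(x, y):
    """Pearson correlation; stands in for the paper's dependence measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy)

def prim_mst(n, weight):
    """Prim's algorithm on a complete graph over n feature nodes;
    returns the MST as (i, j, w) edges."""
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree:
                    w = weight(i, j)
                    if best is None or w < best[2]:
                        best = (i, j, w)
        edges.append(best)
        in_tree.add(best[1])
    return edges

def select_features(features, target, split_threshold=0.5):
    """Build an MST over features (edge weight = 1 - |corr|), drop weak
    edges to split it into forests, and keep one representative per
    forest: the feature most correlated with the target."""
    n = len(features)
    relevance = [abs(pearson(f, target)) for f in features]
    edges = prim_mst(n, lambda i, j: 1.0 - abs(pearson(features[i],
                                                       features[j])))
    parent = list(range(n))            # union-find over the kept edges
    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a
    for i, j, w in edges:
        if w <= split_threshold:       # keep edge: the two features are dependent
            parent[find(i)] = find(j)
    forests = {}
    for f in range(n):
        forests.setdefault(find(f), []).append(f)
    return sorted(max(g, key=lambda f: relevance[f]) for g in forests.values())
```

Because redundant features end up in the same forest, only one of each strongly dependent group survives, which is exactly the redundancy removal the abstract describes.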
Integrated bio-search approaches with multi-objective algorithms for optimiza... (TELKOMNIKA JOURNAL)
Optimal selection of features is very difficult and crucial to achieve, particularly for the task of classification. This is due to the traditional method of selecting features, which functions independently and generates a collection of irrelevant features, thereby affecting the quality of classification accuracy. The goal of this paper is to leverage the potential of bio-inspired search algorithms, together with a wrapper, in optimizing the multi-objective algorithms ENORA and NSGA-II to generate an optimal set of features. The main step is to idealize the combination of ENORA and NSGA-II with suitable bio-search algorithms in which multiple subset generation has been implemented. The next step is to validate the optimal feature set by conducting a subset evaluation. Eight comparison datasets of various sizes were deliberately selected for testing. Results show that the ideal combination of the multi-objective algorithms ENORA and NSGA-II with the selected bio-inspired search algorithm is promising for achieving a better optimal solution (i.e., the best features with higher classification accuracy) for the selected datasets. This discovery implies that the ability of bio-inspired wrapper/filter search algorithms will boost the efficiency of ENORA and NSGA-II for the task of selecting and classifying features.
When deep learners change their mind learning dynamics for active learningDevansh16
Abstract:
Active learning aims to select samples to be annotated that yield the largest performance improvement for the learning algorithm. Many methods approach this problem by measuring the informativeness of samples and do this based on the certainty of the network predictions for samples. However, it is well-known that neural networks are overly confident about their prediction and are therefore an untrustworthy source to assess sample informativeness. In this paper, we propose a new informativeness-based active learning method. Our measure is derived from the learning dynamics of a neural network. More precisely we track the label assignment of the unlabeled data pool during the training of the algorithm. We capture the learning dynamics with a metric called label-dispersion, which is low when the network consistently assigns the same label to the sample during the training of the network and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
The clustering is a without monitoring process and one of the most common data mining techniques. The
purpose of clustering is grouping similar data together in a group, so were most similar to each other in a
cluster and the difference with most other instances in the cluster are. In this paper we focus on clustering
partition k-means, due to ease of implementation and high-speed performance of large data sets, After 30
year it is still very popular among the developed clustering algorithm and then for improvement problem of
placing of k-means algorithm in local optimal, we pose extended PSO algorithm, that its name is ECPSO.
Our new algorithm is able to be cause of exit from local optimal and with high percent produce the
problem’s optimal answer. The probe of results show that mooted algorithm have better performance
regards as other clustering algorithms specially in two index, the carefulness of clustering and the quality
of clustering.
Engineering Research Publication
Best International Journals, High Impact Journals,
International Journal of Engineering & Technical Research
ISSN : 2321-0869 (O) 2454-4698 (P)
www.erpublication.org
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
Theimmense volumes of data are populated into repositories from various applications. In order to find out desired information and knowledge from large datasets, the data mining techniques are very much helpful. Classification is one of the knowledge discovery techniques. In Classification, Decision trees are very popular in research community due to simplicity and easy comprehensibility. This paper presentsan updated review of recent developments in the field of decision trees.
Incremental learning from unbalanced data with concept class, concept drift a...IJDKP
Recently, stream data mining applications has drawn vital attention from several research communities.
Stream data is continuous form of data which is distinguished by its online nature. Traditionally, machine
learning area has been developing learning algorithms that have certain assumptions on underlying
distribution of data such as data should have predetermined distribution. Such constraints on the problem
domain lead the way for development of smart learning algorithms performance is theoretically verifiable.
Real-word situations are different than this restricted model. Applications usually suffers from problems
such as unbalanced data distribution. Additionally, data picked from non-stationary environments are also
usual in real world applications, resulting in the “concept drift” which is related with data stream
examples. These issues have been separately addressed by the researchers, also, it is observed that joint
problem of class imbalance and concept drift has got relatively little research. If the final objective of
clever machine learning techniques is to be able to address a broad spectrum of real world applications,
then the necessity for a universal framework for learning from and tailoring (adapting) to, environment
where drift in concepts may occur and unbalanced data distribution is present can be hardly exaggerated.
In this paper, we first present an overview of issues that are observed in stream data mining scenarios,
followed by a complete review of recent research in dealing with each of the issue.
REVIEWING PROCESS MINING APPLICATIONS AND TECHNIQUES IN EDUCATIONijaia
Process Mining (PM) emerged from business process management but has recently been applied to
educational data and has been found to facilitate the understanding of the educational process.
Educational Process Mining (EPM) bridges the gap between process analysis and data analysis, based on
the techniques of model discovery, conformance checking and extension of existing process models. We
present a systematic review of the recent and current status of research in the EPM domain, focusing on
application domains, techniques, tools and models, to highlight the use of EPM in comprehending and
improving educational processes.
Multi Label Spatial Semi Supervised Classification using Spatial Associative ...cscpconf
Multi-label spatial classification based on association rules with multi objective genetic
algorithms (MOGA) enriched by semi supervised learning is proposed in this paper. It is to deal
with multiple class labels problem. In this paper we adapt problem transformation for the multi
label classification. We use hybrid evolutionary algorithm for the optimization in the generation
of spatial association rules, which addresses single label. MOGA is used to combine the single
labels into multi labels with the conflicting objectives predictive accuracy and
comprehensibility. Semi supervised learning is done through the process of rule cover
clustering. Finally associative classifier is built with a sorting mechanism. The algorithm is
simulated and the results are compared with MOGA based associative classifier, which out
performs the existing
REPRESENTATION OF UNCERTAIN DATA USING POSSIBILISTIC NETWORK MODELScscpconf
Uncertainty is a pervasive in real world environment due to vagueness, is associated with the
difficulty of making sharp distinctions and ambiguity, is associated with situations in which the
choices among several precise alternatives cannot be perfectly resolved. Analysis of large
collections of uncertain data is a primary task in the real world applications, because data is
incomplete, inaccurate and inefficient. Representation of uncertain data in various forms such
as Data Stream models, Linkage models, Graphical models and so on, which is the most simple,
natural way to process and produce the optimized results through Query processing. In this
paper, we propose the Uncertain Data model can be represented as Possibilistic data model
and vice versa for the process of uncertain data using various data models such as possibilistic
linkage model, Data streams, Possibilistic Graphs. This paper presents representation and
process of Possiblistic Linkage model through Possible Worlds with the use of product-based
operator.
A SYSTEM OF SERIAL COMPUTATION FOR CLASSIFIED RULES PREDICTION IN NONREGULAR ...ijaia
Objects or structures that are regular take uniform dimensions. Based on the concepts of regular models, our previous research work developed a regular ontology that models learning structures in a multiagent system for uniform pre-assessments in a learning environment. This regular ontology led to the modelling of a classified-rules learning algorithm that predicts the actual number of rules needed for inductive learning processes and decision making in a multiagent system. But not all processes or models are regular. This paper therefore presents a system of polynomial equations that can estimate and predict the required number of rules of a non-regular ontology model, given some defined parameters.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL (ijcsit)
Predicting student performance is a great concern to higher-education managements. This prediction helps to identify and improve students' performance, which several factors may influence. In the present study, we employ data mining processes, particularly classification, to enhance the quality of the higher-educational system. Recently, a new direction has been taken towards improving classification accuracy by combining classifiers. In this paper, we design and evaluate a fast learning algorithm using an AdaBoost ensemble with a simple genetic algorithm, called "Ada-GA", where the genetic algorithm is demonstrated to successfully improve the accuracy of the combined classifier. The Ada-GA algorithm proved to be of considerable use in identifying at-risk students early, especially in very large classes. This early prediction allows the instructor to provide appropriate advising to those students. The Ada-GA algorithm was implemented and tested on the ASSISTments dataset; the results showed that it successfully improved detection accuracy while reducing computational complexity.
A Heterogeneous Population-Based Genetic Algorithm for Data Clustering (ijeei-iaes)
As a primary data mining method for knowledge discovery, clustering is a technique of classifying a dataset into groups of similar objects. The most popular method for data clustering, k-means, suffers from the drawback of requiring the number of clusters and their initial centers to be provided by the user. In the literature, several methods have been proposed, in the form of k-means variants, genetic algorithms, or combinations of them, for calculating the number of clusters and finding proper cluster centers. However, none of these solutions has provided satisfactory results, and determining the number of clusters and the initial centers remains the main challenge in clustering processes. In this paper we present an approach to automatically generate such parameters to achieve optimal clusters, using a modified genetic algorithm operating on varied individual structures and using a new crossover operator. Experimental results show that our modified genetic algorithm is a more efficient alternative to the existing approaches.
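The core idea above, individuals of varying length encoding candidate center sets, with a crossover that can change k, can be sketched as follows. This is a minimal illustration under assumed details (1-D data, a hypothetical per-center penalty weight, truncation selection, mutation omitted), not the paper's actual operators.

```python
import random

def assign(points, centers):
    """Assign each 1-D point to the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            for p in points]

def fitness(points, centers):
    """Lower is better: within-cluster deviation plus a penalty per center,
    so candidate solutions with different k remain comparable.
    The 0.5 penalty weight is an assumption for this toy example."""
    labels = assign(points, centers)
    dev = sum(abs(p - centers[l]) for p, l in zip(points, labels))
    return dev + 0.5 * len(centers)

def crossover(a, b):
    """Exchange random-length slices, so the offspring's k can differ."""
    ca, cb = random.randint(1, len(a)), random.randint(1, len(b))
    return a[:ca] + b[cb:]

def evolve(points, pop_size=20, gens=50):
    # individuals are variable-length lists of candidate centers
    pop = [[random.choice(points) for _ in range(random.randint(2, 6))]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda ind: fitness(points, ind))
        parents = pop[:pop_size // 2]                 # truncation selection
        children = [crossover(random.choice(parents), random.choice(parents))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=lambda ind: fitness(points, ind))

random.seed(0)
data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1]      # three visible groups
best = evolve(data)                                   # k emerges from search
```

Because the penalty grows with the number of centers while the deviation shrinks, the search settles on a k that balances the two, which is the sense in which such GAs determine k automatically.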
The approaches used in the literature for solving combinatorial optimization problems have applied a specific methodology, or a specific combination of methodologies, to each problem. However, less importance is attached to modeling the solution for the given problem systematically. Modeling helps in analyzing the various parts of the solution clearly, thereby identifying which parts of the applied methodology or combination of methodologies are efficient or inefficient. To find out how efficient the different parts of the applied methodologies are, it may be better to solve the given problem using the notion of hyper-heuristics: solving the different parts of the problem with many different methodologies realized, implemented, and benchmarked, enabling the best hybrid methodology to be chosen. A theoretical model or representation of the problem's solution may facilitate clear proposal and realization of the different methodologies for the various parts of the solution. The literature reveals a need for a generic model which could be used to represent the solution of combinatorial optimization problems. Therefore, inspired by the basic problem-solving behavior exhibited by animals in their day-to-day life, a new bio-inspired hyper-heuristic generic model for solving combinatorial optimization problems is proposed. To demonstrate the application of the proposed generic model, a problem-specific model is derived that solves the web-services selection/composition problem. This specialized model has been realized with a trip-planning case study, and the results are discussed.
EFFICIENT FEATURE SUBSET SELECTION MODEL FOR HIGH DIMENSIONAL DATA (IJCI JOURNAL)
This paper proposes a new method that aims to reduce the size of high-dimensional datasets by identifying and removing irrelevant and redundant features. Dataset reduction is important in machine learning and data mining. A measure of dependence is used to evaluate the relationship between a feature and the target concept, and between features, for irrelevant and redundant feature removal. The proposed work first removes all the irrelevant features; a minimum spanning tree of the relevant features is then constructed using Prim's algorithm. Splitting the minimum spanning tree based on the dependency between features leads to the generation of forests. A representative feature from each of the forests is taken to form the final feature subset.
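The pipeline described above (relevance filtering, Prim's MST over a dependence-weighted graph, splitting into forests, one representative per forest) can be sketched roughly as follows. The use of Pearson correlation as the dependence measure and the two thresholds are assumptions for illustration; the paper's exact measure is not given here.

```python
def pearson(x, y):
    """Pearson correlation, used here as a stand-in dependence measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def mst_prim(weights, n):
    """Prim's algorithm on a complete graph given by weights[(i, j)], i < j."""
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree
                    for j in range(n) if j not in in_tree),
                   key=lambda e: weights[tuple(sorted(e))])
        in_tree.add(j)
        edges.append((i, j))
    return edges

def select_features(features, target, relevance_min=0.1, split_below=0.3):
    # 1) drop irrelevant features (low dependence on the target)
    relevant = [f for f in range(len(features))
                if abs(pearson(features[f], target)) >= relevance_min]
    n = len(relevant)
    if n <= 1:
        return relevant
    # 2) MST over relevant features, edge weight = 1 - |correlation|
    w = {(a, b): 1 - abs(pearson(features[relevant[a]], features[relevant[b]]))
         for a in range(n) for b in range(a + 1, n)}
    edges = mst_prim(w, n)
    # 3) cut MST edges with weak feature-feature dependence -> forests
    kept = [e for e in edges if 1 - w[tuple(sorted(e))] >= split_below]
    parent = list(range(n))                 # union-find over kept edges
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for i, j in kept:
        parent[find(i)] = find(j)
    forests = {}
    for a in range(n):
        forests.setdefault(find(a), []).append(relevant[a])
    # 4) one representative per forest: the most target-relevant feature
    return sorted(max(fs, key=lambda f: abs(pearson(features[f], target)))
                  for fs in forests.values())

features = [[1, 2, 3, 4, 5],       # strongly tied to the target
            [2, 4, 6, 8, 10],      # redundant copy of the first
            [1, 5, 2, 4, 3]]       # weakly related noise
target = [1, 2, 3, 4, 5]
subset = select_features(features, target, relevance_min=0.5)
```

Here the redundant copy ends up in the same forest as the original and is absorbed, while the weakly related feature is filtered out in step 1.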
Integrated bio-search approaches with multi-objective algorithms for optimiza... (TELKOMNIKA JOURNAL)
Optimal selection of features is very difficult, and crucial to achieve, particularly for classification tasks. This is due to the traditional method of selecting features independently, which generates collections of irrelevant features and therefore affects classification accuracy. The goal of this paper is to leverage the potential of bio-inspired search algorithms, together with wrappers, in optimizing the multi-objective algorithms ENORA and NSGA-II to generate an optimal set of features. The main steps are to idealize the combination of ENORA and NSGA-II with suitable bio-search algorithms in which multiple subset generation has been implemented, and then to validate the optimal feature set by conducting a subset evaluation. Eight comparison datasets of various sizes have been deliberately selected for testing. Results show that the ideal combination of the multi-objective algorithms ENORA and NSGA-II with the selected bio-inspired search algorithm is promising for achieving a better optimal solution (i.e. the best features with higher classification accuracy) for the selected datasets. This finding implies that bio-inspired wrapper/filter algorithms can boost the efficiency of ENORA and NSGA-II for the tasks of selecting and classifying features.
TOWARDS A SYSTEM DYNAMICS MODELING METHOD BASED ON DEMATEL (ijcsit)
If System Dynamics (SD) models are constructed based solely on decision makers' mental models and understanding of the context subject to study, then the resulting systems must necessarily bear some degree of deficiency due to the subjective, limited, and internally inconsistent mental models which led to the conception of these systems. As such, a systematic method for constructing SD models could be essentially helpful in overcoming the biases dictated by the human mind's limited understanding and conceptualization of complex systems. This paper proposes a novel combined method to support SD model construction. The classical Decision Making Trial and Evaluation Laboratory (DEMATEL) technique is used to define causal relationships among variables of a system, and to construct the corresponding Impact Relation Maps (IRMs). The novelty of this paper stems from the use of the resulting total influence matrix to derive the system dynamics' Causal Loop Diagram (CLD) and then define variable weights in the stock-flow chart equations. This new method allows overcoming the subjectivity bias of SD modeling while projecting DEMATEL into a more dynamic simulation environment, which could significantly improve the strategic choices made by analysts and policy makers.
Survey on evolutionary computation techniques and its application in dif... (ijitjournal)
In computer science, 'evolutionary computation' is an algorithmic tool based on evolution. It implements random variation, reproduction, and selection by altering and moving data within a computer. It helps in building, applying, and studying algorithms based on the Darwinian principles of natural selection. In this paper, studies of different evolutionary computation techniques used in applications such as image processing, cloud computing, and grid computing are briefly carried out. This work is an effort to help researchers from different fields gain knowledge of the evolutionary computation techniques applicable in the above-mentioned areas.
Feature selection in high-dimensional datasets is considered to be a complex and time-consuming problem. To enhance classification accuracy and reduce execution time, Parallel Evolutionary Algorithms (PEAs) can be used. In this paper, we review the most recent works which handle the use of PEAs for feature selection in large datasets. We have classified the algorithms in these papers into four main classes: Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Scatter Search (SS), and Ant Colony Optimization (ACO). Accuracy is adopted as the measure to compare the efficiency of these PEAs. It is noticeable that Parallel Genetic Algorithms (PGAs) are the most suitable algorithms for feature selection in large datasets, since they achieve the highest accuracy. On the other hand, we found that parallel ACO is time-consuming and less accurate compared with the other PEAs.
Ontology Based PMSE with Manifold Preference (IJCERT)
With the surge of modern research focus towards pervasive computing, many techniques and challenges need to be addressed in order to effectively create smart spaces and achieve miniaturization. In the process of scaling down to compact devices, the real things to ponder are the information retrieval challenges. In this work, we discuss the aspects of multimedia which make information access challenging. An example pattern recognition scenario is presented, and the mathematical techniques that can be used to model uncertainty are also presented, for developing a system that can sense, compute, and communicate in a way that makes human life easier, with smart objects assisting from the surroundings.
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT... (IAEME Publication)
Close-range photogrammetry network design refers to the process of placing a set of cameras in order to achieve photogrammetric tasks. The main objective of this paper is to find the best locations for two or three camera stations. Genetic algorithm optimization and Particle Swarm Optimization are developed to determine the optimal camera stations for computing three-dimensional coordinates. In this research, a mathematical model representing genetic algorithm optimization and Particle Swarm Optimization for the close-range photogrammetry network is developed. The paper also gives the sequence of field operations and computational steps for this task. A test field is included to reinforce the theoretical aspects.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm (aciijournal)
In this paper we attempt to solve an automatic clustering problem by concurrently optimizing multiple objectives, namely automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent optimization algorithm Jaya as its underlying optimization stratagem. This evolutionary technique always aims to attain the global best solution rather than a local best solution in larger datasets. The exploration and exploitation imposed in the proposed work result in detecting the number of clusters automatically, finding the appropriate partitioning present in the datasets, and reaching near-optimal values on the CVI frontiers. Twelve datasets of different intricacy are used to endorse the performance of the proposed algorithm. The experiments lay bare that the conjectured advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance paybacks.
Survey on Efficient Techniques of Text Mining (vivatechijri)
In the current era, with the advancement of technology, more and more data is available in digital form, of which most (approx. 85%) is in unstructured textual form. It has therefore become essential to develop better techniques and algorithms to extract useful and interesting information from this large amount of textual data. Text mining is the process of extracting useful data from unstructured text. The algorithms used for text mining have advantages and disadvantages; moreover, the issues in the field of text mining that affect the accuracy and relevance of the results are identified.
MOCANAR: A MULTI-OBJECTIVE CUCKOO SEARCH ALGORITHM FOR NUMERIC ASSOCIATION RU... (cscpconf)
Extracting association rules from numeric features involves searching a very large search space. To deal with this problem, this paper uses a meta-heuristic algorithm that we have called MOCANAR. MOCANAR is a Pareto-based multi-objective cuckoo search algorithm which extracts high-quality association rules from numeric datasets. Support, confidence, interestingness, and comprehensibility are the objectives considered in MOCANAR. MOCANAR extracts rules incrementally: in each run of the algorithm, a small number of high-quality rules are produced. In this paper, a comprehensive taxonomy of meta-heuristic algorithms is also presented. Using this taxonomy, we decided to use a cuckoo search algorithm, because it is one of the most mature algorithms and is also simple to use and easy to comprehend. In addition, to our knowledge, this method has not previously been used as a multi-objective algorithm, nor in the area of association rule mining. To demonstrate the merits and associated benefits of the proposed methodology, it has been applied to a number of datasets, and high-quality results in terms of the objectives were extracted.
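The four objectives named above can be made concrete. Support and confidence are standard; the interestingness and comprehensibility formulas below are common variants from the association-rule-mining literature and are assumptions here, not necessarily MOCANAR's exact definitions.

```python
from math import log

def rule_metrics(transactions, antecedent, consequent):
    """Score a rule antecedent -> consequent over a list of transaction sets.
    Interestingness and comprehensibility use one common ARM formulation."""
    n = len(transactions)
    a = sum(antecedent <= t for t in transactions)            # matches of A
    c = sum(consequent <= t for t in transactions)            # matches of C
    ac = sum((antecedent | consequent) <= t for t in transactions)
    support = ac / n
    confidence = ac / a if a else 0.0
    # one common variant: conf(A->C) * conf(C->A) * (1 - support)
    interestingness = (ac / a) * (ac / c) * (1 - ac / n) if a and c else 0.0
    # shorter rules with small consequents are easier to read
    comprehensibility = log(1 + len(consequent)) / log(1 + len(antecedent | consequent))
    return support, confidence, interestingness, comprehensibility

txns = [{"bread", "milk"}, {"bread", "butter"},
        {"bread", "milk", "butter"}, {"milk"}]
s, conf, inter, compr = rule_metrics(txns, {"bread"}, {"milk"})
```

A Pareto-based search such as MOCANAR would keep rules that are non-dominated across these four scores rather than collapsing them into one weighted value.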
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni... (theijes)
Feature selection is considered a problem of global combinatorial optimization in machine learning, which reduces the number of features and removes irrelevant, noisy, and redundant data. However, identifying useful features from among hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data becomes even more challenging owing to the high dimensionality of the features, the multiclass categories involved, and the usually small sample size. In order to improve prediction accuracy and to avoid incomprehensibility due to the number of features, different feature selection techniques can be implemented. This survey classifies and analyzes different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three categories: filter, wrapper, and hybrid.
Heuristics for the Maximal Diversity Selection Problem (IJMER)
The problem of selecting k items from among a given set of N items such that the 'diversity' among the k items is maximum is a classical problem with applications in many diverse areas, such as forming committees, jury selection, product testing, surveys, plant breeding, ecological preservation, and capital investment. A suitably defined distance metric is used to determine the diversity. However, this is a hard problem, and the optimal solution is computationally intractable. In this paper we present an experimental evaluation of two approximation algorithms (heuristics) for the maximal diversity selection problem.
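One standard greedy heuristic for this problem (maximize the minimum pairwise distance) runs as follows. It is a generic construction given for illustration, not necessarily one of the two heuristics evaluated in the paper.

```python
def greedy_diverse_subset(items, k, dist):
    """Greedy max-min heuristic: start from the farthest pair, then
    repeatedly add the item whose minimum distance to the chosen set
    is largest."""
    a, b = max(((a, b) for i, a in enumerate(items) for b in items[i + 1:]),
               key=lambda pair: dist(*pair))
    chosen = [a, b]
    while len(chosen) < k:
        best = max((x for x in items if x not in chosen),
                   key=lambda x: min(dist(x, c) for c in chosen))
        chosen.append(best)
    return chosen

# 1-D toy instance: two tight pairs and an outlier
points = [0.0, 0.1, 5.0, 5.1, 10.0]
picked = greedy_diverse_subset(points, 3, lambda x, y: abs(x - y))
```

Each greedy step costs O(Nk), so the whole construction is polynomial even though the exact problem is intractable.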
In recent years, consumers and legislation have increasingly been pushing companies to optimize their activities so as to reduce negative environmental and social impacts. On the other hand, companies must keep their total supply-chain costs as low as possible to remain competitive. This work aims to develop a model for the traveling salesman problem that includes environmental impacts, and to identify, as far as possible, the contribution of genetic operator tuning and setting to the success and efficiency of genetic algorithms for solving this problem with consideration of CO2 emissions due to transport. Efficiency is measured in terms of CPU time consumption and convergence of the solution. The best transportation policy is determined by finding a balance between financial and environmental criteria. Empirically, we demonstrate that the performance of the genetic algorithm undergoes relevant improvements with certain combinations of parameters and operators, which we present in our results section.
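The balance between financial and environmental criteria can be sketched as a genetic algorithm minimizing a weighted sum of the two costs over tours. The toy city coordinates, the cost weights, the emissions model (proportional to distance), and the mutation-only variation are all assumptions for illustration, not the paper's setup.

```python
import random

CITIES = {0: (0, 0), 1: (0, 3), 2: (4, 3), 3: (4, 0)}   # toy instance
FUEL_COST, CO2_COST = 1.0, 0.4                          # hypothetical weights

def leg(a, b):
    (x1, y1), (x2, y2) = CITIES[a], CITIES[b]
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

def tour_cost(tour):
    """Weighted sum of financial and environmental criteria; emissions are
    modelled here, purely for illustration, as proportional to distance."""
    d = sum(leg(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))
    return FUEL_COST * d + CO2_COST * d

def mutate(tour):
    """Swap mutation keeps the tour a valid permutation."""
    i, j = random.sample(range(len(tour)), 2)
    t = tour[:]
    t[i], t[j] = t[j], t[i]
    return t

def ga_tsp(pop_size=30, gens=100):
    pop = [random.sample(list(CITIES), len(CITIES)) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=tour_cost)
        pop = pop[:pop_size // 2]                       # truncation selection
        pop += [mutate(random.choice(pop)) for _ in range(pop_size - len(pop))]
    return min(pop, key=tour_cost)

random.seed(1)
best = ga_tsp()
```

Changing the weights (or the operators, as the abstract investigates) shifts which tours the search favors; a crossover such as order crossover (OX) would normally accompany the mutation shown here.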
A preliminary survey on optimized multiobjective metaheuristic methods for data clustering using evolutionary approaches
International Journal of Computer Science & Information Technology (IJCSIT) Vol 5, No 5, October 2013
A PRELIMINARY SURVEY ON OPTIMIZED MULTIOBJECTIVE METAHEURISTIC METHODS FOR DATA CLUSTERING USING EVOLUTIONARY APPROACHES
Ramachandra Rao Kurada (1), Dr. K Karteeka Pavan (2), and Dr. AV Dattareya Rao (3)
(1) Research Scholar (Part-time), Acharya Nagarjuna University, Guntur & Assistant Professor Sr., Department of Computer Applications, Shri Vishnu Engineering College for Women, Bhimavaram
(2) Professor, Department of Information Technology, RVR & JC College of Engineering, Guntur
(3) Principal, Acharya Nagarjuna University, Guntur
ABSTRACT
The present survey provides the state of the art of research devoted to Evolutionary Approaches (EAs) for clustering, exemplified with a diversity of evolutionary computations. The survey provides a nomenclature that highlights aspects that are very important in the context of evolutionary data clustering. The paper examines the clustering trade-offs addressed by wide-ranging Multi-Objective Evolutionary Approaches (MOEA) methods. Finally, this study addresses the potential challenges of MOEA design and data clustering, along with conclusions and recommendations for novice and experienced researchers, by positioning the most promising paths of future research.
MOEAs have had substantial success across a variety of MOP applications, from pedagogical multifunction optimization to real-world engineering design. The survey organizes the developments witnessed over the past three decades in EA-based metaheuristics for solving multiobjective optimization problems (MOPs) and for making significant progress towards finding high-quality solutions in a single run. Data clustering is an exigent task, whose intricacy is caused by the lack of a unique and precise definition of a cluster. The discrete optimization problem uses the cluster space to derive a solution for multiobjective data clustering. Discovery of a majority or all of the clusters (of arbitrary shapes) present in the data is a long-standing goal of unsupervised predictive learning and exploratory pattern analysis.
KEYWORDS
Data clustering, multi-objective optimization problems, multiobjective evolutionary algorithms, metaheuristics.
1. INTRODUCTION
The new millennium has witnessed the emergence of Information Technology as the thrust behind advances in Computational Intelligence (CI) [1]. The growing complexity of computer programs, together with the availability, increasing speed, and ever-decreasing cost of computation, has already had a momentous impact on CI. Amongst the computational paradigms, Evolutionary Computation (EC) [2] is currently recognized as a notably applicable variety of
ancient and novel computational applications in data clustering. EC comprises a set of soft-computing paradigms [3] designed to solve optimization problems. In contrast with the rigid/static models of hard computing, these nature-inspired models provide self-adaptation mechanisms, which aim at identifying and exploiting the properties of the instance of the problem being solved.
DOI : 10.5121/ijcsit.2013.5504
Clustering is an important step in many machine-learning tasks [70]. Every algorithmic rule has its own
bias, attributable to the different criteria it optimizes. Unsupervised machine learning is inherently an
optimization task: one tries to fit the best model to a sample of data [4]. The definition of "best" is,
ultimately, generalization performance with respect to the full universe of data points. However,
machine-learning algorithms cannot measure this a priori, and instead rely on heuristic estimates of the
quality of their models and parameters, such as goodness of fit to the available data, model parsimony,
and so on [5]. Optimization [6] is the process of obtaining the best result or benefit under a given set
of constraints. Design decisions are ultimately made by maximizing or minimizing a goal or benefit. The
size and complexity of the optimization problems that can be solved in a reasonable time have advanced
with modern computing technologies.
The single-criterion optimization problem has a single optimal solution for a single objective function.
In a multi-criterion optimization problem there is more than one objective function, each of which may
have a different, conflicting individual optimum. These optimal solutions are named after the economist
Vilfredo Pareto [7], who in 1896 stated a concept now known as "Pareto optimality". The idea is that the
solution to a multi-objective optimization problem is normally not a single value but a set of values,
also called the "Pareto set". A number of Pareto-optimal solutions can, in principle, be captured in an
EA population, thereby allowing a user to find multiple Pareto-optimal solutions in a single simulation run.
In recent times, MOPs have become a crucial area in science and engineering. The complexity of MOPs grows
with the size of the problem to be solved, i.e. the number of objective functions and the size of the
search space [8]. The taxonomy of MOPs is shown in Figure 1.
Figure 1. Taxonomy of Multiobjective Optimization Problems (MOPs)
MOPs support multi-criteria decision making through three approaches: a priori, a posteriori and
interactive. In the a priori approach, the decision makers provide preferences before the optimization
process. In the a posteriori approach, the search process determines a set of Pareto solutions; this set
gives the decision makers a complete picture of the Pareto front, and the front itself constitutes acquired
knowledge about the problem. Subsequently, the decision makers choose one solution from the set provided
by the solver [9]. The interactive approach is a progressive interaction between the decision maker and
the solver, applying search components over multiobjective metaheuristic methods.
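The a priori approach can be illustrated with the common weighted-sum scalarization, which collapses the objectives into one using weights the decision maker supplies before the search. A minimal sketch: the two objectives and the weights below are hypothetical, and an exhaustive grid search stands in for any single-objective optimizer.

```python
# A priori preference articulation via weighted-sum scalarization
# (illustrative sketch; objectives and weights are hypothetical).

def f1(x):
    return x ** 2            # first objective: minimize x^2

def f2(x):
    return (x - 2) ** 2      # second objective: minimize (x - 2)^2

def weighted_sum(x, w1, w2):
    # The decision maker fixes the weights *before* the search (a priori).
    return w1 * f1(x) + w2 * f2(x)

def grid_minimize(w1, w2, lo=-1.0, hi=3.0, steps=4001):
    # Exhaustive grid search stands in for any single-objective optimizer.
    best_x = lo
    best_v = weighted_sum(lo, w1, w2)
    for i in range(1, steps):
        x = lo + (hi - lo) * i / (steps - 1)
        v = weighted_sum(x, w1, w2)
        if v < best_v:
            best_x, best_v = x, v
    return best_x

# Equal weights land midway between the two individual optima (0 and 2);
# shifting weight toward f1 pulls the compromise toward x = 0.
print(grid_minimize(0.5, 0.5))
print(grid_minimize(0.9, 0.1))
```

Each weight vector recovers a different Pareto-optimal compromise, which is why the choice of weights must be made before the search in the a priori approach.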
A metaheuristic is formally defined as an iterative generation process that guides a subordinate heuristic
by intelligently combining distinct concepts for exploring and exploiting the search space, using learning
strategies to structure information so as to find near-optimal solutions efficiently [10]. Metaheuristic
algorithms seek good solutions to optimization problems in circumstances where the complexity of the
tackled problem or the available search time does not allow the use of exact optimization algorithms [11].
When evolutionary algorithms (EAs) and metaheuristics solve MOPs with multiple global/local optima, they
typically separate the population of individuals into subpopulations, each associated with a different
optimum, with the aim of maintaining diversity for a longer period.
Data analysis [12] plays an indispensable role in understanding various phenomena. Cluster analysis, a
primitive exploration with little or no prior knowledge, consists of research developed across a wide
variety of multiobjective metaheuristic methods [15]. Cluster analysis is pervasive in the life sciences
[13], where taxonomy denotes the activity of ordering and arranging information in domains like botany,
zoology and ecology. Biological classification, dating back to the 18th century (Carl Linnaeus) and still
valid, is essentially a result of cluster analysis. Clustering constitutes the step that precedes
classification: classification aims at assigning new objects to a set of existing clusters/classes and,
from this point of view, is the 'maintenance' phase, aiming at updating a given partition. The goal of
multiobjective clustering [14] is to find clusters in a data set by applying several evolutionary
clustering algorithms corresponding to different objective functions. Optimization-based clustering
algorithms [14] rely wholly on cluster validity indices, whose optimum acts as a proxy for the unknown
"correct classification" of a previously unhandled dataset.
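The conflict among clustering objectives can be made concrete with a small sketch. The two objectives below, a compactness measure and the number of clusters, are simplified illustrations and not the specific validity indices covered by this survey.

```python
# Two conflicting clustering objectives evaluated on a candidate partition
# (simplified illustration; real multiobjective clustering methods use
# validity indices such as overall deviation and connectivity).

def overall_deviation(points, labels):
    # Compactness: sum of distances from each point to its cluster centroid.
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for members in clusters.values():
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        total += sum(((p[0] - cx) ** 2 + (p[1] - cy) ** 2) ** 0.5
                     for p in members)
    return total

def num_clusters(labels):
    # Model-complexity objective: fewer clusters is "simpler", but merging
    # clusters inflates deviation -- the two objectives conflict.
    return len(set(labels))

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]      # matches the two visible groups
coarse = [0, 0, 0, 0, 0, 0]    # one big cluster: fewer clusters, worse deviation

print(overall_deviation(points, good), num_clusters(good))
print(overall_deviation(points, coarse), num_clusters(coarse))
```

No single partition optimizes both objectives at once, which is exactly the trade-off a multiobjective clustering algorithm explores.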
This survey provides an updated overview fully devoted to metaheuristic evolutionary algorithms for
clustering. It is not limited to any particular kind of evolutionary approach and comprises advanced
topics in MOPs, MOEAs and data clustering. Section 1 introduces the basic concepts of MOPs and EC.
Section 2 presents a classification scheme for MOEAs along with Pareto optimality, and briefly addresses
the basic framework, algorithms, design, recent developments and applications in this field. Section 3
provides a catalog that foregrounds some significant vistas in the context of evolutionary data
clustering. Section 4 summarizes the survey by addressing important issues of MOPs, MOEAs and data
clustering, and sets out the most promising paths for future research.
2. MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS (MOEAs)
2.1. Introduction
MOEAs [7], [10], [14], [15] have attracted a wealth of research proposals during the last three decades,
and they remain one of the most active study areas within EC. The first generation of MOEAs [16] was
characterized by selection based on Pareto ranking, with fitness sharing as the most frequent mechanism
for maintaining diversity. Later MOEAs are characterized by an emphasis on efficiency and by the use of
elitism. Elitism [17] is a method to safeguard and reuse the best results found so far in succeeding
generations of an EA. Maintaining archives of non-dominated solutions is an important issue in elitist
EAs; the final contents of the archive usually represent the result returned by the optimization process.
When a set of compromise (non-dominated) solutions is considered during search and decision-making, these
solutions should form a good approximation to the Pareto optimal front [18]. The Pareto optimal front is
the set of all non-dominated solutions in the objective space. However, when the obtained solutions do
not lie on the Pareto optimal front, they constitute the obtained non-dominated front, or approximate
Pareto front [6]. The appeal of Pareto optimization [14] stems from the fact that in most MOPs there is
no single best solution or global optimum, and it is very complicated to establish preferences among the
criteria prior to the search. The quality of a set of non-dominated solutions is assessed by the spread
and distribution of the solutions and by the proximity between the obtained front and the Pareto optimal
front [10]. The three main goals set by Zitzler in [67], [71] to be achieved for MOPs are: a) the obtained
solution set should be as close as possible to the true Pareto front; b) the solution set should be
uniformly distributed and diverse over the Pareto front; c) in order to give the decision maker a true
picture of the trade-offs, the set of solutions should capture the whole spectrum of the Pareto front,
which requires inspecting solutions at the extreme ends of the objective function space.
Definition 1 (MOP, general definition): In general, an MOP minimizes $F(\vec{x}) = (f_1(\vec{x}), \ldots, f_k(\vec{x}))$
subject to $g_i(\vec{x}) \le 0$, $i = 1, \ldots, m$, $\vec{x} \in \Omega$. An MOP solution minimizes the
components of the vector $F(\vec{x})$, where $\vec{x} = (x_1, \ldots, x_n)$ is an $n$-dimensional decision
variable vector from some universe $\Omega$. The MOP evaluation function $F : \Omega \to \Lambda$ maps the
decision variable vector $\vec{x} = (x_1, \ldots, x_n)$ to the objective vector
$\vec{y} = (f_1(\vec{x}), \ldots, f_k(\vec{x}))$. Using the MOP notation presented in Definition 1, the key
Pareto concepts are mathematically defined as follows:

Definition 2 (Pareto dominance): A vector $\vec{u} = (u_1, \ldots, u_k)$ is said to dominate
$\vec{v} = (v_1, \ldots, v_k)$, denoted $\vec{u} \preceq \vec{v}$, if and only if $\vec{u}$ is partially
less than $\vec{v}$, i.e. $\forall i \in \{1, \ldots, k\}: u_i \le v_i \;\wedge\; \exists i \in \{1, \ldots, k\}: u_i < v_i$.

Definition 3 (Pareto optimality): A solution $\vec{x} \in \Omega$ is said to be Pareto optimal with respect
to $\Omega$ if and only if there is no $\vec{x}' \in \Omega$ for which
$\vec{v} = F(\vec{x}') = (f_1(\vec{x}'), \ldots, f_k(\vec{x}'))$ dominates
$\vec{u} = F(\vec{x}) = (f_1(\vec{x}), \ldots, f_k(\vec{x}))$. The phrase "Pareto optimal" is taken to mean
with respect to the whole decision variable space unless otherwise specified.

Definition 4 (Pareto optimal set): For a given MOP $F(\vec{x})$, the Pareto optimal set $\mathcal{P}^*$ is
defined as $\mathcal{P}^* := \{\vec{x} \in \Omega \mid \neg\exists\, \vec{x}' \in \Omega : F(\vec{x}') \preceq F(\vec{x})\}$.

Definition 5 (Pareto front): For a given MOP $F(\vec{x})$ and Pareto optimal set $\mathcal{P}^*$, the Pareto
front $\mathcal{PF}^*$ is defined as
$\mathcal{PF}^* := \{\vec{u} = F(\vec{x}) = (f_1(\vec{x}), \ldots, f_k(\vec{x})) \mid \vec{x} \in \mathcal{P}^*\}$.
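Definitions 2 and 4 translate directly into code. The following minimal Python sketch (the objective vectors are illustrative) tests Pareto dominance for minimization and extracts the non-dominated subset of a small set of objective vectors:

```python
# Pareto dominance and Pareto-set extraction for minimization, following
# Definitions 2-4 (the objective vectors are illustrative).

def dominates(u, v):
    # u dominates v iff u is no worse in every objective and strictly
    # better in at least one.
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))

def nondominated(front):
    # Keep the vectors that are not dominated by any other vector.
    return [u for u in front
            if not any(dominates(v, u) for v in front if v != u)]

objs = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 2)]
print(nondominated(objs))    # -> [(1, 5), (4, 1), (2, 2)]
```

Here (2, 3) and (3, 4) are dominated by (2, 2), while the three remaining vectors are mutually non-dominated and together approximate a Pareto front.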
Figure 2. Taxonomy frameworks of meta-heuristics
While solving optimization problems, single-solution-based metaheuristics improve a single solution; they
can be viewed as walks through neighborhoods, or search trajectories through the search space of the
problem. The walks are performed by iterative procedures that move from the present solution to a
different one within the search space. Population-based metaheuristics [19] share the same concepts and
can be viewed as an iterative improvement of a population of solutions. First, the population is
initialized; then a new population of solutions is generated and integrated into the current one using
some selection procedure. The search process stops once a given condition is fulfilled. Figure 2 provides
the taxonomy frameworks of multiobjective metaheuristics.
2.2. Deterministic meta-heuristics - Tabu Search (TS)
Tabu search [20] is essentially a sophisticated and improved form of local search (hill climbing),
augmented with memory. The underlying local search works as follows: consider an initial feasible
solution; evaluate its neighboring solutions under a given neighborhood structure, and move to the first
neighbor found that is better than the present solution; iterate until no better solution exists in the
neighborhood, i.e. the search stops when the present solution is better than all its neighbors. Tabu
search additionally maintains a tabu list of recently visited solutions or moves, which allows the search
to escape such local optima. Pseudo code 1 demonstrates the working procedure of tabu search.
Pseudo code 1: Tabu Search (TS)
---------------------------------------------------------------------------------------------------
Step 1: Initialize
Step 2: Repeat
Step 3:    Generate all of the acceptable neighborhood solutions.
Step 4:    Evaluate the generated solutions.
Step 5:    Hold the acceptable ones as candidate solutions.
Step 6:    If there is no suitable candidate then
Step 7:        Select the best of the restricted solutions as a candidate.
Step 8:    Update the tabu list.
Step 9:    Move to the candidate solution.
Step 10:   If the number of generated solutions is sufficient, expand.
Step 11: Until termination condition is met.
---------------------------------------------------------------------------------------------------
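A minimal runnable sketch of this procedure on a hypothetical one-dimensional integer problem follows. The objective, the ±1 neighborhood and the tabu tenure are illustrative choices, and a simple aspiration criterion re-admits tabu moves that would beat the best solution found so far.

```python
# Minimal tabu-search sketch (objective, neighborhood and tenure are
# illustrative assumptions, not any specific method from the survey).

def objective(x):
    # Multimodal toy objective to minimize; its global minimum is at x = 6.
    return (x - 7) ** 2 + 5 * (x % 3)

def tabu_search(start, iterations=50, tenure=5):
    current = best = start
    tabu = []                      # short-term memory of visited solutions
    for _ in range(iterations):
        neighbors = [current - 1, current + 1]
        # Admissible neighbors: not tabu, or better than the best found
        # so far (a simple aspiration criterion).
        candidates = [n for n in neighbors
                      if n not in tabu or objective(n) < objective(best)]
        if not candidates:
            candidates = neighbors
        current = min(candidates, key=objective)
        tabu.append(current)
        if len(tabu) > tenure:
            tabu.pop(0)            # expire the oldest tabu entry
        if objective(current) < objective(best):
            best = current
    return best

print(tabu_search(0))
```

Note how the tabu list forces the walk past the local optimum at x = 4 instead of cycling back, while the tracked best solution preserves the global optimum once it is visited.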
2.3. Probabilistic meta-heuristics - Single-solution-based methods
Simulated annealing (SA) is regarded as the oldest of the metaheuristic methods that evolve a single
solution. Metropolis et al. proposed SA in 1953 [21]. It was motivated by the physical process of
annealing solids: first, a solid is heated to a high temperature and then cooled slowly, so that the
system is at all times approximately in thermodynamic equilibrium. At equilibrium there may be many
configurations, each corresponding to a specific energy level. The chance of accepting a change from the
current configuration to a new one is related to the difference in energy between the two states. Since
then, SA has been broadly applied to combinatorial optimization problems, and it has achieved good
results on a variety of such problems. Pseudo code 2 demonstrates the working procedure of SA.
Pseudo code 2: Simulated Annealing (SA)
-----------------------------------------------------------
Step 1: Initialize
Step 2: Repeat
Step 3:    Generate a candidate solution.
Step 4:    Evaluate the candidate.
Step 5:    Determine the current solution.
Step 6:    Reduce the temperature.
Step 7: Until termination condition is met.
-----------------------------------------------------------
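A minimal runnable sketch of the annealing loop with the classical Metropolis acceptance rule follows; the toy objective, starting range, cooling rate and Gaussian perturbation are illustrative assumptions.

```python
import math
import random

# Minimal simulated-annealing sketch (toy objective; the acceptance rule
# follows the classical Metropolis criterion).

def objective(x):
    return (x - 3.0) ** 2               # toy objective: minimum at x = 3

def simulated_annealing(seed=0, t0=10.0, cooling=0.95, steps=2000):
    rng = random.Random(seed)
    current = rng.uniform(-10, 10)
    best = current
    t = t0
    for _ in range(steps):
        candidate = current + rng.gauss(0, 1)            # random perturbation
        delta = objective(candidate) - objective(current)
        # Always accept improvements; accept uphill moves with probability
        # exp(-delta / t), which shrinks as the temperature drops.
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current = candidate
        if objective(current) < objective(best):
            best = current
        t = max(t * cooling, 1e-9)                       # reduce temperature
    return best

print(simulated_annealing())
```

At high temperature the walk accepts many uphill moves (exploration); as the temperature falls, it behaves like pure descent (exploitation), ending near the optimum.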
2.4. Population-based methods
Population-based methods not only imitate biological or natural phenomena but also start with a set of
initial feasible solutions called the 'population'; the intention is that the search through the state
space should reach the most favorable solution. EC comprises modern search practices that use
computational models of the processes of evolution and selection. The perceptions and mechanisms of
Darwinian evolution and natural selection [23] are embodied in evolutionary algorithms (EAs) and are used
to solve problems in many fields of engineering and science. EAs are modern, parallel search and
optimization practices based on computational models of natural selection [23] and population genetics [24].
2.4.1. Evolutionary algorithms
EAs use the vocabulary borrowed from genetics. They simulate the evolution across a sequence of
generations (iterations within an iterative process) of a population (set) of candidate solutions. A
candidate solution is internally represented as a string of genes and is called chromosome or
individual. The position of a gene in a chromosome is called locus and all the possible values for
the gene form the set of alleles of the respective gene. The internal representation (encoding) of a
candidate solution in an evolutionary algorithm forms the genotype that is processed by the
evolutionary algorithm. Each chromosome corresponds to a candidate solution in the search space
of the problem, which represents its phenotype. A decoding function is necessary to translate the
genotype into the phenotype. Mutation and crossover are the two most frequently used evolutionary
operators [25].
Mutation consists of a random perturbation of a gene, while crossover aims at exchanging genetic
information among several chromosomes; together they help the search avoid local optima. The chromosome
subjected to a genetic operator is called the parent and the resulting chromosome the offspring. A
process called selection, involving some degree of randomness, picks the individuals that undergo
recombination (crossover) to create offspring, mainly based on individual merit. Individual merit is
evaluated using a fitness function, which quantifies how fit the candidate solution encoded by the
chromosome is for the problem being solved. The fitness function is formulated from the mathematical
function to be optimized. The solution returned by an evolutionary algorithm is usually the fittest
chromosome of the last generation. Pseudo code 3 demonstrates the basic structure of evolutionary
approaches, and their general framework is shown in Figure 3.
Figure 3. General Framework of Evolutionary Approaches
Several landmark methods identified among EAs are Evolutionary Programming (EP) [35], [36], Genetic
Programming (GP) [15], Differential Evolution (DE) [26], Scatter Search (SS) [27], Memetic Algorithms
(MA) [28], and Genetic Algorithms (GA) [29], [30]. All these algorithms employ the same core genetic
operations with minor variations. In recent times, heuristics or metaheuristics have been combined with
these algorithms to form hybrid methods. Independent implementation instances of EAs include GAs,
developed by Holland [31] and Goldberg [32]; Evolution Strategies (ESs), developed by Rechenberg [33] and
Schwefel [34]; and Evolutionary Programming (EP), originally developed by L. J. Fogel et al. [35] and
subsequently refined by D. B. Fogel [36]. Each of these algorithms has proved capable of yielding
approximately optimal solutions when presented with complex, multimodal, non-differentiable, and
discontinuous search spaces [37].
Pseudo code 3: Evolutionary Approaches (EA)
-------------------------------------------------------------------------------------------------
Step 1: Initialize
Step 2: Repeat
Step 3:    Evaluate the individuals.
Step 4:    Repeat
Step 5:        Select parents.
Step 6:        Generate offspring.
Step 7:        Mutate if enough solutions are generated.
Step 8:    Until population size is reached.
Step 9:    Replicate the top fitted individuals into the population as they were.
Step 10: Until the required number of generations is generated.
-------------------------------------------------------------------------------------------------
2.4.2. Genetic algorithms (GAs)
Genetic algorithms (GAs) [32], [38] are randomized search and optimization techniques guided by the
principles of evolution and natural genetics, with a large amount of implicit parallelism. GAs search
intricate, immensely large and multimodal landscapes and provide near-optimal solutions for the fitness
function of an optimization problem. Evolutionary pressure is applied with the stochastic roulette-wheel
method in step 3, where parent selection picks parents for the new population. The canonical GA [39]
encodes the problem within binary-string individuals. Pseudo code 4 demonstrates the working procedure of
a GA. While selection is random, so that any individual has a chance to become a parent, it is clearly
biased towards fitter individuals. Parents are not required to be distinct in any iteration; fit
individuals may produce many offspring. Crossover points are selected at random, and mutation is applied
to all individuals in the new population: with probability Pm, each bit of every string is inverted. The
new population then becomes the current population, and the cycle is repeated until some termination
criterion is satisfied [40]. The algorithm typically runs for a fixed number of iterations, or until
convergence is detected within the population. The probabilities of mutation and crossover, Pm and Pc,
are parameters of the algorithm and have to be set by the user [41].
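The roulette-wheel parent selection used in step 3 can be sketched as follows; the population and fitness values are hypothetical.

```python
import random

# Roulette-wheel (fitness-proportionate) parent selection
# (population and fitness values are hypothetical).

def roulette_select(population, fitnesses, rng):
    # Spin the wheel: each individual owns a slice proportional to fitness.
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]          # guard against floating-point round-off

rng = random.Random(42)
population = ["a", "b", "c", "d"]
fitnesses = [1.0, 1.0, 1.0, 7.0]   # "d" holds 70% of the wheel

picks = [roulette_select(population, fitnesses, rng) for _ in range(10000)]
print(picks.count("d") / len(picks))   # close to 0.7
```

Every individual can become a parent, but the sampling is biased toward fitter individuals in proportion to their fitness, exactly the evolutionary pressure described above.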
Pseudo code 4: Genetic Algorithm (GA)
-----------------------------------------------------------------------------------------------------
Step 1: A population of µ random individuals is initialized.
Step 2: Fitness scores are assigned to each individual.
Step 3: By means of the roulette wheel, select µ/2 pairs of parents from the existing
        population to form the new population.
Step 4: With probability Pc, offspring are produced by applying crossover to the µ/2 pairs
        of parents. The offspring replace the parents in the new population.
Step 5: With probability Pm, mutation is performed on the new population.
Step 6: The new population becomes the current population.
Step 7: If the termination conditions are satisfied exit, otherwise go to step 3.
-----------------------------------------------------------------------------------------------------
2.4.3. Evolutionary strategies (ESs)
Historically, ESs were designed for parameter optimization problems. The encoding used for each
individual is a list of real numbers, called the object variables of the problem, together with strategy
parameters that adapt the behavior of the mutation operator [33], [34]. In each iteration, offspring are
generated from a population of size µ. The parents are chosen randomly from the existing population; the
same parents are used to recombine all object parameters in the child, and the parents are re-selected
for each strategy parameter. Mutation is the major operator in ES, and it acts upon the strategy
parameters as well as the object parameters. Selection in ESs is deterministic, i.e. the best µ
individuals are taken from the offspring using a self-adaptation mechanism such as (µ,λ) or (µ+λ)
selection [55]. Pseudo code 5 demonstrates the working procedure of ES.
Pseudo code 5: Evolutionary Strategies (ESs)
-----------------------------------------------------------------------------------------------------
Step 1: µ individuals are arbitrarily initialized to form the current population.
Step 2: Fitness scores are assigned to all the µ individuals.
Step 3: New offspring are generated by recombination from the current population.
Step 4: The new offspring are mutated.
Step 5: Fitness scores are assigned to all the individuals.
Step 6: A new population of µ individuals is selected, using either (µ,λ) or (µ+λ) selection.
Step 7: The new population becomes the current population.
Step 8: If the termination conditions are satisfied exit, else go to step 3.
-----------------------------------------------------------------------------------------------------
2.4.4. Evolutionary programming (EP)
EP was first developed by L. J. Fogel et al. [35] for the evolution of finite state machines using a
restricted symbolic-alphabet encoding. Individuals in EP comprise a string of real numbers. EP differs
from GAs and ESs in that there is no recombination operator: evolution is wholly dependent on the
mutation operator, which uses a Gaussian probability distribution to perturb each variable, and
self-adaptation is used to control the Gaussian mutation. Tournament selection is used to select the
fresh population from the parents and the mutated children. Pseudo code 6 demonstrates the working
procedure of EP.
Pseudo code 6: Evolutionary Programming (EP)
---------------------------------------------------------------------------------------------
Step 1: A current population of m individuals is randomly initialized.
Step 2: Fitness scores are assigned to each of the m individuals.
Step 3: The m individuals in the existing population are used to create m children
        using the mutation operator.
Step 4: Fitness scores are assigned to the m offspring.
Step 5: A new population of size m is created from the m parents and the m offspring
        using tournament selection.
Step 6: If the termination conditions are met exit, else go to step 3.
---------------------------------------------------------------------------------------------
2.4.5. Genetic programming (GP)
In GP, individuals are computer programs. GP was developed by John Koza [40] and is based on the GA. GP
is known as a successful research paradigm in Artificial Intelligence and Machine Learning, and has been
studied in the most diverse areas of knowledge, such as data mining and optimization tasks. In GP, the EA
operates over a population of programs of different shapes and sizes. The initial population must have
adequate diversity; that is, the individuals must have most of the characteristics that are essential to
solve the problem posed by the fitness function. The reproduction, crossover and mutation genetic
operators are applied after selection to each chromosome in the population. The chromosomes that better
solve the problem receive a better fitness value and accordingly have a better chance of being selected
for the next generation [42]. The objective of the GP algorithm is to select, through recombination of
genes, the program that best solves a given problem. Pseudo code 7 demonstrates the working procedure of GP.
Pseudo code 7: Genetic Programming (GP)
----------------------------------------------------------------------------------
Step 1: Randomly create an initial population.
Step 2: Repeat
Step 3:    Evaluate each program by means of the fitness function.
Step 4:    Select a subgroup of individuals to apply the genetic operators to.
Step 5:    Apply the genetic operators.
Step 6:    Replace the current population by this new population.
Step 7: Until a good solution or a stop criterion is reached.
----------------------------------------------------------------------------------
2.4.6. Ant colony optimization (ACO)
The first ACO algorithm [43] appeared in the early 1990s and was devised by Dorigo [44]; ACO is a widely
used metaheuristic for solving combinatorial optimization problems. The idea is based on the observation
of the foraging behavior of ants. When walking on routes from the nest to a source of food, ants appear
to find not merely a random route but a quite 'good' one, in terms of shortness or, equivalently, travel
time. Their behavior thus allows them to solve an optimization problem [45]. Pseudo code 8 demonstrates
the working procedure of ACO.
Pseudo code 8: Ant Colony Optimization (ACO)
----------------------------------------------------------------------------
Step 1: Initialize
Step 2: Set i to 0.
Step 3: Repeat
Step 4:    Generate a feasible solution.
Step 5:    Evaluate its goodness η.
Step 6:    If η > ηmax then
Step 7:        Begin: update the elite list;
Step 8:        shift the bounds. End
Step 9:    If i mod α = 0 then amend the result.
Step 10:   If i mod β = 0 then intensify the elite pheromone outlines.
Step 11:   Update pheromone trails.
Step 12:   Increment i.
Step 13: Until i = σ.
----------------------------------------------------------------------------
Pseudo code 9: Particle Swarm Optimization (PSO)
-----------------------------------------------------------------------------------------------------
Step 1: Initialize
Step 2: Repeat
Step 3:    Evaluate fitness for each particle.
Step 4:    Update both the local and global best positions.
Step 5:    Update particle velocity by
           v[i+1] = v[i] + c1*rand()*(pbest[i]-current[i]) + c2*rand()*(gbest[i]-current[i]).
Step 6:    Update particle location by current[i+1] = current[i] + v[i].
Step 7: Until maximum number of generations reached.
-----------------------------------------------------------------------------------------------------
2.4.7. Particle swarm optimization (PSO)
In the mid-1990s, Kennedy and Eberhart [46], [47] first proposed the PSO algorithm. PSO is
population-based and evolutionary in nature. The algorithm was inspired by the analogy of social
interaction and communication in a flock of birds or a school of fish. In these social groups, a leader
directs the movement of the whole swarm, and the movement of each individual is based on the leader and
on its own experience. Hence, the particles in the PSO algorithm follow the leader, which is taken to be
the top performer [48], [49]. Pseudo code 9 demonstrates the working procedure of PSO.
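A minimal runnable sketch of the update rules from pseudo code 9, applied to a hypothetical one-dimensional objective, follows. The swarm size, coefficients, and the inertia factor 0.7 (added here for stability, not part of the pseudo code) are illustrative assumptions.

```python
import random

# Minimal PSO sketch using the velocity and position updates from pseudo
# code 9 (swarm size, coefficients, inertia weight and toy objective are
# illustrative assumptions).

def objective(x):
    return (x - 3.0) ** 2              # toy objective: minimum at x = 3

def pso(n_particles=10, iterations=200, c1=1.5, c2=1.5, seed=1):
    rng = random.Random(seed)
    pos = [rng.uniform(-10, 10) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                      # each particle's own best position
    gbest = min(pos, key=objective)     # swarm-wide best position
    for _ in range(iterations):
        for i in range(n_particles):
            # v <- w*v + c1*rand()*(pbest - x) + c2*rand()*(gbest - x),
            # where w = 0.7 is an inertia weight added for stability.
            vel[i] = (0.7 * vel[i]
                      + c1 * rng.random() * (pbest[i] - pos[i])
                      + c2 * rng.random() * (gbest - pos[i]))
            pos[i] += vel[i]            # x <- x + v
            if objective(pos[i]) < objective(pbest[i]):
                pbest[i] = pos[i]
            if objective(pos[i]) < objective(gbest):
                gbest = pos[i]
    return gbest

print(pso())
```

Each particle is pulled toward both its own best position (pbest) and the swarm's best position (gbest), so the swarm contracts around the best region found so far.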
2.4.8. Bees algorithm (BA)
D. T. Pham first proposed the Bees Algorithm [50] as an optimization method inspired by the natural food
foraging behavior of honeybees searching for an optimal result. Honeybees are able to extend their colony
over long distances and in several probable directions simultaneously, so as to exploit a large number of
food sources. A colony prospers by reallocating its foragers to the appropriate fields: typically, more
bees are recruited for flower patches with ample amounts of nectar or pollen that can be collected with
less effort [51]. Pseudo code 10 demonstrates the working procedure of BA.
Pseudo code 10: Bees Algorithm (BA)
---------------------------------------------------------------------------------------------------
Step 1: Initialize
Step 2: Repeat
Step 3:    Evaluate fitness of the population.
Step 4:    While (stopping criterion not met)
Step 5:        Select sites for neighborhood search.
Step 6:        Recruit bees for the selected sites (more bees for the best sites).
Step 7:        Evaluate fitness.
Step 8:        Select the fittest bee from each site.
Step 9:        Assign the remaining bees to search randomly and evaluate their fitness.
Step 10:   End While
Step 11: Until maximum number of generations reached.
---------------------------------------------------------------------------------------------------
Pseudo code 11: Water Flow-like Algorithm (WFA)
-----------------------------------------------------------------------------------
Step 1: Initialize
Step 2: Repeat
Step 3:    Repeat
Step 4:        Calculate the number of subflows.
Step 5:        For each subflow find the best neighborhood solution.
Step 6:        Distribute the mass of the flow to its subflows.
Step 7:        Calculate the improvement in objective value.
Step 8:    Until population number is reached.
Step 9:    Merge subflows with the same objective values.
Step 10:   Update the number of subflows.
Step 11:   Update the total number of water flows.
Step 12:   If precipitation condition met
Step 13:       Perform the bit-reordering strategy.
Step 14:       Distribute mass to flows.
Step 15:   Evaluate the new solution.
Step 16:   Update the number of subflows.
Step 17:   Update the total number of water flows.
Step 18: Until maximum number of generations reached.
-----------------------------------------------------------------------------------
2.4.9. Water flow-like algorithm (WFA)
WFA was proposed by Yang [52] as a nature-inspired optimization algorithm for object clustering. It
imitates the behavior of water flowing from higher to lower levels, which aids the process of searching
for the optimal result. Pseudo code 11 demonstrates the working procedure of WFA.
2.4.10. Differential evolution
DE [29], [53] materialized as a straightforward and well-organized scheme for global optimization over
continuous spaces more than a decade ago. It follows computational procedures similar to those of typical
evolutionary algorithms and has been applied to multiobjective, constrained, large-scale, and uncertain
optimization problems [45]. DE optimizes the objective function by adjusting and tuning ingredients such
as initialization, mutation, diversity enhancement and selection, as well as by the choice of the control
parameters [54]. Pseudo code 12 demonstrates the working procedure of DE.
Pseudo code 12: Differential Evolution (DE)
--------------------------------------------------------------------------------------------------------------
Step 1: Begin
Step 2: Generate randomly an initial population of solutions.
Step 3: Calculate the fitness of the initial population.
Step 4: Repeat
Step 5: Arbitrarily choose three solutions for each parent.
Step 6: Create one offspring using the DE operators.
Step 7: Repeat the procedure until the number of offspring equals the population size.
Step 8: For each member of the next generation:
Step 9: If offspring (x) is fitter than parent (x),
Step 10: replace parent (x).
Step 11: Until a stop condition is satisfied.
Step 12: End.
--------------------------------------------------------------------------------------------------------------
Some of the well-known meta-heuristics developed during the last three decades are: the Artificial
Immune Algorithm (AIA) [55], which models the human immune system; Harmony Search (HS) [56],
inspired by the improvisation process of musicians; Bacterial Foraging Optimization (BFO) [57],
which mimics the foraging behavior of bacteria; Shuffled Frog Leaping (SFL) [58], based on the
exchange of information among groups of frogs; Biogeography-Based Optimization (BBO) [59], which
models species movements such as migration and colonization; the Gravitational Search Algorithm
(GSA) [60], based on the gravitational forces acting among a group of bodies; and the Grenade
Explosion Method (GEM) [61], based on the law of detonation of a grenade. These algorithms have
been applied to numerous engineering optimization problems and have proved effective in solving
specific kinds of problems.
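As a concrete illustration (not code from any of the surveyed papers), the loop of Pseudo code 12 can be sketched in Python with the classic DE/rand/1/bin operators; the function name, bounds handling, and default parameters are our own illustrative choices:

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=100, seed=0):
    """Minimize f over the box 'bounds' using a basic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 2: generate a random initial population
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]               # Step 3: fitness of initial population
    for _ in range(generations):            # Steps 4-11: main loop
        for i in range(pop_size):
            # Step 5: arbitrarily choose three distinct solutions (none equal to i)
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Step 6: differential mutation plus binomial crossover -> one offspring
            jrand = rng.randrange(dim)
            child = []
            for j in range(dim):
                if rng.random() < CR or j == jrand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                else:
                    v = pop[i][j]
                lo, hi = bounds[j]
                child.append(min(max(v, lo), hi))   # keep within bounds
            # Steps 9-10: greedy selection, offspring replaces a worse parent
            cf = f(child)
            if cf < fit[i]:
                pop[i], fit[i] = child, cf
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]

# Usage: minimize the sphere function in 3 dimensions
x, fx = differential_evolution(lambda v: sum(t * t for t in v),
                               [(-5.0, 5.0)] * 3)
```

The scale factor F and crossover rate CR are the control parameters the text refers to; their values strongly affect the balance between exploration and exploitation.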
Teaching-Learning-Based Optimization (TLBO) [15] is an optimization method used to obtain
global solutions for continuous non-linear functions with less computational effort and high
reliability. The TLBO algorithm models the interplay between teaching and learning: it is based
on the influence of a teacher on the output of the learners in a class. The quality of the
teacher affects the outcome of the learners; a good teacher trains learners so that they achieve
better results in terms of their marks or grades. In addition, learners also learn from
interaction among themselves, which further improves their results. All EAs share the same basic
idea but differ in how they encode solutions and in the operators they use to produce the next
generation. EAs are governed by numerous inputs, such as the population size and the rates that
control how often mutation and crossover are applied. Broadly, there is no assurance that an EA
will find the optimal solution to an arbitrary problem, but careful tuning of these inputs and
the choice of an algorithm appropriate to the problem increase the chances of success [58].
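The teacher and learner phases described above can be sketched as follows; this is a minimal illustration of the standard TLBO scheme, with all names and defaults our own assumptions:

```python
import random

def tlbo(f, bounds, pop_size=20, generations=100, seed=0):
    """Minimize f with a basic Teaching-Learning-Based Optimization loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    clip = lambda v, j: min(max(v, bounds[j][0]), bounds[j][1])
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        # the best learner acts as the teacher for this generation
        teacher = pop[min(range(pop_size), key=lambda i: fit[i])]
        mean = [sum(x[j] for x in pop) / pop_size for j in range(dim)]
        for i in range(pop_size):
            # Teacher phase: move toward the teacher, away from the class mean
            TF = rng.choice([1, 2])                  # teaching factor
            cand = [clip(pop[i][j] + rng.random() * (teacher[j] - TF * mean[j]), j)
                    for j in range(dim)]
            cf = f(cand)
            if cf < fit[i]:
                pop[i], fit[i] = cand, cf
            # Learner phase: learn from a randomly chosen classmate
            k = rng.choice([j for j in range(pop_size) if j != i])
            sign = 1.0 if fit[i] < fit[k] else -1.0  # move away from worse, toward better
            cand = [clip(pop[i][j] + sign * rng.random() * (pop[i][j] - pop[k][j]), j)
                    for j in range(dim)]
            cf = f(cand)
            if cf < fit[i]:
                pop[i], fit[i] = cand, cf
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]

# Usage: minimize the 2-D sphere function
x, fx = tlbo(lambda v: sum(t * t for t in v), [(-5.0, 5.0)] * 2)
```

Note that, unlike GA or DE, this loop needs no crossover or mutation rates; the teaching factor TF is the only algorithm-specific parameter, which is part of TLBO's appeal.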
MOEA based on decomposition (MOEA/D) [17] is a recent framework, based on traditional
aggregation methods, that decomposes the MOP into a number of Scalar Objective Optimization
Problems (SOPs). The objective of each SOP, or subproblem, is a (linearly or nonlinearly)
weighted aggregation of the individual objectives. Memetic MOEA methods can offer EAs not only
greater speed of convergence but also greater accuracy of results [62]. Ishibuchi and Murata
proposed one of the first memetic MOEAs [63]. The algorithm applies a local search procedure
after the traditional variation operators, using a randomly drawn scalar function to assign
fitness for parent selection.
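The weighted aggregations at the heart of MOEA/D-style decomposition can be made concrete with a small sketch; the two scalarizing functions below are standard, but the weight vectors and objective values are made-up illustration data:

```python
# Two standard scalarizing functions used in MOEA/D-style decomposition
# (minimization assumed throughout).

def weighted_sum(fvals, weights):
    """Linear aggregation: g(x) = sum_i w_i * f_i(x)."""
    return sum(w * f for w, f in zip(weights, fvals))

def tchebycheff(fvals, weights, ideal):
    """Nonlinear aggregation: g(x) = max_i w_i * |f_i(x) - z*_i|,
    where z* is the ideal point; copes better with non-convex fronts."""
    return max(w * abs(f - z) for w, f, z in zip(weights, fvals, ideal))

# A set of evenly spread weight vectors turns one bi-objective MOP
# into N scalar subproblems, each optimized with neighbor cooperation.
N = 5
weight_vectors = [(i / (N - 1), 1 - i / (N - 1)) for i in range(N)]

fvals = (0.4, 0.7)        # objective values of one candidate solution
ideal = (0.0, 0.0)        # ideal point z*
scores = [(weighted_sum(fvals, w), tchebycheff(fvals, w, ideal))
          for w in weight_vectors]
```

Each weight vector defines one subproblem; a candidate is evaluated under its subproblem's scalarization, which is why MOEA/D can reuse any single-objective EA as its engine.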
MOPs are used to tackle the difficulty of fuzzy partitioning, where a number of fuzzy cluster
validity indices are optimized simultaneously. The resulting set of near-Pareto-optimal
solutions contains a number of non-dominated solutions, which the user can compare and from
which the most promising one can be chosen according to the problem requirements; real-coded
encoding of the cluster centers is used for this purpose [64].
The well-known approaches that have been used by different researchers for MOPs are: the
Weighted-Sum Approach [67]; Schaffer's Vector Evaluated Genetic Algorithm (VEGA) [48]; Fonseca
and Fleming's Multi-Objective Genetic Algorithm (MOGA) [49]; Horn, Nafploitis and Goldberg's
Niched Pareto Genetic Algorithm (NPGA) [65]; Zitzler and Thiele's Strength Pareto Evolutionary
Algorithm (SPEA) [66]; Srinivas and Deb's Non-dominated Sorting Genetic Algorithm (NSGA) [68],
which has been used to determine appropriate cluster centers and the corresponding partition
matrix [69]; NSGA-II [101]; the Vector Optimized Evolution Strategy (VOES) [71]; the
Weight-Based Genetic Algorithm (WBGA) [72]; the Predator-Prey Evolution Strategy (PPES) [73];
Rudolph's Elitist Multiobjective Evolutionary Algorithm [75]; and Knowles and Corne's
Pareto Archived Evolution Strategy (PAES) [54], which uses an evolution strategy.
The Multi-objective Genetic Local Search algorithm (MOGLS) for solving combinatorial
multiobjective optimization problems was first proposed by Ishibuchi and Murata [97]. MOGLS
applies a local search technique after using standard variation operators; a scalar fitness
function is used to select a pair of parent solutions that generate new solutions with the
crossover and mutation operators. Hybrid Evolutionary Metaheuristics (HEMH) [77] combine
different metaheuristics, each integrated with the others to improve the search capabilities.
The hybridization of the greedy randomized adaptive search procedure (GRASP) with data mining
(DM-GRASP) [78] is directed at producing an initial set of high-quality solutions spread along
the Pareto front within the framework of MOEA/D.
To handle dynamic MOPs, a new co-evolutionary algorithm (COEA) has been suggested [79]. It
hybridizes the competitive and cooperative mechanisms observed in nature to track the Pareto
front in a dynamic environment. The main idea of competitive-cooperative co-evolution is to
allow the decomposition of the optimization problem to adapt and emerge, rather than being
hand-designed and fine-tuned at the start of the evolutionary optimization procedure. The
Non-dominated Sorting Genetic Algorithm-II (NSGA-II) [80] has been used to determine the
appropriate cluster centers and the corresponding partition matrix.
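Several of the algorithms above (NSGA, NSGA-II, PAES) rest on Pareto dominance and sorting a population into non-dominated fronts. A minimal sketch of both ideas follows; this is a simple quadratic version for illustration, not NSGA-II's fast non-dominated sort, and the sample points are made up:

```python
def dominates(a, b):
    """a Pareto-dominates b (minimization): a is no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Return indices of points grouped into fronts F1, F2, ..."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # a point is in the current front if nothing remaining dominates it
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Usage: five bi-objective points; the first front holds the
# mutually non-dominated solutions
pts = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = non_dominated_sort(pts)
```

NSGA-II's contribution was to compute these same fronts in O(MN^2) with bookkeeping of domination counts, plus crowding distance to spread solutions within a front.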
2.4.11. Applications
As the recognition of MOEAs as successful and robust multiobjective optimizers has grown
rapidly, researchers from various fields of science and engineering have been applying MOEAs to
solve optimization problems arising in their own domains. The domains where MOEA optimization
methods [81] have been applied include scheduling heuristics, data mining, assignment and
management, networking, circuits and communications, bioinformatics, control systems and
robotics, pattern recognition and image processing, artificial neural networks and fuzzy
systems, manufacturing, composite component design, traffic engineering and transportation,
life sciences, embedded systems, routing protocols, algorithm design, and web-site and financial
optimization. A few pros and cons faced by MOEAs while solving optimization problems are:
a. The problem has multiple, possibly incommensurable, objectives.
b. The computational time for each evaluation is in minutes or hours.
c. Parallelism is not encouraged.
d. The total number of evaluations is limited by financial, time, or resource constraints.
e. Noise is low, since repeated evaluations yield very similar results.
f. The overall reduction in cost achieved is high.
3. DATA CLUSTERING
Data are usually in raw form; data are processed into information, i.e., a semantic connection
gives the data a meaning. Records, called data items, are expressed as tuples of numerical or
categorical values; each value in the tuple indicates the observed value of a feature. The
features in a data set are also called attributes or variables. Information can be extracted
automatically by searching for patterns in the data; the process of detecting patterns in data
is called learning from data. Extracting implicit, previously unknown, and potentially useful
information from data is the substance of data mining. The algorithmic framework providing
automatic support for data mining is generally called machine learning [83].
Association rule mining aims at detecting associations among the features. Classification aims
at predicting the value of a nominal feature called the class variable; it is called supervised
learning because the learning scheme is presented with a set of classified examples from which
it is expected to learn a way of classifying unseen examples. Data clustering is executed in two
different modes, called crisp and fuzzy. In crisp clustering, the clusters are disjoint and
non-overlapping; any sample may belong to one and only one category. In fuzzy clustering, a
sample may belong to more than one category with a certain fuzzy membership grade [84].
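The crisp/fuzzy distinction can be illustrated with a small sketch; the centers and sample below are made-up data, and the fuzzy memberships follow the standard fuzzy c-means membership formula with fuzzifier m:

```python
def crisp_assign(x, centers):
    """Crisp clustering: the sample belongs to exactly one cluster,
    here the one with the nearest center."""
    d = [abs(x - c) for c in centers]
    return d.index(min(d))

def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy clustering: a membership u_k in [0, 1] for every cluster,
    summing to 1 (the fuzzy c-means membership update)."""
    d = [abs(x - c) for c in centers]
    if 0.0 in d:                       # sample coincides with a center
        return [1.0 if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[k] / d[j]) ** p for j in range(len(centers)))
            for k in range(len(centers))]

centers = [0.0, 10.0]
k = crisp_assign(3.0, centers)         # crisp: one and only one cluster
u = fuzzy_memberships(3.0, centers)    # fuzzy: graded membership in both
```

For the sample 3.0, the crisp mode reports only cluster 0, while the fuzzy mode reports a larger membership in cluster 0 and a smaller but nonzero membership in cluster 1.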
Figure 4. Outline of the clustering process
Cluster analysis includes a series of steps, ranging from preprocessing, i.e., feature
selection, through algorithm development to cluster validation, evaluation, and interpretation.
Each step is closely related to the others, and each poses a major challenge to the scientific
disciplines mining knowledge from datasets [85], [86]. The clustering process is outlined in
Figure 4. At the preprocessing and post-processing stages, feature selection/extraction and
cluster validation are important for any clustering algorithm. Selecting suitable and
significant features can significantly reduce the difficulty of the subsequent design, and
result assessment reveals the degree of confidence with which the generated clusters can be
relied upon. However, both processes lack universal guidance, and ultimately the choice among
different evaluations and techniques still depends on the applications themselves [85].
Clustering is ubiquitous, and a wealth of clustering algorithms has been developed to solve
different problems in specific fields. There is no single clustering algorithm that can be used
universally to solve all problems [87]. In general, algorithms are designed with certain
assumptions and favor some type of bias. In k-means-type algorithms, the selection of initial
seeds greatly affects the quality of the clusters, and most seed selection methods produce
different results in different independent runs. K. Karteeka Pavan et al. [88] proposed a
single, optimal, outlier-insensitive seed selection algorithm for k-means-type algorithms
(SPSS), an extension of k-means++, which proved effective in clustering when demonstrated on
synthetic, real, and microarray data sets.
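SPSS itself is not reproduced here, but the k-means++ seeding it extends can be sketched briefly; the function name and the 1-D toy data are our own illustrative choices:

```python
import random

def kmeans_pp_seeds(points, k, seed=0):
    """k-means++ style seeding: each new center is drawn with probability
    proportional to the squared distance from the nearest already-chosen
    center, which spreads the seeds out and reduces sensitivity to
    initialization."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]         # first seed chosen uniformly
    while len(centers) < k:
        # squared distance of each point to its nearest chosen center
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        # roulette-wheel draw weighted by d2
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

# Usage: two well-separated 1-D groups; distant points get most of the
# selection mass, so the two seeds will very likely land in different groups
pts = [0.0, 0.1, 0.2, 9.9, 10.0, 10.1]
seeds = kmeans_pp_seeds(pts, 2)
```

This is exactly the property the paragraph above describes: runs still differ, but seeds that collapse into one cluster become very unlikely.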
Clustering is applied in a wide variety of fields, ranging from engineering, machine learning,
artificial intelligence, pattern recognition, and computer science to the life and medical
sciences, earth sciences, and social sciences [90]. An outline of unsupervised predictive
learning is as follows: distance and similarity measures [68]; hierarchical methods [69]
(agglomerative - BIRCH, CURE, ROCK; divisive - DIANA, MONA); squared-error methods [91] (vector
quantization, K-Means, GKA, PAM); PDF estimation via mixture densities [92] (GMDD); graph theory
[93] (Chameleon, DTG, HCS, CLICK, CAST); combinatorial search techniques [94] (GGA, TS and SA
clustering); fuzzy methods [95] (FCM, MM, PCM, FCS); neural networks [96], [97] (LVQ, SOFM, ART,
SART, HEC, SPLL); kernel methods [98] (SVC); sequential data [99] (sequence clustering,
statistical sequence clustering); large-scale data [100] (CLARA, CURE, CLARANS, BIRCH, DBSCAN,
DENCLUE, WaveCluster, FC, ART); and data visualization and high-dimensional data [101] (PCA,
ICA, LLE, CLIQUE, OptiGrid, ORCLUS).
Figure 5. Year- wise distribution of MOEA publications
MOEA-based soft subspace clustering (MOSSC) [14] builds on the popular elitist multiobjective
genetic algorithm NSGA-II. MOSSC can simultaneously optimize the weighting within-cluster
compactness and the weighting between-cluster separation incorporated within two different
clustering validity criteria. Co-clustering [32] aims at clustering both the samples and the
features simultaneously to identify hidden block structure embedded in the data matrix.
Clustering ensembles have emerged as a prominent technique for improving the robustness,
stability, and accuracy of unsupervised classification solutions [102]. Parallel versions of
clustering algorithms have also been designed, implemented, and applied to numerous
applications. The majority of clustering tasks need iterative procedures to find locally or
globally optimal solutions in high-dimensional data sets.
The research on MOEAs has its origin in 1967, but a remarkable number of publications did not
appear until the 1990s; from then onwards, research on MOEAs grew drastically. Figure 5 and
Figure 6 attempt to visualize the overall trajectory of MOEA research listed in
the EMOO statistical repository [103], which is maintained by Professor C. A. Coello Coello. By
March 2013, more than 8109 publications on MOEAs, including data clustering, had been published
[103]. Figure 6 breaks the MOEA publications into categories: the distribution comprises 44%
journal papers, 39% conference papers, 10% in-collection literature, 4% published Ph.D. theses,
and 1% Master's theses, technical reports, and books on MOEAs. Figure 5 shows the year-wise
distribution of MOEA publications in all categories. The first generation of MOEAs, between the
years 1993-1999, was characterized by the simplicity of the algorithms. The second generation,
between the years 2000-2009, was concerned with MOEA efficiency at both the algorithmic and the
data structure level. The third generation, under way since 2010, addresses bio-inspired
metaheuristic algorithms, elitism, Pareto ranking schemes, etc. The average numbers of
publications in the first, second, and third generations were 107, 483, and 629 respectively.
Hence, there is an optimistic indication that MOEA research is still at an early stage and that
large volumes of novel research on MOEAs and data clustering are yet to be published.
Figure 6. Distribution of MOEA publications by categories
This study has focused on clustering algorithms developed with multiobjective metaheuristic
methods embedded with the flavor of evolutionary algorithms. Validity indices determine the
optimal number of clusters and identify the best cluster structure, whether or not the numbers
of cluster centers are the same [89]. Numerous runs with distinct initializations or parameters
yield multiple solutions, and the best of these solutions is often selected using cluster
validity measures.
4. CONCLUSION AND FUTURE RESEARCH DIRECTIONS
The present survey endeavors to provide a wide-ranging overview of the work that has been done
in the last three decades on MOPs and MOEAs in connection with data clustering. The survey
covers the basic historical framework of MOEAs, algorithms, methods to maintain diversity,
advances in MOEA design, MOEAs for complicated MOPs, benchmark problems, methodological issues,
applications, and research trends.
An important observation is that the success of most MOEAs depends on a careful balance between
two conflicting goals: exploration (searching for new Pareto-optimal solutions) and exploitation
(refining the obtained Pareto solutions). EAs are easy to describe and implement, but hard to
analyze theoretically. In spite of much empirical experience and successful application, only a
few theoretical results concerning their effectiveness and efficiency are available.
Optimization-based data clustering methods rely wholly on cluster validity indices, the best of
which serves as a substitute for the unknown "correct classification" in a previously
unseen dataset. Emerging technology has generated more intricate and demanding tasks, requiring
more powerful clustering algorithms.
Evolutionary approaches to MOPs are still in their early stages, and several research issues
remain to be solved:
a. Rationalizing the use of MOEAs in real-world problems.
b. The stopping conditions of MOEAs, since it is not obvious how to estimate when the
population has reached a point from which no further improvement can be accomplished.
c. Identifying the best solution from the Pareto-optimal set.
d. Hybridization of MOEAs.
e. The influence of mating restrictions on MOEAs.
f. Effectiveness and robustness in searching for a set of global trade-off solutions.
Although much research on multiobjective metaheuristic optimization has been done in recent
years, the theoretical side of MOEAs remains relatively unexplored. Substantial investigation of
different fitness assignment methods, validity indices, pattern characteristics, and performance
metrics, combined with different selection schemes of EAs and data clustering, is yet to be
carried out.
REFERENCES
[1] D. Poole, A. Mackworth, R. Goebel, Computational Intelligence: A Logical Approach, Oxford
University Press. ISBN 0-19-510270-3.
[2] R. Kicinger, T. Arciszewski, K.A. De Jong, (2005) "Evolutionary computation and structural
design: a survey of the state of the art", Computers & Structures, Vol. 23-24, pp. 1943-1978.
[3] H.P. Schwefel, (1977) "Numerische Optimierung von Computer-Modellen mittels der
Evolutionsstrategie", Basel: Birkhaeuser Verlag.
[4] D.E. Goldberg, (1989) Genetic Algorithms in Search, Optimization and Machine Learning,
Addison-Wesley Publishing Company, Reading, Massachusetts.
[5] M.H.C. Law, A.P. Topchy, A.K. Jain, (2004) "Multiobjective Data Clustering", IEEE Computer
Society Conference on Computer Vision and Pattern Recognition.
[6] R.T. Marler, J.S. Arora, (2004) "Survey of multi-objective optimization methods for
engineering", Structural and Multidisciplinary Optimization, Vol. 26, pp. 369-395, DOI
10.1007/s00158-003-0368-6.
[7] V. Pareto, (1906) Manuale di Economia Politica, Società Editrice Libraria, Milan.
[8] A.K. Jain, M.N. Murty, P.J. Flynn, (1999) "Data clustering: a review", ACM Computing
Surveys, Vol. 31, No. 3, pp. 264-323.
[9] Z. Ya-Ping, S. Ji-Zhao, Z. Yi, Z. Xu, (2004) "Parallel implementation of CLARANS using
PVM", Proceedings of the 2004 International Conference on Machine Learning and Cybernetics,
Vol. 3, pp. 1646-1649.
[10] I.H. Osman, G. Laporte, (1996) "Metaheuristics: a bibliography", Annals of Operations
Research, Vol. 6, pp. 513-623.
[11] R.V. Rao, V.J. Savsani, D.P. Vakharia, (2012) "Teaching–learning-based optimization: an
optimization method for continuous non-linear large scale problems", Information Sciences,
Vol. 183, pp. 1-15.
[12] X. Rui, D. Wunsch II, (2005) "Survey of Clustering Algorithms", IEEE Transactions on
Neural Networks, Vol. 16, No. 3.
[13] A. Fred, A.K. Jain, (2002) "Evidence accumulation clustering based on the k-means
algorithm", Structural, Syntactic, and Statistical Pattern Recognition, Springer Berlin
Heidelberg, Vol. 23, pp. 442-451.
[14] C.A. Coello Coello, (1996) "An Empirical Study of Evolutionary Techniques for
Multiobjective Optimization in Engineering Design", Ph.D. Thesis, Department of Computer
Science, Tulane University, New Orleans, LA.
[15] C. Dimopoulos, (2006) "Multi-objective optimization of manufacturing cell design",
International Journal of Production Research, Vol. 44, No. 22, pp. 4855-4875.
[16] K. Deb, (2001) Multi-objective Optimization Using Evolutionary Algorithms, John Wiley and Sons
Ltd, England.
[17] Q. Zhang, H. Li, (2007) “MOEA/D: a multiobjective evolutionary algorithm based on
decomposition”, IEEE Transactions on Evolutionary Computation, Vol. 11, No. 6, pp. 712–731.
[18] R. Kumar, P. Rockett, (2002) “Improved Sampling of the Pareto-Front in Multiobjective Genetic
Optimizations by Steady-State Evolution: A Pareto Converging Genetic Algorithm”, Evolutionary
Computation, Vol. 10, No. 3, pp. 283-314.
[19] T. Ghosha, S. Senguptaa, M. Chattopadhyay, P. K. Dana, (2011) “Meta-heuristics in cellular
manufacturing: A state-of-the-art review”, International Journal of Industrial Engineering
Computations, Vol. 2, pp. 87–122.
[20] F. Glover, M. Laguna, “Tabu Search”, Kluwer Academic Publishers, Norwell, MA, USA, 1997.
[21] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller, (1953) “Equation of State
Calculations by Fast Computing Machines”, J. Chem. Phys. Vol. 21, No. 6, pp. 1087-1092.
[22] Kirkpatrick, E. Aarts, J. Korst, “Simulated Annealing and the Boltzmann Machine”. John Wiley &
Sons, New York, USA.
[23] C. Darwin, (1929) The origin of species by means of Natural selection or the preservation of avowed
races in the struggle for life, The book League of America, Originally published in 1859.
[24] R.A. Fisher, (1930) The genetical theory of natural selection, Oxford: Clarendon Press.
[25] T. Back, D. B. Fogel, Z. Michalewicz, (1997) Handbook of Evolutionary Computation, U.K., Oxford
University Press.
[26] L. J. Fogel, A. J. Owens, M. J. Walsh, (1966) Artificial Intelligence through Simulated Evolution,
Wiley, New York.
[27] D. Karaboga, (2005) “An Idea Based on Honey Bee Swarm for Numerical Optimization, Technical
Report-TR06”, Erciyes University, Engineering Faculty, Computer Engineering Department.
[28] A. Muruganandam, G. Prabhaharan, P. Asokan, V. Baskaran, (2005) “A memetic algorithm approach
to the cell formation problem”, International Journal of Advanced Manufacturing Technology, Vol.
25, pp. 988–997.
[29] M.G. Epitropakis, (2011) “Proximity-Based Mutation Operators”, IEEE Transactions on Evolutionary
Computation, Vol. 15, No. 1, pp 99-119.
[30] D.E. Goldberg, (1989) Genetic Algorithms in Search Optimization and Machine Learning. Addison
Wesley.
[31] J. H. Holland, (1982) Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, MA.
[32] D. E. Goldberg, K. Swamy, (2010) Genetic Algorithms: The Design of Innovation, 2nd, Springer,
2010.
[33] I. Rechenberg, (1973) Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der
Biologischen Evolution, Frommann- Holzboog, Stuttgart.
[34] H.P. Schwefel, (1981) Numerical Optimization of Computer Models, Wiley, Chichester.
[35] L.J. Fogel, G.H. Burgin, (1969) “Competitive goal-seeking through evolutionary programming,” Air
Force Cambridge Research Laboratories.
[36] D. B. Fogel, (1995) Evolutionary Computation: Toward a New Philosophy of Machine Intelligence,
IEEE Press, Piscataway, NJ.
[37] G. Jones, Genetic and Evolutionary Algorithm, Technical Report, University of Sheffield, UK.
[38] D. E. Goldberg, K. Deb (1991) "A comparative analysis of selection schemes used in genetic
algorithms." Urbana 51.
[39] J. Holland, (1975) Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann
Arbor.
[40] J. Koza, (1992) Genetic programming: on the programming of computers by means of natural
selection, MIT Press, Cambridge.
[41] J. Liu, L. Tang, (1999)“A modified genetic algorithm for single machine scheduling”, Computers &
Industrial Engineering,Vol. 37, pp. 43–46.
[42] M. C. da Rosa Joel, (2009) “Genetic Programming and Boosting Technique to Improve Time Series
Forecasting, Evolutionary Computation” Wellington Pinheiro dos Santos, ISBN: 978-953-307-008-7.
[43] M. Dorigo, (1992) Optimization, Learning and Natural Algorithm”, Ph.D. Thesis, Politecnico di
Milano, Italy.
[44] M. Dorigo, T. Stutzle, (2004) Ant Colony Optimization. MIT Press, Cambridge, MA, USA.
[45] S Das, N. S. Ponnuthurai, (2011) “Differential Evolution: A Survey of the State-of-the-Art”, IEEE
Transactions On Evolutionary Computation, Vol. 15, No. 1.
[46] C.L. Sun, J.C. Zeng, J.S. Pan, (2011) “An improved vector particle swarm optimization for
constrained optimization problems”, Information Sciences, Vol. 181, pp. 1153–1163.
[47] J. Kennedy, R.C. Eberhart, (1995) “Particle swarm optimization”, in Proceedings of the IEEE
International Conference on Neural Networks, pp. 1942–1948.
[48] J. David Schaffer, (1995) “Multiple Objective Optimization with Vector Evaluated Genetic
Algorithms”, Proceedings of the 1st International Conference on Genetic Algorithm, pp. 93-100,
ISBN:0-8058-0426-9.
[49] C. M. Fonseca, P. J. Fleming, (1993) “Genetic algorithms for multiobjective optimization:
Formulation, discussion and generalization”, in Proceedings of the Fifth International Conference on
Genetic Algorithms, S. Forrest, Ed. San Mateo, CA: Morgan Kauffman, pp. 416–423.
[50] D.T. Pham et.al. (2006) “The bees algorithm – A novel tool for complex optimization problems”,
Proceedings of IPROMS Conference, pp. 454– 461.
[51] A. Ahrari, A.A. Atai, (2010) “Grenade explosion method – a novel tool for optimization of
multimodal functions”, Applied Soft Computing, Vol. 10, pp. 132–1140.
[52] F.C. Yang, Y.P. Wang, (2007) “Water flow-like algorithm for object grouping problems”, Journal of
the Chinese Institute of Industrial Engineers, Vol. 24, No. 6, pp. 475–488.
[53] R. Storn, K. Price,(2011) “Differential evolution – a simple and efficient heuristic for global
optimization over continuous spaces”, Journal of Global Optimization, Vol. 11, pp. 341–359.
[54] J.D. Knowles, D.W. Corne, (2000) “Approximating the non-dominated front using the Pareto
archived evolution strategy”, Evolutionary Computing, Vol. 8, pp. 142-172.
[55] J.D. Farmer, N. Packard, A. Perelson, (1989) “The immune system, adaptation and machine learning”,
Physica D, Vol. 22, pp. 187–204.
[56] A.R. Yildiz, (2009) “A novel hybrid immune algorithm for global optimization in design and
manufacturing”, Robotics and Computer-Integrated Manufacturing, Vol. 25, pp. 261–270.
[57] K.M. Passino, (2002) “Biomimicry of bacterial foraging for distributed optimization and control”,
IEEE Control Systems Magazine, Vol. 22, pp. 52–67.
[58] M. Eusuff, E. Lansey, (2003) “Optimization of water distribution network design using the shuffled
frog leaping algorithm”, Journal of Water Resources Planning and Management ASCE, Vol. 129, pp.
210–225.
[59] D. Simon, (2008) “Biogeography-based optimization”, IEEE Transactions on Evolutionary
Computation, Vol. 12, 702–713.
[60] E. Rashedi, H.N. Pour, S. Saryazdi, (2009) “GSA: a gravitational search algorithm”, Information
Sciences, Vol. 179, 2232–2248.
[61] J. Gareth, Genetic and Evolutionary Algorithms, Technical Report, University of Sheffield, UK.
[62] A. Lara, G. Sanchez, C.A. Coello Coello, O. Schutze, (2010) “HCS: a new local search strategy for
memetic multiobjective evolutionary algorithms”, IEEE Transactions on Evolutionary Computation,
Vol. 14, No. 1, pp. 112–132.
[63] H. Ishibuchi, T. Murata, (1998) “Multi-Objective Genetic Local Search Algorithm and Its Application
to Flowshop Scheduling”, IEEE Transactions on Systems, Man and Cybernetics, Vol. 28, No. 3, 392–
403.
[64] S. Bandyopadhyay, U. Maulik, A. Mukhopadhyay, “Multiobjective Genetic Clustering for Pixel
Classification in Remote Sensing Imagery”, IEEE Transactions on Evolutionary Computation.
[65] A. Bill, Xi. Wang, M. Schroeder, (2009) “A roadmap of clustering algorithms: finding a match for a
biomedical application”. Brief Bioinform.
[66] J. Horn, N. Nafploitis, D. E. Goldberg, (1994) “A niched Pareto genetic algorithm for multiobjective
optimization” in Proceedings of the First IEEE Conference on Evolutionary Computation, Z.
Michalewicz, Ed. Piscataway, NJ: IEEE Press, pp. 82–87.
[67] E. Zitzler, L. Thiele, (1998) “Multiobjective optimization using evolutionary algorithms—A
comparative case study” in Parallel Problem Solving From Nature, Germany: Springer-Verlag, pp.
292–301.
[68] B. Scholkopf, A. Smola, (2002) “Learning with Kernels: Support Vector Machines, Regularization,
Optimization, and Beyond”, Cambridge, MA: MIT Press.
[69] N. Srinivas, K. Deb, (1995) “Multiobjective function optimization using nondominated sorting genetic
algorithms,” Evolutionary Computing, Vol. 2, No. 3, pp. 221–248.
[70] J. T. Tou, R. C. Gonzalez, (1974) Pattern Recognition Principles Addison-Wesley.
[71] E. Zitzler, K. Deb, L. Thiele, (2000) “Comparison of multiobjective evolutionary algorithms:
Empirical results”, Evolutionary Computing, Vol. 8, pp. 173-195.
[72] F.A. Kursawe, (1990) “Variant of evolution strategies for vector optimization”, in Parallel Problem
Solving from Nature”, H.-P. Schwefel and R. Manner, Eds., Berlin, Germany: Springer, pp. 193-197.
[73] P. Hajela, J. Lee, “Constrained genetic search via search adaptation. An Immune network solution”,
Structural optimization, Vol. 13, pp. 111-115.
[74] M. Luamanns, G. Rudolph, (1998) “A Spatial Predator-Prey approach to Multiobjective optimization
a preliminary Study”, in Proceedings on the Parallel Problem solutions from nature, pp. 241-249.
[75] G. Rudolph, (1998) “Evolutionary search for minimal elements on partially ordered finite sets”,
Evolutionary Programming, Vol. 8, pp. 345-353.
[76] H. Ishibuchi, T. Murata, (1998) “A multiobjective genetic local search algorithm and its application to
flow shop scheduling”, IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications
and Reviews, Vol. 28 No. 3, pp. 392–403.
[77] A. Kafafy, A. Bounekkar, S. Bonnevay, (2011) “A Hybrid Evolutionary Metaheuristics (HEMH)
applied on 0/1 Multiobjective Knapsack Problems”, in Proceedings of the 13th annual conference on
Genetic and evolutionary computation, GECCO ’11, ACM, New York, NY, USA, pp. 497–504.
[78] M. H. Ribeiro, A. Plastino, S. L. Martins, (2006) “Hybridization of GRASP Metaheuristic with Data
Mining Techniques”, J. Math. Model. Algorithms, Vol. 5, No. 1, pp. 23–41.
[79] C. K. Goh, K. C. Tan, (2009) “A Competitive-Cooperative Coevolutionary Paradigm for Dynamic
Multiobjective Optimization”, Trans. Evol. Comp, Vol. 13, pp. 103–127.
[80] M. H. Ribeiro, A. Plastino, S. L. Martins, (2005) “A Hybrid GRASP with Data Mining for the
Maximum Diversity Problem”, Second International Workshop on Hybrid Metaheuristics 2005,
Universitat Politècnica de Catalunya, Barcelona, Spain, August 29-30, 2005.
[81] H. Ishibuchi, T. Yoshida, T. Murata, (2003) “Balance between Genetic Search and Local Search in
Memetic Algorithms for Multiobjective Permutation Flowshop Scheduling”, IEEE Transactions on
Evolutionary Computation, Vol. 7 , No. 2, pp. 204–233.
[82] V. Cherkassky, F. Mulier, (1998) Learning From Data: Concepts, Theory, and Methods, Wiley, New
York, 1998.
[83] B. Everitt, S. Landau, M. Leese, (2001) Cluster Analysis, Arnold, London, 2001.
[84] A. Abraham, L. C. Jain, R. Goldberg, (2005) Evolutionary Multiobjective Optimization: Theoretical
Advances and Applications, Springer Verlag, London.
[85] E. R. Hruschka, J. G. B. Campello, A. A. Freitas, “A Survey of Evolutionary Algorithms for
Clustering”, IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and
Reviews.
[86] K. Ramachandra Rao et.al, (2012) “Unsupervised Classification of Uncertain Data Objects in Spatial
Databases Using Computational Geometry and Indexing Techniques”, International Journal of
Engineering Research and Applications, Vol. 2, No. 2, pp. 806-814. ISSN No. 2248-9622,
http://www.ijera.com/papers/Vol2_issue2/ EG22806814.pdf.
[87] M. Sarkar, B. Yegnanarayana, D. Khemani, (1997) “A clustering algorithm using an evolutionary
programming-based approach”, Pattern Recognition Letters, Vol. 18, pp. 975-986.
[88] K. Karteeka Pavan, A.V. Dattatreya Rao, A. Appa Rao, (2011) “Robust seed selection algorithm for
k-means type algorithms” , International Journal of Computer Science & Information Technology
(IJCSIT), Vol. 3, No. 5, DOI: 10.5121/ijcsit.2011.3513.
[89] H. Dong, Y. Dong, C. Zhou, et al., (2009) “Fuzzy clustering algorithm based on evolutionary
programming”, Expert Systems with Applications, doi:10.1016/j.eswa.2009.04.031.
[90] B. S. Everitt, S. Landau, M. Leese, D. Stahl, (2001) “Hierarchical Clustering”, in Cluster
Analysis, 5th Edition, Arnold, pp. 71–110.
[91] L. Kaufman, P. Rousseeuw, (1990) Finding Groups in Data: An Introduction to Cluster Analysis,
Wiley.
[92] D. Sankoff, J. Kruskal, (1999) Time Warps, String Edits, and Macromolecules: The Theory and
Practice of Sequence Comparison, Stanford, CA, CSLI Publications.
[93] V. Guralnik, G. Karypis, (2001) “A scalable algorithm for clustering sequential data” in Proceedings
of 1st IEEE Int. Conf. Data Mining (ICDM’01), pp. 179–186.
[94] D. Gusfield, (1997) Algorithms on Strings, Trees, and Sequences: Computer Science and
Computational Biology, Cambridge, U.K, Cambridge Univ. Press.
[95] F. Höppner, F. Klawonn, R. Kruse, (1999) Fuzzy Cluster Analysis: Methods for Classification, Data
Analysis, and Image Recognition, New York: Wiley.
[96] T. Kohonen, (1990) “The self-organizing map” Proc. IEEE, Vol. 78, No. 9, September, pp. 1464–
1480.
International Journal of Computer Science & Information Technology (IJCSIT) Vol 5, No 5, October 2013
[97] A. Baraldi, E. Alpaydin, (2002) “Constructive feedforward ART clustering networks—Part I and II”,
IEEE Trans. Neural Network, Vol. 13, No. 3, May, pp. 645–677.
[98] K. Müller, S. Mika, et al., (2001) “An introduction to kernel-based learning algorithms”, IEEE Trans.
Neural Network, Vol. 12, No. 2, pp. 181–201.
[99] V. Vapnik, (1998) Statistical Learning Theory, Wiley, New York.
[100] A. Jain, R. Dubes, (1988) Algorithms for Clustering Data, Englewood Cliffs, NJ: Prentice-Hall.
[101] R. Sun, C. Giles, (2000) Sequence Learning: Paradigms, Algorithms, and Applications, LNAI 1828,
Springer, Berlin, Germany.
[102] A. Garg, A. Mangla, N. Gupta, V. Bhatnagar, (2006) “PBIRCH: A Scalable Parallel Clustering
Algorithm for Incremental Data”, International Database Engineering and Applications Symposium
(IDEAS 2006).
[103] C. A. Coello, http://delta.cs.cinvestav.mx/~ccoello/EMOO, CINVESTAV-IPN, Mexico.
Authors
Ramachandra Rao Kurada is currently working as an Asst. Prof. in the Department of Computer Applications
at Shri Vishnu Engineering College for Women, Bhimavaram. He has 12 years of teaching experience and is a
part-time Research Scholar in the Dept. of CSE, ANU, Guntur, under the guidance of Dr. K. Karteeka Pavan.
He is a life member of ISTE and CSI. His research interests are Computational Intelligence, Data
Warehousing & Mining, Networking, and Security.
Dr. K. Karteeka Pavan received her PhD in Computer Science & Engg. from Acharya Nagarjuna
University in 2011. Earlier, she received her postgraduate degree in Computer Applications from
Andhra University in 1996. She has 16 years of teaching experience and is currently working as a
Professor in the Department of Information Technology at RVR & JC College of Engineering, Guntur. She
has published more than 20 research papers in various International Journals and Conferences. Her
research interests include Soft Computing, Bioinformatics, Data Mining, and Pattern Recognition.
Dr. A. V. Dattatreya Rao received his PhD in Statistics from Acharya Nagarjuna University in 1987.
Currently, he is Principal of Acharya Nagarjuna University, Guntur. Earlier, he was a Professor in the
Department of Statistics, Acharya Nagarjuna University. He has published more than 45 research papers in
various National and International Journals. He has successfully guided 3 Ph.D. and 5 M.Phil. scholars. His
research interests include Estimation Theory, Statistical Pattern Recognition, Directional Data Analysis, and
Image Analysis. He is a life member of professional societies such as the Indian Statistical Association, the
Society for Development of Statistics, and the Andhra Pradesh Mathematical Society, and a life member of
ISPS. He has served as chairperson for various conferences.