A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI... IJDKP
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a
σ value, a hyper-parameter which can be manually defined and tuned to suit the application.
Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster
centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the
exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an
outstanding task because such expressions are normally impossible to solve analytically. However, we
prove that if the points are all included in a square region of size σ, there is only one minimum. This bound
not only limits the number of solutions to search for by numerical means; it also allows us to propose a new
numerical approach "per block". This technique decreases the number of particles by approximating some
groups of particles with weighted particles. These findings are useful not only for the quantum clustering
problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics,
and other applications.
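To illustrate the idea behind the abstract above, a one-dimensional sketch of the Parzen-window wavefunction and the resulting quantum potential follows. This is not the paper's implementation; the function name, the toy point set, and σ are assumptions chosen only to show that the potential is low near a dense group of points and high between groups.

```python
import math

def quantum_potential(x, points, sigma):
    # Parzen wavefunction psi(x) = sum_i exp(-(x - x_i)^2 / (2 sigma^2));
    # up to an additive constant, the quantum potential is
    # V(x) = sum_i (x - x_i)^2 w_i / (2 sigma^2 psi(x)) -- low near dense groups
    psi = 0.0
    num = 0.0
    for xi in points:
        w = math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2))
        psi += w
        num += ((x - xi) ** 2) * w
    return num / (2.0 * sigma ** 2 * psi)

points = [0.0, 0.2, 9.8, 10.0]   # two well-separated groups
sigma = 1.0
```

Evaluating the potential near either group gives a value close to zero, while the midpoint between the groups sits near a maximum; the minima are the cluster centers that QC searches for numerically.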
A PSO-Based Subtractive Data Clustering Algorithm IJORCS
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Subtractive + Particle Swarm Optimization (PSO) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive + PSO clustering algorithm, PSO, and the Subtractive clustering algorithm to three different datasets. The results illustrate that the Subtractive + PSO clustering algorithm generates the most compact clustering results compared to the other algorithms.
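A minimal one-dimensional sketch of the subtractive clustering step described above can be written in a few lines. This follows the classic Chiu-style formulation (Gaussian potentials with radii ra and 1.5·ra); the radii, threshold, and toy data below are illustrative assumptions, not values from the paper.

```python
import math

def subtractive_clustering(data, ra=1.0, eps=0.15):
    # Chiu-style subtractive clustering on 1-D data: every point is a
    # candidate center whose potential sums Gaussian contributions from all
    # points; after picking a center, nearby potentials are subtracted away
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2
    pot = [sum(math.exp(-alpha * (xi - xj) ** 2) for xj in data) for xi in data]
    first = max(pot)
    centers = []
    while True:
        i = max(range(len(data)), key=lambda j: pot[j])
        pc = pot[i]
        if pc < eps * first:      # stop when remaining potential is small
            break
        c = data[i]
        centers.append(c)
        pot = [p - pc * math.exp(-beta * (x - c) ** 2)
               for p, x in zip(pot, data)]
    return centers
```

On two well-separated groups of points, the routine returns one center per group without being told the number of clusters in advance, which is exactly the property that makes it useful for initializing iterative methods such as PSO-based clustering.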
Proposing a New Job Scheduling Algorithm in Grid Environment Using a Combinat... Editor IJCATR
Grid computing is a hardware and software infrastructure that provides affordable, sustainable, and reliable access to computing resources. Its aim is
to create a supercomputer using free resources. One of the challenges in Grid computing is the scheduling problem, which is regarded
as a tough issue. Since scheduling is a non-deterministic problem in the Grid, deterministic algorithms cannot be used to improve
scheduling. In this paper, a combination of the imperialist competition algorithm (ICA) and gravitational attraction is used to address the
problem of independent task scheduling in a grid environment, with the aim of reducing makespan and energy consumption. Experimental results
compare ICA with other algorithms and illustrate that ICA achieves a shorter makespan and lower energy consumption than the others. Moreover, it
converges quickly, finding its optimum solution in less time than the other algorithms.
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data IJMER
Clustering mixed-type data is one of the major research topics in the area of data mining. In
this paper, a new algorithm for clustering mixed-type data is proposed, where the concept of a distribution
centroid is used to represent the prototype of the categorical variables in a cluster; this is then combined
with the mean to represent the prototype of clusters with mixed-type variables. In the method, data is
observed from different views and the variables are grouped into different views. Instances that
can be viewed differently from different viewpoints can be defined as multiview data. The differences
among views are usually ignored during the clustering process. Here, both view and variable weights
are computed simultaneously. The view weight is used to determine the closeness or density of a view, and
the variable weight is used to identify the significance of each variable. Both of these weights are used
in the distance function to determine the cluster of each object. In the proposed method, the
k-prototypes algorithm is enhanced so that it automatically computes both view and variable
weights. The proposed MK-Prototypes algorithm is compared with two other clustering
algorithms.
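The k-prototypes family that the abstract above extends combines a numeric distance with a categorical mismatch count. A minimal sketch of such a mixed dissimilarity follows; the function name, the `gamma` weight, and the index convention are assumptions for illustration, and the paper's view- and variable-weighted formulation is not reproduced here.

```python
def mixed_distance(a, b, num_idx, cat_idx, gamma=1.0):
    # k-prototypes-style dissimilarity: squared Euclidean distance over the
    # numeric attributes plus gamma times the count of categorical mismatches
    d_num = sum((a[i] - b[i]) ** 2 for i in num_idx)
    d_cat = sum(1 for i in cat_idx if a[i] != b[i])
    return d_num + gamma * d_cat
```

For example, two records that differ by 1.0 in one numeric attribute and in one categorical attribute have dissimilarity 2.0 with `gamma=1.0`; the weighting scheme in MK-Prototypes would replace the fixed `gamma` with learned view and variable weights.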
A survey on Efficient Enhanced K-Means Clustering Algorithm ijsrd.com
Data mining is the process of using technology to identify patterns and prospects from large amounts of information. In data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique that divides data into meaningful groups. K-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
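The baseline K-means procedure that the surveyed variants build on can be sketched compactly. This is the standard Lloyd iteration on one-dimensional data; the function name, toy data, and seed are assumptions for the example, not taken from any of the surveyed papers.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    # Lloyd's algorithm on 1-D data: assign each point to the nearest mean,
    # then recompute each mean; stop when the means no longer change.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[j].append(p)
        new = [sum(c) / len(c) if c else centers[j]
               for j, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return sorted(centers)

data = [1.0, 1.1, 1.2, 8.0, 8.1, 8.2]
```

On well-separated data the algorithm recovers one mean per group; the enhanced variants compared in the survey mainly differ in how the initial centers are chosen and how distance computations are reduced.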
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis... ijsrd.com
A cluster is a group of objects which are similar to each other within the cluster and dissimilar to the objects of other clusters. The similarity is typically calculated on the basis of the distance between two objects or clusters. Two or more objects belong to the same cluster only if they are close to each other according to the distance between them. The major objective of clustering is to discover collections of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective clustering algorithm available to cluster unlabeled data, producing both membership and typicality values during the clustering process. In this approach, the efficiency of the Fuzzy Possibilistic C-Means clustering approach is enhanced by using penalized and compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering, with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approaches is evaluated on University of California, Irvine (UCI) machine learning repository datasets such as Iris, Wine, Lung Cancer, and Lymphography. The parameters used for the evaluation are clustering accuracy, Mean Squared Error (MSE), execution time, and convergence behavior.
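For reference, the plain fuzzy c-means updates that FPCM and PCFPCM build on can be sketched as follows. Only the standard membership and center updates are shown; the typicality values and the penalized/compensated terms of PCFPCM are omitted, and the initialization and toy data are assumptions for the example.

```python
def fcm(points, k, m=2.0, iters=40):
    # plain fuzzy c-means on 1-D data; memberships follow
    # u_ij = 1 / sum_l (d_ij / d_lj)^(2/(m-1)), centers are u^m-weighted means
    centers = sorted(points)[::max(1, len(points) // k)][:k]
    U = []
    for _ in range(iters):
        U = []
        for x in points:
            d = [abs(x - c) + 1e-12 for c in centers]  # guard against d == 0
            U.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(k)) for i in range(k)])
        centers = [sum(U[p][i] ** m * points[p] for p in range(len(points)))
                   / sum(U[p][i] ** m for p in range(len(points)))
                   for i in range(k)]
    return sorted(centers), U

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
```

Each membership row sums to one, which is the probabilistic constraint that possibilistic variants such as FPCM relax by adding typicality values.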
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publication of high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Accelerating materials property predictions using machine learning Ghanshyam Pilania
The materials discovery process can be significantly expedited and simplified if we can learn effectively from available knowledge and data. In the present contribution, we show that efficient and accurate prediction of a diverse set of properties of material systems is possible by employing machine (or statistical) learning
methods trained on quantum mechanical computations in combination with the notions of chemical similarity. Using a family of one-dimensional chain systems, we present a general formalism that allows us to discover decision rules that establish a mapping between easily accessible attributes of a system and its properties. It is shown that fingerprints based on either chemo-structural (compositional and configurational information) or the electronic charge density distribution can be used to make ultra-fast, yet accurate, property predictions. Harnessing such learning paradigms extends recent efforts to systematically explore and mine vast chemical spaces, and can significantly accelerate the discovery of new application-specific materials.
Extended PSO algorithm for improvement problems of the k-means clustering algorithm IJMIT JOURNAL
Clustering is an unsupervised process and one of the most common data mining techniques. The
purpose of clustering is to group similar data together, so that instances within a cluster are most similar to
each other and most different from instances in other clusters. In this paper we focus on partitional
k-means clustering; thanks to its ease of implementation and high-speed performance on large data sets, after 30
years it is still very popular among the developed clustering algorithms. To address the problem of
the k-means algorithm becoming trapped in local optima, we propose an extended PSO algorithm named ECPSO.
Our new algorithm is able to escape from local optima and, with a high success rate, produce the
problem's optimal answer. The results show that the proposed algorithm performs better than
other clustering algorithms, especially on two indices: the accuracy of clustering and the quality
of clustering.
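The global-best PSO scheme that ECPSO extends can be sketched on a simple one-dimensional minimization problem. This is the canonical PSO update, not the paper's ECPSO; the inertia and acceleration coefficients, seed, and test function are assumptions for illustration.

```python
import random

def pso(f, lo, hi, n=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=1):
    # canonical global-best PSO minimizing f over the interval [lo, hi]
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    vs = [0.0] * n
    pbest = xs[:]                       # each particle's best position so far
    pval = [f(x) for x in xs]
    g = min(range(n), key=lambda i: pval[i])
    gbest, gval = pbest[g], pval[g]     # swarm's best position so far
    for _ in range(iters):
        for i in range(n):
            vs[i] = (w * vs[i]
                     + c1 * rng.random() * (pbest[i] - xs[i])
                     + c2 * rng.random() * (gbest - xs[i]))
            xs[i] = min(hi, max(lo, xs[i] + vs[i]))
            v = f(xs[i])
            if v < pval[i]:
                pbest[i], pval[i] = xs[i], v
                if v < gval:
                    gbest, gval = xs[i], v
    return gbest
```

Applied to k-means, the objective `f` would be the clustering cost as a function of the center positions; the population-based search is what lets PSO variants escape the local optima that plain k-means gets stuck in.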
The International Journal of Engineering and Science (The IJES) theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Optimising Data Using K-Means Clustering Algorithm IJERA Editor
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different locations cause different results. So, the better choice is to place them as far away from each other as possible.
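The "place centroids as far from each other as possible" idea above is commonly realized with a greedy farthest-point heuristic; a minimal sketch follows, with the function name and toy data as assumptions for the example.

```python
def farthest_first_centroids(points, k):
    # greedy farthest-point heuristic: start from the first point, then
    # repeatedly add the point farthest from all centroids chosen so far
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(abs(p - c) for c in centers))
        centers.append(nxt)
    return centers
```

The related k-means++ scheme softens this by sampling the next center with probability proportional to squared distance, which is more robust to outliers than the purely greedy choice.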
System for Prediction of Non Stationary Time Series based on the Wavelet Radi... IJECEIAES
This paper proposes and examines the performance of a hybrid model called the wavelet radial basis function neural network (WRBFNN). The model's performance is compared with that of the wavelet feedforward neural network (WFFNN) model by developing a prediction or forecasting system that considers two types of input formats, input9 and input17, and four types of non-stationary time series data. The MODWT transform is used to generate wavelet and smooth coefficients, in which several elements of both coefficients are chosen in a particular way to serve as inputs to the NN model in both the RBFNN and FFNN models. The performance of the WRBFNN and WFFNN models is evaluated using the MAPE and MSE value indicators, while the computation processes of the two models are compared using two indicators: number of epochs and length of training. On stationary benchmark data, all models perform with very high accuracy. The WRBFNN9 model is superior on non-stationary data containing linear trend elements, while the WFFNN17 model performs best on non-stationary data with non-linear trend and seasonal elements. In terms of computation speed, the WRBFNN model is superior, with a much smaller number of epochs and much shorter training time.
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ... IJECEIAES
A hard partition clustering algorithm assigns each point, including equally distant points, to exactly one cluster. Fuzzy cluster analysis instead assigns membership coefficients to data points that are equidistant between two clusters, so a data point may belong to more than one cluster at the same time. For a subset of the CiteScore dataset, the fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were implemented to study data points that lie equally distant from each other. Before analysis, the clusterability of the dataset was evaluated with the Hopkins statistic, which resulted in 0.4371, a value < 0.5, indicating that the data is highly clusterable. The optimal number of clusters was determined using the NbClust package, where 9 various indices proposed 3-cluster solutions as the best. Further, an appropriate value of the fuzziness parameter m was evaluated to determine the distribution of membership values, with m varied from 1 to 2. The coefficient of variation (CV), also known as relative variability, was evaluated to study the spread of the data. The time complexity of the fuzzy clustering (fanny) and fuzzy c-means algorithms was evaluated by keeping the number of data points constant and varying the number of clusters.
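The Hopkins statistic mentioned above compares nearest-neighbour distances of real points with those of uniformly random points. A one-dimensional sketch follows; the convention shown (values well below 0.5 suggest clustering tendency), the sample size, and the toy data are stated assumptions, since the statistic appears in the literature under two mirrored conventions.

```python
import random

def hopkins(data, m=10, seed=0):
    # Hopkins statistic sketch for 1-D data: w holds nearest-neighbour
    # distances of sampled real points, u those of uniform random points;
    # under this convention values well below 0.5 suggest clustering tendency
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    sample = rng.sample(data, m)
    w = [min(abs(s - x) for x in data if x != s) for s in sample]
    u = [min(abs(rng.uniform(lo, hi) - x) for x in data) for _ in range(m)]
    return sum(w) / (sum(w) + sum(u))

clustered = [i * 0.01 for i in range(10)] + [100.0 + i * 0.01 for i in range(10)]
```

On strongly clustered data the real nearest-neighbour distances are tiny relative to those of random probes, so the statistic falls far below 0.5.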
Job Scheduling on the Grid Environment using Max-Min Firefly Algorithm Editor IJCATR
Grid computing is indeed the next generation of distributed systems; its goal is to create a powerful, large, and
autonomous virtual computer from countless heterogeneous resources, with the purpose of sharing resources. Scheduling is one
of the main steps in exploiting the capabilities of emerging computing systems such as the grid. Job scheduling in computational
grids is known to be an NP-complete problem due to the heterogeneity of resources. Grid resources belong to different management domains,
and each applies different management policies. Since the nature of the grid is heterogeneous and dynamic, techniques used in
traditional systems cannot be applied to grid scheduling, and therefore new methods must be found. This paper proposes a new algorithm
which combines the firefly algorithm with the Max-Min algorithm for scheduling jobs on the grid. The firefly algorithm is a
swarm-based technique inspired by the social behavior of fireflies in nature. Fireflies move in the problem's search space
to find optimal or near-optimal solutions. Simultaneous minimization of the makespan and flowtime of completing jobs
are the goals of this paper. Experiments and simulation results show that the proposed method is more efficient than the other
compared algorithms.
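The Max-Min half of the hybrid above can be sketched for the simplified case of identical machines. This is only the classic Max-Min list-scheduling heuristic, not the paper's firefly hybrid, and the uniform-machine assumption and toy task times are mine.

```python
def max_min_schedule(task_times, n_machines):
    # Max-Min heuristic on identical machines: among the unscheduled tasks,
    # compute each task's earliest possible completion time, then schedule
    # the task whose earliest completion is LATEST, so large tasks go first.
    loads = [0.0] * n_machines
    tasks = list(task_times)
    while tasks:
        best_task, best_machine, best_ct = None, None, -1.0
        for t in tasks:
            m = min(range(n_machines), key=lambda i: loads[i])
            ct = loads[m] + t
            if ct > best_ct:
                best_task, best_machine, best_ct = t, m, ct
        loads[best_machine] += best_task
        tasks.remove(best_task)
    return max(loads)  # makespan
```

Placing the largest tasks first tends to balance machine loads, which is why Max-Min makes a good seed for metaheuristics like the firefly algorithm that then refine the assignment.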
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST... acijjournal
Cluster analysis of graph-related problems is an important issue nowadays. Different types of graph
clustering techniques have appeared in the field, but most of them are vulnerable in terms of effectiveness
and fragmentation of output in real-world applications across diverse systems. In this paper, we
provide a comparative behavioural analysis of the RNSC (Restricted Neighbourhood Search Clustering) and
MCL (Markov Clustering) algorithms on power-law distribution graphs. RNSC is a graph clustering
technique using stochastic local search. The RNSC algorithm tries to achieve optimal-cost clustering by
assigning cost functions to the set of clusterings of a graph. This algorithm was implemented by A.
D. King only for undirected and unweighted random graphs. Another popular graph clustering
algorithm, MCL, is based on a stochastic flow simulation model for weighted graphs. There are plentiful
applications of power-law, or scale-free, graphs in nature and society. Scale-free topology is stochastic, i.e.,
nodes are connected in a random manner. Complex network topologies like the World Wide Web, the web of
human sexual contacts, or the chemical network of a cell basically follow a power-law
distribution in representing different real-life systems. This paper uses real large-scale power-law
distribution graphs to conduct a performance analysis of RNSC behaviour compared with the Markov
clustering (MCL) algorithm. Extensive experimental results on several synthetic and real power-law
distribution datasets reveal the effectiveness of our approach to comparative performance measurement of
these algorithms on the basis of clustering cost, cluster size, modularity index of the clustering results, and
normalized mutual information (NMI).
A Study of Firefly Algorithm and its Application in Non-Linear Dynamic Systems ijtsrd
Firefly Algorithm (FA) is a newly proposed computational technique with inherent parallelism, capable of local as well as global search, meta-heuristic, and robust in its computing process. In this paper, Firefly Algorithm for Dynamic Systems (FADS) is proposed to find the instantaneous behavior of a dynamic system within a single framework, based on the idealized flashing behavior of fireflies. A dynamic system, where flows of mass and/or energy are the cause of the dynamics, is generally represented as a set of differential equations, and the fourth-order Runge-Kutta (RK4) method is one of the tools used for numerical measurement of the instantaneous behaviour of such a system. Experimental results with FADS demonstrate a more accurate and effective RK4-based technique for the study of dynamic systems. Gautam Mahapatra | Srijita Mahapatra | Soumya Banerjee, "A Study of Firefly Algorithm and its Application in Non-Linear Dynamic Systems", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-2, February 2018. URL: http://www.ijtsrd.com/papers/ijtsrd8393.pdf http://www.ijtsrd.com/computer-science/artificial-intelligence/8393/a-study-of-firefly-algorithm-and-its-application-in-non-linear-dynamic-systems/gautam-mahapatra
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R... ijscmc
Face recognition is one of the most unobtrusive biometric techniques that can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is high susceptibility to variations in the face owing to different factors like changes in pose, varying illumination, different expressions, presence of outliers, noise, etc. This paper explores a novel technique for face recognition by performing classification of the face images using an unsupervised learning approach through K-Medoids clustering. The Partitioning Around Medoids (PAM) algorithm has been used for performing K-Medoids clustering of the data. The results are suggestive of increased robustness to noise and outliers in comparison to other clustering methods. Therefore, the technique can also be used to increase the overall robustness of a face recognition system, thereby increasing its invariance and making it a reliably usable biometric modality.
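A minimal PAM-style K-Medoids alternation can be sketched on one-dimensional data to show the robustness property claimed above; the initialization scheme and toy data (including a deliberate outlier) are assumptions for the example, not the paper's face-image pipeline.

```python
def k_medoids(points, k, iters=20):
    # simple PAM-style alternation on 1-D data: assign each point to the
    # nearest medoid, then let each cluster's new medoid be the member that
    # minimizes the total distance to the rest of the cluster
    medoids = sorted(points)[::max(1, len(points) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: abs(p - medoids[i]))
            clusters[j].append(p)
        new = [min(c, key=lambda cand: sum(abs(cand - q) for q in c)) if c
               else medoids[j] for j, c in enumerate(clusters)]
        if new == medoids:
            break
        medoids = new
    return sorted(medoids)

data = [0, 1, 2, 10, 11, 12, 100]   # 100 is an outlier
```

Because each medoid must be an actual data point chosen by total absolute distance, the outlier at 100 cannot drag a center away from the true groups, whereas a k-means mean would be pulled toward it.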
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ... IJECEIAES
Image segmentation plays a major role in analyzing the area of interest in image processing. Many researchers have used different types of techniques for analyzing images. One of the widely used techniques is K-means clustering. In this paper we use two algorithms: K-means and an advanced version of K-means called adaptive K-means clustering. Both algorithms were applied to different types of images and produced successful results. By comparing the time period, PSNR, and RMSE values from the results of both algorithms, we show that the adaptive K-means clustering algorithm gives better results compared to K-means clustering in image segmentation.
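The PSNR and RMSE metrics used for the comparison above are simple to state; a sketch follows for images represented as flat pixel lists, with the function names and 8-bit peak value as assumptions for the example.

```python
import math

def rmse(a, b):
    # root-mean-square error between two equal-size images (flat pixel lists)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def psnr(a, b, peak=255.0):
    # peak signal-to-noise ratio in decibels; higher means closer to original
    e = rmse(a, b)
    return float('inf') if e == 0 else 20.0 * math.log10(peak / e)
```

For example, two 8-bit images differing by one gray level at every pixel have RMSE 1.0 and a PSNR of about 48 dB; lower RMSE and higher PSNR indicate a segmentation closer to the reference.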
Proposing a scheduling algorithm to balance the time and cost using a genetic... Editor IJCATR
Grid computing is a hardware and software infrastructure that provides affordable, sustainable, and reliable access to computing resources. Its aim is
to create a supercomputer using free resources. One of the challenges in Grid computing is the scheduling problem, which is regarded
as a tough issue. Since scheduling is a non-deterministic problem in the Grid, deterministic algorithms cannot be used to improve
scheduling.
In this paper, a combination of genetic algorithms and binary gravitational attraction is used for scheduling problem solving, investigating
both the reduction in task execution time and the cost-effective use of simultaneous resources. In this case, the user
determines the execution time parameter and the cost-effective use of resources. In this algorithm, a new approach leading to a
balanced load on resources is used in the selection of resources. Experimental results reveal that our proposed algorithm reaches
better results than other algorithms in terms of cost, time, and selection of the best resource.
Comparison Between Clustering Algorithms for Microarray Data Analysis IOSR Journals
Currently, there are two techniques used for large-scale gene-expression profiling: microarray and
RNA-Sequencing (RNA-Seq). This paper is intended to study and compare the different clustering algorithms used
in microarray data analysis. A microarray is an array of DNA molecules which allows multiple hybridization
experiments to be carried out simultaneously and traces the expression levels of thousands of genes. It is a high-throughput
technology for gene expression analysis and has become an effective tool for biomedical research.
Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein
microarrays, which enable researchers to investigate the expression state of a large number of genes. Data
clustering represents the first and main process in microarray data analysis. The k-means, fuzzy c-means, self-organizing
map, and hierarchical clustering algorithms are under investigation in this paper. These algorithms
are compared based on their clustering models.
Max stable set problem to find the initial centroids in clustering problem nooriasukmaningtyas
In this paper, we propose a new approach to document clustering using the K-Means algorithm. The latter is sensitive to the random selection of the k cluster centroids in the initialization phase. To improve the quality of K-Means clustering, we propose to model the text document clustering problem as the max stable set problem (MSSP) and use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of nodes completely disconnected in a graph, while in clustering all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much more optimized in terms of time, and provides better clustering quality than other methods.
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...ijsrd.com
A cluster is a group of objects which are similar to each other within a cluster and are dissimilar to the objects of other clusters. The similarity is typically calculated on the basis of distance between two objects or clusters. Two or more objects present inside a cluster and only if those objects are close to each other based on the distance between them.The major objective of clustering is to discover collection of comparable objects based on similarity metric. Fuzzy Possibilistic C-Means (FPCM) is the effective clustering algorithm available to cluster unlabeled data that produces both membership and typicality values during clustering process. In this approach, the efficiency of the Fuzzy Possibilistic C-means clustering approach is enhanced by using the penalized and compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differ from the conventional clustering techniques by imposing the possibilistic reasoning strategy on fuzzy clustering with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approaches is evaluated on the University of California, Irvine (UCI) machine repository datasets such as Iris, Wine, Lung Cancer and Lymphograma. The parameters used for the evaluation is Clustering accuracy, Mean Squared Error (MSE), Execution Time and Convergence behavior.
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Accelerating materials property predictions using machine learningGhanshyam Pilania
The materials discovery process can be significantly expedited and simplified if we can learn effectively from available knowledge and data. In the present contribution, we show that efficient and accurate prediction of a diverse set of properties of material systems is possible by employing machine (or statistical) learning
methods trained on quantum mechanical computations in combination with the notions of chemical similarity. Using a family of one-dimensional chain systems, we present a general formalism that allows us to discover decision rules that establish a mapping between easily accessible attributes of a system and its properties. It is shown that fingerprints based on either chemo-structural (compositional and configurational information) or the electronic charge density distribution can be used to make ultra-fast, yet accurate, property predictions. Harnessing such learning paradigms extends recent efforts to systematically explore and mine vast chemical spaces, and can significantly accelerate the discovery of new application-specific materials.
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
Clustering is an unsupervised process and one of the most common data mining techniques. Its purpose is to group similar data together, so that instances within a cluster are as similar to each other as possible and as dissimilar as possible from instances in other clusters. In this paper we focus on partitional k-means clustering: owing to its ease of implementation and high-speed performance on large data sets, it remains, after 30 years, one of the most popular clustering algorithms. To address the problem of k-means becoming trapped in local optima, we propose an extended PSO algorithm named ECPSO. Our new algorithm is able to escape local optima and, with high probability, produces the problem's optimal answer. The experimental results show that the proposed algorithm performs better than other clustering algorithms, especially on two indices: clustering accuracy and clustering quality.
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Optimising Data Using K-Means Clustering AlgorithmIJERA Editor
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed in a cunning way, because different locations cause different results, so the better choice is to place them as far away from each other as possible.
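The place-then-refine procedure described above can be sketched as follows. The greedy farthest-point initialisation is one concrete way to "place the centroids as far away from each other as possible"; it is an illustrative choice of this sketch, not a detail from the abstract.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means sketch: seed k centroids with a greedy farthest-point
    heuristic, then alternate point assignment and centroid recomputation."""
    rng = np.random.default_rng(seed)
    # start from a random point, then repeatedly take the point farthest
    # from all centroids chosen so far
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    C = np.array(centroids, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        newC = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                         for j in range(k)])
        if np.allclose(newC, C):
            break
        C = newC
    return C, labels
```

On two well-separated blobs this recovers one centroid per blob regardless of which blob the random seed point falls in.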
System for Prediction of Non Stationary Time Series based on the Wavelet Radi...IJECEIAES
This paper proposes and examines the performance of a hybrid model called the wavelet radial basis function neural network (WRBFNN). Its performance is compared with that of the wavelet feedforward neural network (WFFNN) model by developing a prediction or forecasting system that considers two input formats, input9 and input17, and four types of non-stationary time series data. The MODWT transform is used to generate wavelet and smooth coefficients, several elements of which are chosen in a particular way to serve as inputs to both the RBFNN and FFNN models. The performance of the WRBFNN and WFFNN models is evaluated using the MAPE and MSE indicators, while their computation processes are compared using two indicators: number of epochs and length of training. On stationary benchmark data, all models perform with very high accuracy. The WRBFNN9 model is the most superior model on non-stationary data containing linear trend elements, while the WFFNN17 model performs best on non-stationary data with non-linear trend and seasonal elements. In terms of computing speed, the WRBFNN model is superior, with a much smaller number of epochs and much shorter training time.
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...IJECEIAES
A hard partition clustering algorithm assigns each data point to exactly one cluster, even a point that is equally distant from two clusters and could plausibly belong to either. Fuzzy cluster analysis instead assigns membership coefficients, so that data points which are equidistant between two clusters can belong to more than one cluster at the same time. For a subset of the CiteScore dataset, the fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were implemented to study data points that lie equally distant from each other. Before the analysis, the clusterability of the dataset was evaluated with the Hopkins statistic, which resulted in 0.4371, a value < 0.5, indicating that the data is highly clusterable. The optimal number of clusters was determined using the NbClust package, where 9 different indices proposed 3-cluster solutions as the best clustering. Further, an appropriate value of the fuzziness parameter m was evaluated to determine the distribution of membership values, with m varied from 1 to 2. The coefficient of variation (CV), also known as relative variability, was evaluated to study the spread of the data. The time complexity of the fuzzy clustering (fanny) and fuzzy c-means algorithms was evaluated by keeping the number of data points constant and varying the number of clusters.
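The membership coefficients described above can be illustrated with the standard fuzzy c-means membership update for fixed cluster centers; this is a generic FCM sketch, not the fanny/fcm implementations the paper benchmarks, and the fuzzifier m = 2 is an illustrative default.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means membership update: u[i, j] is the degree to which point i
    belongs to cluster j. Rows sum to 1, and a point equidistant from two
    centers receives equal membership in both."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)
```

For example, a point exactly midway between two centers gets membership 0.5 in each, which is precisely the equidistant case hard partitioning cannot express.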
Job Scheduling on the Grid Environment using Max-Min Firefly AlgorithmEditor IJCATR
Grid computing is the next generation of distributed systems; its goal is to create a powerful, large, autonomous virtual computer out of countless heterogeneous resources, with the purpose of sharing resources. Scheduling is one of the main steps in exploiting the capabilities of emerging computing systems such as the grid. Scheduling jobs on computational grids is known to be an NP-complete problem due to the heterogeneity of resources. Grid resources belong to different management domains, each applying different management policies. Since the nature of the grid is heterogeneous and dynamic, techniques used in traditional systems cannot be applied to grid scheduling, so new methods must be found. This paper proposes a new algorithm that combines the firefly algorithm with the Max-Min algorithm for scheduling jobs on the grid. The firefly algorithm is a recent swarm-based technique inspired by the social behavior of fireflies in nature: fireflies move in the search space of the problem to find optimal or near-optimal solutions. The goals of this paper are the simultaneous minimization of makespan and flowtime of the jobs. Experiments and simulation results show that the proposed method is more efficient than the other compared algorithms.
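The firefly movement rule referred to above (each firefly moves toward brighter ones, with attractiveness decaying with distance) can be sketched on a toy continuous objective. This is the canonical update from Yang's formulation, not the paper's scheduling variant: the job/resource encoding and the Max-Min hybridisation are not modelled, and all parameter values (alpha, beta0, gamma, the cooling factor) are illustrative assumptions.

```python
import numpy as np

def firefly_minimize(f, dim=2, n=15, iters=60, alpha=0.2, beta0=1.0,
                     gamma=0.1, seed=1):
    """Canonical firefly algorithm sketch: each firefly moves toward every
    brighter one with attractiveness beta0 * exp(-gamma * r^2), plus a small
    random walk whose amplitude decays over time."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (n, dim))
    light = np.array([f(x) for x in X])   # lower objective value = brighter
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if light[j] < light[i]:   # j is brighter, so i moves toward j
                    r2 = np.sum((X[i] - X[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)
                    X[i] += beta * (X[j] - X[i]) + alpha * (rng.random(dim) - 0.5)
                    light[i] = f(X[i])
        alpha *= 0.97                      # cool the randomness over time
    best = np.argmin(light)
    return X[best], light[best]
```

On the sphere function the swarm contracts around the optimum; the brightest firefly never moves, so the best value found is monotonically non-increasing.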
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...acijjournal
Cluster analysis of graph-related problems is an important issue nowadays. Different types of graph clustering techniques have appeared in the field, but most of them are vulnerable in terms of effectiveness and fragmentation of output in real-world applications across diverse systems. In this paper, we provide a comparative behavioural analysis of the RNSC (Restricted Neighbourhood Search Clustering) and MCL (Markov Clustering) algorithms on power-law distribution graphs. RNSC is a graph clustering technique using stochastic local search; it tries to achieve an optimal-cost clustering by assigning cost functions to the set of clusterings of a graph. This algorithm was implemented by A. D. King only for undirected and unweighted random graphs. Another popular graph clustering algorithm, MCL, is based on a stochastic flow simulation model for weighted graphs. There are plentiful examples of power-law or scale-free graphs in nature and society. Scale-free topology is stochastic, i.e. nodes are connected in a random manner. Complex network topologies such as the World Wide Web, the web of human sexual contacts, or the chemical network of a cell basically follow a power-law distribution to represent different real-life systems. This paper uses real large-scale power-law distribution graphs to conduct a performance analysis of RNSC behaviour compared with the Markov clustering (MCL) algorithm. Extensive experimental results on several synthetic and real power-law distribution datasets reveal the effectiveness of our approach to the comparative performance measurement of these algorithms on the basis of clustering cost, cluster size, modularity index of the clustering results and normalized mutual information (NMI).
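The stochastic flow simulation behind MCL alternates two matrix operations: expansion (a matrix power, which spreads flow along paths) and inflation (an elementwise power plus renormalisation, which strengthens strong flows and prunes weak ones). The sketch below runs this on a small dense adjacency matrix; the expansion and inflation values are the commonly used defaults, an assumption of this sketch rather than settings from the paper.

```python
import numpy as np

def mcl(A, expansion=2, inflation=2.0, iters=50, tol=1e-6):
    """Markov Clustering sketch: iterate expansion and inflation on a
    column-stochastic matrix (with self-loops added for stability) until the
    flow matrix stops changing, then read clusters off the attractor rows."""
    M = A.astype(float) + np.eye(len(A))       # self-loops stabilise the flow
    M /= M.sum(axis=0, keepdims=True)          # make columns stochastic
    for _ in range(iters):
        M_new = np.linalg.matrix_power(M, expansion)   # expansion
        M_new = M_new ** inflation                     # inflation
        M_new /= M_new.sum(axis=0, keepdims=True)
        if np.abs(M_new - M).max() < tol:
            break
        M = M_new
    # rows that retain mass are attractors; their nonzero columns form clusters
    clusters = [tuple(np.nonzero(row > 1e-8)[0]) for row in M if row.max() > 1e-8]
    return sorted(set(clusters))
```

Two disconnected triangles, for instance, come out as two three-node clusters.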
A Study of Firefly Algorithm and its Application in Non-Linear Dynamic Systemsijtsrd
The Firefly Algorithm (FA) is a newly proposed computation technique with inherent parallelism, capable of local as well as global search, meta-heuristic and robust in its computing process. In this paper, Firefly Algorithm for Dynamic Systems (FADS) is proposed to find the instantaneous behavior of a dynamic system within a single framework based on the idealized flashing behavior of fireflies. A dynamic system, whose dynamicity is caused by flows of mass and/or energy, is generally represented as a set of differential equations, and the fourth-order Runge-Kutta (RK4) method is one of the standard tools for numerically measuring its instantaneous behavior. Experimental results with FADS demonstrate a more accurate and effective alternative to the RK4 technique for the study of dynamic systems. Gautam Mahapatra, Srijita Mahapatra and Soumya Banerjee, "A Study of Firefly Algorithm and its Application in Non-Linear Dynamic Systems", published in the International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-2, February 2018. URL: http://www.ijtsrd.com/papers/ijtsrd8393.pdf
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...ijscmc
Face recognition is one of the most unobtrusive biometric techniques and can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed, with varying degrees of performance in different scenarios. The most common issue for effective facial biometric systems is high susceptibility to variations in the face owing to factors like changes in pose, varying illumination, different expressions, and the presence of outliers and noise. This paper explores a novel technique for face recognition that classifies face images using an unsupervised learning approach, K-Medoids clustering. The Partitioning Around Medoids (PAM) algorithm has been used for performing the K-Medoids clustering of the data. The results suggest increased robustness to noise and outliers in comparison with other clustering methods. The technique can therefore be used to increase the overall robustness of a face recognition system, increase its invariance, and make it a reliably usable biometric modality.
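The core of PAM is a SWAP phase: repeatedly exchange a medoid with a non-medoid whenever the swap lowers the total distance of points to their nearest medoid. The sketch below shows that phase on raw feature vectors; it uses random initialisation instead of PAM's BUILD phase (an assumption of this sketch) and does not include any face-specific feature extraction.

```python
import numpy as np

def pam(X, k, seed=0, iters=50):
    """Simplified PAM (K-Medoids) sketch: greedy SWAP phase over all
    (medoid, non-medoid) pairs until no exchange improves the total cost.
    Medoids are always actual data points, which gives robustness to outliers."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
    medoids = list(rng.choice(len(X), size=k, replace=False))

    def cost(med):
        return D[:, med].min(axis=1).sum()

    best = cost(medoids)
    for _ in range(iters):
        improved = False
        for mi in range(k):
            for h in range(len(X)):
                if h in medoids:
                    continue
                cand = medoids.copy()
                cand[mi] = h
                c = cost(cand)
                if c < best - 1e-12:
                    medoids, best, improved = cand, c, True
        if not improved:
            break
    labels = D[:, medoids].argmin(axis=1)
    return np.array(medoids), labels
```

Because cluster representatives are constrained to be data points, a single extreme outlier cannot drag a representative away the way it drags a k-means centroid.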
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...IJECEIAES
Image segmentation plays a major role in analyzing the area of interest in image processing. Many researchers have used different types of techniques for analyzing images; one of the most widely used is K-means clustering. In this paper we use two algorithms, K-means and its refinement, called adaptive K-means clustering. Both algorithms were applied to different types of images with successful results. By comparing the time period, PSNR and RMSE values from the results of both algorithms, we show that the adaptive K-means clustering algorithm gives better results than K-means clustering in image segmentation.
Proposing a scheduling algorithm to balance the time and cost using a genetic...Editor IJCATR
Grid computing is a hardware and software infrastructure that provides affordable, sustainable, and reliable access to computing; its aim is to create a supercomputer using free resources. One of the challenges in Grid computing is the scheduling problem, which is regarded as a tough issue. Since scheduling is a non-deterministic problem in the Grid, deterministic algorithms cannot be used to improve it. In this paper, a combination of a genetic algorithm and binary gravitational attraction is used to solve the scheduling problem, investigating both the reduction of job execution time and the cost-effective use of simultaneous resources. The user determines the execution-time parameter and the cost-effective use of resources. The algorithm selects resources using a new approach that leads to a balanced load on the resources. Experimental results reveal that, in terms of cost, time and selection of the best resources, our proposed algorithm achieves better results than the other algorithms.
Comparison Between Clustering Algorithms for Microarray Data AnalysisIOSR Journals
Currently, there are two techniques used for large-scale gene-expression profiling: microarray and RNA sequencing (RNA-Seq). This paper studies and compares different clustering algorithms used in microarray data analysis. A microarray is an array of DNA molecules which allows multiple hybridization experiments to be carried out simultaneously and traces the expression levels of thousands of genes. It is a high-throughput technology for gene expression analysis and has become an effective tool for biomedical research. Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein microarrays, which enable researchers to investigate the expression state of a large number of genes. Data clustering is the first and main process in microarray data analysis. The k-means, fuzzy c-means, self-organizing map, and hierarchical clustering algorithms are investigated in this paper and compared on the basis of their clustering models.
Max stable set problem to found the initial centroids in clustering problemnooriasukmaningtyas
In this paper, we propose a new approach to document clustering using the K-Means algorithm, which is sensitive to the random selection of the k cluster centroids in the initialization phase. To improve the quality of K-Means clustering, we propose to model the text document clustering problem as the max stable set problem (MSSP) and use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of completely disconnected nodes in a graph, and in clustering all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much more optimized in terms of time, and provides better clustering quality than other methods.
Ensemble based Distributed K-Modes ClusteringIJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches to distributed clustering, in which distributed datasets are clustered without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets, but it fails to directly handle datasets with categorical attributes, which generally occur in real-life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces the means of clusters with a frequency-based method which updates modes during the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble-based Distributed K-Modes clustering algorithm is proposed, which is well suited to handling categorical data sets and performs the distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and a K-Modes based centralized clustering algorithm. The experiments are carried out on various datasets from the UCI machine learning data repository.
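Huang's two substitutions described above — matching dissimilarity in place of Euclidean distance, and per-attribute modes in place of means — can be sketched as follows. This shows only the centralized K-Modes core; the ensemble and distributed machinery of the paper is not modelled, and the random mode initialisation is an assumption of this sketch.

```python
import numpy as np
from collections import Counter

def kmodes(X, k, iters=20, seed=0):
    """K-Modes sketch (after Huang): the dissimilarity between an object and a
    mode is the number of attributes on which they differ, and each cluster's
    representative is the per-attribute mode of its members."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=object)
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each object to the mode it differs from in fewest attributes
        d = np.array([[np.sum(x != mo) for mo in modes] for x in X])
        labels = d.argmin(axis=1)
        new_modes = modes.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_modes[j] = [Counter(col).most_common(1)[0][0]
                                for col in members.T]
        if np.array_equal(new_modes, modes):
            break
        modes = new_modes
    return modes, labels
```

Identical categorical records always land in the same cluster, since they have identical dissimilarities to every mode.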
Quantum inspired evolutionary algorithm for solving multiple travelling sales...eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Determining optimal location and size of capacitors in radial distribution ne...IJECEIAES
In this study, the problem of optimal capacitor location and size determination (OCLSD) in radial distribution networks for reducing losses is solved by the moth swarm algorithm (MSA). MSA is one of the most powerful meta-heuristic algorithms, inspired by the food-source-finding behavior of moths. Four study cases of installing different numbers of capacitors in the 15-bus radial distribution test system, comprising two, three, four and five capacitors, are employed to run the applied MSA for an investigation of its behavior and an assessment of its performance. The power loss and the voltage-profile improvement obtained by MSA are compared with those from other methods. As a result, it can be concluded that MSA provides a truthful and effective solution method for the OCLSD problem.
Black-box modeling of nonlinear system using evolutionary neural NARX modelIJECEIAES
Nonlinear systems with uncertainty and disturbance are very difficult to model using a mathematical approach; a black-box modeling approach without any prior knowledge is therefore necessary. Several approaches have been used to develop black-box models, such as fuzzy logic, neural networks, and evolutionary algorithms. In this paper, an evolutionary neural network, combining a neural network and a modified differential evolution algorithm, is applied to model a nonlinear system. The feasibility and effectiveness of the proposed modeling are tested on a piezoelectric actuator SISO system and an experimental quadruple-tank MIMO system.
Mine Blood Donors Information through Improved K-Means Clusteringijcsity
The number of accidents and health diseases, which are increasing at an alarming rate, is resulting in a huge increase in the demand for blood; there is a need for organized analysis of blood donor databases and blood bank repositories. Clustering analysis is one of the data mining applications, and the K-means clustering algorithm is fundamental to modern clustering techniques. K-means is a traditional, iterative algorithm: at every iteration, it computes the distance from the centroid of each cluster to each and every data point. This paper improves the original k-means algorithm by choosing the initial centroids according to the distribution of the data. Results and discussion show that the improved K-means algorithm produces accurate clusters in less computation time when finding donor information.
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelWaqas Tariq
A novel clustering algorithm, CSHARP, is presented for the purpose of finding clusters of arbitrary shapes and arbitrary densities in high-dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions or blocks. These blocks are the seeds from which clusters may grow. Therefore, CSHARP is not a point-to-point clustering algorithm; rather, it is a block-to-block clustering technique. Much of its advantage comes from the following facts: noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. This technique is not prone to merging clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low- and high-dimensional data sets with superior results over existing techniques such as DBScan, K-means, Chameleon, Mitosis and Spectral Clustering. The quality of its results, as well as its time complexity, rank it at the front of these techniques.
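The shared-neighbour vote that SNN-style methods (including CSHARP) build on can be sketched directly: the similarity of two points is the number of neighbours their k-nearest-neighbour lists have in common. This is plain SNN similarity, not CSHARP's block construction, and the value of k is illustrative.

```python
import numpy as np

def snn_similarity(X, k=3):
    """Shared-nearest-neighbour similarity: S[i, j] counts the neighbours that
    the k-NN lists of points i and j share. Points in the same dense region
    share many neighbours; points in different regions share none."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    np.fill_diagonal(D, np.inf)                    # a point is not its own neighbour
    knn = [set(np.argsort(row)[:k]) for row in D]
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = len(knn[i] & knn[j])
    return S
```

Because the similarity is a count of shared neighbours rather than a raw distance, it stays meaningful across regions of very different density, which is the property CSHARP exploits.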
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
Data clustering is the process of arranging similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is higher than that among groups. In this paper, a hybrid clustering algorithm based on K-means and K-harmonic means (KHM) is described. The proposed algorithm is tested on five different datasets. The research is focused on fast and accurate clustering, and its performance is compared with the traditional K-means and KHM algorithms. The results obtained from the proposed hybrid algorithm are much better than those of the traditional K-means and KHM algorithms.
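The KHM half of the hybrid can be sketched as follows. KHM replaces the k-means objective with the sum, over points, of the harmonic mean of distances to all k centers, which softens sensitivity to initialisation; the update weights below follow the commonly cited formulation (Zhang's), and the value p = 3.5 is an illustrative choice, not a parameter reported by this paper. The hybridisation with K-means is not shown.

```python
import numpy as np

def khm(X, k, p=3.5, iters=50, seed=0, eps=1e-8):
    """K-harmonic-means sketch: soft memberships and per-point weights derived
    from the harmonic-mean objective drive the center updates; every point
    influences every center, unlike hard k-means assignment."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - C[None], axis=2) + eps
        q = d ** (-p - 2)
        m = q / q.sum(axis=1, keepdims=True)               # soft membership
        w = q.sum(axis=1) / (d ** (-p)).sum(axis=1) ** 2   # per-point weight
        mw = m * w[:, None]
        C = (mw[:, :, None] * X[:, None, :]).sum(0) / mw.sum(0)[:, None]
    d = np.linalg.norm(X[:, None] - C[None], axis=2) + eps
    obj = (k / (1.0 / d).sum(axis=1)).sum()                # harmonic-mean objective
    return C, obj
```

Since the updated centers are convex combinations of the data with positive weights, they always remain inside the data's bounding box.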
COMPARISON OF VOLUME AND DISTANCE CONSTRAINT ON HYPERSPECTRAL UNMIXINGcsandit
Algorithms based on the minimum volume constraint or the sum-of-squared-distances constraint are widely used in hyperspectral image unmixing. However, there is little work comparing these two kinds of algorithms. In this paper, a comparative analysis of the two algorithms is presented to evaluate the performance of the two constraints under different situations. The comparison covers three aspects: flatness of the simplex, initialization effects and robustness to noise. The analysis can provide a guideline on which constraint should be adopted for specific tasks.
CHN and Swap Heuristic to Solve the Maximum Independent Set ProblemIJECEIAES
We describe a new approach to the problem of finding the maximum independent set in a given graph, also known as the max stable set problem (MSSP). In this paper, we show how the max-stable problem can be reformulated as a linear problem under quadratic constraints, and we then solve the resulting QP by a hybrid approach based on a Continuous Hopfield Neural Network (CHN) and local search, where the solution given by the CHN is the starting point of the local search. The new approach shows better performance than the original one, which executes a suite of CHN runs, adding a new linear constraint to the resolved model at each execution. To demonstrate the efficiency of our approach, we present computational experiments on randomly generated problems and on typical MSSP instances of real-life problems.
In the recent machine learning community, there is a trend of constructing a nonlinear version of a linear algorithm through the 'kernel method', for example kernel principal component analysis, kernel Fisher discriminant analysis, support vector machines (SVMs), and the recent kernel clustering algorithms. Typically, in unsupervised clustering algorithms utilizing the kernel method, a nonlinear mapping is first applied to map the data into a much higher-dimensional feature space, and then clustering is executed there. A drawback of these kernel clustering algorithms is that the clustering prototypes reside in the high-dimensional feature space and therefore lack intuitive, clear descriptions, unless an additional projection from the feature space back to the data space is applied, as done in the literature. This paper utilizes the kernel method differently: a novel clustering algorithm, based on the conventional fuzzy c-means algorithm (FCM), is proposed, known as the kernel fuzzy c-means algorithm (KFCM). This method adopts a kernel-induced metric in the data space to replace the original Euclidean norm, so the cluster prototypes still reside in the data space and the clustering results can be interpreted and reformulated in the original space. This property is used for clustering incomplete data. Experiments on synthetic data illustrate that KFCM has better clustering performance and is more robust than other variants of FCM for clustering incomplete data.
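The kernel-induced metric idea can be sketched with a Gaussian kernel, for which the induced squared distance between a point x and a prototype v is 2(1 - K(x, v)), so the Euclidean norm in FCM is simply replaced while the prototypes stay in the data space. This sketch covers the complete-data case only (the paper's incomplete-data handling is not modelled), and the kernel width sigma and fuzzifier m are illustrative assumptions.

```python
import numpy as np

def kfcm(X, k, m=2.0, sigma=1.0, iters=100, seed=0, eps=1e-12):
    """KFCM sketch with Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2):
    memberships use the kernel-induced distance (1 - K), and prototypes are
    kernel-weighted averages that remain in the original data space."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        K = np.exp(-((X[:, None] - V[None]) ** 2).sum(-1) / sigma ** 2)
        dist = (1.0 - K) + eps                 # kernel-induced distance (up to a factor 2)
        u = dist ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)      # fuzzy memberships, rows sum to 1
        w = (u ** m) * K
        V = (w[:, :, None] * X[:, None, :]).sum(0) / w.sum(0)[:, None]
    return V, u
```

Because the prototype update is a convex combination of the data points with positive weights, the prototypes always lie within the range of the data, which is what makes the results interpretable in the original space.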
Fitness function X-means for prolonging wireless sensor networks lifetimeIJECEIAES
X-means and k-means are clustering algorithms proposed as a solution for prolonging wireless sensor network (WSN) lifetime. In general, X-means overcomes k-means limitations such as a predetermined number of clusters. The main concept of X-means is to create a network with basic clusters called parents and then generate (j) children clusters by splitting the parents. X-means does not provide any criteria for splitting parent clusters, nor does it provide a method to determine the acceptable number of children. This article proposes fitness function X-means (FFX-means) as an enhancement of X-means: FFX-means has a new method that determines whether a parent cluster is worth splitting based on predefined network criteria, and then determines the number of children. Furthermore, FFX-means proposes a new cluster-head selection method, where the cluster-head is selected based on the remaining energy of the node and the intra-cluster distance. The simulation results show that FFX-means extends network lifetime by 11.5% over X-means and 75.34% over k-means. Furthermore, the results show that FFX-means balances the nodes' energy consumption, and nearly all nodes deplete their energy within an acceptable range of simulation rounds.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. However, compared to natural aggregate (NA) pavement, RCA pavement has received fewer comprehensive studies and sustainability assessments.
Water billing management system project report.pdfKamal Acharya
Our project, entitled "Water Billing Management System", aims to generate water bills with all the charges and penalties. The manual system currently employed is extremely laborious and quite inadequate; it only makes the process more difficult.
The aim of our project is to develop a system that partially computerizes the work performed in the Water Board, such as generating the monthly water bill, recording the units of water consumed, and storing records of customers and previously unpaid bills.
We used HTML/PHP as the front end and MySQL as the back end for developing our project. The application is built by designing the forms that make up the user interface, adding application code to the forms and to objects such as buttons and text boxes on them, and adding any required support code in additional modules.
MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software. It is a stable ,reliable and the powerful solution with the advanced features and advantages which are as follows: Data Security.MySQL is free open source database that facilitates the effective management of the databases by connecting them to the software.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Online aptitude test management system project report.pdfKamal Acharya
The purpose of on-line aptitude test system is to take online test in an efficient manner and no time wasting for checking the paper. The main objective of on-line aptitude test system is to efficiently evaluate the candidate thoroughly through a fully automated system that not only saves lot of time but also gives fast results. For students they give papers according to their convenience and time and there is no need of using extra thing like paper, pen etc. This can be used in educational institutions as well as in corporate world. Can be used anywhere any time as it is a web based application (user Location doesn’t matter). No restriction that examiner has to be present when the candidate takes the test.
Every time when lecturers/professors need to conduct examinations they have to sit down think about the questions and then create a whole new set of questions for each and every exam. In some cases the professor may want to give an open book online exam that is the student can take the exam any time anywhere, but the student might have to answer the questions in a limited time period. The professor may want to change the sequence of questions for every student. The problem that a student has is whenever a date for the exam is declared the student has to take it and there is no way he can take it at some other time. This project will create an interface for the examiner to create and store questions in a repository. It will also create an interface for the student to take examinations at his convenience and the questions and/or exams may be timed. Thereby creating an application which can be used by examiners and examinee’s simultaneously.
Examination System is very useful for Teachers/Professors. As in the teaching profession, you are responsible for writing question papers. In the conventional method, you write the question paper on paper, keep question papers separate from answers and all this information you have to keep in a locker to avoid unauthorized access. Using the Examination System you can create a question paper and everything will be written to a single exam file in encrypted format. You can set the General and Administrator password to avoid unauthorized access to your question paper. Every time you start the examination, the program shuffles all the questions and selects them randomly from the database, which reduces the chances of memorizing the questions.
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Hybridization of Magnetic Charge System Search Method for Efficient Data Clustering. pp 108-129
108
Malaysian Journal of Computer Science. Vol. 31(2), 2018
HYBRIDIZATION OF MAGNETIC CHARGE SYSTEM SEARCH METHOD FOR EFFICIENT DATA CLUSTERING
Yugal Kumar1 and G. Sahoo2
1 Department of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat, Himachal Pradesh, India
2 Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India
Email: yugalkumar.14@gmail.com1, gsahoo@bitmesra.ac.in2
DOI: https://doi.org/10.22452/mjcs.vol31no2.2
ABSTRACT
MCSS is a relatively new meta-heuristic algorithm inspired by electromagnetic theory, and it has shown better potential than other algorithms of its class. However, like other meta-heuristic algorithms, it suffers from performance issues such as a slow convergence rate and a tendency to become trapped in local optima. In this work, an attempt is made to improve the convergence rate of the MCSS algorithm, and a Hybrid Magnetic Charge System Search (HMCSS) is proposed for solving clustering problems. Further, a local search strategy is incorporated into the MCSS algorithm to reduce the probability of trapping in local optima and to explore promising solutions. The effectiveness of the proposed algorithm is tested on benchmark functions and applied to real-world clustering problems. The experimental results show that the proposed algorithm gives better results than the existing algorithms and also improves the convergence rate of the MCSS algorithm.
Keywords: charge particles, clustering, electric force, magnetic force
1.0 INTRODUCTION
Over the past few decades, clustering problems have attracted considerable attention from researchers, from both a theoretical and a practical point of view. Clustering is an unsupervised classification technique that allocates data into distinct groups, known as clusters, such that data within a cluster share more common characteristics. Hence, the aim of clustering is to find the optimal cluster centers of the data so that unseen patterns can be uncovered [1-3]. Clustering problems arise in a variety of research fields such as pattern recognition, data analysis, image processing, market research, process monitoring, bioinformatics, and biology [4-10]. Therefore, it is a major issue either to improve existing methods or to develop new clustering algorithms that enhance the efficiency of data clustering. Several clustering algorithms have been reported by the research community in recent years. These algorithms can be categorized as partition-based, hierarchical, density-based, grid-based, and model-based clustering algorithms [11-15]. This work focuses primarily on partition-based clustering problems. In partition-based clustering algorithms, different partitions of a dataset are formed using a criterion function, usually the Euclidean distance. K-means is one of the oldest, simplest, and most popular partition-based clustering algorithms, but it converges to local optima and its performance also depends on the initial cluster centers [16-17]. To overcome this local optima problem, many evolutionary and swarm-based approaches have been applied to date. These approaches offer innovative and intelligent paradigms for solving optimization problems. Some of these are genetic algorithms (GA) [18, 19], particle swarm optimization (PSO) [20, 21], ant colony optimization (ACO) [22, 23], artificial bee colony optimization (ABC) [24, 25], teaching-learning-based optimization (TLBO) [26, 27], charge system search (CSS) [28, 29], simulated annealing (SA) [30, 31], tabu search (TS) [32], cat swarm optimization (CSO) [33-36], and magnetic charge system search (MCSS) [37-38], all of which have previously been implemented successfully to solve clustering problems. These algorithms explore
the solution space for the optimal solution effectively. Further, hybrid variants of these algorithms have also been reported in the literature [21, 23, 25, 34, 36].
In this work, a recently developed meta-heuristic, the Magnetic Charge System Search (MCSS) algorithm, is explored to solve partition-based clustering problems. MCSS has been used to solve engineering design problems [39]. The algorithm is inspired by electromagnetic forces (the electric and magnetic forces between charged particles) from physics and the law of motion from mechanics. Charged particles act as search agents and explore the search space to obtain the optimal solution. Each charged particle is directed towards its global optimum position using the net force (the sum of the electric and magnetic forces) and the law of motion. Nevertheless, like many other meta-heuristic and evolutionary algorithms, the MCSS algorithm also encounters some challenges. Although MCSS is an efficient optimization algorithm, there is some inefficiency in its solution search equation, which generates new candidate solutions from the information of previous solutions and the electromagnetic forces. It is observed that the search equation of the MCSS algorithm is good at exploration but poor at exploitation. As a result, it cannot effectively use available information to compute the most promising search direction, so it may become trapped in local optima and its convergence speed may suffer. To handle these issues, an improved search mechanism inspired by the DE algorithm is proposed for the MCSS algorithm to guide the search for new candidate solutions and to improve the convergence rate. Further, a local search strategy is proposed to explore more promising regions of the solution space and to escape from local optima; boundary-level constraints are also handled by this strategy. The rest of the paper is organized as follows. Sections 2.0 and 2.1 summarize background on clustering and the magnetic charge system search algorithm; Section 3.0 describes the proposed HMCSS algorithm in detail; experimental results are illustrated in Section 4.0, and the work is concluded in Section 5.0.
2.0 BACKGROUND
In the field of data analysis, clustering is a common problem, and it becomes an NP-hard problem when the number of clusters exceeds three. Consider a dataset D(x, z), where x represents the set of data objects and z represents the set of clusters. Mathematically, it can be expressed as:
x = {x_1, x_2, x_3, ..., x_n}, where x_i ∈ R^d, and
z = {z_1, z_2, z_3, ..., z_K}, where z_i ∩ z_j = ∅ for all i ≠ j, ∪_{i=1}^{K} z_i = x, and z_i ≠ ∅ for all i (1)
The goal of the clustering problem is to determine the partitions of the dataset that minimize a criterion function. Elements within a partition are generally homogeneous in nature but exhibit heterogeneity with the elements of other partitions. Commonly, the Euclidean distance is used as the criterion function, described as follows:
F(A_i, B_j) = min_j || A_i − B_j ||, j = 1, 2, 3, ..., K (2)
where A_i denotes the ith data object, B_j denotes the jth cluster center, and F(A_i, B_j) denotes the distance between the ith data object and the jth cluster center. Hence, the aim of a partition-based clustering algorithm is to compute the optimal cluster centers in a given dataset.
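As a concrete illustration of the nearest-center criterion in equation 2, the following minimal Python sketch (not the authors' code; the points and centers are made up for illustration) assigns each data object to its closest cluster center:

```python
import math

def assign_to_clusters(points, centers):
    """Assign each data object to its nearest cluster center,
    using the Euclidean distance as the criterion function (eq. 2)."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.1), (4.8, 5.2)]
centers = [(1.0, 1.0), (5.0, 5.0)]
print(assign_to_clusters(points, centers))  # -> [0, 0, 1, 1]
```

The resulting labels define the partition whose intra-cluster distances the meta-heuristic then tries to minimize.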
2.1 Magnetic charge system search
Magnetic charge system search (MCSS) is a recent meta-heuristic algorithm reported in [39]. It is based on electromagnetic forces and Newton's laws of motion. Candidate solutions are represented as charged particles (CPs), and the CPs are responsible for finding the optimal solution by exploring the solution space. When a charged particle (CP) is placed in the solution space, it generates its own electric field and imposes an electric force on the other CPs. The amount of electric force generated by a CP is calculated using equation 3:
E_{ik} = q_i * ( (q_k / R^3) * r_{ik} * w1 + (q_k / r_{ik}^2) * w2 ) * p_{ik} * (X_i − X_k), k = 1, 2, 3, ..., K
w1 = 1, w2 = 0 ↔ r_{ik} < R
w1 = 0, w2 = 1 ↔ r_{ik} ≥ R (3)
where q_i and q_k represent the fitness values of the ith and kth CPs, r_{ik} denotes the separation distance between the ith and kth CPs, w1 and w2 are two variables whose values are either 0 or 1, R represents the radius of the CPs, which is set to unity (each CP is assumed to have a uniform volume charge density, which changes in every iteration), and p_{ik} denotes the moving probability of each CP. The value of the electric force E_{ik} thus depends on the variables q_i, q_k, r_{ik}, p_{ik} and R.
The movement of a CP through the search space produces a magnetic field, which in turn imposes a magnetic force on the other CPs; this force can be determined using equation 4:
M_{ik} = q_k * ( (I_i / R^2) * r_{ik} * w1 + (I_i / r_{ik}) * w2 ) * PM_{ik} * (X_i − X_k), k = 1, 2, 3, ..., K
w1 = 1, w2 = 0 ↔ r_{ik} < R
w1 = 0, w2 = 1 ↔ r_{ik} ≥ R (4)
where q_k represents the fitness value of the kth CP, I_i is the average electric current, r_{ik} denotes the separation distance between the ith and kth CPs, w1 and w2 are two variables whose values are either 0 or 1, R represents the radius of the CPs, which is set to unity, and PM_{ik} denotes the probability of magnetic influence. The value of the magnetic force M_{ik} thus depends on the variables q_k, I_i, r_{ik}, PM_{ik} and R.
Finally, the total force F_total acting on a CP is the combined effect of the electric and magnetic forces and can be computed using equation 5:
F_total = p_r * E_{ik} + M_{ik} (5)
where p_r denotes a probability value determining whether the electric force E_{ik} is repelling or attracting, and E_{ik} and M_{ik} are the electric and magnetic forces exerted by the kth CP on the ith particle.
The total force, together with Newton's second law of motion, is used to find the updated positions of the CPs in the D-dimensional search space; these updated positions can be computed using equation 6, given below:
X_{k,new} = rand_1 * Z_a * (F_total / m_k) * Δt^2 + rand_2 * Z_v * V_{k,old} * Δt + X_{k,old} (6)
where rand_1 and rand_2 are two random variables between 0 and 1, Z_a and Z_v act as control parameters controlling the influence of the total force F_total and the previous velocity, m_k is the mass of the kth CP, which is set equal to q_k, Δt represents the time step, which is set to 1, X_{k,old} represents the old position of the CP, and V_{k,old} denotes the velocity of the kth CP.
The updated velocities of CPs can be calculated using equation 7.
V_{k,new} = (X_{k,new} − X_{k,old}) / Δt (7)
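The force and motion updates of equations 3 to 7 can be sketched as follows for a single pair of CPs (an illustrative sketch under the stated conventions R = 1 and Δt = 1; scalar positions are used for brevity, and all function and argument names are ours, not the authors'):

```python
import random

def electric_force(q_i, q_k, r_ik, p_ik, x_i, x_k, R=1.0):
    # Equation 3: linear-in-r term inside the radius R (w1 = 1),
    # inverse-square term outside it (w2 = 1).
    mag = q_i * (q_k / R**3) * r_ik if r_ik < R else q_i * q_k / r_ik**2
    return mag * p_ik * (x_i - x_k)

def magnetic_force(q_k, I_i, r_ik, pm_ik, x_i, x_k, R=1.0):
    # Equation 4: linear-in-r term inside R, 1/r term outside it.
    mag = q_k * (I_i / R**2) * r_ik if r_ik < R else q_k * I_i / r_ik
    return mag * pm_ik * (x_i - x_k)

def update_position(x_old, v_old, f_total, m_k, Za=0.5, Zv=0.5, dt=1.0):
    # Equation 6 (Newton's second law plus a velocity term),
    # followed by the velocity update of equation 7.
    x_new = (random.random() * Za * (f_total / m_k) * dt**2
             + random.random() * Zv * v_old * dt + x_old)
    v_new = (x_new - x_old) / dt
    return x_new, v_new
```

In the full algorithm these quantities are vectors over the dataset's attributes; the scalar form above only shows how the piecewise terms and the motion update fit together.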
3.0 PROPOSED HMCSS
This section describes the proposed HMCSS algorithm, which addresses the convergence rate and local optima problems of the MCSS algorithm. In order to make the MCSS algorithm more efficient, robust, and effective, two amendments are proposed. To achieve a good convergence rate, the exploration and exploitation abilities of an algorithm should be well balanced. The MCSS algorithm is good at exploration but converges slowly due to a lack of exploitation. Other issues affecting the performance of the MCSS algorithm are the local optima problem and boundary constraints. The proposed
amendments are explained in subsections 3.1 and 3.2, and the pseudo code of the proposed HMCSS is described in subsection 3.3.
3.1 Solution Search Mechanism
Exploration and exploitation are prerequisites of every meta-heuristic algorithm. Exploration refers to the ability of the algorithm to find the global optimum solution, whereas exploitation refers to the ability of the algorithm to find better solutions based on previous solutions [40]. In the MCSS algorithm, a new candidate solution is generated using equation 6. In equation 6, the first term on the right-hand side is responsible for the exploitation of solutions in the search space, while responsibility for exploration belongs to the second term. So, the updated solution combines the total force exerted by the CPs and their velocities with the old position of the CPs in the solution space. In equation 6, rand_1 and rand_2 are two random numbers in the range 0 to 1; Z_a and Z_v are control parameters that control the exploitation and exploration processes; F_total,k and m_k denote the acting force and mass of the kth CP; V_{k,old} is the previous velocity of the kth CP; and X_{k,old} is the position of the kth CP in the solution space. Equation 6 is good for exploration but poor at exploitation. It is well known that well-designed search equations improve the performance of an algorithm. Hence, a new search equation is proposed to enhance the exploitation ability of the MCSS algorithm using the mutation strategy of the Differential Evolution (DE) algorithm. DE is a simple but efficient algorithm that can be applied to solve different types of optimization problems in real-world applications [40]. The DE algorithm uses different mutation strategies to generate better solutions; the following mutation strategies are widely adopted:
DE/best/1: Vi= Xbest+ F (Xr1− Xr2) (8)
DE/rand/1: Vi= Xr1 + F (Xr2− Xr3) (9)
where i = {1, 2, ..., SN}; r1, r2, and r3 are mutually different random integer indices selected from {1, 2, ..., NP}; and F is a positive real number, typically less than 1.0, that controls the rate at which the population evolves. It is observed that a best-solution component in the search mechanism can explore the optimum solution effectively, guide the search process efficiently, and thereby improve the convergence rate. In the DE algorithm, "DE/best/1" exploits the best solution found so far to guide the current population, whereas "DE/rand/1" maintains population diversity. Hence, in this work, a new improved search mechanism motivated by the DE algorithm is proposed by hybridizing the two solution search equations. The proposed equation is described as follows:
X_{k,new} = rand_1 * Z_a * (F_total / m_k) * Δt^2 + rand_2 * Z_v * V_{k,old} * Δt + X_{k,old} + F * (X_best − X_{r1}) (10)
In the above equation, the best solution in the current population improves the convergence rate of the MCSS algorithm, while X_{r1} denotes a randomly chosen member of the population and maintains population diversity. Thus, the exploitation ability is improved by using the global best experience of each CP in the solution search space, as captured in equation 10, and this modification leads the algorithm to converge on the optimal solution.
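The hybrid update of equation 10 can be sketched in a few lines (an illustrative scalar sketch, not the authors' code; the argument names are ours, and F plays the role of the DE scaling factor):

```python
import random

def hmcss_update(x_old, v_old, f_total, m_k, x_best, x_r1,
                 Za=0.5, Zv=0.5, F=0.5, dt=1.0):
    """Equation 10: the MCSS position update of equation 6 plus the
    DE-inspired best-guided difference term F * (x_best - x_r1)."""
    return (random.random() * Za * (f_total / m_k) * dt**2
            + random.random() * Zv * v_old * dt
            + x_old
            + F * (x_best - x_r1))
```

With zero force and zero velocity, the update reduces to a pure pull towards the best solution: hmcss_update(1.0, 0.0, 0.0, 1.0, x_best=2.0, x_r1=1.0) returns 1.5, halfway towards x_best as dictated by F = 0.5.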
3.2 Local Search Strategy
Every meta-heuristic algorithm suffers from the problem of local optima. Another issue related to these algorithms is boundary constraints, and there is no predefined mechanism to deal with such problems. Hence, in this work, a local search strategy is proposed to handle the problems of local optima and boundary constraints. This strategy is illustrated in Figure 1 and summarized as follows:
Step 1: For each CP, the Euclidean distance to the other CPs in the input search space is calculated using the positions of the CPs. The other CPs within a threshold Euclidean distance are considered neighbors of the current CP. The Euclidean distance between two CPs X_i and X_j in the n-dimensional search space is given by equation 11.
d(X_i, X_j) = sqrt( Σ_{l=1}^{n} (x_{i,l} − x_{j,l})^2 ) (11)
Step 2: If the new position of a CP is out of range, the other CPs in its neighborhood are evaluated.
Step 3: The position of the CP is then replaced with that of the best CP in the neighborhood instead of a random value. This helps in exploring more promising candidate solutions.
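The three steps above can be sketched as follows (a sketch, not the authors' implementation; the threshold value, bounds representation, and function name are our assumptions):

```python
import math

def local_search(positions, fitness, k, bounds, threshold=1.0):
    """Steps 1-3: if CP k has left the feasible range, replace its
    position with that of the best (lowest-fitness) CP among its
    neighbors rather than with a random value."""
    lo, hi = bounds
    x = positions[k]
    if all(lo <= v <= hi for v in x):
        return x  # still in range: keep the position
    # Step 1: neighbors are the other CPs within a threshold distance
    nbrs = [j for j in range(len(positions))
            if j != k and math.dist(positions[j], x) <= threshold]
    if not nbrs:
        return x
    # Steps 2-3: adopt the position of the best neighbor
    return positions[min(nbrs, key=lambda j: fitness[j])]
```

Because the replacement position comes from a CP that is already feasible, this handles the boundary constraints and the randomness of re-initialization at the same time.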
So, in this work, two amendments to the MCSS algorithm are proposed. First, the solution search equation of the MCSS algorithm is modified using the mutation strategy of the DE algorithm. Second, a neighborhood search strategy is introduced into the MCSS algorithm to explore more promising positions of the CPs in the solution space. The proposed algorithm maintains the core concept of the MCSS algorithm. The amendments are described in detail in subsections 3.1 and 3.2.
Fig. 1: Local search strategy
3.3 Pseudo code of proposed algorithm
The pseudo code of the proposed algorithm is as follows.
Step 1: Initialize the parameters of the MCSS algorithm, such as the number of CPs, c1, c2 and ϵ, and load the dataset.
Step 2: Initialize the positions and velocities of the charged particles.
For k = 1 to K /* K is the number of cluster centers in the dataset */
Evaluate the position of the kth CP (position(k)) from the dataset using equation 12;
X(k) = X_min + rand * (X_max − X_min) (12)
Initialize the velocity of each kth CP;
V(k) = 0;
End for
Step 3: Calculate the value of the objective function.
For k = 1 to K
Compute the objective function value of each data instance with respect to each CP using equation 13;
D_{j,k} = sqrt( Σ_i (X_{j,i} − X_{k,i})^2 ) (13)
where X_{j,i} denotes the ith attribute of the jth data instance, X_{k,i} represents the ith attribute of the kth CP, and D_{j,k} denotes the Euclidean distance between the jth data instance and the kth CP.
End for
Assign each data instance to the CP with the minimum objective function value.
For k = 1 to K
Compute the fitness of each CP using equation 14;
D_{j,k} = sqrt( Σ_i (X_{j,i} − X_{k,i})^2 ) (14)
where X_{j,i} denotes the ith attribute of the jth data instance, X_{k,i} represents the ith attribute of the kth CP, and D_{j,k} denotes the distance between the kth CP and the jth data instance.
End for
Step 4: For each charged particle
Initialize the personal best memory.
For k = 1 to K
pbest(k) = position(k);
fitness(pbest(k)) = fitness(k);
End for
Determine the positions of the neighborhood particles in the solution space.
For k = 1 to K
Compute the distance between pbest(k) and every other particle.
Evaluate the positions of the neighborhood particles (P) of a CP using the minimum distance criterion.
End for
Step 5: While the termination condition is not met, do following;
Iteration = 0;
Step 6: Calculate the mass of each initially positioned CP.
For k = 1 to K
Compute the mass of each CP using equation 15;
m_k = (fit(k) − fit(worst)) / (fit(best) − fit(worst)) (15)
where m_k represents the mass of the kth CP, fit(k) is the fitness of the kth CP, and fit(best) and fit(worst) are the best and worst fitness values in the given dataset.
End for
Step 7: Evaluate the value of the moving probability P_{k,i} for each charged particle.
For k = 1 to K
Determine the fitness ratio q_{i,k} of each data instance with respect to each CP using equation 16;
q_{i,k} = (fit(i) − fit(best)) / (fit(k) − fit(i)) (16)
where fit(i) denotes the fitness of the ith particle, while fit(best) and fit(worst) denote the best and worst fitness values of the given dataset.
End for
Determine the moving probability for each CP.
For k = 1 to K
If (fit(q_{k,i}) > fit(k))
P_{k,i} ← 1;
Else
P_{k,i} ← 0;
End if
End for
Step 8: Compute the electric force E_{ik} for each CP.
Compute the fitness of the data instances.
For i = 1 to N /* N is the total number of data instances in the dataset */
Compute the fitness q_i of each instance using equation 17;
q_i = (fit(i) − fit(worst)) / (fit(k) − fit(i)), i = 1, 2, 3, ..., N (17)
End for
Compute the separation distance r_{ik} for each CP.
For k = 1 to K
Calculate the value of the separation distance r_{ik} using equation 18;
r_{ik} = ||X_i − X_k|| / ( ||(X_i + X_k)/2 − X_best|| + ϵ ) (18)
where X_i and X_k represent the positions of the ith and kth particles, X_best describes the best position, and ϵ is a small positive constant to avoid singularities.
End for
If (r_{ik} < R)
w1 ← 1; w2 ← 0;
Else
w1 ← 0; w2 ← 1;
End if
Determine the electric force for each CP.
For k = 1 to K
Calculate the value of the electric force using equation 3;
End for
Step 9: Compute the magnetic force M_{ik}.
Compute the average electric current I_i.
For i = 1 to N /* N is the total number of data instances in the dataset */
Determine the average electric current for each data instance using equation 19;
I_i = fit_new(i) − fit_old(i) (19)
End for
Compute the probability of magnetic influence PM_k.
For k = 1 to K
If (fit(k) ≥ fit(i)) /* fit(i) represents the fitness of each data instance */
PM ← 1;
Else
PM ← 0;
End if
End for
Determine the magnetic force for each CP.
For k = 1 to K
Calculate the value of the magnetic force using equation 4;
End for
Step 10: Compute the total force F_total acting on each CP.
If (rand() > 0.10 * (1 − Iteration / Iteration_max))
p_r ← 1;
Else
p_r ← −1;
End if
For k = 1 to K
F_total = p_r * E_{ik} + M_{ik};
End for
Step 11: Evaluate the new positions X_{k,new} and velocities V_{k,new} of the CPs.
For k = 1 to K
Update the position of the CP using equation 10;
Update the velocity of the CP using equation 7;
End for
Step 12: Recompute the objective and fitness functions using the updated positions and velocities of the CPs.
Step 13: Update the personal best (pbest) position of each CP.
For k = 1 to K
If (fitness(X_{k,new}) ≤ fitness(pbest(k)))
pbest(k) ← X_{k,new};
fitness(pbest(k)) ← fitness(X_{k,new});
End if
End for
Step 14: /* Local search strategy */
Determine the positions of the neighborhood particles in the solution space.
For k = 1 to K
Compute the distance between pbest(k) and every other particle.
Evaluate the positions of the neighborhood particles (P) of a CP using the minimum distance criterion.
End for
Reposition a neighborhood particle as a charged particle (CP).
For k = 1 to K
If (fitness(k) < rand())
Select a particle from the neighborhood in random order;
Mark the particle as the new charged particle (CP);
Compute the velocity of the new CP using equation 7;
End if
End for
Iteration = Iteration + 1;
Step 15: Repeat steps 6-14 until the termination condition is met.
End while
Step 16: Optimal cluster centers obtained.
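Two quantities used repeatedly in the pseudo code, the mass of equation 15 and the separation distance of equation 18, can be sketched as follows (an illustrative sketch, not the authors' code; minimization is assumed, so the best CP has the smallest fitness, and the sample values are made up):

```python
import math

def mass(fit_k, fit_best, fit_worst):
    # Equation 15: fitness normalized into [0, 1];
    # the best CP gets mass 1, the worst gets mass 0.
    return (fit_k - fit_worst) / (fit_best - fit_worst)

def separation(x_i, x_k, x_best, eps=1e-9):
    # Equation 18: pairwise distance normalized by how far the
    # midpoint of the pair lies from the best position.
    mid = [(a + b) / 2 for a, b in zip(x_i, x_k)]
    return math.dist(x_i, x_k) / (math.dist(mid, x_best) + eps)

fits = [3.0, 1.0, 2.0]  # made-up CP fitness values
best, worst = min(fits), max(fits)
masses = [mass(f, best, worst) for f in fits]
```

The normalization in equation 15 ensures the best CP responds least to the applied forces (largest mass), while equation 18 makes pairs that straddle the best-so-far position interact more strongly.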
3.4 Illustrative example
This section elaborates the process of the proposed method in detail using an example dataset. A two-dimensional dataset consisting of 10 instances is considered for obtaining the optimal cluster centers and explaining the operation of the proposed method. The instances of the example dataset are (5.4, 3.7), (4.8, 3.4), (4.8, 3.0), (4.3, 3.0), (5.8, 4.0), (5.7, 4.4), (5.4, 3.9), (5.1, 3.5), (5.7, 3.8) and (5.1, 3.8). These instances are assumed to be divided into three clusters, i.e., K = 3 and d = 2, where K is the number of clusters and d is the dimension of the dataset. The aim is to reduce the intra-cluster distance of the dataset. The major steps of the proposed algorithm are as follows.
Steps 1 and 2: The proposed algorithm starts by initializing the user-defined parameters and identifying the positions and velocities of the initial cluster centers. Equation 12 is used to determine the positions of the initial cluster centers (CPs) in random order, and the initial velocities of the CPs are set to 0. The randomly generated initial cluster centers are (4.9653, 3.7548), (5.0328, 3.4162) and (5.4318, 3.9185).
Step 3: In step 3, all data instances are grouped into three clusters around the initial cluster centers (4.9653, 3.7548), (5.0328, 3.4162) and (5.4318, 3.9185) using equation 13. As a result, data instances 1, 2, 3 and 4 belong to the cluster center (4.9653, 3.7548), data instances 5, 6, 7 and 8 belong to the cluster center (5.0328, 3.4162), and data instances 9 and 10 belong to the cluster center (5.4318, 3.9185). After grouping the data instances, the fitness of each cluster center is computed using equation 14. The fitness values of the CPs are 0.1538, 0.0732 and 0.1086.
Step 4: In this step, the global best position of the CPs and its fitness are computed. The global best position is (5.0328, 3.4162) with fitness 0.0732. The neighbors of the current CPs are also evaluated, which are later used to explore more promising solutions. The neighborhood particles of the current CPs are (4.8, 3.0), (5.1, 3.5) and (5.7, 3.8).
Steps 5, 6 and 7: Step 5 deals with the termination condition. In step 6, the masses of the initially positioned CPs are calculated using equation 15; these are later used to determine the electric and magnetic forces acting on the CPs. The masses of the CPs are 0.4183, 0.7246 and 1.5148. Step 7 determines the moving probability for each CP in binary form.
Steps 8 and 9: In steps 8 and 9, the electric and magnetic forces acting on each CP are determined using equations 3 and 4. These forces can be viewed as the local search for solutions in the solution space and are further used with the search equation to obtain the optimal cluster centers. The electric forces on the CPs are 0.2163, 0.3854 and 0.3189, whereas the magnetic forces acting on the CPs are 0.0472, 0.0236 and 0.0576, respectively.
Step 10: The total force acting on a CP is the combined effect of the electric and magnetic forces. A probabilistic factor is applied to the electric force to reflect, in closer resemblance to the physical scenario, whether the electric force is attractive or repulsive: if it is attractive, the electric force is added to the magnetic force; otherwise it is subtracted. The values of the total force for the CPs in the example dataset are 0.2348, 0.3972 and 0.2917.
Step 11and 12: The new positions and velocities of CPs are determined using the equations 10 and 7. The updated
positions and velocities of CPs are (4.8312 3.6425, 5.1682 3.76963, 5.4318 4.1364) and (0.1543, 0.2381 and 0.1029). In
step 12, objective and fitness function values are computed again using the updated positions of CPs.
Step 13: To update the best position of CPs, a comparison between fitness of best position (gbest) and fitness of newly
generated positions is performed, the better one take over the gbest.
Step 14 and 15: In these steps, a neighborhood search strategy is invoked by identifying the particles nearest to the current
CPs using a minimum-distance criterion. The neighborhood particles replace the current CPs with some degree of randomness
and act as the new CPs. Steps 14 and 15 are repeated until the stopping condition is met, and the best cluster centers are
returned as the output. Thus, the optimal cluster centers obtained for the example dataset using the proposed method
are (4.5978, 3.1632), (5.2543, 3.8013) and (5.7468, 4.1367).
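The minimum-distance neighborhood search of steps 14-15 can be sketched as follows (the function names, the `p=3` default, and the replacement probability `r_replace` are illustrative assumptions):

```python
import math
import random

def nearest_neighbors(cp, particles, p=3):
    """Step 14: return the p particles closest to a charged particle
    (CP) under the minimum Euclidean distance criterion."""
    return sorted(particles, key=lambda x: math.dist(cp, x))[:p]

def neighborhood_replace(cp, particles, p=3, r_replace=0.5):
    """Step 15: with some degree of randomness, replace the current CP
    with one of its p nearest neighborhood particles."""
    if random.random() < r_replace:
        return random.choice(nearest_neighbors(cp, particles, p))
    return cp
```

For the example CPs above, the neighbors (4.8, 3.0), (5.1, 3.5) and (5.7, 3.8) would be the candidates returned by `nearest_neighbors`.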
Hybridization of Magnetic Charge System Search Method for Efficient Data Clustering, pp. 108-129.
Malaysian Journal of Computer Science, Vol. 31(2), 2018.
4.0 EXPERIMENTAL RESULTS
To test the effectiveness of the proposed algorithm, it is applied to clustering problems and the results are
compared with the K-means, GA, PSO, ACO, CSS, MCSS, and MCSS-PSO algorithms. The proposed algorithm is
implemented in the MATLAB 2010a environment on a Windows 7 system with an Intel Core i3 processor (3.4 GHz) and 4
GB RAM.
4.1 Performance metrics
The performance of the proposed algorithm is evaluated using the sum of intra-cluster distances and the f-measure.
These parameters are described below.
Intra-cluster distances
This parameter indicates the distance between the data objects within a cluster and its cluster center. It also reflects
the quality of the clustering: the smaller the intra-cluster distance, the better the quality of the solution. The
results are reported in terms of the best, average, and worst solutions.
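As a sketch, the sum of intra-cluster distances can be computed as follows (the function name and argument layout are assumptions for illustration):

```python
import math

def sum_intra_cluster_distance(data, centers, labels):
    """Sum, over all data objects, of the Euclidean distance from the
    object to the center of the cluster it is assigned to.
    Smaller values indicate more compact (better) clusters."""
    return sum(math.dist(x, centers[k]) for x, k in zip(data, labels))
```
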
F-Measure
The value of this parameter is measured in terms of the recall and precision of an information retrieval system, and is
the weighted harmonic mean of recall and precision. If a cluster j contains a set of n_j data objects returned for a
query, each class i contains a set of n_i data objects relevant to the query, and n_ij is the number of instances of
class i within cluster j, then the recall and precision for each cluster j and class i are defined as:
Recall r(i, j) = n_ij / n_i and Precision p(i, j) = n_ij / n_j (18)
The value of the F-measure (F(i, j)) is determined as
F(i, j) = 2 * (Recall * Precision) / (Recall + Precision) (19)
The overall F-measure for a clustering of n data instances is given as
F = Σ_i (n_i / n) * max_j F(i, j) (20)
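Equations (18)-(20) can be combined into a single routine along these lines (a sketch over a class-by-cluster contingency table; the function name and argument layout are assumptions):

```python
def f_measure(n_ij, n_i, n_j, n):
    """Overall F-measure from a class-by-cluster contingency table.

    n_ij[i][j]: number of objects of class i in cluster j
    n_i[i]: size of class i; n_j[j]: size of cluster j; n: total objects.
    Implements F = sum_i (n_i / n) * max_j F(i, j)."""
    total = 0.0
    for i in range(len(n_i)):
        best = 0.0
        for j in range(len(n_j)):
            if n_ij[i][j] == 0:
                continue  # empty cell contributes F(i, j) = 0
            r = n_ij[i][j] / n_i[i]   # recall, Eq. (18)
            p = n_ij[i][j] / n_j[j]   # precision, Eq. (18)
            best = max(best, 2 * r * p / (r + p))  # Eq. (19)
        total += (n_i[i] / n) * best  # Eq. (20)
    return total
```

A perfect clustering (each cluster equal to one class) yields an F-measure of 1.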
4.2 Parameters settings
The proposed algorithm has five parameters: rand1, rand2, ϵ, the number of CPs, and p. rand1 and rand2 are
random numbers in the range [0, 1], the number of CPs is equal to the number of clusters (K) present in the dataset, ϵ is a
small positive constant to avoid singularities, and p denotes the number of neighbors. Table 1 shows the parameter
settings of the proposed algorithm. In order to achieve good performance, the experiments are run with the best parameter
values and the outcome is measured after 100 iterations. The results presented in this work are the average of ten
different runs.
Table 1: Parameter settings of proposed algorithm
Parameters Values
No. of CPs No. of Clusters (K)
ϵ 0.001
p 3
4.3 Datasets
Eight real datasets are used to assess the performance of the proposed algorithm. These datasets are taken from the UCI
repository: iris, wine, CMC, glass, cancer, liver disease (LD), thyroid, and vowel. In addition, two
artificial datasets, named ART1 and ART2, are used to test the effectiveness of the proposed algorithm.
Table 2 summarizes the features of these datasets.
Table 2: Description of datasets
Dataset Clusters (K) Features Total Instances Instance in Each Cluster
ART 1 3 2 300 (100, 100, 100)
ART 2 3 3 300 (100, 100, 100)
Iris 3 4 150 (50, 50, 50)
Glass 6 9 214 (70,17, 76, 13, 9, 29)
LD 2 6 345 ( 145, 200)
Thyroid 3 3 215 (150, 30, 35)
Cancer 2 9 683 (444, 239)
CMC 3 9 1473 (629,334, 510)
Vowel 6 3 871 (72, 89, 172, 151, 207, 180)
Wine 3 13 178 (59, 71, 48)
Tables 3-4 show the results of the proposed algorithm and the other algorithms being compared. From the results, it can be stated that the
proposed algorithm performs better than the other algorithms. The proposed algorithm obtains better average intra-cluster distances
for all datasets except the thyroid dataset. Further, it can be seen that the proposed algorithm obtains the best intra-cluster distances
for all datasets. The FM parameter also supports the capability of the proposed algorithm for solving clustering problems.
So, it can be concluded that the proposed clustering algorithm is more competitive and efficient than the other
algorithms included in the experiment. To confirm the effectiveness of the proposed algorithm, some statistical tests are
also performed.
Table 3: Comparison of proposed HMCSS algorithm to other meta-heuristics algorithms
Dataset Parameters K-Means GA PSO ACO HMCSS
ART1
Best 157.12 154.46 154.06 154.37 155.91
Avg. 161.12 158.87 158.24 158.52 156.83
Worst 166.08 164.08 161.83 162.52 158.46
FM 99.14 99.78 100 100 100
ART2
Best 743 741.71 740.29 739.81 736.63
Avg. 749.83 747.67 745.78 746.01 741.68
Worst 754.28 753.93 749.52 749.97 745.84
FM 98.94 99.17 99.26 99.19 99.57
Iris
Best 97.33 113.98 96.89 97.1 96.36
Avg. 106.05 125.19 97.23 97.17 96.45
Worst 120.45 139.77 97.89 97.8 96.93
FM 0.78 0.78 0.78 0.78 0.79
Wine
Best 16555.68 16530.53 16345.96 16530.53 16028.54
Avg. 18061 16530.53 16417.47 16530.53 16378.42
Worst 18563.12 16530.53 16562.31 16530.53 16618.36
FM 0.52 0.52 0.52 0.52 0.53
LD
Best 11397.83 532.48 209.15 224.76 204.81
Avg. 11673.12 543.69 224.47 235.16 221.26
Worst 12043.12 563.26 239.11 256.44 230.73
FM 0.47 0.48 0.49 0.49 0.49
Cancer
Best 2999.19 2999.32 2973.5 2970.49 2904.65
Avg. 3251.21 3249.46 3050.04 3046.06 2928.48
Worst 3521.59 3427.43 3318.88 3242.01 2963.41
FM 0.83 0.82 0.82 0.82 0.87
CMC
Best 5842.2 5705.63 5700.98 5701.92 5642.61
Avg. 5893.6 5756.59 5820.96 5819.13 5645.27
Worst 5934.43 5812.64 5923.24 5912.43 5712.58
FM 0.33 0.32 0.33 0.33 0.37
Thyroid
Best 13956.83 10176.29 10108.56 10085.82 9956.48
Avg. 14133.14 10218.82 10149.7 10108.13 10016.78
Worst 146424.21 10254.39 10172.86 10134.82 10096.74
FM 0.73 0.76 0.78 0.78 0.8
Glass
Best 215.74 278.37 270.57 269.72 210.08
Avg. 235.5 282.32 275.71 273.46 215.45
Worst 255.38 286.77 283.52 280.08 224.34
FM 0.43 0.33 0.36 0.36 0.46
Vowel
Best 149422.26 149513.73 148976.01 149395.6 142326.74
Avg. 159242.89 159153.49 151999.82 159458.14 144012.39
Worst 161236.81 165991.65 158121.18 165939.82 154286.97
FM 0.65 0.65 0.65 0.65 0.66
Table 4: Performance comparison of proposed HMCSS to other CSS variants
Dataset Parameters CSS MCSS MCSS-PSO HMCSS
ART1
Best 153.91 153.18 152.76 155.91
Avg. 158.29 158.02 157.24 156.83
Worst 161.32 159.26 158.92 158.46
FM 100 100 100 100
ART2
Best 738.96 737.85 738.19 736.63
Avg. 745.61 745.12 742.34 741.68
Worst 749.66 748.67 746.58 745.84
FM 99.43 99.56 99.56 99.57
Iris
Best 96.47 96.43 96.31 96.36
Avg. 96.63 96.57 96.48 96.45
Worst 96.78 96.63 96.65 96.93
FM 0.79 0.79 0.79 0.79
Wine
Best 16282.12 16158.56 16047.32 16028.54
Avg. 16289.42 16189.96 16093.18 16078.42
Worst 16317.67 16223.61 16128.33 16108.36
FM 0.53 0.54 0.54 0.53
LD
Best 207.09 206.14 203.71 204.81
Avg. 228.27 221.69 219.69 221.26
Worst 242.14 236.23 231.47 230.73
FM 0.49 0.5 0.5 0.49
Cancer
Best 2946.48 2932.43 2918.93 2904.65
Avg. 2961.16 2947.74 2931.56 2928.48
Worst 3006.14 2961.03 2978.29 2963.41
FM 0.85 0.86 0.86 0.87
CMC
Best 5672.46 5653.26 5638.64 5642.61
Avg. 5687.82 5678.83 5654.37 5645.27
Worst 5723.63 2947.74 5691.43 5712.58
FM 0.36 0.37 0.37 0.37
Thyroid
Best 9997.25 9963.89 9917.56 9956.48
Avg. 10078.23 10036.93 10023.08 10016.78
Worst 10116.52 10078.34 10104.26 10096.74
FM 0.79 0.79 0.79 0.8
Glass
Best 203.58 209.47 206.32 210.08
Average 223.44 231.61 217.61 215.45
Worst 241.27 263.44 226.44 224.34
FM 0.45 0.45 0.45 0.46
Vowel
Best 149335.61 146124.87 142948.63 142326.74
Avg 152128.19 149832.13 145640.78 144012.39
Worst 154537.08 157726.43 153568.24 154286.97
FM 0.65 0.65 0.65 0.66
Fig. 2: Clustering of ART1 dataset using proposed algorithm (X1 vs. Y1)
Fig. 3: Clustering of ART2 dataset using proposed algorithm (X1, Y1, Z1)
Fig. 4: Clustering of iris dataset using proposed approach in two dimensions (petal width vs. sepal width)
Fig. 5: Clustering of iris dataset using proposed approach in three dimensions (petal width, sepal width, petal length)
ART1 and ART2 data are arranged into three different clusters using the proposed algorithm, and the arrangement of the
data into clusters is shown in figs. 2 and 3. The ART1 dataset has two dimensions, whereas the ART2
dataset has three dimensions. Further, figs. 4 and 5 show the arrangement of the iris data into three different clusters.
Fig. 4 illustrates the data using the sepal width and petal width attributes, whereas fig. 5 shows the arrangement of the data in
three dimensions: petal length, sepal width and petal width. From these figures, it can also be concluded that the data belonging
to cluster 1 is linearly separable from the other two clusters, whereas the data belonging to clusters 2 and 3 is not linearly
separable.
4.4 Statistical analysis
This subsection deals with the statistical analysis of the performance of the proposed HMCSS algorithm. The
objective of the statistical analysis is to find significant differences between the performances of all algorithms. These
tests have been widely applied in the machine learning domain [41-43]. The statistical analysis consists of two steps. In the first
step, a statistical test is applied to check for substantial differences in the outcomes of the algorithms. In the second step, a
post-hoc test is applied to validate the best-performing algorithm: the best-performing algorithm acts as a
control algorithm and its performance is measured against the rest of the algorithms. In this work, the Friedman and Quade
tests are employed for the statistical analysis and, alongside these tests, Holm's method is selected as the post-hoc
test, with the level of confidence (α) set to 0.05.
Table 5: Average ranking of algorithm using Friedman test using intra cluster distance parameter
Algorithms K-Means GA PSO ACO CSS MCSS MCSS-PSO HMCSS
Ranking 7.5 6.85 5.4 5.95 4.2 3.1 1.9 1.1
Results of the Friedman test are illustrated in Tables 5 and 8. Table 5 summarizes the average ranking of each technique
computed using the Friedman test for the intra-cluster distance parameter. The statistical value of the Friedman test at the
confidence level 0.05 is 63.0834 and the corresponding p-value is 3.647e-11. The null hypothesis is clearly rejected,
indicating a significant difference in the performance of the algorithms. However, it is noticed that the Friedman test cannot
differentiate between the datasets used in the experimentation: it assigns equal importance to each dataset.
A variety of datasets is used in this work, and these can be categorized into low-dimensional, moderate-dimensional,
and high-dimensional datasets. For instance, the iris dataset is a low-dimensional dataset, whereas the wine dataset is a
high-dimensional dataset. So, an alternative approach that ranks the algorithms on the basis of the scaling of the
datasets is also used, as it provides more accurate results than the Friedman test.
This alternative to the Friedman test is the Quade test. Tables 6 to 8 illustrate the results of the Quade test using the intra-cluster
distance parameter. The relative size of the observations for each algorithm is shown in Table 6 and is used to
determine the relative significance of the datasets. Table 7 shows the average ranking of the algorithms using the Quade test.
The results of the Quade test are given in Table 8: the statistical value measured at the confidence level 0.05 is
29.7752 and the corresponding p-value is 1.306e-10. Again the null hypothesis is rejected at the confidence level 0.05.
Both methods reject the null hypothesis, so it can be concluded that substantial differences exist among the
performances of the algorithms being compared.
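The Friedman statistic reported in Tables 5 and 8 can be reproduced from a score matrix along these lines (a plain-Python sketch; ranks are assigned in ascending order because a smaller intra-cluster distance is better, and ties receive the average rank):

```python
def friedman_statistic(scores):
    """Friedman chi-square statistic for k algorithms over N datasets.

    scores[d][a] is the score of algorithm a on dataset d (lower is
    better). Returns the statistic and the average ranks."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda a: row[a])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            # extend the tie group of equal scores
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average rank for the tie group
            for t in range(i, j + 1):
                ranks[order[t]] = avg
            i = j + 1
        for a in range(k):
            rank_sums[a] += ranks[a]
    avg_ranks = [r / n for r in rank_sums]
    # chi^2 = 12N / (k(k+1)) * sum(R_j^2) - 3N(k+1), with mean ranks R_j
    chi2 = 12 * n / (k * (k + 1)) * sum(r * r for r in avg_ranks) - 3 * n * (k + 1)
    return chi2, avg_ranks
```
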
Table 6: Relative size of each dataset for Quade test using intra cluster distance parameter
Dataset K-Means GA PSO ACO CSS MCSS MCSS-PSO HMCSS
Iris 12.5 17.5 7.5 2.5 -2.5 -7.5 -12.5 -17.5
Wine 3.5 2 0.5 2 -0.5 -1.5 -2.5 -3.5
LD 17.5 12.5 -2.5 7.5 2.5 -7.5 -17.5 -12.5
Cancer 17.5 12.5 7.5 2.5 -2.5 -7.5 -12.5 -17.5
CMC 17.5 2.5 12.5 7.5 -2.5 -7.5 -12.5 -17.5
Thyroid 17.5 12.5 7.5 2.5 -2.5 -7.5 -12.5 -17.5
Glass 2.5 17.5 12.5 7.5 -7.5 -2.5 -12.5 -17.5
Vowel 12.5 7.5 -2.5 17.5 2.5 -7.5 -12.5 -17.5
Relative size of datasets 101 84.5 43 49.5 -13 -49 -95 -121
Holm's post-hoc test is performed after the rejection of the null hypothesis. This test illustrates the significant
differences between the best algorithm and the rest: the best algorithm acts as the benchmark (control)
algorithm and its performance is measured against the other algorithms. Here, the HMCSS algorithm acts as the control
algorithm, and the results of the post-hoc test for both the Friedman and Quade tests are summarized in Tables 9 to 10.
From these, it can be observed that the post-hoc test rejects the hypothesis at the confidence levels of 0.1 and 0.05 for all
algorithms except MCSS-PSO, so there is a significant difference in the performance of the algorithms. The null
hypothesis is not rejected for the MCSS-PSO algorithm, meaning the difference between HMCSS and MCSS-PSO is not
statistically significant, although HMCSS still obtains better results.
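The Holm step-down rule used in these tables can be sketched as follows (the mapping from algorithm names to p-values is illustrative):

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure. p_values maps each algorithm name to
    the p-value of its comparison against the control algorithm. The
    i-th smallest p-value (i = 0, ..., m-1) is compared against
    alpha / (m - i); at the first non-rejection the procedure stops
    and all remaining hypotheses are retained."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = set()
    for i, (name, p) in enumerate(ordered):
        if p <= alpha / (m - i):
            rejected.add(name)
        else:
            break  # step-down: stop at the first failure
    return rejected
```
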
Table 7: Ranking of each algorithm computed through Quade test using intra cluster distance parameter
Algorithm K-Means GA PSO ACO CSS MCSS MCSS-PSO HMCSS
Avg. Ranking 7.38 6.81 5.63 5.94 4.13 3.13 1.88 1.13
Both tests are also applied to the FM parameter results. The results of these tests are shown in Tables 11 to 14. Table 11 shows
the average ranking of each algorithm computed using the Friedman test for the f-measure parameter. It is noticed that the HMCSS
algorithm obtains the 1st rank while GA gets the worst rank among all algorithms.
Table 8: Statistics obtained using Friedman and Quade tests for intra cluster distance parameter.
Method Statistical Value p-value Hypothesis
Friedman 63.0834 3.647e-11 Rejected
Quade 29.7752 1.306e-10 Rejected
The statistical value obtained for the Friedman test at the confidence level 0.05 is 45.3741; the statistics of the Friedman test
are summarized in Table 14, which rejects the null hypothesis. Tables 12 to 14 contain the outcomes of the Quade test.
Table 12 summarizes the relative size of each dataset for each algorithm, whereas the ranking of each algorithm using the
Quade test is illustrated in Table 13. The statistics of the Quade test are shown in Table 14; the statistical value at the
confidence level 0.05 is 14.2635. Thus the null hypothesis is rejected at the level 0.05.
Table 9: Results of Holm’s Post-hoc test for Friedman Test using intra cluster distance parameter
i Algorithms z values p values α/i, α=0.05 Hypothesis α/i, α=0.1 Hypothesis
7 K-Means 5.3127 < 0.00001 0.00714 Rejected 0.01429 Rejected
6 GA 4.8643 1.4636E-06 0.00833 Rejected 0.01667 Rejected
4 PSO 3.9286 5.8954E-05 0.01250 Rejected 0.02500 Rejected
5 ACO 3.6352 4.6324E-05 0.01000 Rejected 0.02000 Rejected
3 CSS 2.3296 0.02180 0.01666 Rejected 0.03333 Rejected
2 MCSS 1.8329 0.01928 0.02500 Rejected 0.05000 Rejected
1 MCSS-PSO 0.9134 0.17612 0.05000 Not Rejected 0.10000 Not Rejected
Table 10: Results of Holm’s Post-hoc test for Quade Test using intra cluster distance parameter
i Algorithms z values p values α/i, α=0.05 Hypothesis α/i, α=0.1 Hypothesis
7 K-Means 5.5783 < 0.00001 0.00714 Rejected 0.01429 Rejected
6 GA 5.1262 4.6315E-07 0.00833 Rejected 0.01667 Rejected
4 PSO 4.5873 3.6648E-06 0.01250 Rejected 0.02500 Rejected
5 ACO 3.7219 1.7326E-04 0.01000 Rejected 0.02000 Rejected
3 CSS 2.5493 0.00836 0.01666 Rejected 0.03333 Rejected
2 MCSS 2.1357 0.02618 0.02500 Rejected 0.05000 Rejected
1 MCSS-PSO 0.9126 0.1735 0.05000 Not Rejected 0.10000 Not Rejected
Table 11: Average ranking of algorithm using Friedman tests for FM parameter
Algorithms K-Means GA PSO ACO CSS MCSS MCSS-PSO HMCSS
Ranking 6.25 6.88 6 6 3.75 2.75 2.75 1.63
Table 12: Relative size of each dataset computed through Quade test for FM parameter
Dataset K-Means GA PSO ACO CSS MCSS MCSS-PSO HMCSS
Iris 3 3 3 3 -3 -3 -3 -3
Wine 6 6 6 6 -3 -9 -9 -3
LD 24.5 17.5 3.5 3.5 3.5 -14 -14 -24.5
Cancer 2.25 11.25 11.25 11.25 -2.25 -9 -9 -15.75
CMC 6.75 15.75 6.75 6.75 -2.25 -11.25 -11.25 -11.25
Thyroid 24.5 17.5 7 7 -10.5 -10.5 -10.5 -24.5
Glass 3.5 24.5 14 14 -10.5 -10.5 -10.5 -24.5
Vowel 0.75 0.75 0.75 0.75 0.75 0.75 0.75 -5.25
Relative size of datasets 71.25 96.25 52.25 52.25 -27.25 -66.5 -66.5 -111.75
Table 13: Ranking of each algorithm determined through Quade test using FM parameter
Algorithm K-Means GA PSO ACO CSS MCSS MCSS-PSO HMCSS
Avg. Ranking 6.25 6.88 6 6 3.75 2.75 2.75 1.63
Table 14: Statistics of Friedman and Quade tests using FM parameter
Method Statistical Value p-value Hypothesis
Friedman 45.3741 1.157e-7 Rejected
Quade 14.2635 7.489e-10 Rejected
Again, Holm's post-hoc test is used to prove a substantial difference between the best (control) algorithm and the other
algorithms. The results of the post-hoc method are described in Tables 15 to 16 using the intra-cluster distance and FM parameters
at the confidence levels of 0.05 and 0.1. From Table 15, it can be seen that the hypothesis is rejected for most
algorithms except MCSS-PSO at the confidence level 0.05, while at the confidence level 0.1 the hypothesis is rejected for
all algorithms except MCSS-PSO, which signifies that the performance of the control algorithm (HMCSS) differs considerably
from the others. In the case of MCSS-PSO, at both α = 0.05 and α = 0.1, the performance of
the proposed method is still better than that of the other algorithms, as reported in Table 4. The results
of Holm's post-hoc test for the Quade test are shown in Table 16. From this it can be concluded that the proposed
algorithm outperforms the others at the significance level 0.1, rejecting all hypotheses. At the significance level 0.05, the
proposed algorithm rejects the hypothesis for all algorithms except MCSS-PSO. It can be concluded that the performance of the proposed
algorithm is significantly different from the others and statistical analysis study also proves the substantial significance
of the proposed algorithm.
Table 15: Results of Holm’s Post-hoc test after rejection of null hypothesis for Friedman Test using FM.
i Algorithms z values p values α/i, α=0.05 Hypothesis α/i, α=0.1 Hypothesis
7 K-Means 3.5263 1.3265E-04 0.00714 Rejected 0.01429 Rejected
6 GA 5.6193 < 0.00001 0.00833 Rejected 0.01667 Rejected
5 ACO 4.3365 1.4158E-05 0.01000 Rejected 0.02000 Rejected
4 PSO 4.1832 2.3283E-05 0.01250 Rejected 0.02500 Rejected
3 CSS 2.7563 0.00836 0.01666 Rejected 0.03333 Rejected
2 MCSS 1.5749 0.09412 0.02500 Rejected 0.05000 Rejected
1 MCSS-PSO 1.0503 0.17367 0.05000 Not Rejected 0.10000 Not Rejected
Table 16: Results of Holm’s Post-hoc test after rejection of null hypothesis for Quade Test using FM.
i Algorithms z values p values α/i, α=0.05 Hypothesis α/i, α=0.1 Hypothesis
7 GA 5.8371 < 0.00001 0.00714 Rejected 0.01429 Rejected
6 ACO 4.9637 1.8632E-06 0.00833 Rejected 0.01667 Rejected
5 PSO 4.6374 6.8584E-06 0.01000 Rejected 0.02000 Rejected
4 K-Means 4.5453 1.3196E-05 0.01250 Rejected 0.02500 Rejected
3 CSS 2.8425 0.00574 0.01667 Rejected 0.03333 Rejected
2 MCSS 1.6256 0.09256 0.02500 Rejected 0.05000 Rejected
1 MCSS-PSO 1.4238 0.07635 0.05000 Not Rejected 0.10000 Rejected
5.0 CONCLUSION
In this work, a hybrid magnetic charge system search (HMCSS) algorithm is proposed for efficient data clustering. In the proposed
algorithm, a modified solution search equation based on the DE algorithm is suggested to enhance the exploitation ability of the
MCSS algorithm. Moreover, a local search strategy is incorporated into the proposed
algorithm to explore more promising solutions and to handle boundary constraints. The designed concept and
detailed procedure are presented with the help of an illustrative example. The proposed algorithm is also capable of
overcoming the local optima problem and explores more promising solution directions. The capability of the proposed
algorithm is tested on clustering problems. The experimental results demonstrate the better performance of the
proposed algorithm in comparison to the other clustering algorithms being compared. In addition, some statistical tests
are performed to prove the efficacy of the proposed algorithm. Finally, it can be concluded that the proposed
HMCSS is a more efficient and robust algorithm for clustering problems.
REFERENCES
[1] M.R. Anderberg, Cluster Analysis for Application, New York, Academic Press, 1973.
[2] J.A. Hartigan, Clustering Algorithms, New York, Wiley, 1975.
[3] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. New Jersey, Prentice-Hall, 1988.
[4] S. Milan, V. Hlavac, and R. Boyle. Image processing, analysis, and machine vision, Champion and Hall, 1998, pp
2-6.
[5] P. Teppola, S. P. Mujunen, and P. Minkkinen, “Adaptive Fuzzy C-Means clustering in process monitoring”,
Chemometrics and intelligent laboratory systems, Vol. 45, No. 1, 1999, pp. 23-38.
[6] A. Webb, Statistical pattern recognition, New Jersey, John Wiley & Sons, 2002, pp. 361–406.
[7] H. Zhou, and L. Yonghuai, “Accurate integration of multi-view range images using k-means clustering”, Pattern
Recognition, Vol. 41, No. 1, 2008, pp. 152-175.
[8] E. Alpaydin, Introduction to machine learning, MIT press, 2004.
[9] WJ. Dunn III, M.J. Greenberg, and S. C. Soledad, “Use of cluster analysis in the development of structure-activity
relations for antitumor triazenes”, Journal of medicinal chemistry, Vol. 19, No. 11, 1976, pp. 1299-1301.
[10] Y. He, W. Pan, and L. Jizhen, “Cluster analysis using multivariate normal mixture models to detect differential
gene expression with microarray data”, Computational statistics & data analysis, Vol. 51, No. 2, 2006, pp. 641-658.
[11] R. Xu and D.C. Wunsch, Clustering, Oxford, Wiley, 2009.
[12] S. Das, A. Abraham, A. Konar, Meta-heuristic Clustering, Springer, 2009, ISBN 3540921729.
[13] E.R. Hruschka, R.J.G.B. Campello, A.A. Freitas and A.C.P.L.F. De Carvalho, “A survey of evolutionary
algorithms for clustering”, IEEE Transactions on Systems, Man and Cybernetics., Vol. 39, No. 2, 2009, pp. 133–
155.
[14] A. Abraham, S. Das, and S. Roy, “Swarm intelligence algorithms for data clustering”, Soft Computing for
Knowledge Discovery and Data Mining, Springer, 2007, pp. 279–313.
[15] S. Basu, I. Davidson, K. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory and Applications,
Data Mining and Knowledge Discovery, Chapman and Hall/CRC, 2008.
[16] A.K. Jain, “Data clustering: 50 years beyond K-means”, Pattern Recognition Letters, Vol. 31, No. 8, 2010, pp 651-
666.
[17] J. MacQueen, “Some methods for classification and analysis of multivariate observations”, in Proceedings of the
fifth Berkeley symposium on mathematical statistics and probability, Vol. 1, 1967, pp 281-297.
[18] C. A. Murthy and N. Chowdhury, “In search of optimal clusters using genetic algorithms”, Pattern Recognition
Letters, Vol. 17, No. 8, 1996, pp 825-832.
[19] R.J. Kuo and L.M. Lin, “Application of a hybrid of genetic algorithm and particle swarm optimization algorithm
for order clustering”, Decision Support Systems, Vol. 49, 2010, pp 451–462.
[20] C. Y. Chen and F. Ye, “Particle swarm optimization algorithm and its application to clustering analysis”, In IEEE
international conference on in Networking, Sensing and Control, Vol. 2, 2004, pp. 789-794.
[21] T. Niknam and B. Amiri, “An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis”,
Applied Soft Computing, Vol. 10, No. 1, 2010, pp 183–197.
[22] P. S. Shelokar, V. K. Jayaraman, and B. D. Kulkarni, “An ant colony approach for clustering”, Analytica Chimica
Acta, Vol. 509, No. 2, 2004, pp 187-195.
[23] A. N. Sinha, N. Das, and G. Sahoo, “Ant colony based hybrid optimization for data clustering”, Kybernetes, Vol.
36, No. 2, 2007, pp.175 – 191.
[24] D. Karaboga and C. Ozturk, “A novel clustering approach: Artificial Bee Colony (ABC) algorithm”, Applied Soft
Computing, Vol. 11, No. 1, pp 652–657.
[25] X. Yan, Y. Zhu, W. Zou and L. Wang, “A new approach for data clustering using hybrid artificial bee colony
algorithm”, Neurocomputing, Vol. 97, 2012, pp 241–250.
[26] S. C. Satapathy and A. Naik, “Data clustering based on teaching-learning-based optimization”, In Swarm,
Evolutionary, and Memetic Computing, Cochin, India, 2011, pp. 148-156.
[27] A. J. Sahoo, and Y. Kumar, “Modified Teacher Learning Based Optimization Method for Data Clustering”, In
Advances in Signal Processing and Intelligent Recognition Systems, Kerla, India, 2014, pp. 429-437.
[28] Y. Kumar and G. Sahoo, “A charged system search approach for data clustering”, Progress in Artificial
Intelligence, Vol. 2, No. 2-3, 2014, pp 153-166.
[29] Y. Kumar and G. Sahoo, “A chaotic charged system search approach for data clustering”, Informatica, Vol. 38,
No. 3, 2014, pp 249-261.
[30] S. Z. Selim and K. Alsultan, “A simulated annealing algorithm for the clustering problem”, Pattern
recognition, Vol. 24, No. 10, 1991, pp 1003-1008.
[31] U. Maulik and A. Mukhopadhyay, “Simulated annealing based automatic fuzzy clustering combined with ANN
classification for analyzing microarray data”, Computers & Operations Research, Vol. 37, No. 8, 2010, pp 1369-
1380.
[32] C. S. Sung and H. W. Jin, “A tabu-search-based heuristic for clustering”, Pattern Recognition, Vol. 33, No. 5, 2000,
pp 849-858.
[33] Y. Kumar and G. Sahoo, “A Hybrid Data Clustering Approach Based on improved Cat Swarm Optimization and
K- Harmonic Mean Algorithm”, AI communications, Vol. 28, No. 4, 2015, pp, 751-764.
[34] Y. Kumar and G. Sahoo, “A Hybridize Approach for Data Clustering Based on Cat Swarm Optimization”,
International Journal of Information and Communication Technology, Vol. 9, No. 1, 2016, pp. 117-141.
[35] S. Budi, and M. K. Ningrum, “Cat swarm optimization for clustering”, In International Conference on Soft
Computing and Pattern Recognition (SOCPAR'09), 2009, pp. 54-59.
[36] Y. Kumar and G. Sahoo, “An improved Cat Swarm Optimization Algorithm for Data Clustering”, In International
Conference on Computational Intelligence in Data Mining, Odisha, India, Vol. 1, 2014, pp. 187-197.
[37] Y. Kumar, S. Gupta, and G. Sahoo, “A Clustering Approach Based on Charged Particle”, International Journal of
Software Engineering and Its Applications, Vol. 10, No. 3, 2016, pp. 9-28.
[38] Y. Kumar and G. Sahoo, “Hybridization of magnetic charge system search and particle swarm optimization for
efficient data clustering using neighborhood search strategy”, Soft Computing, Vol. 19, No. 12, 2015, pp. 3621-
3645.
[39] A. Kaveh, M. A. M. Share, and M. Moslehi, “Magnetic charged system search: a new meta-heuristic algorithm
for optimization”, Acta Mechanica, Vol. 224, No. 1, 2013, pp 85-107.
[40] R. Storn and K. Price, “Differential evolution – a simple and efficient heuristic for global optimization over
continuous spaces”, Journal of Global Optimization, Vol. 11, No. 4, 1997, pp. 341–359.
[41] M. Moohebat, R.G. Raj, D. Thorleuchter and S. Abdul-Kareem. “Linguistic Feature Classifying and Tracing”.
Malaysian Journal of Computer Science, Vol. 30, No.2, 2017, pp 77-90.
[42] R.G. Raj and S. Abdul-Kareem, “Information Dissemination And Storage For Tele-Text Based Conversational
Systems' Learning”, Malaysian Journal of Computer Science, Vol. 22 No. 2, 2009. pp. 138-159.
[43] A. Qazi, R. G. Raj, M. Tahir, M. Waheed, S. U. R. Khan, and A. Abraham, “A Preliminary Investigation of User
Perception and Behavioral Intention for Different Review Types: Customers and Designers Perspective,” The
Scientific World Journal, Vol. 2014, Article ID 872929, 8 pages, 2014. doi:10.1155/2014/872929.