This document presents a new automatic clustering algorithm called Automatic Merging for Optimal Clusters (AMOC). AMOC is a two-phase iterative extension of k-means clustering that aims to automatically determine the optimal number of clusters for a given dataset. In the first phase, AMOC initializes a large number of clusters k using k-means. In the second phase, it iteratively merges the lowest probability cluster with its closest neighbor, recomputing metrics each time to evaluate if the merge improved clustering quality. The algorithm stops merging once no improvements are found. Experimental results on synthetic and real datasets show AMOC finds nearly optimal cluster structures in terms of number, compactness and separation of clusters.
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A HYBRID K-HARMONIC MEANS WITH ABCCLUSTERING ALGORITHM USING AN OPTIMAL K VAL...IJCI JOURNAL
Large quantities of data are emerging every year and an accurate clustering algorithm is needed to derive
information from these data. K-means clustering algorithm is popular and simple, but has many limitations
like its sensitivity to initialization, provides local optimum solutions. K-harmonic means clustering is an
improved variant of K-means which is insensitive to the initialization of centroids, but still in some cases it
ends up with local optimum solutions. Clustering using Artificial Bee Colony (ABC) algorithm always gives
global optimum solutions. In this paper a new hybrid clustering algorithm (KHM-ABC) is presented by
combining both K-harmonic means and ABC algorithm to perform accurate clustering. Experimental
results indicate that the performance of the proposed algorithm is superior to the available algorithms in
terms of the quality of clusters.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
Data clustering is a process of arranging similar data into groups. A clustering algorithm
partitions a data set into several groups such that the similarity within a group is better than
among groups. In this paper a hybrid clustering algorithm based on K-mean and K-harmonic
mean (KHM) is described. The proposed algorithm is tested on five different datasets. The research is focused on fast and accurate clustering. Its performance is compared with the traditional K-means & KHM algorithm. The result obtained from proposed hybrid algorithm is much better than the traditional K-mean & KHM algorithm
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
The k- Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease
and simplicity of implementation. Clustering algorithm has a broad attraction and usefulness in
exploratory data analysis. This paper presents results of the experimental study of different approaches to
k- Means clustering, thereby comparing results on different datasets using Original k-Means and other
modified algorithms implemented using MATLAB R2009b. The results are calculated on some performance
measures such as no. of iterations, no. of points misclassified, accuracy, Silhouette validity index and
execution time
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
A HYBRID K-HARMONIC MEANS WITH ABCCLUSTERING ALGORITHM USING AN OPTIMAL K VAL...IJCI JOURNAL
Large quantities of data are emerging every year and an accurate clustering algorithm is needed to derive
information from these data. K-means clustering algorithm is popular and simple, but has many limitations
like its sensitivity to initialization, provides local optimum solutions. K-harmonic means clustering is an
improved variant of K-means which is insensitive to the initialization of centroids, but still in some cases it
ends up with local optimum solutions. Clustering using Artificial Bee Colony (ABC) algorithm always gives
global optimum solutions. In this paper a new hybrid clustering algorithm (KHM-ABC) is presented by
combining both K-harmonic means and ABC algorithm to perform accurate clustering. Experimental
results indicate that the performance of the proposed algorithm is superior to the available algorithms in
terms of the quality of clusters.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
Data clustering is a process of arranging similar data into groups. A clustering algorithm
partitions a data set into several groups such that the similarity within a group is better than
among groups. In this paper a hybrid clustering algorithm based on K-mean and K-harmonic
mean (KHM) is described. The proposed algorithm is tested on five different datasets. The research is focused on fast and accurate clustering. Its performance is compared with the traditional K-means & KHM algorithm. The result obtained from proposed hybrid algorithm is much better than the traditional K-mean & KHM algorithm
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
The k- Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease
and simplicity of implementation. Clustering algorithm has a broad attraction and usefulness in
exploratory data analysis. This paper presents results of the experimental study of different approaches to
k- Means clustering, thereby comparing results on different datasets using Original k-Means and other
modified algorithms implemented using MATLAB R2009b. The results are calculated on some performance
measures such as no. of iterations, no. of points misclassified, accuracy, Silhouette validity index and
execution time
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Proposing a New Job Scheduling Algorithm in Grid Environment Using a Combinat...Editor IJCATR
Scheduling jobs to resources in grid computing is complicated due to the distributed and heterogeneous nature of the resources.
The purpose of job scheduling in grid environment is to achieve high system throughput and minimize the execution time of applications.
The complexity of scheduling problem increases with the size of the grid and becomes highly difficult to solve effectively.
To obtain a good and efficient method to solve scheduling problems in grid, a new area of research is implemented. In this paper, a job
scheduling algorithm is proposed to assign jobs to available resources in grid environment. The proposed algorithm is based on Ant
Colony Optimization (ACO) algorithm. This algorithm is combined with one of the best scheduling algorithm, Suffrage. This paper uses
the result of Suffrage in proposed ACO algorithm. The main contribution of this work is to minimize the makespan of a given set of
jobs. The experimental results show that the proposed algorithm can lead to significant performance in grid environment.
BINARY SINE COSINE ALGORITHMS FOR FEATURE SELECTION FROM MEDICAL DATAacijjournal
A well-constructed classification model highly depends on input feature subsets from a dataset, which may contain redundant, irrelevant, or noisy features. This challenge can be worse while dealing with medical datasets. The main aim of feature selection as a pre-processing task is to eliminate these features and select the most effective ones. In the literature, metaheuristic algorithms show a successful performance to find optimal feature subsets. In this paper, two binary metaheuristic algorithms named S-shaped binary Sine Cosine Algorithm (SBSCA) and V-shaped binary Sine Cosine Algorithm (VBSCA) are proposed for feature selection from the medical data. In these algorithms, the search space remains continuous, while a binary position vector is generated by two transfer functions S-shaped and V-shaped for each solution. The proposed algorithms are compared with four latest binary optimization algorithms over five medical datasets from the UCI repository. The experimental results confirm that using both bSCA variants enhance the accuracy of classification on these medical datasets compared to four other algorithms.
Proposing a New Job Scheduling Algorithm in Grid Environment Using a Combinat...Editor IJCATR
Grid computing is a hardware and software infrastructure and provides affordable, sustainable, and reliable access. Its aim is
to create a supercomputer using free resources. One of the challenges to the Grid computing is scheduling problem which is regarded
as a tough issue. Since scheduling problem is a non-deterministic issue in the Grid, deterministic algorithms cannot be used to improve
scheduling. In this paper, a combination of imperialist competition algorithm (ICA) and gravitational attraction is used for to address the
problem of independent task scheduling in a grid environment, with the aim of reducing the makespan and energy. Experimental results
compare ICA with other algorithms and illustrate that ICA finds a shorter makespan and energy relative to the others. Moreover, it
converges quickly, finding its optimum solution in less time than the other algorithms.
Feature selection in high-dimensional datasets is
considered to be a complex and time-consuming problem. To
enhance the accuracy of classification and reduce the execution
time, Parallel Evolutionary Algorithms (PEAs) can be used. In
this paper, we make a review for the most recent works which
handle the use of PEAs for feature selection in large datasets.
We have classified the algorithms in these papers into four main
classes (Genetic Algorithms (GA), Particle Swarm Optimization
(PSO), Scattered Search (SS), and Ant Colony Optimization
(ACO)). The accuracy is adopted as a measure to compare the
efficiency of these PEAs. It is noticeable that the Parallel Genetic
Algorithms (PGAs) are the most suitable algorithms for feature
selection in large datasets; since they achieve the highest accuracy.
On the other hand, we found that the Parallel ACO is timeconsuming
and less accurate comparing with other PEA.
A Mixed Discrete-Continuous Attribute List Representation for Large Scale Cla...jaumebp
This work assesses the performance of the BioHEL data mining method to handle large-scale datasets, and proposes a representation to deal efficiently with domains with mixed discrete-continuous attributes
THE ACTIVE CONTROLLER DESIGN FOR ACHIEVING GENERALIZED PROJECTIVE SYNCHRONIZA...ijait
This paper discusses the design of active controllers for achieving generalized projective synchronization (GPS) of identical hyperchaotic Lü systems (Chen, Lu, Lü and Yu, 2006), identical hyperchaotic Cai systems (Wang and Cai, 2009) and non-identical hyperchaotic Lü and hyperchaotic Cai systems. The synchronization results (GPS) for the hyperchaotic systems have been derived using active control method and established using Lyapunov stability theory. Since the Lyapunov exponents are not required for these calculations, the active control method is very effective and convenient for achieving the GPS of the
hyperchaotic systems addressed in this paper. Numerical simulations are provided to illustrate the effectiveness of the GPS synchronization results derived in this paper.
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
Clustering is one of the data mining techniques that have been around to discover business intelligence by grouping objects into clusters using a similarity measure. Clustering is an unsupervised learning process that has many utilities in real time applications in the fields of marketing, biology, libraries, insurance, city-planning, earthquake studies and document clustering. Latent trends and relationships among data objects can be unearthed using clustering algorithms. Many clustering algorithms came into existence. However, the quality of clusters has to be given paramount importance. The quality objective is to achieve
highest similarity between objects of same cluster and lowest similarity between objects of different clusters. In this context, we studied two widely used clustering algorithms such as the K-Means and Fuzzy K-Means. K-Means is an exclusive clustering algorithm while the Fuzzy K-Means is an overlapping clustering algorithm. In this paper we prove the hypothesis “Fuzzy K-Means is better than K-Means for Clustering” through both literature and empirical study. We built a prototype application to demonstrate the differences between the two clustering algorithms. The experiments are made on diabetes dataset
obtained from the UCI repository. The empirical results reveal that the performance of Fuzzy K-Means is better than that of K-means in terms of quality or accuracy of clusters. Thus, our empirical study proved the hypothesis “Fuzzy K-Means is better than K-Means for Clustering”.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization stratagem. This evolutionary technique always aims to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
Feature selection using modified particle swarm optimisation for face recogni...eSAT Journals
Abstract
One of the major influential factors which affects the accuracy of classification rate is the selection of right features. Not all features have vital role in classification. Many of the features in the dataset may be redundant and irrelevant, which increase the computational cost and may reduce classification rate. In this paper, we used DCT(Discrete cosine transform) coefficients as features for face recognition application. The coefficients are optimally selected based on a modified PSO algorithm. In this, the choice of coefficients is done by incorporating the average of the mean normalized standard deviations of various classes and giving more weightage to the lower indexed DCT coefficients. The algorithm is tested on ORL database. A recognition rate of 97% is obtained. Average number of features selected is about 40 percent for a 10 × 10 input. The modified PSO took about 50 iterations for convergence. These performance figures are found to be better than some of the work reported in literature.
Keywords: Particle swarm optimization, Discrete cosine transform, feature extraction, feature selection, face recognition, classification rate.
Proposing a Scheduling Algorithm to Balance the Time and Energy Using an Impe...Editor IJCATR
Computational grids have become an appealing research area as they solve compute-intensive problems within the scientific
community and in industry. A grid computational power is aggregated from a huge set of distributed heterogeneous workers; hence, it
is becoming a mainstream technology for large-scale distributed resource sharing and system integration. Unfortunately, current grid
schedulers suffer from the haste problem, which is the schedule inability to successfully allocate all input tasks. Accordingly, some tasks
fail to complete execution as they are allocated to unsuitable workers. Others may not start execution as suitable workers are previously
allocated to other peers. This paper presents an imperialist competition algorithm (ICA) method to solve the grid scheduling problems.
The objective is to minimize the makespan and energy of the grid. Simulation results show that the grid scheduling problem can be
solved efficiently by the proposed method
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesCSCJournals
Large-scale analysis of genome sequences is in progress around the world, the major application of which is to establish the evolutionary relationship among the species using phylogenetic trees. Hierarchical agglomerative algorithms can be used to generate such phylogenetic trees given the distance matrix representing the dissimilarity among the species. ClustalW and Muscle are two general purpose programs that generates distance matrix from the input DNA or protein sequences. The limitation of these programs is that they are based on Smith-Waterman algorithm which uses dynamic programming for doing the pair-wise alignment. This is an extremely time consuming process and the existing systems may even fail to work for larger input data set. To overcome this limitation, we have used the frequency of codons usage as an approximation to find dissimilarity among species. The proposed technique further reduces the complexity by extracting only the significant features of the species from the mtDNA sequences using the techniques like frequent codons, codons with maximum range value or PCA technique. We have observed that the proposed system produces nearly accurate results in a significantly reduced running time.
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...IJRES Journal
Data clustering is a common technique for statistical data analysis; it is defined as a class of
statistical techniques for classifying a set of observations into completely different groups. Cluster analysis
seeks to minimize group variance and maximize between group variance. In this study we formulate a
mathematical programming model that chooses the most important variables in cluster analysis. A nonlinear
binary model is suggested to select the most important variables in clustering a set of data. The idea of the
suggested model depends on clustering data by minimizing the distance between observations within groups.
Indicator variables are used to select the most important variables in the cluster analysis.
COMPARING THE CUCKOO ALGORITHM WITH OTHER ALGORITHMS FOR ESTIMATING TWO GLSD ...csandit
This study introduces and compares different methods for estimating the two parameters of
generalized logarithmic series distribution. These methods are the cuckoo search optimization,
maximum likelihood estimation, and method of moments algorithms. All the required
derivations and basic steps of each algorithm are explained. The applications for these
algorithms are implemented through simulations using different sample sizes (n = 15, 25, 50,
100). Results are compared using the statistical measure mean square error.
Feature selection is an essential issue in machine learning. It discards the unnecessary or redundant features in the dataset. This paper introduced the new feature selection based on kernel function using 16 the real-world datasets from UCI data repository, and k-means clustering was utilized as the classifier using radial basis function (RBF) and polynomial kernel function. After sorting the features using the new feature selection, 75 percent of it was examined and evaluated using 10-fold cross-validation, then the accuracy, F1-Score, and running time were compared. From the experiments, it was concluded that the performance of the new feature selection based on RBF kernel function varied according to the value of the kernel parameter, opposite with the polynomial kernel function. Moreover, the new feature selection based on RBF has a faster running time compared to the polynomial kernel function. Besides, the proposed method has higher accuracy and F1-Score until 40 percent difference in several datasets compared to the commonly used feature selection techniques such as Fisher score, Chi-Square test, and Laplacian score. Therefore, this method can be considered to use for feature selection
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A Genetic Algorithm on Optimization Test FunctionsIJMERJOURNAL
ABSTRACT: Genetic Algorithms (GAs) have become increasingly useful over the years for solving combinatorial problems. Though they are generally accepted to be good performers among metaheuristic algorithms, most works have concentrated on the application of the GAs rather than the theoretical justifications. In this paper, we examine and justify the suitability of Genetic Algorithms in solving complex, multi-variable and multi-modal optimization problems. To achieve this, a simple Genetic Algorithm was used to solve four standard complicated optimization test functions, namely Rosenbrock, Schwefel, Rastrigin and Shubert functions. These functions are benchmarks to test the quality of an optimization procedure towards a global optimum. We show that the method has a quicker convergence to the global optima and that the optimal values for the Rosenbrock, Rastrigin, Schwefel and Shubert functions are zero (0), zero (0), -418.9829 and -14.5080 respectively
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
Feature selection is one of the most common and critical tasks in database classification. It
reduces the computational cost by removing insignificant and unwanted features. Consequently, this
makes the diagnosis process accurate and comprehensible. This paper presents the measurement of
feature relevance based on fuzzy entropy, tested with Radial Basis Classifier (RBF) network,
Bagging(Bootstrap Aggregating), Boosting and stacking for various fields of datasets. Twenty
benchmarked datasets which are available in UCI Machine Learning Repository and KDD have been
used for this work. The accuracy obtained from these classification process shows that the proposed
method is capable of producing good and accurate results with fewer features than the original
datasets.
Proposing a New Job Scheduling Algorithm in Grid Environment Using a Combinat...Editor IJCATR
Scheduling jobs to resources in grid computing is complicated due to the distributed and heterogeneous nature of the resources.
The purpose of job scheduling in grid environment is to achieve high system throughput and minimize the execution time of applications.
The complexity of scheduling problem increases with the size of the grid and becomes highly difficult to solve effectively.
To obtain a good and efficient method to solve scheduling problems in grid, a new area of research is implemented. In this paper, a job
scheduling algorithm is proposed to assign jobs to available resources in grid environment. The proposed algorithm is based on Ant
Colony Optimization (ACO) algorithm. This algorithm is combined with one of the best scheduling algorithm, Suffrage. This paper uses
the result of Suffrage in proposed ACO algorithm. The main contribution of this work is to minimize the makespan of a given set of
jobs. The experimental results show that the proposed algorithm can lead to significant performance in grid environment.
BINARY SINE COSINE ALGORITHMS FOR FEATURE SELECTION FROM MEDICAL DATAacijjournal
A well-constructed classification model highly depends on input feature subsets from a dataset, which may contain redundant, irrelevant, or noisy features. This challenge can be worse while dealing with medical datasets. The main aim of feature selection as a pre-processing task is to eliminate these features and select the most effective ones. In the literature, metaheuristic algorithms show a successful performance to find optimal feature subsets. In this paper, two binary metaheuristic algorithms named S-shaped binary Sine Cosine Algorithm (SBSCA) and V-shaped binary Sine Cosine Algorithm (VBSCA) are proposed for feature selection from the medical data. In these algorithms, the search space remains continuous, while a binary position vector is generated by two transfer functions S-shaped and V-shaped for each solution. The proposed algorithms are compared with four latest binary optimization algorithms over five medical datasets from the UCI repository. The experimental results confirm that using both bSCA variants enhance the accuracy of classification on these medical datasets compared to four other algorithms.
Proposing a New Job Scheduling Algorithm in Grid Environment Using a Combinat...Editor IJCATR
Grid computing is a hardware and software infrastructure and provides affordable, sustainable, and reliable access. Its aim is
to create a supercomputer using free resources. One of the challenges to the Grid computing is scheduling problem which is regarded
as a tough issue. Since scheduling problem is a non-deterministic issue in the Grid, deterministic algorithms cannot be used to improve
scheduling. In this paper, a combination of imperialist competition algorithm (ICA) and gravitational attraction is used for to address the
problem of independent task scheduling in a grid environment, with the aim of reducing the makespan and energy. Experimental results
compare ICA with other algorithms and illustrate that ICA finds a shorter makespan and energy relative to the others. Moreover, it
converges quickly, finding its optimum solution in less time than the other algorithms.
Feature selection in high-dimensional datasets is
considered to be a complex and time-consuming problem. To
enhance the accuracy of classification and reduce the execution
time, Parallel Evolutionary Algorithms (PEAs) can be used. In
this paper, we make a review for the most recent works which
handle the use of PEAs for feature selection in large datasets.
We have classified the algorithms in these papers into four main
classes (Genetic Algorithms (GA), Particle Swarm Optimization
(PSO), Scattered Search (SS), and Ant Colony Optimization
(ACO)). The accuracy is adopted as a measure to compare the
efficiency of these PEAs. It is noticeable that the Parallel Genetic
Algorithms (PGAs) are the most suitable algorithms for feature
selection in large datasets; since they achieve the highest accuracy.
On the other hand, we found that the Parallel ACO is timeconsuming
and less accurate comparing with other PEA.
A Mixed Discrete-Continuous Attribute List Representation for Large Scale Cla...jaumebp
This work assesses the performance of the BioHEL data mining method to handle large-scale datasets, and proposes a representation to deal efficiently with domains with mixed discrete-continuous attributes
THE ACTIVE CONTROLLER DESIGN FOR ACHIEVING GENERALIZED PROJECTIVE SYNCHRONIZA...ijait
This paper discusses the design of active controllers for achieving generalized projective synchronization (GPS) of identical hyperchaotic Lü systems (Chen, Lu, Lü and Yu, 2006), identical hyperchaotic Cai systems (Wang and Cai, 2009) and non-identical hyperchaotic Lü and hyperchaotic Cai systems. The synchronization results (GPS) for the hyperchaotic systems have been derived using active control method and established using Lyapunov stability theory. Since the Lyapunov exponents are not required for these calculations, the active control method is very effective and convenient for achieving the GPS of the
hyperchaotic systems addressed in this paper. Numerical simulations are provided to illustrate the effectiveness of the GPS synchronization results derived in this paper.
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
Clustering is one of the data mining techniques that have been around to discover business intelligence by grouping objects into clusters using a similarity measure. Clustering is an unsupervised learning process that has many utilities in real time applications in the fields of marketing, biology, libraries, insurance, city-planning, earthquake studies and document clustering. Latent trends and relationships among data objects can be unearthed using clustering algorithms. Many clustering algorithms came into existence. However, the quality of clusters has to be given paramount importance. The quality objective is to achieve
highest similarity between objects of same cluster and lowest similarity between objects of different clusters. In this context, we studied two widely used clustering algorithms such as the K-Means and Fuzzy K-Means. K-Means is an exclusive clustering algorithm while the Fuzzy K-Means is an overlapping clustering algorithm. In this paper we prove the hypothesis “Fuzzy K-Means is better than K-Means for Clustering” through both literature and empirical study. We built a prototype application to demonstrate the differences between the two clustering algorithms. The experiments are made on diabetes dataset
obtained from the UCI repository. The empirical results reveal that the performance of Fuzzy K-Means is better than that of K-means in terms of quality or accuracy of clusters. Thus, our empirical study proved the hypothesis “Fuzzy K-Means is better than K-Means for Clustering”.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization stratagem. This evolutionary technique always aims to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
Feature selection using modified particle swarm optimisation for face recogni...eSAT Journals
Abstract
One of the major influential factors which affects the accuracy of classification rate is the selection of right features. Not all features have vital role in classification. Many of the features in the dataset may be redundant and irrelevant, which increase the computational cost and may reduce classification rate. In this paper, we used DCT(Discrete cosine transform) coefficients as features for face recognition application. The coefficients are optimally selected based on a modified PSO algorithm. In this, the choice of coefficients is done by incorporating the average of the mean normalized standard deviations of various classes and giving more weightage to the lower indexed DCT coefficients. The algorithm is tested on ORL database. A recognition rate of 97% is obtained. Average number of features selected is about 40 percent for a 10 × 10 input. The modified PSO took about 50 iterations for convergence. These performance figures are found to be better than some of the work reported in literature.
Keywords: Particle swarm optimization, Discrete cosine transform, feature extraction, feature selection, face recognition, classification rate.
Proposing a Scheduling Algorithm to Balance the Time and Energy Using an Impe...Editor IJCATR
Computational grids have become an appealing research area as they solve compute-intensive problems within the scientific
community and in industry. A grid computational power is aggregated from a huge set of distributed heterogeneous workers; hence, it
is becoming a mainstream technology for large-scale distributed resource sharing and system integration. Unfortunately, current grid
schedulers suffer from the haste problem, which is the schedule inability to successfully allocate all input tasks. Accordingly, some tasks
fail to complete execution as they are allocated to unsuitable workers. Others may not start execution as suitable workers are previously
allocated to other peers. This paper presents an imperialist competition algorithm (ICA) method to solve the grid scheduling problems.
The objective is to minimize the makespan and energy of the grid. Simulation results show that the grid scheduling problem can be
solved efficiently by the proposed method
A Comparative Analysis of Feature Selection Methods for Clustering DNA SequencesCSCJournals
Large-scale analysis of genome sequences is in progress around the world, the major application of which is to establish the evolutionary relationship among the species using phylogenetic trees. Hierarchical agglomerative algorithms can be used to generate such phylogenetic trees given the distance matrix representing the dissimilarity among the species. ClustalW and Muscle are two general purpose programs that generates distance matrix from the input DNA or protein sequences. The limitation of these programs is that they are based on Smith-Waterman algorithm which uses dynamic programming for doing the pair-wise alignment. This is an extremely time consuming process and the existing systems may even fail to work for larger input data set. To overcome this limitation, we have used the frequency of codons usage as an approximation to find dissimilarity among species. The proposed technique further reduces the complexity by extracting only the significant features of the species from the mtDNA sequences using the techniques like frequent codons, codons with maximum range value or PCA technique. We have observed that the proposed system produces nearly accurate results in a significantly reduced running time.
A Mathematical Programming Approach for Selection of Variables in Cluster Ana...IJRES Journal
Data clustering is a common technique for statistical data analysis; it is defined as a class of
statistical techniques for classifying a set of observations into completely different groups. Cluster analysis
seeks to minimize group variance and maximize between group variance. In this study we formulate a
mathematical programming model that chooses the most important variables in cluster analysis. A nonlinear
binary model is suggested to select the most important variables in clustering a set of data. The idea of the
suggested model depends on clustering data by minimizing the distance between observations within groups.
Indicator variables are used to select the most important variables in the cluster analysis.
COMPARING THE CUCKOO ALGORITHM WITH OTHER ALGORITHMS FOR ESTIMATING TWO GLSD ...csandit
This study introduces and compares different methods for estimating the two parameters of
generalized logarithmic series distribution. These methods are the cuckoo search optimization,
maximum likelihood estimation, and method of moments algorithms. All the required
derivations and basic steps of each algorithm are explained. The applications for these
algorithms are implemented through simulations using different sample sizes (n = 15, 25, 50,
100). Results are compared using the statistical measure mean square error.
Feature selection is an essential issue in machine learning. It discards the unnecessary or redundant features in the dataset. This paper introduced the new feature selection based on kernel function using 16 the real-world datasets from UCI data repository, and k-means clustering was utilized as the classifier using radial basis function (RBF) and polynomial kernel function. After sorting the features using the new feature selection, 75 percent of it was examined and evaluated using 10-fold cross-validation, then the accuracy, F1-Score, and running time were compared. From the experiments, it was concluded that the performance of the new feature selection based on RBF kernel function varied according to the value of the kernel parameter, opposite with the polynomial kernel function. Moreover, the new feature selection based on RBF has a faster running time compared to the polynomial kernel function. Besides, the proposed method has higher accuracy and F1-Score until 40 percent difference in several datasets compared to the commonly used feature selection techniques such as Fisher score, Chi-Square test, and Laplacian score. Therefore, this method can be considered to use for feature selection
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A Genetic Algorithm on Optimization Test FunctionsIJMERJOURNAL
ABSTRACT: Genetic Algorithms (GAs) have become increasingly useful over the years for solving combinatorial problems. Though they are generally accepted to be good performers among metaheuristic algorithms, most works have concentrated on the application of the GAs rather than the theoretical justifications. In this paper, we examine and justify the suitability of Genetic Algorithms in solving complex, multi-variable and multi-modal optimization problems. To achieve this, a simple Genetic Algorithm was used to solve four standard complicated optimization test functions, namely Rosenbrock, Schwefel, Rastrigin and Shubert functions. These functions are benchmarks to test the quality of an optimization procedure towards a global optimum. We show that the method has a quicker convergence to the global optima and that the optimal values for the Rosenbrock, Rastrigin, Schwefel and Shubert functions are zero (0), zero (0), -418.9829 and -14.5080 respectively
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
Feature selection is one of the most common and critical tasks in database classification. It
reduces the computational cost by removing insignificant and unwanted features. Consequently, this
makes the diagnosis process accurate and comprehensible. This paper presents the measurement of
feature relevance based on fuzzy entropy, tested with Radial Basis Classifier (RBF) network,
Bagging(Bootstrap Aggregating), Boosting and stacking for various fields of datasets. Twenty
benchmarked datasets which are available in UCI Machine Learning Repository and KDD have been
used for this work. The accuracy obtained from these classification process shows that the proposed
method is capable of producing good and accurate results with fewer features than the original
datasets.
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...IAEME Publication
This paper presents an approach based on applying an aggregated predictor formed by multiple versions of a multilayer neural network with a back-propagation optimization algorithm for helping the engineer to get a list of the most appropriate well-test interpretation models for a given set of pressure/ production data. The proposed method consists of three stages: (1) data decorrelation through principal component analysis to reduce the covariance between the variables and the dimension of the input layer in the artificial neural network, (2) bootstrap replicates of the learning set where the data is repeatedly sampled with a random split of the data into train sets and using these as new learning sets, and (3) automatic reservoir model identification through aggregated predictor formed by a plurality vote when predicting a new class. This method is described in detail to ensure successful replication of results. The required training and test dataset were generated by using analytical solution models. In our case, there were used 600 samples: 300 for training, 100 for cross-validation, and 200 for testing. Different network structures were tested during this study to arrive at optimum network design. We notice that the single net methodology always brings about confusion in selecting the correct model even though the training results for the constructed networks are close to 1. We notice also that the principal component analysis is an effective strategy in reducing the number of input features, simplifying the network structure, and lowering the training time of the ANN. The results obtained show that the proposed model provides better performance when predicting new data with a coefficient of correlation approximately equal to 95% Compared to a previous approach 80%, the combination of the PCA and ANN is more stable and determine the more accurate results with lesser computational complexity than was feasible previously. Clearly, the aggregated predictor is more stable and shows less bad classes compared to the previous approach.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as
automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic
clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization
stratagem. This evolutionary technique always aims
to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to
detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal
values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance
of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
Cuckoo algorithm with great deluge local-search for feature selection problemsIJECEIAES
Feature selection problem is concerned with searching in a dataset for a set of features aiming to reduce the training time and enhance the accuracy of a classification method. Therefore, feature selection algorithms are proposed to choose important features from large and complex datasets. The cuckoo search (CS) algorithm is a type of natural-inspired optimization algorithms and is widely implemented to find the optimum solution for a specified problem. In this work, the cuckoo search algorithm is hybridized with a local search algorithm to find a satisfactory solution for the problem of feature selection. The great deluge (GD) algorithm is an iterative search procedure, that can accept some worse moves to find better solutions for the problem, also to increase the exploitation ability of CS. The comparison is also provided to examine the performance of the proposed method and the original CS algorithm. As result, using the UCI datasets the proposed algorithm outperforms the original algorithm and produces comparable results compared with some of the results from the literature.
Mine Blood Donors Information through Improved K-Means Clusteringijcsity
The number of accidents and health diseases which are increasing at an alarming rate are resulting in a huge increase in the demand for blood. There is a necessity for the organized analysis of the blood donor database or blood banks repositories. Clustering analysis is one of the data mining applications and K-means clustering algorithm is the fundamental algorithm for modern clustering techniques. K-means clustering algorithm is traditional approach and iterative algorithm. At every iteration, it attempts to find the distance from the centroid of each cluster to each and every data point. This paper gives the improvement to the original k-means algorithm by improving the initial centroids with distribution of data. Results and discussions show that improved K-means algorithm produces accurate clusters in less computation time to find the donors information
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature converge to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Particle Swarm Optimization, Subtractive + (PSO) clustering algorithm that performs fast clustering. For comparison purpose, we applied the Subtractive + (PSO) clustering algorithm, PSO, and the Subtractive clustering algorithms on three different datasets. The results illustrate that the Subtractive + (PSO) clustering algorithm can generate the most compact clustering results as compared to other algorithms.
A comparative study of three validities computation methods for multimodel ap...IJECEIAES
The multimodel approach offers a very satisfactory results in modelling, diagnose and control of complex systems. In the modelling case, this approach passes by three steps: the determination of the model’s library, the validities computation and the establishment of the final model. In this context, this paper focuses on the elaboration of a comparative study between three recent methods of validities computation. Thus, it highlight the method that offers the best performances in term of precision. To achieve this goal, we apply, these three methods on two simulation examples in order to compare their performances.
Single parent mating in genetic algorithm for real robotic system identificationIAESIJAI
System identification (SI) is a method of determining a mathematical model
for a system given a set of input-output data. A representation is made using
a mathematical model based on certain specified assumptions. In SI, model
structure selection is a step where a model structure perceived as an adequate
system representation is selected. A typical rule is that the final model must
have a good balance between parsimony and accuracy. As a popular search
method, genetic algorithm (GA) is used for selecting a model structure.
However, the optimality of the final model depends much on the
effectiveness of GA operators. This paper presents a mating technique
named single parent mating (SPM) in GA for use in a real robotic SI. This
technique is based on the chromosome structure of the parents such that a
single parent is sufficient in achieving mating that eases the search for the
optimal model. The results show that using three different objective
functions (Akaike information criterion, Bayesian information criterion and
parameter magnitude–based information criterion 2) respectively, GA with
the mating technique is able to find more optimal models than without the
mating technique. Validations show that the selected models using the
mating technique are acceptable.
Applying Genetic Algorithms to Information Retrieval Using Vector Space ModelIJCSEA Journal
Genetic algorithms are usually used in information retrieval systems (IRs) to enhance the information retrieval process, and to increase the efficiency of the optimal information retrieval in order to meet the users’ needs and help them find what they want exactly among the growing numbers of available information. The improvement of adaptive genetic algorithms helps to retrieve the information needed by the user accurately, reduces the retrieved relevant files and excludes irrelevant files. In this study, the researcher explored the problems embedded in this process, attempted to find solutions such as the way of choosing mutation probability and fitness function, and chose Cranfield English Corpus test collection on mathematics. Such collection was conducted by Cyrial Cleverdon and used at the University of Cranfield in 1960 containing 1400 documents, and 225 queries for simulation purposes. The researcher also used cosine similarity and jaccards to compute similarity between the query and documents, and used two proposed adaptive fitness function, mutation operators as well as adaptive crossover. The process aimed at evaluating the effectiveness of results according to the measures of precision and recall. Finally, the study concluded that we might have several improvements when using adaptive genetic algorithms.
This study introduces and compares different methods for estimating the two parameters of generalized logarithmic series distribution. These methods are the cuckoo search optimization, maximum likelihood estimation, and method of moments algorithms. All the required derivations and basic steps of each algorithm are explained. The applications for these algorithms are implemented through simulations using different sample sizes (n = 15, 25, 50, 100). Results are compared using the statistical measure mean square error.
USING CUCKOO ALGORITHM FOR ESTIMATING TWO GLSD PARAMETERS AND COMPARING IT WI...ijcsit
This study introduces and compares different methods for estimating the two parameters of generalized logarithmic series distribution. These methods are the cuckoo search optimization, maximum likelihood estimation, and method of moments algorithms. All the required derivations and basic steps of each algorithm are explained. The applications for these algorithms are implemented through simulations using different sample sizes (n = 15, 25, 50, 100). Results are compared using the statistical measure mean square error.
This study introduces and compares different methods for estimating the two parameters of generalized logarithmic series distribution. These methods are the cuckoo search optimization, maximum likelihood estimation, and method of moments algorithms. All the required derivations and basic steps of each algorithm are explained. The applications for these algorithms are implemented through simulations using different sample sizes (n = 15, 25, 50, 100). Results are compared using the statistical measure mean square error.
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL IJCSEA Journal
Genetic algorithms are usually used in information retrieval systems (IRs) to enhance the information retrieval process, and to increase the efficiency of the optimal information retrieval in order to meet the users’ needs and help them find what they want exactly among the growing numbers of available information. The improvement of adaptive genetic algorithms helps to retrieve the information needed by the user accurately, reduces the retrieved relevant files and excludes irrelevant files. In this study, the researcher explored the problems embedded in this process, attempted to find solutions such as the way of choosing mutation probability and fitness function, and chose Cranfield English Corpus test collection on mathematics. Such collection was conducted by Cyrial Cleverdon and used at the University of Cranfield in 1960 containing 1400 documents, and 225 queries for simulation purposes. The researcher also used cosine similarity and jaccards to compute similarity between the query and documents, and used two proposed adaptive fitness function, mutation operators as well as adaptive crossover. The process aimed at evaluating the effectiveness of results according to the measures of precision and recall. Finally, the study concluded that we might have several improvements when using adaptive genetic algorithms.
Similar to An Automatic Clustering Technique for Optimal Clusters (20)
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSveerababupersonal22
It consists of cw radar and fmcw radar ,range measurement,if amplifier and fmcw altimeterThe CW radar operates using continuous wave transmission, while the FMCW radar employs frequency-modulated continuous wave technology. Range measurement is a crucial aspect of radar systems, providing information about the distance to a target. The IF amplifier plays a key role in signal processing, amplifying intermediate frequency signals for further analysis. The FMCW altimeter utilizes frequency-modulated continuous wave technology to accurately measure altitude above a reference point.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Technical Specifications
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
Key Features
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface
• Compatible with MAFI CCR system
• Copatiable with IDM8000 CCR
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
Application
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
An Automatic Clustering Technique for Optimal Clusters
1. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.4, August 2011
DOI : 10.5121/ijcsea.2011.1412 133
An Automatic Clustering Technique for Optimal
Clusters
1
K. Karteeka Pavan, 2
Allam Appa Rao, 3
A.V. Dattatreya Rao
1
Department of Computer Applications, Rayapati Venkata Ranga Rao and Jagarlamudi
Chadramouli College of Engineering, Guntur, India
2
Jawaharlal Nehru Technological University, Kakinada, India
3
Department of Statistics, Acharya Nagarjuna University, Guntur, India
1
karteeka@yahoo.com, 2
apparaoallam@gmail.com, 3
avdrao@gmail.com
Abstract:
This paper proposes a simple, automatic and efficient clustering algorithm, namely, Automatic Merging for
Optimal Clusters (AMOC) which aims to generate nearly optimal clusters for the given datasets
automatically. The AMOC is an extension to standard k-means with a two phase iterative procedure
combining certain validation techniques in order to find optimal clusters with automation of merging of
clusters. Experiments on both synthetic and real data have proved that the proposed algorithm finds nearly
optimal clustering structures in terms of number of clusters, compactness and separation.
Keywords : Clustering, Optimal clusters, k-means, validation technique
1 Introduction
The two fundamental questions in data clustering are to find number of clusters and their
compositions. There are many clustering algorithms to answer the latter problem, but not many
methods for the former problem. Although a number of clustering methods have been proposed
for the latter problem, they are facing the difficulties in meeting the requirements of automation,
quality, simplicity and efficiency. Discovering an optimal number of clusters in a large data set is
usually a challenging task. Cheung [20] studied a rival penalized competitive learning algorithm
[9 -10] that has demonstrated a very good result in finding the cluster number. The algorithm is
formulated by learning the parameters of a mixture model through the maximization of a
weighted likelihood function. In the learning process, some initial seed centers move to the
genuine positions of the cluster centers in a data set, and other redundant seed points will stay at
the boundaries or outside of the clusters. Bayesian-Kullback Ying-Yang proposed a unified
algorithm for both unsupervised and supervised learning [13], which provides a reference for
solving the problem of selection of the cluster number. Lee and Antonsson [2] used an
evolutionary method to dynamically cluster a data set. Sarkar,et al. [11] and Fogel et al. [8] are
proposed an approach to dynamically cluster a data set using evolutionary programming, where
two fitness functions are simultaneously optimized: one gives the optimal number of clusters,
whereas the other leads to a proper identification of each cluster’s centroid. Recently Swagatam
Das and Ajith Abraham [18] proposed an Automatic Clustering using Differential Evolution
(ACDE) algorithm by introducing a new chromosome representation and Jain [1] explained few
more methods to select k, the number of clusters. The majority of these methods to determine the
2. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.4, August 2011
134
best number of clusters may not work very well in practice. The clustering algorithms are
required to be run several times for good solution, and model-based methods, such as cross-
validation and penalized likelihood estimation, are computationally expensive.
This paper proposes a simple, automatic and efficient clustering algorithm, namely, Automatic
Merging for Optimal Clusters (AMOC) which aims to generate nearly optimal clusters for the
given datasets automatically. The AMOC is an extension to standard k-means, which combines
the validation techniques into the clustering process so that high quality clustering results can be
produced. The technique is a two-phase iterative procedure. In the first phase it produces clusters
for a large k. In the second phase, iteratively a low probability cluster is merged with its closest
cluster using a validation technique. Experiments on both synthetic and real data sets from UCI
prove that the proposed algorithm finds nearly optimal results in terms of compactness and
separation.
Section (2) deals with formulation of the proposed algorithm, while section (3) illustrates the
effectiveness of the new algorithm experimenting results on synthetic, real, and micro array data
sets. Finally concluding remarks are included in section (4).
2. Automatic Merging for Optimal Clusters (AMOC)
Let P = {P1, P2,… , Pm} be a set of m objects in which each object Pi is represented
as[pi,1,pi,2,…pi,n] where n is the number of features. The algorithm accepts large kmax as the upper
bound of the number of clusters and is taken to be m by intuition [12]. It iteratively merges the
lower probability cluster with its closest cluster according to average linkage and validates the
merging result using Rand Index.
Steps:
1. Initialize kmax = m
2. Assign kmax objects randomly to the cluster centroids
3. Find the clusters using k-means
4. Compute Rand index
5. Find a cluster that has least probability and merge with its closest cluster. Recompute
centroids, Rand index and decrement the number of clusters by one. If the newly
computed Rand index is greater than the previous Rand index, then update Rand
Index, number of clusters and cluster centroids with the newly computed values.
6. If step 5 has been executed for each and every cluster, then go to step7, otherwise got
to step5.
7. If there is a change in number of clusters, then go to step2, otherwise stop.
3. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.4, August 2011
135
3. Experimental Results
To evaluate the performance of AMOC, we have tested it using both simulated and real data. The
clustering results of AMOC are compared with these of k-means, fuzzy-kmeans, and Automatic
clustering using Differential Evolution (ACDE) that determines optimal clusters automatically.
The results are validated with the Rand, Adjusted Rand, DB, CS and Silhouette cluster validity
measures and by identifying error rate using number of misclassifications.
In this AMOC the choice of initial centroids were selected at random and also done as suggested
by Arthu and Vassilvitskii [4]. The performance of the algorithm is also compared with k-
means++ [4].
The k-means and Fuzzy-kmeans algorithms are implemented with the number of clusters as equal
to the number of classes in the ground truth.
3.1 Experimental Data
The efficiency of new algorithms are evaluated by conducting experiments on five artificial data
sets, three real datasets down loaded from the web site UCI and two microarray data sets (two
yeast data sets) downloaded from http://www.cs. washington.edu/homes/kayee/cluster [7].
The real data sets used:
1. Iris plants database (m = 150, n = 4, K = 3)
2. Glass (m = 214, n = 9, K = 6)
3. Wine (m = 178, n = 13, K = 3)
The real microarray data sets used:
1. The yeast cell cycle data [15] showed the fluctuation of expression levels of
approximately 6000 genes over two cell cycles (17 time points). We used two different
subsets of this data with independent external criteria. The first subset (the 5-phase
criterion) consists of 384 genes whose expression levels peak at different time points
corresponding to the five phases of cell cycle [15]. We expect clustering results to
approximate this five class partition. Hence, we used the 384 genes with the 5- phase
criterion as one of our data sets.
2. The second subset (the MIPS criterion) consists of 237 genes corresponding to four
categories in the MIPS database [6]. The four categories (DNA synthesis and replication,
organization of centrosome, nitrogen and sulphur metabolism, and ribosomal proteins)
were shown to be reflected in clusters from the yeast cell cycle data [16].
The five synthetic data sets from Np(µ, ∑) with specified mean vector and variance covariance
matrix are as follows.
5. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.4, August 2011
137
3.2 Presentation of Results
In this paper, while comparing the performance of AMOC with the other techniques we are
concentrating on two major issues: 1) quality of the solution as determined by Error rate and
cluster validity measures Rand, Adjusted Rand, DB, CS and Silhouette, 2) ability to find the
optimal number of clusters. Since all the algorithms produce different results in different
individual runs, we have taken 40 independent runs of each algorithm. The Rand [19], Adjusted
Rand, DB [5], CS [3] and Silhouette [14] metrics values and the overall error rate of the mean-of-
run solutions provided by the algorithms over the 10 datasets have been provided in Table-1. The
table also shows the mean number of classes determined by each algorithm except k-means and
fuzzy-k. All the results presented in this table are averages over 40 independent runs of each
algorithm. The minimum and maximum error rates those found in 40 independent runs of each
algorithm on each dataset are also tabulated in Table-1.
The above observations are presented graphically in the following figures. Figure1. to Figure2
represent the number of clusters identified by AMOC and ACDE in 40 independent runs. The
figures demonstrate that AMOC is performing well when compared to ACDE in determining the
clusters. Figure3 represent error rates obtained in 40 independent runs by AMOC and ACDE.
Figures 4 to Figure5 are the clusters and their centroids obtained during the execution of the
AMOC, in each iteration when choice of the initial k is 9.
Table-1:Validity measures along with error rates
Dataset Algorithm No. of
clusters, k
Mean values of Cluster Validity
Measures
Error rate
i/p
k
o/p k ARI RI SIL DB CS Mean Least Maximu
m
Synthetic
1
k-means 2 2
0.92 0.96
0.83
9
0.46
7
0.645
0.236 1.714 2.286
k-means++
0.925
0.96
2
0.83
9
0.46
6
0.567
1.914 1.714 2.286
fuzk
0.899 0.95
0.83
9
0.46
8
0.52
2.571 2.571 2.571
AMOC(rand)
19
2 0.92 0.96
0.83
9
0.46
7 0.749 2.029 1.714 2.286
AMOC(kmp
p) 2 0.925
0.96
3
0.83
9
0.46
6 0.749 1.905 1.714 2.286
ACDE 3.05 0.85
0.92
5
0.64
3
0.77
2 1.348 51.56 0 96
Synthetic
2
k-means 4 4
0.821
0.92
7
0.71
8 0.58
1.178
19.1 2.4 67
k-means++
0.883
0.95
3
0.77
6
0.51
9
1.21
7.16 2.4 59.8
fuzk
0.944
0.97
9
0.79
1
0.48
4
0.931 2.2
2.2 2.2
AMOC(rand)
22
3.05 0.694
0.86
7
0.73
8
0.55
9 1.067 46.5 2.4 80.2
AMOC(kmp
p) 3.8 0.885
0.95
3
0.78
8
0.49
9 0.946 8.79 2.4 34.4
ACDE 5.35 0.885
0.95
7 0.68
0.67
4 1.321 58.89 2.4 96.2
Synthetic
3
k-means 3 3
0.957 0.98
0.81
3
0.50
9
0.87 2.242
1 1
k-means++
0.97
0.98
7
0.82
3
0.76
1
0.92 1
1 50.67
fuzk
0.97
0.98
7
0.82
3
0.5 0.96 1
1 1
8. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.4, August 2011
140
Figure2. Number of clusters of Synthetic2 data set
Figure3. Error rates obtained for Iris data set
Figure3. Error rates obtained for Iris data set
Number of clusters determined for the synthetic2
dataset
0
1
2
3
4
5
6
7
8
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
numberofclusters
ACDE
AMOC
Errorrates obtained by various algorithms for Iris dataset
0
10
20
30
40
50
60
70
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
errorrate
kmeans
kmeans++
fuzzyk
ACDE
AMOC
9. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.4, August 2011
141
Figure 4.The results
obtained by AMOC for
the Synthetic2 data set
when initial k=9.
Starting with initial
clusters to final clusters
and their obtained
centers. The obtained
centers are marked with
‘ ‘ whereas original
centers are marked in red
color triangles
10. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.4, August 2011
142
Table 2.error rates of various algorithms
Data set AMOC(rand) AMOC(kmpp) SPSS k-means kmpp Fuzzy-k ACDE
synthetic1 2.209 1.905 1.714 2.236 1.914 2.571 51.56
synthetic2 46.5 8.79 2.4 19.1 7.16 2.2 58.89
Synthetic3 7.6 4.317 1 2.242 1 1 83.59
Synthetic4 88.28 25.31 0 51.27 10.96 8.738 53.21
synthetic5 69.94 65.22 52.22 53.9 54.42 48.61 71.31
Iris 29.42 17.69 50.67 15.77 13.37 15.33 10.17
Wine 41.01 41.01 30.34 34.58 33.54 30.34 52.89
Glass 68.75 66.42 45.79 55.86 56.1 62.29 54.35
Yeast1 79.35 37.25 35.44 35.74 37.49 39.18 81.86
Yeast2 55.14 38.21 43.23 38.35 40 35.73 44.95
Comments on the results of AMOC
The errors rates obtained from various algorithms vs data are presented in table 2.
• From the above table it is observed that the AMOC either producing best clusters than
ACDE or performing equally well
• The results of AMOC show that average error rates is equally good when compared to
those of k-means, k-means++, fuzzyk and SPSS
• The results of AMOC show that they are far better when compared to ACDE in most of
the case.
• The best error rate over 40 runs of AMOC is very much comparable to the existing
algorithms mentioned in the above observations.
• The maximum error rate over 40 runs of AMOC appears to be the least when compared
to those of existing algorithms.
• The quality of AMOC in terms of Rand index is 70%.
Figure5.The results obtained by AMOC for the
Iris data set when initial k=9. Starting with
initial clusters to final clusters and their
obtained centers. The obtained centers are
marked with ‘ ‘ .
11. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.4, August 2011
143
• Recently Sudhakar Jonnalagadda and Rajagopalan Srinivasan [17] developed a method
that determined 5 clusters from yeast2 data set where as the almost all existing methods
finds as 4. The proposed AMOC is also find 5 clusters from yeast2 data
Note: Results of CS, HI, ARI, etc., are very much in agreement with above all observations in the
performance of AMOC, hence detailed note with respect to them is not provided to avoid
duplication.
5. Conclusion
AMOC is ideally free from parameter. Though the AMOC require possible large k as input, the
input number of clusters does not affect the output number of clusters. The experimental results
have shown the performance of AMOC in finding optimal clusters automatically.
References
[1] A.K. Jain “Data Clustering: 50 Years Beyond K-Means” , Pattern Recognition letters, 31, 2010, pp
651-666
[2 ]C. Y. Lee and E.K. Antonsson “Self-adapting vertices for mask-layout synthesis,” in Proc. Model.
Simul. Microsyst. Conf., M. Laudon and B. Romanowicz, Eds., San Diego, CA,2000, pp. 83–86.
[3] C.H. Chou, M.C.Su, E. Lai , “A new cluster validity measure and its application to image
compression,” Pattern Anal. Appl., 7, 2, 2004, pp. 205–220
[4] D. Arthu and S. Vassilvitskii , “K-means++: The advantages of careful seeding, proceeding of the
18th Annual ACM-SIAM Symposium of Discrete Analysis”,7-9, ACM Press, New Orleans,
Louisiana, 2007, pp: 1027-1035.
[5] D.L. Davies and D.W. Bouldin, A cluster separation measure. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 1, 1979, pp.224–227.
[6] H.W. Mewes, K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer, S. Stocker, and D. Frishman. MIPS: a
database for protein sequience and complete genomes. Nucleic Acids Research, 27, 1999:44-48.
[7] K. Y. Yeung“Cluster analysis of gene expression data. In PhD thesis University of Washington”,
2001
[8] L. Fogel, A. J. Owens, and M.J.Walsh, “Artificial Intelligence Through Simulated Evolution”. New
York: Wiley.1996
[9] L. Xu, “How Many Clusters: A Ying-Yang Machine Based Theory for a Classical Open Problem in
Pattern Recognition,” Proc. IEEE Int’l Conf. Neural Networks ICNN ’96 , 3, 1996, pp. 1546-1551.
[10] L. Xu, “Rival Penalized Competitive Learning, Finite Mixture, and Multisets Clustering,” Pattern
Recognition Letters, 18, 11- 13, 1997, pp. 1167-1178.
12. International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.1, No.4, August 2011
144
[11] M. Sarkar, B. Yegnanarayana, and D. Khemani, “A clustering algorithm using an evolutionary
programming-based approach,” Pattern Recognit. Lett., 18, 10, 1997, pp. 975–986.
[12] N.R.Pal and J.C. Bezdek “On Cluster Validity for the Fuzzy C-Means Model,” IEEE Trans. Fuzzy
Systems, 3,3, 1995, pp. 370- 379.
[13] P. Guo, C.L. Chen, and M.R.Lyu, “Cluster Number Selection for a Small Set of Samples Using the
Bayesian Ying-Yang Model,” IEEE Trans. Neural Networks, 13,3, 2002, pp. 757-763
[14] P.J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis”.
Journal of Computational and Applied Mathematics, 20, 1987 pp.53–65.
[15] R.J. Cho, M.J. Campbell, E.A. Winzeler, L. Steinmetz, A. Conway, L. Wodicka, T.G. Wolfsberg , A.
E.Gabrielian, D. Landsman, D. J. Lockhart, and R.W.Davis , “A genome-wide transcriptional
analysis of the mitotic cell cycle,” Mol. Cell, 2, 1,1998, pp. 65–73.
[16] S. Tavazoie, J.D.Huges, M. J. Campbell, R.J. Cho and G.M. Church “Systematic determination of
genetic network architecture”. Nature Genetics, 22, 1999, pp.281–285.
[17] Sudhakar Jonnalagadda and Rajagopalan Srinivasan, “NIFTI: An evolutionary approach for finding
number of clusters in microarray data” BMC Bioinformatics, 10,40,2009, pp1-13.
[18] Swagatam Das, Ajith Abraham “Automatic Clustering Using An Improved Differential Evolution
Algorithm”, Ieee Transactions On Systems, Man, And Cybernetics—Part A: Systems And Humans,
38, 1,2008, pp218-237.
[19] W.M. Rand “Objective criteria for the evaluation of clustering methods. Journal of the American
Statistical Association,” 66, 1971, pp.846-850.
[20] Y. Cheung , “Maximum Weighted Likelihood via Rival Penalized EM for Density Mixture Clustering
with Automatic Model Selection,” IEEE Trans. Knowledge and Data Eng., 17, 6, 2005, pp. 750-761.