The document describes a general multiobjective clustering approach based on multiple distance measures. It proposes two clustering methods, MOECDM and MOEACDM, that use multiple distance matrices and multiobjective evolutionary algorithms to partition datasets into clusters in different distance spaces simultaneously. The key steps include: (1) generating initial populations using different distance measures and pre-clustering, (2) designing objective functions to minimize distances within clusters in different spaces, and (3) applying evolutionary operators like crossover and mutation to optimize the objective functions.
One of the important steps in routing is to find a feasible path based on the state information. In order to support real-time multimedia applications, a feasible path that satisfies one or more constraints has to be computed within a very short time. Therefore, the paper presents a genetic algorithm to solve the paths-tree problem subject to cost constraints. The objective of the algorithm is to find the set of edges connecting all nodes such that the sum of the edge costs from the source (root) to each node is minimized; that is, the path from the root to each node must be a minimum-cost path. The algorithm has been applied to two sample networks, the first with eight nodes and the second with eleven nodes, to illustrate its efficiency.
Improve the Performance of Clustering Using Combination of Multiple Clusterin...ijdmtaiir
The ever-increasing availability of textual documents has led to a growing challenge for information systems to effectively manage and retrieve the information contained in large collections of texts according to the user's information needs. No single clustering method can adequately handle all sorts of cluster structures and properties (e.g. shape, size, overlapping, and density). Combining multiple clustering methods is an approach to overcome the deficiencies of single algorithms and further enhance their performance. A disadvantage of cluster ensembles is the high computational load of combining the clustering results, especially for large and high-dimensional datasets. In this paper we propose a multi-clustering algorithm that combines a Cooperative Hard-Fuzzy Clustering model, based on intermediate cooperation between hard k-means (KM) and fuzzy c-means (FCM) to produce better intermediate clusters, with an ant colony algorithm. The proposed method gives better results than the individual clusterings.
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...ijsrd.com
A cluster is a group of objects that are similar to each other within the cluster and dissimilar to the objects of other clusters. Similarity is typically calculated on the basis of the distance between two objects or clusters: objects belong in the same cluster only if they are close to each other under that distance. The major objective of clustering is to discover collections of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective clustering algorithm for unlabeled data that produces both membership and typicality values during the clustering process. In this approach, the efficiency of the FPCM clustering approach is enhanced by using the penalized and compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering, with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approaches is evaluated on University of California, Irvine (UCI) machine learning repository datasets such as Iris, Wine, Lung Cancer and Lymphography. The parameters used for the evaluation are clustering accuracy, Mean Squared Error (MSE), execution time and convergence behavior.
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV...IJCSEA Journal
A hybrid learning automata–genetic algorithm (HLGA) is proposed to solve the QoS routing optimization problem of next generation networks, which is NP-complete. The algorithm complements the advantages of the Learning Automata (LA) algorithm and the Genetic Algorithm (GA). It first uses the good global search capability of LA to generate the initial population needed by GA, then uses GA to improve the Quality of Service (QoS) and acquire the optimized tree through new crossover and mutation operators. In the proposed algorithm, the connectivity matrix of edges is used for genotype representation. Some novel heuristics are also proposed for mutation, crossover, and the creation of random individuals. We evaluate the performance and efficiency of the proposed HLGA-based algorithm in comparison with other existing heuristic and GA-based algorithms by simulation. Simulation results demonstrate that the proposed algorithm not only has fast calculating speed and high accuracy but also improves the efficiency of QoS routing in next generation networks. The proposed algorithm outperforms the previous algorithms in the literature.
Effects of The Different Migration Periods on Parallel Multi-Swarm PSO csandit
In recent years, there has been an increasing interest in parallel computing. In parallel computing, multiple computing resources are used simultaneously in solving a problem: multiple processors work concurrently, and the program is divided into different tasks to be solved simultaneously. Recently, a considerable literature has grown up around the theme of metaheuristic algorithms. The particle swarm optimization (PSO) algorithm is a popular metaheuristic algorithm. The parallel comprehensive learning particle swarm optimization (PCLPSO) algorithm, based on PSO, has multiple swarms organized in a master-slave paradigm that work cooperatively and concurrently. The migration period is an important parameter in PCLPSO and affects the efficiency of the algorithm. We used well-known benchmark functions in the experiments and analysed the performance of PCLPSO using different migration periods.
CONSTRUCTING A FUZZY NETWORK INTRUSION CLASSIFIER BASED ON DIFFERENTIAL EVOLU...IJCNCJournal
This paper presents a method for constructing intrusion detection systems based on efficient fuzzy rule-based classifiers. The design of a fuzzy rule-based classifier from a given input-output data set can be presented as a feature selection and parameter optimization problem. For parameter optimization of the fuzzy classifiers, differential evolution is used, while the binary harmonic search algorithm is used for selection of relevant features. The performance of the designed classifiers is evaluated using the KDD Cup 1999 intrusion detection dataset. The optimal classifier is selected based on the Akaike information criterion. The optimal intrusion detection system has a 1.21% type I error and a 0.39% type II error. A comparative study with other methods was carried out; the results obtained showed the adequacy of the proposed method.
Here, in this paper, we introduce a dynamic clustering algorithm based on the fuzzy c-means (FCM) clustering algorithm. We try to process several sets of patterns together to find a common structure. The structure is finalized by interchanging prototypes of the given data and by moving the prototypes of the subsequent clusters toward each other. In the regular FCM clustering algorithm, a fixed number of clusters is chosen and pre-defined; if the chosen number of clusters is wrong, the final result degrades the purity of the clusters. In our proposed algorithm this drawback is overcome by a dynamic clustering architecture: we take a fixed number of clusters at the beginning, but over the iterations the algorithm increases the number of clusters automatically depending on the nature and type of the data, which increases the purity of the result at the end. A detailed clustering algorithm is developed on the basis of the standard FCM method and is illustrated by means of numeric examples.
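The standard FCM iteration that the dynamic variant builds on can be sketched in a few lines. This is plain fuzzy c-means with a fixed cluster count, not the paper's dynamic-cluster algorithm; the function name, fuzzifier `m`, and stopping tolerance are illustrative choices:

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means sketch: alternate membership and
    prototype updates until the memberships stabilize."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        Um = U ** m
        # Prototypes: membership-weighted means of the data.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, V
```

The dynamic variant described in the abstract would additionally grow `c` between runs based on the data; that logic is not shown here.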
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...ijcseit
Many studies have been done in the area of Wireless Sensor Networks (WSNs) in recent years. In this kind of network, some of the key objectives that need to be satisfied are area coverage, the number of active sensors, and the energy consumed by nodes. In this paper, we propose an NSGA-II based multi-objective algorithm for optimizing all of these objectives simultaneously. The efficiency of our algorithm is demonstrated in the simulation results: it finds the optimal balance point among the maximum coverage rate, the least energy consumption, and the minimum number of active nodes while maintaining the connectivity of the network.
Scheduling Using Multi Objective Genetic Algorithmiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
FAST DETECTION OF DDOS ATTACKS USING NON-ADAPTIVE GROUP TESTINGIJNSA Journal
Network security plays an increasingly important role today for personal users and organizations. Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS) attacks are a serious problem in networks. The major challenges in the design of an efficient data-stream algorithm are one pass over the input, poly-log space, poly-log update time and poly-log reporting time. In this paper, we use strongly explicit constructions of d-disjunct matrices in non-adaptive group testing (NAGT) to meet these requirements and propose a solution for fast detection of DoS and DDoS attacks based on the NAGT approach.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Scientific Review
The Radial Basis Probabilistic Neural Network (RBPNN) has a broad generalization capability and has been successfully applied in multiple fields. In this paper, the Euclidean distance of each data point in the RBPNN is replaced by its kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function generalizes the distance metric by measuring the distance between two data points as if they were mapped into a high-dimensional space. Comparing the four constructed classification models (Kernel RBPNN, Radial Basis Function networks, RBPNN and Back-Propagation networks), the results showed that classification of the Iris data with Kernel RBPNN displays outstanding performance.
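The kernel-induced distance idea can be made concrete with a Gaussian (RBF) kernel, a common choice; the abstract does not specify the paper's exact kernel or parameters, so `gamma` here is an assumed illustrative parameter:

```python
import numpy as np

def kernel_distance_sq(x, y, gamma=1.0):
    """Squared distance in the feature space induced by a Gaussian kernel:
    d^2(x, y) = K(x, x) - 2*K(x, y) + K(y, y)."""
    def K(a, b):
        # RBF kernel: exp(-gamma * ||a - b||^2); K(a, a) = 1.
        return np.exp(-gamma * np.sum((a - b) ** 2))
    return K(x, x) - 2 * K(x, y) + K(y, y)
```

Because K(x, x) = 1 for the RBF kernel, this distance is 0 for identical points and saturates at 2 as points move far apart, which is what makes it more robust to outliers than the plain sum-of-squares distance.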
A C OMPARATIVE A NALYSIS A ND A PPLICATIONS O F M ULTI W AVELET T RANS...IJCI JOURNAL
In the era of telemedicine, a large amount of medical information is exchanged via electronic media, mostly in the form of medical images, to improve the accuracy and speed of the diagnosis process. Medical image denoising is of basic importance in image analysis, as these algorithms and procedures affect the efficacy of medical diagnostics. This paper focuses on multiwavelet-based image denoising techniques, because they provide the possibility of designing wavelet systems that are simultaneously orthogonal, symmetric, and compactly supported. The performance of Discrete Multiwavelet Transform and Discrete Wavelet Transform based denoising methods is compared on the basis of PSNR.
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problempaperpublications3
Abstract: In this work, the fuzzy nonlinear programming problem (FNLPP) is developed and its results are discussed. The numerical solutions of the crisp problems are compared, and the fuzzy solution and its effectiveness are also presented and discussed. A penalty function method is developed and combined with the Nelder–Mead direct optimization algorithm to solve the FNLPP.
Keywords: fuzzy set theory, fuzzy numbers, decision making, nonlinear programming, Nelder–Mead algorithm, penalty function method.
Found this paper really interesting. It delves into the learning behavior of deep learning ensembles and compares them with Bayesian neural networks, which in theory do the same thing. This helps answer why deep ensembles outperform.
Iterative Soft Decision Based Complex K-best MIMO DecoderCSCJournals
This paper presents an iterative soft-decision based complex multiple-input multiple-output (MIMO) decoding algorithm, which reduces the complexity of the Maximum Likelihood (ML) detector. We develop a novel iterative complex K-best decoder exploiting lattice reduction techniques for 8×8 MIMO. Besides the list size, a new adjustable variable has been introduced to control the on-demand child expansion. Following this method, we obtain a 6.9 to 8.0 dB improvement over the real-domain K-best decoder and 1.4 to 2.5 dB better performance than the iterative conventional complex decoder for the 4th iteration and the 64-QAM modulation scheme. We also demonstrate the significance of the new parameter on the bit error rate. The proposed decoder not only increases performance but also reduces the computational complexity to a certain level.
A COMPARATIVE STUDY ON DISTANCE MEASURING APPROACHES FOR CLUSTERINGIJORCS
Clustering plays a vital role in various areas of research, such as data mining, image retrieval, bio-computing and many more. The distance measure plays an important role in clustering data points, and choosing the right distance measure for a given dataset is a big challenge. In this paper, we study various distance measures and their effect on different clustering algorithms. The paper surveys existing distance measures for clustering and presents a comparison between them based on application domain, efficiency, benefits and drawbacks. This comparison helps researchers make a quick decision about which distance measure to use for clustering. We conclude this work by identifying trends and challenges of research and development towards clustering.
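For concreteness, a few of the distance measures such surveys typically compare can be written in a few lines each (the function names are ours, and NumPy is used for brevity):

```python
import numpy as np

def euclidean(a, b):
    # Straight-line (L2) distance; sensitive to scale of each feature.
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # City-block (L1) distance; more robust to single-feature outliers.
    return np.sum(np.abs(a - b))

def chebyshev(a, b):
    # L-infinity distance: the largest per-feature difference.
    return np.max(np.abs(a - b))

def cosine_distance(a, b):
    # 1 - cosine similarity; depends on direction, not magnitude,
    # which is why it is popular for text/document clustering.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

Swapping one of these into a clustering algorithm's assignment step changes which cluster shapes it can recover, which is the effect the survey compares.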
Optimising Data Using K-Means Clustering AlgorithmIJERA Editor
K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed in a cunning way, because different locations cause different results; the better choice is to place them as far away from each other as possible.
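The "place the initial centroids far apart" heuristic described above is commonly realized with k-means++-style seeding; a minimal sketch, assuming that variant (the function name and seed are illustrative, not from the abstract):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """k-means++-style seeding: pick each new centroid with probability
    proportional to its squared distance from the nearest centroid
    chosen so far, spreading the initial centroids apart."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]          # first centroid: uniform
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```

Points far from all existing centroids get high sampling probability, so the seeds tend to land in different clusters rather than next to each other.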
This paper proposes a clustering algorithm based on the Self-Organizing Map (SOM) method. To find the optimal number of clusters, our algorithm uses the Davies–Bouldin index, which has not previously been used in the multi-SOM. The proposed algorithm is compared to three clustering methods on five databases. Results show that our algorithm performs as well as the competing methods.
Mine Blood Donors Information through Improved K-Means Clusteringijcsity
The number of accidents and health diseases, which are increasing at an alarming rate, is resulting in a huge increase in the demand for blood, so there is a need for organized analysis of blood donor databases and blood bank repositories. Clustering analysis is one of the data mining applications, and the K-means clustering algorithm is fundamental to modern clustering techniques. K-means is a traditional, iterative algorithm: at every iteration, it computes the distance from the centroid of each cluster to each data point. This paper improves the original k-means algorithm by choosing better initial centroids based on the distribution of the data. Results and discussion show that the improved K-means algorithm produces accurate clusters in less computation time for finding donor information.
k-Means is a rather simple but well known algorithms for grouping objects, clustering. Again all objects need to be represented as a set of numerical features. In addition the user has to specify the number of groups (referred to as k) he wishes to identify. Each object can be thought of as being represented by some feature vector in an n dimensional space, n being the number of all features used to describe the objects to cluster. The algorithm then randomly chooses k points in that vector space, these point serve as the initial centers of the clusters. Afterwards all objects are each assigned to center they are closest to. Usually the distance measure is chosen by the user and determined by the learning task. After that, for each cluster a new center is computed by averaging the feature vectors of all objects assigned to it. The process of assigning objects and recomputing centers is repeated until the process converges. The algorithm can be proven to converge after a finite number of iterations. Several tweaks concerning distance measure, initial center choice and computation of new average centers have been explored, as well as the estimation of the number of clusters k. Yet the main principle always remains the same. In this project we will discuss about K-means clustering algorithm, implementation and its application to the problem of unsupervised learning
Iterative Soft Decision Based Complex K-best MIMO DecoderCSCJournals
This paper presents an iterative soft decision based complex multiple input multiple output (MIMO) decoding algorithm, which reduces the complexity of Maximum Likelihood (ML) detector. We develop a novel iterative complex K-best decoder exploiting the techniques of lattice reduction for 8×8 MIMO. Besides list size, a new adjustable variable has been introduced in order to control the on-demand child expansion. Following this method, we obtain 6.9 to 8.0 dB improvement over real domain K-best decoder and 1.4 to 2.5 dB better performance compared to iterative conventional complex decoder for 4th iteration and 64-QAM modulation scheme. We also demonstrate the significance of new parameter on bit error rate. The proposed decoder not only increases the performance, but also reduces the computational complexity to a certain level.
Quantum inspired evolutionary algorithm for solving multiple travelling sales...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Spectroscopy or hyperspectral imaging consists in the acquisition, analysis, and extraction of the spectral information measured on a specific region or object using an airborne or satellite device. Hyperspectral imaging has become an active field of research recently. One way of analysing such data is through clustering. However, due to the high dimensionality of the data and the small distance between the different material signatures, clustering such a data is a challenging task.In this paper, we empirically compared five clustering techniques in different hyperspectral data sets. The considered clustering techniques are K-means, K-medoids, fuzzy Cmeans, hierarchical, and density-based spatial clustering of applications with noise. Four data sets are used to achieve this purpose which is Botswana, Kennedy space centre, Pavia, and Pavia University. Beside the accuracy, we adopted four more similarity measures: Rand statistics, Jaccard coefficient, Fowlkes-Mallows index, and Hubert index. According to accuracy, we found that fuzzy C-means clustering is doing better on Botswana and Pavia data sets, K-means and K-medoids are giving better results on Kennedy space centre data set, and for Pavia University the hierarchical clustering is better
Comparison Between Clustering Algorithms for Microarray Data AnalysisIOSR Journals
Currently, there are two techniques used for large-scale gene-expression profiling; microarray and
RNA-Sequence (RNA-Seq).This paper is intended to study and compare different clustering algorithms that used
in microarray data analysis. Microarray is a DNA molecules array which allows multiple hybridization
experiments to be carried out simultaneously and trace expression levels of thousands of genes. It is a highthroughput
technology for gene expression analysis and becomes an effective tool for biomedical research.
Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein
microarrays, which enable researchers to investigate the expression state of a large number of genes. Data
clustering represents the first and main process in microarray data analysis. The k-means, fuzzy c-mean, selforganizing
map, and hierarchical clustering algorithms are under investigation in this paper. These algorithms
are compared based on their clustering model.
A general multiobjective clustering approach based on multiple distance measures
1. A General Multiobjective Clustering Approach Based on Multiple
Distance Measures
Mehran Mesbahzade
May 2019
CONG LIU, JIE LIU, DUNLU PENG AND CHUNXUE WU
School of Optical-Electrical and Computer Engineering,
University of Shanghai for Science and Technology
https://ieeexplore.ieee.org/document/8421571/
3. Clustering analysis is an unsupervised learning approach that can find underlying substructures in a group of unlabeled
data points. As an active research subject, it is widely used in various fields, such as data mining, pattern recognition,
image segmentation and computer vision. The task of clustering is to partition a dataset into subgroups such that
objects within a group have high similarity and objects belonging to different groups have low similarity. A variety of
clustering methods have been developed for different goals and applications in specific areas. Partitioning clustering,
hierarchical clustering and density clustering are currently the three most popular clustering techniques.
1. Introduction
4. In this section, we describe the details of the two proposed multiple-distance clustering methods, MOECDM and
MOEACDM. First, an appropriate multiobjective evolutionary algorithm has to be chosen. We apply NSGA-II
in our methods because of its fast nondominated sorting mechanism, and a series of changes is applied to let the
algorithm cope better with multiple-distance clustering. NSGA-II is a well-known multiobjective evolutionary
algorithm; other multiobjective evolutionary algorithms could also be used here. Secondly, we compute the distance
matrices using different distance measures; each matrix contains the relationship of each pair of objects in a specific
distance space. Then the details of MOECDM can be introduced: it partitions a dataset in different distance spaces
at the same time.
2. MULTIPLE DISTANCE MEASURE FRAMEWORK
5. Let X = {x1, x2, · · · , xn} be a set of n data objects, where xi = (xi,1, xi,2, · · · , xi,h) ∈ Rh is an h-dimensional point in the continuous
feature space. DM is the distance matrix, and dij is the distance between xi and xj. Many distance measures can be applied
here, such as the Euclidean distance, the Path distance, the Manifold distance and so on.
Next, we split DM instead of the original dataset. The smaller dij is, the closer the data points xi and xj are. Suppose
that we divide the dataset X into k clusters C1, C2, · · · , Ck, where |Ci| denotes the number of data points in cluster Ci.
• Euclidean distance
• Path distance
The Path distance between a pair of points is defined as the minimum, over all paths linking the two points, of the maximum
gap along the path. Let Πi,j = {π | π = π0π1 · · · πk, 1 ≤ πs ≤ n, π0 = i, πk = j} denote all possible paths linking xi and xj. The
Path distance between xi and xj is then defined mathematically as the minimum over π ∈ Πi,j of max 1≤s≤k d(xπs−1, xπs),
where d(·, ·) is the distance between consecutive points on the path.
A. DISTANCE MATRICES DEFINITION
6. The Euclidean distance matrix DM1 and the Path distance matrix DM2 are defined in Equation 3.
A. DISTANCE MATRICES DEFINITION
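The two distance matrices can be sketched in Python. Equation 3 is not reproduced on the slide, so this follows the definitions above: DM1 holds pairwise Euclidean distances, and the Path (minimax) distance is realized with a Floyd-Warshall-style update over the complete Euclidean graph, a standard way to compute it.

```python
import numpy as np

def euclidean_matrix(X):
    """DM1: pairwise Euclidean distances for an (n, h) data array."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def path_matrix(X):
    """DM2: the Path (minimax) distance -- for each pair, the minimum over
    all connecting paths of the largest single hop along the path.
    Floyd-Warshall-style update: d[i][j] = min(d[i][j], max(d[i][k], d[k][j]))."""
    d = euclidean_matrix(X)
    for k in range(len(X)):
        d = np.minimum(d, np.maximum(d[:, k:k + 1], d[k:k + 1, :]))
    return d
```

For three collinear points 0, 1, 2 the Euclidean distance between the extremes is 2, while the Path distance is 1: the path through the middle point never has a hop larger than 1.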
7. 1) Representation and Initialization
In this scheme, each individual r consists of n genes r = {r1, r2, ..., rn}, and the value of each gene ri is an integer indicating
that the data object xi is assigned to the ri-th cluster. The range of ri falls in {1, ..., k}, where k is the cluster number.
All data objects with the same cluster label are assigned to the same cluster. The label encoding scheme has a
major limitation: it is usually time-consuming owing to the large number of genes. To alleviate this limitation, and to give
the individuals a higher probability of lying in the promising search space, we apply three strategies to
produce the initial population P(0) = {r1(0), ..., rpop(0)}, where rl(0) represents the l-th individual in the initial population and
pop represents the population size. The first strategy generates individuals at random, the second generates individuals
by NCUT pre-clustering based on the Euclidean distance, and the third generates individuals by NCUT pre-clustering
based on the Path distance.
The numbers of individuals generated by the three strategies are adjusted by three parameters, pop_α1, pop_α2 and pop_α3:
pop_α1 ∗ pop, pop_α2 ∗ pop and pop_α3 ∗ pop individuals are generated by the three strategies respectively.
B. MOECDM
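A minimal sketch of the three-strategy initialization. Algorithm 1 is not reproduced on the slides, so the two NCUT pre-clustering results are injected here as precomputed labelings (`seeds`) rather than recomputed; in the actual method they come from NCUT on the Euclidean and Path distance matrices.

```python
import numpy as np

def init_population(pop, n, k, seeds, alphas=(0.5, 0.25, 0.25), rng=None):
    """Build the initial population P(0): `pop` label-encoded individuals,
    each a length-n vector of cluster labels in {1..k}.
    pop_a1*pop individuals are random; pop_a2*pop copy the Euclidean-NCUT
    labeling seeds[0]; pop_a3*pop copy the Path-NCUT labeling seeds[1]."""
    rng = rng or np.random.default_rng(0)
    P = []
    for _ in range(int(alphas[0] * pop)):          # strategy 1: random labels
        P.append(rng.integers(1, k + 1, size=n))
    for a, seed in zip(alphas[1:], seeds):         # strategies 2 and 3
        for _ in range(int(a * pop)):
            P.append(np.array(seed, dtype=int))
    return P
```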
8. B. MOECDM
The procedure of the initialization scheme can be seen in Algorithm 1. Through this scheme, a set of individuals carrying
the characteristics of both distance measures is generated in a single run of MOECDM.
9. B. MOECDM
2) Objective Functions Designing
In this section, we propose a new multiobjective combination strategy: two objective functions are designed in two
different distance spaces and optimized simultaneously by a multiobjective evolutionary algorithm.
It is well known that the main rule of clustering is to minimize the compactness of each cluster, which is usually computed
from the distance between objects and either the cluster centroids or medoids. However, the centroid of a cluster with an
irregular structure is invalid, and the medoid is hard to obtain with many existing encoding schemes in evolutionary
algorithms. Therefore, the objective functions here are established by minimizing the distance between each pair of objects
in the same cluster, and they can be written as Equation 4, where d1xy and d2xy represent the two distances between
objects x and y respectively. The clustering results are obtained by minimizing JF1 and JF2 simultaneously.
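Equation 4 itself is not reproduced on the slide; assuming the plain within-cluster pairwise sum just described (the paper may additionally normalize by cluster size), the two objectives can be evaluated from a labeling and the two distance matrices as:

```python
import numpy as np

def within_cluster_objectives(labels, DM1, DM2):
    """Sum the distances between every pair of objects sharing a cluster,
    once per distance matrix. Minimizing (JF1, JF2) simultaneously drives
    compact clusters in both distance spaces."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]   # pair (x, y) in the same cluster
    mask = np.triu(same, k=1)                   # count each unordered pair once
    return DM1[mask].sum(), DM2[mask].sum()
```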
10. B. MOECDM
3) Crossover Operator
In this subsection, each individual is chosen to cross with a crossover individual with a certain probability pc. We use two
strategies to select the crossover individual: the first selects it from all of the current individuals, and the second selects it
from the nondominated individuals. Note that only one strategy can be used in a single crossover procedure. The first
strategy enhances the diversity of the new population, and the second pushes the population quickly towards the global
optima. In each generation, which strategy is selected depends on a parameter, pc_α(g), which varies with the generation
number g and is computed by Equation 5. From this formula we can see that the value of pc_α(g) decreases from 1 towards
exp(1 − max_g) as g increases. The details of the crossover operator are shown in
Algorithm 2.
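The partner-selection step can be sketched as follows. Equation 5 is not shown on the slide; the schedule pc_α(g) = exp(1 − g) is an assumption that matches the stated range, decreasing from 1 at g = 1 to exp(1 − max_g) at g = max_g, so early generations favor the diversity-preserving strategy and later ones pull towards the front.

```python
import math
import random

def pick_mate(population, nondominated, g, rng=None):
    """Choose the crossover partner for one individual. With probability
    pc_a(g) the mate is drawn from the whole population (diversity);
    otherwise from the nondominated set (convergence pressure)."""
    rng = rng or random.Random(0)
    pc_alpha = math.exp(1 - g)          # assumed form of Equation 5
    pool = population if rng.random() < pc_alpha else nondominated
    return rng.choice(pool)
```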
11. B. MOECDM
4) Mutation Operator
The mutation operator is implemented after the crossover operator and is used to increase the diversity of the
population. According to the analysis above, the value of each gene of a crossover individual rcl(g + 1) is an integer
from {1, ..., k}, which is changed with a probability pm. In this procedure we also design two strategies to choose from.
The first changes the gene into another integer from {1, ..., k}, excluding its current value, each with probability
1/(k − 1); the other changes it into the corresponding gene of rbest, the best individual evaluated up to the current
generation. pm_α(g) is the selection parameter that determines which strategy is chosen and is computed by
Equation 6. The procedure of the mutation operator can be seen in Algorithm 3.
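A sketch of the mutation step. As with crossover, Equation 6 is not reproduced; the schedule pm_α(g) = exp(1 − g) is an assumption mirroring the crossover parameter, so early generations mutate randomly and later generations copy from the best individual.

```python
import math
import random

def mutate(individual, k, best, g, pm=1.0, rng=None):
    """Mutate a label-encoded individual. Each gene is touched with
    probability pm; with probability pm_a(g) it becomes a uniformly random
    *different* label (each of the k - 1 alternatives equally likely),
    otherwise it is copied from `best`."""
    rng = rng or random.Random(0)
    pm_alpha = math.exp(1 - g)          # assumed form of Equation 6
    out = list(individual)
    for j in range(len(out)):
        if rng.random() < pm:
            if rng.random() < pm_alpha:
                out[j] = rng.choice([c for c in range(1, k + 1) if c != out[j]])
            else:
                out[j] = best[j]
    return out
```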
12. B. MOECDM
5) Selection Operator
After the mutation operator, two populations, the current population P(g) = {r1(g), ..., rpop(g)} and the mutation population
Pm(g + 1) = {rm1(g + 1), ..., rmpop(g + 1)}, are available. To keep the population size equal across generations and to ensure
that favorable solutions are preserved, a selection operator is executed to preserve individuals with high fitness and to
eliminate individuals with low fitness. MOECDM employs the selection operator proposed in NSGA-II to achieve this goal.
In this selection operator, all of the individuals (P(g) ∪ Pm(g + 1)) are ranked by nondominated sorting and a
crowding-distance estimation procedure, a measure that differentiates the importance of individuals having the same rank.
Individuals with a lower rank are selected first and, within the same rank, those with a larger crowding distance are
preferred. Through selection, the next-generation population P(g + 1) = {r1(g + 1), ..., rpop(g + 1)} is obtained.
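The environmental selection can be sketched compactly. This is a plain O(n²) front-peeling rendering of NSGA-II selection with the standard crowding-distance tie-break, not the fast nondominated sort the paper relies on:

```python
def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_fronts(objs):
    """Peel nondominated fronts off the candidate set, best first."""
    remaining, fronts = list(range(len(objs))), []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def nsga2_select(objs, pop):
    """Keep `pop` candidate indices: whole fronts in rank order, then fill
    the last partial front preferring larger crowding distance."""
    chosen = []
    for front in pareto_fronts(objs):
        if len(chosen) + len(front) <= pop:
            chosen += front
        else:
            dist = {i: 0.0 for i in front}
            for k in range(len(objs[0])):
                order = sorted(front, key=lambda i: objs[i][k])
                dist[order[0]] = dist[order[-1]] = float("inf")  # boundaries kept
                span = (objs[order[-1]][k] - objs[order[0]][k]) or 1.0
                for prev, cur, nxt in zip(order, order[1:], order[2:]):
                    dist[cur] += (objs[nxt][k] - objs[prev][k]) / span
            chosen += sorted(front, key=lambda i: -dist[i])[:pop - len(chosen)]
            break
    return chosen
```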
13. C. MOEACDM
It is important to stress that MOECDM makes sense when the number of clusters k is fixed. However, that does not hold
when k is variable: it is straightforward to see that one can minimize JF1 and JF2 simply by increasing k, and in the limit
the two functions equal zero. The representation, crossover operator, mutation operator and selection operator are
similar to those in MOECDM.
1) Representation and Initialization
A slight difference between MOEACDM and MOECDM in the representation is that
the range of each gene is {1, ..., k} in MOECDM and {1, ..., kmax} in MOEACDM,
where kmax represents the maximum expected cluster number. Unlike in MOECDM,
the best k∗ is unknown in MOEACDM.
Hence, the initial population is generated by NCUT pre-clustering not
only with different distance measures but also with different cluster numbers.
The procedure of initialization for MOEACDM is shown in Algorithm 4.
14. C. MOEACDM
2) Objective Functions Designing
In much of the literature, a validity index is a widely used strategy for detecting the desirable cluster number by
looking for a balance between the compactness of each cluster and the separation between each pair of clusters.
The majority of existing validity indices are inapplicable here because, as in the analysis above, most of them are
designed based on the Euclidean distance. We therefore design a new combination of objective functions, each of which is
based on a specific distance measure. Moreover, each objective function must also balance compactness and separation
in order to detect the desirable cluster number automatically. Our inspiration comes from the
Modularity method, a typical method used to detect network structures from the relationship of each pair of nodes.
The definitions of d1xy and d2xy are the same as those in the previous section.
16. III. EXPERIMENTS OF MOECDM
A. DATASETS USED
In order to show the advantage of MOECDM for discovering different structures, we apply sixteen datasets to test the proposed
method. These datasets can be divided into four groups: spherical-type datasets, irregular-type datasets, shape-type datasets and
real-life datasets. The spherical-type group contains four datasets named Data_separated1, Data_separated2, Data_connected1 and
Data_connected2, as shown in Figure 2. The clusters in each of these datasets have a spherical structure and can be used to test the
Euclidean relationship in MOECDM. The irregular-type group contains four datasets, shown in Figure 3: Data_spiral, Data_rect,
Data_circle1 and Data_circle2. These four datasets have clusters with complex structures and can be used to test the Path
relationship in MOECDM.
17. III. EXPERIMENTS OF MOECDM
B. PARAMETERS SETTING
The population size (pop) and the maximum number of generations (max_g) are
two important parameters in an evolutionary algorithm. For pop, according
to the introduction of Section II-B1, the population should contain
individuals generated by all three strategies; hence it should satisfy
pop > 3. We set pop = 200 in MOECDM, which proves more than sufficient to
store all nondominated solutions. For max_g, we set max_g = 200 in this
experiment. The values of pop_α1, pop_α2 and pop_α3 are set to 0.5, 0.25
and 0.25 respectively, because we want more individuals to be generated at
random than by the other two strategies. For the crossover and mutation
operators, the values of pc, pc_α, pm and pm_α are all set to 1, because
the new population P(g + 1) is selected from P(g) ∪ Pm(g + 1) and
Pm(g + 1) should therefore differ substantially from P(g).
18. III. EXPERIMENTS OF MOECDM
C. PERFORMANCE MEASURES
In order to evaluate the performance of all the clustering algorithms quantitatively, two measures, the Rand index (R) and
the F-measure (F), are used. They are defined in Equations 8 and 9 respectively, where
Precision = SS / (SS + SD) and Recall = SS / (SS + DS).
SS is the number of point pairs having the same class labels and belonging to the same cluster, SD is the number of point
pairs having different class labels but belonging to the same cluster, DS is the number of point pairs having the same class
labels but belonging to different clusters, and DD is the number of point pairs having different class labels and belonging
to different clusters. Note that the higher the value of R (F), the better the clustering.
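The pair-counting evaluation above can be computed directly; this sketch assumes the usual forms R = (SS + DD) / (SS + SD + DS + DD) and F as the harmonic mean of Precision and Recall, matching Equations 8 and 9 as described:

```python
from itertools import combinations

def rand_and_f(pred, truth):
    """Count SS, SD, DS, DD over all point pairs, then return the Rand
    index and the F-measure of the predicted clustering against the truth."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(pred)), 2):
        same_cluster = pred[i] == pred[j]
        same_class = truth[i] == truth[j]
        if same_class and same_cluster:
            ss += 1
        elif not same_class and same_cluster:
            sd += 1
        elif same_class and not same_cluster:
            ds += 1
        else:
            dd += 1
    r = (ss + dd) / (ss + sd + ds + dd)
    precision = ss / (ss + sd) if ss + sd else 0.0
    recall = ss / (ss + ds) if ss + ds else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return r, f
```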
19. III. EXPERIMENTS OF MOECDM
D. ANALYSIS OF MOECDM
The first dataset adopted is Data_connected1 and the second is Data_circle1. Each cluster in Data_connected1 is
uniformly generated from a hyper-spherical area, and the clusters are connected with each other. The Pareto front and
several clustering results are presented in Figure 5. We obtain ten clustering results corresponding to the
nondominated Pareto front. From left to right in Figure 5(a), J1 decreases as J2 increases. We select three meaningful
partitions to analyze this nondominated Pareto front. Figure 5(b) corresponds to the red point of the nondominated
Pareto front. It is the best clustering result, and the Rand value (F-measure) is 0.96 (0.94).
FIGURE 5:
(a) The Pareto front of Data_connected1,
(b) the clustering result of the red point in the Pareto front,
(c) the clustering result of the black point in the Pareto front and
(d) the clustering result of the green point in the Pareto front
FIGURE 6: (a) The Pareto front of Data_circle1, (b) the clustering result of
the red point in the Pareto front, (c) the clustering result of the black point
in the Pareto front and (d) the clustering result of the green point in the Pareto front
The two experiments show that MOECDM is able to discover the
hidden distribution of clusters with both spherical and irregular structures.
20. III. EXPERIMENTS OF MOECDM
E. COMPARISON WITH OTHER METHODS
In this subsection, MOECDM is compared with three other approaches: Kmeans, FCM and NCUT. Kmeans
and FCM are two traditional clustering techniques designed based on the Euclidean distance. NCUT is a spectral
clustering method that can partition a relational dataset well. A relational dataset describes the distance between each
pair of objects; hence, different distance measures and their combinations can be applied to set up different matrices.
Here, we apply four strategies to set up the matrix:
(1) Euclidean distance (NCUT(E)),
(2) Path distance (NCUT(P)),
(3) combining the Euclidean and Path distance measures directly (NCUT(E+P)), and
(4) normalizing the Euclidean and Path distance measures within the range [0, 1] before combining them (NCUT(Norm(E)+Norm(P))).
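The two combination baselines, (3) and (4), can be sketched as follows. The slide does not spell out Norm(·), so min-max normalization over the matrix entries is assumed here; NCUT itself is not reproduced, only the matrix set-up.

```python
import numpy as np

def combined_matrices(DM1, DM2):
    """Return the two combined relational matrices used by the NCUT
    baselines: the direct sum E+P and the sum of min-max normalized
    matrices Norm(E)+Norm(P), each measure rescaled into [0, 1]."""
    def norm(d):
        return (d - d.min()) / (d.max() - d.min())
    return DM1 + DM2, norm(DM1) + norm(DM2)
```

Normalizing first keeps one measure from dominating the combination when the two distances live on very different scales.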
21. III. EXPERIMENTS OF MOECDM
Table 3 provides the comparison results of the aforementioned
algorithms. Each algorithm is executed twenty times on
each dataset. Mean_R, Max_R, Mean_F and Max_F show the
average Rand index, the maximum Rand index, the average
F-measure and the maximum F-measure respectively over the
twenty runs.
TABLE 3: Comparison of the Rand index and F-measure
calculated by different methods
22. IV. EXPERIMENTS OF MOEACDM
A. ANALYSIS OF MOEACDM
In this subsection, we also apply two datasets, Data_separated1 and Data_spiral, to show the nondominated Pareto front
obtained by MOEACDM and the corresponding clustering results. Figure 7 shows the nondominated Pareto front and the
clustering results obtained by MOEACDM for Data_separated1. Notably, there are only two points in the
nondominated Pareto front, and the two corresponding clustering results are very similar. A good explanation for this
situation is that both distance measures capture a similar relationship on this dataset. Table 4 gives the same view as
Figure 7.
TABLE 4: The cluster number, Rand index and F-measure corresponding to different
nondominated Pareto front points for Data_separated1
FIGURE 7:
(a) The Pareto front of Data_separated1,
(b) the clustering result of the red point in the nondominated Pareto front and
(c) the clustering result of the green point in the nondominated Pareto front
23. IV. EXPERIMENTS OF MOEACDM
Table 5 sketches the variations of J1, J2, the cluster number, the Rand value and the F-measure
across ten different Pareto points. As can be seen from this table, the value of J1
decreases while the value of J2 increases. As always, we need
to find the trade-off between the two objective functions. For this dataset, the
Path distance is more dominant than the Euclidean distance; therefore, the best
clustering result is the first Pareto point, which is the minimum of J2. Furthermore,
as J1 decreases and J2 increases, the cluster number moves further away
from the correct cluster number, and the Rand value (F-measure) decreases.
TABLE 5: The cluster number, Rand index and F-measure corresponding to
different nondominated Pareto front points for Data_spiral
Figure 8 provides the nondominated Pareto front and three clustering results for
Data_spiral. Clearly, Figure 8(b) is the best clustering result, and it is the
first point in the nondominated Pareto front.
24. IV. EXPERIMENTS OF MOEACDM
25
C. DISCUSSION OF KMAX
The maximum cluster number expected, kmax, should be analyzed in this
subsection. It is an important parameter because it determines the size of search
space. On the one hand, a too large kmax will cause a waste of memory and
processing time. On the other hand, a too small kmax will increase the probability
of leaving k∗ out, thus becoming unreachable. Ideally, kmax = n is the best way,
however, the computation time is also high. We need to select a trade-off
between speed and accuracy of the proposed method.
Figure 9 plots the cluster number obtained using different kmax values
for four test datasets. The kmax value changes from 2 to 20 with a step size of
one. As can be seen from these figures, if kmax is smaller than k∗, kmax itself
is detected as the cluster number. If kmax is larger than k∗, MOEACDM can always find
the optimal cluster number k∗ in all of the runs. The experimental results suggest
that MOEACDM is not sensitive to this parameter.
FIGURE 9: The desirable cluster number by using different values of kmax for four
test datasets
Table 6 summarizes the clustering results obtained by the aforementioned methods for the eight test datasets. Each algorithm is
executed ten times on each dataset. CN denotes the correct cluster number. AP(E) identifies the correct cluster number for
Data_separated1, Data_connected1, Iris and Soybean, because these four datasets have clusters with a spherical structure.
AP(P) identifies the correct cluster number for Data_separated1, Data_connected1, Data_spiral, Data_circle1, Spiral2 and Iris in
all runs; however, its Mean_R and Mean_F values for Data_separated1, Data_connected1 and Iris are not the best.
TABLE 6: Comparison of the cluster number obtained (CN),
the mean Rand index (Mean_R) and the mean F-measure (Mean_F) calculated
by the compared approaches
V. DISCUSSION AND CONCLUSIONS
Clustering is an important task for discovering hidden structures and is the subject of active research in several fields,
including information retrieval, finance, network management and medicine. The datasets in these fields may have a variety of
characteristics and different structures. Existing evolutionary clustering approaches optimize one objective function and
can hardly cope with datasets with multiple structures. In this paper, we propose a new multiobjective evolutionary
clustering framework to split datasets with different structures. Two objective functions, designed based on two distance measures,
are optimized simultaneously by a multiobjective evolutionary algorithm. This approach takes into consideration the
relationships between data points under two different distance measures; therefore, it can split complex structures. However,
there are two limitations that need to be addressed: first, the best solution is not chosen from the nondominated Pareto front
automatically, and second, the evolutionary algorithm is usually time-consuming. In the future, we will try to solve these
problems and apply the approaches to fields such as data analysis and image segmentation. In addition, testing other
combinations of distance measures is also future work.
TABLE 7: The average running time (seconds) of MOECDM and MOEACDM
with different (pop,max_g) for eight datasets
Paper: A general multiobjective clustering approach based on multiple distance measures
Part one: Introduction
Introduction of the multiple distance measure framework
Distance matrix definitions
Introduction of MOECDM
Introduction of MOEACDM
The MOECDM algorithm procedure
Experiments of MOECDM
Discussion and conclusions
1. Introduction
Clustering analysis is an unsupervised learning approach that can discover the hidden structures in a set of unlabeled data points. As an active topic, it is widely used in various research fields, such as data mining, pattern recognition, image segmentation and computer vision. The task of clustering is to divide a dataset into different groups so that objects in the same group have high similarity and objects belonging to different groups have low similarity. Various clustering methods have been proposed for particular purposes and applications. Partitional clustering, hierarchical clustering and density clustering are currently the three most popular clustering techniques.
2. The multiple distance measure framework
In this section, we describe the details of the two proposed multiple-measure clustering methods, MOECDM and MOEACDM. First of all, a suitable multiobjective evolutionary algorithm has to be designed. We use NSGA-II because of its fast nondominated sorting mechanism, and we apply a series of modifications so that it performs better for multiple-measure clustering. NSGA-II is a well-known multiobjective evolutionary algorithm, and other multiobjective evolutionary algorithms could also be considered here.
Secondly, we need to obtain the distance matrices using different distance measures; each matrix contains the relationship of every pair of objects in a particular distance space. We then describe the details of MOECDM, which can partition a dataset in different distance spaces at the same time.
A. Distance matrix definitions
In this subsection, we introduce the distance measures used in our method and derive the distance matrices.
X = {x1, x2, · · · , xn} is a set of n data points, where xi = (xi,1, xi,2, · · · , xi,h) ∈ Rh is an h-dimensional point in a continuous space. DM is the distance matrix, and dij is the distance between xi and xj. Different distance measures, such as the Euclidean distance, the path distance, the manifold distance and so on, can be used here. We then partition DM instead of the original dataset. The smaller the value of dij, the closer xi and xj are. Suppose that we divide the dataset X into k clusters C1, C2, · · · , Ck; |Ci| denotes the number of data points in cluster Ci.
Euclidean distance
The Euclidean distance between xi and xj is defined as dEij = (Σm=1..h (xi,m − xj,m)^2)^(1/2).
Path distance
The path distance between a pair of points is the minimum, over all paths connecting the pair, of the maximum edge distance along the path.
Πi,j = {π|π = π0π1 · · · πk, 1 ≤ πs ≤ n, π0 = i, πk = j}
denotes all possible paths linking xi and xj. The path distance between xi and xj is mathematically defined as
dPij = min over π ∈ Πi,j of max over 1 ≤ s ≤ k of dE(πs−1, πs).
The Euclidean distance matrix DM1 and the path distance matrix DM2 are defined as in Equation 3.
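As a concrete sketch of how DM1 and DM2 can be computed, the following NumPy code builds the Euclidean matrix directly and obtains the path (minimax) distances with a Floyd-Warshall-style relaxation; the function name and the relaxation scheme are our own illustrative choices, not prescribed by the paper.

```python
import numpy as np

def distance_matrices(X):
    """Compute the Euclidean distance matrix DM1 and the path
    (minimax) distance matrix DM2 for a data matrix X of shape (n, h)."""
    diff = X[:, None, :] - X[None, :, :]
    dm1 = np.sqrt((diff ** 2).sum(axis=2))      # pairwise Euclidean distances
    # Path distance: d_ij = min over paths of the maximum edge on the path.
    # A Floyd-Warshall-style relaxation computes all pairs in O(n^3).
    dm2 = dm1.copy()
    for k in range(len(X)):
        dm2 = np.minimum(dm2, np.maximum(dm2[:, k:k + 1], dm2[k:k + 1, :]))
    return dm1, dm2
```

For three collinear points at 0, 1 and 2, the Euclidean distance between the endpoints is 2, while the path distance is 1, since the chain through the middle point never uses an edge longer than 1.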
B. MOECDM
In this subsection, several key components of MOECDM are introduced: the representation scheme, the initialization scheme, the crossover operator, the mutation operator, the selection operator and the objective function design.
1) Representation and initialization
In this scheme, each individual r consists of n genes, r = {r1, r2, ..., rn}, and the value of each gene ri is an integer indicating that data point xi is assigned to the ri-th cluster. The range of ri is {1, ..., k}, where k is the cluster number.
All data objects with the same cluster label are then assigned to the same cluster. The label-based scheme has a major limitation: it is usually time-consuming because of its large gene size. To overcome this limitation and give the individuals a high probability of covering the search space, three strategies are used in this subsection to generate the initial population P(0) = {r1(0), ..., rpop(0)}, where rl(0) denotes an individual in the initial population and pop denotes the population size. One strategy is to generate individuals randomly, another is to generate individuals through NCUT pre-clustering based on the Euclidean distance, and the last is to generate individuals through NCUT pre-clustering based on the path distance.
The numbers of individuals created by the different strategies are controlled by three parameters, pop_α1, pop_α2 and pop_α3: pop_α1 * pop, pop_α2 * pop and pop_α3 * pop individuals are generated by the three strategies, respectively.
The initialization procedure can be seen in Algorithm 1. Through this scheme, a set of individuals characterized by the two distance measures is generated in a single run of MOECDM.
Input: the cluster number k, the distance matrices,
the population size pop,
and the three tuning parameters
Output: the initial population
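To make Algorithm 1 concrete, here is a minimal sketch of the three-strategy initialization. The NCUT pre-clusterings are assumed to be computed elsewhere and passed in as label vectors, and the 5% perturbation of the pre-clustered copies is our own assumption, since the text does not specify how repeated copies are diversified.

```python
import numpy as np

def init_population(n, k, pop, labels_euclid, labels_path,
                    alphas=(0.5, 0.25, 0.25), seed=0):
    """Sketch of Algorithm 1: pop_a1*pop random individuals, plus
    pop_a2*pop and pop_a3*pop lightly perturbed copies of the two
    NCUT-style pre-clusterings (Euclidean / path distance)."""
    rng = np.random.default_rng(seed)
    n_rand = int(alphas[0] * pop)
    n_e = int(alphas[1] * pop)
    population = [rng.integers(1, k + 1, size=n) for _ in range(n_rand)]
    for labels, count in ((labels_euclid, n_e), (labels_path, pop - n_rand - n_e)):
        for _ in range(count):
            ind = np.array(labels, dtype=int)
            flip = rng.random(n) < 0.05           # perturb ~5% of the genes
            ind[flip] = rng.integers(1, k + 1, size=flip.sum())
            population.append(ind)
    return population
```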
2) Objective function design
In most existing papers, a number of objective function combinations have been proposed to increase the accuracy of the clustering result. However, most of these combinations are designed based on a single distance measure, such as the Euclidean distance. It is very difficult to obtain a reliable result for datasets with complex structures.
To solve this problem, in this subsection we propose a new multiple-measure combination strategy: two objective functions are designed in two different distance spaces and optimized simultaneously by a multiobjective evolutionary algorithm.
It is well known that the basic rule of clustering is to minimize the compactness of each cluster, which is usually computed from the distances between the objects and the cluster centroids or medoids. However, the centroid of a cluster with an irregular structure is invalid, and the medoid is hard to obtain with many existing encoding schemes in evolutionary algorithms. Therefore, the objective functions here minimize the distance between each pair of objects in the same cluster, and they can be written as Equation 4,
where d1xy and d2xy denote the two distances between objects x and y, respectively. The clustering result is obtained by minimizing JF1 and JF2 simultaneously.
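A sketch of how Equation 4 can be evaluated, given a label vector and the two distance matrices; any per-cluster normalization that the paper's exact Eq. 4 may apply is omitted here.

```python
import numpy as np

def within_cluster_objectives(labels, dm1, dm2):
    """Sum the pairwise distances between objects that share a cluster
    label, once per distance space (JF1 over DM1, JF2 over DM2)."""
    labels = np.asarray(labels)
    j1 = j2 = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        block = np.ix_(idx, idx)
        j1 += dm1[block].sum() / 2.0   # each pair counted once
        j2 += dm2[block].sum() / 2.0
    return j1, j2
```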
3) Crossover operator
In this step, each individual is selected to cross with a crossover partner with probability pc. The crossover partner has to be chosen from the remaining individuals. We use two strategies to select the crossover partner: the first is to choose it from all current individuals, and the second is to choose it from the elite individuals.
Only one strategy can be used in one crossover process. The first strategy can promote the diversity of the new population, and the second strategy can drive the individuals quickly toward the global best point.
In each generation, the strategy is selected depending on a parameter pc_α(g), which is obtained by Equation 5. From this formula we can see that the value of pc_α(g) decreases within the range [exp(1 − max_g), 1] as g increases. In the early stage, the first strategy is selected with a larger probability; in the later stage, the second strategy is selected with a larger probability. The details of the crossover operator are shown in Algorithm 2.
Algorithm inputs:
pc: the crossover probability
the population of generation g
Output: the crossover population
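The two-strategy crossover can be sketched as below. The uniform gene-exchange mask, the schedule pc_alpha(g) = exp(1 − g) (one formula consistent with the stated range [exp(1 − max_g), 1]), and the use of the lowest-ranked front as the elite partner pool are all our own assumptions for illustration.

```python
import numpy as np

def crossover(population, ranks, pc, g, rng=None):
    """Sketch of Algorithm 2: each individual crosses with probability
    pc; the partner pool is the whole population (strategy 1) or the
    current best-ranked individuals (strategy 2), switched by the
    decreasing schedule pc_alpha(g)."""
    rng = rng or np.random.default_rng()
    pc_alpha = np.exp(1.0 - g)                 # ~1 early, exp(1 - max_g) late
    best_rank = min(ranks)
    elites = [ind for ind, r in zip(population, ranks) if r == best_rank]
    children = []
    for ind in population:
        child = ind.copy()
        if rng.random() < pc:
            pool = population if rng.random() < pc_alpha else elites
            partner = pool[rng.integers(len(pool))]
            mask = rng.random(len(ind)) < 0.5  # uniform crossover mask
            child[mask] = partner[mask]
        children.append(child)
    return children
```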
4) Mutation operator
The mutation operator is executed after the crossover operation. It can be used to increase the diversity of the population. The value of each gene rcjl(g + 1) is an integer from {1, ..., k} and is changed with probability pm. In this step, we also design two strategies to choose from.
One is to change rcjl(g + 1) to another integer from
{1, ..., rcjl(g + 1) − 1, rcjl(g + 1) + 1, ..., k},
each with probability 1/(k − 1); the other is to change it to the corresponding gene of rbest, the best individual evaluated up to the current generation. pm_α(g) is the selection parameter that determines which strategy is chosen, and it can be computed by Equation 6. The mutation operator procedure is shown in Algorithm 3.
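A matching sketch of Algorithm 3; pm_alpha(g) = exp(1 − g) is assumed here to mirror the crossover schedule, since the paper's exact Equation 6 is not reproduced in the text.

```python
import numpy as np

def mutate(population, best, k, pm, g, rng=None):
    """Sketch of Algorithm 3: each gene mutates with probability pm,
    either to a uniformly chosen different label in {1, ..., k}
    (strategy 1) or to the corresponding gene of the best individual
    found so far (strategy 2)."""
    rng = rng or np.random.default_rng()
    pm_alpha = np.exp(1.0 - g)
    mutated = []
    for ind in population:
        new = ind.copy()
        for j in range(len(new)):
            if rng.random() < pm:
                if rng.random() < pm_alpha:
                    # Strategy 1: any other label, each with prob 1/(k-1).
                    choices = [c for c in range(1, k + 1) if c != new[j]]
                    new[j] = choices[rng.integers(len(choices))]
                else:
                    # Strategy 2: copy the gene from the best individual.
                    new[j] = best[j]
        mutated.append(new)
    return mutated
```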
5) Selection operator
After the mutation operator, two population sets can be obtained: the current population P(g) = {r1(g), ..., rpop(g)} and the mutation population Pm(g + 1) = {rm1(g + 1), ..., rmpop(g + 1)}. To keep the population size equal across generations and to ensure that good solutions are preserved, a selection operator has to be executed to retain individuals with high fitness and remove individuals with low fitness. MOECDM employs the selection operator proposed in NSGA-II to achieve this goal. In this selection operator, all individuals (P(g) ∪ Pm(g + 1)) are ranked based on nondominated sorting and a crowding distance estimation, a measure that can distinguish the importance of individuals with the same rank. Then the next-generation population P(g + 1) = {r1(g + 1), ..., rpop(g + 1)} is obtained.
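The NSGA-II survival step that MOECDM reuses can be sketched as follows: rank the merged parent and mutant objective vectors by nondominated sorting (both objectives minimized), fill the next generation front by front, and break ties within the last admitted front by crowding distance. This compact version is an illustration, not the paper's exact implementation.

```python
import numpy as np

def nsga2_select(objs, pop_size):
    """Return the indices of the pop_size survivors among the given
    objective vectors, NSGA-II style."""
    objs = np.asarray(objs, dtype=float)
    n = len(objs)
    # dominated_by[i] lists every j that dominates i (minimization).
    dominated_by = [
        [j for j in range(n)
         if np.all(objs[j] <= objs[i]) and np.any(objs[j] < objs[i])]
        for i in range(n)
    ]
    remaining, fronts = set(range(n)), []
    while remaining:
        front = [i for i in sorted(remaining)
                 if not any(j in remaining for j in dominated_by[i])]
        fronts.append(front)
        remaining -= set(front)
    survivors = []
    for front in fronts:
        if len(survivors) + len(front) <= pop_size:
            survivors += front
            continue
        # Crowding distance inside the overflowing front.
        crowd = np.zeros(len(front))
        for m in range(objs.shape[1]):
            order = np.argsort(objs[front, m])
            crowd[order[0]] = crowd[order[-1]] = np.inf  # boundary points win
            span = objs[front[order[-1]], m] - objs[front[order[0]], m] or 1.0
            for a in range(1, len(front) - 1):
                crowd[order[a]] += (objs[front[order[a + 1]], m]
                                    - objs[front[order[a - 1]], m]) / span
        picked = np.argsort(-crowd)[: pop_size - len(survivors)]
        survivors += [front[i] for i in picked]
        break
    return survivors
```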
C. MOEACDM
An important point is that MOECDM makes sense when the cluster number k is fixed. However, it does not work when k is variable. In fact, it is easy to see that JF1 and JF2 can be minimized by increasing k, and in some cases the two functions become equal to zero. Many researchers tend to identify this parameter automatically by searching for a balance between intra-cluster compactness and inter-cluster separation. In this subsection, we also try to obtain this parameter automatically by designing suitable objective functions in the different distance spaces. The representation, the crossover operator, the mutation operator and the selection operator are the same as in MOECDM. We will discuss the steps that differ, including the initialization strategy and the objective function design.
1) Representation and initialization
A small difference between MOEACDM and MOECDM is that the range of each gene is {1, ..., k} in MOECDM and {1, ..., kmax} in MOEACDM, where kmax denotes the maximum expected cluster number. Unlike in MOECDM, the best k∗ is unknown in MOEACDM.
Hence, the initial population is generated by NCUT not only with different distance measures but also with different cluster numbers. The initialization procedure for MOEACDM is shown in Algorithm 4.
2) Objective function design
In many papers, the "validity index" strategy is widely used; it can identify the desirable cluster number by searching for a balance between the compactness of each cluster and the separation between each pair of clusters. However, most existing validity indices are not applicable here. We will try to design a new objective function combination in which each objective function is based on one particular distance measure. In addition, each objective function also needs to consider the balance between compactness and separation in order to identify the desirable cluster number automatically. Our inspiration comes from the Modularity method, a common method for detecting network structures from the relationship of each pair of nodes. Based on this idea, the two new objective functions in the different distance spaces are rewritten as Equation 7,
where the definitions of d1xy and d2xy are the same as before.
D. The MOECDM (or MOEACDM) algorithm procedure
The detailed procedure of the MOECDM (MOEACDM) algorithm is shown in Figure 1:
Part 1: Data and parameter input
Clustering parameters: the cluster number k for MOECDM, or the upper bound of the cluster number kmax for MOEACDM; the dataset X.
Evolutionary parameters: the population size pop, the maximum number of generations max_g, the initialization parameters pop_α1, pop_α2 and pop_α3, the crossover parameters pc and pc_α, the mutation parameters pm and pm_α; g = 0.
Part 2: Preparation
Compute the Euclidean distance matrix DM1 and the path distance matrix DM2.
Create the initial population P(0) using Algorithm 1 for MOECDM or Algorithm 4 for MOEACDM.
Compute the two objective functions, J1 and J2, for each individual in the initial population.
Part 3: Evolutionary operators
While g < max_g:
Nondominated sorting: perform nondominated sorting on the population and assign a rank to each individual; compute the crowding distance for each individual.
Generate the crossover population Pc(g + 1) using Algorithm 2.
Generate the mutation population Pm(g + 1) using Algorithm 3.
Select the next-generation population P(g + 1) by the nondominated sorting scheme and the crowding distance scheme.
Once g reaches max_g:
Perform nondominated sorting on the final population and obtain the final Pareto solutions.
3. Experiments of MOECDM
In this section, we test the effectiveness of the proposed method, MOECDM, in combining different distance measures. The details are introduced as follows.
A. Datasets used
To demonstrate the advantage of MOECDM in discovering different structures, we use 16 datasets to test the proposed method. These datasets can be divided into four groups: spherical datasets, irregular datasets, shape-type datasets and real-world datasets. The spherical datasets include four datasets named Data_separated1, Data_separated2, Data_connected1 and Data_connected2, as shown in Figure 2. The clusters in each of these datasets have a spherical structure and can be used to test the Euclidean relationship in MOECDM. The irregular-type datasets include the four datasets shown in Figure 3: Data_spiral, Data_rect, Data_circle1 and Data_circle2. These four datasets have clusters with complex structures and can be used to test the path relationship in MOECDM.
These two groups are synthetic datasets. The shape-type datasets contain R15, Pathbased, Jain and Spiral2, as shown in Figure 4. These four datasets are widely used in many other papers. The real-world datasets include four datasets named Iris, Soybean, Wine and Glass, taken from the popular UCI repository. They can be used to test the validity of MOECDM on real-world data. The details of these datasets in terms of the number of objects, the number of dimensions and the number of clusters are summarized in Table 1. Next, two experiments are run on these datasets. The first experiment shows the nondominated Pareto front and the corresponding clustering results, and the second compares MOECDM with some other clustering methods, described later in this section.
B. Parameter determination
The clustering results are affected by the parameter settings, which should be chosen carefully based on theoretical evidence. The population size (pop) and the maximum number of generations (max_g) are two important parameters in an evolutionary algorithm. For pop, according to the introduction in Section II-B1, the population should include individuals generated using the three strategies; hence it must satisfy pop > 3. We set pop = 200 in MOECDM, which can be confirmed to be suitable for storing all the nondominated solutions. For max_g, it is clear that the accuracy of the algorithm improves as this parameter increases. Therefore, we set max_g = 200 in this experiment. The values of pop_α1, pop_α2 and pop_α3 are set to 0.5, 0.25 and 0.25, respectively; the reason is that we want the randomly generated individuals to outnumber those of the other two strategies. For the crossover operator and the mutation operator, the values of pc, pc_α, pm and pm_α are all set to 1. The reason is that the new population P(g + 1) is selected from P(g) ∪ Pm(g + 1); hence Pm(g + 1) should differ substantially from P(g).
The parameters used for MOECDM in our experimental study are given in Table 2.
C. Performance evaluation
To evaluate the performance of the clustering algorithms quantitatively, two indices, the Rand index (R) and the F-measure (F), are used here. They are defined as Equations 8 and 9, respectively.
SS is the number of point pairs with the same class label belonging to the same cluster, SD is the number of point pairs with different class labels belonging to the same cluster, DS is the number of point pairs with the same class label belonging to different clusters, and DD is the number of point pairs with different class labels belonging to different clusters. Note that the higher the value of R (F), the better the clustering.
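Equation 8's Rand index can be computed directly from the four pair counts, as in this small sketch (the F-measure of Equation 9 needs the per-cluster precision and recall and is not reproduced here):

```python
from itertools import combinations

def rand_index(true_labels, pred_labels):
    """Rand index R = (SS + DD) / (SS + SD + DS + DD), classifying
    every pair of points by class-label and cluster-label agreement."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = pred_labels[i] == pred_labels[j]
        if same_class and same_cluster:
            ss += 1
        elif not same_class and same_cluster:
            sd += 1
        elif same_class and not same_cluster:
            ds += 1
        else:
            dd += 1
    return (ss + dd) / (ss + sd + ds + dd)
```

A perfect clustering scores 1.0 regardless of how the cluster labels are named, since only pair agreement matters.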
D. Analysis of MOECDM
In this subsection, two datasets are chosen to show the nondominated Pareto front and the corresponding clustering results. To demonstrate the proposed method more effectively, the clusters of the two datasets should have different structures. Hence, the first dataset is Data_connected1 and the second is Data_circle1.
The first chosen dataset is Data_connected1, shown in Figure 2. Each cluster in this dataset is generated uniformly from a hyperspherical region, and the clusters are connected to each other. The Pareto front and several clustering results are presented in Figure 5. We can obtain ten clustering results corresponding to the nondominated Pareto front. From left to right in Figure 5(a), J1 decreases as J2 increases. We select three meaningful partitions to analyze this nondominated Pareto front. Figure 5(b) corresponds to the red point of the nondominated Pareto front; it is the best clustering result, and its Rand index (F-measure) is 0.96 (0.94). Figure 5(c) corresponds to the black point, which represents the maximum J1 and the minimum J2.
Hence this clustering result is not ideal. Figure 5(d) corresponds to the green point, which has the minimum J1 and the maximum J2. For this clustering result, the Euclidean distance dominates the path distance, and the Rand index (F-measure) is 0.94 (0.92). Data_circle1 is shown in Figure 3. Unlike Data_connected1, this dataset contains two clusters with a complex structure. The nondominated Pareto front obtained by MOECDM for Data_circle1 and several clustering results are presented in Figure 6. We also select three meaningful results to analyze the nondominated Pareto front. The red Pareto point corresponds to Figure 6(b), which is the best, with a Rand index (F-measure) of 1.00 (1.00). The green Pareto point corresponds to Figure 6(d), which is the worst result; in this clustering result, the Euclidean distance dominates the path distance, and the Rand index (F-measure) is 0.50 (0.50). The black point lies between the red and green points, and its Rand index (F-measure) is 0.67 (0.66). This experiment shows that the path distance is suitable for Data_circle1.
FIGURE 5:
(a) The Pareto front of Data_connected1,
(b) The clustering result of the red point in the nondominated Pareto front,
(c) The clustering result of the black point in the nondominated Pareto front, and
(d) The clustering result of the green point in the nondominated Pareto front
FIGURE 6:
(a) The Pareto front of Data_circle1,
(b) The clustering result of the red point in the nondominated Pareto front,
(c) The clustering result of the black point in the nondominated Pareto front, and
(d) The clustering result of the green point in the nondominated Pareto front
The two experiments show that MOECDM can discover the hidden distribution of clusters with both spherical and irregular structures.
E. Comparison with other methods
In this subsection, MOECDM is compared with three other methods: Kmeans, FCM and NCUT. Kmeans and FCM are two traditional clustering techniques designed based on the Euclidean distance. NCUT is a spectral clustering method and can partition relational datasets well. A relational dataset describes the distance between each pair of objects; hence, different distance measures and their combinations can be used to set up different matrices. Here, four strategies are applied to create the matrices:
(1) the Euclidean distance (NCUT(E));
(2) the path distance (NCUT(P));
(3) the combination of the Euclidean distance and the path distance (NCUT(E + P)); and
(4) the Euclidean distance and the path distance each normalized to the range [0, 1] (NCUT(Norm(E) + Norm(P))).
Table 3 presents the comparison results of the aforementioned algorithms. Each algorithm is run twenty times on each dataset. Mean_R, Max_R, Mean_F and Max_F denote the average Rand index, the maximum Rand index, the average F-measure and the maximum F-measure over the twenty runs, respectively.
TABLE 3: Comparison of the Rand index and F-measure calculated by the compared approaches
In our experiment, kmin is set to 2. It is difficult to choose the best kmax. This parameter determines the size of the search space, which is kmax^n for a dataset of size n; the search space grows exponentially as kmax increases. For simplicity, we set kmax to 10 and discuss this choice in Section IV-C. Apart from kmax, max_g and pop also need to be considered. Since the search space of MOEACDM is kmax^n, which is larger than the k^n of MOECDM, max_g should be larger than in MOECDM. Based on the above analysis, we set max_g = 500. For pop, according to Algorithm 4, the population should include the individuals created by the different strategies; in other words, pop should satisfy Equation 10,
where NIR, NIE and NIP denote the number of individuals generated randomly, the number generated based on the Euclidean distance, and the number generated based on the path distance, respectively. β is a parameter that prevents the initial population from relying on a pre-clustering result produced by a single run, and it should satisfy β > 2. Generally, the three parameters should satisfy NIR = NIE + NIP and NIE = NIP = (kmax − kmin + 1) · β. Hence, pop can be set to 2 * 2 * (kmax − kmin + 1) * β. We set pop to 500 in this experiment.
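The sizing rule behind Equation 10 can be made concrete in a few lines; NIE = NIP = (kmax − kmin + 1) · β and NIR = NIE + NIP are taken from the text, while the helper name and the example values of β are illustrative only.

```python
def population_bound(kmax, kmin, beta):
    """pop = NIR + NIE + NIP = 2 * 2 * (kmax - kmin + 1) * beta."""
    nie = nip = (kmax - kmin + 1) * beta
    nir = nie + nip   # random individuals match the pre-clustered ones in number
    return nir + nie + nip
```

With kmax = 10 and kmin = 2, for example, any β up to 13 keeps the bound within the pop = 500 used in the experiments (4 · 9 · 13 = 468).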
D. Differences between MOECDM and MOEACDM
In this subsection, we discuss the differences and the relationship between the two approaches, MOECDM and MOEACDM. The steps to be considered include the representation, the initialization, the crossover operator, the mutation operator, the selection operator and the objective function design. For the representation, the crossover operator, the mutation operator and the selection operator, the two approaches apply the same strategies. For the initialization, the initial population of MOECDM is created by the NCUT pre-clustering method with a fixed cluster number k, while that of MOEACDM is generated through the same pre-clustering method but with a variable k in the range {kmin, ..., kmax}.
For the objective function design, MOECDM only considers the intra-cluster compactness, as in Equation 4, whereas MOEACDM considers not only the intra-cluster compactness but also the inter-cluster separation, as in Equation 7. Therefore, the running time of MOEACDM is higher than that of MOECDM. Next, we also compare the running times of the two approaches, MOECDM and MOEACDM, using the same parameters (pop, max_g) on the test datasets. The results are listed in Table 7. Each test dataset is run in 10 independent runs by the two approaches with different parameters. The two approaches are programmed in Matlab R2014b, and the computer used for testing is equipped with a 2.5 GHz Intel Core CPU and 4 GB of memory. From Table 7 we can see that the computation time of MOECDM is lower than that of MOEACDM with the same parameters (pop, max_g) on most test datasets. The experimental results confirm the above analysis.