There is a tremendous proliferation of information available on the largest shared information source, the World Wide Web. Fast, high-quality clustering algorithms play an important role in helping users effectively navigate, summarize, and organize this information. Recent studies have shown that partitional clustering algorithms such as k-means are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data. These cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Subtractive + PSO (Particle Swarm Optimization) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive + PSO clustering algorithm, PSO, and Subtractive clustering to three different datasets. The results illustrate that the Subtractive + PSO clustering algorithm generates the most compact clustering results compared to the other algorithms.
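The one-pass subtractive step this abstract relies on can be sketched as follows. The function name and the defaults (`ra`, the neighbourhood radius, and the `eps` stopping ratio) are illustrative choices following Chiu's formulation, not values taken from the paper:

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, eps=0.15):
    """Estimate cluster centers by the subtractive method.

    ra: neighbourhood radius; eps: stopping ratio (illustrative defaults).
    """
    alpha = 4.0 / ra ** 2
    rb = 1.5 * ra                  # revised radius is conventionally 1.5 * ra
    beta = 4.0 / rb ** 2
    # potential of each point = sum of Gaussian influences of all points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-alpha * d2).sum(1)
    centers = []
    P1 = P.max()
    while True:
        k = int(P.argmax())
        if P[k] < eps * P1 and centers:
            break                  # remaining potential too low: stop
        centers.append(X[k].copy())
        # subtract the chosen center's influence from every potential
        P = P - P[k] * np.exp(-beta * ((X - X[k]) ** 2).sum(1))
    return np.array(centers)
```

On two well-separated groups of points this recovers one center per group without being told the number of clusters in advance.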
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis... (ijsrd.com)
A cluster is a group of objects that are similar to each other within the cluster and dissimilar to the objects of other clusters. Similarity is typically calculated on the basis of the distance between two objects or clusters; two or more objects belong to the same cluster only if they are close to each other by this distance. The major objective of clustering is to discover collections of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective clustering algorithm for unlabeled data that produces both membership and typicality values during the clustering process. In this approach, the efficiency of Fuzzy Possibilistic C-Means clustering is enhanced by using the penalized and compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering, with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approach is evaluated on University of California, Irvine (UCI) machine learning repository datasets such as Iris, Wine, Lung Cancer and Lymphography. The metrics used for evaluation are clustering accuracy, Mean Squared Error (MSE), execution time and convergence behavior.
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ... (ijcseit)
Many studies have been done in the area of Wireless Sensor Networks (WSNs) in recent years. In this kind of network, some of the key objectives that need to be satisfied are area coverage, the number of active sensors, and the energy consumed by nodes. In this paper, we propose an NSGA-II based multi-objective algorithm for optimizing all of these objectives simultaneously. The efficiency of our algorithm is demonstrated in the simulation results: it finds the optimal balance point among the maximum coverage rate, the least energy consumption, and the minimum number of active nodes while maintaining the connectivity of the network.
In the machine learning community there is a recent trend of constructing the nonlinear version of a linear algorithm through the "kernel method", for example kernel principal component analysis, kernel Fisher discriminant analysis, Support Vector Machines (SVMs), and the current kernel clustering algorithms. Typically, in unsupervised kernel clustering algorithms, a nonlinear mapping is applied first to map the data into a much higher-dimensional feature space, and clustering is then executed there. A drawback of these kernel clustering algorithms is that the cluster prototypes reside in the high-dimensional feature space, and therefore lack intuitive, clear descriptions unless an additional approximate projection from the feature space back to the data space is used, as done in the existing literature. In this paper, by using the "kernel method", a novel clustering algorithm founded on the conventional fuzzy c-means algorithm (FCM) is proposed, called the kernel fuzzy c-means algorithm (KFCM). This method adopts a new kernel-induced metric in the data space to replace the original Euclidean norm, so the cluster prototypes of the fuzzy clustering algorithm still reside in the data space and the clustering results can be interpreted in the original space. This property makes the algorithm suitable for clustering incomplete data. Experiments on synthetic data illustrate that KFCM has better clustering performance and is more robust than other variants of FCM for clustering incomplete data.
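The kernel-induced metric idea can be sketched as follows for a Gaussian kernel, where the distance 1 − K(x, v) replaces the squared Euclidean norm while the prototypes stay in the data space. The function name, the deterministic initialization, and all defaults are illustrative, not the paper's exact procedure:

```python
import numpy as np

def kfcm(X, c=2, m=2.0, sigma=1.0, iters=50):
    """Sketch of kernel fuzzy c-means with a Gaussian kernel.

    The prototypes V remain points in the original data space, which is
    the interpretability property the abstract highlights.
    """
    n = len(X)
    V = X[np.linspace(0, n - 1, c).astype(int)].astype(float)  # spread init over data order
    for _ in range(iters):
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian kernel values
        dist = np.maximum(1.0 - K, 1e-12)             # kernel-induced distance
        U = dist ** (-1.0 / (m - 1.0))
        U = U / U.sum(1, keepdims=True)               # fuzzy memberships, rows sum to 1
        W = (U ** m) * K
        V = (W.T @ X) / W.sum(0)[:, None]             # prototype update in data space
    return U, V
```

On well-separated groups the returned prototypes land near the group means, directly interpretable as points of the input space.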
Critical Paths Identification on Fuzzy Network Project (iosrjce)
In this paper, a new approach for identifying the fuzzy critical path is presented, based on converting the fuzzy network project into a deterministic network project by transforming the parameter set of the fuzzy activities into the probability density function (PDF) of each fuzzy activity time. A case study is considered as a numerical test problem to demonstrate our approach.
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST... (acijjournal)
Cluster analysis of graph-related problems is an important issue nowadays. Different types of graph clustering techniques have appeared in the field, but most of them are vulnerable in terms of effectiveness and fragmentation of output in real-world applications across diverse systems. In this paper, we provide a comparative behavioural analysis of the RNSC (Restricted Neighbourhood Search Clustering) and MCL (Markov Clustering) algorithms on power-law distribution graphs. RNSC is a graph clustering technique using stochastic local search; it tries to achieve an optimal-cost clustering by assigning cost functions to the set of clusterings of a graph. The algorithm was implemented by A. D. King only for undirected, unweighted random graphs. MCL, another popular graph clustering algorithm, is based on a stochastic flow simulation model for weighted graphs. Power-law, or scale-free, graphs have plentiful applications in nature and society. Scale-free topology is stochastic, i.e. nodes are connected in a random manner. Complex network topologies such as the World Wide Web, the web of human sexual contacts, or the chemical network of a cell basically follow a power-law distribution when representing different real-life systems. This paper uses real large-scale power-law distribution graphs to analyze the performance of RNSC compared with Markov clustering (MCL). Extensive experimental results on several synthetic and real power-law distribution datasets reveal the effectiveness of our comparative performance measurement of these algorithms on the basis of clustering cost, cluster size, modularity index of the clustering results, and normalized mutual information (NMI).
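As background, the expansion/inflation loop at the heart of MCL can be sketched in a few lines. The parameter defaults (`e=2` for expansion, `r=2` for inflation) are the commonly used ones, not settings from this paper's experiments:

```python
import numpy as np

def mcl(A, e=2, r=2.0, iters=50, tol=1e-6):
    """Minimal Markov clustering sketch: alternate expansion and inflation.

    A is a symmetric non-negative adjacency matrix; self-loops are added,
    as is customary, to damp oscillation.
    """
    M = A.astype(float) + np.eye(len(A))   # add self-loops
    M = M / M.sum(0)                       # make columns stochastic
    for _ in range(iters):
        prev = M.copy()
        M = np.linalg.matrix_power(M, e)   # expansion: flow spreads along paths
        M = M ** r                         # inflation: strong flow gets stronger
        M = M / M.sum(0)
        if np.abs(M - prev).max() < tol:
            break
    # each column's heaviest row is the attractor node of its cluster
    return M.argmax(0)
```

On the canonical toy example of two triangles joined by a single edge, the flow separates into two clusters, one per triangle.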
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
At the present day, a huge amount of data is generated every minute and transferred frequently. Although the data is sometimes static, most commonly it is dynamic and transactional, and newly generated data is constantly added to the old/existing data. To discover knowledge from this incremental data, one approach is to run the algorithm repeatedly on the modified data sets, which is time-consuming. Moreover, to analyze the datasets properly, an efficient classifier model must be constructed, whose objective is to assign unlabeled data to appropriate classes. This paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to generate a reduced attribute set, the dynamic reduct, together with an optimization algorithm that uses the reduct to build the corresponding classification system. The method analyzes new data as it becomes available and modifies the reduct to fit the entire dataset, from which interesting optimal classification rule sets are generated. The concepts of discernibility relation, attribute dependency and attribute significance from Rough Set Theory are integrated to generate the dynamic reduct set, and optimal classification rules are selected using the PSO method, which not only reduces complexity but also helps achieve higher accuracy of the decision system. The proposed method has been applied to benchmark datasets collected from the UCI repository: the dynamic reduct is computed, and optimal classification rules are generated from it. Experimental results show the efficiency of the proposed method.
Research Inventy : International Journal of Engineering and Science (inventy)
Research Inventy : International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an open access journal, online as well as in print, that provides rapid monthly publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic and computer engineering, as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers are published by a rapid process within 20 days after acceptance, and the peer review process takes only 7 days. All articles published in Research Inventy are peer-reviewed.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne... (Scientific Review)
The Radial Basis Probabilistic Neural Network (RBPNN) has a broad generalization capability and has been successfully applied in multiple fields. In this paper, the Euclidean distance of each data point in the RBPNN is extended by calculating its kernel-induced distance instead of the conventional sum-of-squares distance. The kernel function generalizes the distance metric by measuring the distance between two data points as if they were mapped into a high-dimensional feature space. Comparing the four constructed classification models, Kernel RBPNN, Radial Basis Function networks, RBPNN and Back-Propagation networks, the results show that classification of the Iris data with Kernel RBPNN displays outstanding performance.
Experimental study of Data clustering using k-Means and modified algorithms (IJDKP)
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents the results of an experimental study of different approaches to k-Means clustering, comparing results on different datasets using the original k-Means and other modified algorithms implemented in MATLAB R2009b. The results are reported for performance measures such as number of iterations, number of points misclassified, accuracy, Silhouette validity index and execution time.
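One of the reported measures, the Silhouette validity index, can be computed as follows; this is a straightforward textbook implementation, not the authors' MATLAB code:

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient: for each point, a is the mean distance
    to its own cluster, b the smallest mean distance to any other cluster,
    and the point's score is (b - a) / max(a, b)."""
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise distances
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False                      # exclude the point itself
        if not own.any():
            scores.append(0.0)              # singleton clusters score 0 by convention
            continue
        a = D[i, own].mean()
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Values near 1 indicate tight, well-separated clusters; values below 0 indicate points assigned to the wrong cluster.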
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING (cscpconf)
Data clustering is a process of arranging similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is higher than that among groups. In this paper a hybrid clustering algorithm based on K-means and K-harmonic means (KHM) is described. The proposed algorithm is tested on five different datasets. The research is focused on fast and accurate clustering, and the algorithm's performance is compared with the traditional K-means and KHM algorithms. The results obtained from the proposed hybrid algorithm are much better than those of the traditional K-means and KHM algorithms.
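The K-harmonic means component differs from k-means in its objective: each point contributes the harmonic average of its distances to all centers, which softens the dependence on initialization. A minimal sketch of that performance function follows; the function name and the distance floor are illustrative:

```python
import numpy as np

def khm_objective(X, C, p=2):
    """K-harmonic-means performance function: sum over data points of the
    harmonic average of their distances to all k centers."""
    d = np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1))
    d = np.maximum(d, 1e-12)               # floor avoids division by zero
    k = len(C)
    # harmonic average per point: k / sum_j (1 / d_ij^p), then summed
    return float((k / (1.0 / d ** p).sum(1)).sum())
```

Centers placed on the data yield a near-zero objective, while poorly placed centers score much worse, which is what the hybrid algorithm exploits when refining K-means solutions.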
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures (MLAI2)
Regularization and transfer learning are two popular techniques to enhance generalization on unseen data, which is a fundamental problem of machine learning. Regularization techniques are versatile, as they are task- and architecture-agnostic, but they do not exploit a large amount of data available. Transfer learning methods learn to transfer knowledge from one domain to another, but may not generalize across tasks and architectures, and may introduce new training cost for adapting to the target task. To bridge the gap between the two, we propose a transferable perturbation, MetaPerturb, which is meta-learned to improve generalization performance on unseen data. MetaPerturb is implemented as a set-based lightweight network that is agnostic to the size and the order of the input, which is shared across the layers. Then, we propose a meta-learning framework, to jointly train the perturbation function over heterogeneous tasks in parallel. As MetaPerturb is a set-function trained over diverse distributions across layers and tasks, it can generalize to heterogeneous tasks and architectures. We validate the efficacy and generality of MetaPerturb trained on a specific source domain and architecture, by applying it to the training of diverse neural architectures on heterogeneous target datasets against various regularizers and fine-tuning. The results show that the networks trained with MetaPerturb significantly outperform the baselines on most of the tasks and architectures, with a negligible increase in the parameter size and no hyperparameters to tune.
An Efficient Clustering Method for Aggregation on Data Fragments (IJMER)
Clustering is an important step in the process of data analysis, with applications to numerous fields. Clustering ensembles have emerged as a powerful technique for combining different clustering results to obtain a quality clustering. Existing clustering aggregation algorithms are applied directly to the full set of data points, and they become inefficient when the number of data points is large. This project defines an efficient approach to clustering aggregation based on data fragments, where a data fragment is any subset of the data. To increase efficiency, clustering aggregation is performed directly on data fragments; under a comparison measure and a normalized mutual information measure, enhanced versions of three clustering aggregation algorithms (Agglomerative, Furthest, and Local Search) are described. The fragment-based approach achieves minimal computational complexity while increasing accuracy.
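One of the measures mentioned, normalized mutual information, can be computed directly from two label vectors. This is a generic implementation of the standard definition, not the paper's code:

```python
import numpy as np
from math import log

def nmi(a, b):
    """Normalized mutual information between two clusterings (label lists),
    normalized by the geometric mean of the two entropies."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)

    def entropy(x):
        _, cnt = np.unique(x, return_counts=True)
        p = cnt / n
        return float(-(p * np.log(p)).sum())

    I = 0.0
    for u in np.unique(a):
        for v in np.unique(b):
            nij = int(((a == u) & (b == v)).sum())   # joint count
            if nij:
                I += nij / n * log(n * nij / (int((a == u).sum()) * int((b == v).sum())))
    h = np.sqrt(entropy(a) * entropy(b))
    return I / h if h > 0 else 1.0
```

NMI is 1 for identical clusterings (even under label permutation) and 0 for independent ones, which makes it a natural agreement score for aggregation.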
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Comparison Between Clustering Algorithms for Microarray Data Analysis (IOSR Journals)
Currently, there are two techniques used for large-scale gene-expression profiling: microarray and RNA-Sequencing (RNA-Seq). This paper is intended to study and compare different clustering algorithms used in microarray data analysis. A microarray is an array of DNA molecules that allows multiple hybridization experiments to be carried out simultaneously and traces the expression levels of thousands of genes. It is a high-throughput technology for gene expression analysis and has become an effective tool for biomedical research. Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein microarrays, which enable researchers to investigate the expression state of a large number of genes. Data clustering represents the first and main process in microarray data analysis. The k-means, fuzzy c-means, self-organizing map, and hierarchical clustering algorithms are under investigation in this paper. These algorithms are compared based on their clustering models.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
In the present day huge amount of data is generated in every minute and transferred frequently. Although
the data is sometimes static but most commonly it is dynamic and transactional. New data that is being
generated is getting constantly added to the old/existing data. To discover the knowledge from this
incremental data, one approach is to run the algorithm repeatedly for the modified data sets which is time
consuming. Again to analyze the datasets properly, construction of efficient classifier model is necessary.
The objective of developing such a classifier is to classify unlabeled dataset into appropriate classes. The
paper proposes a dimension reduction algorithm that can be applied in dynamic environment for
generation of reduced attribute set as dynamic reduct, and an optimization algorithm which uses the
reduct and build up the corresponding classification system. The method analyzes the new dataset, when it
becomes available, and modifies the reduct accordingly to fit the entire dataset and from the entire data
set, interesting optimal classification rule sets are generated. The concepts of discernibility relation,
attribute dependency and attribute significance of Rough Set Theory are integrated for the generation of
dynamic reduct set, and optimal classification rules are selected using PSO method, which not only
reduces the complexity but also helps to achieve higher accuracy of the decision system. The proposed
method has been applied on some benchmark dataset collected from the UCI repository and dynamic
reduct is computed, and from the reduct optimal classification rules are also generated. Experimental
result shows the efficiency of the proposed method.
Research Inventy : International Journal of Engineering and Scienceinventy
Β
Research Inventy : International Journal of Engineering and Science is published by the group of young academic and industrial researchers with 12 Issues per year. It is an online as well as print version open access journal that provides rapid publication (monthly) of articles in all areas of the subject such as: civil, mechanical, chemical, electronic and computer engineering as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by rapid process within 20 days after acceptance and peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Scientific Review
Β
Radial Basis Probabilistic Neural Network (RBPNN) has a broader generalized capability that been successfully applied to multiple fields. In this paper, the Euclidean distance of each data point in RBPNN is extended by calculating its kernel-induced distance instead of the conventional sum-of squares distance. The kernel function is a generalization of the distance metric that measures the distance between two data points as the data points are mapped into a high dimensional space. During the comparing of the four constructed classification models with Kernel RBPNN, Radial Basis Function networks, RBPNN and Back-Propagation networks as proposed, results showed that, model classification on Iris Data with Kernel RBPNN display an outstanding performance in this regard.
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
Β
The k- Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease
and simplicity of implementation. Clustering algorithm has a broad attraction and usefulness in
exploratory data analysis. This paper presents results of the experimental study of different approaches to
k- Means clustering, thereby comparing results on different datasets using Original k-Means and other
modified algorithms implemented using MATLAB R2009b. The results are calculated on some performance
measures such as no. of iterations, no. of points misclassified, accuracy, Silhouette validity index and
execution time
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
Β
Data clustering is a process of arranging similar data into groups. A clustering algorithm
partitions a data set into several groups such that the similarity within a group is better than
among groups. In this paper a hybrid clustering algorithm based on K-mean and K-harmonic
mean (KHM) is described. The proposed algorithm is tested on five different datasets. The research is focused on fast and accurate clustering. Its performance is compared with the traditional K-means & KHM algorithm. The result obtained from proposed hybrid algorithm is much better than the traditional K-mean & KHM algorithm
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMLAI2
Β
Regularization and transfer learning are two popular techniques to enhance generalization on unseen data, which is a fundamental problem of machine learning. Regularization techniques are versatile, as they are task- and architecture-agnostic, but they do not exploit a large amount of data available. Transfer learning methods learn to transfer knowledge from one domain to another, but may not generalize across tasks and architectures, and may introduce new training cost for adapting to the target task. To bridge the gap between the two, we propose a transferable perturbation, MetaPerturb, which is meta-learned to improve generalization performance on unseen data. MetaPerturb is implemented as a set-based lightweight network that is agnostic to the size and the order of the input, which is shared across the layers. Then, we propose a meta-learning framework, to jointly train the perturbation function over heterogeneous tasks in parallel. As MetaPerturb is a set-function trained over diverse distributions across layers and tasks, it can generalize to heterogeneous tasks and architectures. We validate the efficacy and generality of MetaPerturb trained on a specific source domain and architecture, by applying it to the training of diverse neural architectures on heterogeneous target datasets against various regularizers and fine-tuning. The results show that the networks trained with MetaPerturb significantly outperform the baselines on most of the tasks and architectures, with a negligible increase in the parameter size and no hyperparameters to tune.
An Efficient Clustering Method for Aggregation on Data FragmentsIJMER
Β
Clustering is an important step in the process of data analysis with applications to numerous fields. Clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a quality cluster. Existing clustering aggregation algorithms are applied directly to large number of data points. The algorithms are inefficient if the number of data points is large. This project defines an efficient approach for clustering aggregation based on data fragments. In fragment-based approach, a data fragment is any subset of the data. To increase the efficiency of the proposed approach, the clustering aggregation can be performed directly on data fragments under comparison measure and normalized mutual information measures for clustering aggregation, enhanced clustering aggregation algorithms are described. To show the minimal computational complexity. (Agglomerative, Furthest, and Local Search); nevertheless, which increases the accuracy.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Comparison Between Clustering Algorithms for Microarray Data Analysis - IOSR Journals
Currently, two techniques are used for large-scale gene-expression profiling: microarray and RNA sequencing (RNA-Seq). This paper studies and compares different clustering algorithms used in microarray data analysis. A microarray is an array of DNA molecules that allows multiple hybridization experiments to be carried out simultaneously and traces the expression levels of thousands of genes. It is a high-throughput technology for gene-expression analysis and has become an effective tool for biomedical research. Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein microarrays, which enable researchers to investigate the expression state of a large number of genes. Data clustering is the first and main step in microarray data analysis. The k-means, fuzzy c-means, self-organizing map, and hierarchical clustering algorithms are investigated in this paper and compared on the basis of their clustering models.
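The k-means algorithm compared above can be sketched in a few lines. This is a generic illustration rather than the paper's implementation; the toy "expression matrix" and the farthest-point seeding are our own choices, made to keep the demo deterministic:

```python
import numpy as np

def init_centers(X, k):
    # Greedy farthest-point seeding: deterministic, well-spread starts.
    idx = [0]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - X[i], axis=1) for i in idx], axis=0)
        idx.append(int(d.argmax()))
    return X[idx].astype(float)

def kmeans(X, k, iters=50):
    centers = init_centers(X, k)
    for _ in range(iters):
        # Assign each row to its nearest center, then recompute the means.
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Toy "expression matrix": 6 genes x 4 conditions, two obvious groups.
X = np.array([[1, 1, 9, 9], [1, 2, 9, 8], [2, 1, 8, 9],
              [9, 9, 1, 1], [8, 9, 2, 1], [9, 8, 1, 2]], dtype=float)
labels, centers = kmeans(X, k=2)
```

The fuzzy c-means and SOM variants differ mainly in how the assignment step is softened or mapped onto a grid; the assign-then-update loop above is the common core.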
Extended pso algorithm for improvement problems k means clustering algorithm - IJMIT JOURNAL
Clustering is an unsupervised process and one of the most common data mining techniques. The purpose of clustering is to group similar data together, so that instances within a cluster are as similar to each other as possible and as dissimilar as possible from instances in other clusters. In this paper we focus on partitional k-means clustering, which, owing to its ease of implementation and high-speed performance on large data sets, is still very popular among clustering algorithms after 30 years. To address the problem of k-means becoming trapped in local optima, we propose an extended PSO algorithm, named ECPSO. Our new algorithm is able to escape from local optima and, with high probability, produces the problem's optimal answer. The experimental results show that the proposed algorithm has better performance than other clustering algorithms, especially on two indices: clustering accuracy and clustering quality.
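The idea of letting PSO search over candidate centroid sets, instead of relying on k-means alternating updates alone, can be illustrated with a minimal sketch. This is a generic PSO clusterer, not the ECPSO algorithm itself; the swarm parameters and toy data are our own illustrative choices:

```python
import numpy as np

def sse(centers, X):
    # Sum of squared distances from each point to its nearest centroid.
    d2 = ((X[:, None] - centers[None]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def pso_cluster(X, k, n_particles=10, iters=60, seed=0):
    # Each particle encodes k candidate centroids; the swarm searches the
    # centroid space, which lets it move past poor local optima.
    rng = np.random.default_rng(seed)
    pos = X[rng.choice(len(X), (n_particles, k))].astype(float)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([sse(p, X) for p in pos])
    g = pbest[pbest_f.argmin()].copy()
    w, c1, c2 = 0.7, 1.5, 1.5          # inertia and attraction weights
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        f = np.array([sse(p, X) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())

# Two well-separated blobs; k = 2 should recover one centroid per blob.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [9., 9.], [9., 10.], [10., 9.]])
centers, best = pso_cluster(X, k=2)
```

Hybrids such as the one proposed typically run a few k-means refinement steps on the swarm's best position; here only the raw PSO search is shown.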
Particle Swarm Optimization based K-Prototype Clustering Algorithm - iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Improve the Performance of Clustering Using Combination of Multiple Clusterin... - ijdmtaiir
The ever-increasing availability of textual documents has led to a growing challenge for information systems to effectively manage and retrieve the information contained in large collections of texts according to the user's information needs. No single clustering method can adequately handle all sorts of cluster structures and properties (e.g. shape, size, overlapping, and density). Combining multiple clustering methods is an approach to overcome the deficiencies of single algorithms and further enhance their performance. A disadvantage of cluster ensembles is the high computational load of combining the clustering results, especially for large and high-dimensional datasets. In this paper we propose a multi-clustering algorithm: a combination of a Cooperative Hard-Fuzzy Clustering model, based on intermediate cooperation between hard k-means (KM) and fuzzy c-means (FCM) to produce better intermediate clusters, and an ant colony algorithm. The proposed method gives better results than the individual clusterings.
The International Journal of Engineering and Science (The IJES) - theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A h k clustering algorithm for high dimensional data using ensemble learning - ijitcs
Advances made to traditional clustering algorithms solve various problems, such as the curse of dimensionality and the sparsity of data over multiple attributes. The traditional H-K clustering algorithm can resolve the randomness and a-priori choice of the initial centers in the k-means clustering algorithm, but when applied to high-dimensional data it suffers from the dimensional-disaster problem due to high computational complexity. Advanced clustering algorithms such as subspace and ensemble clustering improve the performance of clustering on high-dimensional datasets from different aspects and to different extents, yet each of them improves performance from a single perspective only. The objective of the proposed model is to improve the performance of traditional H-K clustering and overcome its limitations, namely high computational complexity and poor accuracy on high-dimensional data, by combining three different clustering approaches: subspace clustering and ensemble clustering together with H-K clustering.
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS - csandit
The ability to automatically mine and extract useful information from large datasets has been a common concern for organizations over the last few decades. Data on the internet is increasing steadily, and consequently the capacity to collect and store very large data is significantly increasing. Existing clustering algorithms are not always efficient and accurate in solving clustering problems for large datasets, and the development of accurate and fast data classification algorithms for very large-scale datasets remains a challenge. In this paper, various algorithms and techniques, in particular an approach using a non-smooth optimization formulation of the clustering problem, are proposed for solving the minimum sum-of-squares clustering problem in very large datasets. This research also develops an accurate and real-time L2-DC algorithm, based on the incremental approach, to solve the minimum sum-of-squares clustering problem.
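The minimum sum-of-squares objective and the incremental idea mentioned above can be sketched as follows. This is a generic incremental k-means heuristic under our own simplifications (try every data point as the next center, refine with Lloyd steps), not the paper's L2-DC algorithm:

```python
import numpy as np

def sse(X, centers):
    # Minimum sum-of-squares clustering objective.
    d2 = ((X[:, None] - centers[None]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def lloyd(X, centers, iters=20):
    # Standard k-means refinement of a given set of centers.
    centers = centers.astype(float).copy()
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def incremental_clustering(X, k):
    # Grow the center set one centroid at a time: try each data point as
    # the next center, refine with Lloyd iterations, keep the best set.
    centers = X.mean(axis=0, keepdims=True)
    for _ in range(1, k):
        trials = (lloyd(X, np.vstack([centers, x])) for x in X)
        centers = min(trials, key=lambda c: sse(X, c))
    return centers

X = np.array([[0., 0.], [0., 1.], [1., 0.], [9., 9.], [9., 10.], [10., 9.]])
centers = incremental_clustering(X, k=2)
```

The incremental structure is what makes such methods practical at scale: the solution for k centers warm-starts the search for k + 1, rather than restarting the non-convex problem from scratch.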
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING - ijcsa
Text document clustering is one of the fastest growing research areas because of the availability of a huge amount of information in electronic form. A number of techniques have been introduced for clustering documents in such a way that documents within a cluster have high intra-similarity and low inter-similarity to other clusters. Many document clustering algorithms provide only a localized search for effectively navigating, summarizing, and organizing information. A globally optimal solution can be obtained by applying high-speed, high-quality optimization algorithms, which perform a globalized search in the entire solution space. In this paper, a brief survey of optimization approaches to text document clustering is presented.
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model - Waqas Tariq
A novel clustering algorithm, CSHARP, is presented for finding clusters of arbitrary shapes and arbitrary densities in high-dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor (SNN) algorithm, in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions/blocks. These blocks are the seeds from which clusters may grow. CSHARP is therefore not a point-to-point but a block-to-block clustering technique. Many of its advantages follow from two facts: noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. The technique is not prone to merging clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low- and high-dimensional data sets with superior results over existing techniques such as DBScan, k-means, Chameleon, Mitosis and Spectral Clustering. The quality of its results, as well as its time complexity, rank it at the front of these techniques.
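The shared-nearest-neighbor voting that CSHARP builds on can be illustrated with a short sketch. This is a generic SNN similarity computation, not the CSHARP block construction itself; the toy data is our own:

```python
import numpy as np

def snn_similarity(X, k=3):
    # Shared Nearest Neighbor similarity: for each pair of points, count
    # how many of their k nearest neighbors they have in common.
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :k]   # indices of each point's k-NN
    n = len(X)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = len(set(knn[i]) & set(knn[j]))
    return sim

# Two tight groups: neighbors are shared within a group, never across.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.],
              [10., 10.], [10., 11.], [11., 10.], [11., 11.]])
sim = snn_similarity(X, k=3)
```

SNN-style similarity is robust in high dimensions because it is based on neighborhood rank agreement rather than raw distances, which is exactly why small, weakly connected blocks end up flagged as noise.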
Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections - IJORCS
Controlling traffic lights at intersections is one of the main issues in traffic optimization. Intersections are used to regulate the flow of vehicles and to eliminate conflicting traffic streams. Modeling and simulation of traffic are widely used in industry; indeed, an industrial system is studied through modeling and simulation before it is built, when doing so is economical and affordable. The aim of this article is a smart way to control traffic. The first stage of the project collects statistical data (the cycle time of each light at the intersection and the number of vehicles waiting at a red light), from which near-optimal values are then sought. Optimization of the parameters is performed by a genetic algorithm: the GA begins with a binary coding step (over the range specified by the initial data set) and an initial population, then applies the genetic operators of mutation and crossover, and finally selects the members with the best fitness values as the solution set. The optimal output has been modeled with Petri nets and implemented in the CPN Tools software. The results indicate that the project improves the performance of intersection traffic-control systems, and that applying evolutionary methods such as genetic algorithms to data collected at other intersections can reduce the waiting time behind red lights and determine appropriate cycle times.
Welcoming the research scholars and scientists around the globe in the Open Access Dimension, IJORCS is now accepting manuscripts for its next issue (Volume 4, Issue 4). Authors are encouraged to contribute to the research community by submitting to IJORCS articles that clarify new research results, projects, surveying works and industrial experiences that describe significant advances in the field of computer science.
All paper submissions (http://www.ijorcs.org/submit-paper) are received and managed electronically by IJORCS Team. Detailed instructions about the submission procedure are available on IJORCS website (http://www.ijorcs.org/author-guidelines)
License plate recognition is one of the core technologies in intelligent traffic control. In this paper, a new and tunable algorithm that can detect multiple license plates in high-resolution applications is proposed. The algorithm targets the detection and identification of the newer Iranian plates and those of some European countries, characterized by the blue area they include and by their geometric shape. The suggested algorithm achieves suitable speed because it does not rely on heavy pre-processing operations such as image-enhancement filters, edge detection, or noise removal in the early stages. Our method is thus adapted to the model, namely the blue section of the plate, so that if several plates appear in the image, the method can successfully detect them all. We evaluated our method on two Persian single-vehicle license plate data sets, obtaining correct recognition rates of 99.33% and 99%, respectively. We further tested our algorithm on a Persian multiple-vehicle license plate data set and achieved a 98% accuracy rate, and we obtained approximately 99% accuracy in the character recognition stage.
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective - IJORCS
This paper is a review study of FPGA implementations of Finite Impulse Response (FIR) filters with low cost and high performance. Its key contribution is an elaborate analysis of hardware implementations of FIR filters using different algorithms, i.e., Distributed Arithmetic (DA), DA with Offset Binary Coding (DA-OBC), Common Sub-expression Elimination (CSE) and sum-of-powers-of-two (SOPOT), using fewer resources without affecting the performance of the original FIR filter.
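For reference, the baseline all of these hardware schemes implement is the direct-form FIR convolution; a minimal software sketch (ours, unrelated to any specific FPGA architecture in the review):

```python
import numpy as np

def fir_filter(x, h):
    # Direct-form FIR: y[n] = sum_k h[k] * x[n - k]
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(len(h)):
            if n - k >= 0:
                y[n] += h[k] * x[n - k]
    return y

# A 4-tap moving-average filter cancels an alternating input exactly.
h = np.full(4, 0.25)
x = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
y = fir_filter(x, h)
```

Techniques like DA and SOPOT restructure exactly this multiply-accumulate loop: DA replaces the multipliers with look-up tables indexed by input bits, while SOPOT approximates each coefficient h[k] by sums of powers of two so the products become shifts and adds.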
Using Virtualization Technique to Increase Security and Reduce Energy Consump... - IJORCS
An approach is presented in this paper to create a secure environment on an internet-based virtual computing platform and to reduce energy consumption in green cloud computing. The proposed approach constantly checks the accuracy of stored data by means of a central control service inside the network environment, and checks system security by isolating single virtual machines within a common virtual environment. The approach has been simulated on two types of Virtual Machine Manager (VMM), Quick EMUlator (Qemu) and Xen HVM (Hardware Virtual Machine), and the simulation outputs in VMInsight show that when a service is used singly, its performance overhead increases. As a secure system, the proposed approach is able to recognize malicious behavior and to assure service security by means of operational integrity measurement. Moreover, system efficiency has been evaluated with respect to energy consumption on five applications (Defragmentation, Compression, Linux Boot, Decompression and Kernel Boot). It follows that, to secure a multi-tenant environment, administrators would otherwise have to install a separate security monitoring system for each Virtual Machine (VM), which imposes a heavy management workload, whereas the proposed approach can supervise all VMs with just one virtual machine.
Algebraic Fault Attack on the SHA-256 Compression Function - IJORCS
The cryptographic hash function SHA-256 is one member of the SHA-2 hash family, which was proposed in 2000 and was standardized by NIST in 2002 as a successor of SHA-1. Although a differential fault attack on the SHA-1 compression function has been proposed, it seems hard to adapt directly to SHA-256. In this paper, an efficient algebraic fault attack on the SHA-256 compression function is proposed under the word-oriented random fault model. During the attack, the automatic tool STP is exploited, which constructs binary expressions for the word-based operations in the SHA-256 compression function and then invokes a SAT solver to solve the equations. The simulation of the new attack needs about 65 fault injections to recover the chaining value and the input message block, taking about 200 seconds on average. Moreover, based on the attack on the SHA-256 compression function, an almost universal forgery attack on HMAC-SHA-256 is presented. Our algebraic fault analysis is generic, automatic and can be applied to other ARX-based primitives.
Enhancement of DES Algorithm with Multi State Logic - IJORCS
The principal goal in designing any encryption algorithm must be security against unauthorized access or attacks. The Data Encryption Standard (DES) is a symmetric-key algorithm used to secure data. Enhancing the DES algorithm involves increasing the key length, designing a more complex S-box, increasing the number of states in which the information is represented, or a combination of the above. Increasing the key length increases the number of possible keys, making a brute-force attack harder for the intruder. A more complex S-box design yields a good avalanche effect. As the number of states in which the information is represented increases, it becomes harder for the intruder to recover the actual information. The proposed algorithm replaces the predefined XOR operation applied during the 16 rounds of the standard algorithm with a new operation, called a hash function, that depends on two keys: one key is used in the F function, and the other consists of a combination of 16 states (0, 1, 2, ..., 14, 15) instead of the ordinary 2-state key (0, 1). This replacement adds a new level of protection strength and more robustness against breaking methods.
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ... - IJORCS
This paper presents a new algorithm for solving large-scale global optimization problems, based on a hybridization of simulated annealing and the Nelder-Mead algorithm. The new algorithm is called simulated Nelder-Mead algorithm with random variables updating (SNMRVU). SNMRVU starts with an initial solution, generated randomly, which is then divided into partitions. A neighborhood zone is generated, a random number of partitions is selected, and a variable-updating process is started in order to generate trial neighbor solutions. This process helps the SNMRVU algorithm explore the region around the current iterate. The Nelder-Mead algorithm is used in the final stage to improve the best solution found so far and to accelerate convergence. The performance of the SNMRVU algorithm is evaluated using 27 scalable benchmark functions and compared with four algorithms. The results show that the SNMRVU algorithm is promising and produces high-quality solutions at low computational cost.
Voice Recognition System using Template Matching - IJORCS
It is easy for humans to recognize a familiar voice, but using computer programs to identify a voice by comparing it with others is a herculean task, owing to the difficulty of developing an algorithm that recognizes the human voice: it is impossible to say a word the same way on two different occasions, and computer analysis of human speech gives different interpretations depending on the varying speed of speech delivery. This research paper gives a detailed description of the process behind implementing an effective voice recognition algorithm. The algorithm uses the discrete Fourier transform to compare the frequency spectra of two voice samples, because the spectrum remains largely unchanged when the speech is slightly varied. The Chebyshev inequality is then used to determine whether the two voices came from the same person. The algorithm is implemented and tested using MATLAB.
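The spectral-comparison step described above can be sketched as follows. The sine-wave "voices" and the plain Euclidean distance are our own stand-ins for real recordings and the paper's Chebyshev-inequality test:

```python
import numpy as np

def spectrum(signal):
    # Normalized magnitude spectrum via the discrete Fourier transform.
    mag = np.abs(np.fft.rfft(signal))
    return mag / (np.linalg.norm(mag) + 1e-12)

def spectral_distance(a, b):
    # Euclidean distance between normalized spectra: small when the
    # frequency content matches, even if time-domain samples differ.
    return float(np.linalg.norm(spectrum(a) - spectrum(b)))

fs = 8000
t = np.arange(0, 0.1, 1 / fs)
tone = np.sin(2 * np.pi * 440 * t)        # a 440 Hz "voice sample"
same = np.sin(2 * np.pi * 440 * t + 1.0)  # same tone, different phase
other = np.sin(2 * np.pi * 880 * t)       # a different pitch entirely
```

The phase-shifted copy has a nearly identical magnitude spectrum, while the different pitch is spectrally far away; this is the property that lets the magnitude spectrum serve as a template even though the raw samples differ on every utterance.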
Channel Aware Mac Protocol for Maximizing Throughput and Fairness - IJORCS
Designing a routing protocol aware of channel utilization and queue length is a challenging task in a MANET. To overcome this drawback, we extend previous work by improving the MAC protocol to maximize throughput and fairness. In this work we estimate the channel condition and contention for channel-aware packet scheduling, and we also calculate the queue length for a routing protocol that is aware of it. The channel is scheduled based on the channel condition, and routing is carried out by considering the queue length, which provides a measure of the traffic load at the mobile node itself. Based on this load, the node with the lesser load is selected for routing; this effectively balances the load and improves the throughput of the ad hoc network.
A Review and Analysis on Mobile Application Development Processes using Agile... - IJORCS
Over the last decade, the mobile telecommunication industry has seen rapid growth and has proved to be a highly competitive, uncertain and dynamic environment. Alongside its advancement, it has also raised a number of questions and gained attention both in industry and in research. The development process of mobile applications differs from that of traditional software, as users expect the same features as in their desktop applications together with additional mobile-specific functionality. Advanced mobile applications require integration with existing enterprise computing systems such as databases, legacy applications and Web services. In addition, the lifecycle of a mobile application moves much faster than that of a traditional Web application, and the associated lifecycle management must be adjusted accordingly. Security and application testing are more challenging in mobile applications than in Web applications, since the technology in mobile devices progresses rapidly and developers must stay in touch with the latest developments, news and trends in their area of work. With rising competition in the software market, researchers are seeking more flexible methods that can adjust to dynamic situations where system requirements change over time, producing valuable software in a short duration and within a low budget. The intrinsic uncertainty and complexity of any software project therefore require an iterative development plan to cope with uncertainty and a large number of unknown variables. Agile methodologies were introduced to meet these new requirements of software development companies: they aim to facilitate development processes in which changes are acceptable at any stage, and they provide a structure for highly collaborative software development.
Therefore, the present paper reviews and analyses the different prevalent methodologies utilizing agile techniques that are currently in use for the development of mobile applications. It provides a detailed review and analysis of the use of agile methodologies in the proposed mobile application development processes and highlights their benefits and constraints. In addition, based on this analysis, future research needs are identified and discussed.
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen... - IJORCS
In general, nodes in Wireless Sensor Networks (WSNs) are equipped with limited battery and computation capabilities, and the occurrence of congestion consumes additional energy and computation power through the retransmission of data packets. Congestion should therefore be regulated to improve network performance. In this paper, we propose a congestion prediction and adaptive rate adjustment technique for wireless sensor networks. The technique predicts the congestion level using a fuzzy logic system: node degree, data arrival rate and queue length are taken as inputs to the fuzzy system, and the congestion level is obtained as the outcome. When the congestion level lies between the moderate and maximum ranges, the adaptive rate adjustment technique is triggered. Our technique prevents congestion by controlling the data sending rate and also avoids needless packet losses. By simulation, we demonstrate the proficiency of our technique: it increases system throughput and network performance significantly.
A Study of Routing Techniques in Intermittently Connected MANETs - IJORCS
A Mobile Ad hoc Network (MANET) is a self-configuring, infrastructure-less network of mobile devices connected wirelessly. MANETs are a kind of wireless ad hoc network that usually has a routable networking environment on top of a link-layer ad hoc network. Routing approaches in MANETs fall mainly into three categories: reactive protocols, proactive protocols and hybrid protocols. These traditional routing schemes are not applicable to the so-called Intermittently Connected Mobile Ad hoc Network (ICMANET). An ICMANET is a form of Delay Tolerant Network, in which there never exists a complete end-to-end path between two nodes wishing to communicate. The intermittent connectivity arises when the network is sparse or highly mobile, and routing in such a spasmodic environment is arduous. In this paper, we present an overview of the prevailing routing approaches for ICMANETs, with their benefits and detriments.
Improving the Efficiency of Spectral Subtraction Method by Combining it with ... - IJORCS
In the field of speech signal processing, the spectral subtraction method (SSM) has been successfully used to suppress noise that is added acoustically. SSM reduces the noise to a satisfactory level, but musical noise is a major drawback of this method. Implementing spectral subtraction requires transforming the speech signal from the time domain to the frequency domain; the wavelet transform, on the other hand, displays another aspect of the speech signal. In this paper we apply a new approach in which SSM is cascaded with a wavelet thresholding technique (WTT) to improve the quality of the speech signal by removing the problem of musical noise to a great extent. The results of the proposed system have been simulated in MATLAB.
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed System - IJORCS
Given the limits on designing ever-faster computers, one has to find ways to maximize the performance of the available hardware. A distributed system consists of several autonomous nodes, some busy with processing while others sit idle. To make better use of the hardware, tasks from an overloaded node are sent to an under-loaded node with less processing weight, minimizing the response time of the tasks. Load balancing is an effective tool for balancing the load among the systems, and dynamic load balancing takes the current system state into account when migrating tasks from heavily loaded nodes to lightly loaded ones. In this paper, we devise an adaptive load-sharing algorithm that balances the load by considering the connectivity among the nodes, the processing capacity of each node and the link capacity.
The Design of Cognitive Social Simulation Framework using Statistical Methodo... - IJORCS
Modeling the behavior of cognitive architectures in the context of social simulation using statistical methodologies is currently a growing research area. Normally, a cognitive architecture for an intelligent agent involves an artificial computational process which exemplifies theories of cognition in computer algorithms, defined over a state space. For cognitive systems with a large state space, problems such as large tables and data sparsity arise. Hence, in this paper, we propose a method using a value-iteration approach based on the Q-learning algorithm, with a function approximation technique, to handle cognitive systems with large state spaces. Experimental results in the application domain of academic science verify that the proposed approach performs better than existing approaches.
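Q-learning with function approximation, the core technique named above, can be illustrated on a toy chain task. The environment, one-hot features and hyper-parameters here are our own illustrative choices, not the paper's cognitive-architecture setup:

```python
import numpy as np

# Toy chain MDP: states 0..4, actions {0: left, 1: right}; reward 1 on
# reaching the right end. Q-values use linear function approximation
# q(s, a) = w . phi(s, a) rather than a lookup table.
N_STATES, N_ACTIONS = 5, 2

def features(s, a):
    # One-hot features keep the toy exact while staying in the
    # function-approximation form needed for large state spaces.
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r, s2 == N_STATES - 1

w = np.zeros(N_STATES * N_ACTIONS)
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.1, 0.9, 0.2
for _ in range(500):
    s, done = int(rng.integers(0, N_STATES - 1)), False
    while not done:
        # Epsilon-greedy action selection over approximate Q-values.
        if rng.random() < eps:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax([w @ features(s, b) for b in range(N_ACTIONS)]))
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(
            w @ features(s2, b) for b in range(N_ACTIONS))
        w += alpha * (target - w @ features(s, a)) * features(s, a)
        s = s2
```

With richer (non one-hot) features the same update generalizes across states, which is what addresses the large-table and data-sparsity problems the abstract mentions.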
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi... - IJORCS
To efficiently process continuous spatio-temporal queries, we need to handle large numbers of moving objects and continuous updates on these queries both efficiently and effectively. In this paper, we propose a framework that employs a new indexing algorithm built on top of SQL Server 2008, avoiding the overhead related to R-Tree indexing. To answer range queries, we utilize the dynamic materialized view concept to efficiently handle update queries. We propose an adaptive safe region to reduce communication costs between the client and the server and to minimize the position-update load, and we use caching of results to enhance the overall performance of the framework. To handle concurrent spatio-temporal queries, we utilize the publish/subscribe paradigm to group similar queries and efficiently process these requests. Experiments show that the proposed framework outperforms the R-Tree index and produces promising, satisfactory results.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. Whatβs changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there's more:
In a second workflow supporting the same use case, you'll see:
Your campaign sent to target colleagues for approval
If the "Approve" button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But, if the "Reject" button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as "predictable inference".
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
A PSO-Based Subtractive Data Clustering Algorithm
International Journal of Research in Computer Science
eISSN 2249-8265 Volume 3 Issue 2 (2013) pp. 1-9
www.ijorcs.org, A Unit of White Globe Publications
doi: 10.7815/ijorcs.32.2013.060
A PSO-BASED SUBTRACTIVE DATA CLUSTERING ALGORITHM
Mariam El-Tarabily1, Rehab Abdel-Kader2, Mahmoud Marie3, Gamal Abdel-Azeem4
1,2,4 Electrical Engineering Department, Faculty of Engineering - Port-Said, Port-Said University, EGYPT
E-mail: 1mariammokhtar75@hotmail.com, 2r.abdelkader@eng.psu.edu.eg, 4gamalagag@hotmail.com
3 Computers and Systems Engineering Department, Faculty of Engineering, Al-Azhar University, Cairo, EGYPT
E-mail: mahmoudim@hotmail.com
Abstract: There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Particle Swarm Optimization, Subtractive + (PSO) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive + (PSO) clustering algorithm, PSO, and the Subtractive clustering algorithm on three different datasets. The results illustrate that the Subtractive + (PSO) clustering algorithm generates the most compact clustering results as compared to the other algorithms.

Keywords: Data Clustering, Subtractive Clustering, Particle Swarm Optimization, Subtractive Algorithm, Hybrid Algorithm.

I. INTRODUCTION

Clustering is one of the most extensively studied research topics due to its numerous important applications in machine learning, image segmentation, information retrieval, and pattern recognition. Clustering involves dividing a set of objects into a specified number of clusters [14]. The motivation behind clustering a set of data is to find inherent structure in the data and expose this structure as a set of groups. The data objects within each group should exhibit a large degree of similarity while the similarity among different clusters should be minimized [3, 9, 18]. There are two major clustering techniques: "Partitioning" and "Hierarchical" [2, 9]. In hierarchical clustering, the output is a tree showing a sequence of clusterings, with each clustering being a partition of the data set. On the other hand, partitioning clustering algorithms [1] partition the data set into a specified number of clusters. These algorithms try to minimize a certain criterion (e.g. a square error function) and can therefore be treated as optimization problems.

In recent years, it has been recognized that partitional clustering techniques are well suited for clustering large datasets due to their relatively low computational requirements. The time complexity of the partitioning technique is almost linear, which makes it a widely used technique. The best-known partitioning clustering algorithm is the K-means algorithm and its variants [10].

The Subtractive clustering method, as proposed by Chiu [13], is a relatively simple and effective approach to approximate estimation of cluster centers on the basis of a density measure in which the data points are considered candidates for cluster centers. This method can obtain initial cluster centers that are required by more sophisticated clustering algorithms. It can also be used as a quick stand-alone method for approximate clustering.

The Particle Swarm Optimization (PSO) algorithm is a population-based stochastic optimization technique that can be used to find an optimal, or near optimal, solution to a numerical and qualitative problem [4, 11, 17]. Several attempts have been made in the literature to apply PSO to the data clustering problem [6, 18, 19, 20, 21]. The major drawback is that the number of clusters is initially unknown, and the clustering result is sensitive to the selection of the initial cluster centroids and may converge to local optima. Therefore, the initial selection of the cluster centroids determines the processing of PSO and the partition result of the dataset as well. The same initial cluster centroids in a dataset will always generate the same cluster results. However, if good initial clustering centroids can be obtained using any of the other techniques, the PSO
would work well in refining the clustering centroids to find the optimal clustering centers. The Subtractive clustering algorithm can be used to generate the number of clusters and good initial cluster centroids for the PSO.

In this paper, we present a hybrid Subtractive + (PSO) clustering algorithm that performs fast clustering. Experimental results indicate that the Subtractive + (PSO) clustering algorithm can find the optimal solution after nearly 50 iterations in comparison with the ordinary PSO algorithm. The remainder of this paper is organized as follows: Section 2 reviews related work on data clustering using PSO. Section 3 provides a general overview of the data clustering problem and the basic PSO algorithm. The proposed hybrid Subtractive + (PSO) clustering algorithm is described in Section 4. Section 5 provides the detailed experimental setup and results for comparing the performance of the Subtractive + (PSO) clustering algorithm with the Subtractive algorithm and PSO, together with a discussion of the experimental results. Conclusions are drawn in Section 6.

II. RELATED WORK

The best-known partitioning algorithm is the K-means algorithm [2, 6, 7, 16] and its variants. The main drawbacks of the K-means algorithm are that the cluster result is sensitive to the selection of the initial cluster centroids and may converge to a local optimum, and that it generally requires prior knowledge of the probable number of clusters in a data collection.

In recent years scientists have proposed several approaches [3] inspired by biological collective behaviors to solve the clustering problem, such as the Genetic Algorithm (GA) [8], Particle Swarm Optimization (PSO), Ant clustering, and Self-Organizing Maps (SOM) [9]. In [6] the authors presented a hybrid PSO+K-means document clustering algorithm that performed fast document clustering. The results indicated that the PSO+K-means algorithm can generate the best results in just 50 iterations in comparison with the K-means algorithm and the PSO algorithm. Reference [24] proposed a Discrete PSO with the crossover and mutation operators of the Genetic Algorithm for document clustering. The proposed system markedly increased the success of the clustering problem; it tried to avoid the stagnation behavior of the particles, but it could not always avoid that behavior. In [26] the authors investigated the application of EPSO to cluster data vectors. The EPSO algorithm was compared against the PSO clustering algorithm, which showed that EPSO converges more slowly to a lower quantization error, while PSO converges faster to a larger quantization error. Reference [27] presented a new approach to particle swarm optimization (PSO) using digital pheromones to coordinate swarms within an n-dimensional design space to improve the search efficiency and reliability. In [28] a hybrid fuzzy clustering method based on FCM and fuzzy PSO (FPSO) is proposed which makes use of the merits of both algorithms. Experimental results show that the proposed method is efficient and can produce encouraging results.

III. DATA CLUSTERING PROBLEM

In most clustering algorithms, the dataset to be clustered is represented as a set of vectors X = {x_1, x_2, ..., x_n}, where the vector x_i corresponds to a single object and is called the feature vector. The feature vector should include proper features to represent the object.

The similarity metric: Since similarity is fundamental to the definition of a cluster, a measure of the similarity between two data sets from the same feature space is essential to most clustering procedures. Because of the variety of feature types and scales, the distance measure must be chosen carefully. Over the years, two prominent ways have been proposed to compute the similarity between data m_p and m_j. The most popular metric for continuous features is the Euclidean distance, given by:

    d(m_p, m_j) = sqrt( sum_{k=1}^{d_m} (m_{p,k} - m_{j,k})^2 )    (1)

which is a special case of the Minkowski metric [5], given by:

    D_n(m_p, m_j) = ( sum_{k=1}^{d_m} |m_{p,k} - m_{j,k}|^n )^(1/n)    (2)

where m_p and m_j are two data vectors; d_m denotes the dimension number of the vector space; and m_{p,k} and m_{j,k} stand for the weight values of m_p and m_j in dimension k.

The second commonly used similarity measure in clustering is the cosine correlation measure [15], given by:

    cos(m_p, m_j) = (m_p^t m_j) / (|m_p| |m_j|)    (3)

where m_p^t m_j denotes the dot product of the two data vectors and |.| indicates the length of a vector. Both similarity metrics are widely used in the clustering literature.

A. Subtractive Clustering Algorithm

In Subtractive clustering, data points are considered as candidates for the cluster centers [25]. In this method the computation complexity is linearly proportional to the number of data points and
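The similarity measures above, equations (1)-(3), are straightforward to express in code. The following is a minimal NumPy sketch; the function names are ours, not the paper's:

```python
import numpy as np

def euclidean(mp, mj):
    """Equation (1): Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((mp - mj) ** 2)))

def minkowski(mp, mj, n=2):
    """Equation (2): Minkowski metric; n = 2 reduces to the Euclidean case."""
    return float(np.sum(np.abs(mp - mj) ** n) ** (1.0 / n))

def cosine(mp, mj):
    """Equation (3): cosine correlation, dot product over vector lengths."""
    return float(np.dot(mp, mj) / (np.linalg.norm(mp) * np.linalg.norm(mj)))
```

For vectors normalized to unit length, as done in the experiments later in the paper, the cosine measure depends only on the angle between the vectors.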
independent of the dimension of the problem under consideration. Consider a collection of n data points {x_1, ..., x_n} in an M-dimensional space. Since each data point is a candidate for a cluster center, a density measure at data point x_i is defined as

    D_i = sum_{j=1}^{n} exp( -||x_i - x_j||^2 / (r_a/2)^2 )    (4)

where r_a is a positive constant. Hence, a data point will have a high density value if it has many neighboring data points. The radius r_a defines a neighborhood for a data point; data points outside the radius r_a contribute only slightly to the density measure.

After calculating the density measure of all data points, the data point with the highest density measure is selected as the first cluster center. Let x_{c1} be the point selected and D_{c1} its corresponding density measure. The density measure D_i for each data point x_i is then recalculated as follows:

    D_i = D_i - D_{c1} exp( -||x_i - x_{c1}||^2 / (r_b/2)^2 )    (5)

where r_b is a positive constant. Therefore, data points close to the first cluster center x_{c1} will have significantly reduced density measures and are unlikely to be selected as the next cluster center. The constant r_b defines a neighborhood within which the density measure is measurably reduced. The constant r_b is normally larger than r_a to prevent closely spaced cluster centers; generally r_b is set equal to 1.5 r_a, as suggested in [25].

After the density measure for each data point is recalculated, the next cluster center x_{c2} is selected and the density measures for all data points are recalculated again. This iterative process is repeated until a sufficient number of cluster centers is generated.

When applying subtractive clustering to a set of input-output data, each of the cluster centers represents a prototype that exhibits certain characteristics of the system to be modeled. These cluster centers can reasonably be used as the initial clustering centers for the PSO algorithm.

B. PSO Algorithm

PSO was originally developed by Eberhart and Kennedy in 1995 based on the phenomenon of collective intelligence inspired by the social behavior of bird flocking or fish schooling [11]. In the PSO algorithm, the birds in a flock are symbolically represented by particles. These particles can be considered as simple agents "flying" through a problem space. A particle's location in the multi-dimensional problem space represents one solution for the problem. When a particle moves to a new location, a different problem solution is generated. The fitness function is evaluated for each particle in the swarm and is compared to the fitness of the best previous position for that particle, p_best, and to the fitness of the global best particle among all particles in the swarm, g_best. After finding the two best values, the particles evolve by updating their velocities and positions according to the following equations:

    v_{id} = w * v_{id} + c_1 * rand_1 * (p_best - x_{id}) + c_2 * rand_2 * (g_best - x_{id})    (6)

    x_{id} = x_{id} + v_{id}    (7)

where d denotes the dimension of the problem space, and rand_1, rand_2 are random values in the range (0, 1). The random values rand_1 and rand_2 are used for the sake of completeness, that is, to make sure that particles explore a wide search space before converging around the optimal solution. c_1 and c_2 are constants known as acceleration coefficients; the values of c_1 and c_2 control the weight balance of p_best and g_best in deciding the particle's next movement. w denotes the inertia weight factor. An improvement over the original PSO is that w is not kept constant during execution; rather, starting from a maximal value of 0.9, it is linearly decremented as the number of iterations increases, down to a minimal value of 0.4 [4], according to:

    w = (w - 0.4) * (MAXITER - ITERATION) / MAXITER + 0.4    (8)

where MAXITER is the maximum number of iterations and ITERATION represents the current iteration number. The inertia weight factor w provides the necessary diversity to the swarm by changing the momentum of particles, in order to avoid the stagnation of particles at local optima. The empirical research conducted by Eberhart and Shi shows an improvement of search efficiency through gradually decreasing the value of the inertia weight factor from a high value during the search.

Equation 6 indicates that each particle records its current coordinate x_{id} and its velocity v_{id}, which indicates the speed of its movement along the dimensions of the problem space. For every generation, the particle's new location is computed by adding the particle's current velocity, the v-vector, to its location, the x-vector.

The best fitness values are updated at each generation, based on

    p_{id}(t+1) = p_{id}(t)    if f(x_{id}(t+1)) >= f(p_{id}(t))
                  x_{id}(t+1)  if f(x_{id}(t+1)) <  f(p_{id}(t))    (9)
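The subtractive center-selection loop of equations (4) and (5) can be sketched as follows. This is an illustrative implementation, not the authors' code; in particular, the stopping rule used here (accept centers until the highest remaining density falls below a fixed fraction of the first peak) is a simplified stand-in for the paper's "sufficient number of cluster centers" criterion:

```python
import numpy as np

def subtractive_centers(X, ra=1.0, stop_ratio=0.15):
    """Estimate cluster centers by subtractive clustering.

    X: (n, M) data matrix; ra: neighborhood radius; rb = 1.5 * ra as
    suggested in the paper.
    """
    rb = 1.5 * ra
    # Equation (4): density of each point from all pairwise squared distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    D = np.exp(-sq / (ra / 2.0) ** 2).sum(axis=1)
    first_peak = D.max()
    centers = []
    while len(centers) < len(X):
        c = int(np.argmax(D))
        if centers and D[c] < stop_ratio * first_peak:
            break  # remaining densities too small: enough centers found
        centers.append(X[c])
        # Equation (5): subtract the chosen center's influence, so points
        # near it are unlikely to be picked as the next center.
        D = D - D[c] * np.exp(-np.sum((X - X[c]) ** 2, axis=1) / (rb / 2.0) ** 2)
    return np.array(centers)
```

On two well-separated groups of points this returns one center per group; the centers can then seed an iterative method such as PSO.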
It is possible to view the clustering problem as an optimization problem that locates the optimal centroids of the clusters rather than finding an optimal partition [18, 20, 21, 22]. This view offers us a chance to apply the PSO optimization algorithm to the clustering solution. The PSO clustering algorithm performs a globalized search in the entire solution space [4, 17]. Utilizing the PSO algorithm's optimization ability, if given enough time, the proposed hybrid Subtractive + (PSO) clustering algorithm can yield more compact clustering results than the traditional PSO clustering algorithm. However, in order to cluster large datasets, PSO requires many more iterations (generally more than 500) to converge to the optima than the hybrid Subtractive + (PSO) clustering algorithm does. Although the PSO algorithm is inherently parallel and can be implemented using parallel hardware, such as a computer cluster, the computation requirement for clustering extremely large datasets is still high. In terms of execution time, the hybrid Subtractive + (PSO) clustering algorithm is the most efficient for large datasets [1].

IV. HYBRID SUBTRACTIVE + (PSO) CLUSTERING ALGORITHM

In the hybrid Subtractive + (PSO) clustering algorithm, the multidimensional vector space is modeled as a problem space. Each vector can be represented as a dot in the problem space, and the whole dataset can be represented as a multidimensional space containing a large number of dots. The hybrid Subtractive + (PSO) clustering algorithm includes two modules, the Subtractive clustering module and the PSO module. At the initial stage, the Subtractive clustering module is executed to search for the clusters' centroid locations and the suggested number of clusters. This information is transferred to the PSO module for refining and generating the final optimal clustering solution.

The Subtractive clustering module: The Subtractive clustering module predicts the optimal number of clusters and finds good initial cluster centroids for the next phase.

The PSO clustering module: In the PSO clustering algorithm, one particle in the swarm represents one possible solution for clustering the dataset. Each particle maintains a matrix X_i = (C_1, C_2, ..., C_i, ..., C_k), where C_i represents the i-th cluster centroid vector and k represents the total number of clusters. According to its own experience and those of its neighbors, each particle adjusts its centroid vector positions in the vector space at each generation. The average distance of data objects to the cluster centroid is used as the fitness value to evaluate the solution represented by each particle. The fitness value is measured by the equation below:

    f = ( sum_{i=1}^{N_c} ( sum_{j=1}^{p_i} d(m_{ij}, c_i) / p_i ) ) / N_c    (10)

where m_{ij} denotes the j-th data vector belonging to cluster i; c_i is the centroid vector of the i-th cluster; d(m_{ij}, c_i) is the distance between data point m_{ij} and the cluster centroid c_i; p_i stands for the number of data points belonging to cluster C_i; and N_c stands for the number of clusters.

In the hybrid Subtractive + (PSO) clustering algorithm, the Subtractive algorithm is used at the initial stage to help discover the vicinity of the optimal solution by suggesting good initial cluster centers and the number of clusters. The result of the Subtractive algorithm is used as the initial seed of the PSO algorithm, which is applied for refining and generating the final result using the global search capability of PSO. The flow chart of the hybrid Subtractive + (PSO) algorithm is depicted graphically in Figure 1.

V. EXPERIMENTAL STUDIES

The main purpose of this paper is to compare the quality of the PSO and hybrid Subtractive + (PSO) clustering algorithms, where the quality of the clustering is measured according to the intra-cluster distances, i.e. the distances between the data vectors and the cluster centroid within a cluster, and the objective is to minimize these intra-cluster distances.

Clustering Problem: We used three different data collections to compare the performance of the PSO and hybrid Subtractive + (PSO) clustering algorithms. These datasets were downloaded from the Machine Learning Repository site [23]. A description of the test datasets is given in Table 1. In order to reduce the impact of the length variations of different data, each data vector is normalized so that it is of unit length.

Table 1: Summary of datasets

          Number of Instances   Number of Attributes   Number of classes
  Iris    150                   4                      3
  Wine    178                   13                     3
  Yeast   1484                  8                      10
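The PSO module described in this section can be sketched in code: the fitness of a particle follows equation (10), the velocity and position updates follow equations (6) and (7), and the inertia weight decays per equation (8). This is an illustrative sketch with function names of our choosing, not the authors' implementation:

```python
import numpy as np

def fitness(X, centers):
    """Equation (10): average, over clusters, of the mean distance from
    each assigned data point to its cluster centroid. Points are assigned
    to the closest centroid; empty clusters are skipped (an assumption,
    since the paper does not say how they are handled)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    labels = d.argmin(axis=1)
    per_cluster = [d[labels == i, i].mean()
                   for i in range(len(centers)) if np.any(labels == i)]
    return float(np.mean(per_cluster))

def inertia(iteration, max_iter, w_max=0.9, w_min=0.4):
    """Equation (8): linear decay of w from 0.9 down to 0.4."""
    return (w_max - w_min) * (max_iter - iteration) / max_iter + w_min

def pso_step(x, v, pbest, gbest, w, c1=1.49, c2=1.49, rng=None):
    """Equations (6) and (7): update one particle's velocity and position."""
    if rng is None:
        rng = np.random.default_rng(0)
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```

In the hybrid algorithm, the initial particle positions x would be seeded from the Subtractive module's suggested centroids rather than drawn at random.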
Figure 1 summarizes the hybrid algorithm as a flowchart:

Subtractive clustering module:
1. Start.
2. Calculate the density function for each data point x_i using equation 4.
3. Choose the cluster center which has the highest density function.
4. Recalculate the density function for each data point x_i using equation 5.
5. If a sufficient number of cluster centers has not yet been generated, return to step 3.

PSO clustering module:
6. Inherit the cluster centroid vectors and the number of clusters into the particles as an initial seed.
7. Assign each data point vector in the data set to the closest centroid vector for each particle.
8. Calculate the fitness value based on equation 10.
9. Update the velocities and particle positions using equations 6 and 7 and generate the next solutions.
10. If the maximum number of iterations is exceeded, or the average change in centroid vectors falls below a threshold, stop; otherwise return to step 7.

Figure 1: The flowchart of the hybrid Subtractive + (PSO) clustering algorithm
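The two modules of the flowchart can be wired together in a compact end-to-end sketch. Again this is illustrative: the swarm-initialization noise, the stopping fraction for the Subtractive pass, and all parameter values other than c1 = c2 = 1.49 are our assumptions, not the paper's exact choices:

```python
import numpy as np

def hybrid_subtractive_pso(X, ra=1.0, n_particles=10, iters=50, seed=0):
    """A Subtractive pass seeds the centroids; a small PSO refines them."""
    rng = np.random.default_rng(seed)

    # --- Subtractive module: equations (4) and (5) ---
    rb = 1.5 * ra
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    D = np.exp(-sq / (ra / 2.0) ** 2).sum(axis=1)
    peak, seeds = D.max(), []
    while len(seeds) < len(X):
        c = int(np.argmax(D))
        if seeds and D[c] < 0.15 * peak:
            break
        seeds.append(X[c])
        D = D - D[c] * np.exp(-np.sum((X - X[c]) ** 2, axis=1) / (rb / 2.0) ** 2)
    seeds = np.array(seeds)

    def fit(centers):  # equation (10), skipping empty clusters
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        lab = d.argmin(axis=1)
        return float(np.mean([d[lab == i, i].mean()
                              for i in range(len(centers)) if np.any(lab == i)]))

    # --- PSO module: particles start as noisy copies of the seed ---
    pos = seeds[None] + 0.05 * rng.standard_normal((n_particles,) + seeds.shape)
    vel = np.zeros_like(pos)
    pbest, pfit = pos.copy(), np.array([fit(p) for p in pos])
    gbest = pbest[pfit.argmin()].copy()
    for t in range(iters):
        w = (0.9 - 0.4) * (iters - t) / iters + 0.4               # equation (8)
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = (w * vel + 1.49 * r1 * (pbest - pos)
               + 1.49 * r2 * (gbest - pos))                       # equation (6)
        pos = pos + vel                                           # equation (7)
        f = np.array([fit(p) for p in pos])
        better = f < pfit                                         # equation (9)
        pbest[better], pfit[better] = pos[better], f[better]
        gbest = pbest[pfit.argmin()].copy()
    return gbest, float(pfit.min())
```

Because the swarm starts in the vicinity of the Subtractive seed, the global best fitness starts low and only improves, which mirrors the "good start" behavior the paper reports for the hybrid algorithm.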
Experimental Setting: In this section we present the experimental results of the hybrid Subtractive + (PSO) clustering algorithm. For the sake of comparison, we also include the results of the Subtractive and PSO clustering algorithms. In our case the Euclidean distance measure is used as the similarity metric in each algorithm. The performance of the clustering algorithm can be improved by seeding the initial swarm with the result of the Subtractive algorithm. We set c_1 = c_2 = 1.49, and the inertia weight w follows equation (8); these values are chosen based on the results reported in [17]. We chose the number of particles as a function of the number of classes: 15 particles for the Iris plants database, 15 particles for the Wine database, and 50 particles for the Yeast database. All these values were chosen to ensure good convergence.

Results and Discussion: The fitness equation (10) is used not only in the PSO algorithm for the fitness value calculation, but also in the evaluation of cluster quality. It indicates the average distance between a data point and the cluster centroid to which it belongs. The smaller the value, the more compact the clustering solution is. Table 2 presents the experimental results of the Subtractive, PSO and Subtractive + (PSO) clustering algorithms respectively. For an easy comparison, the PSO and hybrid Subtractive + (PSO) clustering
algorithms run 200 iterations in each experiment. For all the results reported, averages over more than ten simulations are given in Table 2. To illustrate the convergence behavior of the different clustering algorithms, the clustering fitness values at each iteration were recorded when the two algorithms were applied to the datasets separately. As shown in Table 2, the Subtractive + (PSO) clustering approach generates the clustering result with the lowest fitness value for all three datasets using the Euclidean similarity metric. The results of the Subtractive + (PSO) approach are an improvement over the results of the PSO approach.

Table 2: Performance comparison of Subtractive, PSO, and Subtractive + (PSO)

                        Fitness value
          Subtractive   PSO      Subtractive + PSO
  Iris    6.12          6.891    3.861
  Wine    2.28          2.13     1.64
  Yeast   1.30          1.289    0.192

Figure 2 shows the cluster centers suggested by the Subtractive clustering algorithm; the cluster centers appear in black in the figure. These centers are used as the initial seed of the PSO algorithm. Figures 3, 4 and 5 illustrate the convergence behaviors of the two algorithms on the three datasets using the Euclidean distance as a similarity metric.
[Figure 2: Suggested cluster centers by the Subtractive clustering algorithm. Three scatter plots show the Iris, Wine and Yeast data points with the suggested cluster centers marked.]
[Figure 5: Algorithm convergence for the Yeast database. The gbest fitness value is plotted against iteration number (0 to 200) for the PSO algorithm and the hybrid Subtractive + PSO algorithm.]
In all the previous figures for the hybrid Subtractive + PSO algorithm, g_best is the fitness of the global best particle among all particles in the swarm, the particle that inherited its cluster centroid vectors from the Subtractive algorithm. We notice that the Subtractive + PSO algorithm has a good start and converges quickly to a lower fitness value. As shown in Figure 3, the fitness value of the Subtractive + PSO algorithm starts at 6.1, is reduced sharply from 6.1 to 3.8 within 25 iterations, and settles at 3.68. The PSO algorithm starts at 7.4; the reduction of the fitness value in PSO is not as sharp as in Subtractive + PSO and becomes smooth after 55 iterations. The same happens in Figures 4 and 5. The Subtractive + PSO algorithm shows a good improvement for the large dataset, as shown in Figure 5. This indicates that upon termination the Subtractive + PSO algorithm yields minimal fitness values. Therefore, the proposed algorithm is an efficient and effective solution to the data clustering problem.

VI. CONCLUSION

This paper investigated the application of the Subtractive + PSO algorithm, a hybrid of the PSO and Subtractive algorithms, to cluster data vectors. The Subtractive clustering module is executed to search for the cluster centroid locations and the suggested number of clusters. This information is transferred to the PSO module for refining and generating the final optimal clustering solution. The general PSO algorithm can conduct a globalized search for the optimal clustering, but it requires more iterations. Subtractive clustering helps the PSO to start with good initial cluster centroids and to converge faster with a smaller fitness value, which means a more compact result. The algorithm includes two modules, the Subtractive module and the PSO module. The Subtractive module is executed at the initial stage to discover good initial cluster centroids. The result of the Subtractive module is used as the initial seed of the PSO module, which discovers the optimal solution by a global search while avoiding a high computation cost. The PSO algorithm is then applied for refining and generating the final result. Experimental results illustrate that this hybrid Subtractive + PSO algorithm can generate better clustering results than the ordinary PSO.

VII. REFERENCES

[1] Khaled S. Al-Sultan, M. Maroof Khan, "Computational experience on four algorithms for the hard clustering problem". Pattern Recognition Letters, Vol. 17, No. 3, pp. 295-308, 1996. doi: 10.1016/0167-8655(95)00122-0
[2] Michael R. Anderberg, "Cluster Analysis for Applications". Academic Press Inc., New York, 1973.
[3] Pavel Berkhin, "Survey of clustering data mining techniques". Accrue Software Research Paper, pp. 25-71, 2002. doi: 10.1007/3-540-28349-8_2
[4] A. Carlisle, G. Dozier, "An Off-The-Shelf PSO". In Proceedings of the Particle Swarm Optimization Workshop, 2001, pp. 1-6.
[5] Krzysztof J. Cios, Witold Pedrycz, Roman W. Swiniarski, "Data Mining Methods for Knowledge Discovery". Kluwer Academic Publishers, 1998. doi: 10.1007/978-1-4615-5589-6
[6] X. Cui, P. Palathingal, T.E. Potok, "Document Clustering using Particle Swarm Optimization". IEEE Swarm Intelligence Symposium 2005, Pasadena, California, pp. 185-191. doi: 10.1109/SIS.2005.1501621
[7] Eberhart, R.C., Shi, Y., "Comparing Inertia Weights and Constriction Factors in Particle Swarm Optimization". Congress on Evolutionary Computing, Vol. 1, 2000, pp. 84-88. doi: 10.1109/CEC.2000.870279
[8] Everitt, B., "Cluster Analysis". 2nd Edition, Halsted Press, New York, 1980.
[9] A. K. Jain, M. N. Murty, P. J. Flynn, "Data Clustering: A Review". ACM Computing Surveys, Vol. 31, No. 3, pp. 264-323, 1999. doi: 10.1145/331499.331504
[10] J. A. Hartigan, "Clustering Algorithms". John Wiley and Sons, Inc., New York, 1975.
[11] Eberhart, R.C., Shi, Y., Kennedy, J., "Swarm Intelligence". Morgan Kaufmann, New York, 2001.
[12] Mahamed G. Omran, Ayed Salman, Andries P. Engelbrecht, "Image classification using particle swarm optimization". Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and Learning 2002, Singapore, pp. 370-374. doi: 10.1142/9789812561794_0019
[13] S. L. Chiu, "Fuzzy model identification based on cluster estimation". Journal of Intelligent and Fuzzy Systems, Vol. 2, No. 3, 1994.
[14] Salton, G. and Buckley, C., "Term-weighting approaches in automatic text retrieval". Information Processing and Management, Vol. 24, No. 5, pp. 513-523, 1988. doi: 10.1016/0306-4573(88)90021-0
[15] Song Liangtu, Zhang Xiaoming, "Web Text Feature Extraction with Particle Swarm Optimization". IJCSNS International Journal of Computer Science and Network Security, Vol. 7, No. 6, 2007.
[16] Selim, Shokri Z., "K-means type algorithms: A generalized convergence theorem and characterization of local optimality". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, No. 1, pp. 81-87, 1984. doi: 10.1109/TPAMI.1984.4767478
[17] Yuhui Shi, Russell C. Eberhart, "Parameter Selection in Particle Swarm Optimization". The 7th Annual Conference on Evolutionary Programming, San Diego, pp. 591-600, 1998. doi: 10.1007/BFb0040810
[18] Michael Steinbach, George Karypis, Vipin Kumar, "A Comparison of Document Clustering Techniques". Text Mining Workshop, KDD, 2000.
[19] Razan Alwee, Siti Mariyam, Firdaus Aziz, K.H. Chey, Haza Nuzly, "The Impact of Social Network Structure in Particle Swarm Optimization for Classification Problems". International Journal of Soft Computing, Vol. 4, No. 4, pp. 151-156, 2009.
[20] van der Merwe, D.W., Engelbrecht, A.P., "Data clustering using particle swarm optimization". Proceedings of the IEEE Congress on Evolutionary Computation 2003, Canberra, Australia, pp. 215-220. doi: 10.1109/CEC.2003.1299577
[21] Sherin M. Youssef, Mohamed Rizk, Mohamed El-Sherif, "Dynamically Adaptive Data Clustering Using Intelligent Swarm-like Agents". International Journal of Mathematics and Computers in Simulation, Vol. 1, No. 2, 2007.
[22] Rehab F. Abdel-Kader, "Genetically Improved PSO Algorithm for Efficient Data Clustering". Proceedings of the Second International Conference on Machine Learning and Computing 2010, pp. 71-75. doi: 10.1109/ICMLC.2010.19
[23] UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
[24] K. Premalatha, A.M. Natarajan, "Discrete PSO with GA Operators for Document Clustering". International Journal of Recent Trends in Engineering, Vol. 1, No. 1, 2009.
[25] JunYing Chen, Zheng Qin, Ji Jia, "A Weighted Mean Subtractive Clustering Algorithm". Information Technology Journal, No. 7, pp. 356-360, 2008. doi: 10.3923/itj.2008.356.360
[26] Neveen I. Ghali, Nahed El-dessouki, Mervat A. N., Lamiaa Bakraw, "Exponential Particle Swarm Optimization Approach for Improving Data Clustering". International Journal of Electrical & Electronics Engineering, Vol. 3, Issue 4, May 2009.
[27] Vijay Kalivarapu, Jung-Leng Foo, Eliot Winer, "Improving solution characteristics of particle swarm optimization using digital pheromones". Structural and Multidisciplinary Optimization, Vol. 37, No. 4, pp. 415-427, 2009. doi: 10.1007/s00158-008-0240-9
[28] H. Izakian, A. Abraham, V. Snásel, "Fuzzy Clustering using Hybrid Fuzzy c-means and Fuzzy Particle Swarm Optimization". World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), pp. 1690-1694, 2009. doi: 10.1109/NABIC.2009.5393618
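To make the two-module pipeline summarized in the conclusion concrete, the sketch below pairs a one-pass subtractive step (Chiu's potential function [13], simplified here to a single stopping ratio instead of the full accept/reject test) with a standard gbest PSO whose swarm is seeded with the subtractive centroids. This is an illustrative reconstruction, not the authors' implementation: the quantization-error fitness, the neighborhood radius ra = 1.0, the inertia weight w = 0.72, and the acceleration constants c1 = c2 = 1.49 are assumed values.

```python
import numpy as np

def subtractive_centers(X, ra=1.0, eps=0.15):
    # Potential of each point = sum of Gaussian contributions from all points
    # (Chiu, 1994). NOTE: simplified stopping rule -- a single ratio eps of
    # the first potential instead of Chiu's accept/reject thresholds.
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2               # r_b = 1.5 * r_a
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-alpha * d2).sum(axis=1)
    first = P.max()
    centers = []
    while P.max() > eps * first:
        c = int(P.argmax())                    # highest-potential point -> new center
        centers.append(X[c])
        P = P - P[c] * np.exp(-beta * d2[:, c])  # damp potential near the center
    return np.array(centers)

def quantization_error(X, centers):
    # Mean distance of each point to its nearest centroid (lower = more compact).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def pso_refine(X, seed_centers, n_particles=10, iters=50,
               w=0.72, c1=1.49, c2=1.49, rng=None):
    # Standard gbest PSO over sets of k centroids; particle 0 is seeded with
    # the subtractive result so the swarm starts from a good solution.
    rng = np.random.default_rng(rng)
    k, dim = seed_centers.shape
    pos = rng.uniform(X.min(0), X.max(0), size=(n_particles, k, dim))
    pos[0] = seed_centers
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([quantization_error(X, p) for p in pos])
    g = int(pbest_f.argmin())
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([quantization_error(X, p) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = int(pbest_f.argmin())
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    return gbest, gbest_f
```

Seeding particle 0 with the subtractive result guarantees the swarm's global best never starts worse than the subtractive solution, mirroring the fast-start behavior reported for Figures 3-5.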
How to cite
Mariam El-Tarabily, Rehab Abdel-Kader, Mahmoud Marie, Gamal Abdel-Azeem, "A PSO-Based Subtractive Data Clustering Algorithm". International Journal of Research in Computer Science, 3 (2): pp. 1-9, March 2013. doi: 10.7815/ijorcs.32.2013.060