Extended PSO Algorithm for Improving Problems of the K-Means Clustering Algorithm (IJMIT JOURNAL)
Clustering is an unsupervised process and one of the most common data mining techniques. Its purpose is to group similar data together, so that instances within a cluster are as similar as possible to one another and as different as possible from instances in other clusters. In this paper we focus on partitional k-means clustering, which, owing to its ease of implementation and high-speed performance on large data sets, remains very popular thirty years after its development. To address the problem of the k-means algorithm becoming trapped in local optima, we propose an extended PSO algorithm named ECPSO. The new algorithm is able to escape local optima and, with high probability, produces the problem's optimal answer. Examination of the results shows that the proposed algorithm performs better than other clustering algorithms, especially on two indices: clustering accuracy and clustering quality.
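The abstract does not give ECPSO's update rules, but the general idea of PSO-based centroid optimization can be sketched: each particle encodes a full set of k centroids, fitness is the within-cluster sum of squared errors, and the swarm's global best helps escape the poor local optima a single k-means run can fall into. Everything below (data, coefficients, loop counts) is illustrative, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated 2-D blobs.
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
k, n_particles, n_iters = 2, 10, 60

def sse(centroids):
    # Total within-cluster sum of squared distances.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((d.min(axis=1) ** 2).sum())

# Each particle encodes one complete set of k candidate centroids.
pos = rng.uniform(X.min(), X.max(), (n_particles, k, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([sse(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5   # inertia and acceleration coefficients (common defaults)
for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([sse(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()

best_sse = float(pbest_f.min())
```

Because the global best is shared across the whole swarm, a particle stuck near a bad centroid layout is pulled toward better regions, which is the escape mechanism the abstract alludes to.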
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING (ijcsa)
Text document clustering is one of the fastest growing research areas because of the availability of a huge amount of information in electronic form. Several techniques have been proposed for clustering documents so that documents within a cluster have high intra-similarity and low inter-similarity to other clusters. Many document clustering algorithms perform a localized search while effectively navigating, summarizing, and organizing information. A globally optimal solution can be obtained by applying high-speed, high-quality optimization algorithms, which perform a globalized search in the entire solution space. In this paper, a brief survey of optimization approaches to text document clustering is presented.
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING" (IJDKP)
Clustering is one of the data mining techniques used to discover business intelligence by grouping objects into clusters using a similarity measure. Clustering is an unsupervised learning process with many real-time applications in marketing, biology, libraries, insurance, city planning, earthquake studies and document clustering. Latent trends and relationships among data objects can be unearthed using clustering algorithms. Many clustering algorithms exist; however, the quality of the resulting clusters is of paramount importance. The quality objective is to achieve the highest similarity between objects of the same cluster and the lowest similarity between objects of different clusters. In this context, we studied two widely used clustering algorithms, K-Means and Fuzzy K-Means. K-Means is an exclusive clustering algorithm, while Fuzzy K-Means is an overlapping clustering algorithm. In this paper we support the hypothesis "Fuzzy K-Means is better than K-Means for clustering" through both a literature review and an empirical study. We built a prototype application to demonstrate the differences between the two clustering algorithms. The experiments were conducted on a diabetes dataset obtained from the UCI repository. The empirical results reveal that Fuzzy K-Means performs better than K-Means in terms of cluster quality and accuracy; thus, our empirical study supports the hypothesis.
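The key difference between the two algorithms is the membership matrix: K-Means assigns each point to exactly one cluster, while Fuzzy K-Means (often called fuzzy c-means) gives each point a graded membership in every cluster. A minimal sketch of the fuzzy update equations, with an illustrative two-blob dataset and the common fuzzifier m = 2:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.4, (25, 2)), rng.normal(4, 0.4, (25, 2))])
k, m, n_iters = 2, 2.0, 30   # m is the fuzzifier; m = 2 is the usual choice

# Start from random memberships whose rows sum to 1.
U = rng.random((len(X), k))
U /= U.sum(axis=1, keepdims=True)

for _ in range(n_iters):
    # Centroid step: membership-weighted means.
    W = U ** m
    C = (W.T @ X) / W.sum(axis=0)[:, None]
    # Membership step: u_ij = 1 / sum_l (d_ij / d_il)^(2/(m-1)).
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
    U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)

# Hardening the fuzzy result recovers an exclusive labelling for comparison.
labels = U.argmax(axis=1)
```

Setting m close to 1 makes the memberships nearly crisp (approaching K-Means); larger m makes the overlap between clusters stronger.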
A Novel Multi-Viewpoint Based Similarity Measure for Document Clustering (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education. IJMER covers all fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, Assessment, and many more.
A Survey on Efficient Enhanced K-Means Clustering Algorithm (ijsrd.com)
Data mining is the process of using technology to identify patterns and prospects in large amounts of information. Within data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique that divides data into meaningful groups. K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
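The k-means procedure the abstract describes alternates two steps, assignment of each observation to the nearest mean and recomputation of the means, until the centroids stop moving. A minimal sketch (Forgy initialization on toy data):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
k = 2

# Forgy initialisation: sample k distinct data points as starting centroids.
centroids = X[rng.choice(len(X), size=k, replace=False)]
for _ in range(100):
    # Assignment step: each observation joins the cluster with the nearest mean.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each centroid moves to the mean of its members
    # (an empty cluster keeps its previous centroid).
    new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                    for j in range(k)])
    if np.allclose(new, centroids):
        break                      # converged
    centroids = new
```

The enhanced variants surveyed in such papers typically differ in exactly the two places this sketch leaves naive: the choice of initial centroids and the cost of the assignment step.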
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem... (IJECEIAES)
Data analysis plays a prominent role in interpreting various phenomena, and data mining is the process of extracting useful knowledge from extensive data. Based on classical statistical models, data can be exploited beyond mere storage and management. Cluster analysis, a primary investigation conducted with little or no prior knowledge, involves research and development across a wide variety of communities. Cluster ensembles combine individual solutions obtained from different clusterings to produce a final high-quality clustering, as required in many applications; the method arose to increase robustness, scalability and accuracy. This paper gives a brief overview of the generation methods and consensus functions included in cluster ensembles, and surveys the various techniques and cluster ensemble methods.
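A common concrete instance of the generation/consensus split described here is the co-association (evidence accumulation) approach: the generation step supplies several base labelings, and the consensus step links objects that are co-clustered in a majority of them. A small sketch:

```python
import numpy as np

# Three base clusterings of six objects (e.g. from different k-means runs).
# Label ids are arbitrary per run; only the grouping matters.
base = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
])
n = base.shape[1]

# Generation result -> co-association matrix: the fraction of runs that
# put objects i and j in the same cluster.
co = np.zeros((n, n))
for labels in base:
    co += (labels[:, None] == labels[None, :]).astype(float)
co /= len(base)

# Consensus: link pairs co-clustered in a majority of runs, then take
# connected components of the link graph as the final clusters.
linked = co >= 0.5
consensus = -np.ones(n, dtype=int)
cid = 0
for i in range(n):
    if consensus[i] == -1:
        stack = [i]                       # flood-fill one component
        while stack:
            u = stack.pop()
            if consensus[u] == -1:
                consensus[u] = cid
                stack.extend(np.nonzero(linked[u])[0].tolist())
        cid += 1
```

Note how the consensus is insensitive to the permuted label ids of the third run, which is the main point of co-association over naive label voting.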
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Textual Data Partitioning with Relationship and Discriminative Analysis (IJMTER)
Data partitioning methods are used to partition data values by similarity, and similarity measures are used to estimate transaction relationships. Hierarchical clustering produces tree-structured results, while partitional clustering produces results in a grid format. Text documents are unstructured data values with high-dimensional attributes. Document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) for the document grouping process, and clustering accuracy degrades drastically when an unsuitable cluster count is used.
Textual data elements are divided into two types: discriminative words and non-discriminative words. Only discriminative words are useful for grouping documents; the involvement of non-discriminative words confuses the clustering process and leads to poor clustering solutions. A variational inference algorithm is used to infer the document collection structure and the partition of document words at the same time. The Dirichlet Process Mixture (DPM) model is used to partition documents; it exploits both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to discover the latent cluster structure based on the DPM model, and DPMFP clustering is performed without requiring the number of clusters as input.
Document labels are used to guide the discriminative word identification process. Concept relationships are analyzed with ontology support, and a semantic weight model is used for document similarity analysis. The system improves scalability with the support of labels and concept relations for the dimensionality reduction process.
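DPMFP's variational inference is involved, but the underlying distinction between discriminative and non-discriminative words can be illustrated with a much simpler document-frequency heuristic (this illustrates the concept only, not the paper's algorithm):

```python
from collections import Counter

# Tiny corpus: two finance documents and two sports documents.
docs = [
    "the stock market shares trade",
    "the market stock price trade",
    "the football match goal team",
    "the team goal football league",
]
tokenized = [d.split() for d in docs]
n_docs = len(docs)

# Document frequency: in how many documents each word occurs.
df = Counter(w for t in tokenized for w in set(t))

# Heuristic split: a word present in every document cannot separate the
# groups, so treat it as non-discriminative; the rest are discriminative.
discriminative = {w for w, c in df.items() if c < n_docs}
non_discriminative = {w for w, c in df.items() if c == n_docs}
```

Here "the" lands in the non-discriminative set while topic words like "stock" and "football" stay discriminative, which is exactly the partition of document words the abstract says confuses clustering when ignored.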
Automatic Unsupervised Data Classification Using the Jaya Evolutionary Algorithm (aciijournal)
In this paper we attempt to solve an automatic clustering problem by concurrently optimizing multiple objectives, namely automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent Jaya optimization algorithm as its underlying optimization strategy. This evolutionary technique always aims to attain the global best solution rather than a local best solution on larger datasets. The exploration and exploitation imposed on the proposed work result in automatic detection of the number of clusters, the appropriate partitioning present in the data sets, and near-optimal values of the CVIs. Twelve datasets of differing intricacy are used to validate the performance of the proposed algorithm. The experiments show that the theoretical advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance gains.
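The Jaya update rule itself is parameter-free apart from population size and iteration count: each candidate moves toward the current best solution and away from the worst. A sketch on a toy objective (the paper's clustering objective and CVIs are replaced here by a simple sphere function):

```python
import numpy as np

rng = np.random.default_rng(4)

def sphere(x):
    # Simple benchmark objective to minimise: sum of squares.
    return (x ** 2).sum(axis=-1)

pop_size, dim, iters = 20, 5, 200
P = rng.uniform(-10, 10, (pop_size, dim))
init_best = float(sphere(P).min())

for _ in range(iters):
    f = sphere(P)
    best, worst = P[f.argmin()], P[f.argmax()]
    r1, r2 = rng.random(P.shape), rng.random(P.shape)
    # Jaya move: drift toward the best solution and away from the worst.
    cand = P + r1 * (best - np.abs(P)) - r2 * (worst - np.abs(P))
    # Greedy selection: a candidate replaces its parent only if it improves.
    keep = sphere(cand) < f
    P[keep] = cand[keep]

final_best = float(sphere(P).min())
```

The greedy selection makes each population member's objective monotonically non-increasing, which is why Jaya is attractive as an underlying strategy for clustering objectives: no crossover or mutation rates need tuning.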
Principle Component Analysis Based on Optimal Centroid Selection Model for Su... (ijtsrd)
Clustering large, sparse, large-scale data is an open research problem in data mining. Discovering significant information through clustering algorithms is often inadequate, as most of the data turns out to be non-actionable, and existing clustering techniques are not feasible for time-varying data in high-dimensional space. Subspace clustering therefore addresses these problems through the incorporation of domain knowledge and parameter-sensitive prediction, with the sensitiveness of the data also predicted through a thresholding mechanism. The problems of usability and usefulness in 3D subspace clustering are important issues, as is determining the correct dimension, which remains inconsistent and challenging. The solutions are highly beneficial for police departments and law enforcement organisations to better understand stock issues and gain insights that enable them to track activities and predict their likelihood. In this work, a Centroid-based Subspace Forecasting Framework constrained by domain knowledge, i.e. must-link and must-not-link constraints, is proposed. In the unsupervised subspace clustering algorithm, inconsistent constraints correlating to dimensions are resolved through singular value decomposition. Principal component analysis is used, in which conditions are explored to estimate the actionability of particular attributes, utilizing the domain knowledge to refine and validate the optimal centroids dynamically. Experimental results show that the proposed framework outperforms competing subspace clustering techniques in terms of efficiency, F-measure, parameter insensitiveness and accuracy. G. Raj Kamal | A. Deepika | D. Pavithra | J. Mohammed Nadeem | V. Prasath Kumar, "Principle Component Analysis Based on Optimal Centroid Selection Model for SubSpace Clustering Model", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-4, June 2020. URL: https://www.ijtsrd.com/papers/ijtsrd31374.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/31374/principle-component-analysis-based-on-optimal-centroid-selection-model-for-subspace-clustering-model/g-raj-kamal
An Automatic Clustering Technique for Optimal Clusters (IJCSEA Journal)
This paper proposes a simple, automatic and efficient clustering algorithm, namely Automatic Merging for Optimal Clusters (AMOC), which aims to generate nearly optimal clusters for a given dataset automatically. AMOC extends standard k-means with a two-phase iterative procedure that combines certain validation techniques in order to find optimal clusters by automatically merging clusters. Experiments on both synthetic and real data show that the proposed algorithm finds nearly optimal clustering structures in terms of number of clusters, compactness and separation.
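AMOC's exact validity tests are not given here, but the over-cluster-then-merge idea can be sketched with a distance threshold standing in for the validation step (the dataset, seeding and threshold below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
# Three true groups, but we start deliberately over-clustered with k = 6.
X = np.vstack([rng.normal(c, 0.3, (20, 2)) for c in [(0, 0), (4, 0), (2, 4)]])

def kmeans(X, seeds, iters=50):
    C = X[seeds].astype(float).copy()
    k = len(C)
    for _ in range(iters):
        lbl = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        C = np.array([X[lbl == j].mean(axis=0) if np.any(lbl == j) else C[j]
                      for j in range(k)])
    return C

# Seed two centroids inside each true group to keep the sketch deterministic.
C = kmeans(X, seeds=[0, 1, 20, 21, 40, 41])

# Merge phase: repeatedly fuse the two closest centroids while they are
# nearer than a separation threshold (a stand-in for AMOC's validity test).
threshold = 1.5
while len(C) > 1:
    d = np.linalg.norm(C[:, None] - C[None], axis=2)
    np.fill_diagonal(d, np.inf)
    i, j = np.unravel_index(d.argmin(), d.shape)
    if d[i, j] > threshold:
        break
    C = np.vstack([np.delete(C, [i, j], axis=0), (C[i] + C[j]) / 2])

n_clusters = len(C)
```

Starting over-clustered and merging downward is what removes the need to supply the true k up front; the quality of the result then rests entirely on the validity criterion that decides when to stop merging.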
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis... (ijsrd.com)
A cluster is a group of objects that are similar to each other within the cluster and dissimilar to the objects of other clusters, where similarity is typically calculated from the distance between two objects or clusters: two or more objects belong to the same cluster only if they are close to each other by that distance. The major objective of clustering is to discover collections of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective clustering algorithm for unlabeled data that produces both membership and typicality values during the clustering process. In this approach, the efficiency of FPCM is enhanced by using penalized and compensated constraints (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering, with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approach is evaluated on University of California, Irvine (UCI) machine learning repository datasets such as Iris, Wine, Lung Cancer and Lymphography, using clustering accuracy, Mean Squared Error (MSE), execution time and convergence behavior as evaluation measures.
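The penalized and compensated constraints of PCFPCM are specific to the paper, but the baseline FPCM iteration it modifies computes, from the same distances, a membership matrix (each row sums to 1 over clusters) and a typicality matrix (each column sums to 1 over data points), then updates centroids with both. A compact sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.4, (20, 2)), rng.normal(3, 0.4, (20, 2))])
k, m, eta, iters = 2, 2.0, 2.0, 40   # m, eta: fuzziness exponents

C = X[[0, 20]].astype(float)          # one seed point per apparent group
for _ in range(iters):
    d = np.linalg.norm(X[:, None] - C[None], axis=2) + 1e-12   # n x k distances
    # Membership U: rows (data points) sum to 1 across clusters.
    U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    # Typicality T: columns (clusters) sum to 1 across data points.
    T = d ** (-2.0 / (eta - 1.0))
    T = T / T.sum(axis=0, keepdims=True)
    # Centroids are weighted by membership and typicality together.
    W = U ** m + T ** eta
    C = (W.T @ X) / W.sum(axis=0)[:, None]
```

Typicality is what gives FPCM its robustness to outliers relative to plain fuzzy c-means: a far-away point can have low typicality in every cluster even though its memberships still sum to 1.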
A H-K Clustering Algorithm for High Dimensional Data Using Ensemble Learning (ijitcs)
Advances to traditional clustering algorithms solve various problems, such as the curse of dimensionality and the sparsity of data across multiple attributes. The traditional H-K clustering algorithm can overcome the randomness and arbitrariness of the initial centers of the K-means clustering algorithm, but when applied to high-dimensional data it suffers from the dimensional disaster problem because of its high computational complexity. Advanced clustering algorithms such as subspace and ensemble clustering improve performance on high-dimensional datasets from different aspects and to different extents, yet each improves performance from only a single perspective. The objective of the proposed model is to improve the performance of traditional H-K clustering and overcome its limitations, namely high computational complexity and poor accuracy on high-dimensional data, by combining three different clustering approaches: subspace clustering, ensemble clustering and H-K clustering.
Clustering, also known as data segmentation, aims to partition a data set into groups (clusters) according to similarity. Cluster analysis has been extensively studied, and many algorithms exist for different types of clustering; however, these classical algorithms cannot be applied to big data because of its distinct features, and applying traditional techniques to large unstructured data is a challenge. This study proposes a hybrid model to cluster big data using the well-known traditional K-means clustering algorithm. The proposed model consists of three phases: a Mapper phase, a Clustering phase and a Reduce phase. The first phase uses a map-reduce algorithm to split big data into small datasets; the second phase runs the traditional K-means algorithm on each of the split small data sets; and the last phase is responsible for producing the general cluster output for the complete data set. Two functions, Mode and Fuzzy Gaussian, have been implemented and compared in the last phase to determine the more suitable one. The experimental study used four benchmark big data sets: Covtype, Covtype-2, Poker, and Poker-2. The results demonstrate the efficiency of the proposed model in clustering big data with the traditional K-means algorithm, and the experiments show that the Fuzzy Gaussian function produces more accurate results than the traditional Mode function.
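The three phases described above can be sketched in a few lines: chunk the data (map), run k-means per chunk, then combine the local results (reduce). In this sketch the reduce step simply re-runs k-means on the pooled local centroids, a simplified stand-in for the paper's Mode and Fuzzy Gaussian combination functions:

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(c, 0.4, (40, 2)) for c in [(0, 0), (5, 5)]])
rng.shuffle(X)

def two_means(data, iters=50):
    # k = 2 with a deterministic spread-out start: the two x-extreme points.
    C = data[[data[:, 0].argmin(), data[:, 0].argmax()]].astype(float)
    for _ in range(iters):
        lbl = np.linalg.norm(data[:, None] - C[None], axis=2).argmin(axis=1)
        C = np.array([data[lbl == j].mean(axis=0) if np.any(lbl == j) else C[j]
                      for j in range(2)])
    return C

# "Mapper" phase: split the big data set into chunks and cluster each locally.
chunks = np.array_split(X, 4)
local_centroids = np.vstack([two_means(c) for c in chunks])

# "Reduce" phase: cluster the pooled local centroids into the final k centres.
final = two_means(local_centroids)
```

The point of the split is that each chunk fits in memory and the chunks can be clustered in parallel; only the small set of local centroids ever needs to be gathered in one place.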
Engineering Research Publication
International Journal of Engineering & Technical Research
ISSN: 2321-0869 (O), 2454-4698 (P)
www.erpublication.org
Experimental Study of Data Clustering Using K-Means and Modified Algorithms (IJDKP)
The k-means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation, and clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents the results of an experimental study of different approaches to k-means clustering, comparing results on different datasets using the original k-means and other modified algorithms implemented in MATLAB R2009b. The results are reported on several performance measures: number of iterations, number of points misclassified, accuracy, Silhouette validity index and execution time.
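Of the performance measures listed, the Silhouette validity index is the least self-explanatory: for each point it compares the mean distance to the other members of its own cluster (a) with the mean distance to the nearest other cluster (b), scoring (b - a) / max(a, b). A small self-contained computation:

```python
import numpy as np

# Two tight, well-separated clusters with known assignments.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

def mean_silhouette(X, labels):
    D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
    n = len(X)
    s = np.empty(n)
    for i in range(n):
        own = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, own].mean()                      # mean intra-cluster distance
        b = min(D[i, labels == c].mean()          # nearest other cluster
                for c in set(labels.tolist()) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

score = mean_silhouette(X, labels)
```

Scores near +1 mean compact, well-separated clusters; scores near 0 mean overlapping clusters; negative scores mean likely misassignments, which is why the index is useful for comparing k-means variants.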
Data imputation is used to estimate missing data values, as missing data have a negative effect on the validity of computed models. This study develops a genetic algorithm (GA) to optimize the imputation of missing cost data for fans used in road tunnels, supplied by the Swedish Transport Administration (Trafikverket). The GA imputes the missing cost data using an optimized valid data period. The results show highly correlated data (R-squared 0.99) after imputing the missing values; the GA thus provides a wide search space for optimizing imputation and producing complete data, which can then be used for forecasting and life cycle cost analysis. Ritesh Kumar Pandey | Dr Asha Ambhaikar, "Data Imputation by Soft Computing", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-4, June 2018, URL: http://www.ijtsrd.com/papers/ijtsrd14112.pdf http://www.ijtsrd.com/computer-science/real-time-computing/14112/data-imputation-by-soft-computing/ritesh-kumar-pandey
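The GA-based imputation idea can be sketched as follows: candidate fill-in values are evolved so that the completed column fits a simple model of the valid data as well as possible. Here the model is a straight line and the data, fitness function and GA operators are all illustrative; the study's actual cost model and configuration will differ.

```python
import numpy as np

rng = np.random.default_rng(8)

# A cost column y that is roughly linear in a known predictor x,
# with two entries missing (NaN). The true relation is y ~ 2x + 1.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 10)
missing = [3, 7]
y[missing] = np.nan

def fitness(fill):
    # R^2 of a straight-line fit after plugging the candidate values in.
    yy = y.copy()
    yy[missing] = fill
    slope, intercept = np.polyfit(x, yy, 1)
    resid = yy - (slope * x + intercept)
    return 1.0 - resid.var() / yy.var()

# Tiny real-valued GA: truncation selection, blend crossover, Gaussian mutation.
pop = rng.uniform(0.0, 30.0, (30, len(missing)))
for _ in range(60):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]       # keep the fittest third
    children = [0.5 * (parents[rng.integers(10)] + parents[rng.integers(10)])
                + rng.normal(0.0, 0.5, len(missing))
                for _ in range(len(pop) - len(parents))]
    pop = np.vstack([parents] + children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
```

The evolved fills converge toward values lying on the fitted line, mirroring the study's observation that GA-optimized imputation yields highly correlated completed data.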
Textual Data Partitioning with Relationship and Discriminative AnalysisEditor IJMTER
Data partitioning methods are used to partition the data values with similarity. Similarity
measures are used to estimate transaction relationships. Hierarchical clustering model produces tree
structured results. Partitioned clustering produces results in grid format. Text documents are
unstructured data values with high dimensional attributes. Document clustering group ups unlabeled text
documents into meaningful clusters. Traditional clustering methods require cluster count (K) for the
document grouping process. Clustering accuracy degrades drastically with reference to the unsuitable
cluster count.
Textual data elements are divided into two types’ discriminative words and nondiscriminative
words. Only discriminative words are useful for grouping documents. The involvement of
nondiscriminative words confuses the clustering process and leads to poor clustering solution in return.
A variation inference algorithm is used to infer the document collection structure and partition of
document words at the same time. Dirichlet Process Mixture (DPM) model is used to partition
documents. DPM clustering model uses both the data likelihood and the clustering property of the
Dirichlet Process (DP). Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to
discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without
requiring the number of clusters as input.
Document labels are used to estimate the discriminative word identification process. Concept
relationships are analyzed with Ontology support. Semantic weight model is used for the document
similarity analysis. The system improves the scalability with the support of labels and concept relations
for dimensionality reduction process.
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve an automatic clustering problem by optimizing multiple objectives such as automatic k-determination and a set of cluster validity indices concurrently. The proposed automatic clustering technique uses the most recent optimization algorithm Jaya as an underlying optimization stratagem. This evolutionary technique always aims to attain global best solution rather than a local best solution in larger datasets. The explorations and exploitations imposed on the proposed work results to detect the number of automatic clusters, appropriate partitioning present in data sets and mere optimal values towards CVIs frontiers. Twelve datasets of different intricacy are used to endorse the performance of aimed algorithm. The experiments lay bare that the conjectural advantages of multi objective clustering optimized with evolutionary approaches decipher into realistic and scalable performance paybacks.
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...ijtsrd
Clustering a large sparse and large scale data is an open research in the data mining. To discover the significant information through clustering algorithm stands inadequate as most of the data finds to be non actionable. Existing clustering technique is not feasible to time varying data in high dimensional space. Hence Subspace clustering will be answerable to problems in the clustering through incorporation of domain knowledge and parameter sensitive prediction. Sensitiveness of the data is also predicted through thresholding mechanism. The problems of usability and usefulness in 3D subspace clustering are very important issue in subspace clustering. . The Solutions is highly helpful benefit for police departments and law enforcement organisations to better understand stock issues and provide insights that will enable them to track activities, predict the likelihood. Also determining the correct dimension is inconsistent and challenging issue in subspace clustering .In this thesis, we propose Centroid based Subspace Forecasting Framework by constraints is proposed, i.e. must link and must not link with domain knowledge. Unsupervised Subspace clustering algorithm with inbuilt process like inconsistent constraints correlating to dimensions has been resolved through singular value decomposition. Principle component analysis is been used in which condition has been explored to estimate the strength of actionable to be particular attributes and utilizing the domain knowledge to refinement and validating the optimal centroids dynamically. An experimental result proves that proposed framework outperforms other competition subspace clustering technique in terms of efficiency, Fmeasure, parameter insensitiveness and accuracy. G. Raj Kamal | A. Deepika | D. Pavithra | J. Mohammed Nadeem | V. 
Prasath Kumar "Principle Component Analysis Based on Optimal Centroid Selection Model for SubSpace Clustering Model" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4 , June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31374.pdf Paper Url :https://www.ijtsrd.com/computer-science/data-miining/31374/principle-component-analysis-based-on-optimal-centroid-selection-model-for-subspace-clustering-model/g-raj-kamal
An Automatic Clustering Technique for Optimal ClustersIJCSEA Journal
This paper proposes a simple, automatic and efficient clustering algorithm, namely, Automatic Merging for Optimal Clusters (AMOC) which aims to generate nearly optimal clusters for the given datasets automatically. The AMOC is an extension to standard k-means with a two phase iterative procedure combining certain validation techniques in order to find optimal clusters with automation of merging of clusters. Experiments on both synthetic and real data have proved that the proposed algorithm finds nearly optimal clustering structures in terms of number of clusters, compactness and separation.
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...ijsrd.com
A cluster is a group of objects which are similar to each other within the cluster and dissimilar to the objects of other clusters. The similarity is typically calculated on the basis of the distance between two objects or clusters: two or more objects belong to the same cluster only if they are close to each other by this distance measure. The major objective of clustering is to discover collections of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective clustering algorithm for unlabeled data that produces both membership and typicality values during the clustering process. In this approach, the efficiency of Fuzzy Possibilistic C-Means clustering is enhanced by using penalized and compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering, with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approaches is evaluated on the University of California, Irvine (UCI) machine learning repository datasets Iris, Wine, Lung Cancer and Lymphography. The parameters used for the evaluation are clustering accuracy, Mean Squared Error (MSE), execution time and convergence behavior.
A h k clustering algorithm for high dimensional data using ensemble learningijitcs
Advances to the traditional clustering algorithms address problems such as the curse of dimensionality and the sparsity of data across multiple attributes. The traditional H-K clustering algorithm can resolve the randomness and a-priori choice of the initial centers in the K-means clustering algorithm, but when applied to high-dimensional data it runs into the dimensional-disaster problem because of its high computational complexity. Advanced methods such as subspace and ensemble clustering algorithms each improve the performance of clustering high-dimensional datasets from a different aspect and to a different extent, yet each improves performance from only a single perspective. The objective of the proposed model is to improve the performance of traditional H-K clustering and overcome its limitations, namely high computational complexity and poor accuracy on high-dimensional data, by combining three approaches: subspace clustering and ensemble clustering together with H-K clustering.
Clustering, also known as data segmentation, aims to partition a data set into groups (clusters) according to similarity. Cluster analysis has been extensively studied, and there are many algorithms for different types of clustering, but these classical algorithms cannot be applied to big data because of its distinct features: it is a challenge to apply traditional techniques to large unstructured data. This study proposes a hybrid model to cluster big data using the well-known traditional K-means clustering algorithm. The proposed model consists of three phases, namely the Mapper phase, the Clustering phase and the Reduce phase. The first phase uses a map-reduce algorithm to split the big data into small datasets; the second phase runs the traditional K-means algorithm on each of the split small data sets; the last phase is responsible for producing the overall clusters of the complete data set. Two functions, Mode and Fuzzy Gaussian, have been implemented and compared in the last phase to determine the more suitable one. The experimental study used four benchmark big data sets: Covtype, Covtype-2, Poker, and Poker-2. The results proved the efficiency of the proposed model in clustering big data with the traditional K-means algorithm, and the experiments show that the Fuzzy Gaussian function produces more accurate results than the traditional Mode function.
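The three-phase model described above can be sketched in plain Python. This is a minimal single-machine sketch, not the paper's implementation: the phase names mirror the abstract, the Reduce step simply re-clusters the per-chunk centroids (standing in for the Mode / Fuzzy Gaussian merge functions), and the deterministic farthest-point seeding is an illustrative choice.

```python
import math

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    """Component-wise mean of a list of points."""
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0])))

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    cents = [points[0]]
    while len(cents) < k:
        cents.append(max(points, key=lambda p: min(dist2(p, c) for c in cents)))
    return cents

def kmeans(points, k, iters=25):
    """Plain Lloyd's k-means; returns the final centroids."""
    cents = farthest_point_init(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: dist2(p, cents[c]))].append(p)
        cents = [mean(cl) if cl else cents[i] for i, cl in enumerate(clusters)]
    return cents

def mapper_phase(data, n_chunks):
    """Mapper: split the big data set into smaller chunks."""
    size = math.ceil(len(data) / n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

def clustering_phase(chunks, k):
    """Clustering: run traditional k-means on each chunk independently."""
    return [kmeans(chunk, k) for chunk in chunks]

def reduce_phase(local_centroids, k):
    """Reduce: merge the per-chunk centroids into k global clusters
    (the paper evaluates Mode and Fuzzy Gaussian functions here instead)."""
    flat = [c for cents in local_centroids for c in cents]
    return kmeans(flat, k)
```

On two well-separated blobs, clustering each chunk locally and then re-clustering the local centroids recovers centroids close to the true blob means.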
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithmaciijournal
In this paper we attempt to solve the automatic clustering problem by concurrently optimizing multiple objectives: automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent Jaya optimization algorithm as its underlying optimization stratagem. This evolutionary technique aims to attain a globally best solution rather than a locally best one on larger datasets. The exploration and exploitation imposed on the proposed work detect the number of clusters automatically, find appropriate partitionings of the data sets, and approach optimal values on the CVI frontiers. Twelve datasets of different intricacy are used to endorse the performance of the proposed algorithm. The experiments show that the conjectural advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance paybacks.
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
The k-means clustering algorithm is an old algorithm that has been intensely researched owing to its ease and simplicity of implementation, and clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents the results of an experimental study of different approaches to k-means clustering, comparing results on different datasets using the original k-means and other modified algorithms implemented in MATLAB R2009b. The results are evaluated on several performance measures: number of iterations, number of points misclassified, accuracy, Silhouette validity index, and execution time.
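Of the measures listed, the Silhouette validity index can be computed directly from a clustering. A self-contained sketch, assuming Euclidean distance:

```python
from collections import defaultdict

def euclid(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b), where a is the
    mean distance to the point's own cluster and b the smallest mean distance
    to any other cluster."""
    idx = defaultdict(list)
    for i, l in enumerate(labels):
        idx[l].append(i)
    scores = []
    for i, p in enumerate(points):
        l = labels[i]
        own = [j for j in idx[l] if j != i]
        if not own:                      # singleton cluster: s(i) is taken as 0
            scores.append(0.0)
            continue
        a = sum(euclid(p, points[j]) for j in own) / len(own)
        b = min(sum(euclid(p, points[j]) for j in idx[m]) / len(idx[m])
                for m in idx if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

For two tight, well-separated clusters the mean silhouette approaches 1; values near 0 indicate overlapping clusters.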
Data imputation is used to estimate missing data values, since missing data have a negative effect on the validity of computed models. This study develops a genetic algorithm (GA) to optimize the imputation of missing cost data for fans used in road tunnels by the Swedish Transport Administration (Trafikverket). The GA imputes the missing cost data using an optimized valid data period. The results show highly correlated data (R-squared 0.99) after imputing the missing values; the GA thus provides a wide search space for optimizing imputation and producing complete data, which can then be used for forecasting and life cycle cost analysis. Ritesh Kumar Pandey | Dr Asha Ambhaikar "Data Imputation by Soft Computing" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-4, June 2018, URL: http://www.ijtsrd.com/papers/ijtsrd14112.pdf http://www.ijtsrd.com/computer-science/real-time-computing/14112/data-imputation-by-soft-computing/ritesh-kumar-pandey
IOSR Journal of Mathematics (IOSR-JM) is an open access international journal that provides rapid publication (within a month) of articles in all areas of mathematics and its applications. The journal welcomes publication of high quality papers on theoretical developments and practical applications in mathematics. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publication.
Surfactant-assisted Hydrothermal Synthesis of Ceria-Zirconia Nanostructured M...IOSR Journals
CeO2–ZrO2 oxides were prepared by the surfactant-templated method using cetyl trimethyl ammonium bromide (CTAB) as template and modified with chromium nitrate. These were characterized by XRD, FT-IR, TEM, SEM, BET and TPD-CO2. The XRD data showed that the as-prepared CeO2-ZrO2 powder particles have a single-phase cubic fluorite structure, and HRTEM shows mesoscopic ordering. The average particle size is 12-13 nm, as calculated from the particle histogram. The nitrogen adsorption/desorption isotherms were classified as type IV, typical of mesoporous materials. The presence of uni-modal mesopores is confirmed by the pore size distribution, which shows pores centred at around 60 Å. Catalytic activity was studied towards liquid-phase oxidation of benzene.
IOSR Journal of Humanities and Social Science is an International Journal edited by International Organization of Scientific Research (IOSR).The Journal provides a common forum where all aspects of humanities and social sciences are presented. IOSR-JHSS publishes original papers, review papers, conceptual framework, analytical and simulation models, case studies, empirical research, technical notes etc.
Ensemble based Distributed K-Modes ClusteringIJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches to distributed clustering, in which distributed datasets are clustered without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets, but it cannot directly handle datasets with categorical attributes, which frequently occur in real-life data. Huang proposed the K-Modes clustering algorithm, introducing a new dissimilarity measure for categorical data that replaces the means of clusters with a frequency-based method which updates modes during the clustering process to minimize the cost function. Most distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble-based Distributed K-Modes clustering algorithm is proposed, which is well suited both to handling categorical data sets and to performing the distributed clustering process asynchronously. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and a K-Modes based centralized clustering algorithm. The experiments are carried out on various datasets from the UCI machine learning repository.
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
In this paper we compare the performance of two classification algorithms. It is useful to differentiate algorithms based on computational performance rather than classification accuracy alone: although the classification accuracy of the two algorithms is similar, their computational performance can differ significantly, and this can affect the final results. The objective of this paper is therefore to perform a comparative analysis of two machine learning algorithms, K-Nearest Neighbour classification and Logistic Regression. We consider a large dataset of 7981 data points and 112 features and examine the performance of the above-mentioned algorithms on it. The processing time and accuracy of the different machine learning techniques are estimated on the collected data set, using 60% for training and the remaining 40% for testing. The paper is organized as follows. Section I contains the introduction and background analysis of the research; Section II, the problem statement. Section III briefly describes our application, the data analysis process, the testing environment, and the methodology of our analysis. Section IV comprises the results of the two algorithms. Finally, the paper concludes with a discussion of future research directions that eliminate the problems in the current research methodology.
Multilevel techniques for the clustering problemcsandit
Data Mining is concerned with the discovery of interesting patterns and knowledge in data
repositories. Cluster Analysis which belongs to the core methods of data mining is the process
of discovering homogeneous groups called clusters. Given a data-set and some measure of
similarity between data objects, the goal in most clustering algorithms is maximizing both the
homogeneity within each cluster and the heterogeneity between different clusters. In this work,
two multilevel algorithms for the clustering problem are introduced. The multilevel
paradigm suggests looking at the clustering problem as a hierarchical optimization process
going through different levels evolving from a coarse grain to fine grain strategy. The clustering
problem is solved by first reducing the problem level by level to a coarser problem where an
initial clustering is computed. The clustering of the coarser problem is then mapped back level-by-level to obtain a better clustering of the original problem by refining the intermediate clusterings obtained at the various levels. A benchmark using a number of data sets collected from a
variety of domains is used to compare the effectiveness of the hierarchical approach against its
single-level counterpart.
K-Means clustering uses an iterative procedure which is very sensitive to, and dependent upon, the initial centroids. The initial centroids in k-means clustering are chosen randomly, and hence the clustering also changes with the initial centroids. This paper tries to overcome this problem of random selection of centroids, and the consequent change of clusters, with a premeditated selection of initial centroids. We have used the iris, abalone and wine data sets to demonstrate that the proposed method of finding the initial centroids and using them in the k-means algorithm improves clustering performance. The clustering also remains the same in every run, as the initial centroids are not randomly selected but chosen through the premeditated method.
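The abstract does not spell out the premeditated seeding rule, so the following is only an illustrative deterministic strategy in the same spirit: sort the points by distance from the origin, partition the sorted list into k equal blocks, and use each block's mean as an initial centroid, so the same data always produces the same seeds.

```python
def premeditated_centroids(points, k):
    """Deterministic seeding sketch: sort by squared distance from the origin,
    partition the sorted data into k equal blocks, and use each block's mean
    as an initial centroid. No randomness, so every run yields the same seeds."""
    ordered = sorted(points, key=lambda p: sum(v * v for v in p))
    block = len(ordered) // k
    seeds = []
    for i in range(k):
        # last block absorbs any remainder points
        chunk = ordered[i * block:(i + 1) * block] if i < k - 1 else ordered[i * block:]
        dim = len(chunk[0])
        seeds.append(tuple(sum(p[j] for p in chunk) / len(chunk) for j in range(dim)))
    return seeds
```

Because the seeding is a pure function of the data, re-running it (even on a reshuffled copy of the data) returns identical centroids, which is exactly the repeatability the paper is after.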
An Heterogeneous Population-Based Genetic Algorithm for Data Clusteringijeei-iaes
As a primary data mining method for knowledge discovery, clustering is a technique for classifying a dataset into groups of similar objects. The most popular data clustering method, K-means, suffers from the drawback of requiring the number of clusters and their initial centers, which must be provided by the user. In the literature, several methods have been proposed, in the form of k-means variants, genetic algorithms, or combinations of the two, for calculating the number of clusters and finding proper cluster centers. However, none of these solutions has provided satisfactory results, and determining the number of clusters and the initial centers remains the main challenge in clustering. In this paper we present an approach that automatically generates these parameters to achieve optimal clusters, using a modified genetic algorithm operating on varied individual structures and a new crossover operator. Experimental results show that our modified genetic algorithm is a more efficient alternative to the existing approaches.
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Comparison Between Clustering Algorithms for Microarray Data AnalysisIOSR Journals
Currently, two techniques are used for large-scale gene-expression profiling: microarray and RNA-Sequencing (RNA-Seq). This paper is intended to study and compare different clustering algorithms used in microarray data analysis. A microarray is an array of DNA molecules which allows multiple hybridization experiments to be carried out simultaneously and traces the expression levels of thousands of genes. It is a high-throughput technology for gene expression analysis and has become an effective tool for biomedical research. Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein microarrays, which enable researchers to investigate the expression state of a large number of genes. Data clustering represents the first and main process in microarray data analysis. The k-means, fuzzy c-means, self-organizing map, and hierarchical clustering algorithms are investigated in this paper and compared based on their clustering models.
Scalable Rough C-Means clustering using Firefly algorithm
Abhilash Namdev and B.K. Tripathy
Significance of Embedded Systems to IoT
P. R. S. M. Lakshmi, P. Lakshmi Narayanamma and K. Santhi Sri
Cognitive Abilities, Information Literacy Knowledge and Retrieval Skills of Undergraduates: A Comparison of Public and Private Universities in Nigeria
Janet O. Adekannbi and Testimony Morenike Oluwayinka
Risk Assessment in Constructing Horseshoe Vault Tunnels using Fuzzy Technique
Erfan Shafaghat and Mostafa Yousefi Rad
Evaluating the Adoption of Deductive Database Technology in Augmenting Criminal Intelligence in Zimbabwe: Case of Zimbabwe Republic Police
Mahlangu Gilbert, Furusa Samuel Simbarashe, Chikonye Musafare and Mugoniwa Beauty
Analysis of Petrol Pumps Reachability in Anand District of Gujarat
Nidhi Arora
Max stable set problem to found the initial centroids in clustering problemnooriasukmaningtyas
In this paper, we propose a new approach to document clustering using the K-Means algorithm, which is sensitive to the random selection of the k cluster centroids in the initialization phase. To improve the quality of K-Means clustering, we model the choice of initial centroids as the max stable set problem (MSSP) and use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that the MSSP and clustering share the same principle: the MSSP consists of finding the largest set of mutually disconnected nodes in a graph, while in clustering all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much better optimized in terms of time, and provides better clustering quality than other methods.
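The MSSP intuition, that initial centroids should be pairwise dissimilar, can be illustrated without a Hopfield network. The greedy routine below is a simplified stand-in for the paper's continuous-Hopfield solver: connect two documents by an edge when they are more similar (closer) than a threshold, then greedily build a maximal independent (stable) set to serve as seeds.

```python
def greedy_stable_set(points, threshold):
    """Connect two points by an edge when they are closer (more similar) than
    `threshold`; greedily pick a maximal independent (stable) set, i.e. a set
    of mutually dissimilar points, to serve as initial centroids."""
    def close(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5 < threshold

    chosen = []
    for p in points:
        # keep p only if it is dissimilar to everything already chosen
        if all(not close(p, c) for c in chosen):
            chosen.append(p)
    return chosen
```

With a threshold between the within-cluster and between-cluster distances, the stable set contains exactly one representative per natural cluster, which is why it makes a good K-Means initialization.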
Improving K-NN Internet Traffic Classification Using Clustering and Principle...journalBEEI
K-Nearest Neighbour (K-NN) is a popular classification algorithm; in this research K-NN is used to classify internet traffic. K-NN is appropriate for huge amounts of data and gives accurate classification, but it is expensive in the computation process because it calculates the distance to every existing record in the dataset. Clustering is one solution to this weakness: a clustering step performed before K-NN classification groups data with the same characteristics without requiring high computation time. Fuzzy C-Means (FCM) is the clustering algorithm used in this research; it does not need the number of clusters to be determined first, as clusters form naturally from the dataset that is entered. FCM has a weakness of its own: the clustering results are frequently not the same even for the same input dataset, because the initial dataset given to FCM is less than optimal. To optimize the initial dataset, a feature selection algorithm is needed; feature selection produces an optimal initial dataset for Fuzzy C-Means. The feature selection algorithm in this research is Principal Component Analysis (PCA), which can remove non-significant attributes or features to create an optimal dataset and can improve performance for both clustering and classification. The result of this research is a combined classification, clustering, and feature selection method for internet traffic datasets that successfully models internet traffic classification with higher accuracy and faster performance.
Mine Blood Donors Information through Improved K-Means Clusteringijcsity
The number of accidents and health diseases, which are increasing at an alarming rate, is resulting in a huge increase in the demand for blood, so there is a need for organized analysis of blood donor databases and blood bank repositories. Clustering analysis is one of the data mining applications, and the K-means clustering algorithm is fundamental to modern clustering techniques. K-means is a traditional, iterative algorithm: at every iteration, it computes the distance from the centroid of each cluster to each and every data point. This paper improves the original k-means algorithm by improving the initial centroids using the distribution of the data. Results and discussion show that the improved K-means algorithm produces accurate clusters in less computation time when finding donor information.
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web, and fast, high-quality clustering algorithms play an important role in helping users effectively navigate, summarize, and organize that information. Recent studies have shown that partitional clustering algorithms such as k-means are the most popular for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data; these estimates can be used to initialize iterative optimization-based clustering and model identification methods. In this paper, we present a hybrid Subtractive + (PSO) clustering algorithm, combining Particle Swarm Optimization with subtractive clustering, that performs fast clustering. For comparison, we applied the Subtractive + (PSO) clustering algorithm, PSO, and subtractive clustering on three different datasets. The results illustrate that the Subtractive + (PSO) clustering algorithm generates the most compact clustering results compared to the other algorithms.
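Subtractive clustering's one-pass estimate works through density potentials, in Chiu's formulation: each point's potential is a sum of Gaussian contributions from all other points, the highest-potential point becomes a center, and potentials near a chosen center are then suppressed. A sketch with the number of centers fixed for simplicity (the full algorithm decides when to stop with accept/reject thresholds on the remaining potential instead):

```python
import math

def subtractive_centers(points, n_centers, ra=2.0):
    """Chiu-style subtractive clustering sketch: each point's potential is
    P_i = sum_j exp(-4/ra^2 * ||x_i - x_j||^2); the highest-potential point
    becomes a center, then nearby potentials are suppressed using the wider
    radius rb = 1.5 * ra, and the process repeats."""
    rb = 1.5 * ra
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    potential = [sum(math.exp(-alpha * d2(p, q)) for q in points) for p in points]
    centers = []
    for _ in range(n_centers):
        i = potential.index(max(potential))   # densest remaining point
        centers.append(points[i])
        p_star = potential[i]
        # subtract the chosen center's influence from every potential
        potential = [p - p_star * math.exp(-beta * d2(points[i], q))
                     for p, q in zip(potential, points)]
    return centers
```

In the paper's hybrid, centers estimated this way seed the PSO particles, so the swarm starts near dense regions instead of random positions.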
Clustering heterogeneous categorical data using enhanced mini batch K-means ...IJECEIAES
Clustering methods in data mining aim to group a set of patterns based on their similarity. In survey data, heterogeneous information is collected on various types of scales, such as nominal, ordinal, binary, and Likert scales, and a lack of treatment of such heterogeneous data leads to loss of information and poor decision-making. Although many similarity measures have been established, solutions for heterogeneous data in clustering are still lacking. The recent entropy distance measure seems to provide good results for heterogeneous categorical data, but it requires many experiments and evaluations. This article presents a proposed framework for heterogeneous categorical data using mini batch k-means with an entropy measure (MBKEM), to investigate the effectiveness of the similarity measure when clustering heterogeneous categorical data. Secondary data from a public survey was used. The findings demonstrate that the proposed framework improves clustering quality: MBKEM outperformed other clustering algorithms with accuracy of 0.88, v-measure (VM) of 0.82, adjusted rand index (ARI) of 0.87, and Fowlkes-Mallows index (FMI) of 0.94. The average minimum elapsed time for generating k clusters was 0.26 s. In the future, the proposed solution would be beneficial for improving the quality of clustering for heterogeneous categorical data problems in many domains.
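The mini batch k-means core that MBKEM builds on can be sketched as follows. The entropy-based similarity measure for mixed scales is the paper's contribution and is not reproduced here; plain Euclidean distance and an illustrative deterministic farthest-point seeding stand in.

```python
import random

def minibatch_kmeans(points, k, batch_size=10, iters=200, seed=0):
    """Mini-batch k-means sketch: per iteration, assign a random batch to the
    nearest centroid, then nudge each centroid toward its batch points with a
    per-centroid learning rate 1/count (Sculley-style streaming update)."""
    rng = random.Random(seed)

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # deterministic seeding: first point, then repeatedly the farthest point
    cents = [list(points[0])]
    while len(cents) < k:
        cents.append(list(max(points, key=lambda p: min(d2(p, c) for c in cents))))
    counts = [0] * k
    for _ in range(iters):
        batch = rng.sample(points, batch_size)
        for p in batch:
            i = min(range(k), key=lambda c: d2(p, cents[c]))
            counts[i] += 1
            eta = 1.0 / counts[i]        # learning rate decays per centroid
            cents[i] = [(1 - eta) * cj + eta * pj for cj, pj in zip(cents[i], p)]
    return [tuple(c) for c in cents]
```

Because each update touches only a small batch instead of the whole data set, the per-iteration cost stays constant as the data grows, which is what makes the mini batch variant attractive for survey-scale data.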
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
I017235662
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 17, Issue 2, Ver. III (Mar – Apr. 2015), PP 56-62
www.iosrjournals.org
DOI: 10.9790/0661-17235662
Particle Swarm Optimization based K-Prototype Clustering Algorithm

K. Arun Prabha¹, N. Karthi Keyani Visalakshi²

¹Assistant Professor, Department of Computer Technology, Vellalar College for Women, Erode, Tamilnadu, INDIA.
²Associate Professor, Department of Computer Applications, Kongu Engineering College, Perundurai, Erode, Tamilnadu, INDIA.
Abstract: Clustering in data mining is a discovery process that groups a set of data so as to maximize the intra-cluster similarity and to minimize the inter-cluster similarity. The K-Means algorithm is best suited for clustering large data sets when the objects possess only numeric values. K-Modes extends K-Means to domains with categorical attributes. In some applications, however, data objects are described by both numeric and categorical features, and the K-Prototype algorithm is one of the most important algorithms for clustering this type of data. It produces a locally optimal solution that depends on the initial prototypes and on the order of the objects in the data. Particle Swarm Optimization is one of the simplest optimization techniques and can be effectively applied to enhance clustering results, and its discrete (binary) variant is useful for handling mixed data sets. This leads to a better cost evaluation in the description space and subsequently to enhanced processing of mixed data by Particle Swarm Optimization. This paper proposes a new variant that combines binary Particle Swarm Optimization with the K-Prototype algorithm to reach a globally optimal solution of the clustering optimization problem. The proposed algorithm is implemented and evaluated on standard benchmark data sets taken from the UCI machine learning repository. The comparative analysis shows that the PSO-based K-Prototype algorithm provides better performance than the traditional K-Modes and K-Prototype algorithms.
Keywords: Clustering, K-Means, K-Prototype Algorithm, Centroid, Particle Swarm Optimization.
I. Introduction
Data clustering is a popular approach to implementing the partitioning operation, and it provides an intelligent way of finding interesting groups when a problem becomes intractable for human analysis. It groups data objects based on the information found in the data that describes the objects and their relationships. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters [10]. Clustering has been studied in the fields of machine learning and pattern recognition, and it plays an important role in data mining applications such as scientific data exploration, information retrieval, opinion mining, and text mining. It also has a significant role in spatial database applications, web analysis, customer relationship management, social network analysis, bio-medical analysis and many other related areas [3].
Clustering algorithms can be classified into hierarchical and partitional clustering. Hierarchical clustering creates a hierarchical decomposition of the data set, represented by a tree structure. Partitional clustering constructs a partition of a given database of n data points into a predefined number of clusters. Partitional methods usually lead to better results because of their iterative, revised-type grouping nature. K-Means is one of the most widely used partitional clustering methods due to its simplicity, versatility, efficiency, empirical success and ease of implementation, as evidenced by the hundreds of publications over the last fifty-five years that extend K-Means in a variety of ways [2].
The K-Means algorithm starts with K arbitrary centroids, typically chosen uniformly at random from the data objects [5]. Each data object is assigned to the nearest cluster centroid, and then each centroid is recalculated as the mean of all data objects assigned to it. These two steps are repeated until a predefined termination criterion is met. The major handicap of K-Means is that it is often limited to numerical data, because it typically uses Euclidean or squared Euclidean distance to measure the distortion between data objects and centroids, and mean computation plays a vital role in cluster identification. When mixed data are encountered, several researchers have applied data transformation approaches to convert one type of attribute to the other before executing the K-Means algorithm. However, in some cases these transformations result in a loss of information, leading to undesired clustering results [8].
K-Modes [7] extends the well-known K-Means algorithm to the clustering of categorical data. This approach modifies the standard K-Means process by replacing the Euclidean distance function with the simple matching dissimilarity measure, using modes to represent cluster centroids, and updating modes with the most frequent categorical values in each iteration of the clustering process. Huang [9] proposed the K-Prototypes algorithm, which is based on the K-Means paradigm but removes the numeric-data limitation whilst preserving its efficiency. This algorithm integrates the K-Means and K-Modes processes to cluster data with mixed numeric and categorical values. The random selection of starting centroids in these algorithms may lead to different clustering results and to falling into local optima. Abundant algorithms have been developed to resolve this issue in K-Means by integrating excellent global optimization methods such as Genetic Algorithms (GA), ant colony optimization and Particle Swarm Optimization. Particle Swarm Optimization (PSO) algorithms are randomized search and optimization techniques based on the concept of a swarm. They are efficient, adaptive and robust search processes, performing multi-dimensional search in order to provide near-optimal solutions of an evaluation (fitness) function in an optimization problem. In this paper, an attempt is made to integrate PSO with the K-Prototype algorithm to reach global results while clustering mixed data.
Background
K-Prototype Clustering Algorithm
The K-Prototype algorithm integrates K-Means and K-Modes [19]. It is practically more useful for mixed-type objects. The dissimilarity between two mixed-type objects X and Z, described by numeric attributes A^r_1, ..., A^r_p and categorical attributes A^c_{p+1}, ..., A^c_m, can be measured by

d(X, Z) = \sum_{j=1}^{p} (x_j - z_j)^2 + \lambda \sum_{j=p+1}^{m} \delta(x_j, z_j)    (1)

where the first term is the squared Euclidean distance measure on the numeric attributes and the second term is the simple matching dissimilarity measure on the categorical attributes. Selection of the λ value is guided by the average standard deviation (σ) of the numeric attributes [20]; a suitable λ to balance the similarity measure lies between (1/3)σ and (2/3)σ for these data sets [18]. Based on this approach, work is carried out to find the value of λ. The influence of λ in the clustering process is given in Table-2.
According to Huang [6] the cost function for mixed-type objects is as follows:

J = \sum_{l=1}^{k} (P_l^r + P_l^c)    (2)

where

P_l^r = \sum_{i=1}^{n} w_{i,l} \sum_{j=1}^{p} (x_{i,j} - z_{l,j})^2    (3)

P_l^c = \lambda \sum_{i=1}^{n} w_{i,l} \sum_{j=p+1}^{m} \delta(x_{i,j}, z_{l,j})    (4)

Combining (2), (3) and (4), the cost function can be written as

J = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{i,l} \left( \sum_{j=1}^{p} (x_{i,j} - z_{l,j})^2 + \lambda \sum_{j=p+1}^{m} \delta(x_{i,j}, z_{l,j}) \right)    (5)

Since both P_l^r and P_l^c are nonnegative, minimizing J is equivalent to minimizing P_l^r and P_l^c for 1 ≤ l ≤ k.
ALGORITHM : K-Prototype Clustering Algorithm
Step 1: Select k initial prototypes for k clusters from the data set X.
Step 2: Allocate each data object in X to the cluster whose prototype is the
        nearest to it according to (1). Update the prototype of the cluster
        after each allocation.
Step 3: After all data objects have been allocated to a cluster, retest the
        similarity of data objects against the current prototypes. If a data
        object is found whose nearest prototype belongs to a cluster other
        than its current one, reallocate the data object to that cluster and
        update the prototypes of both clusters.
Step 4: Repeat Step 3 until no data object has changed clusters after a full
        cycle test of X.
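The allocation rule above, driven by the mixed dissimilarity of Eq. (1), can be sketched as follows (an illustrative Python sketch, not the authors' code; the attribute split `p` and weight `lam` are assumed inputs, with the first p columns numeric and the rest categorical):

```python
import numpy as np

def mixed_distance(x, z, p, lam):
    """Eq. (1): squared Euclidean on the first p (numeric) attributes plus
    lam times the simple matching dissimilarity on the remaining ones."""
    num = float(((x[:p].astype(float) - z[:p].astype(float)) ** 2).sum())
    cat = float((x[p:] != z[p:]).sum())  # delta(x_j, z_j): 0 if equal, 1 otherwise
    return num + lam * cat

def allocate(objects, prototypes, p, lam):
    """Step 2: allocate each object to the cluster whose prototype is nearest
    according to the mixed dissimilarity."""
    return [min(range(len(prototypes)),
                key=lambda l: mixed_distance(x, prototypes[l], p, lam))
            for x in objects]
```

Rows are stored as `dtype=object` arrays so that numeric and categorical attributes can share one record.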
Particle Swarm Optimization
PSO is an efficient and effective global optimization algorithm which can be used to solve multimodal, non-convex and non-continuous problems [12]. A particle is an individual object, and a group of particles is termed a swarm. Each particle is associated with a velocity. Particles fly through the search space with velocities dynamically adjusted according to their historical behaviour, and therefore tend to fly towards better and better search areas over the course of the search process. The particles try to reach the global minimum by using global and local best information [3]. PSO works on the basis of swarm intelligence, and the search is driven by the velocity of each particle. The PSO algorithm operates iteration by iteration, and the solution produced in each iteration is compared with the particle's local best (pbest) and the global best (Gbest) of the swarm.
The PSO algorithm consists of the following steps:
ALGORITHM : PSO Algorithm
Step 1 : Initialize each particle with a random position and velocity.
Step 2 : Evaluate the fitness of each particle.
Step 3 : Update pbest and Gbest of each particle.
Step 4 : Update the velocity and position of each particle using (6) and (7)
         respectively:

V_p(t+1) = w V_p(t) + c_1 r_1 (P_{pbest} - X_p(t)) + c_2 r_2 (G_{best} - X_p(t))    (6)

X_p(t+1) = X_p(t) + V_p(t+1)    (7)

Step 5 : Repeat Steps 2 to 4 until the termination condition is met.

The inertia weight w is calculated for each iteration using (8):

w = W_{max} - ((W_{max} - W_{min}) / \text{max. no. of iterations}) \cdot t    (8)

The main advantage of PSO is that it has fewer parameters to adjust and fast convergence when compared with many other global optimization algorithms such as Genetic Algorithms (GA) and Simulated Annealing (SA).
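Equations (6)-(8) translate directly into code. The following is a minimal sketch (illustrative function names; the defaults c1 = c2 = 2.0 follow the values quoted later in the paper):

```python
import numpy as np

def pso_step(X, V, pbest, gbest, w, c1=2.0, c2=2.0, rng=None):
    """One PSO iteration: Eq. (6) velocity update and Eq. (7) position update.
    X, V, pbest are (num_particles, dims) arrays; gbest broadcasts over them."""
    rng = rng or np.random.default_rng(0)
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)  # Eq. (6)
    X = X + V                                                  # Eq. (7)
    return X, V

def inertia(t, max_iterations, w_max=0.9, w_min=0.2):
    """Eq. (8): linearly decreasing inertia weight."""
    return w_max - (w_max - w_min) / max_iterations * t
```

With these updates, early iterations (large w) favour global exploration and later ones (small w) favour local refinement.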
Discrete or Binary Particle Swarm Optimization
PSO is designed for continuous function optimization problems [17] and is not directly applicable to discrete function optimization problems. To overcome this, discrete or binary PSO has been proposed. The major difference between binary PSO and ordinary PSO is that the velocities and positions of the particles are defined in terms of changes of probabilities, and the particles are formed by integers in {0, 1}. A particle therefore flies in a search space restricted to zero or one, and the velocity of the particle must be mapped to the interval [0, 1]. The logistic sigmoid transformation function S(v_i(t+1)) is

S(v_i(t+1)) = \frac{1}{1 + e^{-v_i(t+1)}}    (9)

The new position of the particle is obtained as X_p(t+1) = 1 if r_3 < S(v_i(t+1)), and 0 otherwise, where r_3 is a uniform random number in the range [0, 1].
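Equation (9) and the position rule can be sketched as follows (illustrative; `binary_position` is a hypothetical helper name):

```python
import numpy as np

def sigmoid(v):
    """Eq. (9): logistic transformation mapping a velocity to a probability."""
    return 1.0 / (1.0 + np.exp(-v))

def binary_position(v, rng=None):
    """New bit is 1 if a uniform random r3 falls below S(v), else 0."""
    rng = rng or np.random.default_rng(0)
    r3 = rng.random(v.shape)
    return (r3 < sigmoid(v)).astype(int)
```

A large positive velocity thus makes a bit almost certainly 1, and a large negative velocity makes it almost certainly 0.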
Related Research
This section reviews various algorithms proposed for mixed numeric and categorical data and recently published Particle Swarm Optimization based K-Means algorithms.

Zhexue Huang [6] proposed K-Modes, which introduces new dissimilarity measures to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process so as to minimize the clustering cost function. Zhexue Huang and Michael K. Ng [20] formulated a fuzzy K-Modes approach to the K-Means paradigm to cluster large categorical data sets efficiently. Michael K. Ng, Mark Junjie Li, Joshua Zhexue Huang and Zengyou He [18] derived the updating formula of the K-Modes clustering algorithm with a new dissimilarity measure and established the convergence of the algorithm under the optimization framework.

Zhexue Huang [7] proposed a method to dynamically update the K-Prototypes in order to maximize the intra-cluster similarity of objects. An improved multi-level clustering algorithm based on K-Prototype was proposed by LI Shi-jin, ZHU Yue-long and LIU Jing [13]: a low-purity problem occurred when the K-Prototype algorithm processed complex data sets, and to tackle this issue the new algorithm re-clusters the low-purity clusters through automatic selection of attributes in order to improve the quality of clustering. An extension to the K-Prototypes algorithm, with hard and fuzzy variants, was proposed by Wei-Dong Zhao, Wei-Hui Dai and Chun-Bin Tang [21]; it focuses on the effects of attribute values with different frequencies on clustering accuracy and proposes a new update method for centroids. A fuzzy K-Prototype clustering algorithm for mixed numeric and categorical data was proposed by Jinchao Ji, Wei Pang, Chunguang Zhou, Xiao Han and Zhe Wang [11]: the mean and a fuzzy centroid are combined to represent the prototype of a cluster and are employed in a new measure, based on the co-occurrence of values, to evaluate the dissimilarity between data objects and cluster prototypes. This measure also takes into account the significance of different attributes towards the clustering process, and an algorithm for clustering mixed data is formulated.
An improved K-Prototype clustering algorithm for mixed numeric and categorical data was proposed by Jinchao Ji, Tian Bai, Chunguang Zhou, Chao Ma and Zhe Wang [10]. In this work, the concept of the distribution centroid for representing the prototype of the categorical attributes in a cluster was introduced; the mean and the distribution centroid are then combined to represent the prototype of a cluster with mixed attributes, and a new measure is proposed to calculate the dissimilarity between data objects and cluster prototypes. This measure takes into account the significance of different attributes towards the clustering process for mixed data sets. Izhar Ahmad [9] compared the performance of the K-Means and K-Prototype algorithms; this research gives a detailed discussion of K-Means and K-Prototype in order to recommend an efficient algorithm for outlier detection and other issues relating to database clustering, with verification and validation of the system based on simulation. R. Madhuri, M. Ramakrishna Murty, J. V. R. Murthy, P. V. G. D. Prasad Reddy and Suresh C. Satapathy [14] applied the K-Modes and K-Prototype algorithms to clustering categorical data sets and also reduced their cost functions.

A Particle Swarm Optimization based K-Means clustering approach for security assessment in power systems was proposed by S. Kalyani and K. S. Swarup [12]. This work demonstrates how the traditional K-Means clustering algorithm can be profitably modified to be used as a classifier algorithm; the proposed algorithm combines Particle Swarm Optimization (PSO) with the traditional K-Means algorithm to satisfy the requirements of a classifier. Omar S. Soliman, Doaa A. Saleh and Samaa Rashwan [19] proposed a bio-inspired fuzzy K-Modes clustering algorithm, which integrates concepts of the fuzzy K-Modes algorithm, to handle the uncertainty phenomenon, with fuzzy PSO, to reach the global optimal solution of the clustering optimization problem. K. Arun Prabha and N. Karthikeyani Visalakshi [3] proposed an effective partitional clustering algorithm developed by integrating the merits of Particle Swarm Optimization and normalization with the traditional K-Means clustering algorithm. An improved global-best particle swarm optimization algorithm with mixed-attribute data classification capability was proposed by Nabila Nouaouria and Mounir Boukadoum [17]: a new particle-position update mechanism handles mixed data using the frequencies of the non-numerical attributes, and the enhanced algorithm gives a better cost function and improved processing of mixed-attribute data.
Proposed Algorithm
PSO based K-Prototype Clustering Algorithm
K-Prototype clustering is an effective algorithm for clustering mixed-type data sets. The dependency of the algorithm on the initialization of the centers is a major problem, and it usually gets stuck in local optima. To solve this issue, the PSO and K-Prototype algorithms are combined. The proposed algorithm does not depend on the initial clusters and thereby avoids being trapped in locally optimal solutions. In this method, the process is initialized with a group of N random particles; such a population is called a swarm.
The PSO based K-Prototype algorithm consists of the following steps:
ALGORITHM : PSO based K-Prototype Clustering Algorithm
Input : Data of n objects with d features; PSO parameters c1, c2, r1, r2, p,
        Wmax, Wmin; the value of λ and the value of K
Output : K clusters
Procedure
Step 1 : Initialize a population of particles with small random positions xp
         and velocities vp of the pth particle on a problem space of K x D
         dimensions.
Step 2 : Initialize the PSO parameters c1, c2, r1, r2, p, Wmax and Wmin.
Step 3 : Repeat Steps 4 to 11.
Step 4 : Start the procedure and set the iteration count t = 1.
Step 5 : Run the following steps of the K-Prototype algorithm. For every
         object in the population:
         i. Calculate the squared Euclidean distance measure for the
            numeric attributes.
         ii. Calculate the simple matching dissimilarity measure on the
             categorical attributes.
         iii. Assign each data object to the nearest cluster center.
Step 6 : After grouping the data objects based on minimum distance,
         calculate the cost function using (5) for every object.
Step 7 : Compute pbest based on the cost function.
Step 8 : After updating pbest, choose the best value among the particles in
         pbest and assign it to Gbest, i.e., if pbest < Gbest, then Gbest = pbest.
Step 9 : Modify the velocity using equation (6) and update the new position
         of each particle using equations (7) and (9) respectively.
Step 10 : Set t = t + 1.
Step 11 : Check the convergence criterion, which may be a good fitness
          value or a maximum number of iterations.
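Step 6's cost evaluation, i.e. Eq. (5) with each particle decoded into its K prototypes, can be sketched as follows (an illustrative sketch, not the authors' implementation; it assumes the first p attributes of each record are numeric and the rest categorical):

```python
import numpy as np

def kp_cost(objects, prototypes, p, lam):
    """Eq. (5): sum over all objects of the mixed dissimilarity (numeric
    squared Euclidean + lam * simple matching) to the nearest prototype.
    The implicit w_{i,l} assigns each object to exactly one cluster."""
    total = 0.0
    for x in objects:
        best = min(
            float(((x[:p].astype(float) - z[:p].astype(float)) ** 2).sum())
            + lam * float((x[p:] != z[p:]).sum())
            for z in prototypes
        )
        total += best
    return total
```

In the swarm, the particle whose decoded prototypes minimize this cost supplies Gbest, and each particle's own lowest cost so far supplies its pbest.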
A swarm consists of N particles moving around a D-dimensional search space. Given a data set X = {x1, x2, x3, ..., xN}, where xi is a data pattern in a D-dimensional feature space, each particle is of dimension K x D, K being the number of clusters for partitioning the data set X. The position of the ith particle is represented by xi = (xi1, xi2, xi3, ..., xiD) and its velocity by vi = (vi1, vi2, vi3, ..., viD), where i is the index of the particle and D is the dimensionality of the search space. PSO maintains a population of particles, each characterized by a position vector in the search space and a velocity vector which determines its motion. The velocity is calculated based on i) the particle's current direction, ii) the attraction of each particle towards the best position it has achieved so far and iii) the attraction of each particle towards the best particle in the population. Each particle records its best position as pbest, and the best value in the group is Gbest. Here the K-Prototype clustering algorithm is executed to find the optimum value: the squared Euclidean distance is calculated for every numeric attribute in the data sets, and for categorical attributes Huang's simple matching dissimilarity measure is applied. The previous pbest value is compared with the current pbest value in terms of the cost function. If the current position is better than pbest it becomes a candidate for Gbest; otherwise the previous pbest is retained. The position and velocity of the ith particle are updated by pbesti and Gbest in each generation. After finding the two best values, the particle updates its velocity and position with equations (6) and (7) respectively. The value of the inertia weight w is calculated using equation (8); the inertia weight controls the impact of the previous velocity. The particles cannot fly continuously through a discrete-valued space, so the discrete values of the particles are converted into continuous values based on discrete or binary PSO, and the new position update mechanism is implemented by equation (9). For every object in the data set the K-Prototype cost function is calculated using equation (5) and pbest is found. This leads to better cost function evaluation in the description space, and this enhanced procedure is proposed to handle mixed data attributes with PSO. Here the typical value of c1 and c2 is taken as 2.0, and r1 and r2 are random numbers generated between 0 and 1. A linearly decreasing inertia weight w was implemented, starting at wmax = 0.9 and ending at wmin = 0.2; this helps to expand the search space in the beginning so that the particles can explore new areas, implying a global search. The λ value can be formulated based on the average standard deviation (σ) of the numeric attributes; a suitable λ lies between (1/3)σ and (2/3)σ for these data sets [15]. In this paper the value of λ is evaluated for each data set, and the resulting values are given in Table-2. The proposed method enhances the convergence speed of PSO and aids in tracing the initial centroids for the K-Prototype clustering algorithm, and the swarm-based search helps to reach globally optimal solutions.
II. Results And Discussions
The experimental analysis is performed with the Hepatitis, Post Operative Patient, Australian Credit Approval, German Credit Data and Statlog Heart benchmark data sets available in the UCI machine learning repository [15]. The details of the data sets are given in Table-1.
The performance of the K-Modes, K-Prototype and PSO based K-Prototype algorithms is measured in terms of four external validity measures, namely Rand Index, Jaccard Index, F-Measure and Entropy. The external validity measures test the quality of clusters by comparing the results of clustering with the 'ground truth' (true class labels). All four measures take values between 0 and 1. In the case of the Rand Index, Jaccard Index and F-Measure, the value 1 indicates that the data clusters are exactly the same, so an increase in the values of these measures indicates better performance.
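As an illustration of how such external measures are computed (a sketch, not the authors' implementation), the Rand Index is the fraction of object pairs on which the clustering and the ground truth agree:

```python
from itertools import combinations

def rand_index(truth, pred):
    """Fraction of object pairs treated consistently by the two labelings
    (together in both, or apart in both); 1.0 means identical clusterings."""
    agree = sum(
        (truth[i] == truth[j]) == (pred[i] == pred[j])
        for i, j in combinations(range(len(truth)), 2)
    )
    n_pairs = len(truth) * (len(truth) - 1) // 2
    return agree / n_pairs
```

Because it works on pairs rather than raw labels, the measure is invariant to how cluster labels are permuted relative to the class labels.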
Table-1: Details of data sets

S. No.  Dataset                     No. of Instances  No. of Attributes  No. of Classes
1.      Hepatitis                   155               19                 2
2.      Post Operative Patient      90                8                  3
3.      Australian Credit Approval  690               14                 2
4.      German Credit Data          1000              20                 2
5.      Statlog Heart               270               13                 2
The results of the PSO based K-Prototype clustering algorithm, in comparison with the results of the K-Modes and K-Prototype algorithms in terms of Rand Index, Jaccard Index, Entropy and F-Measure, are shown in Table-3, Table-4, Table-5 and Table-6 respectively. The details of the data sets and the corresponding lambda values for the five benchmark data sets are shown in Table-2.
According to the Rand Index, PSO-KP clustering yields consistent and improved results over the K-Modes and K-Prototype algorithms in almost all data sets. From Table-3 it is observed that the PSO-KP algorithm yields more consistent and better results for the Statlog Heart, Hepatitis and German Credit Data sets than for the Post Operative Patient and Australian Credit Approval data sets.
Table-2: Details of data sets and lambda values

Dataset       Hepatitis  Post Operative Patient  Australian Credit Approval  German Credit Data  Statlog Heart
Lambda value  0.0533     0.1055                  0.0680                      0.0995              0.0845
Table-3: Comparative analysis based on Rand Index

S. No.  Dataset                     K-Modes  K-Prototype  PSO based K-Prototype
1.      Hepatitis                   0.6234   0.6171       0.6723
2.      Post Operative Patient      0.4815   0.4822       0.4998
3.      Australian Credit Approval  0.6565   0.6853       0.6971
4.      German Credit Data          0.5000   0.5144       0.5312
5.      Statlog Heart               0.5988   0.7029       0.7429
From Table-4, based on the Jaccard Index, the PSO-KP algorithm yields consistent and better results than the K-Modes and K-Prototype algorithms in almost all data sets.
Table-4: Comparative analysis based on Jaccard Index

S. No.  Dataset                     K-Modes  K-Prototype  PSO based K-Prototype
1.      Hepatitis                   0.5225   0.6099       0.6723
2.      Post Operative Patient      0.3598   0.3870       0.3950
3.      Australian Credit Approval  0.5019   0.5240       0.5326
4.      German Credit Data          0.3839   0.3959       0.4141
5.      Statlog Heart               0.4424   0.5505       0.5705
In the case of the F-Measure, the value 1 indicates that the data clusters are exactly the same, so an increase in this measure indicates better performance. On this basis, the results of PSO-KP shown in Table-5 are better than those of the K-Modes and K-Prototype algorithms for all data sets.
Table-5: Comparative analysis based on F-Measure

S. No.  Dataset                     K-Modes  K-Prototype  PSO based K-Prototype
1.      Hepatitis                   0.7266   0.7217       0.7521
2.      Post Operative Patient      0.5486   0.5615       0.5752
3.      Australian Credit Approval  0.7829   0.8039       0.8229
4.      German Credit Data          0.5921   0.6041       0.6261
5.      Statlog Heart               0.7258   0.8197       0.8387
A decrease in the Entropy measure indicates better performance. On that basis, the performance of PSO-KP in terms of Entropy is significantly better than that of K-Modes and K-Prototype for all data sets except Australian Credit Approval, as shown in Table-6.
Table-6: Comparative analysis based on Entropy

S. No.  Dataset                     K-Modes  K-Prototype  PSO based K-Prototype
1.      Hepatitis                   0.4060   0.3625       0.3421
2.      Post Operative Patient      0.6662   0.6635       0.6231
3.      Australian Credit Approval  0.5267   0.4896       0.5725
4.      German Credit Data          0.6012   0.6090       0.5961
5.      Statlog Heart               0.5550   0.4735       0.4635
Based on the comparative analysis, it is concluded that the PSO-KP algorithm gives better performance for all the experimented mixed numeric and categorical data sets, showing the superiority of the proposed algorithm in producing optimal clusters.
III. Conclusion
This paper proposed a PSO based K-Prototype clustering algorithm that incorporates the benefits of PSO into the existing K-Prototype algorithm to reach the globally optimal cluster solution. The proposed algorithm has been tested on five benchmark data sets which include both numeric and categorical attributes, and its performance is shown to be superior to that of the conventional K-Modes and K-Prototype clustering algorithms. In future, an appropriate optimization algorithm will be applied for parameter tuning to produce clusters of superior quality, and the global clustering results can be further improved by setting alternative values for the PSO parameters.
References
[1]. Amir Ahmad and Lipika Dey, Algorithm for Fuzzy Clustering of Mixed Data with Numeric and Categorical Attributes,
Proceedings of ICDCIT, 561-572 (2005)
[2]. Anil K. Jain, Data Clustering: 50 Years Beyond K-Means, Pattern Recognition Letters, (2009)
[3]. Arun Prabha K. and Karthikeyani Visalakshi N., Improved Particle Swarm Optimization based K-Means Clustering, International
Conference on Intelligent Computing Applications, (2014)
[4]. Han J. and Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, (2006)
[5]. He Z., Xu X., Deng S., Scalable Algorithms for Clustering Mixed Type Attributes in Large datasets, International Journal of
Intelligent Systems, 20, 1077-1089 (2005)
[6]. Huang Z., Extensions to the K-Means algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery, 2(3), 283-304 (1998)
[7]. Huang Z., Clustering large data sets with mixed numeric and categorical values, Proceedings of the First Asia Conference on Knowledge Discovery and Data Mining, 21-34 (1997)
[8]. Huang Z., A fast clustering algorithm to cluster very large categorical data sets in data mining, Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Tucson, Arizona, USA, 1-8 (1997a)
[9]. Izhar Ahmad, K-Mean and K-Prototype Algorithms Performance Analysis, American Research Institute for Policy Development,
2(1), 95-109 (2014).
[10]. Jinchao Ji., Tian Bai., Chunguang Zhou., Chao Ma., Zhe Wang., An Improved K-Prototypes clustering algorithm for mixed
numeric and categorical data, Image Feature Detection and Description, 20, 590-596 (2013)
[11]. Jinchao Ji., Wei Pang., Chunguang Zhou., Xiao Han., Zhe Wang., A fuzzy K-Prototype clustering algorithm for mixed numeric and
categorical data, Knowledge-Based Systems, 30, 129-135 (2012)
[12]. Kalyani S. and Swarup K.S., Particle swarm optimization based K-Means clustering approach for security assessment in power
systems, Expert systems with applications, 38, 10839-10846 (2011)
[13]. LI Shi-jin, ZHU Yue-long, LIU Jing, An improved multi-level clustering algorithm based on k-prototype, Journal of Software,
(2005)
[14]. Madhuri R., Ramakrishna Murty M., Murthy J.V.R , Prasad Reddy P.V.G.D., Suresh Satapathy C., Cluster Analysis on Different
Data Sets Using K-Modes and K-Prototype Algorithms, Advances in Intelligent Systems and Computing, 249, 137-144 (2014)
[15]. Merz C.J., Murphy P.M., UCI repository of machine learning databases, Irvine, University of California, http://www.ics.uci.edu/~mlearn/, (1998)
[16]. Ming-Yi Shih, Jar-Wen Jheng, Lien-Fu Lai, A Two-Step Method for Clustering Mixed Categorical and Numeric Data, Tamkang
Journal of Science and Engineering, 13(1), 11-19 (2010)
[17]. Nabila Nouaouria , Mounir Boukadoum, Improved global-best particle swarm optimization algorithm with mixed-attribute data
classification capability , Applied Soft Computing, 21, 554–567 (2014)
[18]. Ng, Li M.J., Huang J.Z., He Z., On the impact of dissimilarity measure in K-Modes clustering algorithm , IEEE Transactions on
Pattern Analysis and Machine Intelligence , 29(3) , 503–507 (2007)
[19]. Omar S. Soliman, Doaa A. Saleh, Samaa Rashwan, A Bio Inspired Fuzzy K-Modes Clustering Algorithm, ICONIP, Part III, LNCS 7665, 663-669 (2012)
[20]. Sotirios P. Chatzis, A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing
a probabilistic dissimilarity functions, Expert Systems with Applications, 38, 8684–8689 (2011)
[21]. Wei-Dong Zhao, Wei-Hui Dai, Chun-Bin Tang, K-Centers Algorithm for Clustering Mixed Type Data, PAKDD, 1140-1147 (2007)
[22]. Zhexue Huang and Michael K. Ng, A Fuzzy K-Modes Algorithm for Clustering Categorical Data, IEEE Transactions on Fuzzy
Systems, 7(4) , 446-452
[23]. Zhi Zheng, Maoguo Gong, Jingjing Ma, Licheng Jiao, Unsupervised evolutionary clustering algorithm for mixed type data, Evolutionary Computation (CEC), 1-8 (2010)