Clustering mixed-type data is one of the major research topics in data mining. In
this paper, a new algorithm for clustering mixed-type data is proposed, in which the concept of a distribution
centroid is used to represent the prototype of the categorical variables in a cluster; this is then combined
with the mean to represent the prototype of clusters with mixed-type variables. In the method, data is
observed from different views, and the variables are grouped into those views. Instances that
can be described differently from different viewpoints are defined as multiview data. Clustering
usually ignores the differences among views. Here, both view weights and variable weights
are computed simultaneously. The view weight reflects the closeness or density of a view, and the
variable weight identifies the significance of each variable. Both weights are used in the distance
function that determines the cluster of each object. The proposed method
enhances k-prototypes so that it automatically computes both view and variable
weights. The proposed algorithm, MK-Prototypes, is compared with two other clustering
algorithms.
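The abstract does not give the distance function itself, so the following is only a minimal sketch of how a view- and variable-weighted distance over mixed-type data might look. It assumes squared difference for numeric variables and, for categorical variables, one minus the relative frequency of the object's category in the cluster's distribution centroid. The names `distribution_centroid`, `mixed_distance`, `view_of`, `var_weights`, and `view_weights` are hypothetical, and the weights are supplied by hand here rather than learned as in the proposed algorithm.

```python
from collections import Counter


def distribution_centroid(values):
    """Relative frequency of each category among a cluster's values."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}


def mixed_distance(obj, prototype, var_weights, view_of, view_weights):
    """Weighted distance between one object and one cluster prototype.

    obj:        list of values (float for numeric, str for categorical).
    prototype:  per variable, a float mean or a dict distribution centroid.
    view_of:    view_of[j] is the index of the view that variable j belongs to.
    The per-variable dissimilarities below are illustrative assumptions.
    """
    total = 0.0
    for j, (x, p) in enumerate(zip(obj, prototype)):
        if isinstance(p, dict):
            # Categorical: small when the object's category is common in the cluster.
            d = 1.0 - p.get(x, 0.0)
        else:
            # Numeric: squared difference from the cluster mean.
            d = (x - p) ** 2
        total += view_weights[view_of[j]] * var_weights[j] * d
    return total


# One numeric variable (view 0) and one categorical variable (view 1).
colour_centroid = distribution_centroid(["red", "red", "blue", "red"])
prototype = [30.0, colour_centroid]
distance = mixed_distance(
    [32.0, "blue"], prototype,
    var_weights=[0.5, 0.5], view_of=[0, 1], view_weights=[0.5, 0.5],
)
```

In this sketch an object is assigned to the cluster whose prototype gives the smallest `mixed_distance`; how the weights are updated between iterations is left out because the abstract does not specify it.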
A survey on Efficient Enhanced K-Means Clustering Algorithm ijsrd.com
Data mining is the process of using technology to identify patterns and prospects from large amounts of information. In data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique that divides data into meaningful groups. K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
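The partitioning rule described above (each observation belongs to the cluster with the nearest mean) can be illustrated with a minimal sketch of the standard K-means iteration. This is plain K-means, not any of the enhanced variants the survey compares; the function names and the choice to pass initial centroids explicitly are assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid unchanged
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
final_centroids, labels = kmeans(points, [(0, 0), (10, 0)])
```

The result depends on the initial centroids, which is the weakness that several of the abstracts listed on this page set out to address.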
This document provides an overview of different techniques for clustering categorical data. It discusses various clustering algorithms that have been used for categorical data, including K-modes, ROCK, COBWEB, and EM algorithms. It also reviews more recently developed algorithms for categorical data clustering, such as algorithms based on particle swarm optimization, rough set theory, and feature weighting schemes. The document concludes that clustering categorical data remains an important area of research, with opportunities to develop techniques that initialize cluster centers better.
IRJET- Customer Segmentation from Massive Customer Transaction Data IRJET Journal
This document discusses various methods for customer segmentation through analysis of massive customer transaction data, including K-Means clustering, PAM clustering, agglomerative clustering, divisive clustering, and density-based clustering. It finds that K-Means is the most commonly used partitioning method. The document also reviews related work on customer segmentation and clustering algorithms like CLARA, CLARANS, BIRCH, ROCK, CHAMELEON, CURE, DHCC, DBSCAN, and LOF. It proposes a framework for an online shopping site that would apply these techniques to group customers based on their product preferences in transaction data.
The document discusses implementing an integrated approach of the K-means clustering algorithm for prediction analysis. It begins with motivating the need to improve the accuracy and dependability of existing overlapping K-means clustering by removing its dependency on random initialization parameters. The proposed methodology determines the optimal number of clusters K based on the dataset, calculates initial centroid positions using a harmonic means method, and applies overlapping K-means clustering. The implementation and results on two large datasets show the integrated approach outperforms original overlapping K-means in terms of accuracy, F-measure, Rand index, and number of iterations.
The document discusses clustering documents using a multi-viewpoint similarity measure. It begins with an introduction to document clustering and common similarity measures like cosine similarity. It then proposes a new multi-viewpoint similarity measure that calculates similarity between documents based on multiple reference points, rather than just the origin. This allows a more accurate assessment of similarity. The document outlines an optimization algorithm used to cluster documents by maximizing the new similarity measure. It compares the new approach to existing document clustering methods and similarity measures.
This document discusses a hybridization of the Magnetic Charge System Search (MCSS) method for efficient data clustering. MCSS is a meta-heuristic algorithm inspired by electromagnetic theory that has shown potential but also has issues with convergence rate and getting stuck in local optima. The authors propose a Hybrid MCSS (HMCSS) that incorporates a local search strategy and differential evolution inspired updating to improve convergence. An experiment on benchmark functions and real clustering problems shows HMCSS provides better results than existing algorithms and enhances MCSS convergence.
A Novel Multi-Viewpoint based Similarity Measure for Document Clustering IJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ... IJECEIAES
This document compares the performance of K-means and adaptive K-means clustering algorithms on different images. It finds that adaptive K-means clustering more accurately detects tumor regions in MRI brain images and the area of a lake in a satellite image, compared to K-means clustering. This is evaluated by comparing the time taken, peak signal-to-noise ratio, and root mean square error between the original and segmented images. Adaptive K-means clustering does not require pre-specifying the number of clusters, which allows it to better segment images without user input.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
This document summarizes a research paper that proposes a new Particle Swarm Optimization (PSO) based K-Prototype clustering algorithm to cluster mixed numeric and categorical data. It begins with background information on clustering algorithms like K-Means, K-Modes, and K-Prototype. It then describes the K-Prototype algorithm, PSO, and discrete binary PSO. Related work integrating PSO with other clustering algorithms is also reviewed. The proposed approach uses binary PSO to select improved initial prototypes for K-Prototype clustering in order to obtain better clustering results than traditional K-Prototype and avoid local optima.
Mine Blood Donors Information through Improved K-Means Clustering ijcsity
The number of accidents and diseases is increasing at an alarming rate, resulting in a huge increase in the demand for blood. There is a need for organized analysis of blood donor databases and blood bank repositories. Cluster analysis is one of the applications of data mining, and the K-means clustering algorithm is the fundamental algorithm for modern clustering techniques. K-means is a traditional, iterative algorithm. At every iteration, it computes the distance from the centroid of each cluster to every data point. This paper improves the original k-means algorithm by choosing the initial centroids according to the distribution of the data. Results and discussion show that the improved K-means algorithm produces accurate clusters in less computation time when finding donor information.
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING" IJDKP
Clustering is one of the data mining techniques used to discover business intelligence by grouping objects into clusters using a similarity measure. Clustering is an unsupervised learning process that has many uses in real-world applications in the fields of marketing, biology, libraries, insurance, city planning, earthquake studies and document clustering. Latent trends and relationships among data objects can be unearthed using clustering algorithms. Many clustering algorithms have come into existence. However, the quality of clusters has to be given paramount importance. The quality objective is to achieve the highest similarity between objects of the same cluster and the lowest similarity between objects of different clusters. In this context, we studied two widely used clustering algorithms, K-Means and Fuzzy K-Means. K-Means is an exclusive clustering algorithm, while Fuzzy K-Means is an overlapping clustering algorithm. In this paper we prove the hypothesis “Fuzzy K-Means is better than K-Means for Clustering” through both literature and empirical study. We built a prototype application to demonstrate the differences between the two clustering algorithms. The experiments were run on the diabetes dataset obtained from the UCI repository. The empirical results reveal that the performance of Fuzzy K-Means is better than that of K-Means in terms of quality or accuracy of clusters. Thus, our empirical study proved the hypothesis “Fuzzy K-Means is better than K-Means for Clustering”.
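To make the distinction between exclusive and overlapping clustering concrete, here is a minimal sketch of the membership computation used in the standard fuzzy c-means formulation, which is assumed here to be what the abstract calls Fuzzy K-Means. The function name `memberships` and the fuzzifier default `m=2.0` are assumptions for the example, not details taken from the paper.

```python
import math


def memberships(point, centers, m=2.0):
    """Degree of membership of one point in every cluster (values sum to 1).

    Uses the standard fuzzy c-means rule
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i and m > 1.
    """
    dists = [math.dist(point, c) for c in centers]
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        # The point coincides with one or more centers: share membership among them.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# A point one unit from the first center and two units from the second.
u = memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

K-means would assign this point wholly to the first cluster, whereas the fuzzy rule gives it a partial membership in both, weighted toward the nearer center.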
The improved k means with particle swarm optimization Alexander Decker
This document summarizes a research paper that proposes an improved K-means clustering algorithm using particle swarm optimization. It begins with an introduction to data clustering and types of clustering algorithms. It then discusses K-means clustering and some of its drawbacks. Particle swarm optimization is introduced as an optimization technique inspired by swarm behavior in nature. The proposed algorithm uses particle swarm optimization to select better initial cluster centroids for K-means clustering in order to overcome some limitations of standard K-means. The algorithm works in two phases - the first uses particle swarm optimization and the second performs K-means clustering using the outputs from the first phase.
Cluster analysis is an unsupervised machine learning technique that groups similar data objects into clusters. It finds internal structures within unlabeled data by partitioning it into groups based on similarity. Some key applications of cluster analysis include market segmentation, document classification, and identifying subtypes of diseases. The quality of clusters depends on both the similarity measure used and how well objects are grouped within each cluster versus across clusters.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ... TELKOMNIKA JOURNAL
Indonesian government agencies under the Ministry of Energy and Mineral Resources have
problems classifying the coal data dictionary. This research groups the coal dictionary using the K-Means
and MeanShift algorithms. The K-means algorithm is used to obtain cluster values on character and word
criteria. The Euclidean distance data from the last k-means iteration is combined with the MeanShift
algorithm. MeanShift calculates centroids by selecting different bandwidths. The grouping results
from k-means and MeanShift show different centroids, which are used to find the optimum bandwidth value. The
data dictionary in this research has been sorted alphabetically.
Clustering Algorithm with a Novel Similarity Measure IOSR Journals
This document proposes a new multi-viewpoint based similarity measure for clustering text documents that aims to overcome limitations of existing measures. Existing measures use a single viewpoint to measure similarity between documents, but the proposed measure uses multiple viewpoints to ensure clusters exhibit all relationships between documents. The empirical study found that using a multi-viewpoint similarity measure forms more meaningful clusters by capturing more informative relationships between documents.
Survey on Unsupervised Learning in Datamining IOSR Journals
This document summarizes unsupervised learning techniques in data mining. It discusses clustering methods like partitioning and hierarchical clustering. Partitioning methods include k-means clustering and density-based clustering. K-means aims to minimize variance within clusters. Density-based clustering finds clusters as areas of high density separated by low density. Hierarchical clustering is agglomerative or divisive, building clusters either bottom-up or top-down. Agglomerative clustering starts with each point as a cluster and merges the closest pairs.
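The bottom-up procedure described above (start with each point as its own cluster, then repeatedly merge the closest pair) can be sketched as follows. The summary does not say how the distance between two clusters is measured, so this example assumes single linkage (the smallest gap between any two members) on one-dimensional data; the function name `agglomerative` is hypothetical.

```python
def agglomerative(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain.

    Cluster-to-cluster distance is single linkage, an assumption made
    for this sketch: the smallest gap between any two members.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gap = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


groups = agglomerative([1.0, 2.0, 9.0, 10.0, 25.0], n_clusters=2)
```

Recording the order of the merges instead of stopping at a fixed count would give the full hierarchy; a divisive method would run in the opposite direction, starting from one cluster and splitting.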
A Novel Clustering Method for Similarity Measuring in Text Documents IJMER
International Journal of Engineering and Science Invention (IJESI) inventionjournals
This document discusses multidimensional clustering methods for data mining and their industrial applications. It begins with an introduction to clustering, including definitions and goals. Popular clustering algorithms are described, such as K-means, fuzzy C-means, hierarchical clustering, and mixture of Gaussians. Distance measures and their importance in clustering are covered. The K-means and fuzzy C-means algorithms are explained in detail. Examples are provided to illustrate fuzzy C-means clustering. Finally, applications of clustering algorithms in fields such as marketing, biology, and earth sciences are mentioned.
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI... IJDKP
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, which is
accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a
σ value, a hyper-parameter which can be manually defined and manipulated to suit the application.
Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster
centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the
exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an
outstanding task because normally such expressions are impossible to solve analytically. However, we
prove that if the points are all included in a square region of size σ, there is only one minimum. This bound
is not only useful for limiting the number of solutions to look for by numerical means; it also allows us to propose a new
numerical approach “per block”. This technique decreases the number of particles by approximating some
groups of particles with weighted particles. These findings are useful not only for the quantum clustering
problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics
and other applications.
A Mathematical Programming Approach for Selection of Variables in Cluster Ana... IJRES Journal
The document presents a mathematical programming approach for selecting important variables in cluster analysis. It formulates a nonlinear binary model to minimize the distance between observations within clusters, using indicator variables to select important variables. The model is applied to a sample dataset of 30 observations across 5 variables, correctly identifying variables 3, 4 and 5 as most important for clustering the observations into two groups. The results are compared to an existing variable selection heuristic, with the mathematical programming approach achieving a 100% correct classification versus 97% for the other method.
This document presents a feature clustering algorithm to reduce the dimensionality of feature vectors for text classification. The algorithm groups words in documents into clusters based on similarity, with each cluster characterized by a membership function. Words not similar to existing clusters form new clusters. This avoids specifying features in advance and the need for trial and error. Experimental results showed the method can classify text faster and with better extracted features than other methods.
This document summarizes Chapter 10 of the book "Data Mining: Concepts and Techniques (3rd ed.)" which covers cluster analysis. The chapter introduces different types of clustering methods including partitioning methods like k-means and k-medoids, hierarchical methods, density-based methods, and grid-based methods. It discusses how to evaluate the quality of clustering results and highlights considerations for cluster analysis such as similarity measures, clustering space, and challenges like scalability and high dimensionality.
The document discusses different clustering techniques used for grouping large amounts of data. It covers partitioning methods like k-means and k-medoids that organize data into exclusive groups. It also describes hierarchical methods like agglomerative and divisive clustering that arrange data into nested groups or trees. Additionally, it mentions density-based and grid-based clustering and provides algorithms for different clustering approaches.
This document discusses hierarchical clustering and similarity measures for document clustering. It summarizes that hierarchical clustering creates a hierarchical decomposition of data objects through either agglomerative or divisive approaches. The success of clustering depends on the similarity measure used, with traditional measures using a single viewpoint, while multiviewpoint measures use different viewpoints to increase accuracy. The paper then focuses on applying a multiviewpoint similarity measure to hierarchical clustering of documents.
An Empirical Study on Identification of Strokes and their Significance in Scr... IJMER
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ...IJECEIAES
This document compares the performance of K-means and adaptive K-means clustering algorithms on different images. It finds that adaptive K-means clustering more accurately detects tumor regions in MRI brain images and the area of a lake in a satellite image, compared to K-means clustering. This is evaluated by comparing the time taken, peak signal-to-noise ratio, and root mean square error between the original and segmented images. Adaptive K-means clustering does not require pre-specifying the number of clusters, which allows it to better segment images without user input.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
This document summarizes a research paper that proposes a new Particle Swarm Optimization (PSO) based K-Prototype clustering algorithm to cluster mixed numeric and categorical data. It begins with background information on clustering algorithms like K-Means, K-Modes, and K-Prototype. It then describes the K-Prototype algorithm, PSO, and discrete binary PSO. Related work integrating PSO with other clustering algorithms is also reviewed. The proposed approach uses binary PSO to select improved initial prototypes for K-Prototype clustering in order to obtain better clustering results than traditional K-Prototype and avoid local optima.
Mine Blood Donors Information through Improved K-Means Clusteringijcsity
The number of accidents and health diseases which are increasing at an alarming rate are resulting in a huge increase in the demand for blood. There is a necessity for the organized analysis of the blood donor database or blood banks repositories. Clustering analysis is one of the data mining applications and K-means clustering algorithm is the fundamental algorithm for modern clustering techniques. K-means clustering algorithm is traditional approach and iterative algorithm. At every iteration, it attempts to find the distance from the centroid of each cluster to each and every data point. This paper gives the improvement to the original k-means algorithm by improving the initial centroids with distribution of data. Results and discussions show that improved K-means algorithm produces accurate clusters in less computation time to find the donors information
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
Clustering is one of the data mining techniques that have been around to discover business intelligence by grouping objects into clusters using a similarity measure. Clustering is an unsupervised learning process that has many utilities in real time applications in the fields of marketing, biology, libraries, insurance, city-planning, earthquake studies and document clustering. Latent trends and relationships among data objects can be unearthed using clustering algorithms. Many clustering algorithms came into existence. However, the quality of clusters has to be given paramount importance. The quality objective is to achieve
highest similarity between objects of same cluster and lowest similarity between objects of different clusters. In this context, we studied two widely used clustering algorithms such as the K-Means and Fuzzy K-Means. K-Means is an exclusive clustering algorithm while the Fuzzy K-Means is an overlapping clustering algorithm. In this paper we prove the hypothesis “Fuzzy K-Means is better than K-Means for Clustering” through both literature and empirical study. We built a prototype application to demonstrate the differences between the two clustering algorithms. The experiments are made on diabetes dataset
obtained from the UCI repository. The empirical results reveal that the performance of Fuzzy K-Means is better than that of K-means in terms of quality or accuracy of clusters. Thus, our empirical study proved the hypothesis “Fuzzy K-Means is better than K-Means for Clustering”.
The improved k means with particle swarm optimizationAlexander Decker
This document summarizes a research paper that proposes an improved K-means clustering algorithm using particle swarm optimization. It begins with an introduction to data clustering and types of clustering algorithms. It then discusses K-means clustering and some of its drawbacks. Particle swarm optimization is introduced as an optimization technique inspired by swarm behavior in nature. The proposed algorithm uses particle swarm optimization to select better initial cluster centroids for K-means clustering in order to overcome some limitations of standard K-means. The algorithm works in two phases - the first uses particle swarm optimization and the second performs K-means clustering using the outputs from the first phase.
Cluster analysis is an unsupervised machine learning technique that groups similar data objects into clusters. It finds internal structures within unlabeled data by partitioning it into groups based on similarity. Some key applications of cluster analysis include market segmentation, document classification, and identifying subtypes of diseases. The quality of clusters depends on both the similarity measure used and how well objects are grouped within each cluster versus across clusters.
IJERA (International Journal of Engineering Research and Applications) is an international online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
K Means Clustering and Meanshift Analysis for Grouping the Data of Coal Term ... | TELKOMNIKA JOURNAL
Indonesian government agencies under the Ministry of Energy and Mineral Resources have
problems in classifying the data dictionary of coal. This research groups the coal dictionary using the K-Means
and MeanShift algorithms. The k-means algorithm is used to get cluster values on character and word
criteria. The data from the last iteration of the Euclidean distance calculation in k-means is combined with the MeanShift
algorithm. MeanShift calculates centroids by selecting different bandwidths. The results of grouping
with the k-means and MeanShift algorithms show different centroids for finding the optimum bandwidth value. The
data dictionary of this research has been sorted alphabetically.
Clustering Algorithm with a Novel Similarity Measure | IOSR Journals
This document proposes a new multi-viewpoint based similarity measure for clustering text documents that aims to overcome limitations of existing measures. Existing measures use a single viewpoint to measure similarity between documents, but the proposed measure uses multiple viewpoints to ensure clusters exhibit all relationships between documents. The empirical study found that using a multi-viewpoint similarity measure forms more meaningful clusters by capturing more informative relationships between documents.
Survey on Unsupervised Learning in Datamining | IOSR Journals
This document summarizes unsupervised learning techniques in data mining. It discusses clustering methods like partitioning and hierarchical clustering. Partitioning methods include k-means clustering and density-based clustering. K-means aims to minimize variance within clusters. Density-based clustering finds clusters as areas of high density separated by low density. Hierarchical clustering is agglomerative or divisive, building clusters either bottom-up or top-down. Agglomerative clustering starts with each point as a cluster and merges the closest pairs.
A Novel Clustering Method for Similarity Measuring in Text Documents | IJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Engineering and Science Invention (IJESI) | inventionjournals
This document discusses multidimensional clustering methods for data mining and their industrial applications. It begins with an introduction to clustering, including definitions and goals. Popular clustering algorithms are described, such as K-means, fuzzy C-means, hierarchical clustering, and mixture of Gaussians. Distance measures and their importance in clustering are covered. The K-means and fuzzy C-means algorithms are explained in detail. Examples are provided to illustrate fuzzy C-means clustering. Finally, applications of clustering algorithms in fields such as marketing, biology, and earth sciences are mentioned.
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI... | IJDKP
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics which is
accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a
σ value, a hyper-parameter which can be manually defined and manipulated to suit the application.
Numerical methods are used to find all the minima of the quantum potential as they correspond to cluster
centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the
exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an
outstanding task because normally such expressions are impossible to solve analytically. However, we
prove that if the points are all included in a square region of size σ, there is only one minimum. This bound
is not only useful for limiting the number of solutions to look for by numerical means, it also allows us to propose a new
numerical approach “per block”. This technique decreases the number of particles by approximating some
groups of particles to weighted particles. These findings are not only useful to the quantum clustering
problem but also for the exponential polynomials encountered in quantum chemistry, Solid-state Physics
and other applications.
A Mathematical Programming Approach for Selection of Variables in Cluster Ana... | IJRES Journal
The document presents a mathematical programming approach for selecting important variables in cluster analysis. It formulates a nonlinear binary model to minimize the distance between observations within clusters, using indicator variables to select important variables. The model is applied to a sample dataset of 30 observations across 5 variables, correctly identifying variables 3, 4 and 5 as most important for clustering the observations into two groups. The results are compared to an existing variable selection heuristic, with the mathematical programming approach achieving a 100% correct classification versus 97% for the other method.
This document presents a feature clustering algorithm to reduce the dimensionality of feature vectors for text classification. The algorithm groups words in documents into clusters based on similarity, with each cluster characterized by a membership function. Words not similar to existing clusters form new clusters. This avoids specifying features in advance and the need for trial and error. Experimental results showed the method can classify text faster and with better extracted features than other methods.
This document summarizes Chapter 10 of the book "Data Mining: Concepts and Techniques (3rd ed.)" which covers cluster analysis. The chapter introduces different types of clustering methods including partitioning methods like k-means and k-medoids, hierarchical methods, density-based methods, and grid-based methods. It discusses how to evaluate the quality of clustering results and highlights considerations for cluster analysis such as similarity measures, clustering space, and challenges like scalability and high dimensionality.
The document discusses different clustering techniques used for grouping large amounts of data. It covers partitioning methods like k-means and k-medoids that organize data into exclusive groups. It also describes hierarchical methods like agglomerative and divisive clustering that arrange data into nested groups or trees. Additionally, it mentions density-based and grid-based clustering and provides algorithms for different clustering approaches.
This document discusses hierarchical clustering and similarity measures for document clustering. It summarizes that hierarchical clustering creates a hierarchical decomposition of data objects through either agglomerative or divisive approaches. The success of clustering depends on the similarity measure used, with traditional measures using a single viewpoint, while multiviewpoint measures use different viewpoints to increase accuracy. The paper then focuses on applying a multiviewpoint similarity measure to hierarchical clustering of documents.
An Empirical Study on Identification of Strokes and their Significance in Scr... | IJMER
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, Assessment, and many more.
Cost Analysis of Small Scale Solar and Wind Energy Systems | IJMER
Abstract: The recent dramatic increase in the use of
renewable energy systems is leading towards competitive
markets among the various individual renewable energy
systems. The aim of this paper is to prove the hypothesis
that in the next few years, when the cost of solar PV
modules comes down below 1$ per Watt, small wind
turbines will become costlier because of the cost of the
structures required to support the wind generator, and
that if this threat from the market is not addressed both
technically and commercially, small wind turbine
manufacturers will lose the competition to solar PV module
manufacturers in the foreseeable future. The objective of this
paper is to perform a cost analysis on industry data, prove the
hypothesis, and arrive at the cutoff point after which
generating energy from the wind is not economically
feasible. With this cost analysis, the author alerts
small-scale wind turbine manufacturers to take the
necessary measures to survive in the competitive markets of
small-scale renewable energy systems.
Keywords: Renewable energy, Solar PV, Wind Turbine, Curve Fitting, Cost analysis.
Intrusion Detection and Forensics based on decision tree and Association rule... | IJMER
This paper presents an approach based on the combination of two techniques,
decision trees and association rule mining, for probe attack detection. This approach proves to be
better than the traditional approach of generating rules for a fuzzy expert system by clustering methods.
Association rule mining is used to select the best attributes together, and the decision tree to identify the
best parameters together, to create the rules for the fuzzy expert system. The rules for the fuzzy expert
system are then generated using association rule mining and decision trees. Decision trees are generated for the
dataset to find the basic parameters for creating the membership functions of the fuzzy inference
system. Membership functions are generated for the probe attack. Based on these rules we have
created the fuzzy inference system that is used as an input to the neuro-fuzzy system. The fuzzy inference
system is loaded into the neuro-fuzzy toolbox as an input, and the final ANFIS structure is generated as the
outcome of the neuro-fuzzy approach. The experiments and evaluations of the proposed method were
done with the NSL-KDD intrusion detection dataset. In the experiments, the proposed approach
based on the combination of decision trees and association rule mining
efficiently detected probe attacks. The experimental results show better results for detecting intrusions
than other existing methods.
The Effect of Design Parameters of an Integrated Linear Electromagnetic Moto... | IJMER
This paper assesses the influence of the design parameters of the ferromagnetic guide housing on the process of pulling the anchor away from the holding device, which is integrated in the design of the motor. The design of an integrated circuit and the equivalent magnetic circuit of the integrated LEMM at the breakaway stage was built, and mathematical models of the system were laid out. An expression is derived for its magnetic
induction, with which one can determine the beginning of saturation of the shunt, the moment the anchor pulls
away from the holding area, the zone of permissible combinations of cross-sectional area of the upper magnetic shunt and
holding area, and the zone of change in the magnetic induction in the yoke at the moment the
motor anchor pulls away.
Static Analysis of a Pyro Turbine by using CFD | IJMER
This paper aims to develop a standard design procedure for a pyro turbine that can be manufactured locally in developing countries with very low head, steady power (200W to 1 kW with no discharge regulation), low cost and isolated network operation. The present research work has been carried out to replace the original blade material, AK Stainless Steel 340, with a different blade material, ASME Stainless Steel SA516 Gr. 70, to withstand turbulence at the site, which significantly affected the turbine operation. For this, a solid 3D model of the turbine is generated in Catia V5. Static analysis using CFD is done for the original blade material. Further static analysis using CFD is
done for the modified blade material under turbulence, which shows that the turbulence was successfully withstood, along with the high pressure and Von-Mises stress, with minimum
deformation. The results obtained by comparing the original and modified blade materials are within the
limits. The design is safe.
Web search engines help users find useful information on the WWW. However, when the same
query is submitted by different users, typical search engines return the same result regardless of who
submitted the query. Generally, each user has different information needs for his/her query. Therefore,
the search results should be adapted to users with different information needs. Hence, approaches are needed
that adapt search results to each user’s need for relevant information
without any user effort. Such search systems that adapt to each user’s preferences can be achieved by
constructing user profiles based on modified collaborative filtering with detailed analysis of user’s
browsing history.
There are three possible types of web search system which can provide personalized
information: (1) systems using relevance feedback, (2) systems in which users register their interest, and
(3) systems that recommend information based on user’s history. In the first technique, users have to provide
feedback in the form of relevant or irrelevant judgments, which is time consuming, and the second one requires
users to register their static interests, which needs extra effort from the user. So, the third technique
is best: users do not have to give explicit ratings, and relevancy is tracked automatically from user
behavior with search results and the history of data usage. It does not require registration of interests; it
captures the changing interests of the user dynamically by itself. The result section shows that the user’s browsing
history allows each user to perform a more fine-grained search by capturing changes in each user’s
preferences without any user effort. Users need less time to find the relevant snippet in personalized
search results compared to the original results.
Analysis of Conditions in Boundary Lubrications Using Bearing Materials | IJMER
In order to clearly establish the tribological potential of these alloys as bearing materials, the tribological parameters of the RAR Zn-Al alloys are compared to parameters of the CuPb15Sn8 lead-tin bronze, a widely applied conventional bearing material. Existing connecting rod bearings are manufactured using non-ferrous materials such as gunmetal and phosphor bronze. This paper describes the tribological behavior analysis for the conventional materials, i.e. brass and gunmetal, as well as a new non-metallic material, cast nylon. Friction and wear are the most important parameters for deciding the
performance of any bearing. In this paper an attempt is made to check the major tribological parameters for the three materials and to suggest a better new material compared to the conventional existing materials. It could help us to minimize the problem of handling materials like lead, tin, zinc, etc. After test on wear machine. Our experimental results are accessing efficient processing in bearing conditions in semantic data representation of extracted related data materials.
The usual star, left-star, right-star, plus order, minus order and Löwner ordering have been
generalized to bimatrices. It is also shown that all these orderings are partial orderings on bimatrices.
The relationship between the star partial order and the minus partial order of bimatrices and their squares is
examined.
The document summarizes a study that assessed the vulnerability of aquifers in the Imo River Basin in southeastern Nigeria to pollution. Eight locations were investigated to determine parameters like depth to water table, recharge rate, aquifer and soil properties, topography, and hydraulic conductivity. These parameters were used in the DRASTIC model to develop a vulnerability map. The map showed that areas within the Benin Formation generally have moderate vulnerability due to fine to coarse grained sandy overburden. Higher vulnerabilities were found near Aba, while lower vulnerabilities occurred around Obibiezena and Naze. The study demonstrated the usefulness of the DRASTIC model for assessing vulnerability of aquifer systems.
This document summarizes a research paper on clustering algorithms in data mining. It begins by defining clustering as an unsupervised learning technique that organizes unlabeled data into groups of similar objects. The document then reviews different types of clustering algorithms and methods for evaluating clustering results. Key steps in clustering include feature selection, algorithm selection, and cluster validation to assess how well the derived groups represent the underlying data structure. A variety of clustering algorithms exist and must be chosen based on the problem characteristics.
Secure and Efficient Hierarchical Data Aggregation in Wireless Sensor Networks | IJMER
Influence of chemical admixtures on density and slump loss of concrete | IJMER
This document summarizes a study on the influence of chemical admixtures on the density and slump loss of concrete. [1] Plasticizers and super plasticizers were found to improve workability at a constant water-cement ratio and increase density at a reduced ratio. [2] However, slump loss was also observed to increase with the use of admixtures and at higher dosage levels. [3] The study concluded that admixtures can effectively reduce the water-cement ratio needed for a given slump and increase the density and strength of concrete.
Radiation and Mass Transfer Effects on MHD Natural Convection Flow over an In... | IJMER
A numerical solution for the unsteady, natural convective flow of heat and mass transfer along an inclined plate is presented. The dimensionless unsteady, coupled, and non-linear partial differential conservation equations for the boundary layer regime are solved by an efficient, accurate and unconditionally stable finite difference scheme of the Crank-Nicolson type. The velocity, temperature, and concentration fields have been studied for the effect of Magnetic parameter, buoyancy ratio parameter, Prandtl number, radiation parameter and Schmidt number. The local skin-friction, Nusselt number and Sherwood number are also presented and analyzed graphically.
This document provides information about developing apps for BlackBerry 10 devices. It describes the BlackBerry 10 devices, development frameworks including Cascades, HTML5, Android apps, Adobe Air, and best practices. It also outlines the process for getting an app certified as "Built for BlackBerry" which provides benefits for marketing and distribution through BlackBerry World.
Analysis of Machining Characteristics of Cryogenically Treated Die Steels Usi... | IJMER
Query Answering Approach Based on Document Summarization | IJMER
The growth of online information has necessitated thorough research in the
domain of automatic text summarization within the Natural Language Processing (NLP)
community. The aim of this paper is to propose a novel language-independent automatic
summarization approach that combines three main approaches: the Rhetorical Structure Theory
(RST), the query processing approach, and the Network Representation approach (NRA). RST, as a
theory of major aspect for the structure of natural text, is used to extract the semantic relation behind
the text. The query processing approach classifies the question type and finds the answer in a way that suits
the user’s needs. The NRA is used to create a graph representing the extracted semantic relation. The
output is an answer, which not only responds to the question, but also gives the user an opportunity to
find additional information that is related to the question. We implemented the proposed approach. As a
case study, the implemented approach is applied to Arabic text in the agriculture field. The
implemented approach succeeded in summarizing extension documents according to the user's query. The
approach results have been evaluated using Recall, Precision and F-score measures.
Bill Gates grew up in Seattle and showed an early interest in computers. He attended Harvard but dropped out to focus on Microsoft, which he co-founded with Paul Allen in 1975. Guided by a vision of computers in every home and office, Microsoft became hugely successful with its MS-DOS and Windows operating systems. Gates stepped down as CEO in 2000 but remains chairman of Microsoft. He and his wife Melinda have also dedicated themselves to philanthropic causes through their foundation.
Ensemble based Distributed K-Modes Clustering | IJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches to distributed clustering. A distributed clustering algorithm clusters distributed datasets without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets, but it cannot directly handle datasets with categorical attributes, which commonly occur in real-life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces the means of clusters with a frequency-based method that updates modes during the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handling categorical data sets as well as to performing the distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and with the K-Modes based Centralized Clustering algorithm. The experiments are carried out on various datasets from the UCI machine learning data repository.
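The two K-Modes ingredients mentioned above, a dissimilarity measure for categorical records and a frequency-based mode update, can be sketched as follows. This is a minimal illustration that assumes the simple matching dissimilarity (the count of mismatched attributes); the function names and the toy records are hypothetical, and the sketch does not cover the ensemble or distributed parts of the proposed algorithm.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Frequency-based update: the most frequent category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

# Hypothetical toy cluster of (colour, size) records.
cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
mode = cluster_mode(cluster)
print(mode)
print([matching_dissimilarity(record, mode) for record in cluster])
```

The mode plays the role that the mean plays in K-Means: it is the representative that the records of a cluster are compared against when they are reassigned.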
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Clustering is the step-by-step process of forming groups of objects in which the attributes
of all the objects are nearly similar. A cluster is thus a collection of
objects that have nearly the same attribute values. The properties of an object in a cluster are
similar to those of other objects in the same cluster but different from those of objects in other clusters.
Clustering is used in a wide range of applications such as pattern recognition, image processing,
data analysis, machine learning, etc. Nowadays, more attention is being paid to categorical
data than to numerical data, where the range of a numerical attribute is organized into
classes such as small, medium, high, and so on. There is a wide range of algorithms used to
cluster given categorical data. Our approach is to enhance the well-known
clustering algorithm k-modes to improve the accuracy of the algorithm. We propose a new
approach named “High Accuracy Clustering Algorithm for Categorical datasets”.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering & Science are selected through rigorous peer reviews to ensure originality, timeliness, relevance, and readability.
This document discusses using particle swarm optimization to improve the k-prototype clustering algorithm. The k-prototype algorithm clusters data with both numeric and categorical attributes but can get stuck in local optima. The proposed method uses particle swarm optimization, a global optimization technique, to guide the k-prototype algorithm towards better clusterings. Particle swarm optimization models potential solutions as particles that explore the search space. It is integrated with k-prototype clustering to avoid locally optimal solutions and produce better clusterings. The method is tested on standard benchmark datasets and shown to outperform traditional k-modes and k-prototype clustering algorithms.
K-Means clustering uses an iterative procedure that is highly sensitive to, and dependent upon, the initial centroids. The initial centroids in k-means clustering are chosen randomly, and hence the clustering changes with the initial centroids. This paper tries to overcome the problem of random centroid selection, and the resulting change of clusters, with a premeditated selection of initial centroids. We have used the iris, abalone and wine data sets to demonstrate that the proposed method of finding the initial centroids and using them in the k-means algorithm improves the clustering performance. The clustering also remains the same in every run, as the initial centroids are not selected randomly but through a premeditated method.
Max stable set problem to found the initial centroids in clustering problem | nooriasukmaningtyas
In this paper, we propose a new approach to solving the document clustering problem using the K-Means algorithm. The latter is sensitive to the random selection of the k cluster centroids in the initialization phase. To evaluate the quality of K-Means clustering, we propose to model the text document clustering problem as the max stable set problem (MSSP) and to use a continuous Hopfield network to solve the MSSP and obtain initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of completely disconnected nodes in a graph, and in clustering, all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much optimized in terms of time, and provides better clustering quality than other methods.
Comparison Between Clustering Algorithms for Microarray Data Analysis | IOSR Journals
Currently, there are two techniques used for large-scale gene-expression profiling: microarray and
RNA-Sequence (RNA-Seq). This paper is intended to study and compare different clustering algorithms that are used
in microarray data analysis. A microarray is an array of DNA molecules which allows multiple hybridization
experiments to be carried out simultaneously and traces the expression levels of thousands of genes. It is a high-throughput
technology for gene expression analysis and has become an effective tool for biomedical research.
Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein
microarrays, which enable researchers to investigate the expression state of a large number of genes. Data
clustering represents the first and main process in microarray data analysis. The k-means, fuzzy c-means, self-organizing
map, and hierarchical clustering algorithms are under investigation in this paper. These algorithms
are compared based on their clustering model.
Unsupervised learning Algorithms and Assumptions | refedey275
Topics :
Introduction to unsupervised learning
Unsupervised learning Algorithms and Assumptions
K-Means algorithm – introduction
Implementation of K-means algorithm
Hierarchical Clustering – need and importance of hierarchical clustering
Agglomerative Hierarchical Clustering
Working of dendrogram
Steps for implementation of AHC using Python
Gaussian Mixture Models – Introduction, importance and need of the model
Normal (Gaussian) distribution
Implementation of Gaussian mixture model
Understand the different distance metrics used in clustering
Euclidean, Manhattan, Cosine, Mahalanobis
Features of a Cluster – Labels, Centroids, Inertia, Eigenvectors and Eigenvalues
Principal component analysis
Supervised learning (classification)
Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
New data is classified based on the training set
Unsupervised learning (clustering)
The class labels of the training data are unknown
Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Types of Hierarchical Clustering
There are mainly two types of hierarchical clustering:
Agglomerative hierarchical clustering
Divisive Hierarchical clustering
A distribution in statistics is a function that shows the possible values for a variable and how often they occur.
In probability theory and statistics, the Normal Distribution, also called the Gaussian Distribution, is the most significant continuous probability distribution.
Sometimes it is also called a bell curve.
k-Means is a rather simple but well-known algorithm for grouping objects, that is, for clustering. All objects need to be represented as a set of numerical features. In addition, the user has to specify the number of groups (referred to as k) they wish to identify. Each object can be thought of as being represented by a feature vector in an n-dimensional space, n being the number of features used to describe the objects to cluster. The algorithm then randomly chooses k points in that vector space; these points serve as the initial centers of the clusters. Afterwards, each object is assigned to the center it is closest to. Usually the distance measure is chosen by the user and determined by the learning task. After that, a new center is computed for each cluster by averaging the feature vectors of all objects assigned to it. The process of assigning objects and recomputing centers is repeated until it converges. The algorithm can be proven to converge after a finite number of iterations. Several tweaks concerning the distance measure, the choice of initial centers and the computation of new average centers have been explored, as well as the estimation of the number of clusters k. Yet the main principle always remains the same. In this project we discuss the K-means clustering algorithm, its implementation and its application to the problem of unsupervised learning.
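The loop described above (random initial centers, assignment to the closest center, recomputing centers as averages, repeating until nothing changes) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation discussed in the project: it uses Euclidean distance, draws the initial centers from the data objects themselves rather than from arbitrary points in the vector space, and the names `k_means`, `max_iter` and `seed`, as well as the sample points, are hypothetical.

```python
import random
from math import dist  # Euclidean distance between two points (Python 3.8+)


def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct objects drawn from the data.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to the center it is closest to.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments did not change, so the process has converged
        labels = new_labels
        # Update step: each center becomes the average of its assigned objects.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Made-up two-dimensional sample data, for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = k_means(points, k=2)
```

Because the result depends on the initial centers, different values of `seed` may yield different partitions; this is the sensitivity to initialization that several of the variants mentioned above try to reduce.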
Clustering is an unsupervised machine learning technique used to group unlabeled data points. There are two main approaches: hierarchical clustering and partitioning clustering. Partitioning clustering algorithms like k-means and k-medoids attempt to partition data into k clusters by optimizing a criterion function. Hierarchical clustering creates nested clusters by merging or splitting clusters. Examples of hierarchical algorithms include agglomerative clustering, which builds clusters from bottom-up, and divisive clustering, which separates clusters from top-down. Clustering can group both numerical and categorical data.
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model (Waqas Tariq)
A novel clustering algorithm, CSHARP, is presented for the purpose of finding clusters of arbitrary shapes and arbitrary densities in high-dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions or blocks. These blocks are the seeds from which clusters may grow. Therefore, CSHARP is not a point-to-point clustering algorithm. Rather, it is a block-to-block clustering technique. Many of its advantages come from two facts: noise points and outliers correspond to blocks of small sizes, and homogeneous blocks overlap heavily. This technique is not prone to merging clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low- and high-dimensional data sets with superior results over existing techniques such as DBScan, K-means, Chameleon, Mitosis and Spectral Clustering. The quality of its results, as well as its time complexity, places it ahead of these techniques.
Experimental study of Data clustering using k-Means and modified algorithms (IJDKP)
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease
and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in
exploratory data analysis. This paper presents the results of an experimental study of different approaches to
k-Means clustering, comparing results on different datasets using the original k-Means and other
modified algorithms implemented in MATLAB R2009b. The results are evaluated on performance
measures such as number of iterations, number of points misclassified, accuracy, Silhouette validity index and
execution time.
This document discusses various clustering techniques used in data mining. It begins by defining clustering as an unsupervised learning technique that groups similar objects together. It then discusses advantages of clustering such as quality improvement and reuse opportunities. Several clustering methods are described such as K-means clustering, which aims to partition observations into k clusters where each observation belongs to the cluster with the nearest mean. The document concludes by discussing advantages of K-means clustering such as its linear time complexity and its use for spherical cluster shapes.
The document discusses different clustering algorithms, including k-means and EM clustering. K-means aims to partition items into k clusters such that each item belongs to the cluster with the nearest mean. It works iteratively to assign items to centroids and recompute centroids until the clusters no longer change. EM clustering generalizes k-means by computing probabilities of cluster membership based on probability distributions, with the goal of maximizing the overall probability of items given the clusters. Both algorithms are used to group similar items in applications like market segmentation.
Image Segmentation Using Two Weighted Variable Fuzzy K Means (Editor IJCATR)
Image segmentation is the first step in image analysis and pattern recognition. It is the process of dividing an image into different regions such that each region is homogeneous. An accurate and effective algorithm for segmenting images is very useful in many fields, especially in medical imaging. This paper presents a new approach to image segmentation that applies the k-means algorithm with two-level variable weighting. In image segmentation, clustering algorithms are very popular because they are intuitive and easy to implement. The K-means and Fuzzy k-means clustering algorithms are among the most widely used algorithms in the literature, and many authors compare their new proposals with the results achieved by k-Means and Fuzzy k-Means. This paper proposes a new clustering algorithm called TW-fuzzy k-means, an automated two-level variable weighting clustering algorithm for segmenting objects. In this algorithm, a variable weight is also assigned to each variable on the current partition of data. It can be applied to general images and/or specific images (i.e., medical and microscopic images). The proposed TW-fuzzy k-means algorithm is assessed in terms of its segmentation performance on various types of images. Based on the results obtained, the proposed algorithm gives better visual quality than several other clustering methods.
This document summarizes an academic paper that proposes an innovative modified K-Mode clustering algorithm for categorical data. The paper begins by introducing clustering algorithms and discusses existing algorithms like K-Means, K-Medoids, and K-Mode that are used for numerical and categorical data. It then describes the limitations of traditional K-Mode clustering and proposes a modified K-Mode algorithm that aims to provide better initial cluster means/modes to result in clusters with better accuracy. The paper experimentally evaluates the traditional and modified K-Mode algorithms on large datasets to compare their performance for varying data values.
A New Framework for Kmeans Algorithm by Combining the Dispersions of Clusters (IJMTST Journal)
The Kmeans algorithm performs clustering by using a partitioning method which partitions data into different
clusters in such a way that similar objects are placed in one cluster (within-cluster compactness) and
dissimilar objects are placed in different clusters (between-cluster separation). Many Kmeans-type
clustering algorithms consider only similarities among objects and do not consider dissimilarities. In the
existing system, an extended version of the Kmeans algorithm is described. Both within-cluster compactness
and between-cluster separation are considered in the new clustering algorithm. The existing work first
developed a group of objective functions for clustering and then determined the update rules for the
algorithm. A new algorithm with a new objective function that addresses both within-cluster compactness
and between-cluster separation has been proposed. The proposed FCS algorithm works
simultaneously on both similarities and dissimilarities among objects. It will give better
performance than existing Kmeans.
The document provides a literature review of different clustering techniques. It begins by defining clustering and its applications. It then categorizes and describes several clustering methods including hierarchical (BIRCH, CURE, CHAMELEON), partitioning (k-means, k-medoids), density-based (DBSCAN, OPTICS, DENCLUE), grid-based (CLIQUE, STING, MAFIA), and model-based (RBMN, SOM) methods. For each method, it discusses the algorithm, advantages, disadvantages and time complexity. The document aims to provide an overview of various clustering techniques for classification and comparison.
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
International OPEN ACCESS Journal of Modern Engineering Research (IJMER)
| IJMER | ISSN: 2249–6645 | www.ijmer.com | Vol. 4 | Iss. 4 | Apr. 2014 | 55 |
N. Aparna1, M. Kalaiarasu2
1, 2 (Department of Computer Science, Department of Information Technology, Sri Ramakrishna Engineering
College, Coimbatore)
Abstract: Clustering mixed type data is one of the major research topics in the area of data mining. In
this paper, a new algorithm for clustering mixed type data is proposed where the concept of distribution
centroid is used to represent the prototype of categorical variables in a cluster, which is then combined
with the mean to represent the prototype of clusters with mixed type variables. In the method, data is
observed from different views and the variables are grouped into different views. Instances that can be
viewed differently from different viewpoints are defined as multiview data. During the clustering process,
the differences among views are usually ignored. Here, both view weights and variable weights are
computed simultaneously. The view weight is used to determine the closeness or density of a view, and the
variable weight is used to identify the significance of each variable. Both weights are used in the
distance function to determine the cluster of each object. In the proposed method, the k-prototypes
algorithm is enhanced so that it automatically computes both view and variable weights. The proposed
MK-Prototypes algorithm is compared with two other clustering algorithms.
Keywords: clustering, mixed data, multiview, variable weighting, view weighting, k-prototypes.
I. Introduction
Clustering is a fundamental technique of unsupervised learning in machine learning and statistics. It is
generally used to find groups of similar items in a set of unlabeled data. The aim of clustering is to divide a set
of data objects into clusters so that data objects belonging to the same cluster are more similar to each
other than to those in other clusters [1-4]. In the real world, datasets usually contain both numeric and categorical
variables [5,6]. However, most existing clustering algorithms assume that all variables are either numeric or
categorical; examples include the k-means [7], k-modes [8], and fuzzy k-modes [9] algorithms. In addition,
data is often observed from multiple viewpoints and described by multiple types of variables. For example, in a
student data set, the variables can be divided into a personal information view describing the student's personal
details, an academic view describing the student's academic performance, and an extra-curricular view
describing the student's extra-curricular activities and achievements.
Traditional methods take multiple views as a set of flat variables and do not take into account the
differences among various views [10], [11], [12]. Multiview clustering, in contrast, takes the information
from multiple views and also considers the variations among different views, which produces a more precise and
efficient partitioning of data.
In this paper, a new algorithm, Multi-viewpoint K-prototypes (MK-Prototypes), for clustering mixed
type data is proposed. It is an enhancement to the usual k-prototypes algorithm. In order to differentiate the
effects of different views and different variables in clustering, view weights and individual variable weights are
applied to the distance function. While computing the view weights, the complete set of variables is
considered, and while calculating the weights of variables in a view, only the part of the data that includes the
variables in that view is considered. Thus, the view weights show the significance of views in the complete data
and the variable weights in a view show the significance of variables in that view alone.
II. Related Works
To date, a number of algorithms and methods exist that deal directly with mixed type data. In [13],
Cen Li and Gautam Biswas proposed Similarity-Based Agglomerative Clustering (SBAC), an algorithm that
works well for data with mixed attributes. It adopts a similarity measure proposed by Goodall [14] for biological
taxonomy. In this method, higher weight is assigned to matches on infrequent attribute values when the
similarity is computed. It does not make any assumptions about the underlying features of the attribute values.
An agglomerative algorithm is used to generate a dendrogram, and a simple distinctness heuristic is used to
extract a partition of the data. Hsu and Chen proposed CAVE [15], a clustering algorithm based on variance and
entropy for clustering mixed data. It builds a distance hierarchy for every categorical attribute, which requires
domain expertise. Hsu et al. [16] proposed an extension to the self-organizing map to analyze mixed data, where
the distance hierarchy is automatically constructed by using the values of class attributes.
In [17], Chatzis proposed the KL-FCM-GM algorithm, which assumes that the data derived from the clusters are
in Gaussian form and is designed for Gauss-Multinomial distributed data.
Huang presented the k-prototypes algorithm [18], where k-means is integrated with k-modes to partition
mixed data. Bezdek et al. considered the fuzzy nature of the objects in their work on fuzzy k-prototypes [19], and
Zheng et al. [20] proposed an evolutionary k-prototypes algorithm by introducing an evolutionary
algorithm framework.
III. Proposed System
The motivation for the proposed system is twofold. On one hand, it provides a better representation for the
categorical part of mixed data, since the numerical variables are already well represented by the mean.
On the other hand, it considers the importance of view weights and variable weights in the process of
clustering. The distribution centroid represents the cluster centroid for the categorical variable part.
Huang's weighting approach is used for the computation of both view weights and variable weights.
A. The distribution centroid
The idea of the distribution centroid as a better representation of categorical variables is inspired by the
fuzzy centroid proposed by Kim et al. [21], which makes use of a fuzzy scenario to represent the cluster centers
for the categorical variable part.
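For $Dom(V_j) = \{b_j^1, b_j^2, b_j^3, \ldots, b_j^t\}$, the distribution centroid of a cluster o, denoted as $C'_o$, is represented as follows:
$$C'_o = \{c'_{o1}, c'_{o2}, \ldots, c'_{oj}, \ldots, c'_{om}\} \tag{1}$$
where
$$c'_{oj} = \{\{b_j^1, w_{oj}^1\}, \{b_j^2, w_{oj}^2\}, \ldots, \{b_j^k, w_{oj}^k\}, \ldots, \{b_j^t, w_{oj}^t\}\} \tag{2}$$
In the above equation,
$$w_{oj}^k = \sum_{i=1}^{n} \mu(x_{ij}) \tag{3}$$
where
$$\mu(x_{ij}) = \begin{cases} \dfrac{u_{io}}{\sum_{i=1}^{n} u_{io}} & \text{if } x_{ij} = b_j^k \\ 0 & \text{if } x_{ij} \neq b_j^k \end{cases} \tag{4}$$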
Here, $u_{io}$ is assigned the value 1 if the data object $x_i$ belongs to cluster o, and 0 if the data object $x_i$ does
not belong to cluster o.
From the above equations it is clear that the computation of the distribution centroid considers the
number of times each categorical value occurs in a cluster. Thus, to denote the center of a cluster, it takes into
account the distribution features of the categorical variables.
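As an illustration of equations (1) to (4), the following minimal Python sketch computes the distribution centroid of one categorical variable in one cluster. The function name `distribution_centroid` and the toy data are hypothetical and are not part of the paper; the domain is taken to be the set of values observed in the data.

```python
from collections import Counter

def distribution_centroid(values, membership):
    """Return {category: weight} for one categorical variable in one cluster.

    values[i] is x_ij for object i; membership[i] is u_io
    (1 if object i belongs to the cluster, otherwise 0).
    """
    size = sum(membership)
    if size == 0:
        raise ValueError("cluster is empty")
    # Count how often each category occurs among the cluster members.
    counts = Counter(v for v, u in zip(values, membership) if u == 1)
    # Assumption: the domain is the set of categories seen anywhere in the data.
    domain = sorted(set(values))
    # Each weight is the relative frequency of the category inside the cluster.
    return {b: counts[b] / size for b in domain}

colours = ["red", "blue", "red", "green", "red", "blue"]
u = [1, 1, 1, 0, 1, 0]  # objects 0, 1, 2 and 4 belong to the cluster
centroid = distribution_centroid(colours, u)
print(centroid)
```

The weights of one variable sum to 1 within a cluster, so the centroid can be read as the empirical distribution of that variable over the cluster members.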
B. Weight calculation using Huang’s approach
The weight of a variable identifies the effect of that variable in the clustering process. In 2005, Huang et al.
proposed an approach to calculate the weight of a variable [22]. According to their method, the weight is
computed by minimizing the value of the objective function.
The principle for assigning the weight of a variable is to allocate a larger value to a variable that has a
smaller sum of the within cluster distances (WCD), and vice versa. This principle is given by
$$w_j \propto \frac{1}{D_j} \tag{5}$$
where $w_j$ is the significance of the variable j, $\propto$ is the mathematical symbol denoting direct proportionality, and
$D_j$ is the sum of the within cluster distances for this variable.
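The proportionality in equation (5) can be sketched in a few lines of Python. Normalising the weights so that they sum to 1 is an assumption made here for illustration, and the name `inverse_wcd_weights` is hypothetical.

```python
def inverse_wcd_weights(wcd):
    """Weights proportional to 1 / D_j, normalised to sum to 1 (assumed normalisation)."""
    if any(d <= 0 for d in wcd):
        raise ValueError("each within-cluster distance sum must be positive")
    inverse = [1.0 / d for d in wcd]
    total = sum(inverse)
    return [v / total for v in inverse]

# Three variables whose within cluster distance sums are 2, 4 and 8.
weights = inverse_wcd_weights([2.0, 4.0, 8.0])
print(weights)
```

The variable with the smallest within cluster distance sum receives the largest weight, which is the behaviour described above.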
C. Multiview concept
FIGURE 1 : Multiview concept
In 2013, Chen et al. [23] proposed TW-k-means, where the concept of multiview data was introduced.
The above figure illustrates the multiview concept. In conventional clustering, the differences among
different views are not considered. In multiview clustering, in addition to assigning variable weights, the
variables are grouped according to their characteristic properties. Each group is termed a view, and a weight is
assigned to each view. The view weight is assigned according to Huang's approach.
D. The proposed algorithm
The proposed algorithm, MK-Prototypes, puts together the concepts of sections III-A, III-B, and
III-C. Figure 2 describes the steps involved in the algorithm:
Steps in the proposed algorithm:
1. Compute the distribution centroid to represent the categorical variable centroid
2. Compute the mean for the numerical variables
3. Integrate the distribution centroid and mean to represent the prototype for the mixed data
4. Compute the view weights and variable weights.
5. Measure the similarity between the data objects and the prototypes
6. Assign the data object to that prototype to which the considered data object is the closest
7. Repeat steps 1-6 until an effective clustering result is obtained.
E. The optimization model
The clustering process to partition the dataset X into k clusters that considers both view weights and
variable weights is represented according to the framework of [23] as a minimization of the following objective
function.
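$$P(U, Z, R, V) = \sum_{o=1}^{k} \sum_{i=1}^{n} \sum_{t=1}^{Q} \sum_{s \in G_t} u_{i,o}\, v_t\, r_s\, d(x_{i,s}, z_{o,s}) \tag{6}$$
subject to
$$\sum_{o=1}^{k} u_{i,o} = 1, \quad u_{i,o} \in \{0, 1\}, \quad 1 \le i \le n$$
$$\sum_{t=1}^{Q} v_t = 1, \quad 0 \le v_t \le 1, \quad 1 \le t \le Q$$
$$\sum_{j \in G_t} r_j = 1, \quad 0 \le r_j \le 1$$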
where U is an n x k partition matrix whose elements $u_{i,o}$ are binary, and $u_{i,o} = 1$ indicates that object i is
allocated to cluster o. $Z = \{Z_1, Z_2, \ldots, Z_k\}$ is a set of k vectors representing the centers of the k clusters.
$V = \{v_1, v_2, \ldots, v_Q\}$ are Q weights for the Q views. $R = \{r_1, r_2, \ldots, r_m\}$ are m weights for the m variables.
$d(x_{i,s}, z_{o,s})$ is a distance or dissimilarity measure on the s-th variable between the i-th object and the center
of the o-th cluster.
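A minimal Python sketch of evaluating objective (6) is given below. It assumes a hard partition stored as one cluster index per object, views stored as lists of variable indices, and a caller-supplied distance; the names `objective`, `assign`, and `dist` are hypothetical.

```python
def objective(X, assign, Z, views, v, r, dist):
    """Evaluate P(U, Z, R, V); assign[i] is the index o for which u_{i,o} = 1."""
    total = 0.0
    for i, x in enumerate(X):
        o = assign[i]
        for t, group in enumerate(views):   # group holds the variable indices in G_t
            for s in group:
                total += v[t] * r[s] * dist(s, x[s], Z[o][s])
    return total

X = [[1.0, 2.0, 0.0], [3.0, 2.0, 1.0]]      # two objects, three numeric variables
Z = [[1.0, 1.0, 0.0], [3.0, 3.0, 1.0]]      # two cluster centers
views = [[0, 1], [2]]                       # G_1 = {0, 1}, G_2 = {2}
v = [0.5, 0.5]                              # view weights, summing to 1
r = [0.5, 0.5, 1.0]                         # variable weights, summing to 1 inside each view
P = objective(X, [0, 1], Z, views, v, r, lambda s, a, b: abs(a - b))
print(P)
```

Because the partition is hard, only the cluster to which an object is assigned contributes to the sum, so the loop over o collapses to a single lookup.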
FIGURE 2. Flowchart for the proposed algorithm
In order to minimize objective function (6), the problem is divided into four sub-problems:
1. Sub-problem 1: Fix Z=Z^,R=R^ and V=V^ and solve the reduced problem P(U,Z^,R^,V^).
2. Sub-problem 2: Fix U=U^, R=R^ and V=V^ and solve the reduced problem P(U^,Z,R^,V^).
3. Sub-problem 3: Fix Z=Z^, U=U^ and V=V^ and solve the reduced problem P(U^,Z^,R,V^).
4. Sub-problem 4: Fix Z=Z^, R=R^ and U=U^ and solve the reduced problem P(U^,Z^,R^,V).
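Sub-problem 1 is solved by:
$$u_{i,o} = 1 \tag{7}$$
if
$$\sum_{s=1}^{m} v_t\, r_s\, d(x_{i,s}, z_{o,s}) \le \sum_{s=1}^{m} v_t\, r_s\, d(x_{i,s}, z_{e,s}) \tag{8}$$
for all $1 \le e \le k$, where $v_t$ is the weight of the view $G_t$ that contains variable s, and
$u_{i,e} = 0$ for $e \neq o$.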
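The assignment rule of equations (7) and (8) can be sketched as follows. The function name `assign_objects` and the toy data are hypothetical; ties are broken in favour of the lowest cluster index, which is an assumption.

```python
def assign_objects(X, Z, views, v, r, dist):
    """Return, for each object, the index of the cluster with the smallest weighted distance."""
    # Map each variable index s to the index t of the view that contains it.
    view_of = {s: t for t, group in enumerate(views) for s in group}
    labels = []
    for x in X:
        costs = [
            sum(v[view_of[s]] * r[s] * dist(s, x[s], z[s]) for s in view_of)
            for z in Z
        ]
        labels.append(costs.index(min(costs)))  # first minimum wins on ties
    return labels

X = [[0.0, 0.1], [5.0, 4.9], [0.2, 0.0]]
Z = [[0.0, 0.0], [5.0, 5.0]]
labels = assign_objects(X, Z, [[0], [1]], [0.5, 0.5], [1.0, 1.0],
                        lambda s, a, b: abs(a - b))
print(labels)
```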
Sub-problem 2 is solved for the numeric variables by
$$z_{o,s} = \frac{\sum_{i=1}^{n} u_{i,o}\, x_{i,s}}{\sum_{i=1}^{n} u_{i,o}} \tag{9}$$
and for the categorical variables by $z_{o,s} = c'_{o,s}$, which is already defined.
$d(x_{i,s}, z_{o,s}) = |x_{i,s} - z_{o,s}|$ if the s-th variable is a numeric variable.
$d(x_{i,s}, z_{o,s}) = \varphi(x_{i,s}, z_{o,s})$ if the s-th variable is a categorical variable,
where $\varphi(x_{i,s}, z_{o,s}) = \sum_{k=1}^{t} \delta(x_{i,s}, b_s^k)$, and $\delta(x_{i,s}, b_s^k)$ is 0 if $x_{i,s} \neq b_s^k$
and $w_{o,s}^k$ if $x_{i,s} = b_s^k$.
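The prototype update and the per-variable distance can be sketched in Python as below. As the categorical measure is written above, it returns the centroid weight of the matching category, which grows as the category becomes more frequent in the cluster. The sketch therefore uses 1 minus that weight so that frequent categories count as close; this is an assumption of the sketch, not something stated in the paper. The names `numeric_center` and `variable_distance` are hypothetical.

```python
def numeric_center(values, membership):
    """Equation (9): mean of the cluster members for one numeric variable."""
    size = sum(membership)
    if size == 0:
        raise ValueError("cluster is empty")
    return sum(u * x for u, x in zip(membership, values)) / size

def variable_distance(x, z, is_numeric):
    """d(x, z) for one variable; z is a mean or a {category: weight} centroid."""
    if is_numeric:
        return abs(x - z)
    # Assumption: dissimilarity = 1 - weight of the matching category.
    return 1.0 - z.get(x, 0.0)

center = numeric_center([1.0, 3.0, 10.0], [1, 1, 0])   # only the first two objects are members
colour_centroid = {"red": 0.75, "blue": 0.25}
print(center)
print(variable_distance(4.0, center, True))
print(variable_distance("red", colour_centroid, False))
print(variable_distance("green", colour_centroid, False))
```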
The solution to sub-problem 3 is as follows.
Let Z=Z^, U=U^ and V=V^ be fixed. Then the reduced problem P(U^,Z^,R,V^) is minimized if
$$r_s = \frac{1}{\sum_{h \in G_t} \left[ \dfrac{D_s}{D_h} \right]^{1/\gamma}} \tag{10}$$
where
$$D_s = \sum_{o=1}^{k} \sum_{i=1}^{n} u'_{i,o}\, v'_t\, d(x_{i,s}, z'_{o,s}) \tag{11}$$
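Sub-problem 4 is solved as follows:
$$v_t = \frac{1}{\sum_{h=1}^{Q} \left[ \dfrac{F_t}{F_h} \right]^{1/\mu}} \tag{12}$$
where
$$F_t = \sum_{o=1}^{k} \sum_{i=1}^{n} \sum_{s \in G_t} u'_{i,o}\, r'_s\, d(x_{i,s}, z'_{o,s}) \tag{13}$$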
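Equations (10) and (12) share one form: a weight equals 1 divided by a sum of ratios of within cluster distance sums, each raised to the power 1/γ (variables inside a view) or 1/μ (views). The sketch below assumes that reading and assumes every distance sum is strictly positive; the function names are hypothetical.

```python
def huang_weights(D, exponent):
    """w_j = 1 / sum_h (D_j / D_h) ** (1 / exponent), computed over one group of sums D."""
    if any(d <= 0 for d in D):
        raise ValueError("distance sums must be positive in this sketch")
    return [1.0 / sum((dj / dh) ** (1.0 / exponent) for dh in D) for dj in D]

def variable_weights(D, views, gamma):
    """Equation (10): weights are computed separately inside each view G_t."""
    r = [0.0] * len(D)
    for group in views:
        for s, w in zip(group, huang_weights([D[s] for s in group], gamma)):
            r[s] = w
    return r

def view_weights(F, mu):
    """Equation (12): one weight per view, computed across all views."""
    return huang_weights(F, mu)

r = variable_weights([1.0, 4.0, 2.0], [[0, 1], [2]], gamma=1.0)
v_sharp = view_weights([1.0, 4.0], mu=1.0)
v_flat = view_weights([1.0, 4.0], mu=4.0)
print(r, v_sharp, v_flat)
```

In this form the weights of each group sum to 1, and a larger exponent parameter gives a flatter weight distribution, since the ratios are pulled towards 1.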
Having presented the detailed computations required for calculating the important variables, the proposed
MK-Prototypes algorithm can be described as follows:
1. Choose the maximum number of iterations, the number of clusters k, and the values of μ and γ; randomly choose
k distinct data objects and convert them into initial prototypes; and initialize the view weights and variable weights.
2. Fix Z', R', V' as $Z^t$, $R^t$, $V^t$ respectively and minimize the problem P(U, Z', R', V') to obtain $U^{t+1}$.
3. Fix U', R', V' as $U^t$, $R^t$, $V^t$ respectively and minimize the problem P(U', Z, R', V') to obtain $Z^{t+1}$.
4. Fix U', Z', V' as $U^t$, $Z^t$, $V^t$ respectively and minimize the problem P(U', Z', R, V') to obtain $R^{t+1}$.
5. Fix U', Z', R' as $U^t$, $Z^t$, $R^t$ respectively and minimize the problem P(U', Z', R', V) to obtain $V^{t+1}$.
6. If there is no improvement in P or if the maximum number of iterations is reached, then stop. Otherwise
increment t by 1, decrement the number of remaining iterations by 1, and go to Step 2.
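The six steps above can be sketched as a single loop. To stay short, the sketch below handles numeric variables only, uses the absolute difference as the distance, starts from uniform weights, and adds a small constant `eps` to avoid division by zero; all of these are assumptions of the sketch, and the function names are hypothetical. It uses the most recent value of each quantity in every step.

```python
import random

def huang_weights(D, exponent, eps=1e-12):
    """Normalised inverse weights; algebraically equal to 1 / sum_h (D_j / D_h) ** (1 / exponent)."""
    inv = [(d + eps) ** (-1.0 / exponent) for d in D]   # keep exponent >= 1 to avoid overflow
    total = sum(inv)
    return [w / total for w in inv]

def mk_prototypes_numeric_sketch(X, k, views, gamma, mu, max_iter=20, seed=0):
    rng = random.Random(seed)
    m = len(X[0])
    view_of = {s: t for t, g in enumerate(views) for s in g}
    # Step 1: k randomly chosen objects become prototypes; weights start uniform.
    Z = [list(x) for x in rng.sample(X, k)]
    r = [1.0 / len(views[view_of[s]]) for s in range(m)]
    v = [1.0 / len(views)] * len(views)

    def cost(x, z):
        return sum(v[view_of[s]] * r[s] * abs(x[s] - z[s]) for s in range(m))

    previous = float("inf")
    for _ in range(max_iter):
        # Step 2: assign every object to its closest prototype.
        labels = [min(range(k), key=lambda o: cost(x, Z[o])) for x in X]
        # Step 3: update the prototypes (means); empty clusters keep their old prototype.
        for o in range(k):
            members = [x for x, l in zip(X, labels) if l == o]
            if members:
                Z[o] = [sum(x[s] for x in members) / len(members) for s in range(m)]
        # Step 4: update the variable weights inside each view.
        D = [sum(v[view_of[s]] * abs(x[s] - Z[l][s]) for x, l in zip(X, labels))
             for s in range(m)]
        for g in views:
            for s, w in zip(g, huang_weights([D[s] for s in g], gamma)):
                r[s] = w
        # Step 5: update the view weights.
        F = [sum(r[s] * abs(x[s] - Z[l][s]) for x, l in zip(X, labels) for s in g)
             for g in views]
        v = huang_weights(F, mu)
        # Step 6: stop when the objective no longer improves.
        P = sum(cost(x, Z[l]) for x, l in zip(X, labels))
        if P >= previous:
            break
        previous = P
    return labels, Z, r, v

data = [[0.0, 0.1, 1.0], [0.2, 0.0, 1.1], [5.0, 5.1, 0.9], [5.2, 4.9, 1.0]]
labels, Z, r, v = mk_prototypes_numeric_sketch(data, k=2, views=[[0, 1], [2]], gamma=2.0, mu=2.0)
print(labels, r, v)
```

Extending the sketch to mixed data would mean replacing the mean update with a distribution centroid for each categorical variable and using the categorical dissimilarity in place of the absolute difference.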
IV. Experiments on Performance of MK-Prototypes Algorithm
In order to measure the performance of the proposed algorithm, it is used to cluster a real-world dataset,
Heart (disease). The dataset is taken from the UCI Machine Learning Repository.
The proposed algorithm is compared with the k-prototypes and SBAC algorithms, which are well known for
clustering mixed type data. In this paper, the clustering accuracy is measured using one of the most commonly
used criteria. The clustering accuracy r is given by
$$r = \frac{\sum_{i=1}^{k} a_i}{n} \tag{14}$$
where $a_i$ is the number of data objects that occur in both the i-th cluster and its corresponding true class, and n is
the number of data objects in the data set.
The higher the value of r, the higher the clustering accuracy. A perfect clustering gives a value of r = 1.0.
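Equation (14) can be sketched in Python as follows. The paper does not state how a cluster is matched to its corresponding true class; the sketch assumes the majority class of each cluster, and the function name `clustering_accuracy` is hypothetical.

```python
from collections import Counter

def clustering_accuracy(cluster_labels, true_labels):
    """r = (sum of a_i) / n, with a_i counted against the majority class of cluster i (assumed)."""
    n = len(true_labels)
    by_cluster = {}
    for c, y in zip(cluster_labels, true_labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return sum(max(counts.values()) for counts in by_cluster.values()) / n

clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
accuracy = clustering_accuracy(clusters, classes)
print(accuracy)
```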
A. Dataset description
The Heart disease data set is a mixed dataset. It contains 303 patient instances. The actual data set
contains 76 variables, of which 14 are commonly used. For the proposed algorithm, 19 of the 76 variables are
considered in order to define three views. They consist of seven numeric variables and twelve categorical
variables.
These 19 variables can be naturally divided into 3 views.
1. Personal data view: It includes those variables which describe a patient's personal data.
2. Historical data view: It includes those variables which describe a patient's historical data, such as habits.
3. Test output view: It includes all those variables which describe the results of various tests conducted for
the patient.
Here, $G_1$, $G_2$, $G_3$ represent the three views personal, historical, and test output respectively.
B. Results and analysis
Below are the graphical representations of the clustering results. Fig 3 shows the variation in variable
weights for varying γ values and fixed μ values. Fig 4 shows the variation in view weights for varying μ values
and fixed γ values.
From Table 1, it is observed that as γ increased, the variance of the variable weights R decreased rapidly. This
result can be explained from equation (10): as γ increases, R becomes flatter. The graphical representation of
Table 1 is shown in Fig 3.
Table 1: Variable weights vs γ value for fixed μ value
γ      μ=1     μ=4     μ=12
10     0.01    0       0
15     0       0.01    0
20     0.7     0       0.05
25     0.03    0.4     0.1
30     0       0.1     0.02
35     0.02    0       0
Table 2 shows that as μ increased, the variance of the view weights V decreased rapidly. This result can be
explained from equation (12): as μ increases, V becomes flatter. The graphical representation is shown in Fig 4.
Fig 3: Variable weights vs γ value for fixed μ value
[Figure: variable weights (y-axis, 0.0 to 0.9) plotted against γ (x-axis, 0 to 35) for μ=1, μ=4, and μ=12]
Table 2: View weights vs μ value for fixed γ value
μ      γ=1     γ=4     γ=12
10     0.05    0.075   0.01
20     0.075   0.14    0.015
30     0.095   0.05    0.04
40     0.16    0.06    0.01
50     0.04    0.07    0.005
60     0.05    0.04    0.02
70     0.07    0.035   0.01
Figure 4: View weights vs μ value for fixed γ value
The experiments have been conducted for three different values of μ with varying γ, and for three different
values of γ with varying μ. From the above analysis, the two types of weight distributions in the MK-Prototypes
algorithm can be controlled by setting different values of γ and μ:
1. Large γ makes more variables contribute to the clustering, while small γ makes only important variables
contribute to the clustering.
2. Large μ makes more views contribute to the clustering, while small μ makes only important views
contribute to the clustering.
[Figure 4: view weights (y-axis, 0.00 to 0.18) plotted against μ (x-axis, 0 to 80) for γ=1, γ=4, and γ=12]
Table 3: Comparison of accuracy rates of dataset considering all views
Algorithm        Clustering accuracy (r)
k-prototypes     0.521
SBAC             0.747
MK-Prototypes    0.846
From the above table, it is clear that the proposed algorithm has a better clustering accuracy than the
existing k-prototypes and SBAC algorithms.
V. Conclusion
Mixed type data are encountered everywhere in the real world. In this paper, a new multi-viewpoint
clustering algorithm for mixed type data, MK-Prototypes, has been proposed. Compared with the
existing algorithms, the proposed algorithm makes several contributions. It encapsulates the characteristics of
clusters with mixed type variables more efficiently, since it includes the distribution information of both
numeric and categorical variables.
It also takes into account the importance of various variables and views during the process of clustering
by using Huang's approach and a new dissimilarity measure.
It can compute weights for views and individual variables simultaneously in the clustering process.
With the two types of weights, dense views and significant variables can be identified, and the effect of
low-quality views and noise variables can be reduced.
Because of these contributions the proposed algorithm obtains higher clustering accuracy, which has
been validated by experimental results.
REFERENCES
[1] Z.X. Huang, Extensions to the k-means algorithm for clustering large datasets with categorical values, Data Min.
Knowl. Discovery 2 (3) (1998) 283–304.
[2] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, New Jersey, 1988.
[3] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a survey, ACM Comput. Surv. 31 (3) (1999) 264–323.
[4] J.W. Han, M. Kamber, Data Mining Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001.
[5] C. Hsu, Y.P. Huang, Incremental clustering of mixed data based on distance hierarchy, Expert Syst. Appl. 35 (3)
(2008) 1177–1185.
[6] C. Hsu, S. Lin, W. Tai, Apply extended self-organizing map to cluster and classify mixed-type data,
Neurocomputing 74 (18) (2011) 3832–3842.
[7] S. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory 28 (2) (1982) 129–137.
[8] Z.X. Huang, Extensions to the k-means algorithm for clustering large datasets with categorical values, Data Min.
Knowl. Discovery 2 (3) (1998) 283–304.
[9] Z.X. Huang, M.K. Ng, A fuzzy k-modes algorithm for clustering categorical data, IEEE Trans. Fuzzy Syst. 7 (4)
(1999) 446–452.
[10] J. Mui, K. Fu, Automated classification of nucleated blood cells using a binary tree classifier, IEEE
Trans. Pattern Analysis and Machine Intelligence 2 (5) (1980) 429–443.
[11] J. Wang, H. Zeng, Z. Chen, H. Lu, L. Tao, W. Ma, ReCoM: Reinforcement clustering of multitype
interrelated data objects, in: Proc. 26th Ann. Int'l ACM SIGIR Conf. Research and Development in Information
Retrieval, 2003, pp. 274–281.
[12] S. Bickel, T. Scheffer, Multi-view clustering, in: Proc. IEEE Fourth Int'l Conf. Data Mining, 2004, pp. 19–26.
[13] C. Li, G. Biswas, Unsupervised learning with mixed numeric and nominal data, IEEE Trans. Knowl. Data
Eng. 14 (4) (2002) 673–690.
[14] D.W. Goodall, A new similarity index based on probability, Biometrics 22 (4) (1966) 882–907.
[15] C.C. Hsu, Y.C. Chen, Mining of mixed data with application to catalog marketing, Expert Syst. Appl. 32 (1)
(2007) 12–27.
[16] C. Hsu, S. Lin, W. Tai, Apply extended self-organizing map to cluster and classify mixed-type data,
Neurocomputing 74 (18) (2011) 3832–3842.
[17] S.P. Chatzis, A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical
attributes employing a probabilistic dissimilarity functional, Expert Syst. Appl. 38 (7) (2011) 8684–8689.
[18] Z.X. Huang, Clustering large datasets with mixed numeric and categorical values, in: Proceedings of the First
Pacific-Asia Knowledge Discovery and Data Mining Conference, 1997, pp. 21–34.
[19] J.C. Bezdek, J. Keller, R. Krishnapuram, Fuzzy Models and Algorithms for Pattern Recognition and Image
Processing, Kluwer Academic Publishers, Boston, 1999.
[20] Z. Zheng, M.G. Gong, J.J. Ma, L.C. Jiao, Unsupervised evolutionary clustering algorithm for mixed type data, in:
Proceedings of the IEEE Congress on Evolutionary Computation (CEC), 2010, pp. 1–8.
[21] W. Kim, K.H. Lee, D. Lee, Fuzzy clustering of categorical data using fuzzy centroid, Pattern Recognition
Lett. 25 (11) (2004) 1263–1271.
[22] Z.X. Huang, M.K. Ng, H.Q. Rong, Z.C. Li, Automated variable weighting in k-means type clustering, IEEE
Trans. Pattern Anal. Mach. Intell. 27 (5) (2005) 657–668.
[23] X. Chen, X. Xu, J.Z. Huang, Y. Ye, TW-k-means: Automated two-level variable weighting clustering
algorithm for multiview data, IEEE Transactions on Knowledge and Data Engineering 25 (4) (2013) 932–945.