The K-means algorithm performs clustering with a partitioning method that divides the data into clusters so that similar objects fall in the same cluster (within-cluster compactness) and dissimilar objects fall in different clusters (between-cluster separation). Many K-means-type clustering algorithms consider only the similarities among objects and ignore the dissimilarities. The existing system describes an extended version of the K-means algorithm in which both compactness within clusters and separation between clusters are taken into account: it first develops a group of objective functions for clustering and then derives the corresponding update rules. A new algorithm with a new objective function has been proposed to address cluster compactness within clusters and cluster separation between clusters jointly. The proposed FCS algorithm works simultaneously on both the similarities and the dissimilarities among objects, and is expected to perform better than the existing K-means.
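The summary above does not give the FCS objective in closed form, so as an illustration only, the Python/NumPy sketch below (with hypothetical function names) computes the two quantities such an objective trades off: within-cluster compactness and between-cluster separation for a given partition.

```python
import numpy as np

def compactness_and_separation(X, labels, centers):
    """Within-cluster compactness (sum of squared distances to the own center)
    and between-cluster separation (sum of squared distances between centers).
    Illustrative only; not the exact FCS objective from the paper."""
    compactness = 0.0
    for k, c in enumerate(centers):
        members = X[labels == k]
        if len(members):
            compactness += np.sum((members - c) ** 2)

    separation = 0.0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            separation += np.sum((centers[i] - centers[j]) ** 2)

    return compactness, separation

# An algorithm that trades both off might, for example, minimize
# compactness - alpha * separation for some weight alpha > 0.
```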
The document discusses different clustering algorithms, including k-means and EM clustering. K-means aims to partition items into k clusters such that each item belongs to the cluster with the nearest mean. It works iteratively to assign items to centroids and recompute centroids until the clusters no longer change. EM clustering generalizes k-means by computing probabilities of cluster membership based on probability distributions, with the goal of maximizing the overall probability of items given the clusters. Both algorithms are used to group similar items in applications like market segmentation.
The document describes a project analyzing clustering of galaxies using the Shapley Galaxy dataset. The dataset contains information on 4215 galaxies. The author applies hierarchical clustering algorithms and Gaussian mixture modeling to determine the optimal number of clusters in the data. Single, complete, and average linkage clustering are applied but do not clearly show clustering structure. Gaussian mixture modeling with the EM algorithm and BIC criterion indicates the best fit model has 8 clusters. Scatter plots are presented for solutions with 1 to 8 clusters.
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
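As a small illustration of the nearest-mean rule (and the Voronoi partition it implies), the following NumPy sketch assigns each observation to its closest centroid:

```python
import numpy as np

def assign_to_nearest(X, centroids):
    """Return, for each row of X, the index of the nearest centroid
    (squared Euclidean distance). The region of space mapped to each
    centroid is exactly its Voronoi cell."""
    # distances[i, k] = ||X[i] - centroids[k]||^2
    distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return distances.argmin(axis=1)

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [4.8, 5.3]])
centroids = np.array([[1.0, 1.0], [5.0, 5.0]])
print(assign_to_nearest(X, centroids))   # [0 0 1 1]
```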
This document discusses various clustering techniques for image segmentation. It begins by defining clustering and image segmentation. It then describes four main clustering techniques - exclusive clustering (e.g. k-means), overlapping clustering (e.g. fuzzy c-means), hierarchical clustering, and probabilistic-D clustering. For each technique, it provides details on the clustering algorithm and steps. It concludes that fuzzy c-means is superior to other approaches for image segmentation efficiency but has high computational time, while probabilistic-D clustering aims to reduce this time.
Clustering is the step-by-step process of grouping objects so that the attribute values of all objects in a group are nearly the same. A cluster is therefore a collection of objects with nearly identical attribute values: an object in a cluster is similar to the other objects in the same cluster and different from the objects in other clusters. Clustering is used in a wide range of applications such as pattern recognition, image processing, data analysis, and machine learning. Nowadays more attention is being paid to categorical data than to numerical data, where the range of a numerical attribute is organized into classes such as small, medium, and high. A wide range of algorithms exists for clustering categorical data. Our approach enhances the well-known k-modes clustering algorithm to improve its accuracy. We propose a new approach named “High Accuracy Clustering Algorithm for Categorical Datasets”.
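The exact modifications behind the proposed high-accuracy variant are not spelled out in this summary; for orientation, here is a minimal sketch of the standard k-modes step it builds on, using simple-matching (Hamming-style) dissimilarity and per-attribute modes. Function names are illustrative.

```python
import numpy as np

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical objects differ."""
    return np.sum(a != b)

def kmodes_step(X, modes):
    """One assignment + mode-update step of standard k-modes.
    X is an (n, m) array of categorical values, modes is (k, m)."""
    # assign each object to the mode it differs from least
    labels = np.array([
        np.argmin([matching_dissimilarity(x, m) for m in modes]) for x in X
    ])
    # recompute each mode attribute-wise as the most frequent category
    new_modes = modes.copy()
    for k in range(len(modes)):
        members = X[labels == k]
        if len(members):
            for j in range(X.shape[1]):
                values, counts = np.unique(members[:, j], return_counts=True)
                new_modes[k, j] = values[np.argmax(counts)]
    return labels, new_modes
```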
Hierarchical clustering is a method of partitioning a set of data into meaningful sub-classes or clusters. It involves two approaches - agglomerative, which successively links pairs of items or clusters, and divisive, which starts with the whole set as a cluster and divides it into smaller partitions. Agglomerative Nesting (AGNES) is an agglomerative technique that merges clusters with the least dissimilarity at each step, eventually combining all clusters. Divisive Analysis (DIANA) is the inverse, starting with all data in one cluster and splitting it until each data point is its own cluster. Both approaches can be visualized using dendrograms to show the hierarchical merging or splitting of clusters.
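As a quick illustration of the agglomerative (AGNES-style) side of this, SciPy's linkage function supports single, complete, and average linkage; the sketch below builds the merge tree and cuts it into a chosen number of clusters (toy data, illustrative parameters):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)                         # toy data

Z = linkage(X, method="average")                  # "single" / "complete" also work
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
print(labels)

# scipy.cluster.hierarchy.dendrogram(Z) draws the merge tree; read bottom-up it
# shows the AGNES merges, read top-down it shows the splits a DIANA-style
# divisive analysis would display.
```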
This document provides an overview of clustering and k-means clustering algorithms. It begins by defining clustering as the process of grouping similar objects together and dissimilar objects separately. K-means clustering is introduced as an algorithm that partitions data points into k clusters by minimizing total intra-cluster variance, iteratively updating cluster means. The k-means algorithm and an example are described in detail. Weaknesses and applications are discussed. Finally, vector quantization and principal component analysis are briefly introduced.
K-Means, its Variants and its Applications (Varad Meru)
This presentation was given by our project group at the Lead College competition at Shivaji University. Our project got the 1st Prize. We focused mainly on Rough K-Means and built a Social-Network Recommender System based on Rough K-Means.
The Members of the Project group were -
Mansi Kulkarni,
Nikhil Ingole,
Prasad Mohite,
Varad Meru
Vishal Bhavsar.
Wonderful Experience !!!
Parametric Comparison of K-means and Adaptive K-means Clustering Performance ... (IJECEIAES)
This document compares the performance of K-means and adaptive K-means clustering algorithms on different images. It finds that adaptive K-means clustering more accurately detects tumor regions in MRI brain images and the area of a lake in a satellite image, compared to K-means clustering. This is evaluated by comparing the time taken, peak signal-to-noise ratio, and root mean square error between the original and segmented images. Adaptive K-means clustering does not require pre-specifying the number of clusters, which allows it to better segment images without user input.
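For context on how plain K-means is typically applied to image segmentation in comparisons like this one, the usual recipe is to treat each pixel as a feature vector and cluster the pixels; a hedged scikit-learn sketch (illustrative, not the adaptive variant from the paper, which also chooses the number of clusters automatically):

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_image(pixels, k):
    """pixels: (H, W, 3) array of RGB values. Returns an (H, W) label map.
    Plain K-means segmentation; k must be chosen by the user here."""
    h, w, c = pixels.shape
    flat = pixels.reshape(-1, c).astype(float)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(flat)
    return labels.reshape(h, w)
```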
This document provides an overview of clustering techniques. It defines clustering as grouping a set of similar objects into classes, with objects within a cluster being similar to each other and dissimilar to objects in other clusters. The document then discusses partitioning, hierarchical, and density-based clustering methods. It also covers mathematical elements of clustering like partitions, distances, and data types. The goal of clustering is to minimize a similarity function to create high similarity within clusters and low similarity between clusters.
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R... (ijscmc)
Face recognition is one of the most unobtrusive biometric techniques that can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed, with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is their high susceptibility to variations in the face owing to factors such as changes in pose, varying illumination, different expressions, the presence of outliers, and noise. This paper explores a novel technique for face recognition by classifying the face images with an unsupervised learning approach through K-Medoids clustering. The Partitioning Around Medoids (PAM) algorithm has been used to perform the K-Medoids clustering of the data. The results suggest increased robustness to noise and outliers in comparison with other clustering methods. The technique can therefore also be used to increase the overall robustness of a face recognition system, improving its invariance and making it a reliably usable biometric modality.
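As a rough sketch of what K-medoids does differently from K-means (centers are restricted to actual data points, which is part of what gives the robustness to outliers mentioned above), here is a simplified alternating version in NumPy. It is illustrative only; classic PAM additionally evaluates global swap moves.

```python
import numpy as np

def k_medoids(X, k, n_iter=100, rng=None):
    """Simplified k-medoids: assign points to the nearest medoid, then move
    each medoid to the member that minimises total distance to its cluster."""
    rng = np.random.default_rng(rng)
    medoid_idx = rng.choice(len(X), size=k, replace=False)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances

    for _ in range(n_iter):
        labels = D[:, medoid_idx].argmin(axis=1)
        new_idx = medoid_idx.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members):
                within = D[np.ix_(members, members)].sum(axis=1)
                new_idx[c] = members[within.argmin()]
        if np.array_equal(new_idx, medoid_idx):
            break
        medoid_idx = new_idx
    return labels, medoid_idx
```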
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING: FINDING ALL THE POTENTIAL MI... (IJDKP)
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics, accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter that can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential, as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because such expressions are normally impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is useful not only for limiting the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach “per block”. This technique decreases the number of particles by approximating some groups of particles with weighted particles. These findings are useful not only for the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics, and other applications.
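For readers unfamiliar with QC, the construction is: place a Gaussian of width σ on every data point, form the Parzen-window sum ψ, and look for minima of the associated quantum potential. A small NumPy sketch of one commonly quoted form of that potential (up to an additive constant, assuming the standard Schrödinger-based derivation; treat the formula as an approximation of the paper's setup, not a restatement of it):

```python
import numpy as np

def quantum_potential(x, data, sigma):
    """Evaluate psi(x) and the quantum potential V(x) (up to an additive
    constant) at a single query point x, given the dataset and width sigma.
    Cluster centres correspond to local minima of V."""
    sq_dist = np.sum((data - x) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    psi = weights.sum()
    V = np.sum(sq_dist * weights) / (2.0 * sigma ** 2 * psi)
    return psi, V

# Minima of V can then be located numerically, e.g. by evaluating V on a
# grid or by running gradient descent starting from each data point.
```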
Pattern recognition binoy k means clustering (108kaushik)
This document discusses clustering and the k-means clustering algorithm. It defines clustering as grouping a set of data objects into clusters so that objects within the same cluster are similar to each other but dissimilar to objects in other clusters. The k-means algorithm is described as an iterative process that assigns each object to one of k predefined clusters based on the object's distance from the cluster's centroid, then recalculates the centroid, repeating until cluster assignments no longer change. A worked example demonstrates how k-means partitions 7 objects into 2 clusters over 3 iterations. The k-means algorithm is noted to be efficient but requires specifying k and can be impacted by outliers, noise, and non-convex cluster shapes.
K-means Clustering Algorithm with Matlab Source code (gokulprasath06)
The K-means clustering algorithm partitions observations into K clusters by minimizing the distance between observations and cluster centroids. It works by randomly assigning observations to K clusters, calculating the distance between each observation and centroid, reassigning observations to their closest centroid, and repeating until cluster assignments are stable. Common distance measures used include Euclidean, squared Euclidean, and Manhattan distances. The algorithm aims to group similar observations together based on feature similarity to reduce the size of codebooks for applications like speech processing.
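The three distance measures named above can be written in a few lines; a small NumPy sketch:

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def squared_euclidean(a, b):
    return np.sum((a - b) ** 2)

def manhattan(a, b):
    return np.sum(np.abs(a - b))

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(a, b), squared_euclidean(a, b), manhattan(a, b))  # 5.0 25.0 7.0
```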
The document discusses the K-means clustering algorithm, an unsupervised learning technique that groups unlabeled data points into K clusters based on their similarities. It works by randomly initializing K cluster centroids and then iteratively assigning data points to their nearest centroid and recalculating the centroid positions based on the new assignments until convergence is reached. The document notes that K-means clustering requires specifying the number of clusters K in advance and presents the elbow method as a way to help determine an appropriate value of K for a given dataset.
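As an illustration of the elbow method mentioned here, one plots the within-cluster sum of squares (inertia) against k and looks for the bend; a hedged scikit-learn sketch on toy data:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(300, 2)          # toy data

inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Plot k against the inertias (e.g. with matplotlib) and pick the k at the
# "elbow", where the curve stops dropping sharply.
print(list(zip(range(1, 11), np.round(inertias, 2))))
```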
Enhance The K Means Algorithm On Spatial Dataset (AlaaZ)
The document describes an enhancement to the standard k-means clustering algorithm. The enhancement aims to improve computational speed by storing additional information from each iteration, such as the closest cluster and distance for each data point. This avoids needing to recompute distances to all cluster centers in subsequent iterations if a point does not change clusters. The complexity of the enhanced algorithm is reduced from O(nkl) to O(nk) where n is points, k is clusters, and l is iterations.
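To make the caching idea concrete, here is a hedged sketch of the assignment step: each point remembers its current cluster and its distance to that cluster's centre, and distances to all centres are recomputed only when the point has drifted away from its remembered centre. Illustrative only, not the paper's exact pseudocode.

```python
import numpy as np

def cached_assign(X, centers, labels, cached_dist):
    """One enhanced assignment pass. labels/cached_dist hold each point's
    cluster and its previously stored distance to that cluster's centre."""
    for i, x in enumerate(X):
        d_own = np.linalg.norm(x - centers[labels[i]])
        if d_own <= cached_dist[i]:
            # Still at least as close to its own centre: keep the assignment
            # and skip computing distances to the other k-1 centres.
            cached_dist[i] = d_own
        else:
            d_all = np.linalg.norm(centers - x, axis=1)
            labels[i] = int(d_all.argmin())
            cached_dist[i] = d_all[labels[i]]
    return labels, cached_dist
```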
An improvement in k mean clustering algorithm using better time and accuracy (ijpla)
This document summarizes a research paper that proposes an improved K-means clustering algorithm to enhance accuracy and reduce computation time. The standard K-means algorithm randomly selects initial cluster centroids, affecting results. The proposed algorithm systematically determines initial centroids based on data point distances. It assigns data to the closest initial centroid to generate initial clusters. Iteratively, it calculates new centroids and reassigns data only if distances decrease, reducing unnecessary computations. Experiments on various datasets show the proposed algorithm achieves higher accuracy faster than standard K-means.
This document provides an introduction to k-means clustering, including:
1. K-means clustering aims to partition n observations into k clusters by minimizing the within-cluster sum of squares, where each observation belongs to the cluster with the nearest mean.
2. The k-means algorithm initializes cluster centroids and assigns observations to the nearest centroid, recomputing centroids until convergence.
3. K-means clustering is commonly used for applications like machine learning, data mining, and image segmentation due to its efficiency, though it is sensitive to initialization and assumes spherical clusters.
This document is a seminar report on the K-Means clustering algorithm submitted by Gaurav Handa. It includes an introduction that discusses the importance of data mining and describes K-Means clustering. It also includes chapters that analyze and plan the implementation of K-Means, describe the algorithm and its flowchart, discuss limitations, and provide examples of implementing K-Means using graphs and Java code. The report was submitted in partial fulfillment of seminar requirements and includes acknowledgements and certificates.
The k-means clustering algorithm takes as input the number of clusters k and a set of data points, and assigns each data point to one of k clusters. It works by first randomly selecting k data points as initial cluster centroids. It then assigns each remaining point to the closest centroid and recalculates the centroid positions. This process repeats until the centroids are stable or a stopping criterion is reached. As an example, the document applies k-means to cluster 6 data points into 2 groups, showing the random selection of initial centroids, assignment of points, and recalculation of centroids over multiple steps.
K-means clustering is an unsupervised machine learning algorithm that groups unlabeled data points into a specified number of clusters (k) based on their similarity. It works by randomly assigning data points to k clusters and then iteratively updating cluster centroids and reassigning points until cluster membership stabilizes. K-means clustering aims to minimize intra-cluster variation while maximizing inter-cluster variation. There are various applications and variants of the basic k-means algorithm.
Introductory session for basic matlab commands and a brief overview of K-mean clustering algorithm with image processing example.
NOTE: you can find code of k-mean clustering algorithm for image processing in notes.
This document outlines clustering algorithms for large datasets. It discusses k-means clustering and extensions like k-means++ that improve initialization. It also covers spectral relaxation methods that reformulate k-means as a trace maximization problem to address local minima. Additionally, it proposes landmark-based clustering algorithms for biological sequences that select landmarks in one pass and assign sequences to the nearest landmark using hashing to search for neighbors. The document provides analysis of the time and space complexity of these algorithms as well as assumptions about separability and cluster size.
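For reference, the k-means++ initialization mentioned above chooses each new centre with probability proportional to the squared distance from the centres already chosen; a short NumPy sketch:

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding: first centre uniform at random, each subsequent
    centre drawn with probability proportional to D(x)^2, the squared
    distance to the nearest centre chosen so far."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min(
            [np.sum((X - c) ** 2, axis=1) for c in centers], axis=0
        )
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```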
The document discusses K-means clustering, an unsupervised machine learning algorithm that partitions observations into k clusters defined by centroids. It compares clustering to classification, noting clustering does not use training data and maps observations into natural groupings. The K-means algorithm is then explained, with the steps of initializing centroids, assigning observations to the closest centroid, revising centroids as cluster means, and repeating until convergence. Applications of clustering in business contexts like banking, retail, and insurance are also briefly mentioned.
k-Means is a rather simple but well-known algorithm for grouping objects, i.e. clustering. Again, all objects need to be represented as a set of numerical features. In addition, the user has to specify the number of groups (referred to as k) he wishes to identify. Each object can be thought of as being represented by a feature vector in an n-dimensional space, n being the number of features used to describe the objects to cluster. The algorithm then randomly chooses k points in that vector space; these points serve as the initial centers of the clusters. Afterwards, each object is assigned to the center it is closest to. Usually the distance measure is chosen by the user and determined by the learning task. After that, for each cluster a new center is computed by averaging the feature vectors of all objects assigned to it. The process of assigning objects and recomputing centers is repeated until the process converges. The algorithm can be proven to converge after a finite number of iterations. Several tweaks concerning the distance measure, the choice of initial centers, and the computation of new average centers have been explored, as well as the estimation of the number of clusters k, yet the main principle always remains the same. In this project we will discuss the K-means clustering algorithm, its implementation, and its application to the problem of unsupervised learning.
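A compact NumPy version of the loop just described (random initial centers, nearest-center assignment, mean update, repeat until the assignments stop changing). The distance measure and initialization are the standard textbook choices, not anything specific to this project.

```python
import numpy as np

def kmeans(X, k, max_iter=100, rng=None):
    """Plain k-means with squared-Euclidean distance.
    Returns the final labels and cluster centres."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)

    for _ in range(max_iter):
        # assignment step: nearest centre for every point
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                      # converged: assignments unchanged
        labels = new_labels
        # update step: each centre becomes the mean of its members
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels, centers
```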
K means Clustering - algorithm to cluster n objects (VoidVampire)
The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, where k < n.
It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data.
It assumes that the object attributes form a vector space.
K-means clustering is an algorithm that groups data points into k clusters based on their attributes and distances from initial cluster center points. It works by first randomly selecting k data points as initial centroids, then assigning all other points to the closest centroid and recalculating the centroids. This process repeats until the centroids are stable or a maximum number of iterations is reached. K-means clustering is widely used for machine learning applications like image segmentation and speech recognition due to its efficiency, but it is sensitive to initialization and assumes spherical clusters of similar size and density.
Image Segmentation Using Two Weighted Variable Fuzzy K Means (Editor IJCATR)
Image segmentation is the first step in image analysis and pattern recognition. Image segmentation is the process of dividing an image into different regions such that each region is homogeneous. An accurate and effective algorithm for segmenting images is very useful in many fields, especially in medical imaging. This paper presents a new approach to image segmentation by applying the k-means algorithm with two-level variable weighting. In image segmentation, clustering algorithms are very popular as they are intuitive and also easy to implement. The k-means and fuzzy k-means clustering algorithms are among the most widely used in the literature, and many authors compare their new proposals with the results achieved by k-means and fuzzy k-means. This paper proposes a new clustering algorithm called TW-fuzzy k-means, an automated two-level variable weighting clustering algorithm for segmenting objects. In this algorithm, a variable weight is also assigned to each variable on the current partition of the data. This can be applied to general images and/or specific images (i.e., medical and microscopic images). The proposed TW-fuzzy k-means algorithm provides better segmentation performance for various types of images. Based on the results obtained, the proposed algorithm gives better visual quality compared to several other clustering methods.
Balanced k-means revisited. Energy saving algorithms in sensor equipped gadgets (christopher corlett)
The document revisits the k-means clustering algorithm and proposes a new balanced k-means variant. The algorithm treats clustering as a two-objective optimization problem that aims to minimize variance within clusters while also minimizing differences in cluster sizes. It initially focuses on variance minimization but gradually increases weight on the balance constraint in each iteration. The resulting balance is determined by the termination point rather than a tunable parameter, using a specified balance criterion. Experimental results on synthetic datasets show the new approach achieves better balance than regularized k-means while maintaining competitive performance on other metrics.
Comparison Between Clustering Algorithms for Microarray Data Analysis (IOSR Journals)
Currently, there are two techniques used for large-scale gene-expression profiling: microarray and RNA-Sequencing (RNA-Seq). This paper is intended to study and compare different clustering algorithms that are used in microarray data analysis. A microarray is an array of DNA molecules which allows multiple hybridization experiments to be carried out simultaneously and traces the expression levels of thousands of genes. It is a high-throughput technology for gene expression analysis and has become an effective tool for biomedical research. Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein microarrays, which enable researchers to investigate the expression state of a large number of genes. Data clustering represents the first and main process in microarray data analysis. The k-means, fuzzy c-means, self-organizing map, and hierarchical clustering algorithms are under investigation in this paper. These algorithms are compared based on their clustering models.
This document discusses various clustering techniques used in data mining. It begins by defining clustering as an unsupervised learning technique that groups similar objects together. It then discusses advantages of clustering such as quality improvement and reuse opportunities. Several clustering methods are described such as K-means clustering, which aims to partition observations into k clusters where each observation belongs to the cluster with the nearest mean. The document concludes by discussing advantages of K-means clustering such as its linear time complexity and its use for spherical cluster shapes.
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data (IJMER)
Clustering mixed-type data is one of the major research topics in the area of data mining. In this paper, a new algorithm for clustering mixed-type data is proposed, where the concept of a distribution centroid is used to represent the prototype of the categorical variables in a cluster; it is then combined with the mean to represent the prototype of clusters with mixed-type variables. In the method, data are observed from different views and the variables are grouped into different views. Instances that can be viewed differently from different viewpoints can be defined as multiview data. In the usual case, the differences among views are ignored during the clustering process. Here, both view weights and variable weights are computed simultaneously. The view weight is used to determine the closeness or density of a view, and the variable weight is used to identify the significance of each variable. Both weights are used in the distance function to determine the cluster of each object. In the proposed method, the k-prototypes algorithm is enhanced so that it automatically computes both view and variable weights. The proposed MK-Prototypes algorithm is compared with two other clustering algorithms.
Unsupervised learning Algorithms and Assumptions (refedey275)
Topics:
Introduction to unsupervised learning
Unsupervised learning Algorithms and Assumptions
K-Means algorithm – introduction
Implementation of K-means algorithm
Hierarchical Clustering – need and importance of hierarchical clustering
Agglomerative Hierarchical Clustering
Working of dendrogram
Steps for implementation of AHC using Python
Gaussian Mixture Models – Introduction, importance and need of the model
Normal (Gaussian) distribution
Implementation of Gaussian mixture model
Understand the different distance metrics used in clustering
Euclidean, Manhattan, Cosine, Mahalanobis
Features of a Cluster – Labels, Centroids, Inertia, Eigenvectors and Eigenvalues
Principal component analysis
Supervised learning (classification)
Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
New data is classified based on the training set
Unsupervised learning (clustering)
The class labels of the training data are unknown
Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Types of Hierarchical Clustering
There are mainly two types of hierarchical clustering:
Agglomerative hierarchical clustering
Divisive Hierarchical clustering
A distribution in statistics is a function that shows the possible values for a variable and how often they occur.
In probability theory and statistics, the Normal Distribution, also called the Gaussian Distribution, is the most significant continuous probability distribution. It is sometimes also called a bell curve.
This document discusses k-means clustering, an unsupervised machine learning algorithm. It begins by distinguishing between supervised and unsupervised learning. It then defines clustering as classifying objects into groups where objects within each group share common traits. The document proceeds to describe hierarchical and partitional clustering algorithms. It focuses on k-means clustering, explaining how it works by iteratively assigning objects to centroids to minimize intra-cluster distances. Examples are provided to illustrate the k-means algorithm steps. Weaknesses and applications of k-means clustering are also summarized.
A survey on Efficient Enhanced K-Means Clustering Algorithm (ijsrd.com)
Data mining is the process of using technology to identify patterns and prospects from large amounts of information. In data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique which divides data into meaningful groups. K-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
In the recent machine learning community, there is a trend of constructing nonlinear versions of linear algorithms through the 'kernel method', for example kernel principal component analysis, kernel Fisher discriminant analysis, support vector machines (SVMs), and the recent kernel clustering algorithms. Typically, in unsupervised kernel clustering algorithms, a nonlinear mapping is first applied to map the data into a much higher-dimensional feature space, and clustering is then performed there. A drawback of these kernel clustering algorithms is that the cluster prototypes reside in the high-dimensional feature space and therefore lack intuitive and clear descriptions unless an additional approximate projection from the feature space back to the data space is used, as done in the literature. This paper utilizes the 'kernel method' to propose a novel clustering algorithm, founded on the conventional fuzzy c-means algorithm (FCM) and known as the kernel fuzzy c-means algorithm (KFCM). This method adopts a new kernel-induced metric in the data space to replace the original Euclidean norm, so the cluster prototypes still reside in the data space and the clustering results can be interpreted in the original space. This property is used for clustering incomplete data. Experiments on simulated data illustrate that KFCM has better clustering performance and is more robust than other variants of FCM for clustering incomplete data.
Clustering is an unsupervised machine learning technique used to group unlabeled data points. There are two main approaches: hierarchical clustering and partitioning clustering. Partitioning clustering algorithms like k-means and k-medoids attempt to partition data into k clusters by optimizing a criterion function. Hierarchical clustering creates nested clusters by merging or splitting clusters. Examples of hierarchical algorithms include agglomerative clustering, which builds clusters from bottom-up, and divisive clustering, which separates clusters from top-down. Clustering can group both numerical and categorical data.
International Journal of Engineering and Science Invention (IJESI), inventionjournals
This document discusses multidimensional clustering methods for data mining and their industrial applications. It begins with an introduction to clustering, including definitions and goals. Popular clustering algorithms are described, such as K-means, fuzzy C-means, hierarchical clustering, and mixture of Gaussians. Distance measures and their importance in clustering are covered. The K-means and fuzzy C-means algorithms are explained in detail. Examples are provided to illustrate fuzzy C-means clustering. Finally, applications of clustering algorithms in fields such as marketing, biology, and earth sciences are mentioned.
The International Journal of Engineering and Science (The IJES), theijes
This document summarizes a research paper that proposes a novel approach to improving the k-means clustering algorithm. The standard k-means algorithm is computationally expensive and produces results that depend heavily on the initial centroid selection. The proposed approach determines initial centroids systematically and uses a heuristic to efficiently assign data points to clusters. It improves both the accuracy and efficiency of k-means clustering by ensuring the entire process takes O(n^2) time without sacrificing cluster quality.
K-means clustering is an algorithm used to classify objects into k number of groups or clusters. It works by minimizing the sum of squares of distances between data points and assigned cluster centroids. The basic steps are to initialize k cluster centroids, assign each data point to the nearest centroid, recompute the centroids based on new assignments, and repeat until centroids don't change. Some examples of its applications include machine learning, data mining, speech recognition, image segmentation, and color quantization. However, it is sensitive to initialization and may get stuck in local optima.
Max stable set problem to find the initial centroids in clustering problem (nooriasukmaningtyas)
In this paper, we propose a new approach to document clustering using the K-Means algorithm. The latter is sensitive to the random selection of the k cluster centroids in the initialization phase. To improve the quality of K-Means clustering, we propose to model the text document clustering problem as the max stable set problem (MSSP) and use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of nodes completely disconnected in a graph, and in clustering, all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much more optimized in terms of time, and provides better clustering quality than other methods.
Machine Learning Algorithms for Image Classification of Hand Digits and Face ... (IRJET Journal)
This document discusses machine learning algorithms for image classification using five different classification schemes. It summarizes the mathematical models behind each classification algorithm, including Nearest Class Centroid classifier, Nearest Sub-Class Centroid classifier, k-Nearest Neighbor classifier, Perceptron trained using Backpropagation, and Perceptron trained using Mean Squared Error. It also describes two datasets used in the experiments - the MNIST dataset of handwritten digits and the ORL face recognition dataset. The performance of the five classification schemes are compared on these datasets.
The document discusses the concept of clustering, which is an unsupervised machine learning technique used to group unlabeled data points that are similar. It describes how clustering algorithms aim to identify natural groups within data based on some measure of similarity, without any labels provided. The key types of clustering are partition-based (like k-means), hierarchical, density-based, and model-based. Applications include marketing, earth science, insurance, and more. Quality measures for clustering include intra-cluster similarity and inter-cluster dissimilarity.
A New Framework for Kmeans Algorithm by Combining the Dispersions of Clusters
Consider a set of n objects X = {X1, X2, ..., Xn}, where each object Xi = {xi1, xi2, ..., xim} is described by a set of m features. U is the membership matrix, an n × k binary matrix, where uip = 1 indicates that object i belongs to cluster p and uip = 0 indicates that it does not. A set of k vectors Z = {Z1, Z2, ..., Zk} denotes the centroids of the k clusters. The basic Kmeans minimizes the objective function [10]:
P(U, Z) = \sum_{p=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ip} (x_{ij} - z_{pj})^2   (1)

subject to \sum_{p=1}^{k} u_{ip} = 1 and u_{ip} \in \{0, 1\}, for i = 1, ..., n.

U and Z are obtained by alternately minimizing this objective function.
The bisecting technique, a hierarchical divisive version of Kmeans, was given by Steinbach et al. [10]. In this technique, at each step the objects of one cluster are divided into two clusters, until K clusters are formed. The problem with this method is obtaining the exact value of K [10].
2) No Wkmeans algorithms with cluster separations between clusters:
Some validity indexes are used in the clustering method of [3], which considers both cluster compactness within clusters and cluster separation between clusters to obtain the exact number of clusters K. Wu et al. [12] give a fuzzy compactness and separation (FCS) algorithm for between-cluster separation, which uses the distances between the cluster centers and the global center. FCS performs well in the presence of noise and outliers, but its drawback is that all the features are treated equally in this process.
B. Vector Wkmeans algorithm
Treating all features as equally important in the clustering process is a primary problem of No Wkmeans type algorithms; this problem is solved by adding weights to the features [4].
1) Vector Wkmeans algorithm without cluster
separations between clusters:
The vector weighting algorithm is a Wkmeans in which feature weights are added to Kmeans. Its objective function is

P(U, Z, W) = \sum_{p=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ip} w_j^{\beta} (x_{ij} - z_{pj})^2   (2)

subject to \sum_{p=1}^{k} u_{ip} = 1, u_{ip} \in \{0, 1\}, \sum_{j=1}^{m} w_j = 1, 0 \le w_j \le 1,

where W = {w_1, ..., w_m} is a weighting vector for the features and β is a weight exponent.
A feature weighting method called SYNCLUS was introduced by De Sarbo et al. [4], in which several groups of features are created and a weight is then assigned to each group.
2) Vector Wkmeans algorithm with cluster
separations between clusters:
De Soete [5] proposed a technique to find the optimal weight for each feature by calculating the distances between all pairs of objects, but the disadvantage of this approach is that it requires a large computational cost and is not practical for large datasets.
C. Matrix Wkmeans algorithm
These techniques group objects into clusters using different subsets of features for different clusters. This involves two tasks: 1) identification of the subsets of features in which clusters can be found and 2) discovery of the clusters from the different subsets of features.
1) Matrix Wkmeans algorithm without cluster separations between clusters:
Aggarwal et al. [13] proposed the PROjected CLUStering (PROCLUS) algorithm, which is able to find a subset of features for each cluster. Using PROCLUS, a user needs to specify the average number of features per cluster.
AWA is a typical matrix weighting clustering algorithm, which can be formulated as

P(U, Z, W) = \sum_{p=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ip} w_{pj}^{\beta} (x_{ij} - z_{pj})^2   (3)

subject to \sum_{p=1}^{k} u_{ip} = 1, u_{ip} \in \{0, 1\}, \sum_{j=1}^{m} w_{pj} = 1, 0 \le w_{pj} \le 1,

where w_{pj} is the weight of feature j in cluster p.
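To illustrate how these weighting schemes change the distance used in the assignment step, the following is a small sketch (assuming NumPy; the function names and the choice of β are illustrative, not taken from the paper) of the per-object cost under vector weighting, eq. (2), and matrix weighting, eq. (3).

```python
import numpy as np

def weighted_sq_dist_vector(x, z_p, w, beta=2.0):
    # Vector Wkmeans (eq. 2): one weight per feature, shared by all clusters.
    return float(np.sum((w ** beta) * (x - z_p) ** 2))

def weighted_sq_dist_matrix(x, z_p, w_p, beta=2.0):
    # Matrix Wkmeans / AWA (eq. 3): a separate weight vector w_p for each cluster p.
    return float(np.sum((w_p ** beta) * (x - z_p) ** 2))
```

Each object is assigned to the cluster with the smallest weighted cost; the weights themselves are then re-estimated from the current partition.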
2) Matrix Wkmeans algorithm with cluster separations between clusters:
Friedman and Meulman [14] proposed the clustering objects on subsets of attributes algorithm for matrix weighting clustering, which involves calculating the distances between all pairs of objects at each iterative step. This results in a high computational complexity of O(tn^2 m), where n, m, and t are the number of objects, features, and iterations, respectively.
D. Extensions of Kmeans
1) Extended Kmeans (E-Kmeans)
Kmeans considers only the distance between the center of a cluster and its objects, that is, the cluster compactness within a cluster. In E-Kmeans the concept of a global centroid is introduced to obtain between-cluster separation. E-Kmeans tries to reduce the distance between the center of each cluster and its objects and, at the same time, to maximize the distance between the global centroid and the centroids of the clusters. As shown in Fig. 1, Z0 is the global centroid and Z1, Z2 are the centers of cluster 1 and cluster 2, respectively [1].
Fig. 1: Effect of between cluster separations
To integrate intracluster compactness and intercluster separation, the objective function (1) is modified into

P(U, Z) = \sum_{p=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ip} \left[ (x_{ij} - z_{pj})^2 - (z_{pj} - z_{0j})^2 \right]   (4)

subject to \sum_{p=1}^{k} u_{ip} = 1, u_{ip} \in \{0, 1\},

where z_{0j} is the jth feature of the global centroid z_0 of the data set, calculated as

z_{0j} = \frac{1}{n} \sum_{i=1}^{n} x_{ij}.   (5)
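As a concrete illustration (not the authors' code), the following sketch evaluates the E-Kmeans objective (4) for a given hard partition, assuming NumPy arrays and the reconstructed form of the equation above.

```python
import numpy as np

def e_kmeans_objective(X, labels, centers):
    """Within-cluster compactness minus separation of each centroid from the
    global centroid, summed over all assigned objects, as in eq. (4)."""
    z0 = X.mean(axis=0)  # global centroid, eq. (5)
    total = 0.0
    for p, z_p in enumerate(centers):
        members = X[labels == p]
        compactness = ((members - z_p) ** 2).sum()
        separation = len(members) * ((z_p - z0) ** 2).sum()  # counted once per member
        total += compactness - separation
    return total
```

Minimizing this quantity pulls objects toward their own centroid while pushing the cluster centroids away from the global centroid.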
Limitations:
E-Kmeans, like Kmeans, does not weight the features; that is, all the features are treated equally.
2) Extension of Wkmeans (E-Wkmeans)
All the features are considered equal in basic Kmeans and E-Kmeans, but features may have different importance. In the Wkmeans algorithm a weight is given to each feature by considering the within-cluster compactness. The E-Wkmeans algorithm considers the within-cluster compactness and the distances between the centers of all the clusters and the global center simultaneously while assigning weights to the features [1].
Let W = {w_1, w_2, ..., w_m} be the weights for the m features and β be a parameter for tuning the weight w_j; then (2) is extended into

P(U, Z, W) = \sum_{p=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ip} w_j^{\beta} \left[ (x_{ij} - z_{pj})^2 - (z_{pj} - z_{0j})^2 \right]   (6)

subject to \sum_{p=1}^{k} u_{ip} = 1, u_{ip} \in \{0, 1\}, \sum_{j=1}^{m} w_j = 1, 0 \le w_j \le 1.
Advantages:
E-Wkmeans weights the features with a vector, which means each feature has a weight representing the importance of that feature in the entire data set.
Limitations:
The same feature in different clusters has the same
weight.
3) Extension of AWA (E-AWA)
In Wkmeans and E-Wkmeans, the same feature is assigned a similar weight in every cluster. But in real-world applications, the same feature may have dissimilar weights in different clusters [1]. This problem is solved in E-AWA, which assigns a separate weight to each feature in each cluster while considering both cluster compactness within clusters and cluster separation between clusters.
Let W = {W_1, W_2, ..., W_k} be a weight matrix for the k clusters, where W_p = {w_{p1}, w_{p2}, ..., w_{pm}} denotes the feature weights in cluster p; then (3) is extended into

P(U, Z, W) = \sum_{p=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ip} w_{pj}^{\beta} \left[ (x_{ij} - z_{pj})^2 - (z_{pj} - z_{0j})^2 \right]   (7)

subject to \sum_{p=1}^{k} u_{ip} = 1, u_{ip} \in \{0, 1\}, \sum_{j=1}^{m} w_{pj} = 1, 0 \le w_{pj} \le 1.
Advantages:
1. This method considers both intracluster compactness and intercluster separation.
2. This method is robust because it does not introduce a new parameter to balance intracluster compactness and intercluster separation.
3. This method produces better clustering results than earlier methods because it utilizes more information than traditional Kmeans-type algorithms.
Limitations:
This method may produce errors when the centroid of a certain cluster is very close to the global centroid while the centroids of the clusters are not close to each other.
III. PROPOSED SYSTEM
The main task of the proposed work is the addition of cluster separation between clusters to the clustering process. Many of the existing Kmeans-type algorithms use only the within-cluster compactness. In contrast, the proposed work, as shown in Fig. 2, considers both the within-cluster compactness, that is, intracluster compactness, and the cluster separation between clusters.
Fig. 2: Architecture of proposed system
The proposed work gives a modified approach based on FCM (Fuzzy C-Means) which considers the cluster compactness within clusters and the cluster separation between clusters simultaneously.
IV. IMPLEMENTATION DETAILS
An s-dimensional data set is considered as X = {x_1, x_2, ..., x_n}. Let µ_1(x), ..., µ_c(x) be fuzzy c-partitions, where µ_{ij} = µ_i(x_j) represents the degree to which the data point x_j belongs to cluster i, and a_1, ..., a_c are the cluster centers. The objective function can be defined as

J_{FCM} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} \| x_j - a_i \|^2,

where the weighting factor m represents the degree of fuzziness.
The proposed system modifies the fuzzy compactness and separation algorithm based on the fuzzy scatter matrix [12]. It adds a penalized separation term to the objective above:

J_{FCS} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} \| x_j - a_i \|^2 - \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} \mu_{ij}^{m} \| a_i - \bar{x} \|^2,

where \bar{x} is the global mean of the data and the parameter 0 \le \eta_i < 1 controls the weight of the separation term; it is known that J_{FCS} = J_{FCM} when \eta_i = 0. The system uses the following update equations, which minimize the objective function:

\mu_{ij} = \frac{\left( \| x_j - a_i \|^2 - \eta_i \| a_i - \bar{x} \|^2 \right)^{-1/(m-1)}}{\sum_{l=1}^{c} \left( \| x_j - a_l \|^2 - \eta_l \| a_l - \bar{x} \|^2 \right)^{-1/(m-1)}}

and

a_i = \frac{\sum_{j=1}^{n} \mu_{ij}^{m} x_j - \eta_i \bar{x} \sum_{j=1}^{n} \mu_{ij}^{m}}{(1 - \eta_i) \sum_{j=1}^{n} \mu_{ij}^{m}}.
Proposed algorithm for Fuzzy Compactness and Separation
Input: Data set
Output: Clusters
1. Import the dataset.
2. Normalize the dataset.
3. For all data points in the dataset
4. Calculate the value of Jfcs
5. Compute the value of µij for xj
6. If µij has a good degree of membership for cluster ai
7. Add xj to cluster ai
8. Else
9. Check the other cluster centers
10. End for
11. Return clusters
Where,
Jfcs = objective function value using FCS
µij = degree to which object xj belongs to cluster ai
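The following is a minimal Python sketch of the iteration implied by the algorithm above, assuming NumPy, already-normalized data, a fixed number of clusters c, and a single value of η shared by all clusters; the function name and defaults are illustrative and this is not the authors' implementation.

```python
import numpy as np

def fcs_cluster(X, c, m=2.0, eta=0.1, n_iter=100, eps=1e-9):
    """Minimal FCS sketch: alternate membership and center updates for the
    penalized objective (compactness minus separation from the grand mean)."""
    n = len(X)
    x_bar = X.mean(axis=0)                      # global (grand) mean of the data
    rng = np.random.default_rng(0)
    centers = X[rng.choice(n, size=c, replace=False)].astype(float)
    u = np.full((n, c), 1.0 / c)
    for _ in range(n_iter):
        # effective squared distances: ||x_j - a_i||^2 - eta * ||a_i - x_bar||^2
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        sep = eta * ((centers - x_bar) ** 2).sum(axis=1)
        eff = np.maximum(d - sep[None, :], eps)
        # membership update: FCM-style closed form applied to the effective distances
        u = eff ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)
        # center update obtained by setting the derivative of the objective to zero
        um = u ** m
        centers = (um.T @ X - eta * np.outer(um.sum(axis=0), x_bar)) / \
                  ((1.0 - eta) * um.sum(axis=0))[:, None]
    labels = u.argmax(axis=1)                   # hard assignment for reporting clusters
    return labels, u, centers
```

Setting eta = 0 makes the sketch behave like standard fuzzy c-means, matching the relation J_FCS = J_FCM noted earlier.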
Mathematical Model
The mathematical model of the proposed system is given below.
Let S be a system such that
S = {s, e, I, O, Fmain, Fs, DD, NDD, ɸ}
Where,
S - Proposed system
s - Initial state, i.e., preprocess the dataset.
PD is a function to preprocess the dataset. Let D be the input dataset.
PreD = PD(D)
PreD = preprocessed data
D = input data
e - End state, i.e., clustering.
CS is the clustering function, which outputs the clustered results.
CD = CS(µij, ai, xj)
CD = clustered data
I - Input of the system, i.e., the input dataset.
O - Output of the system, i.e., the clustered data.
Fmain - Main function resulting in outcome O. In the given system it is the Fuzzy Compactness and Separation (FCS) algorithm.
OF is the objective function, which outputs the value of the objective function.
Jfcs = OF(µij, ai, xj)
Jfcs = objective function value
µij = degree to which data point xj belongs to cluster ai
ai = cluster i
xj = data point
Fs - Supportive function. In the given system it is Fuzzy C-Means (FCM).
DD - Deterministic data, i.e., the data to be clustered is deterministic.
NDD - Non-deterministic data, i.e., the number of clusters is non-deterministic.
V. RESULTS AND DISCUSSION
A. Description of Dataset
To evaluate the performance of the proposed algorithm, the real-life data set called Wisconsin Diagnostic Breast Cancer (WDBC) is used. The clustering results are improved by considering cluster separation between clusters. Ten real-valued features are computed: area, smoothness, radius, texture, perimeter, compactness, concavity, concave points, symmetry, and fractal dimension. The other attributes used are the ID number and the diagnosis.
Fig. 3: WDBC Dataset
B. Performance Metrics
The performance metric used for the proposed algorithm is accuracy. Accuracy gives the percentage of the objects that are correctly discovered in a result.
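Interpreted that way, accuracy can be computed as in the following small sketch (an illustrative definition, assuming the known diagnosis labels are available for comparison; the function name is hypothetical).

```python
def detection_accuracy(true_labels, cluster_labels):
    # Percentage of objects whose assigned cluster matches the known label.
    correct = sum(t == c for t, c in zip(true_labels, cluster_labels))
    return 100.0 * correct / len(true_labels)
```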
Experimental Results
Fig. 4: GUI of the proposed system
Comparative Analysis

WDBC Dataset                      Existing Kmeans   Proposed FCS
Malignant Centroid                1933.5765         1932.9265
Benign Centroid                   1009.7533         1009.3565
Global Centroid                   1471.6649         1471.1416
Detection Accuracy (Malignant)    92.85             96.42
Detection Accuracy (Benign)       86.20             92.59
Comparative Graph
The performance of the proposed system is analyzed in this section by comparing it with the existing system.
Fig. 5: Object count graph for the existing and proposed systems
The figure below shows the clustering accuracy for the proposed system.
Fig. 6: Accuracy graph for the proposed system
The figure below shows the scatter graph for the proposed system, showing the malignant cluster and the benign cluster.
Fig. 7: Scatter graph for the proposed system
VI. CONCLUSION
A new framework for Kmeans-type algorithms which includes both cluster compactness within clusters and cluster separation between clusters is proposed. Three extensions of Kmeans-type algorithms, E-Kmeans, E-Wkmeans, and E-AWA, which integrate both intracluster compactness and intercluster separation, are described. Kmeans-type algorithms are improved by integrating these techniques, and a new objective function based on Fuzzy Compactness and Separation is proposed. The extended algorithms are able to produce better clustering results in comparison to other algorithms. Therefore Fuzzy Compactness and Separation (FCS) delivers the best performance in comparison to the other algorithms.
REFERENCES
[1] X. Huang, Y. Ye, and H. Zhang, "Extensions of Kmeans-type algorithms: A new clustering framework by integrating intracluster compactness and intercluster separation," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 8, Aug. 2014.
[2] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. San Mateo, CA, USA: Morgan Kaufmann, 2011.
[3] S. Das, A. Abraham, and A. Konar, "Automatic clustering using an improved differential evolution algorithm," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 38, no. 1, pp. 218–237, Jan. 2008.
[4] W. De Sarbo, J. Carroll, L. Clark, and P. Green, "Synthesized clustering: A method for amalgamating alternative clustering bases with differential weighting of variables," Psychometrika, vol. 49, no. 1, pp. 57–78, 1984.
[5] G. De Soete, "Optimal variable weighting for ultrametric and additive tree clustering," Qual. Quantity, vol. 20, no. 2, pp. 169–180, 1986.
[6] M. Al-Razgan and C. Domeniconi, "Weighted clustering ensembles," in Proc. SIAM Int. Conf. Data Mining, 2006, pp. 258–269.
[7] X. Chen, Y. Ye, X. Xu, and J. Zhexue Huang, "A feature group weighting method for subspace clustering of high-dimensional data," Pattern Recognit., vol. 45, no. 1, pp. 434–446, 2012.
[8] A. Jain, "Data clustering: 50 years beyond k-means," Pattern Recognit. Lett., vol. 31, no. 8, pp. 651–666, 2010.
[9] R. Xu and D. Wunsch, "Survey of clustering algorithms," IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 645–678, May 2005.
[10] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proc. KDD Workshop Text Mining, vol. 400, 2000, pp. 525–526.
[11] D. Pelleg and A. Moore, "X-means: Extending k-means with efficient estimation of the number of clusters," in Proc. 17th Int. Conf. Mach. Learn., San Francisco, CA, USA, 2000, pp. 727–734.
[12] K. Wu, J. Yu, and M. Yang, "A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests," Pattern Recognit. Lett., vol. 26, no. 5, pp. 639–652, 2005.
[13] C. Aggarwal, J. Wolf, P. Yu, C. Procopiuc, and J. Park, "Fast algorithms for projected clustering," ACM SIGMOD Rec., vol. 28, no. 2, pp. 61–72, 1999.
[14] J. Friedman and J. Meulman, "Clustering objects on subsets of attributes," J. R. Statist. Soc., Ser. B, vol. 66, no. 4, pp. 815–849, 2004.