This document discusses different methods of cluster analysis. Cluster analysis is a statistical technique that groups similar objects together into clusters. There are several categories of clustering methods, including partitioning, hierarchical, density-based, grid-based, model-based, and constraint-based. The partitioning method divides data into a set number of partitions or clusters, while hierarchical methods create hierarchical groupings by either merging or dividing clusters. Density-based clustering focuses on grouping areas of high density, and grid-based clustering quantizes space into a grid for faster processing. Model-based clustering fits data to hypothesized models, and constraint-based clustering incorporates user-defined constraints.
2. Cluster (noun): a group of similar things that are close together, sometimes
surrounding something.
Cluster (verb): to form a group, sometimes by surrounding something, or to make
something do this.
3. Cluster analysis is a statistical classification technique in which objects or
points with similar characteristics are grouped together in clusters. It
encompasses a number of different algorithms and methods, all of which are used for
grouping objects of similar kinds into their respective categories. The aim of cluster
analysis is to organize observed data into meaningful structures in order to gain
further insight from the data.
4. Cluster analysis originated in anthropology with Driver and Kroeber in 1932,
was introduced to psychology by Joseph Zubin in 1938 and Robert Tryon in 1939,
and was famously used by Cattell beginning in 1943 for trait theory classification in
personality psychology.
5. The clustering methods can be classified into the following categories:
Partitioning Method
Hierarchical Method
Density-Based Method
Grid-Based Method
Model-Based Method
Constraint-Based Method
6. The partitioning method divides the data into partitions in
order to form clusters. If n partitions are made on the p
objects of the database, then each partition represents a
cluster and n ≤ p. The two conditions that a
partitioning must satisfy are:
• Each object must belong to exactly one
group.
• Each group must contain at least one
object.
The partitioning method uses a technique
called iterative relocation, in which objects are
moved from one group to another to improve the
partitioning.
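Iterative relocation can be sketched in a few lines of Python. The version below is a k-means style illustration, which is one common way to do relocation and is not named in the slides. The function names `partition` and `sq_dist`, the mean-based centres, and the fixed random seed are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition(points, n_groups, max_iter=100, seed=0):
    """Split `points` into `n_groups` groups by iterative relocation (k-means style)."""
    if not 1 <= n_groups <= len(points):
        raise ValueError("need 1 <= n_groups <= number of objects")
    rng = random.Random(seed)
    # Initial centres: objects drawn without replacement (an assumed choice).
    centres = rng.sample(points, n_groups)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Relocation: each object moves to the group with the nearest centre,
        # so every object belongs to exactly one group.
        new_labels = [
            min(range(n_groups), key=lambda g: sq_dist(p, centres[g]))
            for p in points
        ]
        if new_labels == labels:  # nothing moved: the partitioning is stable
            break
        labels = new_labels
        # Recompute each centre as the mean of its current members.
        for g in range(n_groups):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # an emptied group keeps its old centre
                centres[g] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = partition(pts, 2)
print(labels)
```

The sketch guarantees that each object gets exactly one label, but it does not by itself enforce the "no empty group" condition; a production implementation would need to re-seed a group that loses all its members.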
7. In the hierarchical method, a hierarchical decomposition of the given set of data
objects is created. Hierarchical methods are classified according to how the
hierarchical decomposition is formed. There are two approaches to creating the
decomposition:
Agglomerative Approach: The agglomerative approach is also known as the bottom-up
approach. Initially, each object forms its own separate group.
The method then keeps merging the objects or groups that are close to one another,
that is, those that exhibit similar properties. This merging process continues until
the termination condition holds.
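The bottom-up merging process can be illustrated with a short Python sketch. It assumes single-linkage distance (the closest pair of members decides how close two groups are) and uses "stop at a target number of groups" as the termination condition. Both are choices made for the example, and `agglomerative` is a hypothetical name.

```python
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative(points, target_groups):
    """Merge the two closest groups repeatedly until `target_groups` remain."""
    if target_groups < 1:
        raise ValueError("target_groups must be at least 1")
    # Initially, each object forms its own separate group.
    groups = [[p] for p in points]
    while len(groups) > target_groups:
        best = None  # (distance, index_i, index_j) of the closest pair of groups
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Single linkage: distance between the closest members.
                d = min(dist(a, b) for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge group j into group i; the merge is never undone later.
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(pts, 3))
```

The nested search over all pairs of groups keeps the sketch readable; it is far slower than the linkage-matrix updates that library implementations use.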
8. Divisive Approach: The divisive approach is also known as the top-down approach. In
this approach, all the data objects start in the same cluster. In each iteration, a
cluster is split into smaller clusters. The
iteration continues until the termination condition is met or until each cluster
contains one object.
Hierarchical methods are rigid: once a merge or split has been performed, it can never
be undone. Two approaches can be used to improve the
quality of hierarchical clustering in data mining:
• Carefully analyse the object linkages at each hierarchical partitioning.
• Integrate hierarchical agglomeration with other approaches: first use a hierarchical
agglomerative algorithm to group the objects into micro-clusters, then perform
macro-clustering on the micro-clusters.
9. The density-based method mainly focuses on density. In this method, a given
cluster keeps growing as long as the density in the
neighbourhood exceeds some threshold, i.e., for each data point within a given
cluster, the neighbourhood of a given radius has to contain at least a minimum
number of points.
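The growth rule can be sketched in Python in the style of DBSCAN, a well-known density-based algorithm that the slides do not name. The parameter names `eps` (the radius) and `min_pts` (the minimum number of points), and the convention that a point counts as its own neighbour, are assumptions of this example.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no dense region


def density_cluster(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE, using a DBSCAN-style rule."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within radius eps of point i (including i itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # point i is dense enough to start a new cluster
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # dense neighbourhood: keep growing
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(density_cluster(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (50, 50) has no dense neighbourhood, so it is reported as noise instead of being forced into a cluster.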
10. In the grid-based method, the object
space is quantized into a finite number of cells that form a grid structure. The
major advantage of the grid-based method is its fast processing time, which is
typically independent of the number of data objects and
depends only on the number of cells in each dimension of the quantized space.
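A minimal Python sketch of the idea follows. It assumes square cells of side `cell_size`, treats a cell as dense when it holds at least `min_count` points, and joins dense cells that share a face. These rules and the function names are illustrative assumptions, not something the slides specify.

```python
from collections import Counter
from math import floor


def quantize(points, cell_size):
    """Map every point to its grid cell and count the points per cell."""
    return Counter(tuple(floor(c / cell_size) for c in p) for p in points)


def grid_clusters(points, cell_size, min_count):
    """Group adjacent dense cells into clusters (each cluster is a set of cells)."""
    counts = quantize(points, cell_size)
    dense = {cell for cell, n in counts.items() if n >= min_count}
    clusters, seen = [], set()
    # From here on only cells are visited, never the individual points.
    for cell in sorted(dense):
        if cell in seen:
            continue
        seen.add(cell)
        stack, group = [cell], set()
        while stack:
            cur = stack.pop()
            group.add(cur)
            for axis in range(len(cur)):
                for step in (-1, 1):
                    # Neighbouring cell that shares a face along this axis.
                    nb = cur[:axis] + (cur[axis] + step,) + cur[axis + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(group)
    return clusters


pts = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.5, 0.5),
       (5.1, 5.1), (5.2, 5.3), (9.0, 9.0)]
print(grid_clusters(pts, cell_size=1.0, min_count=2))
```

After the single quantization pass, the clustering loop works on cell counts alone, which is where the speed advantage comes from when there are many more objects than cells.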
11. In the model-based method, a model is hypothesized for each cluster, and the
method finds the best fit of the data to that model. Clusters may be located by
constructing a density function that reflects the spatial distribution of the
data points. This also provides a way to automatically determine the number of
clusters based on standard statistics, taking outliers or noise into account,
and therefore yields robust clustering methods.
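As one possible illustration, the sketch below fits a one-dimensional Gaussian mixture with the expectation-maximization (EM) algorithm. The Gaussian model, the EM fitting procedure, the fixed number of components `k`, and the name `fit_mixture_1d` are all assumptions for the example; the slides do not prescribe a particular model or a way of choosing the number of clusters.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_mixture_1d(data, k, iterations=50, min_var=1e-6):
    """Fit k Gaussian components to 1-D data with EM; returns (weights, means, variances)."""
    xs = sorted(data)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of observations")
    # Deterministic start: means spread evenly through the sorted data.
    means = [xs[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: how strongly does each component claim each observation?
        resp = []
        for x in xs:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted observations.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            if nc > 0:
                weights[c] = nc / n
                means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
                var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / nc
                variances[c] = max(var, min_var)  # avoid a collapsed component
    return weights, means, variances


data = [0.0, 0.5, 1.0, -0.5, -1.0, 10.0, 10.5, 11.0, 9.5, 9.0]
weights, means, variances = fit_mixture_1d(data, k=2)
print(sorted(means))
```

Each observation can then be assigned to the component with the highest responsibility, which turns the fitted model into a clustering.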
12. The constraint-based clustering method incorporates
application- or user-oriented constraints. A constraint expresses the user's
expectations or the properties of the desired clustering results. Constraints
provide an interactive way of communicating with the clustering process,
and can be specified by the user or by the application requirements.
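One common way to express such constraints is as pairs of objects that must, or must not, end up in the same cluster. The sketch below adds these "must-link" and "cannot-link" pairs to a simple bottom-up merging procedure. The constraint format, the single-linkage distance, and the name `constrained_clustering` are assumptions for illustration, and the sketch does not check for contradictory constraints.

```python
from math import dist  # Euclidean distance, Python 3.8+


def constrained_clustering(points, target_groups, must_link=(), cannot_link=()):
    """Bottom-up merging over point indices that honours pairwise constraints."""
    groups = [{i} for i in range(len(points))]

    def find(i):
        return next(g for g in groups if i in g)

    # Must-link pairs are merged before any distance-based merging.
    for a, b in must_link:
        ga, gb = find(a), find(b)
        if ga is not gb:
            ga |= gb
            groups.remove(gb)

    def allowed(g1, g2):
        # A merge is forbidden if it would join a cannot-link pair.
        return not any((a in g1 and b in g2) or (a in g2 and b in g1)
                       for a, b in cannot_link)

    while len(groups) > target_groups:
        best = None  # (distance, index_i, index_j) of the closest allowed pair
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if not allowed(groups[i], groups[j]):
                    continue
                d = min(dist(points[a], points[b])
                        for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best is None:
            break  # no remaining merge satisfies the constraints
        _, i, j = best
        groups[i] |= groups[j]
        del groups[j]
    return groups


pts = [(0, 0), (0, 1), (0, 2), (10, 0)]
print(constrained_clustering(pts, 2))                         # [{0, 1, 2}, {3}]
print(constrained_clustering(pts, 2, cannot_link=[(0, 1)]))   # [{0, 3}, {1, 2}]
```

In the second call, objects 0 and 1 are neighbours but the user has forbidden grouping them, so the result differs from what distance alone would produce.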