Types of clustering and different types of clustering algorithms
Types of clustering:
Clustering methods can be grouped into categories according to different criteria:
• 1. Hard clustering: A given data point in n-dimensional space belongs to exactly one cluster. This is also known as exclusive
clustering. The K-Means algorithm is an example of hard clustering.
• 2. Soft clustering: A given data point can belong to more than one cluster. This is also known as overlapping
clustering. The Fuzzy K-Means algorithm is a good example of soft clustering.
• 3. Hierarchical clustering: A hierarchy of clusters is built using either a top-down (divisive) or a bottom-up
(agglomerative) approach.
• 4. Flat clustering: A simple technique in which no hierarchy is present.
• 5. Model-based clustering: The data is modeled using a standard statistical model to work with different
distributions. The idea is to find a model that best fits the data.
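To make the idea of hard (exclusive) clustering concrete, here is a minimal K-Means sketch in plain Python. It reuses the seven two-column data points from the Fuzzy K-Means example in these notes. The choice of two clusters, the initial centroids, and the function names are assumptions made for illustration only.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, iterations=10):
    """Plain K-Means (Lloyd's algorithm) with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: every point gets exactly one label (hard clustering).
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


points = [(22, 80), (25, 75), (28, 85), (55, 150), (50, 145), (53, 153), (38, 115)]
# Assumption: start from the first "small" and the first "large" point.
centroids, labels = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)
```

Each entry of `labels` is a single cluster index, so even the in-between point (38, 115) is forced into exactly one cluster. A soft clustering method would instead report how strongly it belongs to each.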
Different clustering algorithms
• Fuzzy K-Means:
• The K-Means algorithm is for hard clustering. In hard clustering, one data point belongs only to one cluster. However,
there can be situations where one point belongs to more than one cluster. For example, a news article may belong to
both the Technology and Current Affairs categories. In that case, we need a soft clustering mechanism.
• The Fuzzy K-Means algorithm implements soft clustering. It generates overlapping clusters. Each point has a probability of
belonging to each cluster, based on the distance from each centroid.
• In this example, we apply the Fuzzy K-Means algorithm to the dataset (22, 80), (25, 75), (28, 85), (55, 150), (50, 145), (53, 153), (38, 115).
The outcome of the example is given in the following figure. Note that the newly added data point (someone
who has medium weight and height) belongs to cluster 3 with a probability of 0.52 and to cluster 1 with a probability of 0.47,
whereas the other data points (people who are either large or small) belong with a probability of nearly 0.9 to a single cluster.
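The membership computation can be sketched in a few lines of plain Python. This is a minimal illustration of the standard fuzzy c-means update rules, not the implementation that produced the figure. The number of clusters (two), the fuzziness parameter `m = 2`, the starting centroids, and all function names are assumptions, so the numbers will differ from the three-cluster result quoted above.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def memberships(points, centroids, m):
    """Membership degree of every point in every cluster; each row sums to 1."""
    rows = []
    for p in points:
        # A tiny constant avoids division by zero when a point sits on a centroid.
        d2 = [squared_distance(p, c) + 1e-12 for c in centroids]
        rows.append([
            1.0 / sum((d2[j] / d2[l]) ** (1.0 / (m - 1.0)) for l in range(len(centroids)))
            for j in range(len(centroids))
        ])
    return rows


def fuzzy_k_means(points, initial_centroids, m=2.0, iterations=50):
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        u = memberships(points, centroids, m)
        # Each centroid is the mean of all points, weighted by membership ** m.
        centroids = [
            tuple(
                sum(u[i][j] ** m * points[i][axis] for i in range(len(points)))
                / sum(u[i][j] ** m for i in range(len(points)))
                for axis in range(len(points[0]))
            )
            for j in range(len(centroids))
        ]
    return centroids, memberships(points, centroids, m)


points = [(22, 80), (25, 75), (28, 85), (55, 150), (50, 145), (53, 153), (38, 115)]
# Assumption: two clusters, started from one "small" and one "large" point.
centroids, u = fuzzy_k_means(points, initial_centroids=[points[0], points[3]])
for p, row in zip(points, u):
    print(p, [round(x, 2) for x in row])
```

With these assumptions the six small or large points end up with one dominant membership each, while the medium point (38, 115) is split roughly evenly between the two clusters, which is the same qualitative behaviour the example describes.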
• Streaming K-Means:
• If the volume of data is too large to fit in the available main memory, the K-Means algorithm is not suitable, as its
batch processing mechanism iterates over all the data points. The K-Means algorithm is also sensitive to noise and outliers.
• The Streaming K-Means algorithm addresses these problems by operating in two steps, as follows:
• The streaming step
• The ball K-Means step
• In the streaming step, data points are read sequentially, and only a few of them are kept in memory at any time.
• This first step produces a smaller, more representative set of weighted data points for further processing.
• The final K clusters are produced in the ball K-Means step, during which potential outliers are eliminated.
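The first (streaming) step can be illustrated with a deliberately simplified, deterministic sketch. It reads points one at a time and keeps only a short list of weighted centroids in memory. The fixed `cutoff` distance, the merge rule, and the names used here are assumptions for illustration; production implementations typically adapt the cutoff and use randomization, and the second (ball K-Means) step is not shown.

```python
import math


def streaming_step(stream, cutoff):
    """One pass over the stream; returns a small sketch of weighted centroids."""
    sketch = []  # each entry: {"centroid": (...), "weight": number of merged points}
    for p in stream:
        nearest = min(sketch, key=lambda e: math.dist(e["centroid"], p), default=None)
        if nearest is None or math.dist(nearest["centroid"], p) > cutoff:
            # Too far from everything seen so far: start a new weighted centroid.
            sketch.append({"centroid": tuple(p), "weight": 1})
        else:
            # Close enough: fold the point into the nearest centroid (running mean).
            w = nearest["weight"]
            nearest["centroid"] = tuple(
                (c * w + x) / (w + 1) for c, x in zip(nearest["centroid"], p)
            )
            nearest["weight"] = w + 1
    return sketch


points = [(22, 80), (25, 75), (28, 85), (55, 150), (50, 145), (53, 153), (38, 115)]
# iter(...) stands in for a data source that can only be read once, in order.
sketch = streaming_step(iter(points), cutoff=15.0)
for entry in sketch:
    print(entry)
```

The resulting weighted centroids are what a second step would cluster into the final K groups. Because each one carries a weight, a centroid that absorbed only a single distant point can be recognised as a potential outlier at that stage.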
• Spectral clustering:
• The spectral clustering algorithm is helpful in hard, nonconvex clustering problems. It clusters points using
the eigenvectors of matrices derived from the data.
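A small sketch can show how an eigenvector separates a nonconvex shape. The example below builds two concentric rings, connects points that lie closer than `eps`, and splits the data by the sign of the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian, found here by shifted power iteration. The ring data, the `eps` value, the neighborhood-graph construction, and the function names are all assumptions chosen to keep the example dependency-free; it handles only a two-way split.

```python
import math


def ring(radius, count):
    """Evenly spaced points on a circle centered at the origin."""
    return [(radius * math.cos(2 * math.pi * i / count),
             radius * math.sin(2 * math.pi * i / count)) for i in range(count)]


def spectral_bipartition(points, eps, iterations=500):
    n = len(points)
    # Adjacency of the eps-neighborhood graph: 1 if two points are closer than eps.
    adj = [[1.0 if i != j and math.dist(points[i], points[j]) < eps else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    shift = 2.0 * max(degree)  # upper bound on the Laplacian's largest eigenvalue
    v = [float(i) for i in range(n)]  # arbitrary deterministic starting vector
    for _ in range(iterations):
        # Multiply by (shift * I - L), where L = D - A is the graph Laplacian.
        v = [(shift - degree[i]) * v[i] + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        # Remove the constant eigenvector, then renormalize.
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # The sign of each entry decides the cluster.
    return [0 if x < 0 else 1 for x in v]


points = ring(1.0, 6) + ring(3.0, 18)  # inner ring first, then outer ring
labels = spectral_bipartition(points, eps=1.2)
print(labels)
```

Both rings share the same center, so a centroid-based method such as K-Means cannot separate them, whereas the eigenvector assigns one sign to the inner ring and the opposite sign to the outer ring.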
• Dirichlet clustering:
• The Fuzzy K-Means and K-Means algorithms model clusters as spheres (the n-dimensional generalization of circles). K-Means assumes a
common fixed variance and does not model the distribution of the data points.
• The K-Means and Fuzzy K-Means algorithms work best when the data is normally distributed. If the data
distribution is different, for example, asymmetrical normal distributions (different standard deviations), the K-Means
algorithm will not give good results.
• Dirichlet clustering can effectively model different data distributions (data points that are not normally distributed).
It fits a model over a dataset and tunes the model's parameters until the model fits the data correctly.
This approach is suitable for addressing the hierarchical-clustering problem.
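The effect of assuming a common fixed variance can be shown with a tiny one-dimensional calculation. The sketch below is not Dirichlet clustering; it only contrasts a nearest-centroid assignment with an assignment based on a model that knows each cluster's standard deviation. The two clusters, their parameters, and the test point are hypothetical values chosen for illustration, and equal cluster weights are assumed.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


# Hypothetical clusters as (mean, standard deviation): one narrow, one wide.
clusters = [(0.0, 1.0), (10.0, 5.0)]
x = 4.0

# K-Means style: pick the cluster whose centroid (mean) is closest.
nearest = min(range(len(clusters)), key=lambda j: abs(x - clusters[j][0]))

# Model-based style: pick the cluster under which the point is most likely.
most_likely = max(range(len(clusters)), key=lambda j: normal_pdf(x, *clusters[j]))

print("nearest centroid:", nearest)
print("most likely under the model:", most_likely)
```

The point lies closer to the narrow cluster's mean, so a distance-based rule assigns it there, yet it sits four standard deviations away from that mean and only 1.2 standard deviations from the wide cluster's mean. A method that models each cluster's spread therefore assigns it to the wide cluster.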