Types of clustering and different types of clustering algorithms

Published in: Engineering

  1. Types of clustering: Clustering can be divided into categories according to different criteria.
       • Hard clustering: A given data point in n-dimensional space belongs to exactly one cluster. This is also known as exclusive clustering. The K-Means algorithm is an example of hard clustering.
       • Soft clustering: A given data point can belong to more than one cluster. This is also known as overlapping clustering. The Fuzzy K-Means algorithm is a good example of soft clustering.
       • Hierarchical clustering: A hierarchy of clusters is built using either a top-down (divisive) or a bottom-up (agglomerative) approach.
       • Flat clustering: A simple technique in which no hierarchy is present.
       • Model-based clustering: The data is modeled using a standard statistical model so that different distributions can be handled. The idea is to find the model that best fits the data.
  2. Different clustering algorithms: Fuzzy K-Means
       • The K-Means algorithm performs hard clustering, in which each data point belongs to exactly one cluster. However, there are situations where one point belongs to more than one cluster. For example, a news article may belong to both the Technology and Current Affairs categories. In that case, we need a soft clustering mechanism.
       • The Fuzzy K-Means algorithm implements soft clustering. It generates overlapping clusters. Each point has a probability of belonging to each cluster, based on its distance from each centroid.
       • In this example, we apply the Fuzzy K-Means algorithm to a dataset of people's weight and height: (22, 80), (25, 75), (28, 85), (55, 150), (50, 145), (53, 153), (38, 115). The outcome of the example is given in the following figure. Note that the newly added data point (someone of medium weight and height) belongs to cluster 3 with probability 0.52 and to cluster 1 with probability 0.47, whereas the other data points (people who are either large or small) belong to one particular cluster with a probability of nearly 0.9.
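The distance-based membership described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation behind the slide's figure: the fuzziness exponent `m = 2`, the choice of two clusters, the starting centroids, and the fixed iteration count are all assumptions, so the memberships it prints are not expected to match the 0.52 and 0.47 quoted above.

```python
import math

def memberships(point, centroids, m=2.0):
    """Fuzzy membership of one point in every cluster (fuzziness m > 1)."""
    dists = [math.dist(point, c) for c in centroids]
    # A point sitting exactly on a centroid belongs fully to that cluster.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # Closer centroids get larger memberships; each row sums to 1.
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]

def fuzzy_k_means(points, centroids, m=2.0, iterations=50):
    """Alternate membership updates and membership-weighted centroid updates."""
    dims = len(points[0])
    for _ in range(iterations):
        u = [memberships(p, centroids, m) for p in points]
        new_centroids = []
        for j in range(len(centroids)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centroids.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ))
        centroids = new_centroids
    return centroids, [memberships(p, centroids, m) for p in points]

# Data points from the slide; the starting centroids are an assumption.
points = [(22, 80), (25, 75), (28, 85), (55, 150), (50, 145), (53, 153), (38, 115)]
initial = [(22, 80), (55, 150)]
centroids, u = fuzzy_k_means(points, initial)
for p, row in zip(points, u):
    print(p, [round(x, 2) for x in row])
```

With these assumed settings the small and large points end up almost entirely in one cluster each, while the medium point (38, 115) is shared between the two, which is the qualitative behaviour the slide describes.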
  3. Streaming K-Means
       • If the volume of data is too large to fit in the available main memory, the K-Means algorithm is not suitable, because its batch processing mechanism iterates over all the data points. The K-Means algorithm is also sensitive to noise and outliers in the data.
       • The Streaming K-Means algorithm addresses these problems by operating in two steps:
           • The streaming step
           • The ball K-Means step
       • The idea is to read the data points sequentially while storing very few of them in memory.
       • After the first step, a smaller, more representative set of weighted data points is available for further processing.
       • The final K clusters are produced in the ball K-Means step. During this second step, potential outliers are eliminated.
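The two steps can be sketched as follows. This is a heavily simplified illustration under stated assumptions, not a reference implementation: the streaming step here opens a new centroid whenever a point is farther than a fixed `cutoff` from every existing one (a deterministic stand-in for the randomized rule a production implementation would typically use), and the ball step updates each centroid using only sketch points inside a ball whose radius is a fraction `trim` of the distance to the closest other centroid. The names `cutoff`, `trim`, `streaming_step`, `ball_step`, and the outlier (100, 10) are all hypothetical.

```python
import math

def streaming_step(stream, cutoff):
    """One pass over the stream, keeping only weighted centroids in memory."""
    sketch = []  # entries are [centroid, weight]
    for p in stream:
        if sketch:
            nearest = min(sketch, key=lambda cw: math.dist(p, cw[0]))
            if math.dist(p, nearest[0]) <= cutoff:
                c, w = nearest
                # Fold the point into the centroid as a running weighted mean.
                nearest[0] = tuple((ci * w + pi) / (w + 1) for ci, pi in zip(c, p))
                nearest[1] = w + 1
                continue
        # Too far from everything seen so far: start a new weighted centroid.
        sketch.append([tuple(p), 1])
    return sketch

def ball_step(sketch, seeds, trim=0.5, iterations=10):
    """Weighted centroid updates that ignore sketch points outside each ball."""
    centroids = list(seeds)  # needs at least two seeds
    for _ in range(iterations):
        updated = []
        for j, c in enumerate(centroids):
            # Ball radius: a fraction of the distance to the closest other centroid.
            radius = trim * min(math.dist(c, o)
                                for i, o in enumerate(centroids) if i != j)
            inside = [(p, w) for p, w in sketch if math.dist(p, c) <= radius]
            if not inside:
                updated.append(c)
                continue
            total = sum(w for _, w in inside)
            updated.append(tuple(
                sum(p[d] * w for p, w in inside) / total for d in range(len(c))
            ))
        centroids = updated
    return centroids

# Points from the Fuzzy K-Means slide plus one made-up outlier.
stream = [(22, 80), (25, 75), (28, 85), (55, 150), (50, 145), (53, 153),
          (38, 115), (100, 10)]
sketch = streaming_step(stream, cutoff=15)
# Seed the second step with the two heaviest sketch centroids (an assumption).
seeds = [c for c, w in sorted(sketch, key=lambda cw: -cw[1])[:2]]
final = ball_step(sketch, seeds, trim=0.4)
print(sketch)
print(final)
```

The streaming pass reduces the eight input points to four weighted centroids, and the ball step then ignores the low-weight, far-away entries when it refines the two final centroids.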
  4. Spectral clustering
       • The spectral clustering algorithm is helpful in hard, nonconvex clustering problems. It clusters points using the eigenvectors of matrices derived from the data.
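To make "eigenvectors of matrices derived from the data" concrete, here is a minimal sketch that splits points into two groups. The specific choices are assumptions, not something the slide prescribes: a Gaussian affinity with bandwidth `sigma`, the unnormalized graph Laplacian, and a shifted power iteration to approximate the eigenvector of the second-smallest eigenvalue (a numerical library's eigensolver would normally be used instead). The toy data is convex, so it only illustrates the mechanism, not the nonconvex case.

```python
import math

def spectral_bipartition(points, sigma=10.0, iterations=200):
    """Split points into two groups using one eigenvector of the graph Laplacian."""
    n = len(points)
    # Affinity matrix: Gaussian similarity between every pair of distinct points.
    w = [[math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) if i != j else 0.0
          for j, q in enumerate(points)] for i, p in enumerate(points)]
    degree = [sum(row) for row in w]
    # Unnormalized graph Laplacian L = D - W.
    lap = [[(degree[i] if i == j else 0.0) - w[i][j] for j in range(n)]
           for i in range(n)]
    # Power iteration on (shift * I - L) with the constant vector projected out
    # approximates the eigenvector of L's second-smallest eigenvalue.
    shift = 2.0 * max(degree)
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]
        v = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # The sign of each entry decides which side of the split the point falls on.
    return [0 if x < 0 else 1 for x in v]

# The six clearly small or large points from the Fuzzy K-Means slide.
points = [(22, 80), (25, 75), (28, 85), (55, 150), (50, 145), (53, 153)]
labels = spectral_bipartition(points)
print(labels)
```

The sketch handles only a two-way split; for more clusters, a common approach is to take several eigenvectors and run K-Means on the rows of the resulting matrix.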
  5. Dirichlet clustering
       • The Fuzzy K-Means and K-Means algorithms model clusters as spheres (circles generalized to n-dimensional space). K-Means assumes a common fixed variance. Further, K-Means does not model the distribution of the data points.
       • The K-Means and Fuzzy K-Means algorithms work effectively only when the data is normally distributed. If the data distribution is different, for example an asymmetrical normal distribution (different standard deviations), the K-Means algorithm will not give good results.
       • Dirichlet clustering can model different data distributions (data points that are not normally distributed) effectively. It fits a model over the dataset and tunes the model's parameters so that the model correctly fits the data. This approach is suitable for addressing the hierarchical-clustering problem.
