Types of clustering and different types of clustering algorithms

Published in:
Engineering


- 1. Types of clustering: Clustering can be divided into categories according to different criteria.
  • 1. Hard clustering: A given data point in n-dimensional space belongs to exactly one cluster. This is also known as exclusive clustering. The K-Means algorithm is an example of hard clustering.
  • 2. Soft clustering: A given data point can belong to more than one cluster. This is also known as overlapping clustering. The Fuzzy K-Means algorithm is a good example of soft clustering.
  • 3. Hierarchical clustering: A hierarchy of clusters is built using either the top-down (divisive) or the bottom-up (agglomerative) approach.
  • 4. Flat clustering: A simple technique in which no hierarchy is present.
  • 5. Model-based clustering: The data is modeled using a standard statistical model so that different distributions can be handled. The idea is to find the model that best fits the data.
- 2. Different clustering algorithms: Fuzzy K-Means
  • The K-Means algorithm performs hard clustering: each data point belongs to exactly one cluster. However, there are situations where one point belongs to more than one cluster. For example, a news article may belong to both the Technology and Current Affairs categories. In that case, we need a soft clustering mechanism.
  • The Fuzzy K-Means algorithm implements soft clustering and generates overlapping clusters. Each point has a probability of belonging to each cluster, based on its distance from each centroid.
  • In this example, we apply the Fuzzy K-Means algorithm to the dataset (22, 80), (25, 75), (28, 85), (55, 150), (50, 145), (53, 153), (38, 115). The slide's figure shows the outcome. Note that the newly added data point (someone of medium weight and height) belongs to cluster 3 with probability 0.52 and to cluster 1 with probability 0.47, whereas the other data points (people who are either large or small) each belong to one particular cluster with a probability of nearly 0.9.
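The membership idea above can be sketched in plain Python. This is a minimal illustration, not the implementation behind the slide's figure: it assumes the standard fuzzy c-means update with fuzziness `m = 2`, uses two clusters rather than the three implied by the slide, and seeds the centroids with two of the data points. The names `memberships_for` and `fuzzy_k_means` are hypothetical, and the resulting numbers will not match the figure.

```python
import math

def memberships_for(points, centroids, m):
    """Membership of every point in every cluster; each row sums to 1."""
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        if min(dists) == 0.0:
            # The point sits exactly on a centroid: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            # Closer centroids receive larger membership values.
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            rows.append([v / sum(inv) for v in inv])
    return rows

def fuzzy_k_means(points, centroids, m=2.0, iterations=50):
    """Alternate membership and centroid updates a fixed number of times."""
    dims = len(points[0])
    for _ in range(iterations):
        u = memberships_for(points, centroids, m)
        centroids = []
        for j in range(len(u[0])):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centroids.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)))
    return centroids, memberships_for(points, centroids, m)

# Dataset from the slide; the initial centroids are an assumption.
points = [(22, 80), (25, 75), (28, 85), (55, 150),
          (50, 145), (53, 153), (38, 115)]
centroids, memberships = fuzzy_k_means(points, [(22, 80), (55, 150)])

for p, row in zip(points, memberships):
    print(p, [round(v, 2) for v in row])
```

With two clusters, the point (38, 115) lies almost halfway between the two groups, so its membership is split between them, while the remaining points attach strongly to one cluster.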
- 3. Streaming K-Means
  • If the volume of data is too large to fit in the available main memory, the K-Means algorithm is not suitable, because its batch processing mechanism iterates over all the data points. The K-Means algorithm is also sensitive to noise and outliers in the data.
  • The Streaming K-Means algorithm addresses these problems by operating in two steps:
    • The streaming step
    • The ball K-Means step
  • The idea of the streaming step is to read data points sequentially, storing very few data points in memory.
  • After this first step, a more representative set of weighted data points is available for further processing.
  • The final K clusters are produced in the ball K-Means step. During this second step, potential outliers are eliminated.
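The two steps can be sketched as follows. This is a simplified, deterministic illustration under stated assumptions, not a production Streaming K-Means: the streaming step here opens a new sketch centroid whenever a point is farther than a fixed `cutoff` from every existing one, and the ball step updates each centroid using only sketch points within a fraction of the distance to the nearest other centroid. The function names, the `cutoff` and `ball_fraction` values, and the extra outlier point are all hypothetical.

```python
import math

def streaming_step(stream, cutoff):
    """Single pass: merge each point into a nearby weighted centroid or open a new one."""
    sketch = []  # list of (centroid, weight) pairs kept in memory
    for p in stream:
        if sketch:
            i = min(range(len(sketch)), key=lambda j: math.dist(p, sketch[j][0]))
            c, w = sketch[i]
            if math.dist(p, c) <= cutoff:
                # Fold the point into the running weighted mean.
                merged = tuple((w * ci + pi) / (w + 1) for ci, pi in zip(c, p))
                sketch[i] = (merged, w + 1)
                continue
        sketch.append((tuple(p), 1))
    return sketch

def ball_k_means_step(sketch, seeds, iterations=10, ball_fraction=0.4):
    """Weighted K-Means over the sketch; needs at least two seeds.

    Sketch points outside every ball (potential outliers) are ignored.
    """
    centroids = [tuple(s) for s in seeds]
    for _ in range(iterations):
        updated = []
        for j, c in enumerate(centroids):
            nearest_other = min(math.dist(c, o)
                                for i, o in enumerate(centroids) if i != j)
            radius = ball_fraction * nearest_other
            inside = [(p, w) for p, w in sketch if math.dist(p, c) <= radius]
            if not inside:
                updated.append(c)
                continue
            total = sum(w for _, w in inside)
            updated.append(tuple(
                sum(w * p[d] for p, w in inside) / total
                for d in range(len(c))))
        centroids = updated
    return centroids

# Slide 2 dataset plus one made-up outlier, read as a stream.
stream = [(22, 80), (25, 75), (28, 85), (55, 150),
          (50, 145), (53, 153), (38, 115), (120, 20)]
sketch = streaming_step(stream, cutoff=15)

# Assumption: seed the ball step with the two heaviest sketch centroids.
seeds = [c for c, _ in sorted(sketch, key=lambda cw: -cw[1])[:2]]
centroids = ball_k_means_step(sketch, seeds)
print(sketch)
print(centroids)
```

Only the weighted sketch is held in memory after the first pass, and the outlier falls outside both balls in the second step, so it does not pull either final centroid.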
- 4. Spectral clustering
  • The spectral clustering algorithm is helpful in hard, nonconvex clustering problems, where clusters cannot be separated by their distance to a centroid. It clusters points using the eigenvectors of matrices derived from the data.
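As a rough illustration of clustering with an eigenvector, the sketch below splits two concentric rings, a nonconvex case, into two groups. It is a simplified two-way variant under stated assumptions: a Gaussian affinity matrix with a hand-picked `sigma`, the unnormalized graph Laplacian L = D - W, and power iteration on a shifted Laplacian to approximate the eigenvector of the second-smallest eigenvalue. Full spectral clustering usually takes several eigenvectors and runs K-Means on them. The names `ring` and `fiedler_labels` are hypothetical.

```python
import math
import random

def ring(radius, count):
    """Evenly spaced points on a circle around the origin."""
    return [(radius * math.cos(2 * math.pi * k / count),
             radius * math.sin(2 * math.pi * k / count)) for k in range(count)]

def fiedler_labels(points, sigma=0.5, iterations=1000, seed=0):
    """Two-way split by the sign of an approximate second eigenvector of L = D - W."""
    n = len(points)
    # Gaussian affinity matrix W with a zero diagonal.
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]
    # 2 * max degree bounds the largest eigenvalue of L (Gershgorin),
    # so shift*I - L has only non-negative eigenvalues.
    shift = 2 * max(degree)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Remove the constant component (the eigenvector for eigenvalue 0 of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # Multiply by (shift*I - L) = (shift - d_i) * v_i + sum_j w_ij * v_j.
        v = [(shift - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Points sharing the sign of the first point get label 0, the rest label 1.
    return [0 if x * v[0] > 0 else 1 for x in v]

# Inner ring (6 points) surrounded by an outer ring (18 points).
points = ring(1.0, 6) + ring(3.0, 18)
labels = fiedler_labels(points)
print(labels)
```

Both rings share the same center, so a centroid-based method such as K-Means cannot separate them, whereas the sign pattern of the eigenvector follows the connectivity of the affinity graph.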
- 5. Dirichlet clustering
  • The K-Means and Fuzzy K-Means algorithms model clusters as spheres (hyperspheres in n-dimensional space). K-Means assumes a common, fixed variance and does not model the distribution of the data points.
  • K-Means and Fuzzy K-Means work effectively only when the data is normally distributed. If the distribution is different, for example an asymmetrical normal distribution (different standard deviations), the K-Means algorithm will not give good results.
  • Dirichlet clustering can effectively model different data distributions, including data points that are not normally distributed. It fits a model over a dataset and tunes the model's parameters until the model fits the data correctly. This approach is suitable for addressing the hierarchical-clustering problem.
