Part 4 Clustering
Dr. Jyoti Yadav
Introduction to Clustering
• Clustering is similar to classification, but the basis is different: there are no predefined labels.
• In clustering you don’t know in advance what you are looking for; instead you try to identify segments or clusters in your data. When you apply clustering algorithms to a dataset, unexpected structures, clusters, and groupings can emerge that you would never have thought of otherwise.
• Clustering Models
K-Means Clustering
Hierarchical Clustering
Types of Clustering
Partitioning Clustering
Density Based Clustering
Distribution Model-Based Clustering
Hierarchical Clustering
Fuzzy Clustering
• Fuzzy clustering is a soft clustering method in which a data object may belong to more than one group or cluster.
• Each data point has a set of membership coefficients, which indicate its degree of membership in each cluster.
• The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.
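The membership coefficients described above can be computed with the standard fuzzy c-means formula: a point's membership in a cluster falls off with its relative distance to that cluster's center, and the coefficients sum to 1. A minimal sketch, assuming NumPy; the function name and example points are illustrative, not from the slides:

```python
import numpy as np

def fuzzy_memberships(x, centers, m=2.0):
    """Membership coefficients of point x for each cluster center.

    Standard fuzzy c-means formula with fuzzifier m > 1; the
    coefficients sum to 1, so a point can partially belong to
    several clusters at once.
    """
    d = np.linalg.norm(centers - x, axis=1)  # distance to each center
    d = np.maximum(d, 1e-12)                 # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

# a point at (2, 0) between centers at (0, 0) and (10, 0)
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
u = fuzzy_memberships(np.array([2.0, 0.0]), centers)
```

The point is five times closer to the first center, so its membership there dominates, but it still retains a small membership in the second cluster.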
Clustering Algorithms
• K-Means Algorithm
• Mean-Shift Algorithm
• DBSCAN Algorithm
• Expectation-Maximization Clustering using GMM
• Agglomerative Hierarchical Algorithm
• Affinity Propagation
Applications of Clustering
• Search Result Grouping
• Customer Segmentation
• Medical Imaging
• Anomaly Detection
• Market Segmentation
• Social Network Analysis
• Recommendation Engines
K-Means Algorithm
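The K-means procedure illustrated on the preceding slides alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it, repeating until the centroids stop moving. A minimal NumPy sketch of that loop (the variable names and synthetic data are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means: random initial centroids, then alternate
    assignment and update steps until convergence."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iters):
        # assignment step: nearest centroid for every point
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: each centroid becomes the mean of its cluster
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged
            break
        centroids = new
    return labels, centroids

# two well-separated blobs; K-means should recover them
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
```

This naive version uses purely random initialization, which is exactly what the "random initialization trap" discussed next can break.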
Random Initialization Trap
What if we have a bad random initialization?
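The usual remedies for a bad random initialization are to spread the initial centroids far apart (the k-means++ strategy) and to rerun the algorithm several times, keeping the run with the lowest WCSS. scikit-learn's KMeans supports both via the `init` and `n_init` parameters; a small sketch on synthetic blobs (the data is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# three tight blobs; one unlucky random init could merge two of them
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, (40, 2))
               for c in ([0, 0], [6, 0], [3, 5])])

# defenses against the trap:
#  - init='k-means++' spreads the initial centroids apart
#  - n_init=10 reruns K-means and keeps the lowest-WCSS result
km = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
km.fit(X)
```

After fitting, `km.inertia_` holds the WCSS of the best of the ten runs, and each blob ends up in its own cluster.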
Choosing the right number of clusters
How to select optimal number of clusters?
WCSS Code
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # the inertia_ attribute holds the WCSS
plt.plot(range(1, 11), wcss)  # from 1 to 10 clusters
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
For the mall.csv file, the WCSS (elbow) plot shows an elbow at k = 5 clusters.
IMP Code for K-means
kmeans = KMeans(n_clusters = 5, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Careful')
# rows of cluster 0: column 0 on the x-axis, column 1 on the y-axis
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Standard')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Target')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 100, c = 'cyan', label = 'Careless')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 100, c = 'magenta', label = 'Sensible')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow', label = 'Centroids')
plt.title('Clusters of Clients')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
Output of k-means
Hierarchical Clustering
2 Types of Hierarchical Clustering
•Agglomerative
•Divisive
Agglomerative Clustering
Distance between Clusters
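The distance between two clusters depends on the linkage criterion: single linkage takes the closest pair of points, complete linkage the farthest pair, and average linkage the mean over all pairs. A small NumPy sketch of the three criteria (the function name and example clusters are illustrative):

```python
import numpy as np

def cluster_distance(A, B, linkage='single'):
    """Distance between clusters A and B (arrays of points) under
    common linkage criteria: single = closest pair,
    complete = farthest pair, average = mean of all pairs."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # all pairwise distances
    if linkage == 'single':
        return d.min()
    if linkage == 'complete':
        return d.max()
    return d.mean()  # 'average'

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])
# single: 3.0 (closest pair), complete: 6.0 (farthest pair), average: 4.5
```

Ward linkage, used in the code later in this section, instead merges the pair of clusters whose merge least increases the total within-cluster variance.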
Agglomerative Hierarchical Clustering
Hierarchical Clustering using a Dendrogram
Using Dendrogram to form 2 clusters
Using Dendrogram to form 4 clusters
Using Dendrogram to form 6 clusters
How to find the optimal number of clusters?
How many clusters can be formed from this Dendrogram?
Optimal number of clusters = 3
Find the number of Clusters from the Dendrogram
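Once the dendrogram suggests a number of clusters, the tree can also be cut programmatically with SciPy's fcluster instead of being read off by eye. A small sketch on synthetic data (the data is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# two well-separated groups of points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),
               rng.normal(5, 0.3, (20, 2))])

Z = linkage(X, method='ward')                    # same linkage the dendrogram plots
labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree into 2 clusters
```

`fcluster` returns cluster labels starting at 1; here each of the two groups receives its own label.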
Using the dendrogram to find the optimal number of clusters
import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt

dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
OutputDendrogram
Fitting Hierarchical Clustering to the dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage = 'ward')
# note: in scikit-learn >= 1.2 the 'affinity' parameter is renamed to 'metric'
y_hc = hc.fit_predict(X)
Output of Hierarchical Clustering
Reference: 5. Types of Clustering Algorithms in ML.pdf