Unsupervised Learning
DR. MAJID ALI KHAN
- Learning in the absence of labelled data
◦ Dimensionality Reduction
◦ Clustering
◦ Anomaly Detection
Clustering
Clustering: the task of identifying similar instances and assigning them
to clusters, i.e., creating groups of similar instances.
Just like in classification, each instance gets assigned to a group.
However, unlike classification, clustering is an unsupervised task: no labels are provided, so the groups must be discovered from the data alone.
Clustering vs. Classification
K-Means Clustering
The K-Means algorithm is a simple algorithm capable of clustering this kind of
dataset very quickly and efficiently, often in just a few iterations. It was proposed
by Stuart Lloyd at Bell Labs in 1957.
K-Means in Scikit-Learn
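The original slide's code did not survive extraction. A minimal sketch of training Scikit-Learn's `KMeans` follows, assuming a synthetic 2-D dataset of five blobs generated with `make_blobs` (the dataset, `k = 5`, and the seeds are illustrative choices, not the slide's). Note that the number of clusters must be specified up front.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Illustrative data (assumption): 500 points drawn around 5 blob centres in 2-D
X, _ = make_blobs(n_samples=500, centers=5, random_state=42)

k = 5  # K-Means needs the number of clusters in advance
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)

# fit_predict() trains the model and returns the cluster index of every instance
y_pred = kmeans.fit_predict(X)

print(y_pred[:10])              # cluster index of the first 10 instances
print(kmeans.cluster_centers_)  # the 5 learned centroids, shape (5, 2)
```

`y_pred` is the same array that is stored in `kmeans.labels_` after fitting.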
K-Means Decision Boundaries
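K-Means decision boundaries form a Voronoi tessellation: every point of the space belongs to the cluster whose centroid is closest. A small sketch (dataset and grid resolution are illustrative assumptions) evaluates a fitted model on a grid and checks that `predict()` agrees with a plain nearest-centroid computation; plotting `regions` reshaped to `xx.shape` would draw the boundaries.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Illustrative data (assumption): 3 blobs in 2-D
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# A 200 x 200 grid covering the data, with a margin of 1 unit on each side
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200),
)
grid = np.c_[xx.ravel(), yy.ravel()]
regions = kmeans.predict(grid)  # cluster of every grid point

# Same regions computed by hand: index of the nearest centroid (Euclidean distance)
dists = np.linalg.norm(grid[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

print("fraction of grid points where both agree:", (regions == nearest).mean())
```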
How does K-Means work?
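The slide's illustration is missing, so here is a from-scratch sketch of Lloyd's algorithm in NumPy, for illustration only (the helper names `assign` and `lloyd_kmeans` are hypothetical, not a library API). It initializes the centroids with randomly chosen instances, then alternates two steps until the centroids stop moving: assign each instance to its closest centroid, then move each centroid to the mean of its instances.

```python
import numpy as np

def assign(X, centroids):
    """Hypothetical helper: index of the closest centroid for every instance."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal Lloyd's K-Means with random initialization."""
    rng = np.random.default_rng(seed)
    # 1. Initialize: pick k distinct instances at random as the first centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # 2. Assignment step: label each instance with its closest centroid
        labels = assign(X, centroids)
        # 3. Update step: move each centroid to the mean of its instances
        #    (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = assign(X, centroids)  # final labels consistent with returned centroids
    # Inertia: sum of squared distances from each instance to its centroid
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia

# Illustrative data (assumption): three Gaussian blobs of 100 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0, 0], [5, 5], [0, 5])])
labels, centroids, inertia = lloyd_kmeans(X, k=3)
print(centroids.round(2), round(inertia, 1))
```

Each step can only keep or lower the inertia, which is why the algorithm is guaranteed to converge, though not necessarily to the best solution.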
K-Means can converge to suboptimal solutions!
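Convergence is guaranteed, but only to a local optimum that depends on the starting centroids. A sketch, assuming a synthetic blob dataset, runs `KMeans` with a purely random initialization and a single run (`n_init=1`) under ten different seeds and records the resulting inertia. Whether the runs actually differ depends on the data and seeds, so no specific output is claimed here.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Illustrative data (assumption): 5 blobs in 2-D
X, _ = make_blobs(n_samples=1000, centers=5, cluster_std=1.0, random_state=7)

inertias = []
for seed in range(10):
    # One run from random initial centroids, no restarts
    km = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)
    inertias.append(km.inertia_)

print([round(v, 1) for v in inertias])
print("spread between worst and best run:", max(inertias) - min(inertias))
```

Any run whose inertia is noticeably above the minimum has settled into a suboptimal solution.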
K-Means++
K-Means++ introduced a smarter initialization step that tends to select
centroids that are distant from one another, and this improvement makes
the K-Means algorithm much less likely to converge to a suboptimal solution.
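The K-Means++ seeding rule can be sketched in a few lines of NumPy. This is the basic version of the method: the first centroid is chosen uniformly at random, and each subsequent centroid is sampled with probability proportional to $D(x)^2$, the squared distance from instance $x$ to its nearest already-chosen centroid. The function name is hypothetical, and Scikit-Learn's own implementation differs in details (it evaluates several candidates at each step and keeps the best one).

```python
import numpy as np

def kmeans_plus_plus_init(X, k, seed=0):
    """Hypothetical helper: basic K-Means++ seeding, returns k centroids taken from X."""
    rng = np.random.default_rng(seed)
    # Step 1: first centroid chosen uniformly at random among the instances
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        C = np.array(centroids)
        # Step 2: D(x)^2 = squared distance to the nearest already-chosen centroid
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Step 3: sample the next centroid with probability D(x)^2 / sum(D^2);
        # distant instances are favoured and chosen ones (D = 0) are never re-picked
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

# Illustrative data (assumption): three Gaussian blobs of 100 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0, 0], [5, 5], [0, 5])])
init_centroids = kmeans_plus_plus_init(X, k=3)
print(init_centroids.round(2))
```

These centroids would then be handed to the regular assign/update loop instead of purely random ones.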
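- Scikit-Learn's KMeans class uses K-Means++ initialization by default
(init="k-means++"); to use the original random initialization instead, set init="random"
- n_init sets how many times K-Means is run with different random initializations;
the model with the lowest inertia is kept. Older versions of Scikit-Learn defaulted to
n_init=10; since version 1.4 the default is n_init="auto", which means a single run with
init="k-means++" and ten runs with init="random"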
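The `init` and `n_init` parameters described above can be set explicitly. A short sketch follows, assuming a synthetic dataset. It compares K-Means++ and random initialization with ten runs each, then shows that `init` also accepts an array of starting centroids, in which case a single run (`n_init=1`) is appropriate.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Illustrative data (assumption): 4 blobs in 2-D
X, _ = make_blobs(n_samples=500, centers=4, random_state=1)

# K-Means++ initialization (the default), 10 runs, the lowest-inertia model is kept
km_pp = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=1).fit(X)

# Original random initialization, also 10 runs
km_rand = KMeans(n_clusters=4, init="random", n_init=10, random_state=1).fit(X)

# Explicit starting centroids (here: reusing km_pp's result); deterministic, so 1 run
km_fixed = KMeans(n_clusters=4, init=km_pp.cluster_centers_, n_init=1).fit(X)

print(km_pp.inertia_, km_rand.inertia_, km_fixed.inertia_)
```

Starting from already-converged centroids, the extra Lloyd iterations can only keep or lower the inertia.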
Choosing the right number of clusters k?
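The slide's content for this step is missing. One common approach, sketched here as an assumption rather than the slide's exact method, fits a model for several values of k and compares two quantities. The first is inertia, which tends to keep decreasing as k grows, so one looks for an "elbow" where the decrease slows down. The second is the silhouette score, which lies in [-1, 1] with higher being better and is only defined for k ≥ 2.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Illustrative data (assumption): 4 blobs in 2-D
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=3)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=3).fit(X)
    inertias[k] = km.inertia_                         # look for the "elbow"
    silhouettes[k] = silhouette_score(X, km.labels_)  # higher is better

best_k = max(silhouettes, key=silhouettes.get)
print({k: round(v, 1) for k, v in inertias.items()})
print("k with the highest silhouette score:", best_k)
```

Picking the k that minimizes inertia does not work, since inertia generally keeps shrinking as more centroids are added.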
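Useful KMeans attributes and methods
- kmeans.labels_: the cluster index assigned to each training instance
- kmeans.cluster_centers_: the coordinates of the k centroids
- kmeans.predict(X_new): assigns each new instance to its closest centroid
- kmeans.inertia_: the sum of squared distances between each training instance and its closest centroid
- kmeans.score(X): returns the negative inertia of X, following Scikit-Learn's "greater is better" convention for scores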
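A sketch exercising these attributes and methods on a fitted model follows, assuming a synthetic dataset and a few arbitrary new points. It also recomputes the inertia by hand from `labels_` and `cluster_centers_` to show what the value means, and checks that `score()` is its negative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Illustrative data (assumption): 3 blobs in 2-D
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

labels = kmeans.labels_            # cluster index of every training instance
centers = kmeans.cluster_centers_  # one centroid per cluster, shape (3, 2)

X_new = np.array([[0.0, 2.0], [3.0, 2.0], [-3.0, 3.0], [-3.0, 2.5]])  # arbitrary points
new_labels = kmeans.predict(X_new)  # closest centroid for each new instance

# inertia_ = sum of squared distances from each instance to its own centroid
manual_inertia = ((X - centers[labels]) ** 2).sum()
print(np.isclose(manual_inertia, kmeans.inertia_))

# score() returns the negative inertia on the data it is given
print(np.isclose(kmeans.score(X), -kmeans.inertia_))
```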

Chapter09 Unsupervised Learning Testing Cases