Hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters. It is an algorithm that groups similar objects into groups called clusters. The endpoint is a set of clusters, where each cluster is distinct from each other cluster, and the objects within each cluster are broadly similar to each other. Let’s try to understand how exactly Hierarchical clustering works.
2. Agenda
● What is KSAI?
● Why KSAI?
● What is Hierarchical Clustering?
● How does Hierarchical Clustering work?
● Simple Example
● Quick Demo
3. What is KSAI
● KSAI is a machine learning library which contains various algorithms
for classification, regression, clustering and many other tasks.
● It is an attempt to build machine learning algorithms in Scala.
● The Breeze library, which is also built in Scala, is used for the
mathematical functionality.
5. Power of KSAI
● KSAI mainly uses Scala's built-in case classes, Futures and some of
the other cool features.
● It has also used Akka in some places to do things in an
asynchronous fashion.
6. Right now KSAI might not be that easy to use, as the library has
limited documentation; however, the committers will update it in the
near future.
8. Hierarchical Clustering
● Hierarchical clustering (also called hierarchical cluster analysis or HCA) is
a method of cluster analysis which seeks to build a hierarchy of clusters.
● Hierarchical clustering is an algorithm that groups similar objects into
groups called clusters.
● The endpoint is a set of clusters, where each cluster is distinct from each
other cluster, and the objects within each cluster are broadly similar to
each other.
9. Strategies
● Agglomerative: This is a "bottom-up" approach: each observation
starts in its own cluster, and pairs of clusters are merged as one
moves up the hierarchy.
● Divisive: This is a "top-down" approach: all observations start in one
cluster, and splits are performed recursively as one moves down the
hierarchy.
10. Required Data
Hierarchical clustering can be performed with either a distance matrix or raw
data.
When raw data is provided, the software will automatically compute a
distance matrix in the background.
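To make this concrete, here is a minimal sketch in Scala of how a library might compute such a matrix from raw data behind the scenes. This is not KSAI's actual API; the names (DistanceMatrix, build) and the Euclidean default are assumptions for illustration:

object DistanceMatrix {

  // A metric maps a pair of raw observations to a distance.
  type Metric = (Array[Double], Array[Double]) => Double

  // Euclidean distance, assumed here as the default metric.
  val euclidean: Metric = (x, y) =>
    math.sqrt(x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum)

  // Builds the full n x n distance matrix. It is symmetric with a zero
  // diagonal, so only the lower triangle actually needs computing.
  def build(data: Array[Array[Double]],
            metric: Metric = euclidean): Array[Array[Double]] = {
    val n = data.length
    val d = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 until i) {
      d(i)(j) = metric(data(i), data(j))
      d(j)(i) = d(i)(j) // mirror into the upper triangle
    }
    d
  }
}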
11. Distance Matrix
● A distance matrix is a matrix of distances between objects.
● It will be symmetric (because the distance between x and y is the same
as the distance between y and x) and will have zeroes on the diagonal
(because every item is distance zero from itself).
● The table below is an example of a distance matrix. Only the lower
triangle is shown, because the upper triangle can be filled in by
reflection.
12. How to build Distance Matrix
● The choice of distance metric should be made based on theoretical
concerns from the domain of study.
● That is, a distance metric needs to define similarity in a way that is
sensible for the field of study.
● For example, if clustering crime sites in a city, city block distance may
be appropriate.
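Continuing the sketch above, a city block (Manhattan) metric is just another function plugged into the hypothetical DistanceMatrix.build; the coordinates below are made up for illustration:

// City block distance: the sum of absolute coordinate differences.
val cityBlock: DistanceMatrix.Metric = (x, y) =>
  x.zip(y).map { case (a, b) => math.abs(a - b) }.sum

// Toy crime-site coordinates (illustrative only).
val sites = Array(Array(1.0, 2.0), Array(4.0, 6.0), Array(5.0, 1.0))
val d = DistanceMatrix.build(sites, cityBlock)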
14. Simple Working
Hierarchical clustering starts by treating each observation as a separate
cluster. Then, it repeatedly executes the following two steps:
(1) identify the two clusters that are closest together, and
(2) merge the two most similar clusters. This continues until all the clusters
are merged together.
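This loop can be sketched in a few lines of Scala. The version below is an illustrative, unoptimized take, not KSAI's implementation; it assumes a precomputed distance matrix and a linkage function (defined on slides 18-19) that scores the dissimilarity of two clusters, and it returns the merge history, which is exactly the information a dendrogram plots:

def agglomerate(
    d: Array[Array[Double]],
    linkage: (Set[Int], Set[Int]) => Double
): List[(Set[Int], Set[Int], Double)] = {
  // Each observation starts in its own cluster.
  var clusters: List[Set[Int]] = d.indices.map(Set(_)).toList
  val merges =
    scala.collection.mutable.ListBuffer.empty[(Set[Int], Set[Int], Double)]
  while (clusters.size > 1) {
    // (1) Identify the two clusters that are closest together.
    val pair = clusters.combinations(2).minBy(p => linkage(p(0), p(1)))
    val (a, b) = (pair(0), pair(1))
    // (2) Merge them; the linkage value is the height at which a
    //     dendrogram would draw this merge.
    merges += ((a, b, linkage(a, b)))
    clusters = (a ++ b) :: clusters.filterNot(c => c == a || c == b)
  }
  merges.toList
}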
15. Result(Dendrogram)
The main output of Hierarchical Clustering is a dendrogram, which shows
the hierarchical relationship between the clusters.
Where:
● A dendrogram is a diagram that shows the hierarchical relationship
between objects.
● The main use of a dendrogram is to work out the best way to allocate
objects to clusters.
16. Cluster dissimilarity
● In order to decide which clusters should be combined (for
agglomerative clustering), a measure of dissimilarity between sets of
observations is required.
● In most methods of hierarchical clustering, this is achieved by use of
an appropriate metric (a measure of distance between pairs of
observations), and a linkage criterion which specifies the dissimilarity
of sets as a function of the pairwise distances of observations in the
sets.
17. Linkage Criteria
After selecting a distance metric, it is necessary to determine from where
distance is computed and how the merging of clusters will take place.
For that we have various linkage criteria.
18. Single Linkage
In single-link clustering (also called the connectedness or minimum
method), we consider the distance between one cluster and another cluster
to be equal to the shortest distance from any member of one cluster to any
member of the other cluster.
19. Complete Linkage
In complete-link clustering (also called the diameter or maximum method),
we consider the distance between one cluster and another cluster to be
equal to the longest distance from any member of one cluster to any
member of the other cluster.
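Both criteria can be written as one-liners over the pairwise distances, in the same hypothetical setup as the earlier sketches (d is the distance matrix; clusters are sets of row indices):

// All pairwise distances between members of two clusters.
def pairwise(a: Set[Int], b: Set[Int], d: Array[Array[Double]]): Seq[Double] =
  for (i <- a.toSeq; j <- b.toSeq) yield d(i)(j)

// Single linkage: the shortest such distance.
def singleLink(d: Array[Array[Double]])(a: Set[Int], b: Set[Int]): Double =
  pairwise(a, b, d).min

// Complete linkage: the longest such distance.
def completeLink(d: Array[Array[Double]])(a: Set[Int], b: Set[Int]): Double =
  pairwise(a, b, d).max

Either one can be handed to the agglomerate sketch from slide 14, e.g. agglomerate(d, completeLink(d)).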
20. Example
Now let's start clustering.
The smallest distance is between three and five, so they get linked up or
merged first into the cluster "35".
● To obtain the new distance matrix, we need to remove the 3 and 5
entries and replace them with an entry "35".
● Since we are using complete linkage clustering, the distance between
"35" and every other item is the maximum of the distance between this
item and 3 and this item and 5. For example, d(1,3)=3 and d(1,5)=11.
So, D(1,"35")=11. This gives us the new distance matrix.
21. The items with the smallest distance in the new matrix get clustered
next. This will be 2 and 4.
Continuing in this way, after 6 steps, everything is clustered.
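In code terms, one call to the earlier sketches runs this whole sequence of merges:

// Assuming the matrix d built with the DistanceMatrix sketch and the
// completeLink criterion from slide 19; each element of merges
// corresponds to one of the steps above.
val merges = agglomerate(d, completeLink(d))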
22. Dendrogram
● On this plot, the y-axis shows the
distance between the objects at
the time they were clustered.
● This is called the cluster height.
● Different visualizations use
different measures of cluster
height.
23. Dendrogram
● This is the single linkage dendrogram for
the same distance matrix.
● It starts with cluster "35", but the
distance between "35" and each
item is now the minimum of d(x,3)
and d(x,5). So D(1,"35")=3.
24. Determining clusters
● One of the problems with hierarchical clustering is that there is no
objective way to say how many clusters there are.
● If we cut the single linkage tree at the point shown below, we would say
that there are two clusters.
25. However, if we cut the tree lower we might say that there is one cluster and
two singletons.
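Choosing the cut height is therefore the whole decision. A small sketch of "cutting the tree", reusing the merge history returned by the agglomerate function above (for single and complete linkage the merge heights are non-decreasing, so replaying merges up to the cut is enough):

def cutTree(n: Int,
            merges: List[(Set[Int], Set[Int], Double)],
            cutHeight: Double): List[Set[Int]] = {
  var clusters: List[Set[Int]] = (0 until n).map(Set(_)).toList
  // Replay only the merges that happen at or below the cut height;
  // whatever clusters survive are the ones read off the dendrogram.
  for ((a, b, h) <- merges if h <= cutHeight)
    clusters = (a ++ b) :: clusters.filterNot(c => c == a || c == b)
  clusters
}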
Initially we decided that KSAI stands for K-Scalable Artificial Intelligence (where the K comes from the ML algorithm K-Means). But users can relate it to whatever name they want, so we are calling it just KSAI.