Introduction to Machine Learning
(Unsupervised learning)
Dmytro Fishman (dmytro@ut.ee)
Unsupervised Learning
[Scatter plot: Tumour size vs Age, annotated data points with one unlabeled point marked "?"]
We were given annotated data, based on which we could predict the class of a novel data point.
[Scatter plot: the same data points, now all unlabeled (every point marked "?")]
Now, imagine that all of these labels have been taken away from us.
Clustering
Customer 1
Customer 2
Customer 3
Customer 4
Which segments of
customers should the
store target?
Grouping data points so that semantically similar points are clustered together.
Jaak Vilo et al., Data Mining: https://courses.cs.ut.ee/MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf
Hierarchical clustering
[Scatter plot: Tumour size vs Age]

1. Let's first assume that all instances are individual clusters.
2. Find the two most similar instances and merge them into one cluster (remember NN?). Usually similarity is defined by Euclidean or any other distance measure.
3. Repeat step 2 until all clusters merge into one.
Distance between clusters can be estimated with three strategies:
1. Single linkage (min)
2. Complete linkage (max)
3. Average linkage (avg)
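The three linkage strategies can be sketched in a few lines of Python; the two toy clusters below are made up for illustration.

```python
# Single, complete and average linkage between two clusters,
# using Euclidean distance over all cross-cluster pairs.
from math import dist

cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]

pair_dists = [dist(p, q) for p in cluster_a for q in cluster_b]

single = min(pair_dists)                      # single linkage: closest pair
complete = max(pair_dists)                    # complete linkage: farthest pair
average = sum(pair_dists) / len(pair_dists)   # average linkage: mean over pairs

print(single, complete, average)
```

Single linkage tends to produce elongated "chained" clusters, while complete linkage favours compact ones; average linkage sits in between.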
[Scatter plots: cutting the hierarchy at different levels yields K = 2 or K = 1 clusters]
Dendrogram

4. Hierarchical clustering is usually visualised using a dendrogram. Cutting the dendrogram at different heights yields different numbers of clusters, e.g. K = 3 or K = 2.
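Steps 1-3 above can be sketched as a short Python program. The points and the choice of single linkage are illustrative assumptions; a real analysis would typically use e.g. `scipy.cluster.hierarchy` instead.

```python
# Minimal agglomerative (hierarchical) clustering with single linkage.
from math import dist

points = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0), (1.0, 0.6), (9.0, 11.0)]

def single_linkage(c1, c2):
    """Distance between clusters = distance of their closest pair (min)."""
    return min(dist(points[i], points[j]) for i in c1 for j in c2)

# 1. Start with every instance as its own cluster.
clusters = [[i] for i in range(len(points))]
merges = []  # record of merge order -- this is what a dendrogram draws

# 3. Repeat until all clusters merge into one.
while len(clusters) > 1:
    # 2. Find the two most similar clusters and merge them.
    a, b = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
    )
    merges.append((clusters[a], clusters[b]))
    clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [clusters[a] + clusters[b]]

print(merges)  # the first merge is the closest pair of points
```

Stopping the loop early (instead of merging all the way down to one cluster) corresponds to cutting the dendrogram at a chosen K.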
K-means clustering*

*Although K-means has something in common with K-nearest neighbours, they are not the same.
[Scatter plot: Tumour size vs Age]

1. Choose K, the number of potential clusters. Let K be 2.
2. Initialise cluster centres randomly within the data.
3. Assign each instance to the nearest cluster centre.
4. The centroids of each of the K clusters become the new cluster centres.
5. Repeat steps 3 and 4 until convergence.
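The loop above can be sketched in plain Python. The toy data and the "pick K random data points" initialisation are illustrative assumptions; in practice one would reach for e.g. `sklearn.cluster.KMeans`.

```python
# Minimal K-means following steps 1-5 above.
import random
from math import dist

points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

def kmeans(points, k, seed=0):
    rng = random.Random(seed)
    # 2. Initialise cluster centres randomly within the data.
    centres = rng.sample(points, k)
    while True:
        # 3. Assign each instance to the nearest cluster centre.
        labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        # 4. Centroids of each cluster become the new centres.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:  # keep the old centre if a cluster went empty
                new_centres.append(centres[c])
        # 5. Repeat until convergence (centres stop moving).
        if new_centres == centres:
            return labels, centres
        centres = new_centres

labels, centres = kmeans(points, k=2)  # 1. choose K = 2
print(labels)
```

Note that the result can depend on the random initialisation, which is why practical implementations restart K-means several times and keep the best run.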
Hierarchical vs. K-means

[Scatter plots comparing the two methods on the same data]

- Slow for modern-size datasets (hierarchical)
- Hard to predict K (K-means)
- Good for visualisation purposes (hierarchical)
Two "body part" methods to predict K:

- The rule of thumb is to choose K ≈ √(n/2).
- Elbow method: increase K until it does not help to describe the data better.
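The rule of thumb is a one-liner; the dataset size n below is an assumed value for illustration.

```python
# Rule-of-thumb choice of K: K ~ sqrt(n / 2),
# where n is the number of data points.
from math import sqrt

n = 200                 # assumed dataset size, for illustration
k = round(sqrt(n / 2))  # sqrt(100) = 10
print(k)  # 10
```

The elbow method instead plots the within-cluster sum of squares against K and picks the point where the curve flattens out.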
References
• Machine Learning by Andrew Ng (https://www.coursera.org/learn/machine-
learning)
• Introduction to Machine Learning by Pascal Vincent given at Deep Learning
Summer School, Montreal 2015 (http://videolectures.net/
deeplearning2015_vincent_machine_learning/)
• Welcome to Machine Learning by Konstantin Tretyakov delivered at AACIMP
Summer School 2015 (http://kt.era.ee/lectures/aacimp2015/1-intro.pdf)
• Stanford CS class: Convolutional Neural Networks for Visual Recognition by
Andrej Karpathy (http://cs231n.github.io/)
• Data Mining Course by Jaak Vilo at University of Tartu (https://courses.cs.ut.ee/
MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf)
• Machine Learning Essential Concepts by Ilya Kuzovkin (https://www.slideshare.net/iljakuzovkin)
• From the brain to deep learning and back by Raul Vicente Zafra and Ilya
Kuzovkin (http://www.uttv.ee/naita?id=23585&keel=eng)
www.biit.cs.ut.ee www.ut.ee www.quretec.ee
