Course: Machine Vision
Recognition (2)
Session 11
D5627 – I Gede Putra Kusuma Negara, B.Eng., PhD
Outline
• Unsupervised Learning
• Clustering
• K-Means Clustering
• Agglomerative Hierarchical Clustering
Unsupervised Learning
Unsupervised Learning
• Unsupervised learning tries to find hidden structure in unlabeled data. Because the examples
given to the learner are unlabeled, there is no error or reward signal
with which to evaluate a potential solution
• This distinguishes unsupervised learning from supervised learning
Unsupervised Learning
• Unsupervised learning is closely related to the problem of density
estimation in statistics
• However, unsupervised learning also encompasses many other
techniques that seek to summarize and explain key features of the
data
• Approaches to unsupervised learning include:
– Clustering (e.g., k-means, mixture models, hierarchical clustering)
– Hidden Markov models
– Blind signal separation using feature extraction techniques for dimensionality
reduction:
• principal component analysis (PCA)
• independent component analysis (ICA)
• non-negative matrix factorization
• singular value decomposition (SVD)
Unsupervised Learning
Unsupervised procedures are used for several reasons:
• Collecting and labeling a large set of sample patterns can be costly or
may not be feasible.
• One can train with large amounts of unlabeled data, and then use
supervision to label the groupings found.
• Unsupervised methods can be used for feature extraction.
• Exploratory data analysis can provide insight into the nature or
structure of the data.
Clustering
Clusters
• A cluster is composed of a number of similar objects collected or grouped together.
• Patterns within a cluster are
more similar to each other than
are patterns in different clusters.
• Clusters may be described as
connected regions of a multi-
dimensional space containing a
relatively high density of points,
separated from other such
regions by a region containing a
relatively low density of points.
Clustering
• Clustering is an unsupervised procedure that uses unlabeled samples
• Clustering is a very difficult problem because data can reveal clusters
with different shapes and sizes
• What makes two points/images/patches similar?
The number of clusters in the data often depends on the resolution (fine vs. coarse)
with which we view the data. How many clusters do you see in this figure? 5, 8, 10,
more?
Why Clustering?
• Summarizing data
– Look at large amounts of data
– Patch-based compression or denoising
– Represent a large continuous vector with the cluster number
• Counting
– Histograms of texture, color, SIFT vectors
• Segmentation
– Separate the image into different regions
• Prediction
– Images in the same cluster may have the same labels
Image Segmentation
(example)
Image Segmentation
(example)
Clustering Algorithms
• Most clustering algorithms are based on the following
two popular techniques:
– Iterative squared-error partitioning,
– Agglomerative hierarchical clustering.
• One of the main challenges is selecting an appropriate
measure of similarity to define clusters; this choice is often both
data (cluster shape) and context dependent.
Squared-error Partitioning
• Suppose that the given set of n patterns has somehow been
partitioned into k clusters D1, . . . , Dk.
• Let ni be the number of samples in Di and let mi be the mean of those
samples:

$$ m_i = \frac{1}{n_i} \sum_{x \in D_i} x $$

• Then, the sum-of-squared error is defined by

$$ J_e = \sum_{i=1}^{k} \sum_{x \in D_i} \lVert x - m_i \rVert^2 $$

• For a given cluster Di, the mean vector mi (centroid) is the best
representative of the samples in Di.
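To make the criterion concrete, here is a minimal NumPy sketch (the names X, labels, and k are illustrative assumptions, not taken from the slides) that computes the centroids mi and the criterion Je for a given partition:

```python
import numpy as np

def sum_of_squared_error(X, labels, k):
    """Compute the cluster means m_i and the criterion J_e for a
    partition of the rows of X into clusters 0..k-1 (all assumed non-empty)."""
    means = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    Je = sum(np.sum(np.linalg.norm(X[labels == i] - means[i], axis=1) ** 2)
             for i in range(k))
    return means, Je
```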
K-Means Clustering
A general algorithm for iterative squared-error partitioning:
1. Select an initial partition with k clusters.
Repeat steps 2 through 5 until the cluster membership stabilizes.
2. Generate a new partition by assigning each pattern to its closest cluster
center.
3. Compute new cluster centers as the centroids of the clusters.
4. Repeat steps 2 and 3 until an optimum value of the criterion function is found
(e.g., when a local minimum is found or a predefined number of iterations are
completed).
5. Adjust the number of clusters by merging and splitting existing clusters or by
removing small or outlier clusters.
This algorithm, without step 5, is also known as the k-means clustering algorithm.
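A minimal NumPy sketch of the k-means loop described above (steps 1 through 4, without the merge/split adjustments of step 5); the function name and parameters are illustrative, not part of the original slides:

```python
import numpy as np

def kmeans(X, k, n_iters=100, rng=None):
    """Naive k-means: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(rng)
    # Step 1: initial partition via k centers chosen randomly from the data.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each pattern to its closest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute cluster centers as the centroids of the clusters.
        new_centers = np.array([X[labels == i].mean(axis=0)
                                if np.any(labels == i) else centers[i]
                                for i in range(k)])
        # Step 4: stop when the membership (equivalently, the centers) stabilizes.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```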
K-Means Clustering (demo)
Number of samples = 30
Number of clusters = 3
K-Means Clustering
• k-means is computationally efficient and gives good results if the
clusters are compact, hyper-spherical in shape and well-separated in
the feature space.
• However, choosing k and choosing the initial partition are the main
drawbacks of this algorithm.
• The value of k is often chosen empirically or by prior knowledge
about the data.
• The initial partition is often chosen by generating k random points
uniformly distributed within the range of the data, or by randomly
selecting k points from the data.
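The two initialization strategies mentioned above can be written as a short sketch (assuming a NumPy data array X of shape (n, d); the function names are illustrative):

```python
import numpy as np

def init_uniform_range(X, k, rng=None):
    # k random points uniformly distributed within the range of the data.
    rng = np.random.default_rng(rng)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return rng.uniform(lo, hi, size=(k, X.shape[1]))

def init_from_data(X, k, rng=None):
    # k points randomly selected from the data itself.
    rng = np.random.default_rng(rng)
    return X[rng.choice(len(X), size=k, replace=False)]
```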
K-Means Clustering
• The k-means algorithm produces a flat data description
where the clusters are disjoint and are at the same level.
• In some applications, groups of patterns share some
characteristics when viewed at a particular level.
• Hierarchical clustering tries to capture these multi-level
groupings using hierarchical representations rather than flat
partitions.
Hierarchical Clustering
• In hierarchical clustering, for a set of n samples,
– the first level consists of n clusters (each cluster containing exactly
one sample),
– the second level contains n − 1 clusters,
– the third level contains n − 2 clusters,
– and so on until the last (n-th) level at which all samples form a
single cluster.
• Given any two samples, at some level they will be grouped
together in the same cluster and remain together at all
higher levels.
Hierarchical Clustering
• A natural representation of hierarchical clustering is a tree, also called
a dendrogram, which shows how the samples are grouped.
• If there is an unusually large gap between the similarity values for two
particular levels, one can argue that the level with the smaller number of
clusters represents a more natural grouping.
• A dendrogram can represent the results of hierarchical clustering
algorithms.
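In practice, such a dendrogram can be produced with SciPy's hierarchical-clustering utilities; a minimal sketch on toy data (the array X and the choice of single linkage are illustrative assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(0).normal(size=(20, 2))  # toy data, for illustration only
Z = linkage(X, method='single')  # agglomerative merges (single linkage ~ d_min)
dendrogram(Z)                    # tree showing how the samples are grouped
plt.show()
```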
Agglomerative
Hierarchical Clustering
1. Specify the number of clusters. Place every pattern in a unique
cluster and repeat steps 2 and 3 until a partition with the required
number of clusters is obtained.
2. Find the closest clusters according to a distance measure.
3. Merge these two clusters.
4. Return the resulting clusters.
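A minimal from-scratch sketch of this procedure, using single linkage (the dmin measure defined on the next slide) as the distance between clusters; the names are illustrative assumptions:

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Merge the two closest clusters (single linkage) until only
    n_clusters remain. X is an (n, d) array of patterns."""
    # Step 1: every pattern starts in its own cluster.
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        # Step 2: find the closest pair of clusters.
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        # Step 3: merge the two closest clusters.
        clusters[a].extend(clusters[b])
        del clusters[b]
    # Step 4: return the resulting clusters (as lists of sample indices).
    return clusters
```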
Popular Distance Measures
• Popular distance measures (for two clusters Di and Dj):

$$ d_{\min}(D_i, D_j) = \min_{x \in D_i,\; x' \in D_j} \lVert x - x' \rVert $$

$$ d_{\max}(D_i, D_j) = \max_{x \in D_i,\; x' \in D_j} \lVert x - x' \rVert $$

$$ d_{\mathrm{avg}}(D_i, D_j) = \frac{1}{|D_i|\,|D_j|} \sum_{x \in D_i} \sum_{x' \in D_j} \lVert x - x' \rVert $$

$$ d_{\mathrm{mean}}(D_i, D_j) = \lVert m_i - m_j \rVert $$
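These four measures translate directly into a short NumPy sketch (Di and Dj are assumed to be arrays of shape (n_i, d) and (n_j, d); the helper name is illustrative):

```python
import numpy as np

def pairwise(Di, Dj):
    # All pairwise Euclidean distances ||x - x'|| between the two clusters.
    return np.linalg.norm(Di[:, None, :] - Dj[None, :, :], axis=2)

def d_min(Di, Dj):  return pairwise(Di, Dj).min()
def d_max(Di, Dj):  return pairwise(Di, Dj).max()
def d_avg(Di, Dj):  return pairwise(Di, Dj).mean()   # 1/(|Di||Dj|) times the sum
def d_mean(Di, Dj): return np.linalg.norm(Di.mean(axis=0) - Dj.mean(axis=0))
```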
Hierarchical Clustering
(demo)
Demo: the clusters are merged step by step, from 10 clusters down to a single cluster (number of clusters = 10, 9, 8, 7, 6, 5, 4, 3, 2, 1).
Texture Clustering (example)
• We are going to cluster texture images into 5 clusters
• Image source: 40 texture images from MIT VisTex database
– Number of classes: 5 (brick, fabric, grass, sand, stone)
• Features:
– Gray-level co-occurrence matrix (GLCM)
– Fourier power spectrum (#rings: 4, #sectors: 8)
• Clustering method:
– K-means clustering
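A hedged sketch of such an experiment using scikit-image and scikit-learn: the file names, GLCM parameters, and property choices below are illustrative assumptions (the slides do not specify them), and the Fourier power-spectrum features are omitted here.

```python
import numpy as np
from skimage.io import imread
from skimage.util import img_as_ubyte
from skimage.color import rgb2gray
from skimage.feature import graycomatrix, graycoprops
from sklearn.cluster import KMeans

def glcm_features(path):
    """Simple GLCM descriptor for one texture image (illustrative parameters)."""
    img = imread(path)
    if img.ndim == 3:               # convert RGB to gray if needed
        img = rgb2gray(img)
    img = img_as_ubyte(img)         # 8-bit gray levels for a 256-level GLCM
    glcm = graycomatrix(img, distances=[1],
                        angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                        levels=256, symmetric=True, normed=True)
    props = ['contrast', 'homogeneity', 'energy', 'correlation']
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Hypothetical file names standing in for the 40 VisTex texture images.
paths = [f'vistex/texture_{i:02d}.png' for i in range(40)]
X = np.array([glcm_features(p) for p in paths])
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print(labels)
```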
Texture Images
Acknowledgment
Some of the slides in this presentation are adapted from
various sources; many thanks to:
1. Selim Aksoy, Department of Computer Engineering, Bilkent
University (http://www.cs.bilkent.edu.tr/~saksoy/)
2. James Hays, Computer Science Department, Brown University,
(http://cs.brown.edu/~hays/)
Thank You
