2. Feature Learning For Image Classification
Agenda:
I. Summary and Goal of Study
II. Learning Feature Rep. with K-means
i. K-means training routine (initializing & preprocessing)
ii. Bag of Words Method
iii. Comparison to Sparse Feature Learning
III. Discussion
Source paper: Learning Feature Representations with K-means
Adam Coates and Andrew Y. Ng
Stanford University, Stanford CA 94306, USA / 2012
3. [Diagram: unlabeled images → unsupervised training algorithm → feature representation]
I. Summary and Goal of Study
Each image is represented by counts over the learned clusters (visual words):

Image ID  C-1  C-2  C-3  C-4  C-5  ...  C-N
1         w1   w2   w3   w4   w5   ...  wN
2         ..   ..   ..   ..   ..   ...  ..
3         ..   ..   ..   ..   ..   ...  ..
4         ..   ..   ..   ..   ..   ...  ..
5         ..   ..   ..   ..   ..   ...  ..
6         ..   ..   ..   ..   ..   ...  ..

w1 answers "how many members of cluster C-1 does image 1 have?"; N is the total number of clusters.
4. Feature representation
Vocabulary
Supervised learning
Applications in which the training data comprises examples of the input vectors along with their
corresponding target vectors are known as supervised learning problems.
Classification (e.g., Naive Bayes, decision tree learning algorithms)
Unsupervised learning
In other pattern recognition problems, the training data consists of a set of input vectors x without any
corresponding target values. The goal in such unsupervised learning problems may be to discover
groups of similar examples within the data, which is called clustering.
Clustering (e.g., K-means, hierarchical clustering)
Pattern Recognition and Machine Learning (Bishop, 2006)
5. Feature representation
Vocabulary
K-means
Spherical K-means
The idea is to constrain each cluster center to unit length, so that
points are assigned by the angle (dot product) between the point
and the centroid rather than by Euclidean distance.
The centroids then stay consistently spread over the unit
sphere.
This prevents the clusters from becoming too large or too
small.
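The spherical variant described above can be sketched in NumPy. This is an illustrative implementation, not code from the paper; the function name and parameters are assumptions:

```python
import numpy as np

def spherical_kmeans(X, k, iters=10, seed=0):
    """Spherical K-means sketch: centroids live on the unit sphere and
    points are assigned by dot-product (cosine) similarity.
    X is (n_points, dim); rows are assumed already normalized/whitened."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen data points, normalized.
    D = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-10
    for _ in range(iters):
        # Assign each point to the centroid with the largest dot product.
        assign = np.argmax(X @ D.T, axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                c = members.sum(axis=0)
                # Project the updated centroid back onto the unit sphere.
                D[j] = c / (np.linalg.norm(c) + 1e-10)
    return D, assign
```

The projection back to unit norm after each mean update is what keeps every centroid comparable in scale, which is the sense in which no cluster becomes "too large or too small".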
7. Feature Learning For Image Classification
Agenda:
I. Summary and Goal of Study
II. Learning Feature Rep. with K-means
i. K-means training routine (initializing & preprocessing)
ii. Bag of Words Method
iii. Comparison to Sparse Feature Learning
III. Discussion
8. 1- Extract patches from unlabeled training images
Each patch has dimension w-by-w and has d channels (d=3 for color images)
Each w-by-w patch can be represented as a vector in ℝ^N of pixel values, with N = w · w · d
2- Apply a pre-processing stage to the patches
1-Normalization:
2-Whitening (decorrelation)
II. Learning Feature Rep. with K-means
i- K-means training routine (initializing & preprocessing)
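Step 1 above (random patch extraction) can be sketched as follows; `extract_patches` and its parameters are hypothetical names for illustration, assuming images stored as (H, W, d) arrays:

```python
import numpy as np

def extract_patches(images, w, n_patches, seed=0):
    """Extract random w-by-w patches and flatten each into a vector
    in R^N with N = w*w*d (d channels, d=3 for color images)."""
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        # Random top-left corner that keeps the patch inside the image.
        y = rng.integers(img.shape[0] - w + 1)
        x = rng.integers(img.shape[1] - w + 1)
        patches.append(img[y:y+w, x:x+w].reshape(-1))  # length w*w*d
    return np.stack(patches)
```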
9. 2- Apply a pre-processing stage to the patches
1-Normalization:
- For each patch, subtract out the mean of the
intensities and divide by the standard deviation
- It is useful to normalize the brightness and contrast
of the patches
II. Learning Feature Rep. with K-means
i- K-means training routine (initializing & preprocessing)
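The per-patch brightness/contrast normalization can be sketched as below. The small constant in the denominator is an assumption added to avoid division by zero on near-constant patches:

```python
import numpy as np

def normalize_patches(P, eps=10.0):
    """For each patch (row of P), subtract the mean intensity and divide
    by the standard deviation, normalizing brightness and contrast.
    eps guards against near-zero variance (assumed value for pixel
    intensities in [0, 255])."""
    P = P.astype(float)
    mean = P.mean(axis=1, keepdims=True)
    var = P.var(axis=1, keepdims=True)
    return (P - mean) / np.sqrt(var + eps)
```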
10. 2- Apply a pre-processing stage to the patches
2-Whitening (decorrelation)
- We need this because nearby pixels remain highly
correlated even after brightness and contrast
normalization
II. Learning Feature Rep. with K-means
i- K-means training routine (initializing & preprocessing)
14. II. Learning Feature Rep. with K-means
i- K-means training routine (initializing & preprocessing)
2-Whitening (decorrelation)
[Figure: raw 2D data] u1 and u2 are eigenvectors of the covariance
matrix Σ
λ1, λ2 are the corresponding eigenvalues
15. II. Learning Feature Rep. with K-means
i- K-means training routine (initializing & preprocessing)
2-Whitening (decorrelation) u1 and u2 are eigenvectors of the covariance
matrix Σ
λ1, λ2 are the corresponding eigenvalues
u1ᵀx is the length of the projection of x
onto the vector u1
So we can represent x in the (u1, u2) basis
16. II. Learning Feature Rep. with K-means
i- K-means training routine (initializing & preprocessing)
2-Whitening (decorrelation)
Representing x in (u1, u2) gives x_rot = Uᵀx
This is the training set rotated into the
(u1, u2) basis
17. II. Learning Feature Rep. with K-means
i- K-means training routine (initializing & preprocessing)
2-Whitening (decorrelation)
Rescaling each rotated component by 1/√λi, i.e. x_white,i = x_rot,i / √λi, gives data whose covariance equals the identity matrix I
18. II. Learning Feature Rep. with K-means
i- K-means training routine (initializing & preprocessing)
2-Whitening (decorrelation )
This data now has covariance equal to the identity matrix I
[Figure: raw data points vs. whitened data points]
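The whole whitening pipeline of slides 14–18 (rotate into the eigenbasis, then rescale by 1/√λi) can be sketched as PCA whitening in NumPy. The `eps` regularizer is an assumption for numerical stability, not part of the derivation:

```python
import numpy as np

def pca_whiten(X, eps=1e-5):
    """PCA whitening sketch: center the data, rotate it into the
    eigenbasis (u1, u2, ...) of the covariance matrix Sigma, then
    divide each component by sqrt(lambda_i) so the covariance of the
    result is the identity matrix I."""
    X = X - X.mean(axis=0)           # center across the dataset
    sigma = np.cov(X, rowvar=False)  # covariance matrix Sigma
    lam, U = np.linalg.eigh(sigma)   # eigenvalues lambda_i, eigenvectors u_i
    X_rot = X @ U                    # training set rotated into the eigenbasis
    return X_rot / np.sqrt(lam + eps)  # x_white,i = x_rot,i / sqrt(lambda_i)
```

For ZCA whitening (which the paper also discusses) one would additionally rotate back with `U`, keeping the whitened data close to the original pixel space.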
19. Feature Learning For Image Classification
Agenda:
I. Summary and Goal of Study
II. Learning Feature Rep. with K-means
i. K-means training routine (initializing & preprocessing)
ii. Bag of Words Method
iii. Comparison to Sparse Feature Learning
III. Discussion
20. II. Learning Feature Rep. with K-means
ii- Bag of Words Method - 1
•First, take a bunch of images, extract features, and build up a “dictionary” or “visual
vocabulary” – a list of common features
•Given a new image, extract features and build a histogram – for each feature, find the
closest visual word in the dictionary
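The second bullet (mapping each feature to its closest visual word and building a histogram) can be sketched as below; the function name is illustrative and the dictionary is assumed to come from K-means:

```python
import numpy as np

def bow_histogram(features, dictionary):
    """Bag-of-words encoding sketch.
    features:   (n, d) feature vectors extracted from one image.
    dictionary: (k, d) visual words (K-means centroids).
    Returns a normalized k-bin histogram of nearest-word counts."""
    # Squared Euclidean distance from every feature to every centroid.
    d2 = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = np.argmin(d2, axis=1)  # index of the closest visual word
    hist = np.bincount(words, minlength=len(dictionary))
    return hist / max(hist.sum(), 1)  # normalize so histograms are comparable
```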
23. Feature Learning For Image Classification
Agenda:
I. Summary and Goal of Study
II. Learning Feature Rep. with K-means
i. K-means training routine (initializing & preprocessing)
ii. Bag of Words Method
iii. Comparison to Sparse Feature Learning
III. Discussion
24. II. Learning Feature Rep. with K-means
iii- Comparison to Sparse Feature Learning
[Figure: K-means represents a data point by its single nearest centroid (Centroid 1, 2, or 3),
while sparse coding represents it as a sparse combination of basis vectors f1, f2, f3.]
Wherever K-means is used to learn a dictionary, replacing it
with sparse coding will often work better.
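The contrast between the two encodings can be sketched as below: hard K-means assignment produces a one-hot code, while the "triangle" activation discussed by Coates & Ng gives a softer, sparse-coding-like code from the same centroids. Function names are illustrative:

```python
import numpy as np

def hard_assignment(x, D):
    """K-means encoding: one-hot vector selecting the nearest centroid."""
    d = np.linalg.norm(D - x, axis=1)
    code = np.zeros(len(D))
    code[np.argmin(d)] = 1.0
    return code

def triangle_encoding(x, D):
    """'Triangle' activation: max(0, mean_distance - distance_k).
    Centroids closer than average get a positive activation, so the
    code is sparse but can use several centroids at once."""
    d = np.linalg.norm(D - x, axis=1)
    return np.maximum(0.0, d.mean() - d)
```

The triangle encoding is one reason K-means features remain competitive with sparse coding despite the much cheaper training.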
Sources:
"The raw input is redundant, since adjacent pixel values are highly correlated. The goal of whitening is to make the input less redundant; more formally, our desiderata are that our learning algorithm sees a training input where (i) the features are less correlated with each other, and (ii) the features all have the same variance."
http://ufldl.stanford.edu/tutorial/unsupervised/PCAWhitening/
http://deeplearning.stanford.edu/wiki/index.php/Exercise:PCA_and_Whitening
https://www.physicsforums.com/threads/what-is-the-difference-between-whitening-and-pca.635358/