1. Synopsis
Sparse based feature parameterization and multi kernel SVM for large scale scene classification
(Electronics and Communication Engineering)
By
Gajjar Bhavinkumar (IU1571090002)
Under the supervision of
Dr. Hiren Mewada (Associate Professor, EE, PMU)
Dr. Ashwin Patni (Assistant Professor, E&C, IITE, IU)
30th July, 2022
2. Highlights of Synopsis
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
4. Introduction of Image Classification
Methods of feature selection:
Exhaustive Search, Branch and Bound Search, Relaxed Branch and Bound, Selecting Best Individual Features, Sequential Forward Selection (SFS), Sequential Backward Selection (SBS), Sequential Floating Forward Search (SFFS), Sequential Floating Backward Search, Max-Min approach, etc. (a short SFS example follows this slide)
Classification of image features:
• Color features: histogram, color moments (CM), Color Coherence Vector (CCV), color correlogram
• Texture features: Grey Level Co-occurrence Matrix, edge detection, Laws' texture energy measures
• Shape features: binary image algorithms, horizontal and vertical segmentation
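As an illustration of one method from the list above, here is a minimal Python sketch of Sequential Forward Selection wrapped around a linear SVM; the data is synthetic and the use of scikit-learn's SequentialFeatureSelector is an assumption, not the procedure used in the thesis.

```python
# A minimal SFS sketch on invented data (requires scikit-learn >= 0.24).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Greedily add one feature at a time until 5 features are selected
sfs = SequentialFeatureSelector(SVC(kernel="linear"),
                                n_features_to_select=5,
                                direction="forward")  # "backward" gives SBS
sfs.fit(X, y)
print("selected feature indices:", np.flatnonzero(sfs.get_support()))
```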
6. Classification Techniques
Supervised:
• Distribution-free: Euclidean classifier, K-nearest neighbour, minimum distance, decision tree
• Statistical: techniques based on probability distribution models, which may be parametric or nonparametric
Unsupervised (clustering):
• No extensive prior knowledge required
• Unknown, but distinct, spectral classes are generated
• Limited control over classes and identities
• No detailed information
7. Challenges in Image Classification
• A large number of classes reduces accuracy.
• In practice, most high-dimensional datasets do not follow a normal distribution; hence, a linear kernel fails to classify the images.
• The bag-of-words representation cannot capture spatial information.
• Dense feature representations are difficult to learn from.
• The linear SVM algorithm is not suitable for large datasets.
8. Highlights of Synopsis
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
9. Motivation from literature
Over the past few years, the classification and recognition of visual scenes have gained importance. There are three main components involved:
1. Point-of-interest detection
2. Description of the region of interest (feature based)
3. Classification (kernel based)
Feature based:
To solve multiclass recognition problems, many supervised [1][2][3][4] and unsupervised [5][6][7][8] techniques are used with sparse dictionaries. The state of the art is accompanied by results on standard benchmark datasets, i.e. Caltech-101 [9], Caltech-256 [10], and Scene-15 [11].
As reported in [8], vector quantization is used to generate sparse codes with maximum pooling. With this approach, the computational complexity of the SVM is significantly reduced from O(n²) to O(n).
10. Motivation from literature
Feature based:
[12] suggested a method for multi-scale spatial latent semantic analysis based on sparse coding. Spatial pyramid matching of image segments is used to extract the target's spatial position information, and feature soft quantization based on sparse coding is utilised to produce a co-occurrence matrix, which increases the accuracy of the original feature representation.
For matching multilevel detail locally in the learning and recognition stages, multi-resolution pyramids were introduced in the SIFT (P-SIFT) feature space in [13]. The P-SIFT experiments showed positive results for streamlined processing.
The authors of [14] experimented with a SIFT-based classification technique, in which SIFT descriptors are clustered using KNN to build a dictionary, and SPM is then used to generate the feature vector.
11. Motivation from literature
Feature based:
Across all these studies, the authors did not report the effect of the SIFT parameters on their algorithms.
Table 1 lists the parameters controlling the SIFT features. The majority of experiments in the literature use default values without tuning them for each task.
As part of the first experiment, we investigated SIFT parameters in a sparse dictionary-based approach to image classification, as suggested by Yang et al. [8].
12. Motivation from literature
Kernel based:
The combination of various descriptors employing multiple-kernel SVM was introduced in [15] and demonstrated a significant improvement on various scene classifications.
The authors of [16] proposed a multilabel least-squares SVM method. For the multi-label scene classification problem, they used a multi-kernel RBF-based SVM. The classifier was validated on four datasets, with a maximum accuracy of 85%.
Kancherla et al. [17] validated the effect of the kernel in SVM. They simulated the algorithm on 3- to 4-class datasets and used different feature sets with various SVM kernels. On the MIT dataset, they found that the RBF kernel outperforms the other kernels, with a classification rate of 82.06 percent.
13. Motivation from literature
Kernel based:
[18] presented an SVM-based scene classification method for robotic applications. Robotic development necessitates quick execution; as a result, heuristic metric-based key points were identified from the captured scene and used in the SVM model. They conclude that combining local binary pattern and SURF features with SVM yielded higher accuracy than a VGG-based neural network model.
To classify hyperspectral images, [20] proposed a hybrid approach of spatial, spectral, and semantic features. Gabor-based structural features are combined with morphological spatial features and semantic features based on K-means and entropy. A composite kernel corresponding to these three features is then created, achieving an accuracy of 98%.
Conversely, on a large dataset, SVM outperforms NN when features are interpreted geometrically. Real-world scene classification was achieved with the combination of dense SIFT, color SIFT, and structure similarity, as well as localized multikernel neural networks [23].
14. Motivation from literature
Kernel based:
Overall, multi-kernel SVMs have proved essential in many recognition and classification applications. Despite the advantage of multikernel over CNN approaches for classifying scenes among a large number of categories, further improvement is needed to reduce the misclassification rates on databases containing many classes.
In addition, robust features can be achieved if redundancy is minimized and the SVM kernel is designed with optimized parameters consistent with these feature sets.
15. Highlights of Synopsis
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
16. Objective and Scope of the work
Objectives of the study:
• Check the effectiveness of sparse data in image classification.
• Address which size and type of dictionary are best for a large-scale dataset.
• Select robust features that can address this problem.
• Study how linear vs. non-linear kernels of the traditional SVM classifier perform on a large-scale dataset.
• Find possibilities for reducing computational cost, compared to modern neural networks, while maintaining satisfactory accuracy.
• Examine the pros and cons of traditional machine learning over modern deep learning algorithms.
17. Objective and Scope of the work
Scope of the work:
In machine vision, there is no rigorous study of tuning the well-proven SIFT feature for classification tasks. Our study suggests that the SIFT feature can be tuned to the problem, and that the features can be sparsified by matching them to an appropriately sized dictionary. Any traditional machine learning approach can take advantage of this feature set to compete with modern deep learning algorithms, whose requirements for training data, training time, and computational hardware are higher.
18. Highlights of Synopsis
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
19. Problem definition
Image classification problems include intra-class variation, scale variation, viewpoint variation, occlusion, lighting, background clutter, etc. Feature selection, kernels, classifiers, machine learning, and deep learning algorithms can be applied.
To date, it has been difficult to apply any of these methodologies to large-scale data while preserving accuracy.
Sparse representation has shown significant potential in dealing with these challenges.
Traditional classification techniques that use sparse representations lack image label information. The primary flaw of current deep learning techniques is their excessively expensive training effort. Integrating existing sparse representation technologies into deep learning is a valuable unresolved topic.
20. Problem definition
We present a methodology for bridging sparse coding and machine learning algorithms and show its performance on large datasets. The research aims to enhance classification accuracy on large multi-class datasets.
Sparse image features and machine learning will be used for classification.
Another sub-objective is to optimize machine learning speed and class detection at an appropriate accuracy.
21. Problem statement in summary
1. Classification accuracy in multiclass settings is still difficult to achieve with existing techniques.
2. Computational time is the second concern, to be optimized along with 1.
3. A sparse and ML-based approach to classification will be explored.
4. The expected outcome is an efficient algorithm that satisfies 1-3.
5. Targeted benchmark datasets: Caltech-101, Caltech-256, Scene-15
22. Highlights of Synopsis
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
23. Original contribution by the thesis
The impact of dictionary size and type:
• KSVD converges quickly
• 16x16 image patch size
• Over-complete dictionary of size 256x1024
Parameterizing SIFT (T-SIFT):
• A SIFT descriptor size of 128 is insufficient for all data sizes; SIFT can be customized
• A 256-size descriptor with 16 angles and 4 SIFT bins is sufficient (Table 3)
• T-SIFT is more robust
• T-SIFT outperforms CNN in hardware, training time, and training data requirements
Multi-kernel SVM with tuned SIFT:
• The Gaussian kernel outperforms the polynomial kernel and its fusion
• Improvement on Caltech-101: 4%; on Scene-15: 10%
• Caltech-256 is difficult to train with minimal hardware
• T-SIFT with MKL SVM is a novel method
24. Original contribution by the thesis
• The impact of dictionary size and type
• Parameterizing SIFT (T-SIFT)
• Multi-kernel SVM with tuned SIFT
Summary:
This thesis makes a distinctive contribution by providing recommendations for choosing the parameter values of the dictionary and SIFT.
When contrasted with the prior art, the use of tunable SIFT in Sparse coded Spatial Pyramid Matching (ScSPM) with multi-kernel nonlinear Support Vector Machines (SVM) produces significant gains in classification accuracy.
In addition, the uniqueness of the contribution can be seen in the studies referenced in the bibliography.
25. Highlights of Synopsis
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
26. Methodologies of Research and Results
Hardware: Intel Core i3 @ 2.50 GHz, 8 GB RAM, 64-bit Windows 10
First Method: SIFT feature analysis and T-SIFT implementations
• Phase 1: the impact of dictionary size and type
• Phase 2: parameterizing SIFT (T-SIFT)
Second Method: Sparse coded SPM with multi-kernel SVM implementation
27. Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Figure-1: Proposed tunable SIFT ScSPM
28. Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
There were two phases of the study for the first method:
1 - Dictionary learning
2 - Training the classifier.
29. Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 1 - Dictionary learning
30. Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 1 - Dictionary learning
31. Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 2 - Training the classifier.
32. Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 2 - Training the classifier.
33. Methodologies of Research and Results
First Method: SIFT feature analysis and T-SIFT implementations
Phase 2 - Training the classifier.
34. Methodologies of Research and Results
Second Method: Sparse coded SPM with multi-kernel SVM implementation
35. Methodologies of Research and Results
Second Method: Sparse coded SPM with multi-kernel SVM implementation
In this experiment, we used kernel weights d_m to solve the convex optimization problem stated in equation 7 using SVM, as proposed in [30]. To obtain the kernel weights d, the fusions of the kernels with the weights of the respective coefficients are listed in Tab. 4.
36. Methodologies of Research and Results
Second Method: Sparse coded SPM with multi-kernel SVM implementation
37. Highlights of Synopsis
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
38. Conclusion and Future work
The size and sparsity of the dictionary are determined by the SIFT parameters. Therefore, in the first experiment we presented the effect of orientations and orientation bins on the size and sparsity of the feature vectors.
By reducing the average number of coefficients, the study concludes that 30 iterations are sufficient to achieve maximum sparsity in the dictionary.
After obtaining the maximum sparsity of the dictionary, the effect of dictionary size on overall classification accuracy is examined.
Further investigation found that classification accuracy is lower for low values of either orientations or orientation bins in histogram formation. As a result, an appropriate choice of these two parameters boosts performance, as described in the first method. Linear SVM kernels were used in this empirical study.
39. Conclusion and Future work
Second, we investigated the fusion of non-linear Multi-Kernel Learning (MKL).
Although CNNs have achieved high popularity in classification models, they require a lot of training time and computational power.
SVM has greater flexibility in characterization than CNN if a suitable kernel is used for challenging datasets. With a single kernel, it is limited to linearly separable datasets.
Therefore, a multi-kernel SVM was re-examined with the aim of optimizing the kernels and studying the various parameters affecting kernel performance in classification.
The role of the various parameters was investigated to eliminate duplicate features in the evaluation of simple MKL over ScSPM features for classification accuracy.
40. Conclusion and Future work
The effect of MKL on overall classification accuracy is presented after obtaining the maximum sparsity of the dictionary. Even with the simplest combination of a single kernel type, such as the polynomial, as represented in Tab. 4, accuracy is greater than with the single-kernel SVM method.
For the 101-class dataset, using several combinations of Gaussian kernels improved classification accuracy to 85.72 percent.
With an increasing number of Gaussian kernels, training time and storage needs grow, making it impractical to work on huge datasets like Caltech-256 with minimal hardware.
Overall, we conclude that working with strong features and multiple kernels for object identification is still an open area. We will investigate the impact of this feature on similar classes in the future.
41. Highlights of Synopsis
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
42. List of publications
1. Gajjar, Bhavinkumar, Hiren Mewada, and Ashwin Patani. "Parameterizing SIFT and sparse dictionary for SVM based multi-class object classification," International Journal of Artificial Intelligence 19 (2021): 95-108. http://www.ceser.in/ceserp/index.php/ijai/article/view/6647 (SCOPUS)
2. Gajjar, Bhavinkumar, Hiren Mewada, and Ashwin Patani. "Sparse coded spatial pyramid matching and multikernel integrated SVM for non-linear scene classification," Journal of Electrical Engineering 72.6 (2021): 374-380. https://doi.org/10.2478/jee-2021-0053 (SCOPUS)
43. Results for Matching Pursuit Algorithm
Test images: 1. cameraman.tif, 2. rice.png, 3. circlesBrightDark.png, 4. liftingBody.png
Dict1: Discrete Wavelet
Dict2: DCT and Kronecker Delta
Dict3: Haar Wavelet Packets and DCT
Dict4: K-SVD
Arpan Patel, "Image Classification with sparse coding and machine learning," thesis, CSPIT, 2017.
46. Features (+Sparse)                                | Kernel function of classifier | Classification techniques (ML)
Speeded Up Robust Features (SURF)                     | Linear                        | K-Means
Features from Accelerated Segment Test (FAST)         | RBF                           | SVM
Binary Robust Independent Elementary Features (BRIEF) | Polynomial                    | K-nearest neighbour (KNN)
Oriented FAST and Rotated BRIEF (ORB)                 | Sigmoid                       | Artificial Neural Network (ANN)
Histogram of Oriented Gradients (HOG)                 | ...                           | Convolutional Neural Network (CNN)
...                                                   | ...                           | ...
Good features + classification techniques + kernels of the classifier → accuracy? computation time?
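For illustration, a minimal sketch of extracting two of the local features listed above with OpenCV; the image file name is hypothetical, and SIFT requires opencv-python 4.4 or later.

```python
# A minimal feature-extraction sketch; "scene.jpg" is an assumed local file.
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                 # 128-D float descriptors
kp_sift, desc_sift = sift.detectAndCompute(img, None)

orb = cv2.ORB_create(nfeatures=500)      # 32-byte binary descriptors
kp_orb, desc_orb = orb.detectAndCompute(img, None)

print("SIFT:", desc_sift.shape, "ORB:", desc_orb.shape)
```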
48. What is Sparse?
A sparse matrix is one in which the majority of the values are zero. The proportion of zero elements to non-zero elements is called the sparsity of the matrix. The opposite of a sparse matrix, in which the majority of values are non-zero, is called a dense matrix.
5  0  0  0
0 11  0  0
0  0 25  0
0  0  0  7
Sparsity = 3 (12 zeros / 4 non-zeros)
Advantages:
• Saves a significant amount of memory
• Speeds up the processing of the data
• Reduces computation time by eliminating operations on zero elements
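A minimal NumPy/SciPy sketch of the same example, computing sparsity as the ratio of zero to non-zero elements defined above:

```python
# Sparsity of the 4x4 example matrix above, with sparse storage via SciPy.
import numpy as np
from scipy import sparse

A = np.array([[5, 0, 0, 0],
              [0, 11, 0, 0],
              [0, 0, 25, 0],
              [0, 0, 0, 7]])

S = sparse.csr_matrix(A)            # compressed sparse row storage
n_nonzero = S.nnz                   # 4
n_zero = A.size - n_nonzero         # 12
print("sparsity =", n_zero / n_nonzero)   # 3.0
```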
53. Support vector machine
• Can be applied to almost everything: classifications or numerical predictions
• Widely used in pattern recognition:
  o Identifying cancer or genetic diseases
  o Text classification: classifying texts based on the language
  o Detecting rare events: earthquakes or engine failures
54. A linearly separable problem: we have two features (x1, x2) and some data points.
55. We want to find a hyperplane, in this case a line, that separates the different data points with the maximum margin.
58. Support vectors
Support vectors are the points from each class that are closest to the maximum-margin hyperplane; each class has at least one support vector.
59. Support vectors
With the support vectors alone it is possible to reconstruct the hyperplane, which is very useful: we can store the classification model even when we have millions of features.
60. How do we find the hyperplane when the problem is linearly separable? With convex hulls.
61. How do we find the hyperplane when the problem is linearly separable? With convex hulls.
A convex hull is the smallest convex set that contains all the points. The hyperplane is the perpendicular bisector of the shortest line between the two hulls.
62. Mathematical approach
w · x + b = 0 is the equation of a hyperplane in n dimensions (in 2D: y = m·x + b), where w = (w1, w2, ..., wn) are the so-called weights and x = (x1, x2, ..., xn) are the features.
The aim of the SVM algorithm is to find the weights w so that the data points are separated accordingly:
w · x + b > +1 for one class
w · x + b < -1 for the other class
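A minimal sketch (toy data invented here) of fitting a linear SVM with scikit-learn and reading back the learned hyperplane parameters w and b:

```python
# Fit a linear SVM on six invented 2-D points and recover w.x + b = 0.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],   # class +1
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]   # hyperplane parameters
print("w =", w, "b =", b)
# Separated training points outside the margin have decision
# values w.x + b >= +1 or <= -1, matching the constraints above.
print(clf.decision_function(X))
```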
63. How do we find the hyperplane in 2D? With convex hulls.
The two planes H0 and H1, defined by the equations w · x + b = -1 and w · x + b = +1, are separated by a distance d.
64. Mathematical approach
Vector geometry gives the distance between the two planes as 2 / ||w||, where ||w|| is the Euclidean norm (distance from 0).
We want to make the distance as large as possible, so we want to minimize the norm of w. We usually minimize (1/2)||w||². Quadratic optimization solves this problem!
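Continuing the same toy setup, a minimal sketch computing the margin width 2 / ||w|| from a fitted linear SVM:

```python
# Margin width of a (nearly) hard-margin linear SVM on the toy data.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin
w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)              # distance between H0 and H1
print("margin width =", margin)
```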
65. Non-linear spaces
In many real-world applications, the relationships between variables are non-linear. A key feature of SVMs is their ability to map the problem into a higher-dimensional space using a process known as the "kernel trick": a non-linear relationship may suddenly appear to be quite linear.
66. When the problem is not linearly separable, we have to use slack variables a_i.
67. Mathematical approach
We minimize: (1/2)||w||² + C Σ_i a_i
where C is a cost parameter applied to all points that violate the constraints, and the a_i are the slack variables. We perform the optimization on this cost function.
We can tune the C parameter to modify the penalty for misclassified data points:
• If C is very large, the algorithm tries to find a 100% separation.
• If C is low, a wider overall margin is allowed, with more misclassified data points.
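A minimal sketch (synthetic overlapping data) showing how C trades margin width against training violations:

```python
# Smaller C allows a wider margin with more violations (more support
# vectors); larger C pushes toward stricter separation.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1.2, (50, 2)),      # class -1, overlapping
               rng.normal(2, 1.2, (50, 2))])     # class +1
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C:>6}: margin={margin:.2f}, "
          f"support vectors={clf.n_support_.sum()}")
```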
70. Kernel
With the kernel function we can transform the problem into a linearly separable one in a higher-dimensional space (e.g. mapping latitude/longitude data into a space with altitude as the added dimension).
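A minimal sketch of this idea with an explicit feature map (synthetic circular data): adding x1² + x2² as a third feature by hand does what a kernel would do implicitly.

```python
# Points inside vs. outside a circle are not linearly separable in 2-D,
# but become separable by a plane after lifting to 3-D.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)   # inside circle -> 1

X_lifted = np.hstack([X, (X[:, 0] ** 2 + X[:, 1] ** 2)[:, None]])

acc_2d = SVC(kernel="linear").fit(X, y).score(X, y)               # poor
acc_3d = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)  # ~1.0
print(f"linear SVM in 2D: {acc_2d:.2f}, after lifting to 3D: {acc_3d:.2f}")
```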
90. Kernel
In the higher-dimensional space, the SVM learns concepts that were not explicitly measured in the original data!
91. Kernel functions
Φ(x), the "phi function", is the mapping of the data x into another space; K(x_i, x_j) is the kernel function.
• Linear kernel (does not transform the data): K(x_i, x_j) = x_i · x_j
• Polynomial kernel: K(x_i, x_j) = (x_i · x_j + 1)^d
• Gaussian RBF kernel: K(x_i, x_j) = exp(-||x_i - x_j||² / (2σ²))
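A minimal NumPy sketch of the three kernel functions listed above:

```python
# Direct implementations of the linear, polynomial, and Gaussian RBF kernels.
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj                              # K = xi . xj

def polynomial_kernel(xi, xj, d=3):
    return (xi @ xj + 1.0) ** d                 # K = (xi . xj + 1)^d

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    diff = xi - xj
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))  # K = exp(-||xi-xj||^2/(2s^2))

xi, xj = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(xi, xj), polynomial_kernel(xi, xj),
      gaussian_rbf_kernel(xi, xj))
```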
92. Advantages and disadvantages
Advantages:
• SVM can be used for regression problems as well as for classifications
• Not overly influenced by noisy data
• Easier to use than neural networks
Disadvantages:
• Finding the best model requires testing various combinations of kernels and model parameters
• Quite slow, especially when the input dataset has a large number of features
93. Classification test results (Caltech 101)
Algorithm / No. of classes | 2 classes (bonsai and car side) | 5 classes | 20 | 40 | 80 | 101
Sparse+SIFT+SVM            | 100%   | 95.38% | 79.19% | 76.07% | 75.26% | 73.13%
Sparse+SVM                 | 47.37% | 43.10% | -      | -      | -      | -
SIFT+SVM                   | 56.56% | 52.60% | -      | -      | -      | -
94. Overview: Kernel-based learning
Kernel design maps the lower-dimensional input space into a higher-dimensional feature space.
• The kernel measures the similarity between data points.
• The kernel transformation helps in using a linear separation algorithm, such as Support Vector Classification (SVC), in higher dimensions.
95. Overview: Kernel-based learning
The same data can have elements that show different patterns; the best kernel is a linear combination of different kernels.
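A minimal sketch of this idea (synthetic data, fixed illustrative weights): several base kernels are combined into one Gram matrix and passed to an SVM with a precomputed kernel. In MKL, as used with SVM in [30], the weights d_m would themselves be optimized rather than fixed by hand.

```python
# Weighted-sum kernel combination fed to an SVM via a precomputed Gram matrix.
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 16))
y = (X[:, 0] + np.sin(X[:, 1]) > 0).astype(int)

d = [0.2, 0.3, 0.5]                 # assumed kernel weights (sum to 1)
K = (d[0] * linear_kernel(X, X)
     + d[1] * polynomial_kernel(X, X, degree=2)
     + d[2] * rbf_kernel(X, X, gamma=0.5))

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```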
99. Kernels used
• Linear kernel (does not transform the data): K(x_i, x_j) = x_i · x_j
• Polynomial kernel: K(x_i, x_j) = (x_i · x_j + 1)^d
• Gaussian RBF kernel: K(x_i, x_j) = exp(-||x_i - x_j||² / (2σ²))
101. Comparison with other methods for the Caltech 101 dataset
Algorithm                | 15 training images/class | 30 training images/class
Zhang et al. [1]         | 59.10±0.60 | 66.20±0.50
KSPM [2]                 | 56.40      | 64.40±0.80
NBNN [3]                 | 65.00±1.14 | 70.40
ML+CORR [4]              | 61.00      | 69.60
KC [5]                   | -          | 64.14±1.18
LSPM [6]                 | 53.23±0.65 | 58.81±1.51
ScSPM [6]                | 67.0±0.45  | 73.2±0.54
DMKDL [7]                | -          | 82.66±0.36*
MKLDPL [7]               | -          | 86.81±0.21*
Our method (best result) | 69.29±0.98 | 75.70±1.30
* 30 images for training and 15 images for testing
102. Future Work
• Try other kernels with different L-norms
• Work on two more datasets: Caltech-256 and Scene-15
• Understand the effect of the cost function on different datasets in SVM
• Divide the training and testing data with a standard split for all classes and check performance
• Publish a paper on the above results
103. Pipeline and accuracy summary (Caltech-101, Caltech-256, Scene-15)
Feature extraction (SIFT, LBP, etc.) → dictionary learning (KSVD, SimCO) → sparse coding (OMP, MP, BP) → SVM (multikernel, cost function). The SVM is trained on the training features and labels, then evaluated on the testing features against the testing labels to produce classified labels.
% accuracy:
• Using LBP: 63
• Using SIFT: 65
• Fusion of SIFT+LBP: -
• SPM+SIFT: ~77
• Using SimCO: ~68
• Using OMP: ~66
• Using KSVD: ~73
• Multikernel: ~75.70
• Single kernel: ~69.71
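A minimal sketch of the dictionary learning and sparse coding stages of this pipeline, using scikit-learn's dictionary learner and OMP coder as stand-ins for the KSVD/SimCO implementations used in the thesis; the descriptors here are synthetic placeholders for SIFT features.

```python
# Learn an over-complete dictionary, then sparse-code descriptors with OMP.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(3)
descriptors = rng.normal(size=(2000, 128))   # stand-in for SIFT descriptors

# Over-complete dictionary: 256 atoms for 128-D features
dico = MiniBatchDictionaryLearning(n_components=256, alpha=1.0,
                                   batch_size=64, random_state=0)
D = dico.fit(descriptors).components_        # shape (256, 128)

# Orthogonal Matching Pursuit with at most 10 non-zeros per descriptor
codes = sparse_encode(descriptors, D, algorithm="omp",
                      n_nonzero_coefs=10)    # shape (2000, 256)
print("avg non-zeros per code:", (codes != 0).sum(axis=1).mean())
```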
104. Sparse formulation of the feature vector
Attractive properties of sparse coding:
• First, compared with VQ coding, SC coding can achieve a much lower reconstruction error due to its less restrictive constraint.
• Second, sparsity allows the representation to be specialized and to capture the salient properties of images.
• Third, research in image statistics clearly reveals that image patches are sparse signals.
113. Key parameters of the SIFT feature
A 2 × 2 descriptor array computed from an 8 × 8 set of samples.
[Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110.]
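A minimal sketch of how the SIFT descriptor length follows from these parameters, consistent with Lowe's 128-D default (4×4 spatial grid, 8 orientations) and the 256-D T-SIFT setting (4 bins, 16 angles) reported earlier in this synopsis:

```python
# Descriptor length = (spatial bins)^2 x (orientation bins).
def sift_descriptor_length(n_spatial_bins: int, n_orientations: int) -> int:
    return n_spatial_bins * n_spatial_bins * n_orientations

print(sift_descriptor_length(4, 8))    # 128: Lowe's default SIFT
print(sift_descriptor_length(4, 16))   # 256: the T-SIFT setting reported here
```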