Synopsis
(Electronics and Communication Engineering)
Sparse based feature parameterization and multi kernel SVM for large scale scene classification
By
Gajjar Bhavinkumar (IU1571090002)
30th July, 2022
Under the supervision of
Dr. Hiren Mewada (Associate Professor, EE, PMU)
Dr. Ashwin Patni (Assistant Professor, E&C, IITE, IU)
1. Introduction of Image Classification
2. Problem Definitions
3. Objective and Scope of the work
4. Motivation from literature
5. Original Contribution by the thesis
6. Methodologies of Research and Results
7. Conclusion and Future work
8. List of publications
9. References
Highlights of Synopsis
Introduction of Image Classification
Methods of Feature Selection***
Exhaustive Search, Branch and Bound Search, Relaxed Branch and Bound, Selecting Best Individual Features, Sequential Forward Selection (SFS), Sequential Backward Selection (SBS), Sequential Floating Forward Search (SFFS), Sequential Floating Backward Search, Max-Min approach, etc.
Classification of Image Features
- Color Features: Histogram, Color Moment (CM), Color Coherence Vector (CCV), Color Correlogram
- Texture Features: Grey Level Co-occurrence Matrix, Edge Detection, Laws' Texture Energy Measures
- Shape Features: Binary image algorithm, Horizontal and vertical segmentation
What is Image/object Classification?
Classification Techniques
Classification
- Supervised
  - Distribution Free: Euclidean classifier, K-nearest neighbour, Minimum distance, Decision Tree
  - Statistical: techniques based on probability distribution models, which may be parametric or nonparametric
- Unsupervised
  - Clustering: no extensive prior knowledge required; unknown, but distinct, spectral classes are generated; limited control over classes and their identities; no detailed information
Challenges in Image Classification
• A large number of classes reduces the accuracy.
• In real time, most high-dimensional datasets do not follow a normal distribution; hence, a linear kernel fails to classify images.
• Bag-of-words representations cannot capture spatial information.
• Dense feature representations are difficult to learn from.
• The linear SVM algorithm is not suitable for large datasets.
Motivation from literature
Over the past few years, the classification and recognition of visual scenes have gained importance. There are three main components involved:
1. Point of interest detection
2. Description of region of interest (feature based)
3. Classification (kernel based)
Feature Based:
To solve multiclass recognition problems, many supervised [1][2][3][4] and unsupervised [5][6][7][8] techniques have been used with sparse dictionaries. The state-of-the-art is accompanied by results on standard benchmark datasets, i.e. Caltech-101 [9], Caltech-256 [10], and Scene-15 [11].
As reported in [8], vector quantization is used to generate sparse codes with maximum pooling. By using this approach, the computational complexity of SVM is significantly reduced from O(n²) to O(n).
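As a rough illustration (our own Python sketch with illustrative shapes, not the configuration of [8]), max pooling collapses the sparse codes of all local descriptors of an image into one fixed-length vector that a linear SVM, whose training scales as O(n), can consume in place of a nonlinear kernel SVM scaling as O(n²):

```python
import numpy as np

def max_pool_sparse_codes(A):
    """A: (n_atoms, n_descriptors) sparse codes of one image's descriptors."""
    return np.abs(A).max(axis=1)  # one pooled value per dictionary atom

rng = np.random.default_rng(0)
# Stand-in sparse codes: ~5% non-zero entries over 1024 atoms, 500 descriptors
A = rng.random((1024, 500)) * (rng.random((1024, 500)) < 0.05)
z = max_pool_sparse_codes(A)  # image-level feature vector for a linear SVM
print(z.shape)                # (1024,)
```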
Motivation from literature
Feature Based:
[12] suggested a method for multi-scale spatial latent semantic analysis based on sparse coding. Spatial pyramid matching of image segments is used to extract the target's spatial position information, and feature soft quantization based on sparse coding is utilised to produce a co-occurrence matrix, which increases the accuracy of the original feature representation.
For matching multilevel detail locally in the learning and recognition stages, multi-resolution pyramids were introduced in the SIFT (P-SIFT) feature space in [13]. This P-SIFT experiment showed positive results for streamlined work.
The authors of [14] experimented with a classification technique based on SIFT, in which SIFT descriptors are clustered using KNN to build a dictionary, and SPM is then used to generate a feature vector.
Motivation from literature
Feature Based:
Across all these studies, the authors did not report the effect of SIFT parameters in their algorithms. Table 1 lists the parameters controlling the SIFT features. The majority of experiments in the literature use default values without tuning them for each task.
As part of the first experiment, we investigated SIFT parameters in a sparse-based dictionary approach for image classification as suggested by Yang et al. [8].
Motivation from literature
Kernel based:
The combination of various descriptors employing multiple-kernel SVM was introduced in [15] and demonstrated a significant improvement across various scene classifications.
The authors of [16] proposed the multilabel least-squares SVM method. For the multi-label scene classification problem, they used a multi-kernel RBF-based SVM. The classifier was validated on four datasets, with a maximum accuracy of 85%.
Kancherla et al. [17] validated the effect of the kernel in SVM. They simulated the algorithm on 3- to 4-class datasets and used different feature sets with various SVM kernels. On the MIT dataset, they discovered that the RBF kernel outperforms other kernels with a classification rate of 82.06 percent.
Motivation from literature
Kernel based:
[18] presented an SVM-based scene classification method for robotic applications, which necessitate quick execution. As a result, heuristic metric-based key points were identified from the captured scene and used in the SVM model. They conclude that combining local binary pattern and SURF features with SVM yielded higher accuracy than a VGG-based neural network model.
To classify hyperspectral images, [20] proposed a hybrid approach of spatial, spectral, and semantic features. Gabor-based structural features are combined with morphological-based spatial features and semantic features based on K-means and entropy. A composite kernel is then created that corresponds to these three features, achieving an accuracy of 98%.
Conversely, on a large dataset, SVM outperforms NN when features are interpreted geometrically. Real-world scene classification was achieved with the combination of dense SIFT, color SIFT, and structure similarity, as well as localized multikernel neural networks [23].
Motivation from literature
Kernel based:
Overall, multi-kernel SVMs have proved essential in many recognition and classification applications. Despite the advantage of multikernel over CNN approaches for classifying scenes amongst a large number of categories, further improvement is needed to reduce the misclassification rates on databases containing many classes.
In addition, robust features can be achieved if redundancy is minimized and the SVM kernel is designed with optimized parameters consistent with these feature sets.
Objective and Scope of the work
Objective of study:
- Check the effectiveness of sparse data in image classification.
- Address which size and type of dictionary are best for large-scale datasets.
- Select robust features that can address this problem.
- Examine how linear vs. non-linear kernels of the traditional SVM classifier perform on large-scale datasets.
- Find possibilities for reducing computational cost, compared to modern neural networks, while maintaining satisfactory accuracy.
- Examine the pros and cons of traditional machine learning versus modern deep learning algorithms.
Objective and Scope of the work
Scope of the work:
In machine vision, there is no rigorous study of tuning the well-proven SIFT feature for the classification task. Our study suggests that the SIFT feature can be tuned according to the problem, and that those features can be sparsified by choosing an appropriately sized dictionary. Any traditional machine learning approach can take advantage of this feature set to compete with modern deep learning algorithms, whose requirements for training data, training time, and computational hardware are higher.
Problem definition
- Image classification problems include intra-class variation, scale variation, viewpoint variation, occlusion, lighting, background clutter, etc. Feature selection, kernels, classifiers, machine learning, and deep-learning algorithms can be applied.
- To date, it has been difficult to apply any of these methodologies to large-scale data while preserving accuracy.
- Sparse representation has shown significant potential in dealing with these challenges.
- Traditional classification techniques that use sparse representations lack image label information. The current deep learning technique's primary flaw is its excessively expensive training effort. Integrating existing sparse representation technologies into deep learning remains a valuable unresolved topic.
Problem definition
- We present a methodology for bridging sparse coding and machine learning algorithms and show its performance on large datasets. The research aims to enhance multi-class classification accuracy on large datasets.
- Sparse image features and machine learning will be used for classification.
- Another sub-objective is to optimize machine learning speed and class detection with appropriate accuracy.
Problem statement in summary
1. Classification accuracy in multiclass settings is still difficult to achieve with existing techniques.
2. Computational time is a second concern, to be optimized alongside 1.
3. A sparse- and ML-based approach to classification will be explored.
4. The expected outcome is an efficient algorithm that satisfies 1-3.
5. Targeted benchmark datasets: Caltech-101, Caltech-256, Scene-15.
Original contribution by the thesis
The impact of dictionary size and type
- converges quickly
- KSVD
- 16x16 image patch size
- Over-complete dictionary of size 256x1024
Parameterizing SIFT (T-SIFT)
- A SIFT descriptor size of 128 is insufficient for all data sizes
- SIFT can be customized
- A 256-size descriptor with 16 angles and 4 SIFT bins is sufficient (Table-3)
- T-SIFT is more robust
- T-SIFT outperforms CNN in hardware, training time, and training data requirements
Multi-kernel SVM with Tuned SIFT
- The Gaussian kernel outperforms the Polynomial kernel and its fusion
- Improvement on Caltech-101: 4%; Scene-15: 10%
- Caltech-256 is difficult to train with minimal hardware
- T-SIFT with MKL SVM is a novel method
Original contribution by the thesis
The impact of dictionary size and type; Parameterizing SIFT (T-SIFT); Multi-kernel SVM with Tuned SIFT
Summary:
- This thesis presents a distinctive contribution by providing recommendations for modifying the parameter values chosen for the dictionary and SIFT.
- When contrasted with the prior art, using Tunable SIFT in Sparse coded Spatial Pyramid Matching (ScSPM) with multi-kernel nonlinear Support Vector Machines (SVM) produces significant gains in classification accuracy.
- In addition, the uniqueness of the contribution can be seen in the studies referenced in the bibliography.
Methodologies of Research and Results
All experiments ran on an Intel Core i3 machine at 2.50 GHz with 8 GB RAM and 64-bit Windows 10.
First method: SIFT feature analysis and T-SIFT implementations
- Phase 1: the impact of dictionary size and type
- Phase 2: parameterizing SIFT (T-SIFT)
Second method: Sparse coded SPM with multi-kernel SVM implementation
First Method: SIFT feature analysis and T-SIFT implementations
Figure-1: Proposed tunable SIFT ScSPM
There were two phases of the study for the first method:
1 - Dictionary learning
2 - Training the classifier
Second Method: Sparse coded SPM with multi-kernel SVM implementation
In this experiment, we used the kernel weights d_m to solve the convex optimization problem stated in equation-7 using SVM as proposed in [30]. To obtain the kernel weights d, the fusions of kernels with the weights of the respective coefficients are listed in Tab. 4.
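For illustration, a minimal Python sketch of a fixed-weight multi-kernel SVM using scikit-learn's precomputed-kernel interface. The weights d_m, kernel parameters, and data below are illustrative assumptions, not the tuned values of Tab. 4; SimpleMKL-style training [30] would instead learn d_m by solving the convex problem of equation-7.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def combined_kernel(Xa, Xb, d):
    """Weighted sum K = sum_m d_m * K_m over a few base kernels."""
    parts = [rbf_kernel(Xa, Xb, gamma=0.5),
             rbf_kernel(Xa, Xb, gamma=2.0),
             polynomial_kernel(Xa, Xb, degree=2)]
    return sum(dm * Km for dm, Km in zip(d, parts))

d = np.array([0.5, 0.3, 0.2])            # fixed kernel weights (sum to 1)
X_train = np.random.rand(60, 128)        # stand-in for ScSPM feature vectors
y_train = np.random.randint(0, 3, 60)    # stand-in class labels
X_test = np.random.rand(10, 128)

clf = SVC(kernel='precomputed', C=10.0)
clf.fit(combined_kernel(X_train, X_train, d), y_train)
pred = clf.predict(combined_kernel(X_test, X_train, d))
```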
Conclusion and Future work
- The size and sparsity of the dictionary are determined by the SIFT parameters. Therefore, in the first experiment we presented the effect of orientations and orientation bins on the size and sparsity of the feature vectors.
- By reducing the average number of coefficients, the study concludes that 30 iterations are sufficient to achieve maximum sparsity in the dictionary.
- After obtaining the maximum sparsity of the dictionary, the effect of dictionary sizes on overall classification accuracy is examined.
- Further, it was found that classification accuracy is lower for low values of either orientations or orientation bins in histogram formation. As a result, an appropriate choice of these two parameters boosts performance, as described in the first method. Linear SVM kernels were used in this empirical study.
Conclusion and Future work
- Secondly, we investigated the fusion of nonlinear Multi-Kernel Learning (MKL). Although CNNs have achieved high popularity in classification models, they require a lot of training time and computation power.
- SVM has greater flexibility in characterization than CNN if a suitable kernel is used for challenging datasets. With a single kernel, however, it is limited to linearly separable datasets.
- Therefore, a multi-kernel SVM has been re-experimented with, with the aim of optimizing the kernels and studying the various parameters affecting kernel performance in classification.
- The role of various parameters has been investigated to eliminate duplicate features in the evaluation of SimpleMKL over ScSPM features for classification accuracy.
Conclusion and Future work
- The effect of MKL on overall classification accuracy is presented after obtaining the maximum sparsity of the dictionary. Even with the simplest combination of a single kernel type, such as Polynomial, as represented in Tab. 4, accuracy is greater than with the single-kernel SVM method.
- For the 101-class dataset, using several combinations of Gaussian kernels improved classification accuracy to 85.72 percent.
- With an increasing number of Gaussian kernels, training time and storage needs grow, making it impossible to work on huge datasets like Caltech-256 with minimal hardware.
- As a whole, we conclude that working with strong features and multi-kernels for object identification is still an open area. We will investigate the impact of this feature on similar classes in the future.
List of publications
1. Gajjar, Bhavinkumar, Hiren Mewada, and Ashwin Patani. "Parameterizing SIFT and sparse dictionary for SVM based multi-class object classification." International Journal of Artificial Intelligence 19 (2021): 95-108. http://www.ceser.in/ceserp/index.php/ijai/article/view/6647 (SCOPUS)
2. Gajjar, Bhavinkumar, Hiren Mewada, and Ashwin Patani. "Sparse coded spatial pyramid matching and multikernel integrated SVM for non-linear scene classification." Journal of Electrical Engineering 72.6 (2021): 374-380. https://doi.org/10.2478/jee-2021-0053 (SCOPUS)
Results for Matching Pursuit Algorithm
Test images: 1. cameraman.tif, 2. rice.png, 3. circlesBrightDark.png, 4. liftingBody.png
Dictionaries: Dict1 - Discrete Wavelet; Dict2 - DCT and Kronecker Delta; Dict3 - Haar Wavelet Packets and DCT; Dict4 - K-SVD
[Arpan Patel. "Image Classification with sparse coding and machine learning." Thesis. CSPIT, 2017.]
| Features (+Sparse) | Kernel function of classifier | Classification techniques (ML) |
|---|---|---|
| Speeded Up Robust Features (SURF) | Linear | K-Means |
| Features from Accelerated Segment Test (FAST) | RBF | SVM |
| Binary Robust Independent Elementary Features (BRIEF) | Polynomial | K-nearest neighbour (KNN) |
| Oriented FAST and Rotated BRIEF (ORB) | Sigmoid | Artificial Neural Network (ANN) |
| Histogram of Oriented Gradients (HOG) | | Convolutional Neural Network (CNN) |
| … | … | … |

Good features + classification techniques + kernels of classifier → Accuracy? Computation time?
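As a hedged example of one combination from this table (HOG features feeding an SVM), using scikit-image and scikit-learn with illustrative parameters of our own choosing:

```python
from skimage import data
from skimage.feature import hog

# One HOG feature vector from a built-in grayscale test image.
img = data.camera()
f = hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
print(f.shape)
# Over a dataset, one would stack such vectors into X, collect labels y,
# and train e.g. sklearn.svm.SVC(kernel='rbf').fit(X, y).
```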
Introduction of Image Classification
- Challenges: intra-class variation, scale variation, viewpoint variation, occlusion, illumination, background clutter
- Approaches: feature selection, kernels, classifiers, machine learning, and deep-learning algorithms
What is Sparse?
A sparse matrix is one in which the majority of the values are zero. The proportion of zero elements to non-zero elements is called the sparsity of the matrix. The opposite of a sparse matrix, in which the majority of values are non-zero, is called a dense matrix.

    [ 5   0   0   0 ]
    [ 0  11   0   0 ]
    [ 0   0  25   0 ]
    [ 0   0   0   7 ]

Sparsity = 3 (12 zeros / 4 non-zeros)
Advantages:
- save a significant amount of memory
- speed up the processing of that data
- reduce computation time by eliminating operations on zero elements
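A minimal illustration of the same 4x4 example using scipy.sparse (the library choice is ours, for illustration only):

```python
import numpy as np
from scipy import sparse

M = np.array([[5, 0, 0, 0],
              [0, 11, 0, 0],
              [0, 0, 25, 0],
              [0, 0, 0, 7]])
S = sparse.csr_matrix(M)        # stores only the 4 non-zero entries
print(S.nnz)                    # 4 non-zeros out of 16 elements
print(M.size - S.nnz, S.nnz)    # 12 zeros, 4 non-zeros -> "sparsity" = 3
```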
Sparse model
Sparse coding
Greedy Algorithms:
- Matching Pursuit (MP)
- Orthogonal Matching Pursuit (OMP)
- [Stagewise Orthogonal Matching Pursuit (StOMP), Subspace Pursuit (SP), Compressive Sampling Matching Pursuit (CoSaMP), Regularized Orthogonal Matching Pursuit (ROMP), Gradient Pursuit (GP), Iterative Hard Thresholding (IHT), Hard Thresholding Pursuit (HTP)]
Relaxation Algorithms:
- Basis Pursuit (BP)
- Least-Absolute-Shrinkage-and-Selection-Operator (LASSO)
- FOcal Under-determined System Solver (FOCUSS)
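As a hedged sketch of the OMP step, scikit-learn's OrthogonalMatchingPursuit can sparse-code a signal against a dictionary; the sizes and data below are illustrative, not the thesis's settings:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))                 # 64-dim signals, 256 atoms
D /= np.linalg.norm(D, axis=0)                     # unit-norm atoms
a_true = np.zeros(256)
a_true[rng.choice(256, 5, replace=False)] = 1.0    # 5-sparse ground truth
y = D @ a_true                                     # synthetic sparse signal

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5)
omp.fit(D, y)
a_hat = omp.coef_                                  # recovered sparse code
print(np.nonzero(a_hat)[0])                        # indices of selected atoms
```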
Dictionary Learning Algorithms
1. Maximum Likelihood (ML)
2. Method of Optimal Directions (MOD)
3. K-SVD
4. Simultaneous Codeword Optimization (SimCO)
The K-SVD Algorithm - General (Aharon, Elad, & Bruckstein, 2004)
Initialize D → Sparse Coding (greedy/relaxation algorithms) → Dictionary Update (DL algorithms), iterated until Y is well approximated by the learned dictionary.
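A minimal K-SVD sketch in Python, assuming an OMP sparse-coding step; the dimensions, initialization, and iteration count are illustrative, not the thesis's exact settings:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, sparsity, n_iter=30, seed=0):
    """Alternate OMP sparse coding and SVD-based dictionary updates
    (Aharon, Elad & Bruckstein, 2004). Y: (n_dims, n_signals)."""
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
    for _ in range(n_iter):
        A = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)  # coding step
        for k in range(n_atoms):
            users = np.nonzero(A[k, :])[0]         # signals using atom k
            if users.size == 0:
                continue
            # Residual without atom k's contribution, restricted to users
            E = Y[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                      # rank-1 update of atom k
            A[k, users] = s[0] * Vt[0, :]          # and of its coefficients
    return D, A

# Example: learn a 64x128 dictionary from random stand-in "patches".
Y = np.random.default_rng(1).standard_normal((64, 400))
D, A = ksvd(Y, n_atoms=128, sparsity=5, n_iter=10)
```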
Support vector machine
- Can be applied to almost everything
- Classifications or numerical predictions
- Widely used in pattern recognition
  - Identify cancer or genetic diseases
  - Text classification: classify texts based on the language
  - Detecting rare events: earthquakes or engine failures
Linearly separable problem
We have two features (x1, x2) and some data points. We want to find a hyperplane, in this case a line, that separates the different data points with the maximum margin. This is the maximum margin solution.
Support vectors
- the points from each class that are closest to the maximum margin hyperplane
- each class has at least 1 support vector
With the support vectors alone it is possible to reconstruct the hyperplane. We can store the classification model even when we have millions of features.
How to find the hyperplane when the problem is linearly separable? With convex hulls.
Convex hull: the smallest convex set that contains all the points. The hyperplane is the perpendicular bisector of the shortest line between the two hulls.
Mathematical approach
The equation of a hyperplane in n dimensions is $w \cdot x + b = 0$ (in 2D: $y = mx + b$), where $w = (w_1, w_2, \ldots, w_n)$ are the so-called weights and $x = (x_1, x_2, \ldots, x_n)$ the features.
The aim of the SVM algorithm is to find the weights w so that the data points are separated accordingly:
$w \cdot x + b > +1$ and $w \cdot x + b < -1$
The two planes H0 and H1 defined by these equations lie a distance d apart.
Mathematical approach
Vector geometry gives the distance between the two planes as $d = \frac{2}{\|w\|}$, where $\|w\|$ is the Euclidean norm (distance from 0).
We want to make the distance as large as possible, so we want to minimize the norm of w. We usually minimize $\frac{1}{2}\|w\|^2$; quadratic optimization solves this problem.
2
Non-linear spaces
 In many real-world applications, the relationships between variables are
non-linear
 A key feature of SVMs is their ability to map the problem into a higher
dimensional space using a process known as the “kernel trick”
 Non-linear relationship may suddenly appears to be quite linear
When the problem is non-linearly separable, we have to use slack variables $a_i$.
Mathematical approach
We minimize: $\frac{1}{2}\|w\|^2 + C \sum_i a_i$
C is a cost parameter applied to all points that violate the constraints; we optimize this cost function. We can tune the C parameter to modify the penalty for misclassified data points:
- C is very large → the algorithm tries to find a 100% separation
- C is low → a wider overall margin is allowed, with more misclassified data points
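A hedged illustration of tuning C with scikit-learn; the synthetic data and C values are illustrative only:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Large C pushes toward perfect separation (fewer margin violations);
# small C allows a wider margin with some misclassified points.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(C, clf.n_support_, clf.score(X, y))  # support vectors, train accuracy
```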
Kernels
Example: two weather classes, sunny and snowy, plotted over latitude and longitude. With the kernel function we can transform the problem into a linearly separable one by mapping it into a higher dimensional space (new variable: altitude).
Higher dimensional space
SVM learns concepts that were not explicitly measured in the original data.
Kernel functions
$\Phi(x)$, the "phi function", is the mapping of the data x into another space; $K(x_i, x_j)$ is the kernel function.
- Linear kernel (does not transform the data): $K(x_i, x_j) = x_i \cdot x_j$
- Polynomial kernel: $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$
- Gaussian RBF kernel: $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$
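These three kernels can be written directly in a few lines of Python (a sketch for vectors x_i, x_j; the parameter values are illustrative):

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj                       # no transformation of the data

def polynomial_kernel(xi, xj, d=3):
    return (xi @ xj + 1) ** d            # degree-d polynomial kernel

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))
```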
Advantages
- SVM can be used for regression problems as well as for classification
- Not overly influenced by noisy data
- Easier to use than neural networks
Disadvantages
- Finding the best model requires testing various combinations of kernels and model parameters
- Quite slow, especially when the input dataset has a large number of features
Classification Test Results (Caltech-101)

| Algorithm / No. of classes | 2 classes (bonsai and car side) | 5 classes | 20 | 40 | 80 | 101 |
|---|---|---|---|---|---|---|
| Sparse+SIFT+SVM | 100% | 95.38% | 79.19% | 76.07% | 75.26% | 73.13% |
| Sparse+SVM | 47.37% | 43.10% | - | - | - | - |
| SIFT+SVM | 56.56% | 52.60% | - | - | - | - |
Overview: Kernel-based learning
Kernel design maps the lower-dimensional input space into a higher-dimensional feature space.
- The kernel measures the similarity between data points
- The kernel transformation enables linear separation algorithms, such as Support Vector Classification (SVC), to work in higher dimensions
- The same data can have elements that show different patterns; the best kernel is a linear combination of different kernels
Single Kernel SVM to Multikernel SVM [8]
Multikernel approach: Dataset → SIFT features → MKL SVM
Kernels used
Linear (does not transform the data): $K(x_i, x_j) = x_i \cdot x_j$; Polynomial: $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$; Gaussian RBF: $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$.
Results we obtained on Caltech-101

MKL performance in our algorithm:

| No. | Kernel | Parameter | Training images/class | Accuracy |
|---|---|---|---|---|
| 1 | Gaussian | [0.5 1 2 5 7 10 12 15 17 20] | 30 | 75.52 |
| 2 | Gaussian | [0.5 1 2 5 7 10 12 15 17 20] | 15 | - |
| 3 | Polynomial | [1 2 3] | 30 | 75.70 |
| 4 | Polynomial | [1 2 3] | 15 | 69.29 |
| 5 | Gaussian + Gaussian + Polynomial + Polynomial | [0.5 1 2 5 7 10 12 15 17 20], [1 2 3] | 30 | 74.97 |
| 6 | Polynomial + Gaussian + Polynomial | [1 2 3], [0.5 1 2 5 7 10 12 15 17 20] | 30 | 75.58 |

Single kernel performance in our algorithm:

| No. | Kernel | Training images/class | Accuracy |
|---|---|---|---|
| 1 | Linear | 30 | 69.71 |
| 2 | Polynomial | 30 | 64.18 |
| 3 | Gaussian | 30 | 61.81 |
Comparison with other methods for the Caltech-101 dataset

| Algorithm | 15 training images/class | 30 training images/class |
|---|---|---|
| Zhang et al. [1] | 59.10±0.60 | 66.20±0.50 |
| KSPM [2] | 56.40 | 64.40±0.80 |
| NBNN [3] | 65.00±1.14 | 70.40 |
| ML+CORR [4] | 61.00 | 69.60 |
| KC [5] | - | 64.14±1.18 |
| LSPM [6] | 53.23±0.65 | 58.81±1.51 |
| ScSPM [6] | 67.0±0.45 | 73.2±0.54 |
| DMKDL [7] | - | 82.66±0.36* |
| MKLDPL [7] | - | 86.81±0.21* |
| Our method (best result) | 69.29±0.98 | 75.70±1.30 |

*30 images for training and 15 images for testing
Future Work
- Try other kernels with different L-norms
- Work on two more datasets: Caltech-256 and Scene-15
- Understand the effect of the cost function on different datasets in SVM
- Divide the training and testing data with a standard split for all classes and check performance
- Publish a paper on the above results
Overall pipeline and % accuracy at each stage (datasets: Caltech-101, Caltech-256, Scene-15)
- Feature extraction (SIFT, LBP, etc.): using LBP: 63; using SIFT: 65; fusion of SIFT+LBP: -; SPM+SIFT: ~77
- Dictionary learning (KSVD, SimCO): using SimCO: ~68; using KSVD: ~73
- Sparse coding (OMP, MP, BP): using OMP: ~66
- SVM (multikernel, cost function): multikernel: ~75.70; single kernel: ~69.71
Training features & labels train the classifier; testing features yield the classified labels, compared against the testing labels.
Sparse formulation of feature vector
Attractive properties of Sparse Coding:
- First, compared with VQ coding, SC coding can achieve a much lower reconstruction error due to the less restrictive constraint;
- Second, sparsity allows the representation to be specialized and to capture salient properties of images;
- Third, research in image statistics clearly reveals that image patches are sparse signals.
Caltech-101 learned dictionary patches: 256x256 (32x8 patches), 256x512 (32x16 patches), 256x1024 (32x32 patches)
Caltech-256 learned dictionary patches: 256x256 (32x8 patches), 256x512 (32x16 patches), 256x1024 (32x32 patches), 256x2048 (32x64 patches)
Scene-15 learned dictionary patches: 256x256 (32x8 patches), 256x512 (32x16 patches), 256x1024 (32x32 patches)
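Such patch figures are typically rendered by reshaping each 256-dimensional atom into a 16x16 patch (matching the patch size from the contribution slides) and tiling a grid; a hedged matplotlib sketch with a random stand-in for a learned dictionary:

```python
import numpy as np
import matplotlib.pyplot as plt

D = np.random.default_rng(0).standard_normal((256, 1024))  # stand-in dictionary
fig, axes = plt.subplots(8, 8, figsize=(6, 6))  # show the first 64 atoms
for ax, atom in zip(axes.ravel(), D.T):
    ax.imshow(atom.reshape(16, 16), cmap='gray')
    ax.axis('off')
plt.show()
```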
Linear SPM kernel for SVM
For a pooled histogram z, the SVM decision function for the binary class is $f(z) = \sum_i \alpha_i K(z, z_i) + b$, where K is any kernel function, $\alpha_i$ are the weights, and b is a fixed bias.
Pooling function: max pooling over the sparse codes A of the descriptors Y yields z, which feeds the SPM kernel and the primal formulation for SVM [6].
Spatial Pyramid Matching
Key Parameters of the SIFT feature
[Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110.]
A 2 × 2 descriptor array computed from an 8 × 8 set of samples.
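For reference, a hedged sketch of SIFT extraction with OpenCV; note that OpenCV exposes only some of Lowe's parameters, and the T-SIFT angle/bin layout tuned in this work would require a custom implementation. The random image is a stand-in for a dataset image.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create(nOctaveLayers=3, contrastThreshold=0.04,
                       edgeThreshold=10, sigma=1.6)  # OpenCV defaults
img = (np.random.rand(256, 256) * 255).astype(np.uint8)
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), None if descriptors is None else descriptors.shape)
```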

More Related Content

Similar to SYNOPSIS on Parse representation and Linear SVM.

IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET Journal
 
IRJET - Object Detection using Hausdorff Distance
IRJET -  	  Object Detection using Hausdorff DistanceIRJET -  	  Object Detection using Hausdorff Distance
IRJET - Object Detection using Hausdorff DistanceIRJET Journal
 
IRJET- Object Detection using Hausdorff Distance
IRJET-  	  Object Detection using Hausdorff DistanceIRJET-  	  Object Detection using Hausdorff Distance
IRJET- Object Detection using Hausdorff DistanceIRJET Journal
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...ijsc
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...CSCJournals
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
 
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data StreamsNovel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streamsirjes
 
Case Study: Prediction on Iris Dataset Using KNN Algorithm
Case Study: Prediction on Iris Dataset Using KNN AlgorithmCase Study: Prediction on Iris Dataset Using KNN Algorithm
Case Study: Prediction on Iris Dataset Using KNN AlgorithmIRJET Journal
 
Matrioska tracking keypoints in real-time
Matrioska tracking keypoints in real-timeMatrioska tracking keypoints in real-time
Matrioska tracking keypoints in real-timepowerUserHallo
 
Introduction to image processing and pattern recognition
Introduction to image processing and pattern recognitionIntroduction to image processing and pattern recognition
Introduction to image processing and pattern recognitionSaibee Alam
 
Soft Computing based Learning for Cognitive Radio
Soft Computing based Learning for Cognitive RadioSoft Computing based Learning for Cognitive Radio
Soft Computing based Learning for Cognitive Radioidescitation
 
8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kafle8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kaflePAWAN KAFLE
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...CSCJournals
 
Classification of Breast Cancer Diseases using Data Mining Techniques
Classification of Breast Cancer Diseases using Data Mining TechniquesClassification of Breast Cancer Diseases using Data Mining Techniques
Classification of Breast Cancer Diseases using Data Mining Techniquesinventionjournals
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILEScscpconf
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection incsandit
 
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...GagandeepKaur872517
 

Similar to SYNOPSIS on Parse representation and Linear SVM. (20)

IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
 
IRJET - Object Detection using Hausdorff Distance
IRJET -  	  Object Detection using Hausdorff DistanceIRJET -  	  Object Detection using Hausdorff Distance
IRJET - Object Detection using Hausdorff Distance
 
IRJET- Object Detection using Hausdorff Distance
IRJET-  	  Object Detection using Hausdorff DistanceIRJET-  	  Object Detection using Hausdorff Distance
IRJET- Object Detection using Hausdorff Distance
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data StreamsNovel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
 
Case Study: Prediction on Iris Dataset Using KNN Algorithm
Case Study: Prediction on Iris Dataset Using KNN AlgorithmCase Study: Prediction on Iris Dataset Using KNN Algorithm
Case Study: Prediction on Iris Dataset Using KNN Algorithm
 
Matrioska tracking keypoints in real-time
Matrioska tracking keypoints in real-timeMatrioska tracking keypoints in real-time
Matrioska tracking keypoints in real-time
 
Introduction to image processing and pattern recognition
Introduction to image processing and pattern recognitionIntroduction to image processing and pattern recognition
Introduction to image processing and pattern recognition
 
Soft Computing based Learning for Cognitive Radio
Soft Computing based Learning for Cognitive RadioSoft Computing based Learning for Cognitive Radio
Soft Computing based Learning for Cognitive Radio
 
8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kafle8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kafle
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentation
 
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
 
Classification of Breast Cancer Diseases using Data Mining Techniques
Classification of Breast Cancer Diseases using Data Mining TechniquesClassification of Breast Cancer Diseases using Data Mining Techniques
Classification of Breast Cancer Diseases using Data Mining Techniques
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection in
 
T0 numtq0n tk=
T0 numtq0n tk=T0 numtq0n tk=
T0 numtq0n tk=
 
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
 

Recently uploaded

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 

Recently uploaded (20)

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 

SYNOPSIS on Parse representation and Linear SVM.

  • 1. By G a j j a r B h av i n ku m a r (IU1571090002) 30th July, 2022 Synopsis (Electronics and Communication Engineering) Sparse based feature parameterization and multi kernel SVM for large scale scene classification Under the supervision of D r. H i r e n M e w a d a (Associate Professor, EE,PMU) D r. A s h w i n Pa t n i (Assistant Professor, E&C,IITE,IU)
  • 2. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 3. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 4. Introduction of Image Classification ***Methods of Feature Selection Exhaustive search,Branch and Bound Search,Relaxed Branch and Bound,Selecting Best Individual Features,Sequential Forward Selection(SFS),Sequential Backward Selection(SBS),Sequential Floating Forward Search(SFFS),Sequential Floating Backward Search and Max-Min approach etc… Classification of Image feature Color Feature Histogram,momemnt(CM),Col or Coherence Vector(CCV), Color Correlogram Texture Feature The Grey Level Co-occurrence Matrix, Edge Detection, Laws Texture Energy Measures Shape Feature Binary image algorithm, Horizontal and vertical segmentation 4
  • 5. What is Image/object Classification?
  • 6. Classification Techniques Classification Supervised Unsupervised Distribution Free Euclidean classifier K-nearest neighbour Minimum distance Decision Tree Statistical Techniques based on probability Distribution models,which may be parametric or nonparametric Clustering No extensive prior knowledge required Unknown, but distinct, spectral classes are generated Limited control over classes and identities No detailed information
  • 7. • Large dimensionality of classes reduce the accuracy. • In real-time most of the high dimensional datasets do not follow normal distribution. Hence, Linear kernel fails to classify image. • Bag of word representation can not capture the spatial information. • Dense features representation makes it difficult to learn. • Linear SVM algorithm is not suitable for large data sets 7 Challenges in Image Classification
  • 8. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 9. Motivation from literature Over the past few years, the classification and recognition of vision have gained importance. There are three main component involved 1. Point of interest detection 2. Description of region of interest (Feature based) 3. Classification (Kernel based) Feature Based: To solve the multiclass reorganization problems, there are many supervised [1][2][3][4] and unsupervised [5][6][7][8] techniques used with sparse dictionaries. The state-of-the-art is accompanied by results on standard benchmark datasets, i.e. Caltech-101 [9], Caltech-256 [10], and Scene-15 [11] As reported in [8], vector quantization is used to generate sparse code with maximum pooling. By using this approach, the computation complexity of SVM is significantly reduced from O(n2) to O(n).
  • 10. Motivation from literature [12] suggested a method for multi-scale spatial latent semantic analysis based on sparse coding. The spatial pyramid matching of image segmentation is used to extract the target's spatial position information, and feature soft quantization based on sparse coding is utilised to produce a co-occurrence matrix, which increases the accuracy of the original feature representation. For matching multilevel detail locally in the learning and recognition stages, multi resolution pyramids were introduced in SIFT (PSIFT) feature space in [13]. This P- SIFT experiment showed positive results for streamline work. The authors of [14] experimented with a classification technique based on SIFT, in which SIFT are clustered using KNN to build a dictionary and then used the SPM to generate a feature vector. Feature Based:
  • 11. Motivation from literature Across all these studies, authors did not report the effect of SIFT parameters in their algorithms. Table 1 lists the parameters controlling the SIFT features. The majority of experiments in the literature use default values without tuning them for each task. As part of the first experiment, we investigated SIFT parameters on a sparse- based dictionary approach for image classification as suggested by Yang et al. [8]. Feature Based:
  • 12. Motivation from literature The combination of various descriptors employing multiple kernels SVM was introduced in [15] and demonstrated a significant improvement in various scene classifications. In [16] proposed the multilabel least-squares SVM method. For the multi-label scene classification problem, they used multi-kernel RBFbased SVM. The classifier was validated on four datasets, with a maximum accuracy of 85% Kancherla et al [17] validate the effect of kernel in SVM. They simulated the algorithm with a 3 to 4 class dataset and used different feature sets with various linear kernel SVM. On the MIT dataset, they discovered that the RBF kernel outperforms other kernels with a classification rate of 82.06 percent. Kernel based:
  • 13. Motivation from literature (Kernel based) [18] presented an SVM-based scene classification method for robotic applications. Robotic development necessitates quick execution. As a result, heuristic metric-based key points were identified from the captured scene and used in the SVM model. They conclude that combining local binary pattern and SURF features with SVM yielded higher accuracy than a VGG-based neural network model. To classify hyperspectral images, [20] proposed a hybrid approach of spatial, spectral, and semantic features. Gabor-based structural features are combined with morphological spatial features and semantic features based on K-means and entropy. A composite kernel is then created that corresponds to these three features, achieving an accuracy of 98%. Conversely, on a large dataset, SVM outperforms NN when features are interpreted geometrically. Real-world scene classification was achieved with the combination of dense SIFT, color SIFT, and structure similarity, as well as localized multi-kernel neural networks [23].
  • 14. Motivation from literature (Kernel based) Overall, multi-kernel SVMs have proved essential in many recognition and classification applications. Despite the advantage of multi-kernel methods over CNN approaches for classifying scenes among a large number of categories, further improvement is needed to reduce the misclassification rates on databases containing many classes. In addition, robust features can be achieved if redundancy is minimized and the SVM kernel is designed with optimized parameters consistent with these feature sets.
  • 15. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 16. Objective and Scope of the work Objective of study:  Check the effectiveness of sparse data in image classification.  Address which size and type of dictionary are best for large-scale datasets.  Select robust features that can address this problem.  Determine how linear vs. non-linear kernels of the traditional SVM classifier behave on large-scale datasets.  Find possibilities for reducing computational cost compared to modern neural networks while maintaining satisfactory accuracy.  Examine the pros and cons of traditional machine learning over modern deep learning algorithms.
  • 17. Objective and Scope of the work Scope of the work: In machine vision, there is no rigorous study of tuning the well-proven SIFT feature for classification tasks. Our study suggests that the SIFT feature can be tuned to the problem at hand and that the features can be sparsified by matching them to an appropriately sized dictionary. Any traditional machine learning approach can exploit this feature set to compete with modern deep learning algorithms, whose requirements for training data, training time, and computational hardware are higher.
  • 18. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 19. Problem definition  Image classification problems include intra-class variation, scale variation, viewpoint variation, occlusion, lighting, background clutter, etc. Feature selection, kernels, classifiers, machine learning, and deep-learning algorithms can be applied.  To date, it has been difficult to apply any of these methodologies to large-scale data while preserving accuracy.  Sparse representation has shown significant potential in dealing with these challenges.  Traditional classification techniques that use sparse representations lack image label information. The primary flaw of current deep learning techniques is their excessively expensive training effort. Integrating existing sparse representation technologies into deep learning remains a valuable unresolved topic.
  • 20. Problem definition  We present a methodology for bridging sparse coding and machine learning algorithms and show its performance on large datasets. The research aims to enhance multi-class classification accuracy on large datasets.  Sparse image features and machine learning will be used for categorization.  Another sub-objective is to optimize machine learning speed and class detection at an acceptable accuracy.
  • 21. Problem statement in summary 1. Classification accuracy in the multiclass setting is still difficult to achieve with existing techniques. 2. Computational time is a second concern, to be optimized alongside 1. 3. A sparse- and ML-based approach to classification will be explored. 4. The intended outcome is an efficient algorithm that satisfies points 1–3. 5. Targeted benchmark datasets: Caltech-101, Caltech-256, Scene-15.
  • 22. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 23. Original contribution by the thesis The impact of dictionary size and type  converges quickly  KSVD  16x16 image patch size  Over-complete dictionary of size 256x1024 Parameterizing SIFT (T-SIFT)  The default SIFT descriptor size of 128 is insufficient for all data sizes  SIFT can be customized  A 256-size descriptor with 16 angles and 4 SIFT bins is sufficient (Table-3)  T-SIFT is more robust  T-SIFT outperforms CNN in hardware, training time, and training-data requirements. Multi-kernel SVM with Tuned SIFT  The Gaussian kernel outperforms the Polynomial kernel and its fusion  Improvement on Caltech-101: 4%; Scene-15: 10%  Caltech-256 is difficult to train with minimal hardware.  T-SIFT with MKL SVM is a novel method
  • 24. Original contribution by the thesis The impact of dictionary size and type Parameterizing SIFT (T-SIFT) Multi-kernel SVM with Tuned SIFT Summary:  This thesis makes a distinctive contribution by providing recommendations for choosing the parameter values of the dictionary and of SIFT.  When contrasted with the prior art, using Tunable SIFT in Sparse-coded Spatial Pyramid Matching (ScSPM) and multi-kernel non-linear Support Vector Machines (SVM) produces significant gains in classification accuracy.  In addition, the uniqueness of the contribution can be seen in the studies referenced in the bibliography.
  • 25. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 26. Methodologies of Research and Results Hardware: Intel Core i3 @ 2.50 GHz, 8 GB RAM, 64-bit Windows 10 machine. First method: SIFT feature analysis and T-SIFT implementation — Phase 1: the impact of dictionary size and type; Phase 2: parameterizing SIFT (T-SIFT). Second method: Sparse-coded SPM with multi-kernel SVM implementation.
  • 27. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Figure-1: Proposed tunable SIFT ScSPM
  • 28. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations There were two phases of the study for the first method: 1 - Dictionary learning 2 - Training the classifier.
  • 29. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 1 - Dictionary learning
  • 30. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 1 - Dictionary learning
  • 31. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 2 - Training the classifier.
  • 32. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 2 - Training the classifier.
  • 33. Methodologies of Research and Results First Method: SIFT feature analysis and T-SIFT implementations Phase 2 - Training the classifier.
  • 34. Methodologies of Research and Results Second Method : Sparse coded SPM with multi kernel SVM implementation
  • 35. Methodologies of Research and Results Second Method: Sparse-coded SPM with multi-kernel SVM implementation. In this experiment, we used kernel weights dm to solve the convex optimization problem stated in Equation 7 using SVM, as proposed in [30]. The kernel fusions, with the weights of their respective coefficients used to obtain the kernel weights d, are listed in Tab. 4.
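As a rough illustration of how a fused kernel can be fed to an SVM, the sketch below (ours, not the thesis code) combines a Gaussian and a polynomial kernel with fixed example weights dm and trains scikit-learn's SVC on the precomputed Gram matrix. The weights here are placeholders; in the thesis they are learned by the MKL solver of [30], which this sketch does not reproduce.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def fused_gram(XA, XB, d=(0.7, 0.3), gamma=0.5, degree=2):
    """Weighted sum of base kernels: K = d1*K_gauss + d2*K_poly (example weights only)."""
    return d[0] * rbf_kernel(XA, XB, gamma=gamma) + d[1] * polynomial_kernel(XA, XB, degree=degree)

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(60, 8)), rng.integers(0, 2, 60)
X_test = rng.normal(size=(10, 8))

clf = SVC(kernel="precomputed", C=10.0)
clf.fit(fused_gram(X_train, X_train), y_train)   # Gram matrix between training points
pred = clf.predict(fused_gram(X_test, X_train))  # rows: test points, columns: training points
print(pred)
```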
  • 36. Methodologies of Research and Results Second Method : Sparse coded SPM with multi kernel SVM implementation
  • 37. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 38. Conclusion and Future work  The size and sparsity of the dictionary are determined by the SIFT parameters. Therefore, the first experiment presents the effect of the orientations and orientation bins on the size and sparsity of the feature vectors.  Tracking the reduction in the average number of coefficients, the study concludes that 30 iterations are sufficient to achieve maximum sparsity in the dictionary.  After obtaining the maximum sparsity of the dictionary, the effect of dictionary size on overall classification accuracy is examined.  Further investigation found that classification accuracy is lower for low values of either the orientations or the orientation bins used in histogram formation. The appropriate choice of these two parameters therefore boosts performance, as described in the first method. Linear SVM kernels were used in this empirical study.
  • 39. Conclusion and Future work  Secondly, we investigated the fusion of non-linear Multi-Kernel Learning (MKL). Although CNNs have achieved high popularity in classification models, they require a lot of training time and computational power.  SVM has greater flexibility in characterization than CNN if a suitable kernel is used for challenging datasets; with a single kernel, it is limited to linearly separable datasets.  Therefore, a multi-kernel SVM has been re-examined with the aim of optimizing the kernels and studying the various parameters affecting kernel performance in classification.  The role of various parameters has been investigated to eliminate duplicate features when evaluating SimpleMKL over ScSPM features for classification accuracy.
  • 40. Conclusion and Future work  The effect of MKL on overall classification accuracy is presented after obtaining the maximum sparsity of the dictionary. Even with the simplest combination of a single kernel type, such as Polynomial, as represented in Tab. 4, accuracy is greater than with the single-kernel SVM method.  For the 101-class dataset, using several combinations of Gaussian kernels improved classification accuracy to 85.72 percent.  With an increasing number of Gaussian kernels, training time and storage needs grow, making it impractical to work on huge datasets like Caltech-256 with minimal hardware.  Overall, we conclude that working with strong features and multiple kernels for object identification is still an open area. We will investigate the impact of these features on similar classes in the future.
  • 41. 1. Introduction of Image Classification 2. Problem Definitions 3. Objective and Scope of the work 4. Motivation from literature 5. Original Contribution by the thesis 6. Methodologies of Research and Results 7. Conclusion and Future work 8. List of publications 9. References Highlights of Synopsis
  • 42. List of publications 1. Gajjar, Bhavinkumar, Hiren Mewada, and Ashwin Patani. "Parameterizing sift and sparse dictionary for svm based multi-class object classification." International Journal of Artificial Intelligence 19 (2021): 95-108. http://www.ceser.in/ceserp/index.php/ijai/article/view/6647 (SCOPUS) 2. Gajjar, Bhavinkumar, Hiren Mewada, and Ashwin Patani. "Sparse coded spatial pyramid matching and multikernel integrated SVM for non-linear scene classification." Journal of Electrical Engineering 72.6 (2021): 374-380. https://doi.org/10.2478/jee-2021-0053 (SCOPUS)
  • 43. Results for the Matching Pursuit Algorithm. Test images: 1. cameraman.tif 2. rice.png 3. circlesBrightDark.png 4. liftingBody.png. Dictionaries: Dict1 - Discrete Wavelet; Dict2 - DCT and Kronecker Delta; Dict3 - Haar Wavelet Packets and DCT; Dict4 - K-SVD. [Arpan Patel, "Image Classification with sparse coding and machine learning," thesis, CSPIT, 2017]
  • 46. Design space: good features × classification techniques × kernels of the classifier — accuracy? computation time?

Features (+Sparse) | Kernel function of classifier | Classification technique (ML)
Speeded-Up Robust Features (SURF) | Linear | K-Means
Features from Accelerated Segment Test (FAST) | RBF | SVM
Binary Robust Independent Elementary Features (BRIEF) | Polynomial | K-nearest neighbour (KNN)
Oriented FAST and Rotated BRIEF (ORB) | Sigmoid | Artificial Neural Network (ANN)
Histogram of Oriented Gradients (HOG) | … | Convolutional Neural Network (CNN)
… | … | …
  • 47. Introduction of Image Classification  Challenges: intra-class variation, scale variation, view-point variation, occlusion, illumination, background clutter  Approaches: feature selection, kernels, classifiers, machine learning and deep-learning algorithms
  • 48. What is Sparse? A sparse matrix is one in which the majority of the values are zero. The proportion of zero elements to non-zero elements is called the sparsity of the matrix. The opposite, in which the majority of values are non-zero, is called a dense matrix. Example:
[ 5  0  0  0 ]
[ 0 11  0  0 ]
[ 0  0 25  0 ]
[ 0  0  0  7 ]
Sparsity = 3 (12 zeros / 4 non-zeros). Advantages:  saves a significant amount of memory  speeds up the processing of the data  reduces computation time by eliminating operations on zero elements
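A quick way to see the storage advantage is with SciPy's sparse formats. This small sketch (ours, not from the thesis) stores the 4x4 diagonal example above in CSR form and compares the number of stored values against the dense version.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.diag([5, 11, 25, 7])   # the 4x4 example matrix from the slide
sparse = csr_matrix(dense)        # CSR keeps only the 4 non-zero entries

print(sparse.nnz, "non-zeros out of", dense.size, "elements")
print("sparsity (zeros / non-zeros):", (dense.size - sparse.nnz) / sparse.nnz)  # -> 3.0
# Arithmetic on the CSR form skips the zero entries entirely:
print((sparse @ sparse).toarray())
```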
  • 50. Sparse coding Greedy algorithms:  Matching Pursuit (MP)  Orthogonal Matching Pursuit (OMP)  [Stagewise Orthogonal Matching Pursuit (StOMP), Subspace Pursuit (SP), Compressive Sampling Matching Pursuit (CoSaMP), Regularized Orthogonal Matching Pursuit (ROMP), Gradient Pursuit (GP), Iterative Hard Thresholding (IHT), Hard Thresholding Pursuit (HTP)] Relaxation algorithms:  Basis Pursuit (BP)  Least-Absolute-Shrinkage-and-Selection-Operator (LASSO)  FOcal Under-determined System Solver (FOCUSS)
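For a concrete feel of the greedy family, the sketch below (our illustration, using scikit-learn's OMP on synthetic data) recovers a 5-sparse code over a random over-complete dictionary.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))   # over-complete dictionary: 64-dim signals, 256 atoms
D /= np.linalg.norm(D, axis=0)   # unit-norm atoms

x_true = np.zeros(256)
x_true[rng.choice(256, 5, replace=False)] = rng.normal(size=5)   # 5-sparse code
y = D @ x_true                                                   # observed signal

x_hat = orthogonal_mp(D, y, n_nonzero_coefs=5)   # greedy recovery with known sparsity
print("recovered support matches:",
      set(np.flatnonzero(x_hat)) == set(np.flatnonzero(x_true)))
```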
  • 51. Dictionary Learning Algorithms 1. Maximum Likelihood (ML) 2. Method of Optimal Directions (MOD) 3. K-SVD 4. Simultaneous Codeword Optimization (SimCO)
  • 52. The K-SVD Algorithm - General [Aharon, Elad & Bruckstein '04]: initialize the dictionary D, then alternate between sparse coding of the training signals Y (greedy/relaxation algorithms) and the dictionary update stage (DL algorithms) for T iterations. [block diagram]
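A compact NumPy rendering of that loop is sketched below. It is a bare-bones K-SVD (our simplification — for instance, it does not replace unused atoms) rather than the exact implementation used in the thesis.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, n_nonzero, n_iter=30):
    """Minimal K-SVD: Y (dim x n_signals) ~= D (dim x n_atoms) @ X (n_atoms x n_signals)."""
    rng = np.random.default_rng(0)
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):                                   # 30 iterations (cf. conclusion slide)
        X = orthogonal_mp(D, Y, n_nonzero_coefs=n_nonzero)    # sparse-coding stage
        for k in range(n_atoms):                              # dictionary-update stage
            users = np.flatnonzero(X[k])
            if users.size == 0:
                continue
            X[k, users] = 0.0                                 # residual without atom k's contribution
            E = Y[:, users] - D @ X[:, users]
            U, s, Vt = np.linalg.svd(E, full_matrices=False)  # rank-1 update of atom + coefficients
            D[:, k], X[k, users] = U[:, 0], s[0] * Vt[0]
    return D, X
```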
  • 53.  Can be applied to almost everything  Classifications or numerical predictions  Widely used in pattern recognition o Identify cancer or genetic diseases o Text classification: classify texts based on the language o Detecting rare events: earthquakes or engine failures Support vector machine
  • 54. Linearly separable problem: we have two features (x1, x2) and some data points. [scatter plot]
  • 55. We want to find a hyperplane, in this case a line, that separates the different data points with the maximum margin. [figure]
  • 57. This is the maximum margin solution. [figure]
  • 58. Support vectors:  the points from each class that are closest to the maximum-margin hyperplane  each class has at least one support vector. [figure]
  • 59. With the support vectors alone it is possible to reconstruct the hyperplane, which is very useful: we can store the classification model compactly even when we have millions of features. [figure]
  • 60. How do we find the hyperplane when the problem is linearly separable? With convex hulls. [figure]
  • 61. How do we find the hyperplane when the problem is linearly separable? With convex hulls. A convex hull is the smallest convex set that contains all the points of a class; the hyperplane is the perpendicular bisector of the shortest line between the two hulls. [figure]
  • 62. Mathematical approach: w · x + b = 0 is the equation of a hyperplane in n dimensions (in 2D: y = m*x + b). Here w = (w1, w2, …, wn) are the so-called weights and x = (x1, x2, …, xn) the features. The aim of the SVM algorithm is to find the weights w so that the data points are separated accordingly: w · x + b > +1 for one class and w · x + b < −1 for the other.
  • 63. How do we find the hyperplane in 2D? With convex hulls: the two planes H0 and H1 defined by the equations w · x + b = +1 and w · x + b = −1, a distance d apart. [figure]
  • 64. Mathematical approach: vector geometry gives the distance between the two planes as 2/||w||, where ||w|| is the Euclidean norm (distance from 0). We want to make this distance as large as possible, so we want to minimize the norm of w. We usually minimize (1/2)||w||², and quadratic optimization solves this problem.
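To make the 2/||w|| margin tangible, the toy sketch below (ours) fits a near-hard-margin linear SVM with scikit-learn and prints the learned w, b, and the resulting margin width.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated point clouds in 2D
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5], [5.0, 5.0], [5.5, 4.5], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # very large C approximates a hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)
```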
  • 65. Non-linear spaces  In many real-world applications, the relationships between variables are non-linear.  A key feature of SVMs is their ability to map the problem into a higher-dimensional space using a process known as the "kernel trick".  A non-linear relationship may then suddenly appear to be quite linear.
  • 66. When the problem is not linearly separable, we have to use slack variables ai. [figure]
  • 67. Mathematical approach: we minimize (1/2)||w||² + C Σi ai, where C is a cost parameter applied to all points that violate the constraints. We run our optimization on this cost function. By tuning the C parameter we modify the penalty for misclassified data points: if C is very large, the algorithm tries to find a 100% separation; if C is low, a wider overall margin is allowed with more misclassified data points.
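The effect of C is easy to observe empirically. This small sketch (ours) fits the same overlapping data with a small and a large C and compares the number of support vectors, a rough proxy for the margin width.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs so that some slack is unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"train accuracy {clf.score(X, y):.2f}")
# Small C -> wide margin, many support vectors; large C -> narrow margin, fewer.
```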
  • 68. Kernels: consider two weather classes, sunny and snowy, plotted by latitude and longitude. [figure]
  • 70. With the kernel function we can transform the problem into a linearly separable one in a higher-dimensional space (added dimension here: altitude). [figure: latitude/longitude vs. longitude/altitude]
  • 90. In the higher-dimensional space, the SVM learns concepts that were not explicitly measured in the original data. [figure: latitude/longitude vs. longitude/altitude]
  • 91. Kernel functions: Φ(x), the "phi function", is the mapping of the data x into another space; K(xi, xj) is the kernel function.  Linear kernel (does not transform the data): K(xi, xj) = xi · xj  Polynomial kernel: K(xi, xj) = (xi · xj + 1)^d  Gaussian RBF kernel: K(xi, xj) = exp(−||xi − xj||² / (2σ²))
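These three kernels are a few lines of NumPy each; the sketch below (our illustration) implements them directly so the formulas above can be checked numerically.

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj                                   # no transformation of the data

def polynomial_kernel(xi, xj, d=3):
    return (xi @ xj + 1) ** d

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

xi, xj = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(xi, xj), polynomial_kernel(xi, xj), gaussian_rbf_kernel(xi, xj))
```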
  • 92. Advantages:  SVM can be used for regression problems as well as for classification  not overly influenced by noisy data  easier to use than neural networks. Disadvantages:  finding the best model requires testing various combinations of kernels and model parameters  quite slow, especially when the input dataset has a large number of features.
  • 93. Classification test results, Caltech-101:

Algorithm / No. of classes | 2 classes (bonsai & car side) | 5 classes | 20 | 40 | 80 | 101
Sparse+SIFT+SVM | 100% | 95.38% | 79.19% | 76.07% | 75.26% | 73.13%
Sparse+SVM | 47.37% | 43.10% | - | - | - | -
SIFT+SVM | 56.56% | 52.60% | - | - | - | -
  • 94. Overview: kernel-based learning. [diagram: lower-dimensional input space → kernel design → higher-dimensional feature space]  The kernel measures the similarity between data points  The kernel transformation makes it possible to use a linear separation algorithm, such as Support Vector Classification (SVC), in higher dimensions
  • 95. Overview: kernel-based learning. The same data can contain elements that show different patterns, so the best kernel is a linear combination of different kernels.
  • 96. Single-kernel SVM to multi-kernel SVM [8]. [figure]
  • 97. Single-kernel SVM to multi-kernel SVM [8], continued. [figure]
  • 98. Multi-kernel approach: Dataset → SIFT → Features → MKL SVM. [pipeline diagram]
  • 99. Kernels used:  Linear kernel (does not transform the data): K(xi, xj) = xi · xj  Polynomial kernel: K(xi, xj) = (xi · xj + 1)^d  Gaussian RBF kernel: K(xi, xj) = exp(−||xi − xj||² / (2σ²))
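The experiments on the next slide combine Gaussian kernels at several widths. As a sketch (ours, with uniform weights standing in for the weights learned by the MKL solver), a bank of RBF Gram matrices over those σ values can be built and summed like this:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

sigmas = [0.5, 1, 2, 5, 7, 10, 12, 15, 17, 20]   # kernel parameters from the results table

def gaussian_bank(XA, XB):
    # gamma = 1 / (2 * sigma^2) converts sigma to scikit-learn's RBF parameterization
    return [rbf_kernel(XA, XB, gamma=1.0 / (2 * s**2)) for s in sigmas]

def combined_gram(XA, XB, weights=None):
    Ks = gaussian_bank(XA, XB)
    w = weights if weights is not None else np.full(len(Ks), 1.0 / len(Ks))  # uniform stand-in
    return sum(wi * Ki for wi, Ki in zip(w, Ks))

X = np.random.default_rng(0).normal(size=(20, 128))   # e.g. 128-dim pooled feature vectors
print(combined_gram(X, X).shape)                      # (20, 20) Gram matrix for SVC(kernel="precomputed")
```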
  • 100. Results we obtained — Caltech-101.

MKL performance in our algorithm:
No. | Kernel | Parameters | Training images/class | Accuracy
1 | Gaussian | [0.5 1 2 5 7 10 12 15 17 20] | 30 | 75.52
2 | Gaussian | [0.5 1 2 5 7 10 12 15 17 20] | 15 | -
3 | Polynomial | [1 2 3] | 30 | 75.70
4 | Polynomial | [1 2 3] | 15 | 69.29
5 | Gaussian + Gaussian + Polynomial + Polynomial | [0.5 1 2 5 7 10 12 15 17 20] + [1 2 3] | 30 | 74.97
6 | Polynomial + Gaussian + Polynomial | [1 2 3] + [0.5 1 2 5 7 10 12 15 17 20] | 30 | 75.58

Single-kernel performance in our algorithm:
No. | Kernel | Training images/class | Accuracy
1 | Linear | 30 | 69.71
2 | Polynomial | 30 | 64.18
3 | Gaussian | 30 | 61.81
  • 101. Comparison with other methods for the Caltech-101 dataset:

Algorithm | 15 training images/class | 30 training images/class
Zhang et al. [1] | 59.10±0.60 | 66.20±0.50
KSPM [2] | 56.40 | 64.40±0.80
NBNN [3] | 65.00±1.14 | 70.40
ML+CORR [4] | 61.00 | 69.60
KC [5] | - | 64.14±1.18
LSPM [6] | 53.23±0.65 | 58.81±1.51
ScSPM [6] | 67.0±0.45 | 73.2±0.54
DMKDL [7] | - | 82.66±0.36*
MKLDPL [7] | - | 86.81±0.21*
Our method (best result) | 69.29±0.98 | 75.70±1.30
*30 images for training and 15 images for testing
  • 102. Future Work  Try other kernels with different L-norms  Work on two more datasets: Caltech-256, Scene-15  Understand the effect of the cost function on different datasets in SVM  Divide the training and testing data with a standard split for all classes and check performance  Publish a paper on the above results
  • 103. [Pipeline diagram] Feature extraction (SIFT, LBP, etc.) → Dictionary learning (KSVD, SimCO) → Sparse coding (OMP, MP, BP) → SVM (multi-kernel, cost function), evaluated on Caltech-101, Caltech-256, and Scene-15 with training features & labels vs. testing features and labels, producing classified labels and % accuracy. Indicative accuracies: using LBP: 63; using SIFT: 65; fusion of SIFT+LBP: -; SPM+SIFT: ~77; using SimCO: ~68; using OMP: ~66; using KSVD: ~73; multi-kernel: ~75.70; single kernel: ~69.71.
  • 104. Sparse formulation of the feature vector. Attractive properties of sparse coding:  first, compared with VQ coding, SC coding can achieve a much lower reconstruction error due to its less restrictive constraint;  second, sparsity allows the representation to specialize and to capture salient properties of images;  third, research in image statistics clearly reveals that image patches are sparse signals.
  • 106. Caltech-101 learned dictionary patches: 256 x 256 (32x8 patches), 256 x 512 (32x16 patches), 256 x 1024 (32x32 patches). [patch images]
  • 107. Caltech-256 learned dictionary patches: 256 x 256 (32x8 patches), 256 x 512 (32x16 patches), 256 x 1024 (32x32 patches). [patch images]
  • 108. Caltech-256 learned dictionary patches: 256 x 2048 (32x64 patches). [patch images]
  • 109. Scene-15 learned dictionary patches: 256 x 256 (32x8 patches), 256 x 512 (32x16 patches), 256 x 1024 (32x32 patches). [patch images]
  • 110. Linear SPM kernel for SVM. For a pooled histogram z, the SVM decision function for a binary class is f(z) = Σi αi K(z, zi) + b, where K is any kernel function, αi are the weights, and b is a fixed bias.
  • 111. Linear SPM kernel for SVM: the pooling function (max pooling) applied to the sparse codes A of the features Y yields the pooled vector z; the linear SPM kernel and the primal formulation for the SVM follow [6]. [equations]
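As a sketch of this step (ours, following the max-pooling idea of [6] rather than reproducing its exact equations), the code below max-pools a matrix of sparse codes into an image-level vector z and evaluates the linear SPM kernel as a plain dot product.

```python
import numpy as np

def max_pool(A):
    """A: (n_atoms, n_local_descriptors) sparse codes for one image (or one pyramid cell).
    Returns z: element-wise maximum of absolute responses across descriptors."""
    return np.abs(A).max(axis=1)

def linear_spm_kernel(z_i, z_j):
    return z_i @ z_j   # linear kernel on pooled vectors -> SVM training scales linearly

rng = np.random.default_rng(0)
A1, A2 = rng.normal(size=(1024, 300)), rng.normal(size=(1024, 250))  # codes of two images
z1, z2 = max_pool(A1), max_pool(A2)
print(z1.shape, linear_spm_kernel(z1, z2))
```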
  • 113. Key parameters of the SIFT feature [Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110]: e.g., a 2 × 2 descriptor array computed from an 8 × 8 set of samples. [figure]
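The descriptor length follows directly from the grid and orientation parameters (spatial bins per axis squared × orientation angles), which is precisely what T-SIFT tunes. The sketch below (our illustration, not the thesis implementation) computes the size for Lowe's default and for the 256-dimensional setting from the contribution slide.

```python
def sift_descriptor_size(n_spatial_bins, n_angles):
    # descriptor length = (spatial bins per axis)^2 * orientation angles per bin
    return n_spatial_bins ** 2 * n_angles

print(sift_descriptor_size(4, 8))    # Lowe's default: 4x4 grid, 8 angles -> 128
print(sift_descriptor_size(4, 16))   # T-SIFT setting: 4 bins, 16 angles -> 256
```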