SVM
1. Support Vector Machine
• Contents
• What are SVMs?
• Why use SVMs?
• Large Margin Classifier
• Non-linearly separable classification
• Kernels
• Gaussian kernel
• Multiclass classification
• Conclusion
• SVM for image classification
MOHIT SHRIVASTAVA
DEPARTMENT OF COMPUTER SCIENCE
AND ENGINEERING
NIT W
2. What are SVMs?
• In machine learning, support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
• In 1979, Vapnik and Chervonenkis proposed a model called the "Maximal Margin Classifier"; that is where the SVM was born.
• In 1992, Vapnik had the idea to apply what is called the Kernel Trick, which allows the SVM to classify linearly non-separable data.
• In 1995, Cortes and Vapnik introduced the Soft Margin Classifier, which allows us to accept some misclassifications when using an SVM.
3. What are SVMs?
• When we talk about classification, there are four different Support Vector Machines:
• 1. The original one: the Maximal Margin Classifier
• 2. The kernelized version using the Kernel Trick (for linearly non-separable data)
• 3. The soft-margin version (which accepts some misclassification)
• 4. The soft-margin kernelized version (which combines 1, 2, and 3)
The last one is used most of the time in practice.
4. Why use SVMs?
• SVMs are used to fit complex non-linear hypotheses.
• Neural networks can also fit non-linear hypotheses, but SVMs take comparatively less time to train.
• SVMs are among the most widely used classifiers in both industry and academia.
5. Large Margin Classifier
• All three lines on the left classify the data correctly.
• But which one is the best?
• The SVM gives us the decision boundary with a large margin.
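The large-margin requirement can be written as an optimization problem (a standard formulation, stated here for reference; w is the weight vector, b the bias, and the labels are y_i in {+1, -1}):

    min over w, b:   (1/2) ||w||^2
    subject to:      y_i (w^T x_i + b) >= 1   for every training example i

Keeping every example at a score of at least 1 in magnitude while minimizing ||w|| is what maximizes the margin, whose width is 2 / ||w||.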
6. How to get a large margin
• The SVM cost function is a modification of the logistic cost function.
• We need a cost function for the SVM that looks like the red line in the plot: flat at zero beyond the margin, and growing linearly inside it.
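In the notation this deck appears to follow (Andrew Ng's course), the slide's plot compares the smooth logistic cost with piecewise-linear surrogates. A sketch of the functions involved, with z the raw score w^T x + b:

    logistic cost:   -log(1 / (1 + e^(-z)))              for a positive example
    SVM surrogate:   cost_1(z) = max(0, 1 - z)           (zero once z >= 1)
    SVM surrogate:   cost_0(z) = max(0, 1 + z)           (zero once z <= -1, for a negative example)

The flat regions are what create the margin: the cost is not satisfied with a merely correct sign, it keeps pushing until the score clears 1 in magnitude.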
9. SVM cost function with large margin
• number of training examples = n
• number of features = d
• Instead of y = 1/0 we use y = +1/-1; the SVM then demands a score of at least +1 for positive examples and at most -1 for negative ones, not just the correct sign, which is what gives us a large margin.
• The cost function is changed accordingly.
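With y in {+1, -1}, the whole training objective can be written with the hinge loss (a standard formulation, stated here for reference):

    min over w, b:   C * sum_i max(0, 1 - y_i (w^T x_i + b)) + (1/2) ||w||^2

An example is penalized not only when it is misclassified, but whenever it falls inside the margin, i.e. whenever y_i (w^T x_i + b) < 1.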
10. SVM cost function with large margin
• The regularization parameter λ is removed, and C now acts as 1/λ, the inverse of the regularization parameter.
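The correspondence is easy to see by comparing the shapes of the two objectives (A is the data-fit term, B the regularizer):

    logistic regression:   min  A + λ B
    SVM:                   min  C A + B

Dividing the SVM objective by C shows it matches the logistic form with λ = 1/C, so a large C means weak regularization and a small C means strong regularization.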
11. Use of C
• Sometimes it is better not to change the decision boundary
• and to allow some misclassification instead.
• The parameter C helps here: a small C tolerates a few misclassified points, while a very large C bends the boundary to fit every example.
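A minimal sketch of how C trades margin width against training errors, using scikit-learn's SVC (the data here is synthetic, chosen only for illustration):

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Two overlapping blobs, so a perfect linear separation is impossible.
    X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        # Small C: wide margin, more margin violations (more support vectors).
        # Large C: violations are expensive, the boundary chases every point.
        print(f"C={C}: support vectors={clf.n_support_.sum()}, "
              f"train accuracy={clf.score(X, y):.2f}")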
12. Non-linearly separable data
• We can transform our input space into a feature space, but what if the feature space becomes too large?
• The main technique for developing complex non-linear classifiers while avoiding this blow-up is called kernels.
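The idea, stated briefly (standard background, not spelled out on the slide): the learning algorithm only ever needs inner products between feature vectors, so instead of computing a huge feature map phi(x) explicitly, a kernel function supplies the inner products directly:

    K(x, x') = phi(x)^T phi(x')

This lets the classifier operate in a very high-dimensional (even infinite-dimensional) feature space while only ever evaluating K on pairs of points.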
14. Gaussian Kernel Example
• The Gaussian kernel is nothing but a similarity function.
• Pick 3 points, or landmarks, manually: l1, l2, l3.
• Now we use the Gaussian kernel features to predict y.
• The similarity is close to 1 when x and l are near each other, and close to 0 otherwise.
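A small sketch of the landmark/similarity computation described above (numpy-based; the landmark positions and sigma are arbitrary choices for illustration):

    import numpy as np

    def gaussian_similarity(x, l, sigma=1.0):
        # Gaussian (RBF) kernel: ~1 when x is near the landmark l, ~0 far away.
        return np.exp(-np.sum((x - l) ** 2) / (2 * sigma ** 2))

    # Three hand-picked landmarks, as on the slide (values are made up).
    landmarks = [np.array([1.0, 1.0]), np.array([3.0, 0.5]), np.array([0.0, 3.0])]

    x = np.array([1.1, 0.9])  # a point close to the first landmark
    f = [gaussian_similarity(x, l) for l in landmarks]
    print(f)  # f[0] is near 1, f[1] and f[2] are much smaller

The features f then replace the raw inputs, and y is predicted from a linear function of f.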
17. Multiclass classification
• We use the same one-vs-rest rule as in logistic regression.
• For K classes, train K SVMs, one for each class, and predict the class whose SVM gives the highest score.
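A minimal sketch of one-vs-rest with K SVMs in scikit-learn (LinearSVC trains one binary SVM per class by default; the iris dataset here is just a convenient example):

    from sklearn.datasets import load_iris
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)      # 3 classes, so 3 binary SVMs
    clf = LinearSVC(max_iter=10000).fit(X, y)
    print(clf.coef_.shape)                 # (3, 4): one weight vector per class
    print(clf.predict(X[:5]))              # the class with the highest score wins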
18. How to choose between logistic regression and SVM
• n = # training examples, d = # features
• If d is large relative to n (e.g., d > n with d = 10,000 and n = 10-1,000), use logistic regression or an SVM with a linear kernel.
• If d is small (up to 1,000) and n is intermediate (up to 10,000), use an SVM with a Gaussian kernel.
• If d is small (up to 1,000) and n is large (50,000+), create/add more features, then use logistic regression or an SVM without a kernel.
• Neural networks are likely to work well in most of these settings, but may be slower to train.
19. Conclusion
• SVMs find the optimal linear separator.
• The kernel trick lets SVMs learn non-linear decision surfaces.
• Strengths of SVMs: good theoretical and empirical performance; support for many types of kernels.
• Disadvantages of SVMs: "slow" to train and predict on huge data sets (though still relatively fast); the kernel and its parameters must be chosen and tuned.
20. SVM for image classification
• We will now demonstrate how to use an SVM for image classification.
• We will use the Olivetti faces dataset.
• This dataset contains a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge.
• There are ten different images of each of 40 distinct subjects.
• For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses).
• All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).
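A minimal sketch of the demo described above, classifying Olivetti faces with a scikit-learn SVM (the linear kernel and C value are illustrative choices, not necessarily the deck's exact settings):

    from sklearn.datasets import fetch_olivetti_faces
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    faces = fetch_olivetti_faces()          # 400 images of 40 subjects
    X, y = faces.data, faces.target         # each image flattened to 4096 pixels

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)

    clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))

Stratifying the split keeps the same number of images per subject in the train and test sets, which matters here because there are only ten images per class.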