Support vector machine

  1. Musa Al hawamdah 128129001011
  2. Topics: Linear Support Vector Machines (Lagrangian primal formulation, dual formulation, discussion); the linearly non-separable case (soft-margin classification); Nonlinear Support Vector Machines (nonlinear basis functions, the kernel trick, Mercer's condition, popular kernels).
  3. Support Vector Machine (SVM) – Basic idea: the SVM tries to find the classifier that maximizes the margin between positive and negative data points. For now we consider linear classifiers, which leads to a formulation as a convex optimization problem: find the hyperplane satisfying the margin constraints (a standard reconstruction of the slide's formulation is sketched in the notes after the slide list).
  4. SVM – Primal Formulation: the Lagrangian primal form L_p. The solution of L_p needs to fulfill the KKT conditions, which for this convex problem are necessary and sufficient (the form of L_p and the KKT conditions are sketched in the notes after the slide list).
  5. SVM – Solution: the hyperplane is computed as a linear combination of the training examples. The solution is sparse: the coefficients are nonzero only for some points, the support vectors, so only the SVs actually influence the decision boundary. The bias b is computed by averaging over all support vectors (the explicit formulas are sketched in the notes after the slide list).
  6. SVM – Support Vectors: the training points for which a_n > 0 are called "support vectors". Graphical interpretation: the support vectors are the points on the margin; they define the margin and thus the hyperplane, so all other data points can be discarded.
  7. SVM – Discussion (Part 1): The linear SVM is a linear classifier and an approximate implementation of the SRM (structural risk minimization) principle. For separable data, the SVM produces an empirical risk of zero with the minimal value of the VC confidence, i.e. a classifier minimizing the upper bound on the actual risk, so SVMs have a "guaranteed" generalization capability. The formulation as a convex optimization problem yields a globally optimal solution. Primal form: solving a quadratic programming problem in M variables is in general of complexity O(M^3); here we have D variables, giving O(D^3), which scales poorly with high-dimensional data (the "curse of dimensionality").
  8. SVM – Dual Formulation
  9. SVM – Dual Formulation
  10. SVM – Dual Formulation
  11. SVM – Dual Formulation
  12. SVM – Discussion (Part 2): Dual form formulation: in going to the dual, we now have a problem in N variables (the a_n), one per training example (the standard dual is sketched in the notes after the slide list). Isn't this worse? It seems to penalize large training sets. However…
  13. SVM – Support Vectors: so far we have only looked at the linearly separable case. The current problem formulation has no solution if the data are not linearly separable, so we need to introduce some tolerance for outlier data points (a standard soft-margin formulation is sketched in the notes after the slide list).
  14. SVM – Non-Separable Data
  15. SVM – Soft-Margin Classification
  16. SVM – Non-Separable Data
  17. SVM – New Primal Formulation
  18. SVM – New Dual Formulation
  19. SVM – New Solution
  20. Nonlinear SVM
  21. Another Example: data that are not separable by a hyperplane in 2D…
  22. …but separable by a surface in 3D.
  23. Nonlinear SVM – Feature Spaces: general idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable (an explicit example mapping is sketched in the notes after the slide list).
  24. Nonlinear SVM
  25. What Could This Look Like?
  26. Problem with High-dim. Basis Functions
  27. Solution: The Kernel Trick (the kernelized dual and decision function are sketched in the notes after the slide list).
  28. Back to Our Previous Example…
  29. SVMs with Kernels
  30. Which Functions are Valid Kernels? Mercer's theorem (modernized version): every positive definite symmetric function is a kernel. Positive definite symmetric functions correspond to a positive definite symmetric Gram matrix (a numerical check is sketched in the notes after the slide list).
  31. Kernels Fulfilling Mercer's Condition (commonly cited examples are listed in the notes after the slide list).
  32. THANK YOU
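
Notes on slide 3: the margin-maximization problem the slide refers to ("find the hyperplane satisfying …") did not survive extraction. A standard hard-margin formulation, given here as an assumption about what the slide showed, is

    \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
    \quad\text{s.t.}\quad t_n\,(\mathbf{w}^{\top}\mathbf{x}_n + b) \ge 1, \qquad n = 1,\dots,N,

where the x_n are the training points and t_n ∈ {−1, +1} their labels; the margin is 2/‖w‖, so minimizing ‖w‖ maximizes it.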
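
Notes on slide 4: the Lagrangian primal form mentioned on the slide is, in the usual notation (again an assumption, since the slide's equations were lost),

    L_p(\mathbf{w}, b, \mathbf{a}) = \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
      - \sum_{n=1}^{N} a_n \left[ t_n(\mathbf{w}^{\top}\mathbf{x}_n + b) - 1 \right],
    \qquad a_n \ge 0.

The KKT conditions are a_n ≥ 0, t_n(w^T x_n + b) − 1 ≥ 0, and a_n [t_n(w^T x_n + b) − 1] = 0; because the problem is convex, they are necessary and sufficient for optimality.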
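
Notes on slide 5: the solution formulas the slide alludes to are, in standard form, obtained by setting the derivatives of L_p with respect to w and b to zero:

    \mathbf{w} = \sum_{n} a_n t_n \mathbf{x}_n, \qquad \sum_{n} a_n t_n = 0.

With S the set of support vectors (the points with a_n > 0), the bias is obtained by averaging over S:

    b = \frac{1}{|S|} \sum_{n \in S} \Big( t_n - \sum_{m \in S} a_m t_m \mathbf{x}_m^{\top}\mathbf{x}_n \Big),

and the decision function y(x) = sign( Σ_{n∈S} a_n t_n x_n^T x + b ) depends only on the support vectors, which is the sparsity the slide points out.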
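
Notes on slides 8–12: the dual problem in the N variables a_n discussed on slide 12 has the standard form (an assumption about the non-extracted slides 8–11)

    \max_{\mathbf{a}} \sum_{n} a_n - \tfrac{1}{2} \sum_{n}\sum_{m} a_n a_m t_n t_m\, \mathbf{x}_n^{\top}\mathbf{x}_m
    \quad\text{s.t.}\quad a_n \ge 0, \quad \sum_{n} a_n t_n = 0.

The data enter only through the inner products x_n^T x_m, which is what later makes the kernel trick possible.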
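
Notes on slides 13–19: a standard soft-margin formulation with slack variables ξ_n, given here as an assumption about what the missing slides showed, is

    \min_{\mathbf{w},\, b,\, \boldsymbol{\xi}}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C \sum_{n} \xi_n
    \quad\text{s.t.}\quad t_n(\mathbf{w}^{\top}\mathbf{x}_n + b) \ge 1 - \xi_n, \quad \xi_n \ge 0.

In the corresponding dual, the only change is the box constraint 0 ≤ a_n ≤ C; the parameter C trades off margin width against tolerance for points that are misclassified or lie inside the margin.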
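
Notes on slides 21–23: a classical instance of the 2D-to-3D idea (my illustration; not necessarily the figure on the slides) is data separable only by a circle x_1^2 + x_2^2 = r^2. After the mapping

    \phi(x_1, x_2) = \big( x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2 \big)

the circular boundary becomes a plane in the 3D feature space, and φ(x)^T φ(x') = (x^T x')^2 is exactly the degree-2 polynomial kernel.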
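
Notes on slide 27: the kernel trick, as it is usually stated (the slide's own derivation did not extract): wherever the dual uses an inner product x_n^T x_m, substitute a kernel k(x_n, x_m) = φ(x_n)^T φ(x_m), giving

    \max_{\mathbf{a}} \sum_{n} a_n - \tfrac{1}{2} \sum_{n}\sum_{m} a_n a_m t_n t_m\, k(\mathbf{x}_n, \mathbf{x}_m),
    \qquad y(\mathbf{x}) = \operatorname{sign}\Big( \sum_{n \in S} a_n t_n\, k(\mathbf{x}_n, \mathbf{x}) + b \Big).

The feature map φ never has to be computed explicitly, which is what makes the very high-dimensional basis functions of slide 26 tractable.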
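
Notes on slide 30: the Gram-matrix condition can be checked numerically. The sketch below (Python with NumPy; the Gaussian RBF kernel is my choice of example, not necessarily the slide's) builds the Gram matrix for a random sample and verifies that it is symmetric with non-negative eigenvalues, as Mercer's condition requires:

    import numpy as np

    def rbf_kernel(x, y, sigma=1.0):
        # Gaussian RBF kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))  # 50 random 2-D sample points

    # Gram matrix K[n, m] = k(x_n, x_m)
    K = np.array([[rbf_kernel(xn, xm) for xm in X] for xn in X])

    # Mercer's condition: symmetric, positive (semi-)definite Gram matrix
    eigvals = np.linalg.eigvalsh(K)
    print("symmetric:", np.allclose(K, K.T))
    print("smallest eigenvalue:", eigvals.min())  # >= 0 up to numerical precision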
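
Notes on slide 31: kernels commonly cited as satisfying Mercer's condition (assumed, since the slide's list did not extract) include the linear kernel k(x, x') = x^T x', the polynomial kernel k(x, x') = (x^T x' + c)^d with c ≥ 0, and the Gaussian RBF kernel k(x, x') = exp(−‖x − x'‖^2 / (2σ^2)).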
