# Support vector machine



1. Musa Al hawamdah, 128129001011
2. Topics: Linear Support Vector Machines (Lagrangian/primal formulation, dual formulation, discussion); the linearly non-separable case (soft-margin classification); Nonlinear Support Vector Machines (nonlinear basis functions, the kernel trick, Mercer's condition, popular kernels).
3. Support Vector Machine (SVM): Basic idea: the SVM tries to find the classifier that maximizes the margin between positive and negative data points. For now, we consider linear classifiers. This can be formulated as a convex optimization problem: find the hyperplane satisfying the margin constraints.
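The optimization problem itself was an image that did not survive this export; the standard max-margin formulation the slide refers to, with class labels y_n ∈ {−1, +1}, is:

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\|\mathbf{w}\|^2
\quad\text{s.t.}\quad y_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1,\qquad n = 1,\dots,N
```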
4. SVM – Primal Formulation: Lagrangian (primal) form. The solution of L_p needs to fulfill the KKT conditions, which are necessary and sufficient here because the problem is convex.
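The Lagrangian on this slide is also missing from the export; the standard hard-margin primal Lagrangian, using the multipliers a_n that the later slides refer to, is:

```latex
L_p(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\|\mathbf{w}\|^2
  - \sum_{n=1}^{N} a_n \bigl[\, y_n(\mathbf{w}^\top\mathbf{x}_n + b) - 1 \,\bigr],
\qquad a_n \ge 0
```

The KKT conditions are then a_n ≥ 0, y_n(wᵀx_n + b) − 1 ≥ 0, and the complementarity condition a_n [y_n(wᵀx_n + b) − 1] = 0.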
5. SVM – Solution: the solution for the hyperplane is computed as a linear combination of the training examples. The solution is sparse: the coefficients are non-zero only for some points, the support vectors ⇒ only the SVs actually influence the decision boundary! Compute b by averaging over all support vectors.
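The formulas referenced here are the standard hard-margin results (reconstructed, since the slide images are missing); with 𝒮 the set of support-vector indices:

```latex
\mathbf{w} = \sum_{n=1}^{N} a_n y_n \mathbf{x}_n,
\qquad
b = \frac{1}{|\mathcal{S}|} \sum_{m \in \mathcal{S}}
    \Bigl( y_m - \sum_{n \in \mathcal{S}} a_n y_n \,\mathbf{x}_n^\top \mathbf{x}_m \Bigr)
```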
6. SVM – Support Vectors: the training points for which a_n > 0 are called "support vectors". Graphical interpretation: the support vectors are the points on the margin; they define the margin and thus the hyperplane ⇒ all other data points can be discarded!
7. SVM – Discussion (Part 1): the linear SVM is a linear classifier and an approximate implementation of the SRM (structural risk minimization) principle. In the case of separable data, the SVM produces an empirical risk of zero with the minimal value of the VC confidence (i.e. a classifier minimizing the upper bound on the actual risk); SVMs thus have a "guaranteed" generalization capability. Formulation as a convex optimization problem ⇒ globally optimal solution! Primal form: solving the quadratic programming problem in M variables is in O(M³); here we have D variables ⇒ O(D³). Problem: poor scaling with high-dimensional data ("curse of dimensionality").
8. SVM – Dual Formulation (slides 8–11): derivation of the dual problem (the equations were not preserved in this export).
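The dual problem these slides derive is, in its standard form:

```latex
\max_{\mathbf{a}}\ L_d(\mathbf{a}) = \sum_{n=1}^{N} a_n
  - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m y_n y_m \,\mathbf{x}_n^\top\mathbf{x}_m
\quad\text{s.t.}\quad a_n \ge 0,\quad \sum_{n=1}^{N} a_n y_n = 0
```

Note that the data enter only through the inner products x_nᵀx_m, the key observation behind the kernel trick later in the deck.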
12. SVM – Discussion (Part 2): Dual form: in going to the dual, we now have a problem in N variables (the a_n). Isn't this worse? It seems we penalize large training sets! However, the solution is sparse (only the support vectors get a_n > 0), and the dual depends on the data only through inner products, which is what makes the kernel trick possible.
13. SVM – Support Vectors: so far we only looked at the linearly separable case. The current problem formulation has no solution if the data are not linearly separable! We need to introduce some tolerance for outlier data points.
14. SVM – Non-Separable Data
15. SVM – Soft-Margin Classification
16. SVM – Non-Separable Data
17. SVM – New Primal Formulation
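The soft-margin primal this slide title refers to, with slack variables ξ_n and penalty parameter C (standard formulation, reconstructed):

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{n=1}^{N}\xi_n
\quad\text{s.t.}\quad y_n(\mathbf{w}^\top\mathbf{x}_n + b) \ge 1 - \xi_n,\quad \xi_n \ge 0
```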
18. SVM – New Dual Formulation
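The soft-margin dual is identical to the hard-margin one except that the multipliers become box-constrained:

```latex
\max_{\mathbf{a}}\ \sum_{n=1}^{N} a_n
  - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m y_n y_m \,\mathbf{x}_n^\top\mathbf{x}_m
\quad\text{s.t.}\quad 0 \le a_n \le C,\quad \sum_{n=1}^{N} a_n y_n = 0
```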
19. SVM – New Solution
20. Nonlinear SVM
21. Another Example: data that are non-separable by a hyperplane in 2D…
22. …but separable by a surface in 3D.
23. Nonlinear SVM – Feature Spaces: general idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable.
24. Nonlinear SVM
25. What Could This Look Like?
26. Problem with High-Dimensional Basis Functions
27. Solution: The Kernel Trick
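The trick the slide names: since the dual only needs inner products φ(x_n)ᵀφ(x_m) in feature space, we never have to compute the mapping φ explicitly, only a kernel function:

```latex
k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^\top \phi(\mathbf{x}'),
\qquad
L_d(\mathbf{a}) = \sum_{n} a_n
  - \frac{1}{2}\sum_{n}\sum_{m} a_n a_m y_n y_m \, k(\mathbf{x}_n, \mathbf{x}_m)
```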
28. Back to Our Previous Example…
29. SVMs with Kernels
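As a concrete illustration of the kernelized decision function f(x) = Σₙ aₙ yₙ k(xₙ, x) + b, here is a minimal sketch using a Gaussian (RBF) kernel. The function names, the `gamma` parameter, and the toy support vectors are illustrative choices, not from the slides:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def decision(x, support_vecs, labels, alphas, b, gamma=1.0):
    """Kernelized SVM decision function: f(x) = sum_n a_n y_n k(x_n, x) + b."""
    return sum(a * y * rbf_kernel(sv, x, gamma)
               for sv, y, a in zip(support_vecs, labels, alphas)) + b

# Toy example (hand-picked support vectors, not a trained model):
# two SVs on opposite sides of the origin, with equal weight.
svs, ys, alphas = [[1.0, 0.0], [-1.0, 0.0]], [1, -1], [1.0, 1.0]
print(decision([2.0, 0.0], svs, ys, alphas, b=0.0) > 0)  # True: positive side
```

Only kernel evaluations appear; the feature map φ is never computed, which is exactly the point of the kernel trick.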
30. Which Functions are Valid Kernels? Mercer's theorem (modernized version): every positive definite symmetric function is a kernel. Positive definite symmetric functions correspond to a positive definite symmetric Gram matrix.
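The Gram-matrix condition the slide states can be written explicitly as:

```latex
K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m), \qquad
K = K^\top, \qquad
\mathbf{c}^\top K \mathbf{c} \ge 0 \ \ \text{for all } \mathbf{c} \in \mathbb{R}^N
```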
31. Kernels Fulfilling Mercer's Condition
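The kernel table on this slide is missing from the export; the popular Mercer kernels such decks typically list are:

```latex
\begin{aligned}
\text{linear:}\quad & k(\mathbf{x},\mathbf{x}') = \mathbf{x}^\top\mathbf{x}' \\
\text{polynomial:}\quad & k(\mathbf{x},\mathbf{x}') = (\mathbf{x}^\top\mathbf{x}' + c)^d \\
\text{Gaussian (RBF):}\quad & k(\mathbf{x},\mathbf{x}') = \exp\!\bigl(-\|\mathbf{x}-\mathbf{x}'\|^2 / 2\sigma^2\bigr)
\end{aligned}
```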
32. THANK YOU