Support Vector Machine

  1. Support Vector Machine
     Shao-Chuan Wang
  2. Support Vector Machine
     1D classification problem: how will you separate these data (H1, H2, H3)?
  3. Support Vector Machine
     2D classification problem: which hyperplane H is better?
  4. Max-Margin Classifier
     Functional margin and geometric margin.
     We feel more confident when the functional margin is larger.
     Note that scaling w and b does not change the separating plane.
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
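The scale invariance noted on this slide can be checked with a short sketch (the weight vector w, bias b, and sample (x, y) are made-up values, not from the slides):

```python
import math

def functional_margin(w, b, x, y):
    # y * (w . x + b); doubles if we double (w, b)
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def geometric_margin(w, b, x, y):
    # functional margin normalized by ||w||; invariant to scaling (w, b)
    return functional_margin(w, b, x, y) / math.sqrt(sum(wi * wi for wi in w))

w, b, x, y = [3.0, 4.0], -2.0, [1.0, 1.0], +1
print(functional_margin(w, b, x, y))             # 5.0
print(geometric_margin(w, b, x, y))              # 1.0
print(geometric_margin([6.0, 8.0], -4.0, x, y))  # still 1.0 after scaling by 2
```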
  5. Maximize margins
     Optimization problem: maximize the minimal geometric margin under constraints.
     Introduce a scaling factor such that the minimal functional margin equals 1.
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
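The resulting optimization problem (the transcript drops the slide's equations; this is the standard form from the cited Ng notes, after fixing the functional margin to 1):

```latex
\min_{w,b}\ \frac{1}{2}\lVert w\rVert^2
\quad \text{s.t.} \quad y^{(i)}\left(w^{\top}x^{(i)} + b\right) \ge 1,
\quad i = 1,\dots,m
```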
  6. Optimization problem subject to constraints
     Maximize f(x, y), subject to the constraint g(x, y) = c
     -> the Lagrange multiplier method
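A one-line worked instance of the method (an illustrative example, not from the slides): maximize f(x, y) = xy subject to x + y = 1.

```latex
\mathcal{L}(x, y, \lambda) = xy - \lambda(x + y - 1), \qquad
\frac{\partial \mathcal{L}}{\partial x} = y - \lambda = 0, \quad
\frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0
\;\Rightarrow\; x = y = \tfrac{1}{2},\ \ f = \tfrac{1}{4}.
```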
  7. Lagrange duality
     Primal optimization problem
     Generalized Lagrangian method
     Primal optimization problem (equivalent form)
     Dual optimization problem
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
  8. Dual Problem
     The conditions under which equality (strong duality) holds: f and the gi are convex, and the hi are affine.
     At the optimum, the KKT conditions hold.
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
  9. Optimal margin classifiers
     Its Lagrangian; its dual problem.
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
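For reference, the dual problem of the optimal margin classifier (the transcript drops the slide's equations; this is the standard form from the cited Ng notes):

```latex
\max_{\alpha}\ W(\alpha) = \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j
    \left\langle x^{(i)}, x^{(j)} \right\rangle
\quad \text{s.t.} \quad \alpha_i \ge 0,\ \ \sum_{i=1}^{m} \alpha_i y^{(i)} = 0.
```

Note that the training data enter only through inner products, which is what makes the kernel trick on the next slides possible.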
  10. Support Vector Machine (cont’d)
      If the data are not linearly separable, we can find a nonlinear solution.
      Technically, it is a linear solution in a higher-dimensional feature space: the kernel trick.
  11. Kernel and feature mapping
      A kernel is positive semi-definite and symmetric; for example, the Gaussian kernel.
      Loose intuition: a kernel measures the “similarity” between features.
      Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
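The two kernel properties named on this slide can be verified numerically; a sketch using the Gaussian (RBF) kernel on three made-up points:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2): a "similarity" between points
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X)
print(np.allclose(K, K.T))                       # symmetric: True
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))   # positive semi-definite: True
```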
  12. Soft Margin (L1 regularization)
      C = ∞ recovers the hard-margin SVM.
      Rychetsky (2001)
      Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
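The L1-regularized (soft-margin) primal this slide refers to, in the standard form of the cited Ng notes, with slack variables ξi penalized by the trade-off parameter C:

```latex
\min_{w,b,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y^{(i)}\left(w^{\top}x^{(i)} + b\right) \ge 1 - \xi_i,
\ \ \xi_i \ge 0.
```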
  13. Why doesn’t my model fit well on test data?
  14. Bias/variance tradeoff
      Underfitting (high bias) vs. overfitting (high variance).
      Training error = in-sample error; generalization error = out-of-sample error.
      Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
  15. Bias/variance tradeoff
      T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, 2001.
  16. Is training error a good estimator of generalization error?
  17. Chernoff bound (|H| = finite)
      Lemma: assume Z1, Z2, …, Zm are drawn iid from Bernoulli(φ), let φ̂ = (1/m) Σi Zi, and let γ > 0 be fixed. Then P(|φ − φ̂| > γ) ≤ 2 exp(−2γ²m).
      Based on this lemma, one can show that, with probability 1 − δ, training and generalization error differ by at most √((1/2m) log(2k/δ)) for every hypothesis (k = number of hypotheses).
      Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).
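The uniform-convergence gap √((1/2m) log(2k/δ)) from this bound is easy to evaluate; the sample size m, hypothesis count k, and confidence δ below are made-up values:

```python
import math

def uniform_convergence_gap(m, k, delta):
    # sqrt((1/(2m)) * log(2k/delta)): with probability 1 - delta, training
    # and generalization error differ by at most this, simultaneously for
    # all k hypotheses (Hoeffding/Chernoff bound plus a union bound)
    return math.sqrt(math.log(2 * k / delta) / (2 * m))

print(round(uniform_convergence_gap(m=10000, k=100, delta=0.05), 4))  # 0.0204
```

Note the gap shrinks only as 1/√m but grows only logarithmically in k, which is why the bound stays useful for large hypothesis classes.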
  18. Chernoff bound (|H| = infinite)
      VC dimension d: the size of the largest set that H can shatter.
      e.g., for H = linear classifiers in 2-D, VC(H) = 3.
      With probability at least 1 − δ, training and generalization error differ by at most O(√((d/m) log(m/d) + (1/m) log(1/δ))).
      Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).
  19. Model Selection
      Cross validation: an estimator of generalization error.
  20. K-fold: train on k−1 pieces, validate on the remaining piece (this gives one test-error estimate). Average the k test-error estimates, say 2%; then 2% is the estimated generalization error for this learner.
      Leave-one-out cross validation (m-fold, with m = training sample size).
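The k-fold splitting described above can be sketched in a few lines (a minimal contiguous-split version; real splitters usually shuffle first):

```python
def k_fold_splits(n, k):
    # Split sample indices 0..n-1 into k contiguous pieces; each piece is
    # the validation set once while the other k-1 pieces form the training set.
    fold_size = n // k
    idx = list(range(n))
    for i in range(k):
        val = idx[i * fold_size:(i + 1) * fold_size]
        train = idx[:i * fold_size] + idx[(i + 1) * fold_size:]
        yield train, val

# 6 samples, 3 folds; fitting on each `train` and scoring on each `val`,
# then averaging the k per-fold errors, gives the cross-validation estimate
for train, val in k_fold_splits(6, 3):
    print(train, val)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```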
  21. Model Selection
      Loop over possible parameters:
      Pick one parameter setting, e.g. C = 2.0.
      Do cross validation to get an error estimate.
      Pick the C with the minimal error estimate as the best parameter.
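The selection loop above reduces to taking the minimum of the cross-validation error over the candidate grid; a sketch with made-up error values (in practice `cv_error` would run the k-fold procedure for each C):

```python
def pick_best_c(candidates, cv_error):
    # cv_error(C) -> cross-validation estimate of generalization error;
    # keep the candidate with the smallest estimated error
    return min(candidates, key=cv_error)

# toy error curve with its minimum at C = 2.0 (illustrative numbers only)
toy_errors = {0.5: 0.08, 1.0: 0.05, 2.0: 0.03, 4.0: 0.04, 8.0: 0.07}
best = pick_best_c(list(toy_errors), toy_errors.get)
print(best)  # 2.0
```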
  22. Multiclass SVM
      One against one: there are k(k−1)/2 binary SVMs (1 vs 2, 1 vs 3, …). To predict, each SVM votes between its 2 classes; the class with the most votes wins the poll.
      One against all: there are k binary SVMs (1 vs rest, 2 vs rest, …). To predict, evaluate each classifier’s decision value and pick the largest.
      Multiclass SVM by solving ONE optimization problem.
      Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
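The one-against-one voting scheme can be sketched as follows (the pairwise classifier here is a made-up stand-in, not a trained SVM):

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, binary_predict):
    # k classes -> k*(k-1)/2 pairwise binary SVMs; each votes between its
    # two classes, and the class with the most votes wins
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_predict(a, b, x)] += 1
    return votes.most_common(1)[0][0]

def toy_svm(a, b, x):
    # stand-in for a trained pairwise SVM: pretend class 2 wins every
    # pairing it appears in, otherwise the smaller label wins
    return 2 if 2 in (a, b) else min(a, b)

print(one_vs_one_predict(None, [1, 2, 3, 4], toy_svm))  # 2
```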
  23. Multiclass SVM (2/2)
      DAGSVM (Directed Acyclic Graph SVM)
  24. An Example: image classification
      Process (pipeline figure: feature vectors written in “label index:value …” training files, a multiclass SVM with K = 6 classes, a test set, and the resulting accuracy).
  25. An Example: image classification
      Results: run the multiclass SVM 100 times for both kernels (linear/Gaussian).
      (Figure: accuracy histogram)