Image Classification And Support Vector Machine


  1. Image Classification and Support Vector Machine
     Shao-Chuan Wang
     CITI, Academia Sinica
  2. Outline (1/2)
     - Quick review of SVM
       - Intuition
       - Functional margin and geometric margin
       - Optimal margin classifier
       - Generalized Lagrange multiplier methods
       - Lagrangian duality
       - Kernel and feature mapping
       - Soft margin (L1 regularization)
  3. Outline (2/2)
     - Some basics of learning theory
       - Bias/variance tradeoff (underfitting vs. overfitting)
       - Chernoff bound and VC dimension
     - Model selection
       - Cross validation
     - Dimensionality reduction
     - Multiclass SVM
       - One against one
       - One against all
     - Image classification by SVM
       - Process
       - Results
  4. Intuition: Margins
     - Functional margin
     - Geometric margin
     - We feel more confident when the functional margin is larger.
     - Note that rescaling w and b does not change the separating plane.
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
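The margin formulas on this slide appear only as images in the original deck; as a reconstruction from the cited CS229 notes, for a training example (x^(i), y^(i)) with y^(i) in {-1, +1}:

    \hat{\gamma}^{(i)} = y^{(i)}\left(w^{\top}x^{(i)} + b\right)
        % functional margin
    \gamma^{(i)} = y^{(i)}\left(\tfrac{w^{\top}}{\lVert w\rVert}\,x^{(i)} + \tfrac{b}{\lVert w\rVert}\right)
        % geometric margin

Rescaling (w, b) to (cw, cb) multiplies the functional margin by c but leaves the geometric margin, and hence the hyperplane, unchanged.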
  5. Maximize margins
     - Optimization problem: maximize the minimal geometric margin under the constraints.
     - Introduce a scaling factor such that the minimal functional margin equals 1 (see the formulation below).
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
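Written out, following the cited notes, this scaling turns margin maximization into the standard quadratic program:

    \min_{w,\,b}\; \tfrac{1}{2}\lVert w\rVert^{2}
    \quad\text{s.t.}\quad y^{(i)}\left(w^{\top}x^{(i)} + b\right) \ge 1,\qquad i = 1,\dots,m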
  6. Lagrange duality
     - Primal optimization problem
     - Generalized Lagrangian
     - Primal optimization problem (equivalent form)
     - Dual optimization problem
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
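The four objects named above, reconstructed from the cited notes (the slide shows them as images):

    \text{Primal:}\quad \min_{w} f(w) \quad\text{s.t.}\quad g_{i}(w) \le 0,\;\; h_{i}(w) = 0
    \text{Generalized Lagrangian:}\quad \mathcal{L}(w,\alpha,\beta) = f(w) + \textstyle\sum_{i}\alpha_{i}g_{i}(w) + \sum_{i}\beta_{i}h_{i}(w)
    \text{Primal (equivalent form):}\quad p^{*} = \min_{w}\,\max_{\alpha \ge 0,\,\beta}\;\mathcal{L}(w,\alpha,\beta)
    \text{Dual:}\quad d^{*} = \max_{\alpha \ge 0,\,\beta}\,\min_{w}\;\mathcal{L}(w,\alpha,\beta) \;\le\; p^{*}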
  7. Dual problem
     - Conditions under which equality of the primal and dual optima holds:
       - f and the g_i are convex, and the h_i are affine.
       - KKT conditions.
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
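The KKT conditions referred to above, as stated in the cited notes (w*, α*, β* are the optimal primal and dual variables):

    \frac{\partial}{\partial w_{i}}\mathcal{L}(w^{*},\alpha^{*},\beta^{*}) = 0,\qquad
    \frac{\partial}{\partial \beta_{i}}\mathcal{L}(w^{*},\alpha^{*},\beta^{*}) = 0
    \alpha_{i}^{*}\,g_{i}(w^{*}) = 0,\qquad g_{i}(w^{*}) \le 0,\qquad \alpha_{i}^{*} \ge 0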
  8. Optimal margin classifiers
     - Its Lagrangian
     - Its dual problem
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
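The Lagrangian and dual problem this slide refers to, again reconstructed from the cited notes:

    \mathcal{L}(w,b,\alpha) = \tfrac{1}{2}\lVert w\rVert^{2} - \sum_{i=1}^{m}\alpha_{i}\left[y^{(i)}\left(w^{\top}x^{(i)} + b\right) - 1\right]
    \max_{\alpha}\; W(\alpha) = \sum_{i=1}^{m}\alpha_{i} - \tfrac{1}{2}\sum_{i,j=1}^{m} y^{(i)}y^{(j)}\alpha_{i}\alpha_{j}\,\langle x^{(i)},x^{(j)}\rangle
    \text{s.t.}\quad \alpha_{i} \ge 0,\qquad \sum_{i=1}^{m}\alpha_{i}y^{(i)} = 0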
  9. Kernel and feature mapping
     - Kernel: positive semi-definite, symmetric
     - For example: (see the kernels written out below)
     - Loose intuition: "similarity" between features
     Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
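A kernel computes an inner product in feature space, K(x, z) = φ(x)ᵀφ(z); two common examples of the kind the slide alludes to (c, d, and σ are kernel parameters):

    K(x,z) = \left(x^{\top}z + c\right)^{d}
        % polynomial kernel
    K(x,z) = \exp\!\left(-\frac{\lVert x - z\rVert^{2}}{2\sigma^{2}}\right)
        % Gaussian (RBF) kernel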
  10. Soft Margin (L1 regularization)
      - C = ∞ leads to the hard-margin SVM.
      Rychetsky (2001)
      Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
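The soft-margin objective behind this slide, reconstructed from the cited notes; C trades margin width against training violations, and letting C → ∞ recovers the hard-margin problem:

    \min_{w,\,b,\,\xi}\; \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{m}\xi_{i}
    \text{s.t.}\quad y^{(i)}\left(w^{\top}x^{(i)} + b\right) \ge 1 - \xi_{i},\qquad \xi_{i} \ge 0,\quad i = 1,\dots,m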
  11. Why doesn't my model fit well on test data?
  12. Some basics of learning theory
      - Bias/variance tradeoff: underfitting (high bias) vs. overfitting (high variance)
      - Training error and generalization error (defined below)
      Andrew Ng. Part V Support Vector Machines. CS229 Lecture Notes (2008).
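The two error quantities named on this slide, written out using the standard definitions from the CS229 learning-theory notes:

    \hat{\varepsilon}(h) = \frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{h(x^{(i)}) \ne y^{(i)}\}
        % training (empirical) error
    \varepsilon(h) = \mathbb{P}_{(x,y)\sim\mathcal{D}}\left(h(x) \ne y\right)
        % generalization error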
  13. Bias/variance tradeoff
      T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, 2001.
  14. Is training error a good estimator of generalization error?
  15. Chernoff bound (|H| = finite)
      - Lemma: assume Z1, Z2, …, Zm are drawn i.i.d. from Bernoulli(φ), and let γ > 0 be fixed.
      - Based on this lemma, one can show that, with probability 1 − δ, the bound below holds uniformly over the k hypotheses in H.
      Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).
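The lemma and the resulting uniform bound, written out (the slide's own formulas are images); here φ̂ is the sample mean of the Z_i and k = |H|:

    \mathbb{P}\left(\lvert\phi - \hat{\phi}\rvert > \gamma\right) \le 2\exp\left(-2\gamma^{2}m\right)
    \lvert \varepsilon(h) - \hat{\varepsilon}(h) \rvert \le \sqrt{\frac{1}{2m}\log\frac{2k}{\delta}}
        \quad\text{for all } h \in H,\ \text{with probability at least } 1 - \delta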
  16. Chernoff bound (|H| = infinite)
      - VC dimension d: the size of the largest set that H can shatter.
      - e.g. for H = linear classifiers in 2-D, VC(H) = 3.
      - With probability at least 1 − δ, the bound below holds.
      Andrew Ng. Part VI Learning Theory. CS229 Lecture Notes (2008).
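The corresponding bound in terms of the VC dimension d = VC(H), as given in the cited notes:

    \lvert \varepsilon(h) - \hat{\varepsilon}(h) \rvert \le O\!\left(\sqrt{\frac{d}{m}\log\frac{m}{d} + \frac{1}{m}\log\frac{1}{\delta}}\right)
        \quad\text{for all } h \in H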
  17. Model Selection
      - Cross validation: an estimator of generalization error.
      - K-fold: train on k − 1 pieces, test on the remaining piece (this yields one test-error estimate).
      - Average the k test-error estimates; if the average is, say, 2%, then 2% is the estimate of generalization error for this learner.
      - Leave-one-out cross validation (m-fold, where m = training sample size).
      [Slide figure: the training set split into folds, with one fold held out for validation.]
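A minimal sketch of k-fold cross validation, using scikit-learn on a stand-in digits dataset rather than the deck's own image data and tooling:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Stand-in data: 8x8 digit images flattened into feature vectors.
    X, y = load_digits(return_X_y=True)

    # 5-fold CV: train on 4 folds, validate on the held-out fold, repeat 5 times.
    clf = SVC(kernel="rbf", C=2.0, gamma="scale")
    scores = cross_val_score(clf, X, y, cv=5)

    # The mean validation accuracy is the cross-validation estimate of generalization accuracy.
    print("estimated accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))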
  18. Model Selection
      - Loop over the candidate parameters:
        - Pick one parameter setting, e.g. C = 2.0.
        - Do cross validation to get an error estimate.
      - Pick C_best (the value with the minimal error estimate) as the parameter (a sketch follows below).
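One way to implement this parameter loop, sketched with scikit-learn's GridSearchCV; the candidate C values are illustrative, not those used in the deck:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)

    # Candidate values of the soft-margin parameter C.
    param_grid = {"C": [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]}

    # For each C, run 5-fold cross validation; keep the C with the best score.
    search = GridSearchCV(SVC(kernel="rbf", gamma="scale"), param_grid, cv=5)
    search.fit(X, y)

    print("C_best =", search.best_params_["C"], "CV accuracy =", search.best_score_)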
  19. Dimensionality Reduction
      - Which features are more "important"?
      - Wrapper-model feature selection
        - Forward/backward search: add or remove one feature at a time, then evaluate the model with the new feature set.
      - Filter feature selection
        - Compute a score S(i) that measures how informative x_i is about the class label y.
        - S(i) can be the correlation Corr(x_i, y), the mutual information MI(x_i, y), etc.
      - Principal Component Analysis (PCA)
      - Vector Quantization (VQ)
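A brief sketch of a filter-style score and of PCA with scikit-learn; the number of kept features/components is arbitrary here:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import mutual_info_classif

    X, y = load_digits(return_X_y=True)

    # Filter feature selection: score S(i) = mutual information between feature x_i and label y.
    scores = mutual_info_classif(X, y)
    top_features = scores.argsort()[::-1][:16]  # keep the 16 most informative features

    # PCA: project the data onto the 16 directions of largest variance.
    X_pca = PCA(n_components=16).fit_transform(X)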
  20. Multiclass SVM
      - One against one
        - There are k(k − 1)/2 binary SVMs (1 vs 2, 1 vs 3, …).
        - To predict, each SVM votes between its two classes; the class with the most votes wins.
      - One against all
        - There are k binary SVMs (1 vs rest, 2 vs rest, …).
        - To predict, evaluate the decision values of all k classifiers and pick the largest.
      - Multiclass SVM by solving ONE optimization problem
      [Slide figure: a voting poll over k = 6 classes.]
      Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
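Both decompositions, sketched with scikit-learn wrappers around a binary SVM (not the solver used for the deck's experiments):

    from sklearn.datasets import load_digits
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)

    # One against one: k(k-1)/2 binary SVMs; prediction is a majority vote.
    ovo = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)

    # One against all: k binary SVMs; prediction is the class with the largest decision value.
    ova = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)

    print(ovo.predict(X[:5]), ova.predict(X[:5]))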
  21. Image Classification by SVM
      - Process
        - K = 6 object classes; the labeled data are split 3/4 for training and 1/4 as test data.
        - Each image is converted to a labeled sparse feature vector (e.g. "1 0:49 1:25 …").
        - Train the multiclass SVM on the training portion and report accuracy on the test data.
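A minimal end-to-end sketch of the train/test split and accuracy measurement described above, again using scikit-learn and stand-in data rather than the author's feature files:

    from sklearn.datasets import load_digits
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)

    # Hold out one quarter of the data as the test set, train on the remaining three quarters.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))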
  22. Image Classification by SVM
      - Results
        - The multiclass SVM was run 100 times for both kernels (linear and Gaussian).
        - Accuracy histogram (shown as a figure on the slide).
  23. Image Classification by SVM
      - What happens if we throw in object data that the machine never saw before?
  24. ~ Thank You ~
      Shao-Chuan Wang
      CITI, Academia Sinica
