Outline (2/2)
Some basics about learning theory
- Bias/variance tradeoff (underfitting vs. overfitting)
- Chernoff bound and VC dimension
- Model selection: cross validation
- Dimensionality reduction
Multiclass SVM
- One against one
- One against all
Image classification by SVM
- Process
- Results
Intuition: Margins
Functional margin: $\hat{\gamma}_i = y_i (w^T x_i + b)$
Geometric margin: $\gamma_i = y_i (w^T x_i + b) / \|w\|$
We feel more confident when the functional margin is larger. Note that rescaling $(w, b)$ does not change the separating plane.
Andrew Ng. Part V: Support Vector Machines. CS229 Lecture Notes (2008).
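The scaling behavior is easy to check numerically. A minimal sketch (the point, label, and plane below are made-up values, not from the slides):

```python
import numpy as np

# Made-up toy values: one training point x with label y, and a plane w.x + b = 0.
w = np.array([2.0, 1.0])
b = -1.0
x = np.array([2.0, 1.0])
y = 1

functional_margin = y * (w @ x + b)                       # gamma_hat = y (w^T x + b)
geometric_margin = functional_margin / np.linalg.norm(w)  # gamma = gamma_hat / ||w||

# Rescaling (w, b) by c > 0 rescales the functional margin ...
c = 10.0
scaled_functional = y * ((c * w) @ x + c * b)
# ... but leaves the geometric margin (and the plane itself) unchanged.
scaled_geometric = scaled_functional / np.linalg.norm(c * w)
```

Here `functional_margin` is 4.0 while `scaled_functional` is 40.0, yet the two geometric margins coincide.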
Maximizing the margin
Optimization problem: maximize the minimal geometric margin under the constraints
  $\max_{\gamma, w, b}\ \gamma$  s.t. $y_i(w^T x_i + b) \ge \gamma$, $\|w\| = 1$.
Introduce a scaling factor such that $\hat{\gamma} = 1$; the problem becomes
  $\min_{w, b}\ \tfrac{1}{2}\|w\|^2$  s.t. $y_i(w^T x_i + b) \ge 1$.
Lagrange duality
Primal optimization problem: $\min_w f(w)$ s.t. $g_i(w) \le 0$, $h_i(w) = 0$.
Generalized Lagrangian: $L(w, \alpha, \beta) = f(w) + \sum_i \alpha_i g_i(w) + \sum_i \beta_i h_i(w)$.
Primal optimization problem (equivalent form): $p^* = \min_w \max_{\alpha, \beta:\ \alpha_i \ge 0} L(w, \alpha, \beta)$.
Dual optimization problem: $d^* = \max_{\alpha, \beta:\ \alpha_i \ge 0} \min_w L(w, \alpha, \beta)$.
Dual problem
In general $d^* \le p^*$. Conditions under which equality holds: $f$ and the $g_i$ are convex, and the $h_i$ are affine (with the constraints strictly feasible). The optimum then satisfies the KKT conditions:
  $\partial L/\partial w = 0$, $\partial L/\partial \beta_i = 0$,
  $\alpha_i g_i(w) = 0$ (complementary slackness), $g_i(w) \le 0$, $\alpha_i \ge 0$.
Optimal margin classifiers
Primal: $\min_{w, b}\ \tfrac{1}{2}\|w\|^2$ s.t. $y_i(w^T x_i + b) \ge 1$.
Its Lagrangian: $L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i\,[\,y_i(w^T x_i + b) - 1\,]$.
Its dual problem: $\max_\alpha\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle$ s.t. $\alpha_i \ge 0$, $\sum_i \alpha_i y_i = 0$.
Kernel and feature mapping
Kernel: $K(x, z) = \phi(x)^T \phi(z)$. A valid (Mercer) kernel is symmetric with a positive semi-definite kernel matrix. For example, the Gaussian kernel $K(x, z) = \exp(-\|x - z\|^2 / 2\sigma^2)$.
Loose intuition: a kernel measures the "similarity" between features.
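Both Mercer conditions can be checked numerically on a kernel matrix. A sketch with the Gaussian kernel on made-up random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # 20 random feature vectors (made-up data)

def gaussian_kernel(X, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2)) for every pair of rows of X
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

K = gaussian_kernel(X)

symmetric = np.allclose(K, K.T)
psd = np.linalg.eigvalsh(K).min() > -1e-10  # eigenvalues >= 0 up to round-off
```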
Soft margin (L1 regularization)
$\min_{w, b, \xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i$  s.t. $y_i(w^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$.
$C = \infty$ leads to the hard-margin SVM, Rychetsky (2001).
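To see the role of C concretely, here is a minimal soft-margin SVM trained by subgradient descent on the primal objective above (a sketch on made-up separable data; real solvers work on the dual, e.g. with SMO):

```python
import numpy as np

def fit_soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Subgradient descent on (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points inside the margin (slack xi_i > 0)
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up, linearly separable toy data; a large C approximates the hard margin.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
w, b = fit_soft_margin_svm(X, y, C=10.0)
train_acc = (np.sign(X @ w + b) == y).mean()
```

With a small C the same code tolerates margin violations rather than bending the boundary around them.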
Why doesn’t my model fit well on test data?
Some basics about learning theory
Bias/variance tradeoff: underfitting (high bias) vs. overfitting (high variance).
Training error: $\hat{\varepsilon}(h) = \frac{1}{m} \sum_{i=1}^m 1\{h(x_i) \ne y_i\}$
Generalization error: $\varepsilon(h) = P_{(x, y) \sim D}(h(x) \ne y)$
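A quick illustration of the two failure modes on made-up data: fitting polynomials of different degree to noisy quadratic samples (a sketch; the degrees and noise level are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 20)
x_test = np.linspace(-1, 1, 101)
truth = lambda x: x ** 2                      # the "true" target (made up)
y_train = truth(x_train) + rng.normal(0, 0.1, x_train.size)

def train_test_mse(degree):
    coef = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - truth(x_test)) ** 2)
    return train_mse, test_mse

tr_under, te_under = train_test_mse(0)   # high bias: both errors stay high
tr_over, te_over = train_test_mse(15)    # high variance: training error near 0
```

The degree-0 model cannot even fit the training data; the degree-15 model fits the noise, so its tiny training error says little about its generalization error.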
Bias/variance tradeoff
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, 2001.
Is training error a good estimator of generalization error?
Chernoff bound (|H| finite)
Lemma: Assume $Z_1, Z_2, \ldots, Z_m$ are drawn iid from Bernoulli($\phi$), let $\hat{\phi} = \frac{1}{m}\sum_{i=1}^m Z_i$, and let $\gamma > 0$ be fixed. Then
  $P(|\phi - \hat{\phi}| > \gamma) \le 2 \exp(-2\gamma^2 m)$.
Based on this lemma, one can show that with probability $1 - \delta$ (k = number of hypotheses), for all $h \in H$:
  $|\varepsilon(h) - \hat{\varepsilon}(h)| \le \sqrt{\frac{1}{2m} \log \frac{2k}{\delta}}$.
Andrew Ng. Part VI: Learning Theory. CS229 Lecture Notes (2008).
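The lemma is easy to sanity-check by simulation (the values of φ, m, and γ below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
phi, m, gamma, trials = 0.3, 200, 0.1, 10_000

# trials independent experiments, each drawing Z_1..Z_m ~ Bernoulli(phi)
Z = rng.random((trials, m)) < phi
phi_hat = Z.mean(axis=1)                      # one estimate per experiment

empirical = np.mean(np.abs(phi - phi_hat) > gamma)
bound = 2 * np.exp(-2 * gamma ** 2 * m)       # = 2 e^{-4}, about 0.037
```

The observed deviation frequency indeed stays below the bound; the bound is loose but, crucially, it shrinks exponentially in m.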
Chernoff bound (|H| infinite)
VC dimension d: the size of the largest set that H can shatter. E.g., for H = linear classifiers in 2-D, VC(H) = 3.
With probability at least $1 - \delta$:
  $|\varepsilon(h) - \hat{\varepsilon}(h)| \le O\!\left(\sqrt{\frac{d}{m} \log \frac{m}{d} + \frac{1}{m} \log \frac{1}{\delta}}\right)$.
Model selection
Cross validation: an estimator of generalization error.
K-fold: split the training set into k pieces; train on k−1 pieces and test on the remaining one (this gives one test-error estimate). Average the k test-error estimates, say 2%; then 2% is the estimated generalization error for this learner.
Leave-one-out cross validation: m-fold, where m = training sample size.
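The k-fold procedure itself is only a few lines. A sketch on made-up data with a stand-in learner (nearest centroid), since any classifier can be plugged into the same split-and-average loop:

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_error(X_tr, y_tr, X_va, y_va):
    # Stand-in learner: classify each point by the nearer class centroid.
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_va - c1, axis=1)
            < np.linalg.norm(X_va - c0, axis=1)).astype(int)
    return (pred != y_va).mean()

def k_fold_cv_error(X, y, k=5):
    folds = np.array_split(rng.permutation(len(X)), k)
    errors = []
    for i in range(k):
        va = folds[i]                                   # validate on fold i
        tr = np.concatenate(folds[:i] + folds[i + 1:])  # train on the rest
        errors.append(nearest_centroid_error(X[tr], y[tr], X[va], y[va]))
    return np.mean(errors)  # average of the k test-error estimates

# Made-up two-class data.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
cv_error = k_fold_cv_error(X, y, k=5)
```

Setting k = len(X) turns the same loop into leave-one-out cross validation.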
Model selection
Loop over the possible parameters:
- Pick one parameter setting, e.g. C = 2.0.
- Do cross validation to get an error estimate.
Pick the C_best (with the minimal error estimate) as the parameter.
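The selection loop in code (a sketch; ridge regression and its λ grid stand in for the SVM and its C grid, since the loop itself is identical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up regression data: y = X w_true + noise.
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(0, 0.5, 100)

def cv_error(lam, k=5):
    # k-fold CV estimate of test MSE for ridge regression with parameter lam.
    folds = np.array_split(np.arange(len(X)), k)
    errors = []
    for i in range(k):
        va = folds[i]
        tr = np.concatenate(folds[:i] + folds[i + 1:])
        w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(X.shape[1]),
                            X[tr].T @ y[tr])
        errors.append(np.mean((X[va] @ w - y[va]) ** 2))
    return np.mean(errors)

candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(candidates, key=cv_error)  # parameter with minimal error estimate
```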
Dimensionality reduction
Which features are more "important"?
Wrapper model feature selection
- Forward/backward search: add/remove one feature at a time, then evaluate the model with the new feature set.
Filter feature selection
- Compute a score S(i) that measures how informative x_i is about the class label y; S(i) can be the correlation Corr(x_i, y), the mutual information MI(x_i, y), etc.
Principal Component Analysis (PCA)
Vector Quantization (VQ)
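A filter-selection sketch using the correlation score on made-up features (one informative column, one pure noise):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)                # binary class labels

# Made-up features: column 0 is pure noise, column 1 tracks the label.
noise = rng.normal(size=n)
informative = y + rng.normal(0, 0.5, n)
X = np.column_stack([noise, informative])

# S(i) = |Corr(x_i, y)|: higher means x_i is more informative about y.
scores = np.array([abs(np.corrcoef(X[:, i], y)[0, 1])
                   for i in range(X.shape[1])])
ranking = np.argsort(scores)[::-1]       # feature indices, most informative first
```

The informative column scores far higher, so a filter that keeps the top-ranked features would discard the noise column.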
Multiclass SVM
One against one
- There are k(k−1)/2 binary SVMs (1 vs 2, 1 vs 3, …). To predict, each SVM votes between its 2 classes; the class with the most votes wins.
One against all
- There are k binary SVMs (1 vs rest, 2 vs rest, …). To predict, evaluate each classifier's decision value and pick the largest.
Multiclass SVM by solving ONE optimization problem:
Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
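A one-against-all sketch on made-up three-class data, with least-squares linear scorers standing in for the k binary SVMs (the evaluate-all-and-argmax logic is the point here, not the base learner):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])  # made-up class centers
X = np.vstack([rng.normal(m, 0.5, (30, 2)) for m in means])
y = np.repeat(np.arange(k), 30)

Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column

# One scorer per class: +1 targets for "class c", -1 for "rest".
W = np.stack([np.linalg.lstsq(Xb, np.where(y == c, 1.0, -1.0), rcond=None)[0]
              for c in range(k)])

pred = (Xb @ W.T).argmax(axis=1)  # evaluate all k scorers, pick the largest
train_accuracy = (pred == y).mean()
```

One-against-one would instead train k(k−1)/2 pairwise classifiers on the class-pair subsets and replace the argmax with a vote count.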
Image Classification by SVM: Process
K = 6 classes. Split the data into 3/4 training and 1/4 test, encode each image as a labeled sparse feature vector (e.g. "1 0:49 1:25 …"), train the multiclass SVM, and measure accuracy on the test data.
Image Classification by SVM: Results
Run the multiclass SVM 100 times for both kernels (linear and Gaussian). Accuracy histogram:
Image Classification by SVM
What if we feed the machine object data it has never seen before?
~ Thank You ~
Shao-Chuan Wang
CITI, Academia Sinica