Support Vector Machine
Shao-Chuan Wang
Support Vector Machine
1D Classification Problem: how will you separate these data? (H1, H2, H3?)
[Figure: one-dimensional data on the x axis with three candidate decision boundaries H1, H2, H3.]
Support Vector Machine
2D Classification Problem: which H is better?
Max-Margin Classifier
Functional margin and geometric margin. We feel more confident when the functional margin is larger. Note that rescaling w and b won’t change the plane.
Andrew Ng. Part V: Support Vector Machines. CS229 Lecture Notes (2008).
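The margin formulas on this slide were lost in extraction; as defined in the cited CS229 notes, for a training example (x^{(i)}, y^{(i)}):

  functional margin:  \hat{\gamma}^{(i)} = y^{(i)} \left( w^T x^{(i)} + b \right)
  geometric margin:   \gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right)

The geometric margin is invariant to rescaling (w, b), which is why the scaling can be fixed freely on the next slide.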
Maximize Margins
Optimization problem: maximize the minimal geometric margin under the constraints. Introduce a scaling factor such that the functional margin equals 1.
Andrew Ng. Part V: Support Vector Machines. CS229 Lecture Notes (2008).
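Restoring the optimization problem from the cited notes: the raw problem and its scaled, convex form are

  \max_{\gamma, w, b} \; \gamma   s.t.  y^{(i)} (w^T x^{(i)} + b) \ge \gamma,  \|w\| = 1

and, after imposing \hat{\gamma} = 1,

  \min_{w, b} \; \frac{1}{2} \|w\|^2   s.t.  y^{(i)} (w^T x^{(i)} + b) \ge 1,  i = 1, \dots, m.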
Optimization Problem Subject to Constraints
Maximize f(x, y), subject to the constraint g(x, y) = c -> the Lagrange multiplier method.
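For concreteness, the standard first-order condition of the method: at a constrained optimum the gradients of objective and constraint are parallel,

  \nabla f(x, y) = \lambda \nabla g(x, y),

so one solves \nabla_{x, y, \lambda} \Lambda = 0 for the Lagrangian \Lambda(x, y, \lambda) = f(x, y) - \lambda \, (g(x, y) - c).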
Lagrange Duality
The primal optimization problem; the generalized Lagrangian; the primal problem in equivalent min-max form; the dual optimization problem.
Andrew Ng. Part V: Support Vector Machines. CS229 Lecture Notes (2008).
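Restored from the cited notes, the four objects the slide names:

  primal:      \min_w f(w)   s.t.  g_i(w) \le 0,  h_i(w) = 0
  Lagrangian:  \mathcal{L}(w, \alpha, \beta) = f(w) + \sum_i \alpha_i g_i(w) + \sum_i \beta_i h_i(w)
  primal (equivalent form):  p^* = \min_w \max_{\alpha, \beta : \, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)
  dual:        d^* = \max_{\alpha, \beta : \, \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta)

In general d^* \le p^* (weak duality).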
Dual Problem
Conditions under which equality d* = p* holds: f and the g_i are convex, and the h_i are affine (with the g_i constraints strictly feasible). At the optimum, the KKT conditions are then satisfied.
Andrew Ng. Part V: Support Vector Machines. CS229 Lecture Notes (2008).
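The KKT conditions themselves, from the cited notes: w^*, \alpha^*, \beta^* are primal/dual optimal iff

  \frac{\partial}{\partial w_i} \mathcal{L}(w^*, \alpha^*, \beta^*) = 0
  \frac{\partial}{\partial \beta_i} \mathcal{L}(w^*, \alpha^*, \beta^*) = 0
  \alpha_i^* \, g_i(w^*) = 0   (complementary slackness)
  g_i(w^*) \le 0,  \alpha_i^* \ge 0

Complementary slackness is what makes only the support vectors carry nonzero \alpha_i.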
Optimal Margin Classifiers
The primal problem, its Lagrangian, and its dual problem.
Andrew Ng. Part V: Support Vector Machines. CS229 Lecture Notes (2008).
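Restored from the cited notes:

  primal:      \min_{w, b} \frac{1}{2} \|w\|^2   s.t.  y^{(i)} (w^T x^{(i)} + b) \ge 1
  Lagrangian:  \mathcal{L}(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^m \alpha_i \left[ y^{(i)} (w^T x^{(i)} + b) - 1 \right]
  dual:        \max_\alpha W(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i, j} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle
               s.t.  \alpha_i \ge 0,  \sum_{i=1}^m \alpha_i y^{(i)} = 0

The dual depends on the data only through inner products \langle x^{(i)}, x^{(j)} \rangle, which is what makes the kernel trick on the following slides possible.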
Support Vector Machine (cont’d)
If the data are not linearly separable, we can find a nonlinear solution. Technically, it is a linear solution in a higher-dimensional feature space: the kernel trick.
Kernel and Feature Mapping
Kernel: positive semi-definite and symmetric; K(x, z) = ⟨φ(x), φ(z)⟩ for a feature mapping φ. Loose intuition: a kernel measures the “similarity” between features.
Andrew Ng. Part V: Support Vector Machines. CS229 Lecture Notes (2008).
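A concrete instance of the slide’s lost “for example”: the kernel K(x, z) = (x^T z)^2 corresponds to an explicit feature map, which can be checked numerically. A minimal sketch in NumPy (input values are illustrative):

import numpy as np

def phi(x):
    # Explicit feature map for K(x, z) = (x^T z)^2 with 2-D inputs:
    # phi(x) = (x1*x1, x1*x2, x2*x1, x2*x2)
    return np.array([x[0]*x[0], x[0]*x[1], x[1]*x[0], x[1]*x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
lhs = (x @ z) ** 2           # kernel computed directly, O(n) work
rhs = phi(x) @ phi(z)        # inner product in feature space, O(n^2) work
assert np.isclose(lhs, rhs)  # both equal 16.0

The point of the trick: the left-hand side never materializes phi(x), which matters when the feature space is huge or infinite-dimensional (e.g. the Gaussian kernel).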
Soft Margin (L1 Regularization)
C = ∞ leads to the hard-margin SVM; Rychetsky (2001).
Andrew Ng. Part V: Support Vector Machines. CS229 Lecture Notes (2008).
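The soft-margin objective from the cited notes, restored: slack variables \xi_i relax the margin constraints, and C trades margin width against violations,

  \min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^m \xi_i
  s.t.  y^{(i)} (w^T x^{(i)} + b) \ge 1 - \xi_i,  \xi_i \ge 0.

In the dual, the only change is the box constraint 0 \le \alpha_i \le C, so C = ∞ indeed recovers the hard margin.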
Why doesn’t my model fit well on test data?
Bias/Variance Tradeoff
Underfitting (high bias) vs. overfitting (high variance). Training error = in-sample error; generalization error = out-of-sample error.
Andrew Ng. Part V: Support Vector Machines. CS229 Lecture Notes (2008).
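The two error definitions the slide contrasts, restored from the CS229 learning-theory notes: for a hypothesis h and m training samples drawn from distribution \mathcal{D},

  training error:        \hat{\varepsilon}(h) = \frac{1}{m} \sum_{i=1}^m 1\{ h(x^{(i)}) \ne y^{(i)} \}
  generalization error:  \varepsilon(h) = P_{(x, y) \sim \mathcal{D}} \left( h(x) \ne y \right)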
Bias/Variance Tradeoff
[Figure from the cited text: training and test error as a function of model complexity.]
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, 2001.
Is training error a good estimator of generalization error?
Chernoff Bound (|H| = finite)
Lemma: assume Z_1, Z_2, …, Z_m are drawn iid from Bernoulli(φ), let φ̂ be their mean, and let γ > 0 be fixed. Then the deviation probability is bounded as below; based on this lemma, one can show a generalization bound that holds with probability 1 − δ (k = number of hypotheses).
Andrew Ng. Part VI: Learning Theory. CS229 Lecture Notes (2008).
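The two results, restored from the cited notes:

  \hat{\varphi} = \frac{1}{m} \sum_{i=1}^m Z_i, \qquad P\left( |\varphi - \hat{\varphi}| > \gamma \right) \le 2 \exp(-2 \gamma^2 m)

and, applying this uniformly over a finite hypothesis class of size k, with probability at least 1 − δ,

  \varepsilon(\hat{h}) \le \left( \min_{h \in H} \varepsilon(h) \right) + 2 \sqrt{ \frac{1}{2m} \log \frac{2k}{\delta} }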
Chernoff Bound (|H| = infinite)
VC dimension d: the size of the largest set that H can shatter. E.g., for H = linear classifiers in 2-D, VC(H) = 3. With probability at least 1 − δ, the bound below holds.
Andrew Ng. Part VI: Learning Theory. CS229 Lecture Notes (2008).
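Restored from the cited notes, with d = VC(H) and m training samples:

  \varepsilon(\hat{h}) \le \varepsilon(h^*) + O\left( \sqrt{ \frac{d}{m} \log \frac{m}{d} + \frac{1}{m} \log \frac{1}{\delta} } \right)

So for a meaningful guarantee, the number of training samples should grow roughly linearly with the VC dimension.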
Model Selection
Cross validation: an estimator of generalization error.
K-fold: train on k − 1 pieces, test on the remaining piece (this yields one test-error estimate). Average the k test-error estimates, say to 2%; then 2% is the estimate of generalization error for this learner.
Leave-one-out cross validation: m-fold, with m = training sample size.
[Figure: the training set split into folds, one fold held out for validation in turn.]
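A minimal sketch of the k-fold estimate described above, using scikit-learn rather than the LIBSVM tooling the slides use (dataset and parameter values are illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# 5-fold CV: train on 4 folds, validate on the held-out fold, 5 times.
scores = cross_val_score(SVC(kernel="rbf", C=2.0), X, y, cv=5)
# The mean validation error is the estimate of generalization error.
print("estimated generalization error: %.1f%%" % (100 * (1 - scores.mean())))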
Model Selection
Loop over possible parameters: pick one parameter setting, e.g. C = 2.0; do cross validation to get an error estimate; pick C_best (the value with the minimal error estimate) as the parameter.
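The same selection loop, sketched in Python (candidate values of C are illustrative; X, y as in the previous sketch):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
best_C, best_err = None, 1.0
for C in [0.5, 1.0, 2.0, 4.0, 8.0]:  # loop over candidate parameters
    err = 1 - cross_val_score(SVC(C=C), X, y, cv=5).mean()
    if err < best_err:               # keep the C with minimal CV error
        best_C, best_err = C, err
print("C_best =", best_C)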
Multiclass SVM
One against one: there are k(k − 1)/2 binary SVMs (1v2, 1v3, …). To predict, each SVM votes between its 2 classes; the class with the most votes wins.
One against all: there are k binary SVMs (1 v rest, 2 v rest, …). To predict, evaluate each decision value w_i^T x + b_i and pick the largest.
Multiclass SVM by solving ONE optimization problem: Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
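A sketch contrasting the two reductions with scikit-learn’s wrappers (an assumption: the slide describes the general schemes, not this library; the dataset is illustrative):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)                  # k = 3 classes
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)    # trains k(k-1)/2 = 3 binary SVMs
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)   # trains k = 3 binary SVMs
print(len(ovo.estimators_), len(ovr.estimators_))  # -> 3 3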
Multiclass SVM (2/2)
DAGSVM (Directed Acyclic Graph SVM): trains the same k(k − 1)/2 pairwise classifiers as one-against-one, but predicts by walking a rooted directed acyclic graph, so only k − 1 of them are evaluated per test point.
An Example: Image Classification
Process (K = 6 classes): represent each image as a labeled feature vector in sparse “label index:value …” form (e.g. “1 0:49 1:25 …”), train the SVM on the training data, then report accuracy on the test data.
[Figure: the processing pipeline from images to feature files to test accuracy.]
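A minimal sketch of writing features in the sparse “label index:value” form shown on the slide (feature values are hypothetical):

# Write one "label index:value ..." line per image, as on the slide.
samples = [(1, [49, 25]), (2, [12, 7])]  # (class label, feature vector)
with open("train.txt", "w") as f:
    for label, feats in samples:
        pairs = " ".join("%d:%d" % (i, v) for i, v in enumerate(feats))
        f.write("%d %s\n" % (label, pairs))
# -> lines like "1 0:49 1:25" and "2 0:12 1:7"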