### Support Vector Machine

1. Introduction to Support Vector Machine. Lucas Xu, September 4, 2012.
2. Outline: 1 Classifier, 2 Hyper-Plane, 3 Convex Optimization, 4 Kernel, 5 Application.
3. Classifier: Attributes and Class Labels. Training data: $S = \{(x^{(1)}, y^{(1)}), \cdots, (x^{(m)}, y^{(m)})\}$, with $x^{(i)} \in \mathbb{R}^d$ and $y^{(i)} \in \{-1, 1\}$.
4. Classifier: Umeng Gender Classification Data.

    | user | app_1 | app_2 | ··· | app_d | gender |
    |------|-------|-------|-----|-------|--------|
    | user_1 | 1 | 0 | ··· | 0 | male |
    | user_2 | 0 | 1 | ··· | 1 | female |
    | ⋮ | ⋮ | ⋮ | ⋱ | ⋮ | ⋮ |
    | user_n | 1 | 1 | ··· | 1 | female |

    Each app belongs to one category, and there are ≈ 20 categories. Categories are mutually exclusive.
5. Classifier: Umeng Gender Classification Data. $S = \{(x^{(1)}, y^{(1)}), \cdots, (x^{(m)}, y^{(m)})\}$, with $x^{(i)} \in \mathbb{R}^d$ and $y^{(i)} \in \{-1, 1\}$. Each component $x_k^{(i)} \in \{0, 1\}$: 0 means the app is not installed, 1 means it is installed on the device. The index runs over $1 \le k \le d$ with $d \approx 30{,}000$ (about 30,000 apps). The labels are $y^{(i)} \in \{\text{male}, \text{female}\}$.
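A minimal sketch of this encoding in plain Python. The app names, the tiny three-app index, and the choice of which gender maps to +1 are assumptions made for illustration; the slides do not state them.

```python
def encode_user(installed, app_index):
    """Return the binary vector x: x[k] = 1 if app k is installed, else 0."""
    x = [0] * len(app_index)
    for app in installed:
        if app in app_index:          # ignore apps outside the vocabulary
            x[app_index[app]] = 1
    return x


# Assumption: the slides only say y is in {-1, 1}; this assignment is arbitrary.
LABELS = {"male": 1, "female": -1}

# Hypothetical vocabulary of d = 3 apps (the real d is about 30,000).
app_index = {"app1": 0, "app2": 1, "app3": 2}

x = encode_user({"app1", "app3"}, app_index)
y = LABELS["female"]
print(x, y)
```

With roughly 30,000 apps and only a few installed per device, a sparse representation would likely be preferable in practice; the dense list here just mirrors the notation.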
6. Hyper-Plane (figure: hyper-plane). The hyper-plane: $w^T x + b = 0$. Classification function: $h_{w,b}(x) = g(w^T x + b)$, where $g(z) = 1$ if $z \ge 0$ and $g(z) = -1$ otherwise.
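The classification function can be sketched in a few lines of plain Python. The weight vector `w`, the bias `b`, and the sample points are made-up values for illustration, not taken from the slides.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def g(z):
    """Threshold function: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def h(w, b, x):
    """Classification function h_{w,b}(x) = g(w^T x + b)."""
    return g(dot(w, x) + b)


# Hypothetical hyper-plane x1 + x2 - 1 = 0 in two dimensions.
w = [1.0, 1.0]
b = -1.0

print(h(w, b, [2.0, 2.0]))   # a point on the positive side
print(h(w, b, [0.0, 0.0]))   # a point on the negative side
```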
7. Hyper-Plane. Functional margin: $\hat{\gamma}^{(i)} = y^{(i)}(w^T x^{(i)} + b)$. Scaling: impose the normalization condition $\|w\| = 1$. Geometric margin:

    $$\gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right)$$

    $\gamma^{(i)}$ should be a large positive number to increase the prediction confidence.
8. Hyper-Plane. Definition: the geometric margin of $(w, b)$ with respect to the training dataset $S$ is $\gamma = \min_{i = 1, \dots, m} \gamma^{(i)}$.
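As a rough illustration of the margin definitions, the sketch below computes the functional margin, the geometric margin, and the margin of a whole dataset. The three training points and the hyper-plane are invented for the example; the function names are not from the slides.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def functional_margin(w, b, x, y):
    """y * (w^T x + b); grows if (w, b) is rescaled."""
    return y * (dot(w, x) + b)


def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||; invariant to rescaling (w, b)."""
    return functional_margin(w, b, x, y) / math.sqrt(dot(w, w))


def dataset_margin(w, b, S):
    """Geometric margin of (w, b) w.r.t. S: the minimum over all samples."""
    return min(geometric_margin(w, b, x, y) for x, y in S)


# Hypothetical data and hyper-plane 3*x1 + 4*x2 - 5 = 0 (so ||w|| = 5).
S = [([3.0, 4.0], 1), ([0.0, 0.0], -1), ([6.0, 8.0], 1)]
w, b = [3.0, 4.0], -5.0

print(dataset_margin(w, b, S))
# Doubling (w, b) doubles every functional margin but leaves this value unchanged.
print(dataset_margin([2 * wi for wi in w], 2 * b, S))
```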
9. Hyper-Plane. The optimal margin classifier (intuitive): find a decision boundary that maximizes the margin.

    $$\max_{\gamma, w, b} \ \gamma \quad \text{s.t.} \quad y^{(i)}(w^T x^{(i)} + b) \ge \gamma, \ i = 1, \dots, m, \qquad \|w\| = 1$$
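10. Hyper-Plane. Normalization constraint: let the functional margin $\hat{\gamma} = 1$. The problem becomes

    $$\max_{w, b} \ \frac{1}{\|w\|} \quad \text{s.t.} \quad y^{(i)}(w^T x^{(i)} + b) \ge 1, \ i = 1, \dots, m$$

    and, because maximizing $1 / \|w\|$ is the same as minimizing $\frac{1}{2}\|w\|^2$, this is equivalent to

    $$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y^{(i)}(w^T x^{(i)} + b) \ge 1, \ i = 1, \dots, m$$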
11. Hyper-Plane. The objective is a convex function and the constraints define a convex set, so this is a so-called Quadratic Programming problem. There are many software packages that solve it. The basic ideas of the Support Vector Machine are done! Is there a more efficient solution?
16. Convex Optimization. Primal problem:

    $$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y^{(i)}(w^T x^{(i)} + b) \ge 1, \ i = 1, \dots, m$$
17. Convex Optimization. Lagrangian for the original problem:

    $$\min_{w, b} \max_{\alpha : \alpha_i \ge 0} L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)}(w^T x^{(i)} + b) - 1 \right]$$

    Under the K.K.T. conditions, this transforms to its dual problem:

    $$\max_{\alpha} W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle$$

    $$\text{s.t.} \quad \alpha_i \ge 0, \ i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y^{(i)} = 0$$
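The step from the Lagrangian to the dual is not spelled out on the slide. The following is the standard derivation, written as a sketch in the slide's notation: set the derivatives with respect to $w$ and $b$ to zero, then substitute back.

```latex
\[
\begin{aligned}
\frac{\partial L}{\partial w} &= w - \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} = 0
  \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}, \\
\frac{\partial L}{\partial b} &= -\sum_{i=1}^{m} \alpha_i y^{(i)} = 0
  \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y^{(i)} = 0, \\[4pt]
L(w, b, \alpha) &= \frac{1}{2}\|w\|^2
  - w^T \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}
  - b \sum_{i=1}^{m} \alpha_i y^{(i)}
  + \sum_{i=1}^{m} \alpha_i \\
 &= \frac{1}{2}\|w\|^2 - \|w\|^2 - 0 + \sum_{i=1}^{m} \alpha_i \\
 &= \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j
    \langle x^{(i)}, x^{(j)} \rangle = W(\alpha).
\end{aligned}
\]
```

The first stationarity condition gives the expression for $w$ in terms of $\alpha$, and the second becomes the equality constraint of the dual.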
18. Convex Optimization. Solutions:

    $$w^* = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}, \qquad b^* = -\frac{\max_{i : y^{(i)} = -1} w^{*T} x^{(i)} + \min_{i : y^{(i)} = 1} w^{*T} x^{(i)}}{2}$$

    Predict:

    $$g(x) = w^T x + b = \left( \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} \right)^T x + b = \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b$$
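A small sketch of how these formulas might be evaluated once the dual variables are known. The two-point dataset is invented, and its multipliers (both 0.25) were worked out by hand for this toy case rather than produced by a solver; the function names are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def recover_w(alphas, X, Y):
    """w* = sum_i alpha_i * y_i * x_i."""
    w = [0.0] * len(X[0])
    for a, x, y in zip(alphas, X, Y):
        for k, xk in enumerate(x):
            w[k] += a * y * xk
    return w


def recover_b(w, X, Y):
    """b* = -(max over negatives of w.x + min over positives of w.x) / 2."""
    max_neg = max(dot(w, x) for x, y in zip(X, Y) if y == -1)
    min_pos = min(dot(w, x) for x, y in zip(X, Y) if y == 1)
    return -(max_neg + min_pos) / 2.0


def predict(alphas, b, X, Y, x_new):
    """g(x) = sum_i alpha_i * y_i * <x_i, x> + b, using inner products only."""
    return sum(a * y * dot(x, x_new) for a, x, y in zip(alphas, X, Y)) + b


# Toy data: one positive and one negative point (hypothetical).
X = [[1.0, 1.0], [-1.0, -1.0]]
Y = [1, -1]
alphas = [0.25, 0.25]   # hand-derived dual optimum for this toy set

w_star = recover_w(alphas, X, Y)
b_star = recover_b(w_star, X, Y)
print(w_star, b_star, predict(alphas, b_star, X, Y, [2.0, 0.0]))
```

Note that `predict` never forms `w` explicitly, which is what makes the kernel substitution on the following slides possible.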
19. Kernel. For most $\alpha_i$, $\alpha_i = 0$. For those with $\alpha_i > 0$, the points $(x^{(i)}, y^{(i)})$ are called support vectors, so prediction only needs to compute $\langle x^{(i)}, x \rangle$. If we map the feature space $(x_1^{(i)}, x_2^{(i)}, \dots, x_k^{(i)})$ to another, higher-dimensional space $(z_1^{(i)}, z_2^{(i)}, \dots, z_l^{(i)})$ with $z = \phi(x)$, the inner product becomes $\langle \phi(x^{(i)}), \phi(x) \rangle$, and we can easily compute $\langle z^{(i)}, z \rangle = K(x^{(i)}, x)$. In a slightly different notation: $K(x, y) = \langle \phi(x), \phi(y) \rangle$. Intuitive explanation: a kernel is a measure of similarity.
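To make the identity $K(x, y) = \langle \phi(x), \phi(y) \rangle$ concrete, here is a sketch with one well-known pair: the homogeneous degree-2 polynomial kernel on two-dimensional inputs and its explicit feature map. This particular kernel and the sample vectors are chosen for illustration and do not appear on the slides.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def phi(x):
    """Explicit map R^2 -> R^3 for the kernel K(x, y) = <x, y>^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def K(x, y):
    """Same quantity computed in the original space, without building phi."""
    return dot(x, y) ** 2


x, y = [1.0, 2.0], [3.0, 4.0]
explicit = dot(phi(x), phi(y))   # inner product in the mapped space
implicit = K(x, y)               # kernel evaluated on the raw inputs
print(explicit, implicit)
```

Both values agree up to floating-point rounding, but the kernel form never constructs the higher-dimensional vectors.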
20. Kernel. Definition (Mercer kernel): $K$ is positive semi-definite.
21. Kernel. Common kernels: primitive $\langle x, y \rangle$; polynomial $(\langle x, y \rangle + 1)^d$; RBF $\exp(-\gamma \|x - y\|^2)$; sigmoid $\tanh(\kappa \langle x, y \rangle + c)$; string; tree.
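The four closed-form kernels listed above translate directly into code. This is a minimal sketch in plain Python; the parameter names follow the symbols on the slides, while the default values are arbitrary choices for the example. String and tree kernels are omitted because the slides give no formula for them.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def primitive_kernel(x, y):
    """<x, y>"""
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    """(<x, y> + 1)^d"""
    return (dot(x, y) + 1) ** d


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, kappa=1.0, c=0.0):
    """tanh(kappa * <x, y> + c)"""
    return math.tanh(kappa * dot(x, y) + c)


x, y = [1.0, 2.0], [3.0, 4.0]
for kernel in (primitive_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, y))
```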
27. Apply to Umeng Gender Classification. Problem description: classify the gender of a user based on the apps (s)he installed and the categories of those apps. Kernel design:

    $$K(x, y) = \sum_{i,j=0}^{m} \phi(x_i, y_j), \qquad \phi(x_i, y_j) = \begin{cases} (1 + w)\, x_i y_j & \text{if } i = j \\ x_i y_j & \text{if } i \ne j \text{ but the same category} \\ 0 & \text{if not the same category} \end{cases}$$

    Here $w \ge 0$ is the extra weight applied when two users have installed the same app; it defaults to 1.0. Experiment result: not captured in this transcript.
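One possible reading of this kernel in code, as a sketch. The four-app vocabulary, the category names, and the two users are hypothetical; the only things taken from the slide are the three cases of $\phi$ and the default weight of 1.0.

```python
def umeng_kernel(x, y, category, w=1.0):
    """Sum phi(x_i, y_j) over all app pairs (i, j).

    x, y      -- binary install vectors of two users
    category  -- category[k] is the category of app k
    w         -- extra weight when both users installed the same app
    """
    total = 0.0
    for i, xi in enumerate(x):
        if xi == 0:
            continue                      # x_i * y_j is zero anyway
        for j, yj in enumerate(y):
            if yj == 0:
                continue
            if i == j:
                total += (1.0 + w) * xi * yj   # same app
            elif category[i] == category[j]:
                total += xi * yj               # different apps, same category
            # different categories contribute 0
    return total


# Hypothetical vocabulary: two games, one news app, one tool.
category = ["game", "game", "news", "tool"]
user_a = [1, 0, 1, 0]
user_b = [1, 1, 0, 0]

print(umeng_kernel(user_a, user_b, category))         # default w = 1.0
print(umeng_kernel(user_a, user_b, category, w=0.0))  # no bonus for identical apps
```

The double loop is quadratic in the number of apps; with about 30,000 apps, iterating only over installed apps, or grouping them by category first, would be the more practical design.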
28. Apply to Umeng Gender Classification. The input vector is mapped as

    $$(x_1, x_2, \dots, x_m)^T \ \Rightarrow\ (w \cdot x_1, w \cdot x_2, \dots, w \cdot x_m, c_1, c_2, \dots, c_{20})^T$$

    where $c_i$ counts the number of apps belonging to category $i$.
29. References.
    - Book: Christopher Bishop, PRML, Chapter 7, Section 7.1
    - Slides: Andrew Moore, Support Vector Machines
    - Video: Bernhard Schölkopf, Kernel Methods
    - Video: Liva Ralaivola, Introduction to Kernel Methods
    - Video: Colin Campbell, Introduction to Support Vector Machines
    - Video: Alex Smola, Kernel Methods and Support Vector Machines
    - Video: Partha Niyogi, Introduction to Kernel Methods
    - Many more videos on kernel-related topics: http://www.seas.harvard.edu/courses/cs281/