
# Least Squares Support Vector Machine Classifier

This presentation discusses the support vector machine classifier, focusing mainly on the least squares support vector machine classifier.

### Least Squares Support Vector Machine Classifier

1. Least Squares Support Vector Machine. Rajkumar Singh, November 25, 2012
2. Table of Contents
   - Support Vector Machine
   - Least Squares Support Vector Machine Classifier
   - Conclusion
3. Support Vector Machines

   The SVM is a classifier derived from the statistical learning theory of Vapnik and Chervonenkis. SVMs were introduced by Boser, Guyon, and Vapnik at COLT-92. Initially popularized in the NIPS community, they are now an important and active field of machine learning research.

   What is an SVM? SVMs are learning systems that
   - use a hypothesis space of linear functions
   - in a high-dimensional feature space (kernel functions),
   - are trained with a learning algorithm from optimization theory (Lagrange), and
   - implement a learning bias derived from statistical learning theory (generalization).
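4. Support Vector Machines for Classification

   Given a training set of $N$ data points $\{y_k, x_k\}_{k=1}^{N}$, the support vector method aims at constructing a classifier of the form

   $$y(x) = \operatorname{sign}\left[\sum_{k=1}^{N} \alpha_k y_k \psi(x, x_k) + b\right] \tag{1}$$

   where $x_k \in \mathbb{R}^n$ is the $k$-th input pattern, $y_k \in \mathbb{R}$ is the $k$-th output (a class label $\pm 1$ in what follows), $\alpha_k$ are positive constants, and $b$ is a real constant. The function $\psi$ takes one of the following forms:

   $$\psi(x, x_k) = \begin{cases} x_k^T x & \text{linear SVM} \\ (x_k^T x + 1)^d & \text{polynomial SVM of degree } d \\ \exp\{-\|x - x_k\|_2^2 / \sigma^2\} & \text{RBF SVM} \\ \tanh(\kappa\, x_k^T x + \theta) & \text{two-layer neural SVM} \end{cases}$$

   where $\sigma$, $\theta$, and $\kappa$ are constants.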
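The classifier (1) and the four kernel choices can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the slides: the function names, the default parameter values, and the convention that a decision value of exactly zero maps to +1 are all assumptions made here.

```python
import numpy as np

def linear_kernel(x, xk):
    # psi(x, xk) = xk^T x
    return float(np.dot(xk, x))

def polynomial_kernel(x, xk, d=3):
    # psi(x, xk) = (xk^T x + 1)^d
    return float((np.dot(xk, x) + 1.0) ** d)

def rbf_kernel(x, xk, sigma=1.0):
    # psi(x, xk) = exp(-||x - xk||^2 / sigma^2), the convention used in the slides
    diff = np.asarray(x, dtype=float) - np.asarray(xk, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / sigma**2))

def neural_kernel(x, xk, kappa=1.0, theta=0.0):
    # psi(x, xk) = tanh(kappa * xk^T x + theta)
    return float(np.tanh(kappa * np.dot(xk, x) + theta))

def classify(x, X_train, y_train, alpha, b, kernel):
    # Equation (1): sign of the kernel expansion plus bias.
    # Assumption: a decision value of exactly 0 is assigned to class +1.
    s = sum(a * y * kernel(x, xk) for a, y, xk in zip(alpha, y_train, X_train))
    return 1 if s + b >= 0 else -1
```

The support values `alpha` and the bias `b` are inputs here; how they are obtained is the subject of the following slides.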
5. SVMs for Classification

   The classifier is constructed as follows. One assumes that

   $$\begin{aligned} \omega^T \varphi(x_k) + b &\ge 1, && \text{if } y_k = +1 \\ \omega^T \varphi(x_k) + b &\le -1, && \text{if } y_k = -1 \end{aligned} \tag{2}$$

   which is equivalent to

   $$y_k[\omega^T \varphi(x_k) + b] \ge 1, \quad k = 1, \ldots, N \tag{3}$$

   where $\varphi(\cdot)$ is a nonlinear function that maps the input space into a higher-dimensional space. To allow (3) to be violated when no separating hyperplane exists in this high-dimensional space, slack variables $\xi_k$ are introduced such that

   $$\begin{aligned} y_k[\omega^T \varphi(x_k) + b] &\ge 1 - \xi_k, && k = 1, \ldots, N \\ \xi_k &\ge 0, && k = 1, \ldots, N \end{aligned} \tag{4}$$
6. SVMs for Classification

   According to the structural risk minimization principle, the risk bound is minimized by formulating the optimization problem

   $$\min_{\omega, \xi_k} J_1(\omega, \xi_k) = \frac{1}{2}\omega^T\omega + c\sum_{k=1}^{N}\xi_k \tag{5}$$

   subject to (4). Therefore, one constructs the Lagrangian

   $$L_1(\omega, b, \xi_k; \alpha_k, v_k) = J_1(\omega, \xi_k) - \sum_{k=1}^{N}\alpha_k\{y_k[\omega^T\varphi(x_k) + b] - 1 + \xi_k\} - \sum_{k=1}^{N} v_k \xi_k \tag{6}$$

   by introducing Lagrange multipliers $\alpha_k \ge 0$, $v_k \ge 0$ ($k = 1, \ldots, N$). The solution is given by the saddle point of the Lagrangian, obtained by computing

   $$\max_{\alpha_k, v_k}\ \min_{\omega, b, \xi_k} L_1(\omega, b, \xi_k; \alpha_k, v_k) \tag{7}$$
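7. SVMs for Classification

   From (7) one obtains

   $$\begin{aligned} \frac{\partial L_1}{\partial \omega} = 0 \;&\rightarrow\; \omega = \sum_{k=1}^{N}\alpha_k y_k \varphi(x_k) \\ \frac{\partial L_1}{\partial b} = 0 \;&\rightarrow\; \sum_{k=1}^{N}\alpha_k y_k = 0 \\ \frac{\partial L_1}{\partial \xi_k} = 0 \;&\rightarrow\; 0 \le \alpha_k \le c, \quad k = 1, \ldots, N \end{aligned} \tag{8}$$

   which leads to the solution of the following quadratic programming problem:

   $$\max_{\alpha_k} Q_1(\alpha_k; \varphi(x_k)) = -\frac{1}{2}\sum_{k,l=1}^{N} y_k y_l\, \varphi(x_k)^T\varphi(x_l)\, \alpha_k\alpha_l + \sum_{k=1}^{N}\alpha_k \tag{9}$$

   such that $\sum_{k=1}^{N}\alpha_k y_k = 0$ and $0 \le \alpha_k \le c$, $k = 1, \ldots, N$. The function $\varphi(x_k)$ in (9) is then related to $\psi(x, x_k)$ by imposing

   $$\varphi(x)^T\varphi(x_k) = \psi(x, x_k) \tag{10}$$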
8. Note that for the two-layer neural SVM, Mercer's condition only holds for certain parameter values of $\kappa$ and $\theta$. The classifier (1) is designed by solving

   $$\max_{\alpha_k} Q_1(\alpha_k; \psi(x_k, x_l)) = -\frac{1}{2}\sum_{k,l=1}^{N} y_k y_l\, \psi(x_k, x_l)\, \alpha_k\alpha_l + \sum_{k=1}^{N}\alpha_k \tag{11}$$

   subject to the constraints in (9). One does not have to calculate $\omega$ or $\varphi(x_k)$ in order to determine the decision surface. The solution to (11) will be global.

   Further, it can be shown that hyperplanes (3) satisfying the constraint $\|\omega\|_2 \le a$ have a VC dimension $h$ that is bounded by

   $$h \le \min([r^2 a^2], n) + 1 \tag{12}$$

   where $[\cdot]$ denotes the integer part and $r$ is the radius of the smallest ball containing the points $\varphi(x_1), \ldots, \varphi(x_N)$. This ball is found by defining the Lagrangian

   $$L_2(r, q, \lambda_k) = r^2 - \sum_{k=1}^{N}\lambda_k\left(r^2 - \|\varphi(x_k) - q\|_2^2\right) \tag{13}$$
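9. SVMs for Classification

   In (13), $q$ is the center of the ball and $\lambda_k$ are positive Lagrange multipliers. The center is $q = \sum_k \lambda_k \varphi(x_k)$, where the Lagrange multipliers follow from

   $$\max_{\lambda_k} Q_2(\lambda_k; \varphi(x_k)) = -\sum_{k,l=1}^{N}\varphi(x_k)^T\varphi(x_l)\,\lambda_k\lambda_l + \sum_{k=1}^{N}\lambda_k\,\varphi(x_k)^T\varphi(x_k) \tag{14}$$

   such that $\sum_{k=1}^{N}\lambda_k = 1$ and $\lambda_k \ge 0$, $k = 1, \ldots, N$. Based on (10), $Q_2$ can also be expressed in terms of $\psi(x_k, x_l)$. Finally, one selects a support vector machine with minimal VC dimension by solving (11) and computing the bound (12) using the radius obtained from (14).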
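10. Least Squares Support Vector Machines

    A least squares version of the SVM classifier is obtained by formulating the classification problem as

    $$\min_{\omega, b, e} J_3(\omega, b, e) = \frac{1}{2}\omega^T\omega + \gamma\,\frac{1}{2}\sum_{k=1}^{N} e_k^2 \tag{15}$$

    subject to the equality constraints

    $$y_k[\omega^T\varphi(x_k) + b] = 1 - e_k, \quad k = 1, \ldots, N \tag{16}$$

    The Lagrangian is defined as

    $$L_3(\omega, b, e; \alpha) = J_3(\omega, b, e) - \sum_{k=1}^{N}\alpha_k\{y_k[\omega^T\varphi(x_k) + b] - 1 + e_k\} \tag{17}$$

    where $\alpha_k$ are Lagrange multipliers. The conditions for optimality begin with

    $$\begin{aligned} \frac{\partial L_3}{\partial \omega} = 0 \;&\rightarrow\; \omega = \sum_{k=1}^{N}\alpha_k y_k \varphi(x_k) \\ \frac{\partial L_3}{\partial b} = 0 \;&\rightarrow\; \sum_{k=1}^{N}\alpha_k y_k = 0 \end{aligned} \tag{18}$$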
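11. Least Squares Support Vector Machines

    The remaining conditions in (18) are

    $$\begin{aligned} \frac{\partial L_3}{\partial e_k} = 0 \;&\rightarrow\; \alpha_k = \gamma e_k, \quad k = 1, \ldots, N \\ \frac{\partial L_3}{\partial \alpha_k} = 0 \;&\rightarrow\; y_k[\omega^T\varphi(x_k) + b] - 1 + e_k = 0, \quad k = 1, \ldots, N \end{aligned}$$

    Together, the conditions can be written as the solution to the following set of linear equations:

    $$\begin{bmatrix} I & 0 & 0 & -Z^T \\ 0 & 0 & 0 & -Y^T \\ 0 & 0 & \gamma I & -I \\ Z & Y & I & 0 \end{bmatrix} \begin{bmatrix} \omega \\ b \\ e \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vec{1} \end{bmatrix} \tag{19}$$

    where $Z = [\varphi(x_1)^T y_1; \ldots; \varphi(x_N)^T y_N]$, $Y = [y_1; \ldots; y_N]$, $\vec{1} = [1; \ldots; 1]$, $e = [e_1; \ldots; e_N]$, and $\alpha = [\alpha_1; \ldots; \alpha_N]$.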
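The step from the full system (19) to a smaller system in $b$ and $\alpha$ alone is an elimination of $\omega$ and $e$. A short worked sketch, using only the symbols defined above:

$$\begin{aligned} \text{row 1 of (19):}\quad & \omega - Z^T\alpha = 0 \;\Rightarrow\; \omega = Z^T\alpha \\ \text{row 3 of (19):}\quad & \gamma e - \alpha = 0 \;\Rightarrow\; e = \gamma^{-1}\alpha \\ \text{row 4 of (19):}\quad & Z\omega + Yb + e = \vec{1} \;\Rightarrow\; Yb + (ZZ^T + \gamma^{-1}I)\,\alpha = \vec{1} \\ \text{row 2 of (19):}\quad & -Y^T\alpha = 0 \end{aligned}$$

The last two lines form a system of $N + 1$ linear equations in the $N + 1$ unknowns $b$ and $\alpha$.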
12. Least Squares Support Vector Machines

    The solution is given by

    $$\begin{bmatrix} 0 & -Y^T \\ Y & ZZ^T + \gamma^{-1}I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \vec{1} \end{bmatrix} \tag{20}$$

    Mercer's condition can be applied again to the matrix $\Omega = ZZ^T$, where

    $$\Omega_{kl} = y_k y_l\, \varphi(x_k)^T\varphi(x_l) = y_k y_l\, \psi(x_k, x_l) \tag{21}$$

    Hence the classifier (1) is found by solving the linear equations (20), (21) instead of a quadratic programming problem. The parameters of the kernel, such as $\sigma$ for the RBF kernel, can be optimally chosen according to (12). The support values $\alpha_k$ are proportional to the errors at the data points (18), while in the case of (14) most values are equal to zero. Hence one could rather speak of a support value spectrum in the least squares case.
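Training by solving (20), (21) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the presenter's code: the function names, the toy data set, and the values `gamma=10.0` and `sigma=1.0` are chosen here for demonstration only, and the dense solver `np.linalg.solve` is one possible way to solve the system.

```python
import numpy as np

def rbf_kernel_matrix(A, B, sigma):
    # psi(a, b) = exp(-||a - b||^2 / sigma^2) for every row pair of A and B
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    # Build and solve the (N+1) x (N+1) linear system (20).
    N = X.shape[0]
    Omega = np.outer(y, y) * rbf_kernel_matrix(X, X, sigma)  # equation (21)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = -y                           # first row:    [0, -Y^T]
    A[1:, 0] = y                            # first column: [0; Y]
    A[1:, 1:] = Omega + np.eye(N) / gamma   # ZZ^T + gamma^{-1} I
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                  # b, alpha

def lssvm_predict(X_new, X, y, alpha, b, sigma=1.0):
    # Classifier (1); assumption: a decision value of exactly 0 maps to +1.
    K = rbf_kernel_matrix(X_new, X, sigma)
    return np.where(K @ (alpha * y) + b >= 0, 1, -1)

# Hypothetical toy data: two well-separated clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

b, alpha = lssvm_train(X, y)
train_pred = lssvm_predict(X, X, y, alpha, b)
new_pred = lssvm_predict(np.array([[0.5, 0.5], [3.5, 3.5]]), X, y, alpha, b)
```

Unlike the quadratic programming formulation, every training point receives a nonzero `alpha` in general, which matches the "support value spectrum" remark above.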
13. Conclusion

    Due to the equality constraints, a set of linear equations has to be solved instead of a quadratic programming problem. Mercer's condition is applied as in other SVMs. A least squares SVM with an RBF kernel is readily found, with excellent generalization performance and low computational cost.

    References

    1. J.A.K. Suykens and J. Vandewalle, "Least Squares Support Vector Machine Classifiers."