Least squares support Vector Machine Classifier

This presentation discusses the Support Vector Machine classifier, with a focus on the Least Squares Support Vector Machine classifier.

Transcript

  • 1. Least Squares Support Vector Machine. Rajkumar Singh, November 25, 2012.
  • 2. Table of Contents: Support Vector Machine; Least Squares Support Vector Machine Classifier; Conclusion.
  • 3. Support Vector Machines. The SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis. SVMs were introduced by Boser, Guyon, and Vapnik at COLT-92, were initially popularized in the NIPS community, and are now an important and active field of machine learning research. What is an SVM? SVMs are learning systems that use a hypothesis space of linear functions in a high-dimensional feature space (kernel functions), are trained with a learning algorithm from optimization theory (Lagrangian duality), and implement a learning bias derived from statistical learning theory (generalization).
  • 4. Support Vector Machines for Classification. Given a training set of N data points \{y_k, x_k\}_{k=1}^{N}, the support vector method aims at constructing a classifier of the form
    y(x) = \mathrm{sign}\Big[ \sum_{k=1}^{N} \alpha_k y_k \, \psi(x, x_k) + b \Big]    (1)
where x_k ∈ R^n is the k-th input pattern, y_k ∈ R is the k-th output, the α_k are positive real constants, and b is a real constant. Typical kernel choices are
    \psi(x, x_k) = \begin{cases} x_k^T x & \text{(linear SVM)} \\ (x_k^T x + 1)^d & \text{(polynomial SVM of degree } d) \\ \exp\{-\|x - x_k\|_2^2 / \sigma^2\} & \text{(RBF SVM)} \\ \tanh(\kappa\, x_k^T x + \theta) & \text{(two-layer neural SVM)} \end{cases}
where σ, θ, κ are constants.
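A minimal NumPy sketch of the kernel choices above and the decision rule (1); the function names and default parameter values (d, sigma, kappa, theta) are illustrative, not from the slides.

```python
import numpy as np

# Kernel choices psi(x, x_k) from the slide; d, sigma, kappa, theta are
# the constants named above (defaults here are arbitrary examples).
def linear_kernel(x, xk):
    return xk @ x

def polynomial_kernel(x, xk, d=3):
    return (xk @ x + 1.0) ** d

def rbf_kernel(x, xk, sigma=1.0):
    return np.exp(-np.sum((x - xk) ** 2) / sigma ** 2)

def neural_kernel(x, xk, kappa=1.0, theta=0.0):
    # Mercer's condition holds only for certain (kappa, theta),
    # as noted later in the slides.
    return np.tanh(kappa * (xk @ x) + theta)

def svm_decision(x, X_train, y_train, alpha, b, kernel=rbf_kernel):
    """Classifier (1): sign( sum_k alpha_k y_k psi(x, x_k) + b )."""
    s = sum(a * yk * kernel(x, xk)
            for a, yk, xk in zip(alpha, y_train, X_train))
    return np.sign(s + b)
```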
  • 5. SVMs for Classification. The classifier is constructed as follows. One assumes that
    \omega^T \phi(x_k) + b \ge +1 \;\; \text{if } y_k = +1, \qquad \omega^T \phi(x_k) + b \le -1 \;\; \text{if } y_k = -1,    (2)
which is equivalent to
    y_k [\omega^T \phi(x_k) + b] \ge 1, \quad k = 1, \ldots, N,    (3)
where φ(·) is a nonlinear function which maps the input space into a higher-dimensional space. In order to allow violations of (3), in case a separating hyperplane in this high-dimensional space does not exist, slack variables ξ_k are introduced such that
    y_k [\omega^T \phi(x_k) + b] \ge 1 - \xi_k, \qquad \xi_k \ge 0, \quad k = 1, \ldots, N.    (4)
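As a toy illustration of constraint (4), assuming a linear feature map (φ = identity) and arbitrary example values for ω, b and the data, the slack needed by each point is ξ_k = max(0, 1 − y_k(ω^T x_k + b)):

```python
import numpy as np

# Arbitrary example classifier and data, used only to show how the slack
# variables xi_k in (4) measure margin violations.
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, 0.0], [0.2, 0.1], [0.0, 1.0]])
y = np.array([+1, +1, -1])

margins = y * (X @ w + b)             # y_k (w^T x_k + b)
xi = np.maximum(0.0, 1.0 - margins)   # slack required to satisfy (4)
print(margins)  # [2.5 0.6 0.5]
print(xi)       # [0.  0.4 0.5]
```

A point with ξ_k = 0 respects the margin, 0 < ξ_k < 1 lies inside the margin but on the correct side, and ξ_k > 1 would be misclassified.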
  • 6. SVMs for Classification. According to the structural risk minimization principle, the risk bound is minimized by formulating the optimization problem
    \min_{\omega, \xi_k} J_1(\omega, \xi_k) = \frac{1}{2}\,\omega^T \omega + c \sum_{k=1}^{N} \xi_k    (5)
subject to (4). One therefore constructs the Lagrangian
    L_1(\omega, b, \xi_k; \alpha_k, \nu_k) = J_1(\omega, \xi_k) - \sum_{k=1}^{N} \alpha_k \{ y_k [\omega^T \phi(x_k) + b] - 1 + \xi_k \} - \sum_{k=1}^{N} \nu_k \xi_k    (6)
by introducing Lagrange multipliers α_k ≥ 0, ν_k ≥ 0 (k = 1, ..., N). The solution is given by the saddle point of the Lagrangian, obtained by computing
    \max_{\alpha_k, \nu_k} \; \min_{\omega, b, \xi_k} \; L_1(\omega, b, \xi_k; \alpha_k, \nu_k).    (7)
  • 7. SVMs for Classification. From (7) one obtains
    \partial L_1 / \partial \omega = 0 \;\rightarrow\; \omega = \sum_{k=1}^{N} \alpha_k y_k \phi(x_k),
    \partial L_1 / \partial b = 0 \;\rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0,    (8)
    \partial L_1 / \partial \xi_k = 0 \;\rightarrow\; 0 \le \alpha_k \le c, \quad k = 1, \ldots, N,
which leads to the solution of the following quadratic programming problem:
    \max_{\alpha_k} Q_1(\alpha_k; \phi(x_k)) = -\frac{1}{2} \sum_{k,l=1}^{N} y_k y_l \, \phi(x_k)^T \phi(x_l) \, \alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k    (9)
such that \sum_{k=1}^{N} \alpha_k y_k = 0 and 0 ≤ α_k ≤ c, k = 1, ..., N. The function φ(x_k) in (9) is related to ψ(x, x_k) by imposing
    \phi(x)^T \phi(x_k) = \psi(x, x_k).    (10)
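A hedged sketch of solving the dual QP (9) numerically; it uses SciPy's general-purpose SLSQP solver rather than a dedicated QP solver, and the helper name solve_svm_dual is illustrative. Via (10), the Hessian entries are y_k y_l ψ(x_k, x_l).

```python
import numpy as np
from scipy.optimize import minimize

def solve_svm_dual(X, y, kernel, c=1.0):
    """Maximize Q1 in (9) subject to 0 <= alpha_k <= c and sum_k alpha_k y_k = 0."""
    N = len(y)
    K = np.array([[kernel(X[k], X[l]) for l in range(N)] for k in range(N)])
    H = np.outer(y, y) * K            # y_k y_l psi(x_k, x_l), using (10)

    def neg_q1(alpha):                # minimize the negative of Q1
        return 0.5 * alpha @ H @ alpha - alpha.sum()

    res = minimize(neg_q1, np.zeros(N), method="SLSQP",
                   bounds=[(0.0, c)] * N,
                   constraints=[{"type": "eq", "fun": lambda a: a @ y}])
    return res.x                      # the alpha_k to plug into classifier (1)
```

The returned multipliers can be used directly in the decision rule sketched after slide 4, e.g. svm_decision(x, X, y, alpha, b, kernel=rbf_kernel).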
  • 8. Note that for the two-layer neural SVM, Mercer's condition only holds for certain parameter values of κ and θ. The classifier (3) is designed by solving
    \max_{\alpha_k} Q_1(\alpha_k; \psi(x_k, x_l)) = -\frac{1}{2} \sum_{k,l=1}^{N} y_k y_l \, \psi(x_k, x_l) \, \alpha_k \alpha_l + \sum_{k=1}^{N} \alpha_k    (11)
subject to the constraints in (9). One does not have to compute ω or φ(x_k) in order to determine the decision surface, and the solution to (11) is global.
Further, it can be shown that hyperplanes (3) satisfying the constraint \|\omega\|_2 \le a have a VC-dimension h which is bounded by
    h \le \min([r^2 a^2], n) + 1,    (12)
where [·] denotes the integer part and r is the radius of the smallest ball containing the points φ(x_1), ..., φ(x_N). Such a ball is found by defining the Lagrangian
    L_2(r, q, \lambda_k) = r^2 - \sum_{k=1}^{N} \lambda_k \left( r^2 - \|\phi(x_k) - q\|_2^2 \right).    (13)
  • 9. SVMs for Classification. In (13), q is the center of the ball and the λ_k are positive Lagrange multipliers. Here q = \sum_k \lambda_k \phi(x_k), and the Lagrangian leads to
    \max_{\lambda_k} Q_2(\lambda_k; \phi(x_k)) = -\sum_{k,l=1}^{N} \phi(x_k)^T \phi(x_l) \, \lambda_k \lambda_l + \sum_{k=1}^{N} \lambda_k \, \phi(x_k)^T \phi(x_k)    (14)
such that \sum_{k=1}^{N} \lambda_k = 1 and λ_k ≥ 0, k = 1, ..., N. Based on (10), Q_2 can also be expressed in terms of ψ(x_k, x_l). Finally, one selects a support vector machine with small VC-dimension by solving (11) and computing the bound (12) using (13) and (14).
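A sketch of the bound (12) under stated assumptions: the ball problem (14) is solved for the λ_k with the same SLSQP approach, r² is taken as the optimal value of Q2 (a standard property of the minimum enclosing ball dual), and n is taken as the input dimension, following x_k ∈ R^n above. The function name and the choice of solver are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def vc_dimension_bound(X, kernel, a):
    """Bound (12): h <= min([r^2 a^2], n) + 1, with r^2 taken as the
    optimal value of the ball problem (14)."""
    N, n = X.shape
    K = np.array([[kernel(X[k], X[l]) for l in range(N)] for k in range(N)])

    def neg_q2(lam):                  # -Q2(lam) from (14)
        return lam @ K @ lam - lam @ np.diag(K)

    res = minimize(neg_q2, np.full(N, 1.0 / N), method="SLSQP",
                   bounds=[(0.0, None)] * N,
                   constraints=[{"type": "eq",
                                 "fun": lambda lam: lam.sum() - 1.0}])
    r_squared = -res.fun              # max Q2 = r^2 at the optimum
    return min(int(np.floor(r_squared * a ** 2)), n) + 1
```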
  • 10. Least Squares Support Vector Machines. The least squares version of the SVM classifier is obtained by formulating the classification problem as
    \min_{\omega, b, e} J_3(\omega, b, e) = \frac{1}{2}\,\omega^T \omega + \gamma\,\frac{1}{2} \sum_{k=1}^{N} e_k^2    (15)
subject to the equality constraints
    y_k [\omega^T \phi(x_k) + b] = 1 - e_k, \quad k = 1, \ldots, N.    (16)
The Lagrangian is defined as
    L_3(\omega, b, e; \alpha) = J_3(\omega, b, e) - \sum_{k=1}^{N} \alpha_k \{ y_k [\omega^T \phi(x_k) + b] - 1 + e_k \},    (17)
where the α_k are Lagrange multipliers. The first two conditions for optimality are
    \partial L_3 / \partial \omega = 0 \;\rightarrow\; \omega = \sum_{k=1}^{N} \alpha_k y_k \phi(x_k),
    \partial L_3 / \partial b = 0 \;\rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0,    (18)
with the remaining two given on the next slide.
  • 11. Least Squares Support Vector Machines.
    \partial L_3 / \partial e_k = 0 \;\rightarrow\; \alpha_k = \gamma e_k, \quad k = 1, \ldots, N,
    \partial L_3 / \partial \alpha_k = 0 \;\rightarrow\; y_k [\omega^T \phi(x_k) + b] - 1 + e_k = 0, \quad k = 1, \ldots, N.
These conditions can be written as the solution to the following set of linear equations:
    \begin{bmatrix} I & 0 & 0 & -Z^T \\ 0 & 0 & 0 & -Y^T \\ 0 & 0 & \gamma I & -I \\ Z & Y & I & 0 \end{bmatrix} \begin{bmatrix} \omega \\ b \\ e \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vec{1} \end{bmatrix}    (19)
where Z = [\phi(x_1)^T y_1; \ldots; \phi(x_N)^T y_N], Y = [y_1; \ldots; y_N], \vec{1} = [1; \ldots; 1], e = [e_1; \ldots; e_N], and \alpha = [\alpha_1; \ldots; \alpha_N].
  • 12. Least Squares Support Vector Machines. After elimination of ω and e, the solution is given by
    \begin{bmatrix} 0 & -Y^T \\ Y & Z Z^T + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \vec{1} \end{bmatrix}.    (20)
Mercer's condition can be applied again to the matrix Ω = Z Z^T, where
    \Omega_{kl} = y_k y_l \, \phi(x_k)^T \phi(x_l) = y_k y_l \, \psi(x_k, x_l).    (21)
Hence the classifier (1) is found by solving the linear equations (20)-(21) instead of a quadratic programming problem. The parameters of the kernel, such as σ for the RBF kernel, can be optimally chosen according to (12). The support values α_k are proportional to the errors at the data points by (18), while in the standard SVM case most values are equal to zero; hence one could rather speak of a support value spectrum in the least squares case.
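A minimal NumPy sketch of this training step: build Ω from (21), solve the linear system (20), and classify with (1). The function names are illustrative, and γ and the kernel are assumed to be supplied by the user (e.g. the rbf_kernel sketched earlier).

```python
import numpy as np

def lssvm_train(X, y, kernel, gamma=1.0):
    """Solve the linear system (20)-(21) for b and alpha (no QP needed)."""
    N = len(y)
    K = np.array([[kernel(X[k], X[l]) for l in range(N)] for k in range(N)])
    Omega = np.outer(y, y) * K                     # (21)

    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = -y                                  # first block row: -Y^T
    A[1:, 0] = y                                   # first block column: Y
    A[1:, 1:] = Omega + np.eye(N) / gamma          # Omega + gamma^{-1} I
    rhs = np.concatenate(([0.0], np.ones(N)))      # [0; 1-vector]

    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                         # b, alpha

def lssvm_predict(x, X, y, alpha, b, kernel):
    """Classifier (1) with the trained alpha and b."""
    return np.sign(sum(a * yk * kernel(x, xk)
                       for a, yk, xk in zip(alpha, y, X)) + b)
```

Note that, unlike the QP solution of (11), every α_k here is generally nonzero, which is exactly the support value spectrum mentioned above.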
  • 13. Conclusion. Due to the equality constraints, a set of linear equations has to be solved instead of a quadratic programming problem, while Mercer's condition is applied as in other SVMs. The least squares SVM with RBF kernel is readily found, with excellent generalization performance and low computational cost.
References
1. J.A.K. Suykens and J. Vandewalle, "Least Squares Support Vector Machine Classifiers".
