Data Selection For Support Vector Machine Classifier



  1. Glenn Fung and Olvi L. Mangasarian, August 2000 (presented 2008-10-21 by Kuan-Chi-I)
  2. Outline
     - Introduction
     - SVM
     - MSVM
     - Comparisons
     - Conclusion
  3. Introduction
     - A method for selecting a small set of support vectors which determines a separating-plane classifier.
     - Useful for applications containing millions of data points.
  4. SVM
     - A method for classification.
  5. SVM (Linearly Separable Case)
  6. SVM
     - Finding the maximum margin is equivalent to minimizing ½‖w‖².
     - This problem can be formulated as a quadratic program with parameter ν > 0, shown in the reconstruction below.
     - A: a real m×n matrix of data points.
     - e: a column vector of ones of arbitrary dimension.
     - e′: the transpose of e.
     - y: nonnegative slack variables.
     - D: an m×m diagonal matrix with entries +1 or -1 (the class labels).
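The quadratic program itself appears to have been an image and is missing from this transcript. Reconstructed from the definitions above, the standard soft-margin formulation the slides refer to as (1) is:

```latex
\min_{w,\gamma,y}\; \nu\, e'y + \tfrac{1}{2}\, w'w
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \qquad y \ge 0. \tag{1}
```

Its solution (w, γ) determines the two bounding planes; the parameter ν trades margin width against classification error.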
  7. SVM
     - The constraints of (1) written in individual component notation (see below).
     - A_i: the i-th row vector of the matrix A.
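Reconstructed componentwise form of the constraints of (1), using the definitions above:

```latex
D_{ii}\left(A_i w - \gamma\right) + y_i \ge 1, \qquad y_i \ge 0, \qquad i = 1,\dots,m.
```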
  8. SVM
     - x′w = γ + 1 bounds the class A+ points.
     - x′w = γ − 1 bounds the class A− points.
     - γ: determines the location of the planes relative to the origin.
     - w: the normal to the bounding planes.
     - The linear separating surface is the plane x′w = γ, midway between the two bounding planes.
  9. SVM (Linearly Inseparable Case)
  10. SVM (Inseparable Case)
     - If the classes are linearly inseparable, the two planes bound the two classes with a "soft margin" determined by the nonnegative slack variables y.
  11. MSVM (1-Norm SVM)
     - A minimal support vector machine (MSVM).
     - To make use of a faster linear-programming-based approach, we reformulate (1) by replacing the 2-norm with the 1-norm, as in the reconstruction below.
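The formulation itself is missing from the transcript. Replacing ½‖w‖² in (1) with ‖w‖₁ gives the 1-norm program the slides call (7); this is a reconstruction from the surrounding slides:

```latex
\min_{w,\gamma,y}\; \nu\, e'y + \|w\|_1
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \qquad y \ge 0. \tag{7}
```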
  12. MSVM
     - The mathematical program (7) is easily converted to a linear program, as in the reconstruction below.
     - v: a vector bounding the absolute value of w, with v_i ≥ |w_i|, so that e′v equals ‖w‖₁ at a solution.
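Introducing v with −v ≤ w ≤ v, so that e′v = ‖w‖₁ at a solution, turns (7) into the linear program the slides call (8); again a reconstruction:

```latex
\min_{w,\gamma,y,v}\; \nu\, e'y + e'v
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \qquad -v \le w \le v, \qquad y \ge 0. \tag{8}
```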
  13. MSVM
     - If we associate nonnegative multipliers u ∈ R^m with the first set of constraints of the linear program (8), and multipliers (r, s) ∈ R^{n+n} with the second set of constraints of (8), then the dual linear program associated with the linear SVM formulation (8) is the one reconstructed below.
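The dual program did not survive the transcript. Deriving the dual of (8) with the multipliers named above gives the following reconstruction (the constraints on u are equivalent to −e ≤ A′Du ≤ e):

```latex
\max_{u,r,s}\; e'u
\quad \text{s.t.} \quad A'Du = r - s, \qquad e'Du = 0, \qquad 0 \le u \le \nu e, \qquad r + s = e, \qquad r, s \ge 0.
```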
  14. MSVM
     - We modify the linear program to generate an SVM with as few support vectors as possible by adding an error term e′y*.
     - The term e′y* suppresses misclassified points and results in our minimal support vector machine, MSVM (see the reconstruction below).
     - y*: the step vector of y in R^m, with components (y*)_i = 1 if y_i > 0 and 0 otherwise.
     - μ: a positive parameter, chosen by a tuning set.
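Adding the μ-weighted counting term to the objective of (8), as this slide describes, the MSVM program the slides call (9) reads (reconstructed):

```latex
\min_{w,\gamma,y,v}\; \nu\, e'y + e'v + \mu\, e'y_{*}
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \qquad -v \le w \le v, \qquad y \ge 0. \tag{9}
```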
  15. MSVM
     - We approximate e′y* by a smooth concave exponential on the nonnegative real line, as was done in an earlier feature selection approach. For y ≥ 0, the step vector y* of (9) is approximated componentwise by the concave exponential shown below.
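The approximating exponential is missing from the transcript. The standard concave-exponential approximation used in this line of work, with ε denoting the base of natural logarithms and α > 0 a smoothing parameter, is:

```latex
(y_{*})_i \approx 1 - \varepsilon^{-\alpha y_i}, \qquad i = 1,\dots,m.
```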
  16. MSVM
     - The smooth MSVM (see the reconstruction below).
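Substituting the exponential approximation for y* into (9) yields the smooth MSVM (reconstructed):

```latex
\min_{w,\gamma,y,v}\; \nu\, e'y + e'v + \mu\, e'\!\left(e - \varepsilon^{-\alpha y}\right)
\quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \qquad -v \le w \le v, \qquad y \ge 0.
```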
  17. MSVM (SLA: Successive Linearization Algorithm)
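The algorithm slide itself was an image and is not preserved. In the successive linearization approach, the concave exponential term in the smooth MSVM is linearized around the current iterate y^k, and the resulting linear program is solved repeatedly until the iterates stop changing. The sketch below, assuming SciPy's linprog and illustrative parameter values (nu, mu, and alpha here are placeholders, not the paper's choices), shows one way this could be implemented; it is a reconstruction, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog

def msvm_sla(A, d, nu=1.0, mu=1.0, alpha=5.0, max_iter=20, tol=1e-6):
    """Successive linearization for the smooth MSVM (sketch).

    A : (m, n) data matrix; d : (m,) labels in {+1, -1}.
    Variable layout: x = [w (n), gamma (1), y (m), v (n)].
    """
    m, n = A.shape
    D = np.diag(d.astype(float))
    e_m, e_n = np.ones(m), np.ones(n)

    # Constraints, all written as A_ub @ x <= b_ub:
    # D(Aw - e*gamma) + y >= e  <=>  -DA w + De gamma - y <= -e
    ub1 = np.hstack([-D @ A, (D @ e_m)[:, None], -np.eye(m), np.zeros((m, n))])
    # w - v <= 0 and -w - v <= 0, so v >= |w| and e'v = ||w||_1 at a solution
    ub2 = np.hstack([np.eye(n), np.zeros((n, 1)), np.zeros((n, m)), -np.eye(n)])
    ub3 = np.hstack([-np.eye(n), np.zeros((n, 1)), np.zeros((n, m)), -np.eye(n)])
    A_ub = np.vstack([ub1, ub2, ub3])
    b_ub = np.concatenate([-e_m, np.zeros(2 * n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)

    y_k = np.zeros(m)
    for _ in range(max_iter):
        # Linearizing mu * e'(e - eps^(-alpha y)) around y_k gives, up to a
        # constant, the linear objective nu*e'y + e'v + mu*alpha*(exp(-alpha y_k))'y.
        c = np.concatenate([np.zeros(n + 1),
                            nu * e_m + mu * alpha * np.exp(-alpha * y_k),
                            e_n])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        if not res.success:
            raise RuntimeError("LP solve failed: " + res.message)
        x = res.x
        y_new = x[n + 1:n + 1 + m]
        if np.linalg.norm(y_new - y_k, 1) < tol:  # iterates stopped changing
            y_k = y_new
            break
        y_k = y_new
    w, gamma = x[:n], x[n]
    return w, gamma, y_k
```

A point x would then be classified by the sign of x′w − γ, with the separating plane x′w = γ as described on slide 8.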
  18. Comparison
  19. Observations from the Comparisons
     - For all test problems, MSVM had the fewest support vectors.
     - For the Ionosphere problem, the reduction in the number of support vectors of MSVM over SVM‖·‖₁ is 81%, and the average reduction in the number of support vectors of MSVM over SVM‖·‖₁ is 65.8%.
     - Tenfold testing-set correctness of MSVM remained good.
     - Computing times were higher for MSVM than for the other classifiers.
  20. Conclusion
     - We proposed a minimal support vector machine (MSVM).
     - Useful for classifying very large datasets using only a fraction of the data.
     - Improves generalization over other classifiers that use a larger number of data points.
     - MSVM requires the solution of a few linear programs to determine a separating surface.