
Linear Discrimination Centering on Support Vector Machines


  1. CHAPTER 10: Linear Discrimination (Eick/Alpaydin: Topic 13)
  2. Likelihood- vs. Discriminant-based Classification
     - Likelihood-based: assume a model for p(x|Ci) and use Bayes' rule to calculate P(Ci|x); gi(x) = log P(Ci|x)
     - Discriminant-based: assume a model for gi(x|Φi); no density estimation
     - Estimating the boundaries is enough; there is no need to accurately estimate the densities inside the boundaries
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
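The distinction can be made concrete with a small Python sketch (not part of the slides; it assumes scikit-learn and invented synthetic data): GaussianNB models the class-conditional densities p(x|Ci) and applies Bayes' rule, whereas LogisticRegression models the discriminant directly, without density estimation.

```python
# Minimal sketch (not from the slides; assumes scikit-learn): contrast a
# likelihood-based classifier, which models p(x|Ci) and applies Bayes' rule,
# with a discriminant-based classifier, which models the boundary directly.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB            # likelihood-based
from sklearn.linear_model import LogisticRegression   # discriminant-based

# Synthetic two-class data with two attributes
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

likelihood_clf = GaussianNB().fit(X, y)              # estimates class densities
discriminant_clf = LogisticRegression().fit(X, y)    # estimates the boundary only

print("GaussianNB training accuracy:        ", likelihood_clf.score(X, y))
print("LogisticRegression training accuracy:", discriminant_clf.score(X, y))
```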
  3. Linear Discriminant
     - Linear discriminant: gi(x) = wi^T x + wi0
     - Advantages:
       - Simple: O(d) space/computation
       - Knowledge extraction: weighted sum of attributes; positive/negative weights, magnitudes (credit scoring)
       - Optimal when the p(x|Ci) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
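The "weighted sum of attributes" view can be illustrated with a minimal sketch; the weights, bias, and example below are hypothetical, not taken from the slides.

```python
# Minimal sketch (illustrative numbers): a linear discriminant is just a
# weighted sum of attributes plus a bias term; O(d) space and computation.
import numpy as np

w  = np.array([0.8, -0.5, 0.3])    # hypothetical per-attribute weights
w0 = -0.2                          # hypothetical bias term

def g(x):
    """Linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

x = np.array([1.0, 2.0, 0.5])      # one example with d = 3 attributes
print(g(x), "-> class C1" if g(x) > 0 else "-> class C2")
```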
  4. Generalized Linear Model
     - Quadratic discriminant: gi(x) = x^T Wi x + wi^T x + wi0
     - Higher-order (product) terms, e.g. z1 = x1, z2 = x2, z3 = x1², z4 = x2², z5 = x1 x2
     - Map from x to z using nonlinear basis functions and use a linear discriminant in z-space
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
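A brief sketch of the x-to-z idea, assuming scikit-learn and an invented circular toy problem: quadratic basis functions make the data linearly separable in z-space even though no linear boundary works in x-space.

```python
# Minimal sketch (assumes scikit-learn): map x to z with nonlinear basis
# functions (quadratic/product terms) and fit a *linear* discriminant in
# z-space, which yields a quadratic boundary in the original x-space.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1.0).astype(int)    # circular boundary in x-space

Z = PolynomialFeatures(degree=2).fit_transform(X)  # z = (1, x1, x2, x1^2, x1*x2, x2^2)
clf = LogisticRegression(max_iter=1000).fit(Z, y)  # linear discriminant in z-space
print("training accuracy in z-space:", clf.score(Z, y))
```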
  5. Two Classes
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
  6. Geometry
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
  7. Support Vector Machines
     - One possible solution
  8. Support Vector Machines
     - Another possible solution
  9. Support Vector Machines
     - Other possible solutions
  10. Support Vector Machines
     - Which one is better: B1 or B2?
     - How do you define "better"?
  11. Support Vector Machines
     - Find the hyperplane that maximizes the margin => B1 is better than B2
  12. Support Vector Machines
     - Training examples are of the form (x1, ..., xn, y) with y ∈ {-1, 1}
  13. Support Vector Machines
     - We want to maximize the margin 2/||w||
       - Which is equivalent to minimizing ||w||²/2
       - But subject to the following N constraints: y_i (w · x_i + b) ≥ 1 for i = 1, ..., N
       - This is a constrained convex quadratic optimization problem that can be solved in polynomial time
         - Numerical approaches to solve it (e.g., quadratic programming) exist
         - The function to be optimized has only a single minimum, so there is no local-minimum problem
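For concreteness, a minimal sketch (assuming scikit-learn; the toy data are invented) of this quadratic program being solved by an off-the-shelf linear SVM, with a very large C approximating the hard-margin case for separable data:

```python
# Minimal sketch (assumes scikit-learn): a linear SVM solves the convex
# quadratic program above; a very large C approximates the hard margin.
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable toy set with labels y in {-1, +1}
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # solves the QP numerically
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```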
  14. Support Vector Machines
     - What if the problem is not linearly separable?
  15. Linear SVM for Non-linearly Separable Problems
     - What if the problem is not linearly separable?
       - Introduce slack variables ξ_i (a slack variable allows a constraint to be violated to a certain degree)
       - Need to minimize: ||w||²/2 + C Σ_i ξ_i  (the first term is the inverse of the size of the margin between the hyperplanes, the second term measures the prediction error, and C is a parameter weighting the two)
       - Subject to (i = 1, ..., N): y_i (w · x_i + b) ≥ 1 - ξ_i and ξ_i ≥ 0
       - C is chosen using a validation set, trying to keep the margin wide while keeping the training error low.
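A small sketch of that model-selection step, assuming scikit-learn, synthetic data, and an arbitrary grid of candidate C values:

```python
# Minimal sketch (assumes scikit-learn): C trades margin width against
# training error, so it is chosen on a held-out validation set as the slide
# suggests; the candidate values below are arbitrary.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, flip_y=0.1, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1, 10, 100]:
    acc = SVC(kernel="linear", C=C).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc
print("chosen C =", best_C, " validation accuracy =", best_acc)
```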
  16. Nonlinear Support Vector Machines
     - What if the decision boundary is not linear?
     - Alternative 1: use a technique that employs non-linear decision boundaries (a non-linear function)
  17. Nonlinear Support Vector Machines
     - Alternative 2: transform the data into a higher-dimensional attribute space and find linear decision boundaries in this space
     - Find the best hyperplane using the methods introduced earlier
  18. Nonlinear Support Vector Machines
     - Choose a non-linear kernel function Φ to transform the data into a different, usually higher-dimensional, attribute space
     - Minimize ||w||²/2 + C Σ_i ξ_i, but subject to the following N constraints: y_i (w · Φ(x_i) + b) ≥ 1 - ξ_i, ξ_i ≥ 0
     - Find a good hyperplane in the transformed space
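A minimal sketch of the effect of such a transformation, assuming scikit-learn and an invented two-circles dataset: the same margin-maximization machinery with an RBF kernel handles a boundary that a linear SVM cannot.

```python
# Minimal sketch (assumes scikit-learn): the same optimization problem is
# solved after the kernel implicitly maps the data into a higher-dimensional
# attribute space; here an RBF kernel handles a circular decision boundary.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear", C=1.0).fit(X, y).score(X, y)
kernel_acc = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y).score(X, y)
print("linear SVM accuracy:    ", linear_acc)   # poor: the true boundary is a circle
print("RBF-kernel SVM accuracy:", kernel_acc)   # typically near 1.0 in the transformed space
```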
  19. Example: Polynomial Kernel Function
     - Polynomial kernel function: Φ(x1, x2) = (x1², x2², sqrt(2)·x1·x2, sqrt(2)·x1, sqrt(2)·x2, 1)
     - K(u, v) = Φ(u) · Φ(v) = (u · v + 1)²
     - A support vector machine with a polynomial kernel function classifies a new example z as follows:
       sign(Σ_i λ_i y_i Φ(x_i) · Φ(z) + b) = sign(Σ_i λ_i y_i (x_i · z + 1)² + b)
     - Remark: the λ_i and b are determined using the methods for linear SVMs that were discussed earlier
     - Kernel trick: perform the computations in the original space, although we solve an optimization problem in the transformed space, which is more efficient; more details in Topic14.
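The kernel trick on this slide can be checked numerically with plain NumPy; the vectors u and v below are arbitrary. The explicit mapping Φ and the kernel K(u, v) = (u · v + 1)² give exactly the same inner product.

```python
# Numeric check of the kernel trick: the explicit mapping Phi and the kernel
# K(u, v) = (u.v + 1)^2 produce the same value.
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2)*x1*x2, np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

def K(u, v):
    return (np.dot(u, v) + 1.0) ** 2

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
print(np.dot(phi(u), phi(v)))   # 4.0, computed in the transformed space
print(K(u, v))                  # 4.0, computed in the original space
```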
  20. Summary: Support Vector Machines
     - Support vector machines learn hyperplanes that separate two classes while maximizing the margin between them (the empty space between the instances of the two classes).
     - Support vector machines introduce slack variables when the classes are not linearly separable, trying to maximize the margin while keeping the training error low.
     - The most popular versions of SVMs use non-linear kernel functions to map the attribute space into a higher-dimensional space, to facilitate finding "good" linear decision boundaries in the modified space.
     - Support vector machines find "margin-optimal" hyperplanes by solving a convex quadratic optimization problem. However, this optimization process is quite slow, and support vector machines tend to fail if the number of examples goes beyond 500/2000/5000...
     - In general, support vector machines achieve quite high accuracies compared to other techniques.
  21. Useful Support Vector Machine Links
     Lecture notes that are very helpful for understanding the basic ideas:
       http://www.ics.uci.edu/~welling/teaching/KernelsICS273B/Kernels.html
       http://cerium.raunvis.hi.is/~tpr/courseware/svm/kraekjur.html
     Tools that are often used in publications:
       libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
       spider: http://www.kyb.tuebingen.mpg.de/bs/people/spider/index.html
     Tutorial slides: http://www.support-vector.net/icml-tutorial.pdf
     Surveys: http://www.svms.org/survey/Camp00.pdf
     More general material:
       http://www.learning-with-kernels.org/
       http://www.kernel-machines.org/
       http://kernel-machines.org/publications.html
       http://www.support-vector.net/tutorial.html
     Remark: thanks to Chaofan Sun for providing these links!
  22. Optimal Separating Hyperplane (Cortes and Vapnik, 1995; Vapnik, 1995)
     Alpaydin transparencies on Support Vector Machines (not used in lecture!)
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
  23. Margin
     - Distance from the discriminant to the closest instances on either side
     - Distance of x to the hyperplane is |w^T x + w0| / ||w||
     - We require r^t (w^T x^t + w0) / ||w|| ≥ ρ for all t
     - For a unique solution, fix ρ||w|| = 1; then, to maximize the margin, minimize ||w||
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
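A small numeric sketch of the distance and margin computation, with an invented hyperplane, points, and labels:

```python
# Minimal sketch (illustrative numbers): the signed distance of x^t to the
# hyperplane w^T x + w0 = 0 is r^t (w^T x^t + w0) / ||w||, and the margin rho
# is the smallest such distance over the training set.
import numpy as np

w, w0 = np.array([3.0, 4.0]), -5.0             # hypothetical hyperplane
X = np.array([[2.0, 1.0], [0.0, 2.0], [3.0, 3.0]])
r = np.array([1, 1, 1])                        # labels r^t in {-1, +1}

signed = r * (X @ w + w0) / np.linalg.norm(w)  # r^t (w^T x^t + w0) / ||w||
print("signed distances:", signed)             # [1.0, 0.6, 3.2]
print("margin rho:      ", signed.min())       # 0.6
```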
  24. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
  25. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
  26. Most α^t are 0; only a small number have α^t > 0, and they are the support vectors.
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
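A brief sketch (assuming scikit-learn and synthetic blobs) showing that a trained SVM exposes exactly this small subset of examples with α^t > 0:

```python
# Minimal sketch (assumes scikit-learn): after training, only a few examples
# have nonzero multipliers; these are the model's support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("training examples:", len(X))
print("support vectors:  ", len(clf.support_vectors_))  # typically a small subset
print("dual coefficients (alpha^t * y^t):\n", clf.dual_coef_)  # nonzero only for support vectors
```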
  27. Soft Margin Hyperplane
     - Not linearly separable: require r^t (w^T x^t + w0) ≥ 1 - ξ^t with slack variables ξ^t ≥ 0
     - Soft error: Σ_t ξ^t
     - New primal is L_p = ||w||²/2 + C Σ_t ξ^t - Σ_t α^t [r^t (w^T x^t + w0) - 1 + ξ^t] - Σ_t μ^t ξ^t
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
  28. Kernel Machines
     - Preprocess input x by basis functions: z = φ(x), g(z) = w^T z, so g(x) = w^T φ(x)
     - The SVM solution: w = Σ_t α^t r^t φ(x^t), so g(x) = Σ_t α^t r^t φ(x^t)^T φ(x) = Σ_t α^t r^t K(x^t, x)
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
  29. Kernel Functions
     - Polynomials of degree q: K(x^t, x) = (x^T x^t + 1)^q
     - Radial-basis functions: K(x^t, x) = exp(-||x - x^t||² / (2 s²))
     - Sigmoidal functions: K(x^t, x) = tanh(2 x^T x^t + 1)
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) (Cherkassky and Mulier, 1998)
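The three kernel families can be written out directly; a plain-NumPy sketch with arbitrary inputs and hyperparameter values:

```python
# Minimal sketch (pure NumPy): the three kernel families listed above;
# q and s are hyperparameters to be chosen by the user.
import numpy as np

def poly_kernel(x, xt, q=2):
    return (np.dot(x, xt) + 1.0) ** q                            # polynomial of degree q

def rbf_kernel(x, xt, s=1.0):
    return np.exp(-np.linalg.norm(x - xt) ** 2 / (2 * s ** 2))   # radial basis function

def sigmoid_kernel(x, xt):
    return np.tanh(2.0 * np.dot(x, xt) + 1.0)                    # sigmoidal

x, xt = np.array([1.0, 0.5]), np.array([0.2, -0.3])
print(poly_kernel(x, xt), rbf_kernel(x, xt), sigmoid_kernel(x, xt))
```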
  30. SVM for Regression
     - Use a linear model (possibly kernelized): f(x) = w^T x + w0
     - Use the ε-sensitive error function: e_ε(r^t, f(x^t)) = 0 if |r^t - f(x^t)| < ε, and |r^t - f(x^t)| - ε otherwise
     Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
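A plain-NumPy sketch of the ε-sensitive error function, with invented targets, predictions, and an arbitrary ε:

```python
# Minimal sketch (pure NumPy): the eps-sensitive error function used by SVM
# regression; errors smaller than eps cost nothing, larger ones cost the excess.
import numpy as np

def eps_sensitive_error(r, f_x, eps=0.1):
    """0 if |r - f(x)| < eps, otherwise |r - f(x)| - eps."""
    diff = np.abs(r - f_x)
    return np.where(diff < eps, 0.0, diff - eps)

r   = np.array([1.00, 1.50, 2.00])    # targets r^t
f_x = np.array([1.05, 1.80, 2.00])    # predictions f(x^t) = w^T x^t + w0
print(eps_sensitive_error(r, f_x))    # approximately [0., 0.2, 0.]
```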
