
Support Vector Machines



Understanding Support Vector Machines

Published in: Science, Technology, Education


  1. Support Vector Machines: Theory and Implementation in Python, by Nachi
  2. Definition: In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. (Wikipedia)
  3. Properties of an SVM: (a) it is a non-probabilistic binary linear classifier; (b) it supports non-linear classification using the 'kernel trick'.
  4. Linear separability: Two sets of points in p-dimensional space are said to be linearly separable if they can be separated by a (p-1)-dimensional hyperplane. Example: the two sets of 2D data in the image are separated by a single straight line (a 1D hyperplane), and hence are linearly separable.
  5. Linear discriminant: The hyperplane that separates the two sets of data is called the linear discriminant. Its equation is $W^T X = C$, where $W = [w_1, w_2, \ldots, w_n]$ is the weight vector, $X = [x_1, x_2, \ldots, x_n]$ is a point in n-dimensional space, and $C$ is a constant offset.
  6. Selecting the hyperplane: For any linearly separable data set there exist infinitely many separating hyperplanes, so we must choose the one most suitable for classification.
  7. Maximal margin hyperplane: We can compute the (perpendicular) distance from each observation in the data set to a given separating hyperplane; the smallest such distance is the minimal distance from the observations to the hyperplane, and is known as the margin. The maximal margin hyperplane is the separating hyperplane for which the margin is largest.
  8. Example - maximal margin hyperplane
  9. Finding the shortest distance (margin): For an observation $X$, find the point $X_p$ such that $\|X_p - X\|$ is minimal subject to $W^T X_p = C$ (since $X_p$ lies on the decision boundary). Here $W^T$ denotes the transpose of $W$.
  10. Maximizing the margin: Maximize $D$, where $D = |W^T X - C| / \|W\|$ and $X$ is the support vector.
  11. Why the maximum margin hyperplane? ● Suppose we have a maximal margin hyperplane for a data set and want to predict the class of a new observation; we compute its distance from the hyperplane. ● The greater the distance from the hyperplane, the more confident we are that the sample belongs to that class. ● Thus the hyperplane whose smallest distance to the training observations is largest is the most suitable.
  12. Classifying a new sample: Consider a new sample $x' = [x_1, x_2, \ldots, x_n]$. To predict its class, we simply compute $W^T x'$ and compare it with $C$. If $W^T x' > C$, the sample lies on one side of the hyperplane (the positive half space); if $W^T x' < C$, it lies on the other side (the negative half space). The sample is assigned to the class that the corresponding half space represents.
  13. SVM, a linear discriminant: An SVM is simply a linear discriminant that tries to build a hyperplane with as large a margin as possible. It classifies a new sample by computing its signed distance from the hyperplane.
  14. Support vectors: ● Observations (represented as vectors) that lie exactly at the margin distance from the hyperplane are called support vectors. ● They are important because shifting them even slightly can change the position of the hyperplane considerably.
  15. Example - Support vectors: The vectors lying on the green lines in the image are the support vectors.
  16. Soft margin: Insisting on a hyperplane that perfectly separates the training data can lead to overfitting, i.e. high sensitivity to individual observations. To avoid this, we may allow some misclassification in exchange for greater robustness to individual observations and better classification of most observations.
  17. Achieving a soft margin: Each observation has a 'slack variable' that allows it to be on the wrong side of the margin or of the hyperplane. The slack variables are constrained by: sum of slack variables <= C, where C is a nonnegative tuning parameter (distinct from the offset C in the hyperplane equation). C is our budget for the total amount by which the observations may violate the margin.
  18. Tuning parameter C and its relation to support vectors: Observations that lie directly on the margin, or on the wrong side of the margin for their class, are known as support vectors; only these observations affect the support vector classifier. When the budget C is large, the margin is wide, many observations violate the margin, and so there are many support vectors. (Note that the C parameter of sklearn.svm.SVC is a penalty on violations and acts in the opposite direction: a larger value gives a narrower margin.)
  19. Non-linearly separable data: In this case, an SVM would not be able to classify the data with a linear boundary in the original feature space. Hence the SVM uses what is known as the 'kernel trick': the feature space is enlarged, which can be done using various kernel functions. The idea is that the data may admit a linear boundary in the enlarged feature space even though that boundary is not linear in the original feature space.
  20. Enlarged feature space
  21. Multi-category classification: ● One-versus-one classification ● One-versus-all classification
  22. Sample data: X = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]], Y = [0, 0, 0, 1, 1]
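  23. SVM in sklearn: After "from sklearn import svm", clfy = svm.SVC() creates a classifier with the default parameters. The defaults shown on the slide, which may differ in other scikit-learn versions, are: class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)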
  24. 'Fit' the model: clfy.fit(X, Y) fits the SVM model, i.e. computes and builds the hyperplane.
  25. Features of sklearn: clfy.support_vectors_ retrieves all the support vectors of the model. clfy.predict([[3, 3]]) predicts the class of the given sample (recent scikit-learn versions expect a 2D array, one row per sample).
  26. Features of sklearn: clfy.score(X, Y) returns the mean accuracy on the given test data and labels. clfy.decision_function([[2.5, 2.5]]) returns the signed decision value of each sample, which for a linear kernel is proportional to its distance from the separating hyperplane.
  27. Conclusion: Parameter and kernel selection are crucial in an SVM model.
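
The distance and classification rules from the slides on the margin and on classifying a new sample can be sketched in a few lines of NumPy. This is only an illustration: the hyperplane x1 + x2 = 5 and the helper names signed_distance and classify are assumptions chosen for the example, not values or functions from the slides. The points are the sample data from the transcript.

```python
import numpy as np

def signed_distance(w, c, x):
    """Signed perpendicular distance from x to the hyperplane w.x = c."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return (w @ x - c) / np.linalg.norm(w)

def classify(w, c, x):
    """Return +1 for the positive half space, -1 for the negative one."""
    return 1 if signed_distance(w, c, x) > 0 else -1

# Hypothetical hyperplane x1 + x2 = 5 (an assumption, not from the slides).
w, c = [1.0, 1.0], 5.0

# Sample data from the transcript.
points = np.array([[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]])

distances = np.array([signed_distance(w, c, p) for p in points])
labels = [classify(w, c, p) for p in points]

# The margin of this hyperplane is the smallest absolute distance.
margin = np.min(np.abs(distances))
print(labels, margin)
```

The sign of the distance gives the half space, and its magnitude is the quantity the maximal margin hyperplane tries to make as large as possible for the closest observations.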
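
The idea of an enlarged feature space can be illustrated with a small sketch. The 1D data set below is hypothetical, and adding the squared feature by hand is an assumption made for clarity; a kernel function achieves a comparable effect without building the new features explicitly. A large C is used so that scikit-learn's soft-margin SVC behaves almost like a hard-margin classifier.

```python
import numpy as np
from sklearn import svm

# Hypothetical 1D data: class 1 sits between two groups of class 0,
# so no single threshold on x can separate the classes.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])

X_original = x.reshape(-1, 1)              # feature space: (x)
X_enlarged = np.column_stack([x, x ** 2])  # enlarged space: (x, x^2)

# Same linear classifier, trained in each feature space.
linear_original = svm.SVC(kernel="linear", C=1000.0).fit(X_original, y)
linear_enlarged = svm.SVC(kernel="linear", C=1000.0).fit(X_enlarged, y)

acc_original = linear_original.score(X_original, y)
acc_enlarged = linear_enlarged.score(X_enlarged, y)
print(acc_original, acc_enlarged)
```

In the enlarged space the classes are separated by a horizontal line in the x^2 coordinate, which corresponds to a non-linear boundary (two thresholds) in the original space.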
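
The relation between the tuning parameter and the number of support vectors can be explored with the sketch below. One caveat, stated as a fact about scikit-learn rather than about the slides: the C argument of sklearn.svm.SVC is a penalty on margin violations, so it acts in the opposite direction to the budget C described in the slides (a small penalty corresponds to a large budget). The linear kernel and the two C values are assumptions chosen for illustration.

```python
from sklearn import svm

# Sample data from the transcript.
X = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]
Y = [0, 0, 0, 1, 1]

# Small penalty: violations are cheap, so the margin is wide.
wide_margin = svm.SVC(kernel="linear", C=0.01).fit(X, Y)

# Large penalty: violations are expensive, so the margin is narrow.
narrow_margin = svm.SVC(kernel="linear", C=100.0).fit(X, Y)

# n_support_ holds the number of support vectors per class.
n_sv_small_c = int(wide_margin.n_support_.sum())
n_sv_large_c = int(narrow_margin.n_support_.sum())
print(n_sv_small_c, n_sv_large_c)
```

A wider margin leaves more observations on or inside it, which is why the small-penalty model is expected to keep at least as many support vectors as the large-penalty one.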
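
The sklearn slides can be put together into one runnable sketch. It assumes a recent scikit-learn version, where predict and decision_function expect a 2D array with one row per sample. The linear kernel is an assumption made here so that the result matches the hyperplane theory above; the slides use the default kernel.

```python
from sklearn import svm

# Sample data from the transcript.
X = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]]
Y = [0, 0, 0, 1, 1]

# Linear kernel chosen for illustration (the slides use the default).
clfy = svm.SVC(kernel="linear", C=1.0)
clfy.fit(X, Y)  # compute and build the hyperplane

support_vectors = clfy.support_vectors_          # observations on the margin
predicted = clfy.predict([[3, 3]])               # class of a new sample
accuracy = clfy.score(X, Y)                      # mean accuracy on the data
decision = clfy.decision_function([[2.5, 2.5]])  # signed decision value

print(support_vectors)
print(predicted, accuracy, decision)
```

The point [2.5, 2.5] lies midway between the two classes, so its decision value should be close to zero, while [3, 3] falls on the positive side.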