
A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines


Support vector machines (SVMs) often contain a large number of support vectors,
which reduces the run-time speed of their decision functions. In addition, this
may cause an overfitting effect, where the resulting SVM adapts itself to the
noise in the training set rather than to the true underlying data distribution
and will probably fail to correctly classify unseen examples. To obtain faster
and more accurate SVMs, many methods have been proposed to prune support vectors
in trained SVMs. In this paper, we propose a multi-objective genetic algorithm
to reduce the complexity of support vector machines as well as to improve
generalization accuracy by reducing overfitting. Experiments on four benchmark
datasets show that the proposed evolutionary approach can effectively reduce the
number of support vectors included in the decision functions of SVMs without
sacrificing their classification accuracy.



  1. A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines.
     Mohamed Abdel Hady, Wessam Herbawi, Friedhelm Schwenker,
     Institute of Neural Information Processing, University of Ulm, Germany,
     {mohamed.abdel-hady}@uni-ulm.de. November 4, 2011.
     Outline: Support Vector Machine, SVM Pruning, Experiments, Conclusion, Future Work.
  2. Support Vector Machine.
     Figure: maximum-margin separating hyperplane {x | ⟨w, ϕ(x)⟩ + b = 0} with the
     margin hyperplanes {x | ⟨w, ϕ(x)⟩ + b = -1} and {x | ⟨w, ϕ(x)⟩ + b = +1},
     weight vector w, classes y = +1 and y = -1, and slack variables ξ₁, ..., ξ₄
     for examples violating the margin.
  3. Support Vector Machine.
     To obtain the optimal hyperplane, one solves the following convex quadratic
     optimization problem with respect to the weight vector w and bias b:

       \min_{w,b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i,    (1)

     subject to the constraints

       y_i(\langle w, \varphi(x_i)\rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
       \quad \text{for } i = 1, \ldots, n.    (2)

     The regularization parameter C controls the trade-off between maximizing the
     margin 1/\|w\| and minimizing the sum of the slack variables of the training
     examples,

       \xi_i = \max(0,\, 1 - y_i(\langle w, \varphi(x_i)\rangle + b))
       \quad \text{for } i = 1, \ldots, n.    (3)

     The training example x_i is correctly classified if 0 ≤ ξ_i < 1 and is
     misclassified when ξ_i ≥ 1.
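As a concrete illustration of Eq. (3), the short sketch below computes the slack variables from raw decision values; the arrays y and decision_values are hypothetical placeholders, not data from the paper.

```python
import numpy as np

# Hypothetical labels (+1/-1) and decision values <w, phi(x_i)> + b
y = np.array([+1, -1, +1, -1, +1])
decision_values = np.array([1.7, -0.4, 0.3, 0.9, -0.2])

# Slack variables xi_i = max(0, 1 - y_i * (<w, phi(x_i)> + b))  -- Eq. (3)
xi = np.maximum(0.0, 1.0 - y * decision_values)

print(xi)          # [0.  0.6 0.7 1.9 1.2]
print(xi >= 1.0)   # True marks a misclassified example (xi_i >= 1)
```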
  4. Support Vector Machine.
     Using standard Lagrangian techniques, the problem is converted into its
     equivalent dual problem, whose number of variables equals the number of
     training examples:

       \max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
       - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j)    (4)

     subject to the constraints

       \sum_{i=1}^{n} \alpha_i y_i = 0 \quad \text{and} \quad 0 \le \alpha_i \le C
       \quad \text{for } i = 1, \ldots, n,    (5)

     where the coefficients α*_i are the optimal solution of the dual problem and
     k is the kernel function. Hence, the decision function to classify an unseen
     example x can be written as

       f(x) = \sum_{i=1}^{n_{sv}} \alpha_i^* y_i k(x, x_i) + b^*.    (6)

     The training examples x_i with α*_i > 0 are called support vectors, and the
     number of support vectors is denoted by n_sv ≤ n.
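To make Eq. (6) concrete, here is a small sketch that reproduces the decision function of a fitted scikit-learn SVC from its support vectors and dual coefficients; the toy data and variable names are assumptions for illustration, not part of the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
y = 2 * y - 1                                  # map labels to {-1, +1}

gamma = 1.0 / X.shape[1]
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

def decision_function(x):
    """f(x) = sum_i alpha*_i y_i k(x, x_i) + b*   -- Eq. (6)."""
    # clf.dual_coef_ already stores alpha*_i * y_i for each support vector
    k = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    return float(clf.dual_coef_[0] @ k + clf.intercept_[0])

x_new = X[0]
print(decision_function(x_new))                # should agree with the line below
print(clf.decision_function([x_new])[0])
```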
  5. SVM Pruning.
     The classification time complexity of the SVM classifier scales with the
     number of support vectors, O(n_sv).
     To reduce the complexity of the SVM, the number of support vectors should be
     reduced; reducing the number of support vectors also reduces overfitting
     (over-training) of the SVM.
     Indirect methods reduce the number of training examples
     {(x_i, y_i) : i = 1, ..., n} [Pedrajas, IEEE TNN 2009].
     Direct methods prune support vectors from a trained SVM. The multi-objective
     evolutionary SVM proposed in this paper is the first evolutionary algorithm
     that reformulates SVM pruning as a combinatorial multi-objective optimization
     problem.
  6. Genetic Algorithm for Support Vector Selection.
     Flow diagram: the genetic algorithm maintains a population of candidate
     subsets of support vector indices. Each individual is evaluated by building
     the simplified SVM decision function and measuring two fitness values, the
     number of support vectors and the training error; GA operators (selection,
     crossover and mutation) then produce the next population.
  7. Representation (Encoding).
     For support vector selection a binary encoding is appropriate. Here, the t-th
     candidate solution in a population is an n_sv-dimensional bit vector
     s_t ∈ {0, 1}^{n_sv}. The j-th support vector is included in the decision
     function if s_tj = 1 and excluded when s_tj = 0. For instance, for a problem
     with 7 support vectors, the t-th individual of the population can be
     represented as s_t = (1, 0, 0, 1, 1, 1, 0) or s_t = (0, 1, 0, 1, 1, 0, 1).
     Then, for each solution with bit vector s_t, only the selected support vectors
     enter the summation that defines the reduced decision function f_reduced,
     which is used in Eq. (9) to evaluate the fitness of solution s_t:

       f_{reduced}(x_i, s_t) = \sum_{j=1}^{n_{sv}} s_{tj}\, \alpha_j^* y_j K_{ij} + b^*.    (7)
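A minimal sketch of Eq. (7), assuming a precomputed kernel matrix between the training examples and the support vectors; the names K, alpha_star, y_sv and b_star are hypothetical and only introduced here for illustration.

```python
import numpy as np

def f_reduced(K, alpha_star, y_sv, b_star, s):
    """Reduced SVM decision values for all training examples -- Eq. (7).

    K          : (n, n_sv) kernel matrix, K[i, j] = k(x_i, j-th support vector)
    alpha_star : (n_sv,)   optimal dual coefficients alpha*_j
    y_sv       : (n_sv,)   labels of the support vectors (+1/-1)
    b_star     : float     optimal bias b*
    s          : (n_sv,)   bit vector; s[j] = 1 keeps support vector j
    """
    coef = s * alpha_star * y_sv        # pruned support vectors contribute zero
    return K @ coef + b_star            # one decision value per training example
```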
  8. Selection Criteria (Objectives).
     The objectives determine the quality of each candidate solution in the
     population. We want to design classifiers with high generalization ability,
     and there is a trade-off between SVM complexity and training error (the number
     of misclassified examples on the set of n training examples). The following
     two objective functions are used to measure the fitness of a solution s_t:

       f_1(s_t) = \sum_{j=1}^{n_{sv}} s_{tj}    (8)

     and

       f_2(s_t) = \sum_{i=1}^{n} \mathbb{1}\big(y_i \ne \mathrm{sgn}(f_{reduced}(x_i, s_t))\big),    (9)

     where f_reduced is the reduced decision function defined in Eq. (7), 𝟙 is the
     indicator function and sgn is the sign function with values -1 and +1. It is
     easy to achieve zero training error when all training examples are support
     vectors, but such a solution is not likely to generalize well (it is prone to
     overfitting).
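Continuing the sketch after slide 7 (same hypothetical names), the two objectives of Eq. (8) and Eq. (9) can be evaluated as:

```python
import numpy as np

def objectives(K, alpha_star, y_sv, b_star, y_train, s):
    """Return (f1, f2) for a bit vector s -- Eq. (8) and Eq. (9)."""
    f1 = int(np.sum(s))                                    # number of kept support vectors
    decision = f_reduced(K, alpha_star, y_sv, b_star, s)   # reduced decision values, Eq. (7)
    f2 = int(np.sum(y_train != np.sign(decision)))         # misclassified training examples
    return f1, f2
```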
  9. Experimental Setup.
     We use soft-margin L1-SVMs with the Gaussian kernel function

       k(x, x_i) = \exp(-\gamma \|x - x_i\|^2)    (10)

     with γ = 1/d and the regularization parameter C = 1.
     We use four benchmark datasets from the UCI Benchmark Repository: ionosphere,
     diabetes, sick and german credit, where the number of features d is 34, 8, 29
     and 20, respectively. All features are normalized to have zero mean and unit
     variance. Each dataset is divided randomly into two subsets: 10% of the
     examples are used as the test set D_test, while the remaining 90% are used as
     the training set D_train. Thus, the sizes of the training sets (n) are 315,
     691, 3394 and 900, and the sizes of the test sets (m) are 36, 77, 378 and 100,
     respectively.
     At the beginning of each experiment, a soft-margin L1-norm SVM is trained on
     D_train with the SMO algorithm. The training error f_2(s_t) of each individual
     solution s_t (support vector subset) is evaluated on D_train, where
     CE(train) = f_2(s_t)/n. After each run of the MOGA, we evaluate the average
     test set error CE(test) of each solution in the final set of Pareto-optimal
     solutions using D_test.
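A rough sketch of this setup with scikit-learn (whose SVC wraps the SMO-based libsvm solver); the random data is only a placeholder for one of the UCI datasets, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder for a UCI dataset: X is (N, d), y has labels in {-1, +1}
rng = np.random.RandomState(0)
X = rng.randn(351, 34)                               # ionosphere-sized toy data
y = np.where(rng.rand(351) > 0.5, 1, -1)

# Random 90% / 10% split into D_train and D_test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# Normalize features to zero mean and unit variance (fitted on D_train only)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Soft-margin SVM with Gaussian kernel, gamma = 1/d and C = 1 (Eq. (10))
d = X.shape[1]
svm = SVC(kernel="rbf", gamma=1.0 / d, C=1.0).fit(X_train, y_train)
print("number of support vectors:", svm.support_vectors_.shape[0])
```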
  10. Experimental Results.
      For the application of NSGA-II we choose a population size of 100; the other
      parameters are p_c = 0.9, p_mut = 1/n_sv, η_c = 20 and η_mut = 20. The two
      objectives given in Eq. (8) and Eq. (9) are optimized. For each dataset, ten
      optimization runs of the MOGA are carried out, each lasting 10000 generations.

      Pareto-optimal solutions after pruning compared to the unpruned SVM; each
      solution is written as a triple [n_sv, n·CE(train), m·CE(test)]:

      dataset        before pruning    after pruning (range of Pareto-optimal solutions)
      ionosphere     [101,   4, 10]    [0, 202, 23] to [ 15,   3,  5]
      diabetes       [399, 126, 14]    [0, 450, 50] to [101, 125, 18]
      sick           [503,  88, 12]    [0, 208, 23] to [ 92,  83, 13]
      german credit  [820,  20, 27]    [8, 259, 26] to [283,  57, 22]
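The NSGA-II run could be set up roughly as sketched below with the pymoo library, reusing the hypothetical objectives helper and arrays (K, alpha_star, y_sv, b_star, y_train) from the earlier sketches. This is an illustrative reconstruction, not the authors' implementation; module paths follow recent pymoo releases and may differ between versions, and the paper's p_c = 0.9 and p_mut = 1/n_sv would be configured on the crossover and mutation operators.

```python
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.operators.sampling.rnd import BinaryRandomSampling
from pymoo.operators.crossover.pntx import TwoPointCrossover
from pymoo.operators.mutation.bitflip import BitflipMutation
from pymoo.optimize import minimize

class SVPruningProblem(ElementwiseProblem):
    """Bi-objective support vector selection: minimize f1 and f2 of Eq. (8)-(9)."""

    def __init__(self, K, alpha_star, y_sv, b_star, y_train):
        super().__init__(n_var=len(alpha_star), n_obj=2, xl=0, xu=1)
        self.data = (K, alpha_star, y_sv, b_star, y_train)

    def _evaluate(self, s, out, *args, **kwargs):
        K, alpha_star, y_sv, b_star, y_train = self.data
        f1, f2 = objectives(K, alpha_star, y_sv, b_star, y_train,
                            np.asarray(s, dtype=float))
        out["F"] = [f1, f2]

# Binary-coded NSGA-II; the paper uses pop_size = 100, p_c = 0.9, p_mut = 1/n_sv.
algorithm = NSGA2(pop_size=100,
                  sampling=BinaryRandomSampling(),
                  crossover=TwoPointCrossover(),
                  mutation=BitflipMutation(),
                  eliminate_duplicates=True)

problem = SVPruningProblem(K, alpha_star, y_sv, b_star, y_train)
result = minimize(problem, algorithm, ("n_gen", 10000), seed=1, verbose=False)
print(result.F)   # approximate Pareto front: [kept support vectors, training errors]
```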
  11. Pareto Fronts.
      Figure: Pareto fronts for ionosphere, diabetes, sick and german credit,
      plotting classification error against the number of support vectors;
      CE(train) and CE(test) after pruning are compared with CE(train) and
      CE(test) before pruning.
  12. Experimental Results.
      For many solutions for ionosphere and german credit we can see the effect of
      overfitting: the generalization ability of the SVM classifier improved after
      pruning while the training error got worse. A typical MOO heuristic is to
      select a solution (support vector subset) that corresponds to an interesting
      part of the Pareto front.
  13. Attainment Surfaces.
      Figure: 1st, 5th and 10th attainment surfaces over the ten MOGA runs for
      ionosphere, diabetes, sick and german credit, compared with the unpruned SVM.
  14. Experimental Results.
      The attainment curves reach a maximum complexity of 22, 132, 171 and 300
      support vectors for ionosphere, diabetes, sick and german credit,
      respectively. That is, the evolutionary pruning approach achieved a
      complexity reduction of 78.2%, 66.9%, 66% and 63.4% for the four datasets,
      respectively, without sacrificing the training error.
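These percentages follow directly from the unpruned support vector counts on slide 10 (101, 399, 503 and 820); a quick check:

```python
n_sv_before = {"ionosphere": 101, "diabetes": 399, "sick": 503, "german credit": 820}
n_sv_after  = {"ionosphere": 22,  "diabetes": 132, "sick": 171, "german credit": 300}

for name, before in n_sv_before.items():
    reduction = 100.0 * (before - n_sv_after[name]) / before
    print(f"{name}: {reduction:.1f}% fewer support vectors")
# ionosphere: 78.2%, diabetes: 66.9%, sick: 66.0%, german credit: 63.4%
```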
  15. Conclusion.
      Support vector selection is a multi-objective optimization problem. We have
      described a genetic algorithm that reduces the computational complexity of
      support vector machines by reducing the number of support vectors in their
      decision functions. The resulting Pareto fronts visualize the trade-off
      between SVM complexity and training error and can guide the support vector
      selection. For some datasets, the experimental results show that the test set
      classification accuracy improved after pruning without sacrificing the
      training set accuracy. Thus, post-pruning of SVMs achieves the same effect as
      post-pruning of decision trees: it reduces overfitting.
  16. Future Work.
      We plan to extend the proposed approach to regression tasks, which suffer
      from the same problem of a large number of support vectors in the decision
      functions of support vector regression machines. In addition, we will conduct
      further experiments with other types of kernel functions, as only Gaussian
      kernels were used in the presented experiments. We expect the percentage of
      complexity reduction to be kernel-dependent.
  17. Thanks for your attention. Questions?
