Dana Simian, Florin Stoica, Evaluation of a hybrid method for constructing multiple SVM kernels, Recent Advances in Computers, Proceedings of the 13th WSEAS International Conference on Computers, Recent Advances in Computer Engineering Series, WSEAS Press, Rodos, Greece, July 23-25, 2009, ISSN: 1790-5109, ISBN: 978-960-474-099-4, pp. 619-623
Evaluation of a hybrid method for constructing multiple SVM kernels
DANA SIMIAN
Computer Science Department
Faculty of Sciences
University “Lucian Blaga” Sibiu
Str. Dr. Ion Ratiu 5-7, 550012, Sibiu
ROMANIA
d_simian@yahoo.com
FLORIN STOICA
Computer Science Department
Faculty of Sciences
University “Lucian Blaga” Sibiu
Str. Dr. Ion Ratiu 5-7, 550012, Sibiu
ROMANIA
florin.stoica@ulbsibiu.ro
Abstract: - In this paper we evaluate the performance of several multiple SVM kernels obtained using a hybrid
algorithm. The purpose of our algorithm is to optimize the construction of multiple SVM kernels used in
classification tasks. We compare the results obtained using different types of simple kernels and we
characterize the behavior of the multiple kernels with respect to the composition operations +, * and exp and
the simple kernel types. We use several data sets in order to correlate the performance of our algorithm with
the type of the classified data.
Key-Words: - SVM kernel, Genetic algorithms, Optimization
1 Introduction and motivation
The classification task appears in various fields:
- in medicine – classification of clinical data;
- in biology – classification of species, of cells, of genes;
- in chemistry – classification of substances;
- in ecology – classification of ecological systems;
- in bibliomining – classification can be used for finding hidden patterns in data, by deciding to which pre-defined class a record of the data set should be assigned;
- in webmining.
The classification process consists of ordering
given items, derived from a specific problem, into
pre-defined groups or classes, based on their
characteristics and similarity. The task of
classification is to find a rule that, based on external
observations, assigns an object to one of several
classes. Classification predicts categorical labels.
We usually have access to data that have already
been classified (named cases) and we want to build
classification models based on these data. The
assignment has to be consistent with the real data
that we have about the problem.
The data used for learning constitute the
training set. Each instance in the training set
contains one (or more) target values, named class
labels, and several attributes, named features.
The model obtained is evaluated using a different set
of data – the test data set (unseen data).
Two types of approaches for classification can be
defined: classical statistical approaches
(discriminant analysis, generalized linear models)
and modern statistical machine learning (neural
networks, evolutionary algorithms, support vector
machines, belief networks, classification trees,
Gaussian processes).
In machine learning, classification is defined as
supervised learning.
Large collections of heterogeneous data are now
available in electronic format. We can use them for
building classifiers for different particular problems.
The difficulty is that a very good classifier for a
particular class of problems might work very badly
in other cases. It is necessary to train our classifier
on specific sets of data.
We consider in this article only Support Vector
Machines (SVM), which is one of the most popular
approaches in the machine learning literature.
Support Vector Machines represent a class of neural
networks, introduced by Vapnik ([14]), which can
be used for solving the problem of binary or
multiclass classification.
The goal of SVM is to produce a model which
predicts the target values of data instances in the
testing set. SVM models are obtained by convex
optimization and are able to learn and generalize in
high-dimensional input spaces. A hyperplane with
the maximum margin is used for discriminating one
class from another. If the data set is separable we
obtain an optimal separating hyperplane with a
maximal margin. In the case of non-separable data,
using an appropriate kernel, the data are projected
into a higher-dimensional space in which they are
separable by a hyperplane. The kernel method is a
very powerful one. Kernel functions can be
interpreted as representing the inner product of data
objects implicitly mapped into a nonlinear feature
space. The "kernel trick" is to calculate the inner
product in the feature space without knowing the
mapping function explicitly. Usually, the following
simple kernels are used:
- Polynomial:
  K_POL(x1, x2) = (x1 · x2 + r)^d, where d, r ∈ Z+
- RBF:
  K_RBF(x1, x2) = exp(-‖x1 - x2‖² / (2γ²))
- Sigmoidal:
  K_SIG(x1, x2) = tanh(γ · (x1 · x2) + 1)
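As a minimal illustration, the three simple kernels above can be computed directly; this is plain Java independent of the libsvm code used later, and the method names are ours:

```java
// Direct computation of the three simple kernels (illustrative sketch).
public class Kernels {
    // dot product of two vectors of equal length
    static double dot(double[] x1, double[] x2) {
        double s = 0.0;
        for (int i = 0; i < x1.length; i++) s += x1[i] * x2[i];
        return s;
    }
    // Polynomial: K_POL(x1, x2) = (x1 . x2 + r)^d
    static double poly(double[] x1, double[] x2, int d, int r) {
        return Math.pow(dot(x1, x2) + r, d);
    }
    // RBF: K_RBF(x1, x2) = exp(-||x1 - x2||^2 / (2 * gamma^2))
    static double rbf(double[] x1, double[] x2, double gamma) {
        double sq = 0.0;
        for (int i = 0; i < x1.length; i++) {
            double diff = x1[i] - x2[i];
            sq += diff * diff;
        }
        return Math.exp(-sq / (2.0 * gamma * gamma));
    }
    // Sigmoidal: K_SIG(x1, x2) = tanh(gamma * (x1 . x2) + 1)
    static double sig(double[] x1, double[] x2, double gamma) {
        return Math.tanh(gamma * dot(x1, x2) + 1.0);
    }
    public static void main(String[] args) {
        double[] a = {1.0, 2.0}, b = {3.0, 4.0};
        System.out.println(poly(a, b, 2, 1)); // (11 + 1)^2 = 144.0
        System.out.println(rbf(a, a, 0.5));   // identical points -> 1.0
    }
}
```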
The parameters of these kernels are tuned by hand,
and different methods can be used for evaluating the
performance of the kernels on particular data sets.
For example, cross-validation estimates the
generalization error of a given model; it can also be
used for model selection, by choosing among several
models the one with the smallest estimated
generalization error.
Usually the choice of the kernel is made
empirically; standard SVM classifiers use a single
kernel, but real problems require more complex
kernels. Recent papers proved that multiple kernels
give better results than single ones. In the design
process of a multiple kernel, only the operations
+, * and exp can be used. These operations preserve
the properties of a kernel function, which derive
from Mercer's conditions.
It is very important to build multiple kernels adapted
to the input data. The problem of choosing an
optimal kernel function and optimal values for its
parameters is known as the model selection problem
([6]).
Recent developments are oriented toward finding
complex kernels and studying their behavior on
different problems ([1], [3] - [6], [8], [9], [11], [12]).
One possibility is to use a linear combination of
simple kernels and to optimize the weights ([1]). For
optimizing the weights, two different kinds of
approaches can be found: one reduces the problem
to a convex optimization problem, the other uses
evolutionary methods. Complex nonlinear multiple
kernels were proposed in [4] - [6], [11], [12], where
hybrid approaches combining a genetic algorithm
and an SVM algorithm are proposed.
The aim of this paper is to construct and analyze
multiple SVM kernels, based on a genetic algorithm
which uses a new co-mutation operator and to
evaluate the classifiers obtained in this way.
2 Main results
2.1 The model of our multiple kernel
We design a multiple kernel using a genetic
algorithm and a SVM algorithm. We use the idea
proposed in [6]. Every chromosome codes the
expression of a multiple kernel. The quality of a
chromosome is represented by the classification
accuracy (the number of correctly classified items
over the total number of items) using the multiple
kernel coded in this chromosome and it is obtained
by running the SVM algorithm. The hybrid technique
is structured in two levels: a macro level and a micro
level. The macro level is represented by the genetic
algorithm, which builds the multiple kernels. The
micro level is represented by the SVM algorithm,
which computes the quality of the chromosomes. The
accuracy rate is computed by the SVM algorithm on
a validation set of data.
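The quality measure used above (the number of correctly classified items over the total number of items) amounts to a simple accuracy computation; a minimal sketch, with illustrative names:

```java
// Fitness of a chromosome = classification accuracy on the validation set
// (illustrative sketch; not the paper's actual code).
public class Fitness {
    // fraction of positions where predicted and actual labels agree
    static double accuracy(int[] predicted, int[] actual) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++)
            if (predicted[i] == actual[i]) correct++;
        return (double) correct / actual.length;
    }
    public static void main(String[] args) {
        // 2 of 4 labels match
        System.out.println(accuracy(new int[]{1, 0, 1, 1}, new int[]{1, 1, 1, 0})); // 0.5
    }
}
```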
At the first level, we build and evaluate multiple
kernels using the set of operations
op_i ∈ {+, *, exp}, i = 1, 2, 3.
We use a genetic algorithm based on a modified co-mutation
operator, LR-Mijn, introduced by us in [13].
The general form of the multiple kernel is
(K1 op2 K2) op1 (K3 op3 K4),
and its representation is given in figure 1.

Fig. 1 Tree representation of the multiple kernel
(op1 at the root, op2 and op3 as internal nodes,
simple kernels K1-K4 as leaves)
We consider in our construction at most 4
simple kernels. If a node contains the operation exp,
only its "left" kernel descendant is considered. We
use all three types of simple kernels presented in
section 1. A polynomial kernel depends on 2 integer
parameters: the degree d_i and the coefficient r_i.
The RBF and sigmoidal kernels depend on one real
parameter γ_i.
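The evaluation of this kernel tree at a pair of points can be sketched as follows; the operation codes and method names are our assumptions, not the paper's code. Note how exp, being unary, uses only its left operand:

```java
// Sketch of evaluating (k1 op2 k2) op1 (k3 op3 k4), given precomputed
// simple-kernel values k1..k4 and operation codes (names are illustrative).
public class MultipleKernel {
    static final int ADD = 0, MUL = 1, EXP = 2;

    // Apply one operation node; for EXP only the "left" operand is used.
    static double apply(int op, double left, double right) {
        switch (op) {
            case ADD: return left + right;
            case MUL: return left * right;
            case EXP: return Math.exp(left); // right operand ignored
            default:  throw new IllegalArgumentException("unknown op " + op);
        }
    }

    // (k1 op2 k2) op1 (k3 op3 k4)
    static double evaluate(int op1, int op2, int op3,
                           double k1, double k2, double k3, double k4) {
        return apply(op1, apply(op2, k1, k2), apply(op3, k3, k4));
    }

    public static void main(String[] args) {
        // (2 + 3) * exp(4) -- op3 = EXP, so k4 is ignored
        System.out.println(evaluate(MUL, ADD, EXP, 2.0, 3.0, 4.0, 0.0));
    }
}
```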
Our chromosome is composed of 78 genes: 2
genes for each operation and, for each simple
kernel, 2 genes for the kernel's type, 4 genes for the
degree parameter d_i and 12 genes for r_i. If the
associated kernel is not polynomial, these last 16
genes are used to represent the real value of the
parameter γ_i, in place of d_i and r_i. The structure
of the chromosome which codes the multiple kernel
is:

op_1 | op_2 | op_3 | t_1 | d_1 r_1 (or γ_1) | t_2 | …
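A hedged sketch of decoding such a 78-gene bit string follows; the type codes and the scaling used to map the 16 γ genes to a real value are our assumptions, since the paper does not specify them:

```java
// Sketch of decoding the 78-gene chromosome: 3 operations x 2 genes,
// then 4 kernels x (2 type genes + 16 parameter genes).
public class ChromosomeDecoder {
    // read `len` genes starting at `pos` as an unsigned integer
    static int readInt(int[] genes, int pos, int len) {
        int v = 0;
        for (int i = 0; i < len; i++) v = (v << 1) | genes[pos + i];
        return v;
    }
    public static void main(String[] args) {
        int[] genes = new int[78]; // all-zero chromosome for illustration
        int pos = 0;
        int[] ops = new int[3];
        for (int i = 0; i < 3; i++) { ops[i] = readInt(genes, pos, 2); pos += 2; }
        for (int k = 0; k < 4; k++) {
            int type = readInt(genes, pos, 2); pos += 2;
            if (type == 0) { // polynomial: 4 genes for d, 12 genes for r
                int d = readInt(genes, pos, 4);
                int r = readInt(genes, pos + 4, 12);
                System.out.println("kernel " + k + ": POL d=" + d + " r=" + r);
            } else {         // RBF/sigmoid: all 16 genes code gamma
                double gamma = readInt(genes, pos, 16) / 32768.0; // assumed scaling
                System.out.println("kernel " + k + ": gamma=" + gamma);
            }
            pos += 16;
        }
        System.out.println(pos); // 78: every gene consumed
    }
}
```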
2.2 Operations of the genetic algorithm
Initialization
We randomly initialize the population P(t) with P
elements.
Evaluation
The evaluation of a chromosome is made using the
SVM algorithm on a particular set of data. To do
this we divide the data into two subsets: the training
subset, used for problem modeling, and the test
subset, used for evaluation. The training subset is
further randomly divided into a subset for learning
and a subset for validation. The SVM algorithm uses
the data from the learning subset for training and the
data from the validation subset for computing the
classification accuracy, which is used as the fitness
function for the genetic algorithm.
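The two-stage split described above (training/test, then learning/validation inside the training subset) can be sketched as follows; this is illustrative code, not the paper's implementation, and the split fractions are our assumptions:

```java
// Sketch of the data splits used for fitness evaluation.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DataSplit {
    // Shuffle `data` and split it in two, the first part holding `fraction`.
    static <T> List<List<T>> split(List<T> data, double fraction, Random rnd) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, rnd);
        int cut = (int) Math.round(fraction * shuffled.size());
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }
    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) data.add(i);
        Random rnd = new Random(7);
        // training vs. test, then learning vs. validation (assumed fractions)
        List<List<Integer>> trainTest = split(data, 0.8, rnd);
        List<List<Integer>> learnVal = split(trainTest.get(0), 0.75, rnd);
        System.out.println(trainTest.get(0).size()); // 80
        System.out.println(learnVal.get(1).size());  // 20
    }
}
```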
Co-Mutations
We select randomly one element among the best T%
from P(t). We mutate it using the co-mutation
operator LR-Mijn.
We introduced in [13] the co-mutation operator
named LR-Mijn, which makes long jumps by finding
the longest sequence of σp elements situated to the
left or to the right of position p. In the same paper,
we constructed an evolutionary algorithm based only
on selection and the LR-Mijn operator. We also
verified the effectiveness of our co-mutation
operator on three benchmark problems (the classical
test functions of Rastrigin, Schwefel and
Griewangk). We concluded that LR-Mijn offers
better performance than the co-mutation operator
Mijn introduced by De Falco et al. in [7], and that
our evolutionary algorithm based on LR-Mijn has
better convergence than the corresponding algorithm
based on the Mijn operator.
The co-mutations replace the classical mutations
and cross-over.
Mutations on the operations field
The number of genes which code the operations
used in the construction of the multiple kernel is
significantly smaller than the total number of genes
in the chromosome. To allow the operations to
change more often, we also apply a mutation
restricted to the region of the first 6 genes. We select
randomly one element among the best T1% from
P(t); one operation of the chromosome is randomly
selected and replaced by another operation from the
set of allowed operations.
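This restricted mutation on the operations field can be sketched as follows; operation codes and names are ours, not the paper's:

```java
// Sketch of the mutation restricted to the operations field: one of the
// three operation genes is re-drawn at random from the allowed set.
import java.util.Random;

public class OperationMutation {
    static final int NUM_OPS = 3;   // op1, op2, op3
    static final int OP_VALUES = 3; // codes for +, *, exp

    static int[] mutateOperation(int[] ops, Random rnd) {
        int[] child = ops.clone();
        int which = rnd.nextInt(NUM_OPS);      // which operation node to mutate
        child[which] = rnd.nextInt(OP_VALUES); // new operation code
        return child;
    }
    public static void main(String[] args) {
        int[] ops = {0, 1, 2}; // +, *, exp
        int[] child = mutateOperation(ops, new Random(42));
        System.out.println(java.util.Arrays.toString(child));
    }
}
```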
2.3 SVM algorithm
The SVM algorithm acts as the fitness evaluator of
the genetic algorithm: for each chromosome, it is
trained on the learning subset and its classification
accuracy on the validation subset is returned as the
fitness value, as described in the Evaluation step of
section 2.2.
2.4 Implementation/testing/validation
For the implementation/testing/validation of our
method was used different data set from the page
LIBSVM data sets page ([2]).
In order to replace the default polynomial kernel
from libsvm, we extend the svm_parameter class
with the following attributes:
// our kernel type is "hybrid"
public static final int HYBRID = 5;
// parameters of the simple kernels composing the multiple kernel
public long op[];
public long type[];
public long d[];
public long r[];
public double g[];
The class svm_predict was extended with the
method predict. The Kernel class was modified to
accomplish the kernel substitution: in the k_function
method, the simple kernel computation part was
modified.
In the genetic algorithm, the operations and the
parameters of simple kernels are obtained from a
chromosome, which is then evaluated using the
result of the predict method.
After the end of the genetic algorithm, the best
chromosome gives the multiple kernel which can be
evaluated on the test subset of data. If the accuracy
is acceptable, the model can be used to classify
items for which the class label is unknown. The way
this multiple kernel is constructed ensures that it is
a genuine kernel, that is, it satisfies Mercer's
conditions.
3 Experimental results
For evaluating the performance of our hybrid
method, we used several data sets from the libsvm
library ([2]) and compared the classification
accuracy with that obtained using the standard
method from libsvm (simple kernel).
For each execution, the population size was 35
and the number of generations was 30.
For the "leukemia" data set, the multiple kernels
obtained using the genetic approach improve the
classification accuracy from 67.64% up to 94.12%.
Figure 2 presents results from three runs of our
genetic algorithm based on the modified LR-Mijn
operator.
Fig. 2 Classification accuracy using
multiple kernels for “leukemia” data set
One "optimal" multiple kernel obtained is
(K_SIG(γ) + K_POL(r1, d1)) + (K_POL(r2, d2) * K_POL(r3, d3))
with γ = 1.97, d1 = 3, r1 = 609, d2 = 2, r2 = 3970,
d3 = 1, r3 = 3615.
Another optimal kernel is
(K_RBF(γ) + K_POL(r1, d1)) + (K_POL(r2, d2) * K_POL(r3, d3))
with γ = 0.50, d1 = 3, r1 = 633, d2 = 2, r2 = 3970,
d3 = 1, r3 = 4095.
For the "splice" data set, the multiple kernels
obtained using the genetic approach improve the
classification accuracy from 52.0% up to 72.68%.
Figure 3 presents results from three runs of our
genetic algorithm based on the modified LR-Mijn
operator.
The "optimal" multiple kernel obtained is
(K_POL(r1, d1) + K_SIG(γ1)) * exp(K_RBF(γ2))
with d1 = 0, r1 = 1295, γ1 = 0.0147, γ2 = 1.8598.
Fig. 3 Classification accuracy using
multiple kernels for “splice” data set
For the "vowel" data set, we improve the
classification accuracy from 51.28% (obtained using
the standard libsvm package) up to 62.12%.
Figure 4 presents results from three runs of our
genetic algorithm based on the modified LR-Mijn
operator.
Fig. 4 Classification accuracy using
multiple kernels for "vowel" data set
The optimal multiple kernel obtained is
(K_RBF(γ1) * K_SIG(γ2)) + (K_RBF(γ3) * K_POL(r3, d3))
with γ1 = 0.6637, γ2 = 0.1563, γ3 = 0.0546, d3 = 7,
r3 = 3324.
4 Conclusion
In this paper we presented a hybrid approach for
optimizing multiple SVM kernels. The idea of using
hybrid techniques for optimizing multiple kernels is
not new, but it is very recent, and the way in which
we designed the first level of the method is original.
In the first level, we use a co-mutation operator
introduced by ourselves in [13].
The results obtained by comparing the classification
accuracy of the multiple kernels built with our
method against that of the standard method from
libsvm are promising. We used three data sets. The
results underline that the classification accuracy
depends on the type of data. Further numerical
experiments are required in order to assess the
power of our evolved kernels. Results concerning
the classification accuracy of multiple kernels
obtained using other techniques, on the same data
sets as ours, would be necessary for making a
relevant comparison; this will be a further direction
of our study. Another direction is the modification
of the genetic algorithm at the first level, in order to
improve convergence and obtain in a shorter time a
multiple kernel which assures better classification
accuracy.
References:
[1] Bach F. R., Lanckriet G. R. G., Jordan M. I.,
Multiple kernel learning, conic duality, and the
SMO algorithm, Proceedings of ICML 2004, ACM,
2004.
[2] Chang C-C., Lin C-J., LIBSVM: a library for
support vector machines, 2001. Software available
at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[3] Chapelle O., Vapnik V., Bousquet O., Mukherjee
S, Choosing multiple parameters for support vector
machines, Machine Learning, 46(1/3), 2002, 131 -
159.
[4] Diosan L., Oltean M., Rogozan A., Pecuchet J.
P., Improving svm performance using a linear
combination of kernels, Adaptive and Natural
Computing Algorithms, ICANNGA07, volume 4432
of LNCS, 2007, 218 - 227.
[5] Diosan L., Rogozan A., Pecuchet J. P., Une
approche evolutive pour generer des noyaux
multiples (An evolutionary approach for generating
multiple kernels), portal VODEL,
http://vodel.insa-rouen.fr/publications/rfia, 2008.
[6] Diosan L., Oltean M., Rogozan A., Pecuchet J.
P., Genetically Designed Multiple-Kernels for
Improving the SVM Performance, portal VODEL,
http://vodel.insa-rouen.fr/publications/rfia, 2008.
[7] De Falco I., Iazzetta A., Della Cioppa A.,
Tarantino E., The Effectiveness of Co-mutation in
Evolutionary Algorithms: the Mijn operator,
Research Institute on Parallel Information Systems,
National Research Council of Italy, 200
[8] Nguyen H. N., Ohn S. Y., Choi W. J., Combined
kernel function for support vector machine and
learning method based on evolutionary algorithm,
Neural Information Processing, 11th International
Conference, ICONIP 2004, volume 3316 of LNCS,
Springer, 2004, 1273 - 1278.
[9] Ohn S. Y., Nguyen H. N., Chi S. D.,
Evolutionary parameter estimation algorithm for
combined kernel function in support vector machine,
Content Computing, Advanced Workshop on
Content Computing, AWCC 2004, volume 3309 of
LNCS, Springer, 2004, 481 - 486.
[10] Ohn S. Y., Nguyen H. N., Kim D. S., Park J. S.,
Determining optimal decision model for support
vector machine by genetic algorithm, Computational
and Information Science, First International
Symposium, CIS 2004, volume 3314 of LNCS,
Springer, 2004, 895 - 902.
[11] Simian D., Stoica F., An evolutionary method
for constructing complex SVM kernels, Recent
Advances in Mathematics and Computers in Biology
and Chemistry, Proceedings of the 10th International
Conference on Mathematics and Computers in
Biology and Chemistry, MCBC'09, Prague, Czech
Republic, 2009, pp. 178-184.
[12] Simian D., A Model For a Complex Polynomial
SVM Kernel, Proceedings of the 8th WSEAS Int.
Conf. on Simulation, Modelling and Optimization,
Santander, Spain, 2008, within Mathematics and
Computers in Science and Engineering, pp. 164-170,
ISSN 1790-2769, ISBN 978-960-474-007-9.
[13] Stoica F., Simian D., Simian C., A new
co-mutation genetic operator, Advanced topics on
evolutionary computing, Proceedings of the 9th
Conference on Evolutionary Computing, Sofia, May
2008, pp. 76-82.
[14] Vapnik V., The Nature of Statistical Learning
Theory, Springer Verlag, 1995.
LIBSVM data sets:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets