Dana Simian, Florin Stoica, Corina Simian, Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Behaviour, Lecture Notes in Computer Science, LNCS 5910 (2010), I. Lirkov, S. Margenov, and J. Wasniewski (Eds.), Springer-Verlag Berlin Heidelberg, pp. 361-368
Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Behaviour
Dana Simian, Florin Stoica, and Corina Simian
University “Lucian Blaga” of Sibiu, Faculty of Sciences
5-7 Dr. I. Rațiu Str., 550012 Sibiu, România
Abstract. The aim of this paper is to present a new method for optimization
of SVM multiple kernels. The kernel substitution can be used to define many
other types of learning machines distinct from SVMs. We introduce a new
hybrid method which uses, in the first level, an evolutionary algorithm based
on wasp behaviour and on the co-mutation operator LR−Mijn and, in the second
level, an SVM algorithm which computes the quality of chromosomes. The most
important details of our algorithms are presented. Testing and validation show
that multiple kernels obtained using our genetic approach improve the
classification accuracy up to 94.12% for the “leukemia” data set.
1 Introduction
The classification task is to assign an object to one or several classes, based on a
set of attributes. A classification task supposes the existence of training and testing
data given in the form of data instances. Each instance in the training set contains
one target value, named the class label, and several attributes, named features.
The accuracy of the model for a specific test set is defined as the percentage
of test set items that are correctly classified by the model. If the accuracy is
acceptable, the model can be used to classify items for which the class label
is unknown. Two types of approaches for classification can be defined: classical
statistical approaches (discriminant analysis, generalized linear models) and
modern statistical machine learning (neural networks, evolutionary algorithms,
support vector machines (SVM), belief networks, classification trees, Gaussian
processes). In recent years, SVMs have become a very popular tool
for machine learning tasks and have been successfully applied in classification,
regression, and novelty detection. Many applications of SVM have been made
in various fields: particle identification, face identification, text categorization,
bioinformatics, database marketing, classification of clinical data. The goal of
SVM is to produce a model which predicts the target value of data instances in
the testing set, for which only the attributes are given. Training involves the
optimization of a convex cost function. If the data set is separable we obtain an
optimal separating hyperplane with a maximal margin (see Vapnik [12]). In the
case of non-separable data a successful method is the kernel method. Using an
appropriate kernel, the data are projected into a space of higher dimension in
which they are separable by a hyperplane [2,12].
Standard SVM classifiers use a single kernel. Many recent studies have shown
that multiple kernels work better than single ones. It is very important to build
multiple kernels adapted to the input data. Two different approaches are used
for optimizing the weights. One possibility is to use a linear combination
of simple kernels and to optimize the weights [1,3,7]. Another uses evolutionary
methods [8] for optimizing the weights. Complex nonlinear multiple kernels were
proposed in [3,4,5,6,10,11], where hybrid approaches using a genetic algorithm
and an SVM algorithm are proposed.
The aim of this paper is to introduce a hybrid method for kernel optimization
based on a genetic algorithm which uses an improved co-mutation operator
built on a modified wasp behaviour computational model.
The paper is organized as follows: section 2 briefly presents the kernel substitution
method. In sections 3 and 4 we characterize the co-mutation operator
LR−Mijn and the wasp behaviour computational method. Section 5 contains
our main results: the construction and evaluation of our method. Conclusions and
further directions of study are presented in section 6.
2 Kernel Substitution Method
The SVM algorithm can solve the problem of binary or multiclass classification.
In this paper we focus only on binary classification, due to the numerous methods
for generalizing a binary classifier to an n-class classifier [12].
Let there be given the data points xi ∈ R^d, i = 1, ..., m, and their labels
yi ∈ {−1, 1}. We are looking for a decision function f which associates to each
input data x its correct label y = f(x). For data sets which are not linearly
separable we use the kernel method, which makes a projection of the input data
X into a feature Hilbert space F,

φ : X → F; x → φ(x).

The kernel represents the inner product of data objects implicitly mapped into a
higher dimensional Hilbert space of features. Kernel functions can be interpreted
as representing the inner product of the nonlinear feature space,

K(xi, xj) = ⟨φ(xi), φ(xj)⟩.

The functional form of the mapping φ(xi) does not need to be known.
The basic properties of a kernel function are derived from Mercer's theorem.
Different simple kernels are often used: linear, polynomial, RBF, sigmoidal.
These kernels will be defined in subsection 5.1. Multiple kernels can be designed
using the set of operations (+, ∗, exp), which preserve Mercer's conditions.
3 The Co-mutation LR −Mijn Operator
We introduced in [11] a new co-mutation operator, named LR−Mijn, which
makes long jumps by finding the longest sequence of σp elements situated to the
left or to the right of the position p. Let σ denote a generic string,
σ = σ_{l−1} ... σ_0, where σ_q ∈ A, ∀q ∈ {0, ..., l−1}, A = {a_1, ..., a_s}
being a generic alphabet. The set of all sequences of length l over the alphabet
A is denoted by Σ = A^l. Through σ(q, i) we denote that on position q within
the sequence σ there is the symbol a_i of the alphabet A. σ^z_{p,j} denotes the
presence of z symbols a_j within the sequence σ, starting from the position p
and going left, and σ^{right,n}_{left,m}(p, i) specifies the presence of the
symbol a_i on position p within the sequence σ, between right symbols a_n on
the right and left symbols a_m on the left. We suppose that
σ = σ(l−1) ... σ(p+left+1, m) σ^{right,n}_{left,m}(p, i) σ(p−right−1, n) ... σ(0).
The expression of LR−Mijn depends on the position p. There are 4 different
cases: p ≠ right and p ≠ l−left−1; p = right and p ≠ l−left−1;
p ≠ right and p = l−left−1; and p = right and p = l−left−1. The expressions of
LR−Mijn in all these cases can be found in [11].
As an example, let us consider the binary case, the string σ = 11110000
and the randomly chosen application point p = 2. In this case, σ_2 = 0, so we
have to find the longest sequence of 0s within string σ, starting from position p.
This sequence goes to the right, and because we have reached the end of the
string without meeting any occurrence of 1, the new string obtained after the
application of LR−Mijn is 11110111.
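The boundary case just described can be sketched in code. This is a hypothetical illustration (the class and method names are ours, and only this single case of the operator is covered; the full definition with all four cases is given in [11]):

```java
// Sketch of one boundary case of LR-Mijn: when the longest run of
// sigma_p reaches the end of the string without meeting the opposite
// symbol, sigma_p and the whole run are flipped.
public class LrMijnSketch {

    // Positions are counted from the right: sigma = sigma_{l-1} ... sigma_0.
    public static String boundaryCase(String sigma, int p) {
        int l = sigma.length();
        int idx = l - 1 - p;                 // array index of position p
        char c = sigma.charAt(idx);
        char flipped = (c == '0') ? '1' : '0';
        StringBuilder sb = new StringBuilder(sigma);
        // scan right (toward position 0) while the run of c continues
        int j = idx;
        while (j < l && sigma.charAt(j) == c) {
            sb.setCharAt(j, flipped);
            j++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // the worked example from the text: 11110000, p = 2 -> 11110111
        System.out.println(boundaryCase("11110000", 2));
    }
}
```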
The most interesting point of the co-mutation operator presented above is
the fact that it allows long jumps, which are not possible in the classical Bit-Flip
Mutation (BFM); thus the search can reach points very far from where the
search currently is. Our co-mutation operator has the same local search
capabilities as ordinary mutation, and it also has the possibility of performing
jumps to reach far regions of the search space which cannot be reached by BFM.
This leads to a better convergence of an evolutionary algorithm based on
LR−Mijn in comparison with classical mutation operators. This statement
was verified through experimental results in [11], where a simple evolutionary
algorithm based only on selection and LR−Mijn was used for the optimization of
some well-known benchmark functions (functions proposed by Rastrigin, Schwefel
and Griewangk).
4 Wasp Behaviour Computational Models
In a real colony of wasps, an individual wasp interacts with its local environment
in the form of a stimulus-response mechanism, which governs distributed task
allocation. An individual wasp has a response threshold for each zone of the nest.
Based on a wasp’s threshold for a given zone and the amount of stimulus from
brood located in this zone, a wasp may or may not become engaged in the task of
foraging for this zone. A hierarchical social order among the wasps of the colony
is formed through interactions among individual wasps of the colony. When two
individuals of the colony encounter each other, they may with some probability
interact in a dominance contest. The wasp with the higher social rank will have
a higher probability of dominating in the interaction. Computational analogies
of these systems have served as inspiration for wasp behaviour computational
models. An algorithm based on wasp behaviour is essentially a system based on
agents that simulate the natural behaviour of insects. An artificial wasp
probabilistically decides whether or not it bids for a task. The probability
depends on the level of the threshold and the stimulus. A hierarchical order
among the artificial wasps is given using a probability function. Models using
wasp agents are used for solving large complex problems with a dynamic character
and distributed coordination of resources and manufacturing control tasks in a
factory. The elements which particularize an algorithm based on wasp behaviour
are the ways in which the response thresholds are updated and the way in which
the “conflicts” between two or more wasps are solved.
5 Main Results: The Model for Constructing Complex
SVM Kernels
5.1 Basic Construction
Our goal is to build and analyze a multiple kernel starting from the following
simple kernels:
- Polynomial: K_pol^{d,r}(x1, x2) = (x1 · x2 + r)^d, r, d ∈ Z+
- RBF: K_RBF^γ(x1, x2) = exp(−|x1 − x2|² / (2γ²))
- Sigmoidal: K_sig^γ(x1, x2) = tanh(γ · x1 · x2 + 1)
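As an illustration, the three simple kernels can be written as follows (a hypothetical sketch; the class and method names are ours):

```java
// The three simple kernels used as building blocks for multiple kernels.
public class SimpleKernels {

    // polynomial: (x1 . x2 + r)^d
    public static double polynomial(double[] x1, double[] x2, int d, int r) {
        return Math.pow(dot(x1, x2) + r, d);
    }

    // RBF: exp(-|x1 - x2|^2 / (2 gamma^2))
    public static double rbf(double[] x1, double[] x2, double gamma) {
        double s = 0.0;
        for (int i = 0; i < x1.length; i++) {
            double diff = x1[i] - x2[i];
            s += diff * diff;
        }
        return Math.exp(-s / (2.0 * gamma * gamma));
    }

    // sigmoidal: tanh(gamma * x1 . x2 + 1)
    public static double sigmoidal(double[] x1, double[] x2, double gamma) {
        return Math.tanh(gamma * dot(x1, x2) + 1.0);
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        System.out.println(rbf(x, x, 0.5)); // identical points give 1.0
    }
}
```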
We use the idea of the model proposed in [5]. The hybrid techniques are structured
in two levels: a macro level and a micro level. The macro level is represented
by a genetic algorithm which builds the multiple kernel. The micro level is
represented by the SVM algorithm which computes the quality of the chromosomes.
The accuracy rate is computed by the SVM algorithm on a validation set of
data. In the first level, we build and evaluate multiple kernels using the set
of operations op_i ∈ {+, ∗, exp}, i = 1, ..., 3, and a genetic algorithm based on a
modified LR−Mijn operator using a wasp-based computational scheme. In
fig. 1 the multiple kernel (K1 op2 K2) op1 (K3 op3 K4) is represented.
If a node contains the operation exp, only one of its descendants is considered
(the “left” kernel). We consider in our construction at most 4 simple kernels.
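The evaluation of the tree (K1 op2 K2) op1 (K3 op3 K4) on already-computed simple-kernel values, including the rule that exp keeps only its left descendant, can be sketched as follows (a hypothetical illustration; names and operation codes are ours):

```java
// Evaluating the multiple-kernel tree (K1 op2 K2) op1 (K3 op3 K4)
// on precomputed simple-kernel values k1..k4.
public class MultipleKernelTree {

    public static final int ADD = 0, MUL = 1, EXP = 2;

    static double apply(int op, double left, double right) {
        switch (op) {
            case ADD: return left + right;
            case MUL: return left * right;
            case EXP: return Math.exp(left); // right descendant is ignored
            default:  throw new IllegalArgumentException("unknown op");
        }
    }

    public static double evaluate(int op1, int op2, int op3,
                                  double k1, double k2, double k3, double k4) {
        double leftSubtree  = apply(op2, k1, k2);
        double rightSubtree = apply(op3, k3, k4);
        return apply(op1, leftSubtree, rightSubtree);
    }

    public static void main(String[] args) {
        // (k1 + k2) * (k3 + k4) with k1..k4 = 1,2,3,4
        System.out.println(evaluate(MUL, ADD, ADD, 1, 2, 3, 4));
    }
}
```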
Every chromosome codes the expression of a multiple kernel.

Fig. 1. General representation of multiple kernel

Fig. 2. Chromosome's structure

The chromosome which codes the multiple kernel described above has the
structure given in fig. 2, where op_i ∈ {+, ∗, exp}, i ∈ {1, 2, 3},
t_i ∈ {POL, RBF, SIG} represents the type of the kernel, and d_i, r_i, γ_i are
parameters from the definition of the kernel K_i, i ∈ {1, ..., 4}. Each operation
op_i is represented using two genes; the type of kernel t_i is also represented
using two genes; for a degree d_i, 4 genes are allocated, and the parameter r_i
is represented using 12 genes. If the associated kernel is not polynomial, these
16 genes are used to represent a real value of the parameter γ_i in place of d_i
and r_i. Thus, our chromosome is composed of 78 genes.
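The gene layout described above can be checked with a small sketch (hypothetical names; the binary decoding of a gene slice is our assumption about how parameter values are read out of the chromosome):

```java
// The 78-gene chromosome layout: 3 operations x 2 genes, plus, for each
// of the 4 simple kernels, 2 genes for the type and 16 genes for the
// parameters (4 for d and 12 for r, or all 16 for gamma).
public class ChromosomeLayout {

    public static final int OP_GENES = 2;
    public static final int TYPE_GENES = 2;
    public static final int DEGREE_GENES = 4;
    public static final int R_GENES = 12;

    public static int totalGenes() {
        int perKernel = TYPE_GENES + DEGREE_GENES + R_GENES; // 18 genes
        return 3 * OP_GENES + 4 * perKernel;                 // 6 + 72 = 78
    }

    // decode an unsigned integer from a slice of binary genes
    public static int decode(int[] genes, int start, int length) {
        int value = 0;
        for (int i = 0; i < length; i++) {
            value = (value << 1) | genes[start + i];
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(totalGenes()); // 78
    }
}
```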
5.2 Optimization of the LR −Mijn Operator Using a Wasp-Based
Computational Scheme
In this subsection we present our approach to the optimization of the LR−Mijn
operator using a scheme based on a computational model of wasp behaviour.
Our intention was to allow a faster change of the operations in the
multiple kernel's structure. The representation length for one operation inside
the chromosome's structure is 2 genes, significantly less than the representation
length for the multiple kernel's parameters (4 and 12 genes). Therefore, in the
optimization process, the probability of changing the multiple kernel's
parameters is usually bigger than the probability of changing the operations of
the multiple kernel.
Each chromosome C has an associated wasp. Each wasp has a response threshold
θ_C. The set of operations coded within the chromosome broadcasts a stimulus
S_C which is equal to the difference between the maximum classification accuracy
(100) and the actual classification accuracy CA_C obtained using the multiple
kernel coded in the chromosome:

S_C = 100 − CA_C (1)

The modified LR−Mijn operator will perform a mutation that changes the
operations coded within the chromosome with probability:

P(θ_C, S_C) = S_C² / (S_C² + θ_C²) (2)
The threshold values θ_C may vary in the range [θ_min, θ_max]. When the
population of chromosomes is evaluated, the threshold θ_C is updated as follows:

θ_C = θ_C − δ, δ > 0, (3)

if the classification accuracy of the new chromosome C is lower than in the
previous step, and

θ_C = θ_C + δ, δ > 0, (4)

if the classification accuracy of the new chromosome C is greater than in the
previous step.
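Equations (1)-(4) can be sketched as follows. This is a minimal illustration with hypothetical names; clamping θ_C to [θ_min, θ_max] is our assumption about how the range constraint is enforced:

```java
// The wasp scheme of equations (1)-(4): stimulus from classification
// accuracy, bidding probability P = S^2 / (S^2 + theta^2), and a
// threshold update by +/- delta kept inside [thetaMin, thetaMax].
public class WaspScheme {

    public static double stimulus(double accuracy) {
        return 100.0 - accuracy;                  // eq. (1)
    }

    public static double probability(double theta, double s) {
        return (s * s) / (s * s + theta * theta); // eq. (2)
    }

    // accuracy improved -> raise the threshold (eq. 4), else lower it (eq. 3)
    public static double updateThreshold(double theta, double delta,
                                         boolean improved,
                                         double thetaMin, double thetaMax) {
        double t = improved ? theta + delta : theta - delta;
        return Math.max(thetaMin, Math.min(thetaMax, t));
    }

    public static void main(String[] args) {
        double s = stimulus(80.0);                // S = 20
        System.out.println(probability(20.0, s)); // equal S and theta -> 0.5
    }
}
```

Note how a high stimulus (poor accuracy) or a low threshold pushes the mutation probability toward 1, which is exactly what makes the operations change faster for badly performing chromosomes.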
5.3 SVM Algorithm
The evaluation of the chromosome is made using the SVM algorithm for a
particular set of data. To do this we divide the data into two subsets: the
training subset, used for problem modeling, and the test subset, used for
evaluation. The training subset is itself randomly divided into a subset for
learning and a subset for validation. The SVM algorithm uses the data from the
learning subset for training and the data from the validation subset for
computing the classification accuracy, which is used as the fitness function for
the genetic algorithm. For the implementation, testing and validation of our
method we used the “leukemia” data set from the LIBSVM data sets page [2].
In order to replace the default polynomial kernel from libsvm, we extend the
svm_parameter class with the following attributes:

// our kernel is "hybrid"
public static final int HYBRID = 5;
// parameters for multiple polynomial kernels
public long op[];
public long type[];
public long d[];
public long r[];
public double g[];
The class svm_predict was extended with the method predict(long op[], long
type[], long d[], long r[], double g[]). The Kernel class was modified to
accomplish the kernel substitution. In the k_function method we have the
computation of our simple kernels. Then, the kernels are combined using the
operations given in the array param.op[]. In the genetic algorithm, the
operations, the types of the simple kernels, and all other parameters are
obtained from a chromosome, which is then evaluated using the result of the
predict method presented above. At the end of the genetic algorithm, the best
chromosome gives the multiple kernel, which can be evaluated on the test subset
of data. The way this multiple kernel is constructed ensures that it is a
genuine kernel, that is, it satisfies Mercer's conditions.
5.4 Experimental Results
Using the standard libsvm package, the following classification accuracy is
obtained for the “leukemia” data set:

java -classpath libsvm.jar svm_predict leu.t leu.model leu.output
Accuracy = 67.64705882352942% (23/34) (classification)

Multiple kernels obtained using our genetic approach improve the classification
accuracy up to 94.12%. In fig. 3, results from three runs of our genetic
algorithm based on the modified LR−Mijn operator are presented. For each
execution, the dimension of the population was 35 and the number of generations
was 30.
Fig. 3. Classification accuracy using multiple kernels
Fig. 4. Optimal kernels
One “optimal” multiple kernel obtained is depicted in fig. 4 (left), where
γ = 1.97, d1 = 3, r1 = 609, d2 = 2, r2 = 3970, d3 = 1, r3 = 3615.
Another “optimal” multiple kernel obtained is depicted in fig. 4 (right), where
γ = 0.5, d1 = 3, r1 = 633, d2 = 2, r2 = 3970, d3 = 1, r3 = 4095.
6 Conclusions and Further Directions of Study
In this paper we presented a hybrid approach for the optimization of SVM
multiple kernels. The idea of using hybrid techniques for optimizing multiple
kernels is not new, but it is very recent, and the way in which we designed the
first level of the method is original. Modifying the co-mutation operator
LR−Mijn using a wasp behaviour model improves the probability of choosing new
operations in the structure of multiple kernels. We observed that the parameter
r of the simple polynomial kernels which appear in the optimal multiple kernel
is large. After many executions we also observed that a better accuracy and
convergence are obtained if we impose an upper limit for the parameter γ of the
sigmoidal/RBF simple kernels. Experimental results show that the genetic
algorithm based on the modified LR−Mijn co-mutation operator has a better
convergence and improves the accuracy compared with the classical genetic
algorithm used in [2]. Further numerical experiments are required in order to
assess the power of our evolved kernels.
References
1. Bach, F.R., Lanckriet, G.R.G., Jordan, M.I.: Multiple kernel learning, conic duality,
and the SMO algorithm. In: Proceedings of ICML 2004. ACM International
Conference Proceeding Series 69. ACM, New York (2004)
2. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines (2001),
http://www.csie.ntu.edu.tw/~cjlin/libsvm
3. Diosan, L., Oltean, M., Rogozan, A., Pecuchet, J.P.: Improving SVM Performance
Using a Linear Combination of Kernels. In: Beliczynski, B., Dzielinski, A.,
Iwanowski, M., Ribeiro, B. (eds.) ICANNGA 2007. LNCS, vol. 4432, pp. 218–227.
Springer, Heidelberg (2007)
4. Diosan, L., Rogozan, A., Pecuchet, J.P.: Une approche evolutive pour generer des
noyaux multiples (An evolutionary approach for generating multiple kernels).
Portal VODEL (2008), http://vodel.insarouen.fr/publications/rfia
5. Diosan, L., Oltean, M., Rogozan, A., Pecuchet, J.P.: Genetically Designed Multiple
Kernels for Improving the SVM Performance. Portal VODEL (2008),
http://vodel.insa-rouen.fr/publications/rfia
6. Nguyen, H.N., Ohn, S.Y., Choi, W.J.: Combined kernel function for support vector
machine and learning method based on evolutionary algorithm. In: Pal, N.R.,
Kasabov, N., Mudi, R.K., Pal, S., Parui, S.K. (eds.) ICONIP 2004. LNCS, vol. 3316,
pp. 1273–1278. Springer, Heidelberg (2004)
7. Sonnenburg, S., Ratsch, G., Schafer, C., Scholkopf, B.: Large scale multiple kernel
learning. Journal of Machine Learning Research 7, 1531–1565 (2006)
8. Stahlbock, R., Lessmann, S., Crone, S.: Genetically constructed kernels for support
vector machines. In: Proc. of German Operations Research, pp. 257–262. Springer,
Heidelberg (2005)
9. Simian, D.: A Model For a Complex Polynomial SVM Kernel. In: Proceedings of
the 8th WSEAS Int. Conf. on Simulation, Modelling and Optimization, Santander,
Spain. Within Mathematics and Computers in Science and Engineering, pp. 164–
170 (2008)
10. Simian, D., Stoica, F.: An evolutionary method for constructing complex SVM
kernels. In: Proceedings of the 10th International Conference on Mathematics and
Computers in Biology and Chemistry. Recent Advances in Mathematics and
Computers in Biology and Chemistry, MCBC 2009, Prague, Czech Republic,
pp. 172–178. WSEAS Press (2009)
11. Stoica, F., Simian, D., Simian, C.: A new co-mutation genetic operator. In:
Advanced Topics on Evolutionary Computing, Proceedings of the 9th Conference
on Evolutionary Computing, Sofia, pp. 76–82 (2008)
12. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)