Ethan Bowen analyzed different kernel functions for support vector machines on a multi-class teacher-performance dataset, testing polynomial, radial basis function (RBF), and custom kernels. A custom multihyperkernel called BowenRBF, which combines Laplacian and exponential RBF kernels, achieved the highest classification accuracy of the kernels tested. However, BowenRBF's performance on other datasets requires more research before it can be called universally better than standard kernels such as the Gaussian RBF.
Multihyperkernel Customization and Analysis on OVA SVMs
Ethan Bowen
Neural Networks
12/4/2011
Multihyperkernel Customization and Analysis on One Versus All Support Vector Machines
Abstract
Kernels map data into a higher-dimensional space in order to make non-linearly separable data separable. With the application of multihyperkernels it is possible to increase correct classification for Support Vector Machines while preserving their structure. This is done by combining kernels to form a new kernel.
Introduction
Support Vector Machines perform binary classification by finding the N-dimensional hyperplane that best separates the data into two distinct classes. SVMs use a function called a kernel to perform this mapping. Every kernel function can be written in the form K(x, w) = <ɸ(x), ɸ(w)>, where ɸ maps the points x and w from the input space into the feature space. Two common kernels are the polynomial kernel and the radial basis function (RBF) kernel. I will cover several more kernels, but primarily these two types.
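As a concrete illustration, these two common kernels can be written directly as functions of the input vectors. This is a minimal sketch in NumPy; the degree, offset, and sigma defaults are illustrative, not values from this project:

```python
import numpy as np

def polynomial_kernel(x, w, degree=2, c=1.0):
    """Polynomial kernel: K(x, w) = (<x, w> + c)^degree.
    Setting c = 0 gives the homogeneous form."""
    return (np.dot(x, w) + c) ** degree

def gaussian_rbf_kernel(x, w, sigma=1.0):
    """Gaussian RBF kernel: K(x, w) = exp(-||x - w||^2 / (2 sigma^2))."""
    r = np.linalg.norm(np.asarray(x) - np.asarray(w))
    return np.exp(-(r ** 2) / (2.0 * sigma ** 2))
```

Both take two input-space vectors and return a scalar similarity; the RBF value is 1 for identical inputs and decays toward 0 as the points move apart.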
The dataset I used is a multiclass categorization problem from the UCI Machine Learning Repository that uses 5 features to classify teachers by their performance. The class labels are low, medium, and high, and the features are English-speaking status, course instructor, course, summer or regular semester, and class size.
Instead of using a multiclass SVM, which essentially requires Y binary-classifying SVMs, where Y is the number of classes, I decided to use a One-Versus-All approach
which classifies each class against the rest. This approach let me show how the choice of kernel affects the percentage of correctly classified labels, and how creating custom kernels can improve classification. My research and knowledge of the subject were obtained from several published papers (cited below) and from the course of this Neural Networks class.
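The One-Versus-All scheme itself is independent of the underlying binary learner: one scorer is trained per class with +1/−1 targets, and prediction takes the argmax over the per-class scores. The sketch below is my own illustration, not the project's code; a least-squares linear scorer stands in for a trained SVM purely to keep the example self-contained.

```python
import numpy as np

class OneVersusAll:
    """One-Versus-All: train one binary scorer per class (class k vs. the rest),
    then predict the class whose scorer gives the highest score."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.weights_ = []
        Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
        for k in self.classes_:
            t = np.where(y == k, 1.0, -1.0)        # +1 for class k, -1 for the rest
            # Least-squares fit stands in here for training a binary SVM.
            w, *_ = np.linalg.lstsq(Xb, t, rcond=None)
            self.weights_.append(w)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        scores = np.column_stack([Xb @ w for w in self.weights_])
        return self.classes_[np.argmax(scores, axis=1)]
```

Swapping the least-squares line for an SVM trained on the same +1/−1 targets recovers the OVA SVM setup used in this project.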
Methods
When considering which kernel method to use, it is best to know the data you are working with in order to choose an optimal kernel function. For instance, a linear kernel function would do more harm than good if you knew your data was non-linearly separable. My approach to choosing a kernel for this dataset was simply to search for the best performer: I tested the kernels mentioned earlier against some new kernels I created. Since no separate testing sample was provided, my testing sample is a subset of the training sample.
For testing the polynomial kernel I created two kernel functions. The first is BowenPoly, a kernel of the form k(xi, x) = K(xi, x)^d, where d is the number of features in the dataset. The second is BowenN1, a kernel of the form k(xi, x) = K(xi, x)^(d+1), with d again the number of features in the dataset.
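A sketch of the two custom polynomial kernels. The base kernel K is an assumption here, since the write-up does not spell it out; an inhomogeneous linear form <xi, x> + c is used for illustration, and the helper names are mine:

```python
import numpy as np

def base_poly(x_i, x, c=1.0):
    """Assumed base kernel K(xi, x) = <xi, x> + c (the paper does not
    specify the underlying K; this linear form is for illustration)."""
    return np.dot(x_i, x) + c

def bowen_poly(x_i, x, d=5):
    """BowenPoly: K(xi, x)^d, with d = number of dataset features (5 here)."""
    return base_poly(x_i, x) ** d

def bowen_n1(x_i, x, d=5):
    """BowenN1: K(xi, x)^(d+1)."""
    return base_poly(x_i, x) ** (d + 1)
```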
For testing the RBF kernel I created BowenRBF, a combination of two RBF kernels. I denote r = ||xi − x||₂ (the Euclidean distance) and ɛ = 1/(2σ), where σ is sigma; α is an N×1 vector of weights, where N is the number of kernels combined in BowenRBF (here N = 2). The weights satisfy the constraint that the sum of αi from 1 to N equals 1. For my testing, α1 = 0.5 and α2 = 0.5, meaning each kernel is weighed at 50% of its original value.
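BowenRBF as described reduces to a convex combination of the two component kernels. A minimal sketch, with σ and α defaults set to the values discussed above (the helper names are mine):

```python
import numpy as np

def laplacian_kernel(x_i, x, sigma=1.0):
    """LAP: exp(-r / sigma), with r the Euclidean distance."""
    r = np.linalg.norm(np.asarray(x_i) - np.asarray(x))
    return np.exp(-r / sigma)

def exponential_kernel(x_i, x, sigma=1.0):
    """EXP: exp(-r / (2 sigma^2))."""
    r = np.linalg.norm(np.asarray(x_i) - np.asarray(x))
    return np.exp(-r / (2.0 * sigma ** 2))

def bowen_rbf(x_i, x, sigma=1.0, alpha=(0.5, 0.5)):
    """BowenRBF: alpha1 * LAP + alpha2 * EXP, with the alphas summing to 1."""
    a1, a2 = alpha
    return a1 * laplacian_kernel(x_i, x, sigma) + a2 * exponential_kernel(x_i, x, sigma)
```

Because the weights sum to 1 and each component kernel lies in (0, 1], BowenRBF also lies in (0, 1] and equals 1 exactly when xi = x.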
Using the Laplacian, Exponential, Multiquadric, and Gaussian RBFs, I swept sigma from 1 to 10 in 0.01 increments to see how each kernel classified. I chose the Laplacian and Exponential kernels because they gave the best results compared to the other kernels, including the Gaussian (4). With the Laplacian kernel (LAP) of the form k(xi, x) = e^(−r/σ) and the Exponential kernel (EXP) of the form k(xi, x) = e^(−r/(2σ²)), I created BowenRBF in the form k(xi, x) = α1·LAP + α2·EXP. I obtained this process from the notion of a multihyperkernel: multiple "kernel on kernel" constructions that implicitly perform kernel optimization inside a family of kernels (such as Gaussian kernels with different sigmas) in the form k(xi, x) = Σ(i=1 to N) αi·Ki(xi, x). In my case sigma is the same for both kernels in BowenRBF.
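The sigma sweep amounts to recomputing the Gram (kernel) matrix for each σ on the grid and scoring a classifier on it. A sketch of the Gram-matrix side, generic over any kernel function (feeding each matrix to a precomputed-kernel SVM, as the experiments would require, is omitted to keep the sketch self-contained):

```python
import numpy as np

def gram_matrix(X, kernel, **kw):
    """Pairwise kernel (Gram) matrix for a dataset X under the given kernel."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j], **kw)
    return K

def sigma_sweep(X, kernel, sigmas):
    """One Gram matrix per sigma; the sweep in the text used
    sigmas = np.arange(1.0, 10.01, 0.01). Each matrix would then be
    scored by training and evaluating an SVM on it."""
    return {s: gram_matrix(X, kernel, sigma=s) for s in sigmas}
```

Each returned matrix is symmetric with ones on the diagonal for the RBF-style kernels above, which is a quick sanity check before handing it to a classifier.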
Once I had the classification results of each existing kernel and each new kernel for every class (low, medium, high), I examined the results to see, first, which kernel gave the better average classification accuracy and, second, whether the new kernels were useful compared to the original kernels.
Results
For linear kernels I found that even the most optimal, k(xi, x) = xiᵀx, classified only about 70% correctly, so I quickly switched to non-linear kernels for testing. (1) shows that for this dataset BowenPoly and BowenN1 did not consistently map better than the polynomial kernel, so I cannot accurately state that my kernels would map better for different testing sets. For the non-linear data I tested against the most commonly used RBF, the Gaussian RBF, of the form k(xi, x) = e^(−(ɛr)²); its classification was 80% correct. (2) shows that for each OVA SVM, BowenRBF shows
improvement over the Gaussian RBF for sigmas from 0.7 to 10, and (3) shows that the averages of the OVA SVMs over the class labels (low, medium, and high) put BowenRBF at a much higher percentage of correct classification than the Gaussian RBF for sigmas from 0.7 to 10. Therefore, I cannot say for this dataset that there is evidence that BowenPoly and BowenN1 will regularly classify at a higher percentage than the homogeneous polynomial kernel, but I can say there is evidence (3) that BowenRBF will regularly classify at a higher percentage than the Gaussian RBF kernel, which justifies using BowenRBF to obtain a high percentage of correct classification on further testing samples.
Discussion
A few classification tests on one dataset do not justify calling BowenRBF a better kernel than the Gaussian; much more research into multihyperkernels is needed before a concrete justification can be given. I found this research very interesting, and I was often learning new ways of using kernels and applying them to specific applications. In this project α was simply chosen as 0.5 for each element, but I learned that there are algorithms for learning these weights as well. Overall this was a very fun project, and I enjoyed the process of discovering a custom kernel that worked better than other known kernels.
References
[1] Andrew Oliver Hatch. Kernel Optimization for Support Vector Machines: Application to Speaker Verification. PhD thesis, EECS Department, University of California, Berkeley, Dec 2006.
[2] C. S. Ong and A. J. Smola. Machine learning using hyperkernels. In Proceedings of the International Conference on Machine Learning, pages 568–575, 2003.
[3] Souza, César R. "Kernel Functions for Machine Learning Applications." 17 Mar. 2010. Web. <http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html>.