2. WHAT IS SVM?
SVM stands for Support Vector Machine.
It is primarily used for binary classification problems.
The SVM approach determines a decision boundary from a given labeled data set.
This decision boundary is then used to classify untagged data.
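To illustrate how a learned boundary classifies untagged data, here is a minimal sketch. It assumes a linear boundary w·x + b = 0 whose weights w and bias b have already been learned; the values below are hypothetical. A new point is labeled by the sign of w·x + b.

```python
def decision_value(w, b, x):
    """Signed score of point x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical boundary x1 + x2 - 3 = 0, assumed already learned by an SVM.
w, b = [1.0, 1.0], -3.0
untagged = [[0.5, 1.0], [2.5, 2.0]]
print([classify(w, b, x) for x in untagged])  # [-1, 1]
```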
3. How is a decision boundary constructed?
The decision boundary is constructed using a subset of the vectors in the given data set.
These vectors, the ones lying closest to the boundary, are known as support vectors.
The boundary is chosen so that the margin between it and the support vectors is as large as possible.
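The margin-maximization idea above is usually written as the following hard-margin optimization problem. The notation is standard and does not come from the slides: training pairs (x_i, y_i) with labels y_i ∈ {−1, +1}, and a boundary w·x + b = 0. Support vectors are the points for which the constraint holds with equality. The margin width is 2/‖w‖, so maximizing the margin is equivalent to minimizing ‖w‖.

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1,\qquad i = 1,\dots,n,
\qquad
\text{margin} = \frac{2}{\lVert\mathbf{w}\rVert}.
```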
4. A simple example
H1 does not separate the two classes.
H2 separates them, but with a small margin.
H3 separates them with the maximum margin.
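The three cases can be reproduced numerically with a small sketch. The toy 2-D data and the three candidate lines below are hypothetical and only mimic H1, H2 and H3. For each line the code computes the geometric margin, i.e. the smallest signed distance y·(w·x + b)/‖w‖ over all points. A negative value means the line does not separate the classes. For this particular data set, H3 happens to be the maximum-margin line.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all points.

    A negative value means at least one point is on the wrong side,
    i.e. the line does not separate the two classes.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Toy 2-D data (hypothetical): class -1 near the origin, class +1 further out.
points = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]
labels = [-1, -1, -1, 1, 1, 1]

candidates = {
    "H1": ([1.0, -1.0], 0.0),   # line x1 = x2: splits each class across both sides
    "H2": ([1.0, 1.0], -1.5),   # separates, but passes close to class -1
    "H3": ([1.0, 1.0], -3.5),   # separates, halfway between the classes
}
for name, (w, b) in candidates.items():
    print(name, round(geometric_margin(w, b, points, labels), 3))
# H1 -0.707
# H2 0.354
# H3 1.768
```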
5. Concept of Kernel and non-linearly separable classes
In the previous example, the two classes were linearly separable.
Sometimes, however, the task is not that simple, and the classes cannot be separated by a linear decision boundary.
In such cases we use kernel functions. A kernel function implicitly maps the data into a higher-dimensional space in which the classes become linearly separable.
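A standard illustration of the kernel idea, not taken from the slides, uses 2-D inputs. The polynomial kernel K(x, z) = (x·z)² equals an ordinary dot product after the explicit map Φ(x) = (x1², √2·x1·x2, x2²) into 3-D. An SVM only needs the values of K, so it never has to compute Φ explicitly. The sketch below checks this identity on one pair of points.

```python
import math


def phi(x):
    """Explicit feature map into 3-D for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without leaving 2-D."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
# Both values agree (up to floating-point rounding).
print(poly_kernel(x, z), dot(phi(x), phi(z)))
```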
6. Example
Here Φ represents the mapping associated with the kernel function used.
The figure shows that Φ maps the data so that it becomes linearly separable.
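The following minimal sketch shows the kind of mapping the figure describes, using hypothetical 1-D data. Class +1 lies between −1 and 1 and class −1 lies outside that interval, so no single threshold on x separates them. After mapping x to Φ(x) = (x, x²), the horizontal line x² = 2 separates the classes linearly. This line was chosen by hand here, not learned by an SVM.

```python
def phi(x):
    """Map a 1-D point onto the parabola (x, x^2)."""
    return (x, x * x)


def classify_mapped(p, threshold=2.0):
    """Linear rule in the mapped space: +1 below the line x2 = threshold."""
    return 1 if p[1] < threshold else -1


# Hypothetical 1-D points with labels; not separable by a threshold on x alone.
data = [(-3.0, -1), (-2.0, -1), (-0.5, 1), (0.0, 1), (0.8, 1), (2.0, -1), (3.0, -1)]
predictions = [classify_mapped(phi(x)) for x, _ in data]
print(predictions == [y for _, y in data])  # True
```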
7. Binarization of multiclass problems
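Since SVM is designed for binary classification, applying it to multi-class problems is more involved: the problem must first be broken down into several binary problems.
It is still used for such problems because it often gives high accuracy.
The library we use already comes equipped with suitable approaches for this.
One such approach is one-against-all. LibSVM's C-SVC itself handles multi-class problems internally with the one-against-one approach.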
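The following minimal sketch shows one-against-all prediction. It assumes that one binary linear scorer per class has already been trained to separate that class from all the others; the weights below are hypothetical. The predicted class is the one whose scorer returns the highest decision value.

```python
def score(w, b, x):
    """Decision value w.x + b of one binary scorer."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def one_vs_rest_predict(scorers, x):
    """Return the class whose 'this class vs. the rest' scorer is highest."""
    return max(scorers, key=lambda label: score(*scorers[label], x))


# Hypothetical per-class (w, b) pairs for three classes in 2-D.
scorers = {
    "A": ([1.0, 0.0], 0.0),
    "B": ([0.0, 1.0], 0.0),
    "C": ([-1.0, -1.0], 0.5),
}
print(one_vs_rest_predict(scorers, [2.0, 0.5]))    # A
print(one_vs_rest_predict(scorers, [-1.0, -1.0]))  # C
```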
8. Application to POS Tagging
POS tagging is done using the SVM approach. We used the Java version of the SVM library LibSVM.
We have 24 classes, i.e. 24 part-of-speech tags.
The first step in the tagging process is to convert the given corpus from SSF (Shakti Standard Format) to the SVM format.
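With 24 POS tags, the binarization schemes discussed earlier differ considerably in how many binary classifiers must be trained. One-against-all trains one classifier per class (k of them). One-against-one trains one classifier per pair of classes (k(k−1)/2 of them). The small calculation below is arithmetic only and is not part of the tagging pipeline.

```python
def binary_classifiers_needed(k):
    """Number of binary SVMs trained for k classes under each scheme."""
    return {"one-against-all": k, "one-against-one": k * (k - 1) // 2}


print(binary_classifiers_needed(24))
# {'one-against-all': 24, 'one-against-one': 276}
```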
9. Format Converters
Sanchay is equipped with the SSF2SVM format converter, which takes the SSF corpus and extracts the necessary features to convert it into the SVM format, consisting of feature vectors.
Each POS tag is assigned a specific integer value, and unknown words are assigned ‘0’.
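The following minimal sketch shows what lines in SVM format can look like. LibSVM's input format is `<label> <index>:<value> ...`, with feature indices in ascending order. Everything else here is a hypothetical assumption made for illustration: the tag and word mappings, the two features (current word and previous word), and the sample sentence. The features actually extracted by SSF2SVM are not described here. Following the slide, tags map to integers and unknown words map to 0.

```python
def encode(item, table):
    """Look up an integer code; anything missing from the table becomes 0."""
    return table.get(item, 0)


def to_svm_line(label, features):
    """Format one example as '<label> <index>:<value> ...' with ascending indices."""
    pairs = " ".join(f"{i}:{v}" for i, v in sorted(features.items()))
    return f"{label} {pairs}"


def sentence_to_svm_lines(sentence, tag_ids, word_ids):
    """Hypothetical feature set: index 1 = current word, index 2 = previous word."""
    lines = []
    for pos, (word, tag) in enumerate(sentence):
        prev_word = sentence[pos - 1][0] if pos > 0 else None
        features = {
            1: encode(word, word_ids),       # unknown words are encoded as 0
            2: encode(prev_word, word_ids),  # no previous word at the start -> 0
        }
        lines.append(to_svm_line(encode(tag, tag_ids), features))
    return lines


# Hypothetical mappings and sentence, for illustration only.
tag_ids = {"NN": 1, "VM": 2, "PSP": 3}
word_ids = {"ghar": 1, "me": 2}
sentence = [("ghar", "NN"), ("me", "PSP"), ("kitab", "NN")]
for line in sentence_to_svm_lines(sentence, tag_ids, word_ids):
    print(line)
# 1 1:1 2:0
# 3 1:2 2:1
# 1 1:0 2:2
```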
10. SVM annotation main
Uses the LibSVM Java library.
Uses svm-train to train on the converted data and generate a model file.
The parameters used for svm-train are as follows:
SVM type is C-SVC (C-support vector classification), used here for multi-class classification.
Kernel type is linear. For this task, the linear kernel gave better results than the other kernel functions.
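As a sketch of how these parameters translate into a training call, the function below builds the command line for LibSVM's Java trainer. In LibSVM, `-s 0` selects C-SVC, `-t 0` selects the linear kernel and `-c` sets the cost C, whose default is 1. The file names, the jar path and the cost value are hypothetical assumptions. The list could be passed to `subprocess.run` where `libsvm.jar` is available.

```python
import shlex


def svm_train_command(train_file, model_file, cost=1.0, jar="libsvm.jar"):
    """Build the LibSVM Java training command for C-SVC with a linear kernel.

    -s 0 selects C-SVC, -t 0 selects the linear kernel, -c sets the cost C.
    """
    return ["java", "-cp", jar, "svm_train",
            "-s", "0", "-t", "0", "-c", str(cost),
            train_file, model_file]


# Hypothetical file names for the converted training data and the output model.
cmd = svm_train_command("magahi.train.svm", "magahi.model")
print(shlex.join(cmd))
# java -cp libsvm.jar svm_train -s 0 -t 0 -c 1.0 magahi.train.svm magahi.model
```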
11. Final Result
Using the SVM approach, Magahi is tagged with an accuracy of 85.45%.
This accuracy is obtained with the same linear kernel and C-SVC multi-class SVM type.
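The slides do not define accuracy. Assuming it is the usual token-level measure, it is the share of test tokens whose predicted tag equals the gold tag. A minimal sketch on toy tags, not the Magahi test data:

```python
def accuracy(gold, predicted):
    """Percentage of tokens whose predicted tag matches the gold tag."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty tag sequences of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Toy tags (hypothetical), not the Magahi test data.
print(accuracy(["NN", "PSP", "VM", "NN"], ["NN", "PSP", "NN", "NN"]))  # 75.0
```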