GPU Parallel Computing of Support Vector Machines as applied to Intrusion Detection System
Nabha Kshirsagar¹, N. Z. Tarapore²
¹ Research Scholar, Vishwakarma Institute of Technology, Pune
² Assistant Professor, Vishwakarma Institute of Technology, Pune
Abstract—Network anomaly detection technology based on the support vector machine (SVM) can efficiently detect unknown attacks or variants of known attacks. However, it cannot be used for detection in large-scale intrusion scenarios because of its computational time requirements. The graphics processing unit (GPU) offers multi-threading and powerful parallel processing capability. Hence, a GPU parallel computing framework is used to accelerate SVM-based classification.
Keywords— support vector machine, parallel computing, graphics
processing unit, intrusion detection system.
I. INTRODUCTION
An intrusion detection system (IDS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Intrusion detection systems have become a standard component in network security infrastructures. An IDS gathers information from various areas within a computer or a network. A survey of the available literature indicates that, using a supervised machine learning approach, this gathered information can be analysed to identify possible security breaches, which include both intrusions (attacks from outside the organization) and misuse (attacks from within the organization). In machine learning terms, intrusion detection can be viewed as a classic classification problem, where a given model predicts whether a given state is an attack or not.
Supervised learning uses the gathered information as training data and produces a model, which is a function that maps a given input to the required output. A classification algorithm is a procedure for selecting, from a set of alternatives, the hypothesis that best fits a set of observations; in this case the alternatives are attack or not attack.
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. The SVM is a very powerful classifier for handling large datasets in high-dimensional spaces, and it rests on a strong mathematical foundation: training reduces to a quadratic optimization problem. However, it has a high computational cost, which leads to long training times on large datasets.
One approach to reducing this training time is parallel computing with a graphics processing unit (GPU). GPUs are now popular for machine learning, and their highly parallel nature has given rise to a new area known as general-purpose computing on GPUs (GPGPU). With a GPU, parallel computing can be achieved at low cost and with low power consumption.
Training an SVM amounts to solving a quadratic optimization problem, and sequential minimal optimization (SMO) is an iterative algorithm for solving this optimization problem. This paper describes a parallel SMO algorithm that solves the SVM problem on a GPU for an IDS.
II. RELATED WORK
Network anomaly detection technology based on SVM can efficiently detect unknown attacks or variants of known attacks; however, it cannot be used for detection in large-scale intrusion scenarios because of its computational time requirements. The GPU has the characteristics of multi-threading and powerful parallel processing capability. Based on the system structure and parallel computation framework of the GPU, Zhang et al. proposed a parallel SVM algorithm named GSVM [2]. Extensive experiments were carried out on KDD99 and other large-scale datasets, and the results showed that GSVM significantly improves the efficiency of intrusion detection while retaining detection performance [2].
SVM is considered one of the most powerful classifiers for hyperspectral remote sensing images, but it has a high computational cost. Tan et al. [1] proposed a novel two-level parallel computing framework to accelerate SVM-based classification using CUDA and OpenMP. For a binary SVM classifier, the kernel function is optimized on the GPU, and a second-order working set selection (WSS) procedure is employed and optimized especially for the GPU to reduce the cost of communication between the GPU and the host. In addition to the parallel binary SVM classifier on the GPU as data-processing-level parallelization, a multiclass SVM is addressed by a "one-against-one" approach in OpenMP, and several binary SVM classifiers are run simultaneously to achieve task-level parallelization [1].
SMO is one popular algorithm for training SVMs, but it still requires a large amount of computation time for solving large-size problems.
Cao et al. [3] proposed a parallel implementation of SMO for training SVM, developed using the message passing interface (MPI). Specifically, the parallel SMO first partitions the entire training data set into smaller subsets and then simultaneously runs multiple CPU processors to deal with each partition. Experiments based on this approach show a great speed-up on different datasets when many processors are used [3].
Training a support vector machine requires the solution of a
very large quadratic programming (QP) optimization problem.
SMO breaks this large QP problem into a series of smallest
possible QP problems. These small QP problems are solved
analytically, which avoids using a time-consuming numerical
QP optimization as an inner loop. The amount of memory
required for SMO is linear in the training set size, which
allows SMO to handle very large training sets. Since matrix
computation is avoided, SMO scales somewhere between
linear and quadratic in the training set size for various test
problems, while the standard chunking SVM algorithm scales
somewhere between linear and cubic in the training set size.
SMO's computation time is dominated by SVM evaluation; hence SMO is fastest for linear SVMs and sparse data sets. On real-world sparse data sets, SMO can be more than 1000 times faster than the chunking algorithm [7].
The scaling of serial algorithms can no longer rely on the improvement of CPUs. The performance of classical SVM implementations has reached its limit, and the arrival of the multi-core era requires these algorithms to adapt to a new parallel scenario. GPUs have arisen as high-performance platforms on which to implement data-parallel algorithms. Lopez et al. [5] describe how a native implementation of a multiclass classifier based on SVMs can map its inherent degrees of parallelism to the GPU programming model and efficiently use its computational throughput. Empirical results show that the training and classification time of the algorithm can be reduced by an order of magnitude compared to a classical multiclass solver, LIBSVM, while guaranteeing the same accuracy [5].
III. METHODOLOGY
The binary SVM training step solves a typical convex optimization problem and is usually addressed by the SMO algorithm. The SMO algorithm is mainly composed of a WSS procedure and an optimality-function updating procedure [1].
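For reference, the convex problem in question is the standard soft-margin SVM dual, stated here for completeness; the constant C and the multipliers α_i are the same quantities that appear in the WSS conditions below:

min_α (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j) − Σ_i α_i
subject to 0 ≤ α_i ≤ C for all i, and Σ_i y_i α_i = 0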
In this procedure [1], the second-order WSS is used to obtain a two-element working set B = {i_low, i_high}. B is selected from the following subsets of training points:

I_high(α) = { i | α_i < C, y_i = 1 or α_i > 0, y_i = −1 }
I_low(α) = { i | α_i < C, y_i = −1 or α_i > 0, y_i = 1 }

where C is the regularization constant. To minimize the optimality function f, the working set B is selected to satisfy the following conditions:

i_high = arg max { −y_i f_i | i ∈ I_high(α) }
i_low = arg min { −b_{i,i_high}² / a_{i,i_high} | i ∈ I_low(α), −y_i f_i < −y_{i_high} f_{i_high} }

where b_{i,i_high} and a_{i,i_high} are defined as

b_{i,i_high} = −y_{i_high} f_{i_high} + y_i f_i > 0
a_{i,i_high} = K(x_{i_high}, x_{i_high}) + K(x_i, x_i) − 2K(x_{i_high}, x_i)

After the working set is selected, the optimality function f is updated. This procedure iterates until the algorithm converges; the terminating condition is defined as

m(α) − M(α) ≤ ε

where ε is a small positive constant, and m(α) and M(α) are defined as

m(α) = max { −y_i f_i | i ∈ I_high(α) }
M(α) = min { −y_i f_i | i ∈ I_low(α) }
Finding the minimum and maximum for the convergence condition can be done using parallel reduction, as shown in Fig. 1.

Fig. 1. Finding the maximum using parallel reduction
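As an illustration, the following is a minimal PyCUDA sketch of the shared-memory tree reduction in Fig. 1. The kernel and function names are ours, not from the implementation measured below, and a power-of-two block size is assumed.

import numpy as np
import pycuda.autoinit                      # creates a CUDA context on import
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void block_max(const float *in, float *out, int n)
{
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;

    /* Each thread loads one element; out-of-range threads load -FLT_MAX. */
    sdata[tid] = (i < n) ? in[i] : -3.402823466e38f;
    __syncthreads();

    /* Tree reduction in shared memory: halve the active threads each step. */
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    /* Thread 0 writes this block's partial maximum. */
    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}
""")
block_max = mod.get_function("block_max")

def gpu_max(x, block_size=512):
    """Reduce a float32 vector to its maximum, one kernel pass per level."""
    d_in = gpuarray.to_gpu(np.asarray(x, dtype=np.float32))
    n = d_in.size
    while n > 1:
        blocks = (n + block_size - 1) // block_size
        d_out = gpuarray.empty(blocks, np.float32)
        block_max(d_in.gpudata, d_out.gpudata, np.int32(n),
                  block=(block_size, 1, 1), grid=(blocks, 1),
                  shared=block_size * 4)
        d_in, n = d_out, blocks
    return float(d_in.get()[0])

The same kernel with fminf in place of fmaxf gives the minimum needed for M(α). Note that this returns only the extreme value, not its index, which is exactly the limitation discussed next.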
Parallel reduction works best for finding a value rather than the index of that value; hence i_low and i_high are calculated on the CPU itself, while some of the kernel computations they require can still be parallelized on the GPU. The overall flow of the algorithm is shown in Fig. 2.
Fig. 2. Flow of the algorithm
The parallel SMO algorithm is as follows:
1) α_i ← 0, f_i ← −y_i, ∀i ∈ {1, 2, . . . , n}, and m(α) ← 0, M(α) ← 0
2) Copy α and f to device memory
3) Copy x to device memory, and transpose this matrix
4) While m(α) − M(α) > ε do
   Working set selection:
     calculate i_low and i_high in the host
     obtain m(α) by reduction in the device
     calculate K(x_{i_low}, x_i) for each sample in the device
     obtain M(α) by reduction in the device
     copy m(α) and M(α) to the host
   Update: check whether m(α) − M(α) > ε in the host;
     if so, update the old values of α and f
   End while
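To make the selection and update steps concrete, here is a sequential NumPy sketch of the same loop. It is a sketch under simplifying assumptions, not the PyCUDA implementation evaluated below: it uses first-order (maximal-violating-pair) WSS instead of the second-order rule, a linear kernel, and the equivalent Keerthi-style formulation in which the optimality vector f_i = Σ_j α_j y_j K(x_i, x_j) − y_i is compared directly (the bias cancels in the differences used here).

import numpy as np

def train_svm_smo(X, y, C=1.0, eps=1e-3, max_iter=1000):
    """Sequential SMO with first-order WSS and a linear kernel (sketch)."""
    n = len(y)
    alpha = np.zeros(n)
    f = -y.astype(np.float64)          # f_i = -y_i when alpha = 0
    K = X @ X.T                        # linear kernel matrix (fits small n only)

    for _ in range(max_iter):
        # Index sets from the WSS conditions above.
        I_high = ((alpha < C) & (y == 1)) | ((alpha > 0) & (y == -1))
        I_low = ((alpha < C) & (y == -1)) | ((alpha > 0) & (y == 1))
        i = np.where(I_high)[0][f[I_high].argmin()]   # maximal violator in I_high
        j = np.where(I_low)[0][f[I_low].argmax()]     # maximal violator in I_low
        if f[j] - f[i] <= 2 * eps:                    # optimality gap closed
            break

        # Analytic two-variable update, clipped to the box [0, C].
        if y[i] != y[j]:
            L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
        else:
            L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
        eta = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        aj = np.clip(alpha[j] + y[j] * (f[i] - f[j]) / eta, L, H)
        ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)

        # Incremental update of f avoids recomputing full decision values.
        f += (ai - alpha[i]) * y[i] * K[:, i] + (aj - alpha[j]) * y[j] * K[:, j]
        alpha[i], alpha[j] = ai, aj
    return alpha

At convergence the bias can be recovered as b = (f[i] + f[j]) / 2.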
IV. RESULTS AND ANALYSIS
A. Dataset and Platform
The dataset used is the KDD Cup 1999 10% subset. This dataset was collected over a period of nine weeks on a LAN simulating a typical U.S. Air Force LAN. It contains 24 types of attacks and 41 features. As this is binary classification, the data is pre-processed and converted into numeric, normalized form, with all attacks labelled as -1 and normal records as +1.
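A hedged sketch of this preprocessing in Python is given below; the file name and the assumption that columns 1-3 (protocol, service, flag) are the only symbolic features follow the usual KDD Cup 1999 layout and are not taken from the paper.

import numpy as np
import pandas as pd

# Load the 10% subset: 41 feature columns plus the label in the last column.
df = pd.read_csv("kddcup.data_10_percent", header=None)

# Integer-encode the three symbolic features so every feature is numeric.
for col in (1, 2, 3):
    df[col] = df[col].astype("category").cat.codes

# Binary labels: 'normal.' records -> +1, every attack type -> -1.
y = np.where(df.iloc[:, -1] == "normal.", 1, -1)

# Min-max normalize each feature to [0, 1] (constant columns left at 0).
X = df.iloc[:, :-1].to_numpy(dtype=np.float64)
span = X.max(axis=0) - X.min(axis=0)
X = (X - X.min(axis=0)) / np.where(span == 0, 1, span)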
The experiment was carried out on a workstation with an Intel Xeon E5 v2 CPU at 2.5 GHz and 8 GB of RAM, running Ubuntu; the GPU used is an NVIDIA Tesla K80.
B. Experimental Results
The sequential SVM is implemented in Python, whereas the parallel SVM is implemented in PyCUDA. For the parallel SVM, the CUDA block size is an important factor in the performance of the algorithm. TABLE I gives the training time of the parallel SVM for a given number of samples with block sizes 128, 256, 512 and 1024; for this experiment the SVM parameter max_iter is set to 50. Fig. 3 plots the data in TABLE I. TABLE II and TABLE III give the training time required by both the sequential and the parallel algorithm. For the observations in TABLE II the SVM parameters take their default values, that is, C = 1.0, epsilon = 0.001 and max_iter = 1000 with a linear kernel; this configuration is called SET I. For the observations in TABLE III, all parameters are the same as in TABLE II except that epsilon is set to 0.01; this configuration is called SET II. SET I and SET II use the same training datasets and differ only in these parameters. Fig. 4 plots the training time for SET I, that is, for all observations in TABLE II, and shows the characteristic behaviour of the sequential and parallel algorithms.
TABLE I
TRAINING TIME (s) FOR DIFFERENT BLOCK SIZES

Number of samples   blocksize=128   blocksize=256   blocksize=512   blocksize=1024
7000                8.32            8.24            8.11            8.67
10000               10.18           10.20           9.88            10.28
20000               21.14           21.33           20.14           22.67
40000               39.89           39.48           38.67           40.92
60000               57.23           57.78           55.21           58.49
80000               80.19           81.85           77.68           85.90
100000              95.72           96.93           93.79           97.38
200000              192.90          207.90          186.85          193.60
Fig. 3. Training time for different block sizes
TABLE II
TRAINING TIME FOR ALGORITHMS, SET I

Number of samples   Sequential SVM (s)   Parallel SVM (s)   Speed-up
10000               77.28                72.04              1.07
20000               101.32               80.37              1.26
40000               220.23               156.14             1.41
60000               478.75               304.68             1.57
80000               629.88               390.15             1.61
100000              998.71               550.66             1.81
200000              1490.47              730.04             2.04
Fig. 4. Training time for algorithms, SET I
TABLE III
TRAINING TIME FOR ALGORITHMS, SET II

Number of samples   Sequential SVM (s)   Parallel SVM (s)   Speed-up
10000               73.48                69.20              1.06
20000               98.22                77.29              1.27
40000               202.33               140.32             1.44
60000               430.12               280.97             1.53
80000               610.35               374.33             1.63
100000              980.18               530.74             1.84
200000              1444.58              690.91             2.09
C. Analysis of Results
From TABLE I and Fig. 3 we can see that the training time for block size 512 is the minimum for every training sample size; we conclude that for the given system configuration a block size of 512 is the most suitable. Even with block size 1024 the training time is higher than for the other block size values. In CUDA the maximum block size is 1024, and at this boundary value the system does not show its best performance, as too many resources are consumed per block; with smaller block sizes the resources are underused. The GPU family, that is, the architecture, also plays an important role in the choice of block size.
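For reference, the device limits that bound this choice can be queried directly. The short PyCUDA sketch below (the attribute names mirror the CUDA driver API) prints the limits relevant to picking a block size.

import pycuda.autoinit
import pycuda.driver as drv

dev = pycuda.autoinit.device
attr = drv.device_attribute
print("Device:", dev.name())
print("Max threads per block:", dev.get_attribute(attr.MAX_THREADS_PER_BLOCK))
print("Warp size:", dev.get_attribute(attr.WARP_SIZE))
print("Multiprocessors:", dev.get_attribute(attr.MULTIPROCESSOR_COUNT))
print("Shared memory per block (bytes):",
      dev.get_attribute(attr.MAX_SHARED_MEMORY_PER_BLOCK))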
The graph in Fig. 4 shows that the training time for the sequential SVM increases rapidly with the number of samples, whereas for the parallel algorithm the training time increases gradually. Up to a training sample size of 10,000 the time required by both algorithms is similar, with the sequential algorithm taking slightly less time than the parallel one. Beyond 10,000 samples the speed-up increases rapidly with the number of training samples, with the parallel algorithm taking less time than the sequential one. The low speed-up for small numbers of training samples is typical behaviour for a parallel algorithm: for few records, the cost of parallelism, that is, transferring data from CPU to GPU and back again, outweighs the benefit of parallel processing.
Comparing the observations in TABLE II and TABLE III, we can see that when the threshold value epsilon is increased, both the sequential and the parallel algorithm converge sooner. Here too, once 10,000 samples are exceeded, the speed-up increases rapidly with the number of training samples, with the parallel algorithm taking less time than the sequential one.
With the dataset divided into 70% training and 30% testing sets, both the sequential and the parallel SVM classifier show 98.99% accuracy. The maximum training size used in this experiment is 200,000 records, while the 10% subset of the KDD Cup dataset alone contains more than 400,000 records. Since the speed-up grows with the number of samples, this approach should be even more beneficial when the whole dataset is used for training.
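A hedged sketch of this evaluation, reusing the train_svm_smo function and the preprocessed X, y from the sketches above (so the names are ours, and the procedure is illustrative rather than the paper's), could look as follows.

import numpy as np
from sklearn.model_selection import train_test_split

# 70/30 split as in the experiment.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

alpha = train_svm_smo(X_tr, y_tr, C=1.0, eps=1e-3, max_iter=1000)

# Linear kernel: the weight vector is a weighted sum of training samples.
w = X_tr.T @ (alpha * y_tr)

# Recover the bias from the free support vectors (0 < alpha < C).
free = (alpha > 1e-8) & (alpha < 1.0 - 1e-8)
b = np.mean(X_tr[free] @ w - y_tr[free]) if free.any() else 0.0

# Classify the test set and report accuracy.
pred = np.where(X_te @ w - b >= 0, 1, -1)
print("accuracy:", np.mean(pred == y_te))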
V. CONCLUSION
SVM is a powerful classifier with a high computational cost, and this cost grows with the sample size. The emergence of GPUs as massively parallel processors has opened a wide range of opportunities for accelerating large-scale classification problems. The data-parallel character of many learning algorithms, SVM among them, fits the set of problems that modern GPUs are designed to solve.
In this paper a parallel SVM using a GPU for intrusion detection is proposed. Using this approach, we conclude that with the given GPU, which delivers a peak throughput of 2.91 teraflops, the maximum speed-up achieved is 2.09, with 98.99% accuracy. This speed-up can be increased further, for a given number of training samples, by using GPUs with higher compute capability.
This approach can be further extended to multiclass classification by exploiting parallelism on both CPUs and GPUs. It may also prove even more beneficial on more complex datasets.
REFERENCES
[1] K. Tan, J. Zhang, Q. Du, X. Wang, "GPU Parallel Implementation of Support Vector Machines for Hyperspectral Image Classification", IEEE, 2015.
[2] X. Zhang, Y. Zhang, C. Gu, "GPU Implementation of Parallel Support Vector Machine Algorithm with Applications to Intruder Detection", Journal of Computers, vol. 9, no. 5, May 2014.
[3] L. J. Cao, S. S. Keerthi, C. Ong, J. Q. Zhang, U. Periyathamby, X. J. Fu, and H. P. Lee, "Parallel Sequential Minimal Optimization for the Training of Support Vector Machines", IEEE Transactions on Neural Networks, vol. 17, no. 4, July 2006.
[4] J. Silva, A. Aguiar, F. Silva, "A Parallel Computing Hybrid Approach for Feature Selection", IEEE, 2015.
[5] S. H. Lopez, J. R. Williams, A. Sanchez, "Parallel Multiclass Classification using SVMs on GPUs", ACM, 2010.
[6] R.-E. Fan, P.-H. Chen, C.-J. Lin, "Working set selection using second order information for training support vector machines", J. Mach. Learning Res., vol. 6, 2005, pp. 1889-1918.
[7] J. C. Platt, "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines", Microsoft Research, Technical Report MSR-TR-98-14, April 21, 1998.
[8] B. C. Catanzaro, N. Sundaram, K. Keutzer, "Fast Support Vector Machine Training and Classification on Graphics Processors", ICML, 2008.
[9] K. Zhu, H. Wang, H. Bai, J. Li, Z. Qiu, H. Cui, E. Y. Chang, "Parallelizing Support Vector Machines on Distributed Computers", Google Research, China.
[10] H. P. Graf, E. Cosatto, L. Bottou, I. Durdanovic, V. Vapnik, "Parallel Support Vector Machines: The Cascade SVM", Advances in Neural Information Processing Systems, vol. 17, MIT Press, 2005.
[11] M. S. Abadeh, J. Habibi, Z. Barzegar, M. Sergi, "A parallel genetic local search algorithm for intrusion detection in computer networks", Engineering Applications of Artificial Intelligence, 2007.
[12] C. Kyrkou, T. Theocharides, "A Parallel Hardware Architecture for Real-Time Object Detection with Support Vector Machines", IEEE Transactions on Computers, vol. 61, no. 6, June 2012.
[13] M. Kruczkowski, E. Niewiadomska-Szynkiewicz, "Support Vector Machine for malware analysis and classification", IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014.