A Stacked Generalization Ensemble Approach for Improved Intrusion Detection

Oluwafemi Oriola
Department of Computer Science
Adekunle Ajasin University
Akungba Akoko, Nigeria
oluwafemi.oriola@aaua.edu.ng
Abstract—Classical machine learning techniques have been
employed extensively in intrusion detection. However, due to the
rising number and sophistication of attacks, more advanced
machine learning techniques, including ensemble-based methods,
neural networks and deep learning techniques, have been applied.
There is still a need for an improved machine learning
approach that detects attacks more effectively and efficiently.
The stacked generalization approach has been shown to be capable
of learning from features and meta-features, but it has been
limited by the deficiencies of base classifiers and a lack of
optimization in the choice of meta-feature combination. This
paper therefore proposes a stacked generalization ensemble
approach based on a two-tier meta-learner, in which the outputs
of a classical stacked ensemble are passed to a
multi-feature-based stacked ensemble, which is optimized using
a grid-search approach. Nine data features and four
meta-features derived from Logistic Regression, Support Vector
Machine, Naïve Bayes, and Multilayer Perceptron neural network
are used for the classification task. By applying neural
networks as the meta-learner for the classification of NSL-KDD
data, improved performance in terms of accuracy, precision,
recall and F-measure of 0.97, 0.98, 0.98 and 0.98, respectively,
is achieved.
Keywords-Intrusion detection system; machine learning;
ensemble method; stacked generalization; two-tier meta-learner
I. INTRODUCTION
The field of Artificial Intelligence, especially machine
learning, has been very beneficial to numerous sectors such as
health care, education, transport and logistics,
pharmaceuticals, finance, energy, manufacturing and public
service [1]. Machine learning refers to an artificial intelligence
technology that allows systems to learn directly from
examples, data, and experience without having to be explicitly
programmed. Machine learning techniques include
supervised, unsupervised and reinforcement learning
techniques [2]. Supervised learning is the most common
technique; it learns from a set of labelled data
and predicts the classes of unlabelled data. In the manufacturing
sector, supervised learning techniques have been applied to
improve the effectiveness of Intrusion Detection Systems [3].
Intrusion Detection Systems (IDS) are used to monitor
network activities and detect incidents of attacks [3]. Various
types of IDS exist, such as signature-based IDS and
anomaly-based IDS. Signature-based IDS rely on a repository of
previous attacks to detect new attacks, while anomaly-based
IDS rely on the normal behaviour of systems and networks to
detect incidents of attacks. There are also hybrid IDS, which
combine the characteristics of signature-based and
anomaly-based IDS. These different types of IDS can operate at
either the host or the network level. Commercial and open
source IDS such as SNORT, Bro, Prelude, Ethereal and
OSSEC are still largely signature-based. In the research
community, however, anomaly-based and hybrid IDS have been
designed using machine learning.
Presently, the capabilities of traditional IDS cannot
match the capacity of network attacks because of the
sophistication of network threats and the availability of
high-end computers and network systems. Thus, several novel
approaches have been devised to improve the capacity of IDS,
among which is ensemble machine learning. An ensemble method is
a way of combining the same or different approaches to solve a
particular problem; weaker learners are combined to form a
stronger learner [4][5]. It usually involves combining the
outputs of classifiers. The major advantage of ensemble
algorithms over hybrid algorithms is their modularity, which
allows a poorly performing algorithm to be replaced with a
better one. The bagging, boosting and stacking methods [6][7]
have been popularly used for the mix-of-experts functions.
Voting selection methods, including majority voting, weighted
voting, rule-based voting, probability voting and average
voting, have also been used. Except for stacking methods, the
ensemble algorithms focus on homogeneous sets of features, which
might not be effective in class-imbalanced contexts such as
intrusion detection.
However, the performances of deep learning algorithms,
which have high running costs, have far surpassed the
performances of classical ensemble approaches including
stacking. Therefore, this paper focuses on intrusion detection
using an improved stacking ensemble machine learning approach,
whose results are better than those of both existing ensemble
and artificial neural network algorithms.
II. ENSEMBLE AND DEEP NEURAL NETWORKS-BASED
INTRUSION DETECTION
Several works have been carried out on intrusion detection
systems. The subsections below present ensemble-based and deep
neural network-based intrusion detection, respectively.
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 18, No. 5, May 2020
62 https://sites.google.com/site/ijcsis/
ISSN 1947-5500

A. Ensemble-based Intrusion Detection
The most closely related work, which applied meta-classification
enabled by stacked generalization, was proposed in [8]. The
authors combined Random Forest, K-Nearest Neighbour and
Logistic Regression to predict multiclass intrusion classes in
the UNSW-NB15 dataset. Multi-feature-based stacking with a
single-tier meta-learner was applied, and Support Vector Machine
was used as the meta-learner. It achieved a cross-validated
accuracy of 0.94 and F-measure of 0.95. The work, however,
was limited by the deficiency of the base classifiers and the
lack of optimization in the choice of meta-feature combination.
Gaikwad and Thool [9] improved the accuracy of IDS by
using a Genetic Algorithm to select the fifteen most relevant
features from NSL-KDD. They applied a bagging ensemble
approach to each of the Naive Bayes, PART and C4.5 algorithms.
The results showed that bagging led to an increase in
accuracy. A hybrid intelligent intrusion detection system
composed of a pre-processing phase, a feature reduction phase,
a classification phase and a combining phase was proposed in
[10]. The work investigated the performance of hybrid Radial
Basis Function (RBF) and Support Vector Machine (SVM) for
IDS. The 41 attributes of the NSL-KDD dataset were used, while
approximately 20% of the original dataset was used for
classification. The best training sets were selected based on
Adaptive Resampling and Combining (ARCING). The results
showed a classification accuracy of 0.8519, which was higher
than the accuracy of the RBF and SVM base classifiers. Gao et
al. [11] designed an ensemble adaptive voting algorithm for
classification of the NSL-KDD intrusion dataset. The adaptive
voting algorithm recorded the highest accuracy of 0.852.
The Multi-Tree algorithm outperformed decision tree, random
forest, k-Nearest Neighbour and Deep Neural Networks.
B. Neural Networks-based Intrusion Detection
The authors in [12] proposed an artificial neural network
architecture based on Multilayer Perceptron (MLP) for
detection of intrusions in typical benign network traffic data.
They obtained an accuracy of 0.98, relative operating
characteristics of 0.98, and a false positive rate of less than
2% using 10-fold cross-validation. A neural network ensemble
method comprising an autoencoder, a deep belief neural network,
a deep neural network and an extreme learning machine, which
are computationally expensive, was proposed in [13]. Using the
NSL-KDD dataset, the testing accuracy was 0.92, while the
F-score was 0.93. Yang et al. [14] worked on improving the
performance of IDS using the NSL-KDD and UNSW-NB15
datasets; they used a modified density peak clustering algorithm
(MDPCA) to solve class imbalance problems by dividing the
training set into several subsets with similar sets of
attributes. Deep belief networks (DBNs) were used to reduce the
high dimensionality and perform classification. The outputs of
the DBNs were aggregated using Fuzzy Membership Weights.
The results showed that the accuracy and F-score were 0.82
and 0.81, respectively, for NSL-KDD, while they were 0.90
and 0.91 for UNSW-NB15.
This paper focuses on the development of an ensemble machine
learning approach based on stacked generalization, with feature
selection, optimal meta-feature combination and artificial
neural networks, for the purpose of efficient and effective
intrusion detection.
III. STACKING
Stacking (or stacked generalization) is an ensemble technique
for combining multiple classifiers [8]. Unlike bagging and
boosting, stacking is usually used to combine different
classifiers. Stacking consists of two levels: the base
learner and the stacking model learner. The base learner uses
many different models to learn from a dataset. The outputs of
each of the models are collected to create a new dataset. In the
new dataset, each instance is related to the real value that it
is supposed to predict. The new dataset is then used by the
stacking model learner to provide the final output. For example,
the predicted classifications from base classifiers such as
naïve Bayes, decision tree and support vector machine can be
used as input variables to a k-nearest neighbour classifier as
the stacking model learner, which will attempt to learn from the
data how to combine the predictions from the different models
to achieve the best classification accuracy.
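The example in the preceding paragraph can be sketched with Scikit-Learn's StackingClassifier. This is only an illustrative sketch: the data are synthetic (make_classification) rather than NSL-KDD, and the choices of five-fold internal cross-validation, five neighbours and fixed random seeds are assumptions made for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: this is not the NSL-KDD dataset).
X, y = make_classification(n_samples=600, n_features=9, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Level 0: heterogeneous base classifiers.
base_learners = [
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(random_state=0)),
]

# Level 1: a k-nearest neighbour classifier learns how to combine the
# class labels predicted by the base classifiers.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=KNeighborsClassifier(n_neighbors=5),
    stack_method="predict",  # pass predicted labels, not probabilities
    cv=5,                    # out-of-fold predictions train the meta-learner
)
stack.fit(X_train, y_train)

stack_accuracy = stack.score(X_test, y_test)
meta_inputs = stack.transform(X_test)  # one column per base classifier
print(f"stacked accuracy: {stack_accuracy:.3f}")
```

Here meta_inputs is the new dataset described above: each row holds the three base predictions for one instance.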
The popular stacked generalization techniques include the
classical stacked method [15] and the multi-feature-based
stacked generalization method [16], both of which involve a
single meta-learner. Classical stacking involves a single set of
features and a meta-learner, while multi-feature-based stacking
involves multiple sets of features and a meta-learner. The
general algorithm for the meta-classification is presented as
follows:
Meta-learner ( )
Input: Labels L predicted by Base Classifiers, S
Output: Labels Lp predicted by E
// Initialize predictions for each feature in horizontal axis
Do for D = 1 to d
    // Initialize predictions for each feature in vertical axis
    Do for M = 1 to m
        If D ≠ M Then
            Construct a new dataset T(d,m)
            LT ← Train(T(d,m))
            LP ← Test(LT, Ltest)
        Endif
    End
End
IV. PROPOSED APPROACH
The objective of the proposed approach is to obtain
improved predictions by using an ensemble technique called
stacking. Therefore, a two-tier stacking approach was
developed.
The algorithm and other existing algorithms were applied
to analyze a popular and reliable intrusion detection evaluation
data known as NSL-KDD [17], which has been used in
previous works. NSL-KDD is an upgraded version of KDD’99
developed by Canadian Institute of Cybersecurity, University
of New Brunswick. It was designed as a solution to the
limitations of KDD’99 data. The dataset consists of 41
network features presented in Table I.
There is no sponsor for the work.
TABLE I. NSL-KDD NETWORK FEATURES WITH TYPES
S/N Feature Feature type
1 duration continuous
2 protocol_type symbolic
3 service symbolic
4 flag symbolic
5 src_bytes continuous
6 dst_bytes continuous
7 land symbolic
8 wrong_fragment continuous
9 urgent continuous
10 hot continuous
11 num_failed_logins continuous
12 logged_in symbolic
13 num_compromised continuous
14 root_shell continuous
15 su_attempted continuous
16 num_root continuous
17 num_file_creations continuous
18 num_shells continuous
19 num_access_files continuous
20 num_outbound_cmds continuous
21 is_host_login symbolic
22 is_guest_login symbolic
23 count continuous
24 srv_count continuous
25 serror_rate continuous
26 srv_serror_rate continuous
27 rerror_rate continuous
28 srv_rerror_rate continuous
29 same_srv_rate continuous
30 diff_srv_rate continuous
31 srv_diff_host_rate continuous
32 dst_host_count continuous
33 dst_host_srv_count continuous
34 dst_host_same_srv_rate continuous
35 dst_host_diff_srv_rate continuous
36 dst_host_same_src_port_rate continuous
37 dst_host_srv_diff_host_rate continuous
38 dst_host_serror_rate continuous
39 dst_host_srv_serror_rate continuous
40 dst_host_rerror_rate continuous
41 dst_host_srv_rerror_rate continuous
The attack classes, categorized as probe, user-to-root (U2R),
remote-to-local (R2L) or denial of service (DoS), are presented
with their corresponding attack types in Table II.
TABLE II. ATTACK CLASSES AND TYPES
S/N Attack Class Attack types
1 DoS Back, land, neptune, pod, smurf, teardrop, mailbomb, apache2, processtable, udpstorm
2 Probe Ipsweep, nmap, portsweep, satan, mscan, saint
3 R2L Ftp write, guess passwd, imap, multihop, phf, spy, warezclient, warezmaster, sendmail, named, snmpgetattack, snmpguess, xlock, xsnoop, worm
4 U2R Buffer overflow, loadmodule, perl, rootkit, httptunnel, ps, sqlattack, xterm
The data is composed of two datasets: a training dataset and a
testing dataset. The training dataset contains
125,973 instances, while the testing dataset consists of 22,544
instances. The distribution of the classes in both datasets is
imbalanced. Table III presents the distribution of the NSL-KDD
datasets.
TABLE III. THE DISTRIBUTION OF NSL-KDD ATTACKS
Category Normal DoS Probe U2R R2L Total
Training 67,343 45,927 11,656 52 995 125,973
Testing 9,711 7,458 2,421 200 2,754 22,544
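The degree of imbalance can be read directly from the per-class counts in Table III. The short sketch below computes each class's percentage share; the totals are derived from the per-class counts, and the helper name class_shares is introduced here only for illustration.

```python
# Per-class instance counts copied from Table III.
train_counts = {"Normal": 67343, "DoS": 45927, "Probe": 11656,
                "U2R": 52, "R2L": 995}
test_counts = {"Normal": 9711, "DoS": 7458, "Probe": 2421,
               "U2R": 200, "R2L": 2754}


def class_shares(counts):
    """Return each class's percentage share of the total count."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


train_shares = class_shares(train_counts)
test_shares = class_shares(test_counts)

for name in train_counts:
    print(f"{name:>6}: train {train_shares[name]:6.2f}%  "
          f"test {test_shares[name]:6.2f}%")
```

In the training data, Normal traffic makes up more than half of the instances, while U2R accounts for well under 0.1%, which is why accuracy alone can hide poor detection of the rare classes.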
Three preprocessing steps were carried out on the
training and testing datasets:
• Removal of redundant features
• Transformation of categorical features to numerical
features
• Normalization of the features using the min-max
normalization presented in (5).
$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$ (5)
where x is a feature value, and x_min and x_max are the minimum
and maximum values of that feature.
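The three preprocessing steps can be sketched as follows. The paper does not state how redundant features were identified or how categorical values were encoded, so two assumptions are made here: a feature is treated as redundant when it takes a single value, and symbolic values are mapped to integer codes. The records are made-up toy values that only borrow a few NSL-KDD feature names.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

# Toy records (made-up values) using a few NSL-KDD feature names.
columns = {
    "duration": np.array([0.0, 2.0, 10.0, 5.0]),
    "protocol_type": np.array(["tcp", "udp", "icmp", "tcp"]),
    "src_bytes": np.array([100.0, 0.0, 2500.0, 300.0]),
    "num_outbound_cmds": np.array([0.0, 0.0, 0.0, 0.0]),  # constant
}

# Step 1 (assumed rule): drop features that take only one value.
kept = {name: col for name, col in columns.items()
        if np.unique(col).size > 1}

# Step 2 (assumed encoding): map symbolic values to integer codes.
encoder = OrdinalEncoder()
kept["protocol_type"] = encoder.fit_transform(
    kept["protocol_type"].reshape(-1, 1)).ravel()

# Step 3: min-max normalization of every remaining feature to [0, 1].
X = np.column_stack([kept[name] for name in kept])
X_scaled = MinMaxScaler().fit_transform(X)

# The same normalization written out by hand, column by column.
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)
```

In practice the scaler and encoder would be fitted on the training dataset and then applied unchanged to the testing dataset, so that both share the same minimum and maximum values.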
The framework for the proposed stacking approach is presented
in Figure 1. To detect intrusions efficiently, three sets of
best features, of sizes 21, 11 and 9, were extracted using the
Scikit-Learn chi-square test [18]. Of these, the 9-feature set
consistently showed better performance in terms of efficiency
and effectiveness during training. Therefore, nine features
were used.
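Chi-square feature selection of this kind can be sketched with Scikit-Learn's SelectKBest and chi2. The sketch uses synthetic 41-feature data rather than NSL-KDD, and it rescales the features to [0, 1] first because chi2 requires non-negative inputs; which features end up selected on the real data is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in with 41 features (assumption: not NSL-KDD).
X, y = make_classification(n_samples=500, n_features=41,
                           n_informative=10, random_state=0)
X = MinMaxScaler().fit_transform(X)  # chi2 needs non-negative values

# Extract the three candidate feature sets of sizes 21, 11 and 9.
subsets = {}
for k in (21, 11, 9):
    selector = SelectKBest(score_func=chi2, k=k).fit(X, y)
    subsets[k] = selector.get_support(indices=True)
    print(k, "features:", subsets[k].tolist())

# Keep only the nine highest-scoring features for training.
X_selected = SelectKBest(score_func=chi2, k=9).fit_transform(X, y)
```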
Thereafter, four base classifiers, namely Support Vector
Machine (SVM), Logistic Regression (LogReg), Naïve Bayes
(NB) and Multilayer Perceptron Artificial Neural Networks
(MLP-ANN), were trained. The prediction outputs of each
classifier form the meta-features, which were combined and
classified by the classical meta-learner (stage 1). Furthermore,
a multi-feature-based stacked ensemble was applied to the
outputs of the classical meta-learner as a second-tier
meta-learner (stage 2). An artificial neural network [12] was
employed as the meta-learner in both cases.
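Stage 1 can be sketched as below: the four base classifiers produce one meta-feature each, and a neural network is trained on those meta-features. The sketch assumes synthetic data, Scikit-Learn estimators with mostly default settings, and out-of-fold predictions (cross_val_predict) for the training meta-features; the paper does not specify how the training meta-features were generated, so that last choice is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the nine selected features (not NSL-KDD).
X, y = make_classification(n_samples=600, n_features=9, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

base_classifiers = {
    "SVM": SVC(random_state=0),
    "LogReg": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "MLP-ANN": MLPClassifier(max_iter=1000, random_state=0),
}

train_cols, test_cols = [], []
for name, clf in base_classifiers.items():
    # Out-of-fold labels keep the meta-learner from seeing predictions
    # made on instances the base classifier was trained on.
    train_cols.append(cross_val_predict(clf, X_train, y_train, cv=5))
    test_cols.append(clf.fit(X_train, y_train).predict(X_test))

train_meta = np.column_stack(train_cols)  # one meta-feature per classifier
test_meta = np.column_stack(test_cols)

# Stage-1 meta-learner: a neural network over the four meta-features.
meta_learner = MLPClassifier(max_iter=1000, random_state=0)
meta_learner.fit(train_meta, y_train)
stage1_accuracy = meta_learner.score(test_meta, y_test)
print(f"stage-1 accuracy: {stage1_accuracy:.3f}")
```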
The multi-feature-based stacking involved a multilevel
combination of features, with levels ranging from 2 to 4 and
meta-features ranging from 2 to 4 in each level. The
best combination was selected using Pipeline Grid-search
optimization based on parameter settings, and the approach was
implemented with Python Scikit-Learn [18].
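One way to search over meta-feature combinations with a Scikit-Learn Pipeline and GridSearchCV is sketched below. It covers only the choice of 2 to 4 meta-features out of four, not the multilevel part. The MetaFeatureSelector class is a hypothetical helper written for this illustration, and the meta-features are simulated as noisy copies of the labels, so the sketch shows the search mechanics rather than the paper's exact configuration.

```python
from itertools import combinations

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline


class MetaFeatureSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: keep only the chosen meta-feature columns."""

    def __init__(self, columns=(0, 1)):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.asarray(X)[:, list(self.columns)]


# Simulated meta-features: four noisy copies of the true labels, with
# assumed error rates standing in for four base classifiers.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)
error_rates = [0.10, 0.20, 0.30, 0.40]
meta = np.column_stack(
    [np.where(rng.random(300) < p, 1 - y, y) for p in error_rates])

# Every combination of 2, 3 or 4 of the four meta-features.
candidates = [c for k in range(2, 5) for c in combinations(range(4), k)]

pipeline = Pipeline([
    ("select", MetaFeatureSelector()),
    ("meta", MLPClassifier(max_iter=500, random_state=0)),
])
search = GridSearchCV(pipeline,
                      param_grid={"select__columns": candidates},
                      scoring="accuracy", cv=3)
search.fit(meta, y)

best_columns = search.best_params_["select__columns"]
print("best combination:", best_columns,
      "cv accuracy:", round(search.best_score_, 3))
```

Treating the column subset as a pipeline hyperparameter lets the same grid also carry meta-learner settings if those need tuning.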
The performance was compared with the existing stacking
ensembles as well as state-of-the-art methods based on accuracy,
precision, recall and F-score. The performance metrics are
estimated as presented in equations (1) to (4). The equations
rely on the true positive (TP), which is the number of instances
of a class that are correctly predicted; true negative (TN),
which is the number of instances of other classes that are
correctly predicted as not belonging to the class; false
positive (FP), which is the number of instances of other classes
that have been incorrectly predicted as belonging to the class;
and false negative (FN), which is the number of instances of a
class that have been incorrectly predicted as belonging to
another class.
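$\text{Accuracy (Acc)} = \frac{TP + TN}{TP + TN + FP + FN}$ (1)
$\text{Precision (P)} = \frac{TP}{TP + FP}$ (2)
$\text{Recall (R)} = \frac{TP}{TP + FN}$ (3)
$\text{F-Score (F1)} = \frac{2 \times P \times R}{P + R}$ (4)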
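As a worked example of the four metrics, the sketch below evaluates them for one class from hypothetical counts (TP = 8, TN = 5, FP = 2, FN = 1), assuming the standard definitions in terms of TP, TN, FP and FN. The counts are invented for illustration and are not taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score


# Hypothetical counts for a single class.
acc, p, r, f1 = classification_metrics(tp=8, tn=5, fp=2, fn=1)
print(f"Acc={acc:.4f} P={p:.4f} R={r:.4f} F1={f1:.4f}")
# Acc=0.8125 P=0.8000 R=0.8889 F1=0.8421
```

For a multiclass task such as NSL-KDD, these per-class values would still need to be averaged across classes; the averaging scheme is not stated in the text.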
Figure 1. Framework for the proposed Stacking Approach
Given data instances I = (1, ..., h), classifiers S = (1, ..., m)
with parameters R = (1, ..., r) and values V = (v11, ..., vmp), the
prediction scores (Lp) and final predictions (F) were evaluated
using the algorithmic methods presented below:
Base Classifier ( )
Input: Values V = (v11, ..., vmp)
Output: Score P for each instance I
Do for S = 1 to m
    Do for R = 1 to r
        Do for I = 1 to h
            Model ← S(train, R, V) // for train and test set
            Score ← S(test, Model)
        End
    End
End
Meta-Learner 1 ( )
Input: Labels Lp predicted by Base Classifiers, S
Output: Labels Lp predicted by E
// Initialize predictions for each feature in horizontal axis
Do for D = 1 to d
    // Initialize predictions for each feature in vertical axis
    Do for M = 1 to m
        If D ≠ M Then
            Construct a new dataset T(d,m)
            LT ← Train(T(d,m))
            LP ← Test(LT, Ltest)
        Endif
    End
End
Meta-Learner 2 ( )
Input: The predictions P corresponding to L by Meta-Learner 1
Output: The labels F predicted by the meta-learner G
Do for R = 2 to 4 // meta-levels
    Do for U = 2 to 4 // meta-features
        B[X] ← Instant(R, U)
        Z ← Meta-learner(B[X], P)
        If Z_{r,u} > Z_{r+1,u+1} Then
            F ← Z_{r,u} (label corresponding to best prediction for G)
        Endif
    End
End
V. RESULTS
The proposed stacked ensemble and previous stacked
ensembles were implemented in Python Scikit-Learn [18] with
default settings.
The accuracy, precision, recall and F-score of the base
classifiers, the proposed stacked ensemble, the
multi-feature-based stacked ensemble, and the classical stacked
ensemble are presented in Table IV. The comparison of the
proposed stacked ensemble and existing ensemble methods is
presented in Table V. Figure 2 shows the bar chart for the
comparative analysis of the proposed stacked ensemble and the
state-of-the-art methods based on accuracy.
TABLE IV. PERFORMANCE OF THE PROPOSED STACKED
ENSEMBLE, BASE CLASSIFIERS AND PREVIOUS STACKED ENSEMBLES
Classifier Accuracy Precision Recall F-score
LogReg 0.67 0.86 0.68 0.74
SVM 0.80 0.91 0.81 0.84
MLP-ANN 0.95 0.97 0.96 0.96
NB 0.45 0.80 0.45 0.50
Classical Stacked Ensemble 0.80 0.91 0.81 0.85
Multi-feature Stacked Ensemble 0.87 0.93 0.88 0.90
Proposed Stacked Ensemble 0.97 0.98 0.98 0.98
The results in Table IV show that the proposed stacked
ensemble method outperformed the LogReg, SVM, NB and MLP-ANN
base classifiers by recording an accuracy of 0.97, precision
of 0.98, recall of 0.98 and F-score of 0.98. It also performed
better than both the classical stacking method, with accuracy of
0.80, precision of 0.91, recall of 0.81 and F-score of 0.85, and
the multi-feature-based stacking method, with accuracy of 0.87,
precision of 0.93, recall of 0.88 and F-score of 0.90. The
performance of the multi-feature-based stacking method was,
however, better than that of the classical stacking
method. Except for the proposed stacked method, MLP-ANN
outperformed all the base classifiers and the stacking methods,
which is consistent with the results reported for neural
networks [12][13].
TABLE V. PERFORMANCE OF THE PROPOSED STACKED ENSEMBLE
AND OTHER ENSEMBLE METHODS
Method Accuracy Precision Recall F-score
Majority Voting Ensemble 0.92 0.97 0.92 0.92
Weighted Voting Ensemble 0.95 0.97 0.96 0.96
Classical Stacked Ensemble 0.80 0.91 0.81 0.85
Multi-feature Stacked Ensemble 0.87 0.93 0.88 0.90
Proposed Stacked Ensemble 0.97 0.98 0.98 0.98
The results in Table V show that the proposed stacked
ensemble method outperformed the majority voting ensemble
method, with accuracy of 0.92, precision of 0.97, recall of 0.92
and F-score of 0.92, and the weighted voting ensemble method,
with accuracy of 0.95, precision of 0.97, recall of 0.96 and
F-score of 0.96. However, both voting ensembles performed better
than the classical stacked and multi-feature-based
stacked methods. The performance of the weighted voting
ensemble method was better than that of the majority
voting method.
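For readers unfamiliar with the two voting baselines, the sketch below contrasts majority and weighted voting using Scikit-Learn's VotingClassifier over the same four base classifier types. How the voting ensembles in Table V were configured is not described in the text, so the synthetic data and the weights used here are purely hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the NSL-KDD dataset).
X, y = make_classification(n_samples=600, n_features=9, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

estimators = [
    ("svm", SVC(random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
]

# Majority voting: every classifier's predicted label counts once.
majority = VotingClassifier(estimators=estimators, voting="hard")

# Weighted voting: labels from classifiers assumed to be stronger count
# more. These weights are hypothetical, chosen only for illustration.
weighted = VotingClassifier(estimators=estimators, voting="hard",
                            weights=[2, 1, 1, 3])

majority_accuracy = majority.fit(X_train, y_train).score(X_test, y_test)
weighted_accuracy = weighted.fit(X_train, y_train).score(X_test, y_test)
print(f"majority: {majority_accuracy:.3f}  weighted: {weighted_accuracy:.3f}")
```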
The bar chart in Fig. 2 shows that the proposed stacked
ensemble method performed better than the state-of-the-art
methods in the evaluation of NSL-KDD data in terms of accuracy
and F-score. The chart shows that the approach of Ludwig [13],
which relied on deep neural networks, performed better than the
other state-of-the-art methods but clearly worse than the
proposed approach.
Fig. 2. Comparison of the Proposed Stacked Ensemble Method and the
State-of-the-art Methods
VI. CONCLUSION
This paper has proposed a stacked generalization ensemble
approach with two meta-learners. The first meta-learner was
based on the classical stacked ensemble, while the second
meta-learner was based on the multi-feature-based stacked
ensemble. The second meta-learner was optimized to obtain
the best combination of meta-features.
By applying the algorithm to NSL-KDD intrusion
evaluation data, an accuracy of 0.97, precision of 0.98, recall
of 0.98 and F-score of 0.98 were achieved. The comparison of
the method with base classifiers, ensemble methods and
state-of-the-art methods showed that the proposed stacked
generalization approach performs better. Therefore, the stacked
ensemble approach provides a more effective and efficient way
of detecting intrusions than computationally expensive
deep learning methods [13].
In future, more optimization algorithms and datasets will
be evaluated.
REFERENCES
[1] The Royal Society, Machine learning: the power and promise
of computers that learn by example. The Royal Society,
2017.
[2] S. Das, A. Dey, A. Pal, and N. Roy, “Applications of
Artificial Intelligence in Machine Learning: Review
and Prospect,” Int. J. Comput. Appl., 2015.
[3] C. F. Tsai, Y. F. Hsu, C. Y. Lin, and W. Y. Lin,
“Intrusion detection by machine learning: A review,”
Expert Systems with Applications. 2009.
[4] L. K. Hansen and P. Salamon, “Neural Network
Ensembles,” IEEE Trans. Pattern Anal. Mach. Intell.,
1990.
[5] R. E. Schapire, “The Strength of Weak Learnability,”
Mach. Learn., 1990.
[6] I. Syarif, E. Zaluska, A. Prugel-Bennett, and G. Wills,
“Application of Bagging, Boosting and Stacking,” pp.
593–602, 2012.
[7] S. S. Roy and V. Krishna, “Analyzing Intrusion
Detection System: An Ensemble based Stacking
Approach,” pp. 307–309, 2014.
[8] S. Rajagopal, P. P. Kundapur, and K. S. Hareesha, “A
Stacking Ensemble for Network Intrusion Detection
Using Heterogeneous Datasets,” vol. 2020, 2020.
[9] D. Gaikwad and R. Thool, “Intrusion Detection
System using Bagging with Partial Decision Treebase
Classifier,” Procedia Comput. Sci., vol. 49, pp. 92–98,
2015.
[10] M. Govindarajan and R. Chandrasekaran, “Intrusion
Detection using an Ensemble of Classification
Methods,” Proc. World Congr. Eng. Comput. Sci., vol.
I, no. October, 2012.
[11] X. Gao, C. Shan, C. Hu, Z. Niu, and Z. Liu, “An
Adaptive Ensemble Machine Learning Model for
Intrusion Detection,” IEEE Access, vol. 7, pp. 82512–
82521, 2019.
[12] A. Shenfield, D. Day, and A. Ayesh, “Intelligent
intrusion detection systems using artificial neural
networks,” ICT Express, vol. 4, no. 2, pp. 95–99,
2018.
[13] S. A. Ludwig, “Applying A Neural Network Ensemble
To Intrusion Detection,” vol. 9, no. 3, pp. 177–188,
2019.
[14] Y. Yang, K. Zheng, C. Wu, X. Niu, and Y. Yang,
“Building an Effective Intrusion
Detection System Using the Modified Density Peak
Clustering Algorithm and Deep Belief Networks,”
Applied Sciences, 2019.
[15] S. Raschka, Mlxtend 0.9.0. 2017.
[16] S. Malmasi and M. Zampieri, “Challenges in
discriminating profanity from hate speech,” J. Exp.
Theor. Artif. Intell., vol. 3079, pp. 1–16, 2018.
[17] CICS, “NSL-KDD,” Canadian Institute of
Cybersecurity, University of New Brunswick, 2019.
[Online]. Available:
https://www.unb.ca/cic/datasets/nsl.html. [Accessed:
05-Apr-2019].
[18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel,
B. Thirion, O. Grisel, et al., “Scikit-learn: Machine
Learning in Python,” J. Mach. Learn. Res., vol. 12, pp.
2825–2830, 2011.