2. 2 Amrita & P Ahmed
This survey paper categorizes the feature selection algorithms that have been developed for IDS
building, critically evaluates their usefulness, and recommends ways of enhancing the quality of feature
selection algorithms.
The paper is organized as follows. Section 2 reviews intrusion detection systems. Section 3 describes
the datasets and performance evaluation measures used in this survey. Section 4 discusses the different
methodologies of feature selection in IDSs. Section 5 addresses related research on feature selection
methods together with their reported performance. Section 6 summarizes the results reported in the
literature in tabular form. Section 7 concludes and discusses future research.
INTRUSION DETECTION SYSTEM
An intrusion is defined as an attempt to compromise the confidentiality, integrity or availability of a
computer system or network, to use its resources without authorization, or to bypass its security
mechanisms. James P. Anderson introduced Intrusion Detection (ID) in the early 1980s [2], and Dorothy
Denning proposed several models for IDS in 1987 [3]. Ideally, Intrusion Detection (ID) should be an
intelligent process that monitors the events occurring in a system and analyzes them for violations of
security policy. An IDS is required to have a high attack Detection Rate (DR) with a low False Alarm
Rate (FAR). See [4] for the organization of a generalized IDS.
Based on the detection approach, IDSs are classified as anomaly based or misuse based. In the
anomaly based approach [5], the system first learns the normal behavior or activity of the system or
network and then detects intrusions as deviations from it. In the misuse or signature based approach [6],
the system first defines each attack and the characteristics that distinguish it from normal data or traffic,
and then detects intrusions by matching against these definitions. Based on the location of monitoring,
IDSs are classified as Network-based Intrusion Detection Systems (NIDS) [7] and Host-based Intrusion
Detection Systems (HIDS) [8]. A NIDS detects intrusions by monitoring network traffic at the level of IP
packets. A HIDS is installed locally on a host machine and detects intrusions by examining system calls,
application logs, file system modifications and other host activities made by each user on that machine.
DATASETS AND PERFORMANCE EVALUATION
This section summarizes the popular benchmark datasets and performance evaluation measures
used in the intrusion detection domain to evaluate feature selection methods for intrusion detection
systems.
3. A Study of Feature Selection Methods in Intrusion Detection System: A Survey 3
DATASETS
The KDD Cup 1999 [9] benchmark datasets are used to evaluate different feature selection
methods for IDS. They consist of 4,940,000 connection records in the training set and 311,029
connection records in the test set. The training set contains 24 attack types and the test set contains 38.
Since the training and test sets are prohibitively large, a 10% subset of the KDD Cup’99 dataset is
frequently used [9]. Each connection is labeled either as normal or as exactly one specific attack type,
and each attack type falls into one of four attack categories [10]: Denial of Service (DoS), User to Root
(U2R), Remote to Local (R2L) and Probing. Each connection record consists of 41 features, numbered 1
to 41 in order (Table 1), which fall into the following four categories:
Category 1 (1-9) : Basic features of individual TCP connections
Category 2 (10-22) : Content features within a connection suggested by domain knowledge
Category 3 (23-31) : Traffic features computed using a two-second time window
Category 4 (32-41) : Traffic features computed using a two-second time window from destination to host
Table 1: List of features in the KDD Cup’99 dataset
Feature #  Feature Name         Feature #  Feature Name         Feature #  Feature Name
1          Duration             15         Su-attempted         29         Same-srv-rate
2          Protocol-type        16         Num-root             30         Diff-srv-rate
3          Service              17         Num-file-creations   31         Srv-diff-host-rate
4          Flag                 18         Num-shells           32         Dst-host-count
5          Src-bytes            19         Num-access-files     33         Dst-host-srv-count
6          Dst-bytes            20         Num-outbound-cmds    34         Dst-host-same-srv-rate
7          Land                 21         Is-host-login        35         Dst-host-diff-srv-rate
8          Wrong-fragment       22         Is-guest-login       36         Dst-host-same-src-port-rate
9          Urgent               23         Count                37         Dst-host-srv-diff-host-rate
10         Hot                  24         Srv-count            38         Dst-host-serror-rate
11         Num-failed-logins    25         Serror-rate          39         Dst-host-srv-serror-rate
12         Logged-in            26         Srv-serror-rate      40         Dst-host-rerror-rate
13         Num-compromised      27         Rerror-rate          41         Dst-host-srv-rerror-rate
14         Root-shell           28         Srv-rerror-rate
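Because the rest of this survey refers to features only by number, a small lookup helper can make the reported subsets easier to read. The following Python sketch is illustrative only: the lowercase, underscore-separated names follow the spelling commonly used in the KDD Cup'99 distribution (an assumption; Table 1 uses hyphenated forms), and the short category labels and function names are hypothetical.

```python
# Feature names in order; list index 0 corresponds to feature no. 1.
# Spelling follows the usual KDD Cup'99 naming (assumption, see lead-in).
KDD_FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes",
    "dst_bytes", "land", "wrong_fragment", "urgent",
    "hot", "num_failed_logins", "logged_in", "num_compromised",
    "root_shell", "su_attempted", "num_root", "num_file_creations",
    "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login",
    "count", "srv_count", "serror_rate", "srv_serror_rate",
    "rerror_rate", "srv_rerror_rate", "same_srv_rate", "diff_srv_rate",
    "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]

# Category ranges as listed in the text (inclusive feature numbers).
# The short labels are hypothetical shorthand for the four categories.
CATEGORIES = [
    (1, 9, "basic"),
    (10, 22, "content"),
    (23, 31, "time-window traffic"),
    (32, 41, "destination-host traffic"),
]


def feature_name(number):
    """Return the name of feature `number` (1-based, as in Table 1)."""
    if not 1 <= number <= len(KDD_FEATURES):
        raise ValueError(f"feature number out of range: {number}")
    return KDD_FEATURES[number - 1]


def feature_category(number):
    """Return the category label for feature `number` (1-based)."""
    for low, high, label in CATEGORIES:
        if low <= number <= high:
            return label
    raise ValueError(f"feature number out of range: {number}")


def describe(numbers):
    """Map a list of feature numbers to (number, name, category) tuples."""
    return [(n, feature_name(n), feature_category(n)) for n in numbers]


if __name__ == "__main__":
    for row in describe([23, 32, 10, 6, 3]):
        print(row)
```

Calling `describe` on any of the feature-number lists quoted in Section 5 gives a quick view of which categories a method draws its features from.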
Performance Evaluation
The effectiveness of an IDS is evaluated by its ability to make correct predictions. According to
the real nature of a given event compared to the prediction from the IDS, four possible outcomes are
shown in Table 2, known as the confusion matrix [4]. True Positive Rate(TPR) or Detection Rate(DR),
True Negative Rate(TNR), False Positive Rate (FPR) or False Alarm Rate (FAR) and False Negative
Rate(FNR) are measures that can be applied to quantify the performance of IDSs [4] based on the above
confusion matrix.
Table 2. Confusion Matrix
                                   Predicted
                                   Negative Class (Normal)   Positive Class (Attack)
Actual   Negative Class (Normal)   True Negative (TN)        False Positive (FP)
         Positive Class (Attack)   False Negative (FN)       True Positive (TP)
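The four rates named above follow directly from the cells of the confusion matrix. A minimal Python sketch, using the standard definitions of these rates; the function name and the example counts are hypothetical and are not taken from any surveyed paper:

```python
def ids_rates(tp, tn, fp, fn):
    """Compute TPR (DR), TNR, FPR (FAR) and FNR from confusion-matrix counts.

    tp: attacks predicted as attacks      fn: attacks predicted as normal
    tn: normal predicted as normal        fp: normal predicted as attacks
    """
    attacks = tp + fn   # all actual positive (attack) records
    normals = tn + fp   # all actual negative (normal) records
    if attacks == 0 or normals == 0:
        raise ValueError("need at least one attack and one normal record")
    return {
        "TPR": tp / attacks,   # Detection Rate (DR)
        "FNR": fn / attacks,
        "TNR": tn / normals,
        "FPR": fp / normals,   # False Alarm Rate (FAR)
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in ids_rates(tp=90, tn=995, fp=5, fn=10).items():
        print(f"{name} = {value:.3f}")
```

Note that TPR + FNR = 1 and TNR + FPR = 1, so reporting DR and FAR is enough to recover all four rates.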
FEATURE SELECTION
Real-time intrusion detection is nearly impossible due to the huge amount of data flowing on
the Internet. Feature selection can reduce the computation and model complexity. Research on feature
selection started in the early 1960s [11]. Feature selection is the technique of selecting a subset of relevant
features by removing most irrelevant and redundant features [12] from the data in order to build robust
learning models [13].
Process of Feature Selection
A typical feature selection method involves four basic steps [13], shown in Figure 1: a generation
procedure to generate the next candidate subset; an evaluation function to evaluate the subset under
examination; a stopping criterion to decide when to stop; and a validation procedure to check whether
the subset is valid. Figure 1 illustrates the feature selection process used to determine and validate the
best feature subset.
Figure 1 : Feature selection process with validation [13].
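The four steps of the process (generation, evaluation, stopping criterion, validation) can be sketched as a simple loop. The Python code below is only one possible instance, assuming a greedy forward search as the generation procedure; the function names, the `min_gain` threshold and the toy scoring function are hypothetical and do not come from [13].

```python
def forward_selection(n_features, evaluate, max_size=None, min_gain=1e-9):
    """Greedy forward search over feature indices 0..n_features-1.

    `evaluate(subset)` returns a score where higher is better.
    """
    selected, best_score = [], float("-inf")
    remaining = set(range(n_features))
    while remaining and (max_size is None or len(selected) < max_size):
        # Generation: extend the current subset by one unused feature.
        candidates = [selected + [f] for f in sorted(remaining)]
        # Evaluation: score every candidate subset.
        score, subset = max(((evaluate(c), c) for c in candidates),
                            key=lambda pair: pair[0])
        # Stopping criterion: stop when the best candidate does not improve.
        if score - best_score <= min_gain:
            break
        selected, best_score = subset, score
        remaining.discard(subset[-1])
    return selected, best_score


def validate(subset, evaluate, n_features):
    """Validation: the subset should score at least as well as all features."""
    return evaluate(subset) >= evaluate(list(range(n_features)))


if __name__ == "__main__":
    # Toy evaluation function (assumption): features 0 and 2 are relevant,
    # and every selected feature costs 0.1.
    relevant = {0, 2}

    def toy_score(subset):
        return len(relevant & set(subset)) - 0.1 * len(subset)

    subset, score = forward_selection(4, toy_score)
    print(subset, round(score, 2), validate(subset, toy_score, 4))
```

In practice the evaluation function would be a filter criterion or a classifier's accuracy, and validation would use held-out data rather than the same scoring function.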
METHODS FOR FEATURE SELECTION
Blum and Langley [14] divide feature selection methods into three categories: filter, wrapper and
hybrid (embedded) methods. These methods are currently used in intrusion detection. The filter
method [15][16] selects feature subsets based on the general characteristics of the data and is
independent of the classification algorithm. A filter algorithm [18] uses an external learning algorithm
to evaluate the performance of the selected features. The wrapper method [19] “wraps around” the
learning algorithm: it uses one predetermined classifier to evaluate features or feature subsets. A wrapper
algorithm [18] uses a search algorithm to search through the space of possible features and evaluates
each subset by running a model on it. Many feature subsets are evaluated based on classification
performance and the best one is selected. This method is more computationally expensive than the filter
method [17][19]. The hybrid method [17][20] combines the wrapper and filter approaches to achieve the
best possible performance with a particular learning algorithm. For feature selection with large
dimensionality, a hybrid algorithm [18] needs more efficient search strategies and evaluation criteria to
achieve a time complexity similar to that of filter algorithms. These methods are discussed in detail in
Section 5 and summarized in Section 6.
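The practical difference between the filter and wrapper categories can be illustrated on a toy dataset. In the Python sketch below, the filter ranks features by absolute Pearson correlation with the label, without any classifier, while the wrapper scores each candidate subset by the leave-one-out accuracy of a 1-nearest-neighbour classifier. Both choices, the function names and the data are assumptions made for illustration; the surveyed papers use a variety of other criteria and classifiers.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def filter_rank(X, y):
    """Filter: rank features by |correlation with the label|, no classifier."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of 1-NN restricted to the features in `subset`."""
    correct = 0
    for i, row in enumerate(X):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in subset),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def wrapper_search(X, y, size):
    """Wrapper: exhaustively try all subsets of `size`, scored by the classifier."""
    return max(combinations(range(len(X[0])), size),
               key=lambda s: loo_1nn_accuracy(X, y, s))


if __name__ == "__main__":
    # Toy data (assumption): feature 0 separates the classes,
    # feature 1 is noise, feature 2 is constant.
    X = [[0.1, 5, 1], [0.2, 1, 1], [0.15, 3, 1],
         [0.9, 4, 1], [1.0, 2, 1], [0.95, 5, 1]]
    y = [0, 0, 0, 1, 1, 1]
    print("filter ranking:", filter_rank(X, y))
    print("wrapper choice:", wrapper_search(X, y, size=1))
```

The wrapper trains and tests the classifier once per candidate subset, which is why its cost grows quickly with the number of features, whereas the filter computes one score per feature.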
RELATED WORKS
In this section, we discuss the feature selection methods used in intrusion detection, grouped
into filter, wrapper and hybrid methods. For each method we report the number of features selected, the
feature numbers (according to Table 1), its performance on the KDD Cup’99 dataset, and the strengths,
limitations and future work reported in the literature.
Filter Method
A feature selection algorithm, FSMDB, based on the DB index criterion is proposed in [21] (Zhang
et al., 2004). The criterion function is constructed according to the characteristics of the DB index
criterion. 24 features {feature no.: 6, 5, 1, 34, 33, 36, 32, 8, 27, 29, 28, 30, 26, 38, 39, 35, 13, 24, 23, 11,
3, 10, 12 and 4} are selected and tested using two classifiers, a BP network and an SVM. The
classification accuracies of the FSMDB algorithm with the BP network and SVM classifiers are reported
as 0.1017 and 0.056 respectively. This method can be used for supervised or unsupervised classification
problems but has high computational complexity in unsupervised learning mode. Future Work: To find a
better approach to reduce the high computational complexity in unsupervised learning mode.
Two neural network methods, (1) neural network principal component analysis (NNPCA) and
(2) nonlinear component analysis (NLCA), are presented in [22] (Kuchimanchi et al., 2004). The numbers
of significant features extracted by PCA, NNPCA and NLCA are 19, 19 and 12 respectively. The first 19
features selected based on the results of the scree test and the critical eigenvalues test are {feature no.: 5,
6, 1, 22, 21, 31, 30, 3, 4, 2, 16, 10, 13, 34, 32, 27, 24, 37, 23 and 36}. The performance of the Non-linear
classifier (NC) and the CART decision tree classifier (DC) is tested on four datasets (Table 3). DC has
relatively high detection accuracies and low false positive rates. Future Work: This work can be extended
on quantitative measures to find optimal combinations of classifiers and feature extractors for IDS.
Table 3 : False Positive Rates (FPR) and Detection Accuracies (DA) for NC and DC on the Four Datasets
DATASET      #Features   FPR (NC)   FPR (DC)   DA (NC)   DA (DC)
ORIGDATA     41          8.2821     0.2268     99.0198   99.9428
PCADATA      19          29.4105    0.2609     99.1161   99.9167
NNPCADATA    19          50.5463    0.4922     98.8206   99.7516
NLDATA       12          51.2756    0.8227     97.2306   99.6359
RICGA (ReliefF Immune Clonal Genetic Algorithm), a combined feature subset selection
method based on the ReliefF algorithm, the Immune Clonal selection algorithm and GA, is proposed in
[23] (Zhu et al., 2005). A BP network is used as the classifier. RICGA has higher classification accuracy
(86.47%) for small feature subsets (8 features) than ReliefF-GA. The selected features are not listed in
the paper.
Zainal et al. (2006) [24] investigated the effectiveness of Rough Set (RS) theory for identifying
important features and as a classifier. The 6 significant features obtained are {feature no.: 41, 32, 24, 4, 5
and 3}. The classification results obtained by Rough Set are compared with Multivariate Adaptive
Regression Splines (MARS), Support Vector Decision Function (SVDF) and Linear Genetic
Programming (LGP). The classification accuracy of RS ranks second for the normal category and is
almost the same as that of MARS and SVDF for the attack category. Future Work: This work can be
extended in terms of accuracy by focusing on the fusion of classifiers after an optimum feature subset is
obtained.
Wong and Lai (2006) [25] combined Discriminant Analysis (DA) and Support Vector Machine
(SVM) to detect intrusion for anomaly-based network IDS. Nine features (feature no. : 12, 23, 32, 2, 24,
36, 31, 29 and 39) are extracted by Discriminant Analysis and evaluated by SVM. The TN (%), FP(%),
FN(%) and TP(%) of the proposed method are 99.58%, 0.42%, 9.93% and 90.07% respectively. Future
Work: Multiple Discriminant Analysis (MDA) can be applied to find the optimal feature set for each
type of attack.
Li et al. (2006) [26] proposed a lightweight intrusion detection model. Information Gain and
Chi-Square approach are used to extract important features and Classic Maximum Entropy (ME) model
is used to learn and detect intrusions. The top 12 important features selected by both methods are
{feature no.: 3, 5, 6, 10, 13, 23, 24, 27, 28, 37, 40 and 41}. Experimental results are shown in Table 4.
Future Work: This model can be applied in realistic environment to verify its real-time performance and
effectiveness.
Table 4. Detection Results
          All 41 features               Selected features
Class     Testing Time(s)   Acc.(%)     Testing Time(s)   Acc.(%)
Normal    1.28              99.75       0.78              99.73
Probe     2.09              99.8        1.25              99.76
DoS       1.93              100         1.03              100
U2R       1.05              99.89       0.7               99.87
R2L       1.02              99.78       0.68              99.75
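Information Gain, one of the two filter criteria used by Li et al., measures how much knowing a feature's value reduces the entropy of the class label. The Python sketch below uses the standard definition for discrete features; it is not the authors' implementation, the toy records are hypothetical, and continuous KDD features such as Src-bytes would need to be discretized first (an assumption about preprocessing).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG(label; feature) = H(label) - H(label | feature), discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(columns, labels):
    """Return feature names sorted by decreasing information gain."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


if __name__ == "__main__":
    # Toy records (assumption): protocol predicts the label, flag does not.
    labels = ["attack", "attack", "normal", "normal"]
    columns = {
        "protocol_type": ["tcp", "tcp", "udp", "udp"],
        "flag": ["SF", "REJ", "SF", "REJ"],
    }
    for name in rank_by_information_gain(columns, labels):
        print(name, information_gain(columns[name], labels))
```

Keeping the top-ranked features of such a list is the simplest way to turn the ranking into a reduced feature subset.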
Tamilarasan et al. (2006) [27] performed different feature selection and ranking methods on the
KDD Cup’99 dataset. Chi-Square analysis, logistic regression, normal distribution and beta distribution
experiments are performed for feature selection. The 25 most significant features ranked by Chi-square
test are {feature no.: 35, 27, 41, 28, 40, 30, 34, 3, 33, 12, 37, 24, 29, 2, 13, 8, 36, 10, 26, 39, 22, 25, 5, 1,
38}. Experiments are performed for normal, probe, DoS, U2R, and R2L using resilient back propagation
neural network. The overall accuracy of the classification is 97.04% with a FPR of 2.76% and FNR of
0.20%.
Fadaeieslam et al. (2007) [28] proposed a feature selection method based on Decision
Dependent Correlation (DDC). The mutual information between each feature and the decision is
calculated, and the top 20 important features {feature no.: 3, 5, 40, 24, 2, 10, 41, 36, 8, 13, 27, 28, 22, 11,
14, 17, 18, 7, 9 and 15} are selected and evaluated by an SVM classifier. The classification result is
93.46%, which outperforms Principal Component Analysis (PCA).
Shina Sheen and R Rajesh (2008) [29] considered different methods: Chi square, Information
Gain and ReliefF for feature selection. Top 20 features {feature no.: 2, 3, 4, 5, 12, 22, 23, 24, 27, 28, 30,
31, 32, 33, 34, 35, 37, 38, 40 and 41} are selected and evaluated using decision tree (C4.5). The
Classification accuracy of Chi Square, Info Gain and ReliefF are 95.8506%, 95.8506% and 95.6432%
respectively.
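The Chi-square criterion used in several of the studies above scores a discrete feature by how far its joint counts with the class label depart from what independence would predict. A minimal Python sketch of the standard Pearson chi-square statistic follows; the function name and toy data are hypothetical, and the surveyed papers may apply additional preprocessing that is not reproduced here.

```python
from collections import Counter


def chi_square(values, labels):
    """Pearson chi-square statistic between a discrete feature and the label."""
    n = len(labels)
    observed = Counter(zip(values, labels))   # joint counts
    value_totals = Counter(values)            # row totals
    label_totals = Counter(labels)            # column totals
    statistic = 0.0
    for value, n_value in value_totals.items():
        for label, n_label in label_totals.items():
            expected = n_value * n_label / n  # count expected under independence
            statistic += (observed[(value, label)] - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Toy records (assumption): protocol predicts the label, flag does not.
    labels = ["attack", "attack", "normal", "normal"]
    print(chi_square(["tcp", "tcp", "udp", "udp"], labels))
    print(chi_square(["SF", "REJ", "SF", "REJ"], labels))
```

A larger statistic indicates a stronger dependence between the feature and the class, so features are ranked in decreasing order of this value.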
In [30] (Kiziloren and German, 2009), Principal Component Analysis (PCA) is used for feature
selection to increase the quality of the extracted feature vectors, and a Self-Organizing Map (SOM) is
used as a classifier to detect network anomalies. The highest success rate of the system, 98.83%, is
obtained when the feature vector size equals 10. The selected features are not listed in the paper. The
average success rate of the system without PCA is 97.76%. PCA also provides faster classification,
which is important for a real-time system.
Suebsing and Hiransakolwong (2009) [31] proposed a combination of Euclidean Distance and
Cosine Similarity to select robust features subsets with smaller size. Euclidean Distance is used to select
the features to detect the known attacks and Cosine Similarity is used to select the features to detect the
unknown attacks to build a model. The known detection method extracts 30 important features {feature
no. : 1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40, 41, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19
and 22}. The unknown detection method extracts 24 important features {feature no. : 1, 2, 12, 25, 26, 27,
28, 30, 31, 35, 37, 38, 39, 40, 41, 3, 4, 23, 24, 29, 32, 33, 34 and 36}. 15 features {feature no.: 1, 2, 12,
25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40 and 41} are selected by both methods. The C5.0 method is used
as a classifier. The experimental results are shown in Table 5.
Table 5: Results for known and unknown attacks
                          Known attack                                   Unknown attack
Parameter                 Full Set (41)   Known detection method (30)    Full Set (41)   Unknown detection method (24)
Overall TP %              97.95           98.12                          53.31           68.28
Overall FP %              2.04            1.87                           46.69           31.72
Time to Build Model(s)    75              51                             75              45
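The overlap between the two feature subsets reported by Suebsing and Hiransakolwong can be checked directly with set operations. In the Python sketch below the feature numbers are transcribed from the lists quoted in the text; the variable names are illustrative.

```python
# Features selected by the known-attack detection method (30 features).
known = {1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40, 41,
         5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 22}

# Features selected by the unknown-attack detection method (24 features).
unknown = {1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40, 41,
           3, 4, 23, 24, 29, 32, 33, 34, 36}

common = sorted(known & unknown)       # selected by both methods
only_known = sorted(known - unknown)   # specific to known attacks
only_unknown = sorted(unknown - known) # specific to unknown attacks

if __name__ == "__main__":
    print(len(known), len(unknown), len(common))
    print("both:", common)
    print("known only:", only_known)
    print("unknown only:", only_unknown)
```

The same pattern applies to any pair of feature lists in this section, for example to see which features different studies agree on.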
A new approach named Quantitative Intrusion Intensity Assessment (QIIA) is proposed in
[32] (Lee et al., 2009). QIIA evaluates the proximity of each instance of audit data using proximity
metrics based on Random Forests (RF), and selects important features using the numerical feature
importance of RF. Two approaches, QIIA1 and QIIA2, are proposed to determine the values of the
threshold parameters. The top 5 important features selected are {feature no.: 23, 32, 10, 6 and 3}. Only
DoS attacks are used since the other attack types have a very small number of instances. The
experimental results show that the detection rates (DR) of QIIA1 and QIIA2 are 97.94 and 99.37
respectively.
An entropy-based traffic profiling scheme for detecting security attacks is presented in [33] (Lee
and He, 2009). The paper focuses only on denial-of-service (DoS) attacks. The top six features ranked
by accuracy are {feature no.: 5, 6, 31, 32, 36 and 37}. The true positive rate (TPR) of this scheme is
91%.
Xiao et al. (2009) [34] presented a two-step feature selection algorithm that eliminates two kinds of
features: irrelevant features in the first step and redundant features in the second. 21 features {feature
no.: 1, 3, 4, 5, 6, 8, 11, 12, 13, 23, 25, 26, 27, 28, 29, 30, 32, 33, 34, 36 and 39} are selected and evaluated
using the C4.5 algorithm and Support Vector Machine (SVM). The Detection Rate (%), False Alarm
Rate (%) and processing time with the selected features (all features) are 86.3 (87.0), 1.89 (1.85) and
15.163 sec (21.891 sec) respectively.
A novel approach for selecting features and comparing the performance of various BN
classifiers is proposed in [35] (Khor et al., 2009). Two feature selection algorithms, the Correlation-based
Feature Selection Subset Evaluator (CFSE) and the Consistency Subset Evaluator (CSE), together with
domain experts, are utilised to form the proposed feature set. This feature set contains 7 features
{feature no.: 3, 6, 12, 23, 32*, 14* and 40*}, where * marks features that were selected based on domain
knowledge. A Bayesian Network (BN) is employed as the classifier. The classification accuracies of the
BN for the Normal, DoS, Probe, R2L and U2R types are 99.8, 99.9, 89.4, 91.5 and 69.2% respectively.
Bahrololum et al. (2009) [36] used three machine learning methods : Decision Tree(DT),
Flexible Neural Tree (FNT) and Particle Swarm Optimization (PSO) for feature selection. The five
important features {feature no.: 10, 17, 14, 13 and 11} are selected depending on the contribution of the
variables for the construction of the decision tree. The experimental results are shown in (Table 6).
Table 6 : Detection Performance using DT, FNT and PSO Methods
Attack Class DT FNT PSO
Normal 9.96% 99.19% 95.69%
DoS 100% 98.75% 90.41%
R2L 99.02% 99.09% 98.10%
U2R 88.33% 99.70% 100%
Probe 99.66% 98.39% 95.53%
An automatic feature selection method based on filter method is proposed by Nguyen et al.
(2010) [37]. The globally optimal subset of relevant features is found by the Correlation Feature
Selection (CFS) and evaluated by C4.5 and BayesNet. The selected features for Normal&DoS are 3 {5, 6
and 12}; for Normal&Probe are 6 {5, 6, 12, 29, 37 and 41}; for Normal&U2R is 1 {14}; for
Normal&R2L are 2 {10 and 22}. Average classification accuracies of C4.5 and BayesNet are 99.41%
and 98.82% respectively.
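Correlation Feature Selection scores a subset by rewarding correlation with the class and penalizing correlation among the selected features. The Python sketch below computes the commonly cited CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Whether [37] uses exactly this form is an assumption, and the search for the globally optimal subset described by Nguyen et al. is not reproduced; the correlation values in the example are hypothetical.

```python
from math import sqrt
from statistics import mean


def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a subset of k features.

    feature_class_corr: k correlations between each feature and the class.
    feature_feature_corr: correlations for each distinct pair of features
        (k*(k-1)/2 values; empty when k == 1).
    """
    k = len(feature_class_corr)
    if k == 0:
        raise ValueError("subset must contain at least one feature")
    r_cf = mean(feature_class_corr)
    r_ff = mean(feature_feature_corr) if feature_feature_corr else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


if __name__ == "__main__":
    # Hypothetical correlations, for illustration only.
    print(cfs_merit([0.5], []))           # a single feature
    print(cfs_merit([0.5, 0.5], [1.0]))   # a fully redundant second feature
    print(cfs_merit([0.5, 0.5], [0.0]))   # an uncorrelated second feature
```

In the example, adding a fully redundant feature leaves the merit unchanged, while adding an equally relevant but uncorrelated feature raises it, which is the behaviour that makes the criterion prefer small, non-redundant subsets.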
Chen et al. [38] (2010) proposed a novel inconsistency-based feature selection method. Data
consistency is applied to find the optimal features and evaluated by decision tree method (C4.5). The
proposed method is compared with CFS (Table 7).
Table 7 : Performance Comparison (CC: Classification Correctness)
Attack   All features      Proposed Method                               CFS Method
Type     CC(%)   Time(s)   Features                    CC(%)   Time(s)   Features                                CC(%)   Time(s)
Probe    99.85   0.66      4(3,5,35,36)                99.77   0.16      4(5,6,25,37)                            94.35   0.27
DoS      99.94   1.08      4(3,4,10,23)                99.81   0.22      4(2,5,16,22)                            99.32   0.33
U2R      100     0.11      2(3,41)                     100     0.09      9(3,10,24,29,31,32,33,34,40)            100     0.08
R2U      98.99   0.22      5(3,5,12,32,35)             99.13   9.13      5(3,5,10,24,33)                         98.05   0.11
All      99.5    3.72      8(1,3,5,25,32,34,36,40)     99.45   0.48      11(2,3,4,5,6,10,23,24,25,36,37)         99.67   6.28
varGDLF, a novel unsupervised statistical approach consisting of a variational framework for the
GD mixture model with localized feature selection (GDLF), is proposed in [39] (Fan et al., 2011) for
detecting network based attacks. Eleven features {feature no.: 1, 5, 12, 15, 18, 21, 22, 29, 33, 38 and 41}
are selected. The performance of the varGDLF approach is compared with four other variational mixture
models
and it outperforms them, with the highest accuracy rate (85.2%), the lowest FP rate (7.3%) and the most
accurately detected number of components (4.95). The accuracy rates for Normal, DoS, R2L, U2R and
Probing are 99.5, 96.5, 75.4, 69.6 and 85.1% respectively; the corresponding FP rates are 11.5, 0.8, 1.4,
11.5 and 11.3%.
An improved information gain (IIG) algorithm based on feature redundancy is proposed in
[40] (Xian et al., 2011). Twenty-two features are selected by applying the Information Gain (IG)
algorithm, and then 12 features {feature no.: 2, 3, 5, 6, 8, 10, 12, 23, 25, 36, 37 and 38} are selected by
applying IIG. Naive Bayes (NB) is used to carry out the experiments on three feature sets: the original
feature set (41 features), feature subset 1 (22 features) and feature subset 2 (12 features). The processing
times (s) for the three feature sets are 8.34, 4.16 and 2.08; the Detection Rates (DR) (%) are 96.187,
96.407 and 96.801; and the False Positive Rates (FPR) (%) are 5.22, 2.58 and 1.02 respectively.
Wrapper Method
In [41] (Middlemiss and Dick, 2003), a simple Genetic Algorithm (GA) is used to evolve
weights for the features, and a k-nearest neighbour (KNN) classifier is used both as the fitness function
of the GA and as the classifier. The top five ranked features for each class are selected {DoS-23,29,1,11,24;
R2U-24,3,12,23,36; U2R-24,6,31,41,17; Probe-2,37,30,3,6}. The reported results indicate an increase in
intrusion detection accuracy.
Mukkamala and Sung (2003) [42] presented two methods to rank the important features:
(1) the Performance-Based Ranking Method (PBRM) and (2) the Support Vector Decision Function
Ranking Method (SVDFRM). With PBRM, the union of the important features for each of the 5 classes
contains 31 features; with SVDFRM, it contains 23. The 8 important features identified by both ranking
methods are {feature no.: 1, 3, 5, 6, 23, 24, 32 and 33}. Experiments with both methods are performed
using an SVM classifier (Table 8). Future Work: Ongoing experiments include 23-class (22 attack
classes plus normal) feature identification using SVMs.
Table 8 : Performance of SVMs
          Ranked by PBRM (31)                                 Ranked by SVDFRM (23)
Class     Training Time(s)   Testing Time(s)   Acc.(%)        Training Time(s)   Testing Time(s)   Acc.(%)
Normal    7.67               1.02              99.51          4.85               0.82              99.55
Probe     44.38              2.07              99.67          36.23              1.4               99.71
DOS       18.64              1.41              99.22          7.77               1.32              99.2
U2R       3.23               0.98              99.87          1.72               0.75              99.87
R2L       9.81               1.01              99.78          5.91               0.88              99.78
An Ant Colony Optimization (ACO) based intrusion feature selection algorithm is proposed in
[43] (Gao et al., 2005). The Fisher discrimination rate is adopted as the heuristic information for the
ants’ traversal. A Least Squares based SVM classifier is adopted as the base classifier to evaluate the
generated feature subsets. The number of features selected by the ACO-SVM method is 11 for Probe, 9
for DoS, and 14 for U2R & R2L. The feature names are not given in the paper. Table 9 shows the
experimental results.
Table 9: Performance of ACO-SVM
Type       #Feature   Correct Classification Rates   False Positive Rates   Average Detection Time
Probe      11         99.40%                         0.35%                  0.074
DoS        9          95.20%                         3.24%                  0.031
U2R&R2L    14         98.70%                         1.60%                  0.078
Banković et al. (2007) [44] investigated the possibility of increasing the detection rate
(DR) of U2R attacks in misuse detection. The features extracted using Principal Component
Analysis (PCA) and Multi Expression Programming (MEP) are {U2R-14, 33; DoS- 1, 5, 39; Normal- 3,
10, 12}. A genetic algorithm is employed to implement rules for detecting the various types of attacks.
Two additional rule sets are deployed to re-check the decision of the rule set for detecting U2R
attacks. The experiments (Table 10) show that this system outperforms the best-performing model
reported in the literature.
Table 10. Performance of the System
#Rules   DR                                FPR
         Total System   U2R Rule System    Total System   U2R Rule System
50       50             46.3               0.0055         0.007
75       77.8           77.8               7.2            10.2
100      100            100                16.54          27.4
Chen et al. (2007) [45] presented a wrapper based feature selection method. A random search
method named modified random mutation hill climbing (MRMHC) is introduced as search strategy to
select features subsets and Support Vector Machines (SVMs) as classifier. The experiments are shown in
Table 11. Future Work: This method can be improved on search strategy and evaluation criterion.
Table 11: Selected feature subsets, time of the selecting process for different feature selection
algorithms, and average time of the building and testing processes for ALL attacks, DOS, PROBE,
R2L and U2R
Attack Type                                     ALL            DOS          PROBE         R2L     U2R
#Features                                       5              4            5             3       5
Selected features                               3,5,23,33,34   5,12,23,34   1,3,5,23,37   1,5,6   1,3,6,14,33
Time of Selecting Process(h)     GA-SVMs        1.3            0.5          4             1.5     1.5
                                 MRMHC-SVMs     0.4            0.2          2.2           0.8     0.6
Avg. Time to Build Process(s)    All            78             136          245           317     193
                                 Selected       30             31           96            24      78
Avg. Time to Test Process(s)     All            18             22           49            55      50
                                 Selected       6              5            17            7       15
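The search strategy named in [45] builds on random mutation hill climbing. The Python sketch below shows plain RMHC over feature subsets: flip one randomly chosen feature in or out, and keep the change if the evaluation does not get worse. The modifications that make up MRMHC are not described in this survey and are not reproduced; the function name, the iteration count and the toy scoring function (which stands in for an SVM-based evaluation) are assumptions.

```python
import random


def rmhc_feature_search(n_features, evaluate, iterations=200, seed=0):
    """Plain random mutation hill climbing over feature masks.

    `evaluate(mask)` takes a list of booleans (one per feature) and
    returns a score where higher is better.
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    best_score = evaluate(current)
    for _ in range(iterations):
        candidate = current[:]
        j = rng.randrange(n_features)    # pick one feature at random
        candidate[j] = not candidate[j]  # mutate: add or remove it
        score = evaluate(candidate)
        if score >= best_score:          # keep non-worsening mutations
            current, best_score = candidate, score
    selected = [i for i, used in enumerate(current) if used]
    return selected, best_score


if __name__ == "__main__":
    # Toy evaluation (assumption): 0-based features 2 and 4 are useful,
    # and every selected feature costs 0.1.
    useful = {2, 4}

    def toy_score(mask):
        chosen = {i for i, used in enumerate(mask) if used}
        return len(chosen & useful) - 0.1 * len(chosen)

    print(rmhc_feature_search(6, toy_score))
```

In a wrapper setting each call to `evaluate` trains and tests a classifier, so the number of iterations largely determines the selection time reported in Table 11.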
A multi-objective genetic fuzzy intrusion detection system (MOGFIDS) is proposed by Tsang et
al. (2007) [46]. The MOGFIDS is used as a genetic wrapper to search for a near-optimal feature subset.
The 27 features selected by MOGFIDS are {feature no.: 2 (tcp, udp, icmp), 5, 6, 7, 8, 9, 11, 12, 13, 14,
17, 18, 22, 23, 25, 30, 32, 33, 34, 35, 36, 37, 38, 39 and 40}. The MOGFIDS has second highest ACC
(99.24%) and lowest FPR (1.1%) among the wrappers in the paper. Future Work: This can be applied to
other complex problem domains such as face recognition and DNA computing.
This paper [47] (Wang and Gombault, 2008) proposed a system that extracts important features
from raw network traffic only for DDoS attacks in real computer networks. The first 9 important features
{feature no.: 23, 32, 37, 33, 5, 24, 31, 39 and 3} based on rank are selected by Information Gain and Chi-
square method and evaluated by Bayesian Networks and decision trees (C4.5) shown in Table 12. Future
Work: A practical real-time system for fast detection of DDoS attacks can be developed.
Table 12: Detection Rate, False Positive Rate and Construction Time Results
Method                  C4.5              BN
#Features               9       41        9       41
DR                      99.8    99.8      99.6    99.0
FPR                     0.3     0.3       1.6     1.5
Training Time (s)       1.7     15.3      0.7     4.4
Testing Time (s)        0.2     0.9       0.2     0.9
Features Construction Time (same for both methods): 237 s with 9 features, 2043 s with 41 features
Li et al. (2009) [48] proposed a wrapper-based feature selection method to build a lightweight
intrusion detection system. A modified Random Mutation Hill Climbing (RMHC) method is applied as
the search strategy to find a candidate feature subset, and a modified linear Support Vector Machine
(SVM) is used to evaluate the candidate feature subset. A classification algorithm based on a decision
tree whose nodes consist of linear SVMs is used to build the IDS from the selected feature subsets. The
experiments show
that the resulting systems have higher ROC (Receiver Operating Characteristic) scores than those built
with all 41 features in terms of detecting known attacks, new attacks and computational cost (Table 13).
Table 13: Selected feature subsets and average time of the building and testing processes with all and
selected features for ALL attacks, DOS, PROBE, R2L and U2R
Attack Type   Features            Building time(s)                    Testing time(s)
                                  All features   Selected features    All features   Selected features
ALL           4(3,5,23,32)        78             36                   18             8
DOS           4(2,5,23,34)        136            41                   22             9
PROBE         6(1,3,5,6,23,35)    245            123                  49             29
R2L           3(1,3,5)            317            35                   55             8
U2R           5(1,3,5,14,32)      193            85                   50             18
Ali et al. (2010) [49] improve the accuracy of the Signature Detection Classification
(SDC) model by applying feature extraction based on customized features. Features are extracted from
the customized features using GA (Genetic Algorithm), two-second-time and Hidden Markov. Eleven
features {feature no.: 5, 6, 13, 23, 24, 25, 26, 33, 36, 37 and 38} are extracted and the best signature
detection classification model is developed using JRip, Ridor, PART and Decision tree. The extracted
features increased the detection rates by between 0.4% and 9% and reduced the false alarm rates by
between 0.17% and 0.5%.
Gong et al. (2011) [50] proposed a novel approach for feature selection based on Genetic
Quantum Particle Swarm Optimization (GQPSO) for network intrusion detection. A Support Vector
Machine (SVM) is used as the classification algorithm. The selected features and experimental results
are shown in Table 14.
Table 14 : Selected Features and Performance of SVM with GQPSO Algorithm
Attack Type   Features                                      Training Time(ms)   Detecting Time(ms)   DR      Error Report Rate(%)
DoS           10 (2, 6, 3, 12, 21, 22, 31, 26, 28, 30)      0.0627              0.0581               99.98   0
Probe         5 (5, 12, 26, 32, 34)                         0.0431              0.0478               91.77   0.001
R2L           7 (10, 23, 25, 29, 26, 33, 35)                0.053               0.014                98.26   0
U2R           5 (2, 3, 17, 32, 36)                          0.0006              0.0016               100     0.0003
Li et al. (2012) [51] proposed an effective wrapper-based feature reduction method, called the
gradually feature removal (GFR) method. The GFR method extracted 19 critical features {feature no.: 2,
4, 8, 10, 14, 15, 19, 25, 27, 29, 31, 32, 33, 34, 35, 36, 37, 38 and 40}. The SVM classifier achieves an
accuracy of 98.6249% and an MCC (Matthews correlation coefficient) of 0.861161. The training and
testing time of the SVM classifier is greatly reduced.
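The Matthews correlation coefficient reported for GFR summarizes all four cells of the confusion matrix in a single value between -1 and 1, which makes it informative when attack and normal classes are imbalanced. A minimal Python sketch of the standard formula follows; the function name and example counts are hypothetical, and returning 0.0 for a zero denominator is a common convention adopted here as an assumption.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(matthews_corrcoef(tp=50, tn=50, fp=0, fn=0))    # perfect prediction
    print(matthews_corrcoef(tp=25, tn=25, fp=25, fn=25))  # no better than chance
    print(matthews_corrcoef(tp=0, tn=0, fp=50, fn=50))    # always wrong
```

Unlike accuracy, the coefficient stays near zero for a classifier that labels every record with the majority class.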
An advanced intelligent system using ensemble soft computing techniques is proposed by
Sindhu et al. (2012) [52] for a lightweight IDS to detect anomalies in networks. GA (Genetic Algorithm)
is used to extract the feature subset and a neurotree paradigm is proposed as the classifier. The 16
features extracted by this method are {feature no.: 2, 3, 4, 5, 6, 8, 10, 12, 24, 25, 29, 35, 36, 37, 38 and
40}. The detection rate is 98.4%, which is reported to be superior to other methods.
Hybrid Method
In [53] (Ng et al., 2003), a feature importance ranking methodology based on the
stochastic radial basis function neural network output sensitivity measure (RBFNN-SM) is presented.
RBFNN-SM is used to evaluate the features for only the normal class and six classes of denial of service
(DoS) attack. The experiments show that the 8 most significant sensitive features {feature no.: 2, 24, 23,
29, 32, 34, 33 and 36} are enough to classify normal and DoS attacks. The computation time is reduced
from 23 seconds to 9 seconds. The classification accuracy for normal and DOS
attacks are 99.77% and 99.06%; the FAR for 8 (41) features are 0.18% (0.01%) and 0.27% (0.03%); the
FPR are 0.93% (0.70%); and training and testing are 0.94% and (0.71%) respectively.
Shazzad and Park (2005) [54] proposed a fast hybrid feature selection method to determine an
optimal feature set. This method is a fusion of Correlation-based Feature Selection (CFS), Support
Vector Machine (SVM) and Genetic Algorithm (GA). Subsets of features are generated by the Genetic
Algorithm and evaluated by CFS and SVM. The 12 selected features are {feature no.: 1, 6, 12, 14, 23,
24, 25, 31, 32, 37, 40 and 41}. On average, the optimal subset has a DR of 99.56% and an FPR of 37.5%.
Chebrolu, Abraham and Thomas (2005) [7] investigated the performance of two feature
selection techniques, Bayesian Networks (BN) and Classification and Regression Trees (CART), and
developed an ensemble classifier of both techniques for building an IDS, reported as best in classifying
R2L and DoS. Seventeen important features {feature no.: 1, 2, 3, 5, 7, 8, 11, 12, 14, 17, 22, 23, 24, 25,
26, 30 and 32} are selected by the Markov blanket model, and a classifier is constructed using BN and
tested. Twelve features {feature no.: 3, 5, 6, 12, 23, 24, 25, 28, 31, 32, 33 and 35} are selected by the
decision tree, and a classifier using CART is constructed and tested. The Normal class is classified 100%
correctly and the accuracies of classes U2R and R2L increase with the 12-variable reduced data set. It is
observed that CART classifies accurately on smaller data sets. In the ensemble approach, the BN
classifier and the CART models are first constructed individually. The ensemble approach is then used
for the 12, 17 and 41-variable data sets. With the ensemble model, Normal, Probe and DoS could be
detected with 100% accuracy and U2R and R2L with 84% and 99.47% accuracy respectively.
Chen et al. (2007) [55] proposed a new hybrid approach named C4.5-PCA-C4.5. It uses PCA
(Principal Component Analysis) and the decision tree classifier C4.5 as the feature selection
method and C4.5 as the classifier. The important features extracted are {feature no.: 33, 34, 4, 1,
3, 10 and 22}. The performance of C4.5-PCA-C4.5 is compared with four other systems: C4.5-ALL,
C4.5-PCA, SVM-CFS and SVM-CFS-SVM. The experimental results show that C4.5-PCA-C4.5 has
fast training and testing processes, the highest TPR and the lowest FPR. The average building
process time for C4.5-PCA-C4.5 is 6 sec.
Lee et al. (2007) [56] used two machine learning algorithms: Random Forests (RF) for feature
selection and the Minimax Probability Machine (MPM) for intrusion detection. The top 5 important
features {feature no.: 23, 6, 29, 3 and 5} are selected. Only Denial of Service (DoS) attacks are
used. The detection rate is 99.84% and the average simulation time is 0.1039 sec.
Wei Wang et al. (2008) [57] used a combined filter and wrapper scheme for feature selection. An
information gain (IG) based filter model and a wrapper model based on Bayesian networks (BN) and
decision trees (C4.5) are employed to select features for network intrusion detection, with
Bayesian networks (BN) and decision trees (C4.5) also used as classifiers. The experimental results
and the 10 features selected for each class are shown in Table 15.
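Table 15. Results comparison using 41 features and 10 features
Attack | Features Selected | Method | DR (41 features) | FPR (41 features) | Training Time (s) (41 features) | Test Time (s) (41 features) | DR (10 features) | FPR (10 features) | Training Time (s) (10 features) | Test Time (s) (10 features)
DoS | 3, 4, 5, 6, 8, 10, 13, 23, 24, 37 | BN | 98.73 | 0.08 | 4.7 | 2.1 | 100 | 0 | 0.8 | 0.6
DoS | 3, 4, 5, 6, 8, 10, 13, 23, 24, 37 | C4.5 | 99.96 | 0.15 | 16.3 | 1.2 | 100 | 0.14 | 4.6 | 0.5
DDoS | 3, 4, 5, 6, 8, 10, 13, 23, 24, 37 | BN | 99.03 | 1.53 | - | - | 99 | 1.92 | - | -
DDoS | 3, 4, 5, 6, 8, 10, 13, 23, 24, 37 | C4.5 | 99.8 | 0.26 | - | - | 100 | 0.34 | - | -
Probe | 3, 4, 5, 6, 29, 30, 32, 35, 39, 40 | BN | 92.89 | 6.08 | 3.1 | 2.8 | 83 | 3.06 | 0.5 | 0.4
Probe | 3, 4, 5, 6, 29, 30, 32, 35, 39, 40 | C4.5 | 82.59 | 0.04 | 14.5 | 1.1 | 83 | 0.05 | 1.2 | 0.3
R2L | 1, 3, 5, 6, 12, 22, 23, 31, 32, 33 | BN | 92.22 | 0.33 | 2.6 | 1.8 | 89 | 0.32 | 0.5 | 0.4
R2L | 1, 3, 5, 6, 12, 22, 23, 31, 32, 33 | C4.5 | 80.29 | 0.02 | 10.5 | 0.8 | 87 | 0.01 | 0.5 | 0.2
U2R | 1, 2, 3, 5, 10, 13, 14, 32, 33, 36 | BN | 75.86 | 0.29 | 2.6 | 1.8 | 66 | 0.12 | 0.4 | 0.4
U2R | 1, 2, 3, 5, 10, 13, 14, 32, 33, 36 | C4.5 | 24.14 | 0 | 9.9 | 0.7 | 24 | 0 | 0.6 | 0.2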
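As an illustration of the information gain (IG) filter step used by Wang et al. [57], the sketch below computes IG(class; feature) = H(class) - H(class | feature) for a discrete feature. It is a minimal example under assumptions: the function names and the toy values are hypothetical, and continuous KDD Cup’99 features would first need to be discretized, which is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in class entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class within each group of equal feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy usage: a feature that separates the classes perfectly has IG equal to H(class).
protocol = ["tcp", "tcp", "udp", "udp"]
classes = ["attack", "attack", "normal", "normal"]
gain = information_gain(protocol, classes)
```

A filter of this kind ranks all 41 features by their IG score and passes only the highest-ranked ones to the wrapper stage.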
Hong and Haibo (2009) [58] proposed a new hybrid selection algorithm to build a lightweight
network IDS. The Chi-Square and enhanced C4.5 algorithms are used for feature selection in the
preprocessing phase. The top fifteen most important features extracted by the Chi-Square algorithm are
{feature no.: 5, 3, 23, 35, 4, 8, 30, 34, 36, 6, 33, 38, 24, 25 and 2}. The top five features extracted by
the C4.5 and C4.5-Chi2 methods are {feature no.: 25, 4, 2, 5 and 29} and {feature no.: 5, 3, 4, 8 and 25},
respectively. The experimental results are shown in Table 16.
Table 16: Detection and False Positive Rate Results based on C4.5-Chi2
Attack Type | DR | FPR
Normal | 99.9 | 1.6
DOS | 99.3 | 1.48
Probe | 93.87 | 1.82
U2R | 50.01 | 28.32
R2L | 61.55 | 12.17
Training Time: 0.02 sec; Testing Time: 0.03 sec
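The Chi-Square ranking step of Hong and Haibo [58] can be illustrated with the standard Pearson chi-square statistic between a discrete feature and the class label. This is a sketch only: the function names and the dictionary layout of `columns` are assumptions, and the enhanced C4.5 stage of the hybrid method is not reproduced.

```python
from collections import Counter


def chi_square(feature_values, labels):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    class_totals = Counter(labels)
    statistic = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in class_totals.items():
            expected = value_count * label_count / n
            statistic += (observed[(value, label)] - expected) ** 2 / expected
    return statistic


def rank_by_chi_square(columns, labels, top_k):
    """Return the top_k feature numbers with the largest chi-square statistic.

    columns: dict mapping feature number -> list of discrete values (hypothetical layout)
    """
    scores = {feature: chi_square(values, labels) for feature, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A larger statistic indicates a stronger dependence between the feature and the class, so a call such as `rank_by_chi_square(columns, labels, 15)` would yield a top-fifteen list analogous to the one reported above.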
Xiang et al. (2009) [59] presented a hybrid method named Robust Artificial Intelligence
Selection Algorithm (RAIS). Mutual information and an artificial intelligence method are used
for feature subset selection, and SVMs are used as the classifier. The selected features are not
mentioned in the paper. The experimental results show that the RAIS algorithm has the lowest false
alarm rate, 3.49%, the highest accuracy, 99.01%, and the highest detection rate, 99.27%.
Zaman and Karray (2009) [60] proposed a novel and simple method named Enhanced Support
Vector Decision Function (ESVDF) for feature selection. This method utilizes the Support Vector
Machines (SVMs) approach based on the Forward Selection Ranking (FSR) and Backward Elimination
Ranking (BER) algorithms. The ESVDF (SVDF/FSR or SVDF/BER) method applies SVDF in the FSR
and BER approaches to select the most effective feature set. Two classifiers, Neural Networks (NNs)
and SVMs, are used to evaluate the features. The experimental results are shown in Table 17. The
names of the selected features are not mentioned.
Table 17: Comparison of ESVDF/FSR, ESVDF/BER, and All 41 Features using NNs and SVMs classifiers.
Classifier | Algorithm | #Features | Accuracy | FPR | Training Time | Testing Time
NN | ESVDF/FSR | 6 | 99.55% | 0.0032 | 217.57 | 0.047
NN | ESVDF/BER | 9 | 99.57% | 0.003 | 255.047 | 0.053
NN | None | 41 | 99.65% | 0.0036 | 911.68 | 0.075
SVM | ESVDF/FSR | 6 | 99.46% | 0.0033 | 2.039 | 0.052
SVM | ESVDF/BER | 9 | 99.58% | 0.0031 | 2.1 | 0.046
SVM | None | 41 | 99.71% | 0.0032 | 5.182 | 0.17
Ming-Yang Su (2011) [61] proposed a feature selection method to detect DoS/DDoS attacks
in real time for designing an anomaly-based NIDS. A genetic algorithm (GA) combined with KNN
(k-nearest-neighbor) is used for feature selection and weighting. The result of KNN classification
is used as the fitness function in the genetic algorithm to evolve the weight vectors of the
features. An initial set of 35 features is weighted in the training phase. The top 19 features are
considered for known attacks and the top 28 features for unknown attacks. The extracted features are
not mentioned in the paper. An overall accuracy rate of 97.42% is obtained for known attacks and 78%
for unknown attacks.
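The idea of using KNN classification accuracy as the GA fitness of a feature-weight vector can be sketched as follows. This is a toy illustration under assumptions, not the procedure of [61]: the leave-one-out evaluation, the one-point crossover, the Gaussian mutation, the population settings and all function names are choices made only for this example.

```python
import random


def weighted_knn_accuracy(weights, samples, labels, k=1):
    """Leave-one-out accuracy of k-NN under a feature-weighted squared Euclidean distance."""
    correct = 0
    for i, x in enumerate(samples):
        distances = []
        for j, z in enumerate(samples):
            if i == j:
                continue
            d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, z))
            distances.append((d, labels[j]))
        distances.sort(key=lambda pair: pair[0])
        votes = [label for _, label in distances[:k]]
        predicted = max(set(votes), key=votes.count)
        if predicted == labels[i]:
            correct += 1
    return correct / len(samples)


def evolve_weights(samples, labels, generations=20, pop_size=10, seed=0):
    """Evolve a weight vector in [0, 1]^n whose fitness is the weighted k-NN accuracy."""
    rng = random.Random(seed)
    n = len(samples[0])

    def fitness(w):
        return weighted_knn_accuracy(w, samples, labels)

    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]      # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]               # one-point crossover
            idx = rng.randrange(n)                  # mutate one weight, clamped to [0, 1]
            child[idx] = min(1.0, max(0.0, child[idx] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

After evolution, the features can be ordered by their weights and the highest-weighted ones retained, which mirrors the "top 19" and "top 28" selections described above.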
A SYSTEMATIC REVIEW OF RELATED WORK
The feature selection work described above is summarized systematically by approach: filter
methods in Table 18, wrapper methods in Table 19 and hybrid methods in Table 20. Each table lists the
literature reference, the name of the proposed method, the number of features selected in the paper,
the feature numbers according to Table 1, the classifier used to evaluate the proposed method, the
evaluation criteria and the results of the proposed method.
Table 18: Summary of Filter Method
Lit. Ref. | Year | Method Name | No. of Features | Feature No. | Classifier Used | Evaluation Criteria | Result
[21] | 2004 | FSMDB | 24 | 6, 5, 1, 34, 33, 36, 32, 8, 27, 29, 28, 30, 26, 38, 39, 35, 13, 24, 23, 11, 3, 10, 12, 4 | BP Network, SVM | Classification Accuracy | BP-0.1017; SVM-0.056
[22] | 2004 | NNPCA & NLCA | 19; 12 | 5, 6, 1, 22, 21, 31, 30, 3, 4, 2, 16, 10, 13, 34, 32, 27, 24, 37, 23 | NC & DC | FPR; Detection Accuracies | Table 3
[23] | 2005 | RICGA | 12 | Not mentioned | BP Network | Classification Accuracy | 88.15%
[24] | 2006 | Rough Set | 6 | 41, 32, 24, 4, 5, 3 | Rough Set | Classification Accuracy | 99.743
[25] | 2006 | Combined DA and SVM | 9 | 12, 23, 32, 2, 24, 36, 31, 29, 39 | SVM | TN (%); FP (%); FN (%); TP (%) | 99.58%; 00.42%; 09.93%; 90.07%
[26] | 2006 | Information Gain and Chi-Square approach | 12 | 3, 5, 6, 10, 13, 23, 24, 27, 28, 37, 40, 41 | ME | Accuracy; Testing Time | Table 4
[27] | 2006 | Artificial Neural Networks and Statistical Methods | 25 | 35, 27, 41, 28, 40, 30, 34, 3, 33, 12, 37, 24, 29, 2, 13, 8, 36, 10, 26, 39, 22, 25, 5, 1, 38 | RBP Neural Network | Accuracy; FPR; FNR | 97.04%; 2.76%; 0.20%
[28] | 2007 | Decision Dependent Correlation (DDC) | 20 | 3, 5, 40, 24, 2, 10, 41, 36, 8, 13, 27, 28, 22, 11, 14, 17, 18, 7, 9, 15 | SVM | Classification Accuracy | 93.46%
[29] | 2008 | Chi Square, Info Gain and ReliefF | 20 | 2, 3, 4, 5, 12, 22, 23, 24, 27, 28, 30, 31, 32, 33, 34, 35, 37, 38, 40, 41 | Decision Tree (C4.5) | Classification Accuracy | 95.8506%; 95.8506%; 95.6432%
[30] | 2009 | PCA-SOM | 10 | Not mentioned | SOM | Avg. Success Rate | 98.83%
[31] | 2009 | Euclidean Distance & Cosine Similarity | 15 | 1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40, 41 | C5.0 | Table 5 | Table 5
[32] | 2009 | (1) QIIA1 (Max value); (2) QIIA2 (Center Data) | 5 | 23, 32, 10, 6, 3 | (1); (2) | DR | (1) 97.94; (2) 99.37
[33] | 2009 | Entropy-Based Scheme with Chi-Square | 5 | 5, 6, 31, 32, 36, 37 | Chi-Square Test | TPR | 91%
[34] | 2009 | Mutual Information based Algorithm | 21 | 1, 3, 4, 5, 6, 8, 11, 12, 13, 23, 25, 26, 27, 28, 29, 30, 32, 33, 34, 36, 39 | C4.5 & SVM | DR; FAR; Process. Time | 86.3; 1.89; 15.163 s
[35] | 2009 | Proposed feature set using CFSE and CSE | 7 | 3, 6, 12, 23, 32*, 14*, 40* | BN | Classification Accuracy (%) | Normal-99.8; DoS-99.9; Probe-89.4; R2L-91.5; U2R-69.2
[36] | 2009 | Based on DT, FNT and PSO | 5 | 10, 17, 14, 13, 11 | DT, FNT and PSO | Detection Accuracy | Table 6
[37] | 2010 | M01LP from CFS | 3; 6; 1; 2 | Normal&Dos-5, 6, 12; Normal&Probe-5, 6, 12, 29, 37, 41; Normal&U2R-14; Normal&R2L-10, 22 | C4.5; BayesNet | Classification Accuracy | 99.41%; 98.82%
[38] | 2010 | Inconsistency-based feature selection method | Table 7 | Table 7 | C4.5 | Classification Correctness; Time (s) | Table 7
[39] | 2011 | varGDLF | 11 | 1, 5, 12, 15, 18, 21, 22, 29, 33, 38, 41 | varGDLF | Accuracy Rate; FPR; No of Comp. | 85.2%; 7.3%; 4.95
[40] | 2011 | IIG (Improved Information Gain) | 12 | 2, 3, 5, 6, 8, 10, 12, 23, 25, 36, 37, 38 | NB | DR; FPR; Processing Time | 96.801; 1.02; 2.08 s
*: Features that were selected based on domain knowledge.
Table 19: Summary of Wrapper Method
Lit. Ref. | Year | Method Name | No. of Features | Feature No. | Classifier Used | Evaluation Criteria | Result
[41] | 2003 | GA combination with a k-nearest neighbour classifier | 5 for each class | DoS-23, 29, 1, 11, 24; R2U-24, 3, 12, 23, 36; U2R-24, 6, 31, 41, 17; Probe-2, 37, 30, 3, 6 | KNN | Detection Accuracy | Increase in ID Accuracy
[42] | 2003 | PBRM and SVDFRM | 8 | 1, 3, 5, 6, 23, 24, 32, 33 | SVM | Table 8 | Table 8
[43] | 2005 | ACO-SVM | Table 9 | Not Mentioned | SVM | Table 9 | Table 9
[44] | 2007 | PCA & MEP | 8 | 14, 33, 1, 5, 39, 3, 10, 12 | GA | DR; FPR | Table 10
[45] | 2007 | MRMHC-SVMs | Table 11 | Table 11 | SVM | Table 11 | Table 11
[46] | 2007 | MOGFIDS | 27 | 2 (tcp, udp, icmp), 5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 22, 23, 25, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40 | MOGFIDS | Accuracy; FPR | 99.24%; 1.1%
[47] | 2008 | Information Gain and Chi-square | 9 | 23, 32, 37, 33, 5, 24, 31, 39, 3 | C4.5 & BN | Table 12 | Table 12
[48] | 2009 | Modified RMHC and modified linear SVM | Table 13 | Table 13 | Decision Tree | Table 13 | Table 13
[49] | 2010 | Features Selection based on Customized Features | 11 | 5, 6, 13, 23, 24, 25, 26, 33, 36, 37, 38 | JRip, Ridor, PART & Decision tree | DR; FAR | Increased; Decreased
[50] | 2011 | GQPSO | Table 14 | Table 14 | SVM | Table 14 | Table 14
[51] | 2012 | GFR (Gradually Feature Removal) | 19 | 2, 4, 8, 10, 14, 15, 19, 25, 27, 29, 31, 32, 33, 34, 35, 36, 37, 38, 40 | SVM | Training time (s); Testing time (s); Accuracy (%); MCCavg | 0.118356; 4.63227; 98.6249; 0.861161
[52] | 2012 | A combined GA and neurotree method | 16 | 2, 3, 4, 5, 6, 8, 10, 12, 24, 25, 29, 35, 36, 37, 38, 40 | Neurotree | DR | 98.38
Table 20: Summary of Hybrid Method
Lit. Ref. | Year | Method Name | No. of Features | Feature No. | Classifier Used | Evaluation Criteria | Result
[53] | 2003 | RBFNN-SM | 8 | 2, 24, 23, 29, 32, 34, 33, 36 | RBFNN | Class. Acc.; FAR; FPR | 99.415%; 0.065%; 0.935%
[54] | 2005 | A fusion of CFS, SVM & GA | 12 | 1, 6, 12, 14, 23, 24, 25, 31, 32, 37, 40, 41 | SVM | DR; FPR | 99.56%; 37.5%
[7] | 2005 | Markov blanket model and Decision Tree for feature selection | 17-BN; 12-CART | {1, 2, 3, 5, 7, 8, 11, 12, 14, 17, 22, 23, 24, 25, 26, 30, 32}; {3, 5, 6, 12, 23, 24, 25, 28, 31, 32, 33, 35} | Ensemble of BN and CART | Accuracy (%) | 100% - Normal, DoS, Probe; 84% - U2R; 99.47 - R2L
[55] | 2007 | C4.5-PCA-C4.5 | 5 | 33, 34, 4, 1, 3, 10, 22 | C4.5 | Testing Time; TPR; FPR | 6 sec; -; -
[56] | 2007 | RF | 5 | 23, 6, 29, 3, 5 | MPM | DR; Avg Sim. Time | 99.84%; 0.1039 s
[57] | 2008 | Information gain & BN and C4.5 | 10 | Table 15 | BN & C4.5 | DR; FPR | Table 15
[58] | 2009 | C4.5-Chi2 | 5 | 5, 3, 4, 8, 25 | Enhanced C4.5 | Table 16 | Table 16
[59] | 2009 | RAIS | - | Not mentioned | SVM | DR; FAR; Accuracy | 99.17%; 3.49%; 98.60%
[60] | 2009 | ESVDF/FSR; ESVDF/BER | 6; 9 | Not mentioned | NN; SVM | Table 17 | Table 17
[61] | 2011 | GA/KNN Hybrid | 19; 28 | Not Mentioned | GA/KNN | Accuracy Rate | 97.42%; 78.00%
CONCLUSIONS & FUTURE RESEARCH DIRECTIONS
Intrusion Detection Systems (IDS) have become a vital and necessary component of almost
every computer and network security system. As network speeds increase, there is an emerging need
for IDS to be lightweight, efficient and accurate, with high detection rates (DR) and low false alarm
rates (FAR). Other difficulties faced by intrusion detection systems are the curse of feature
dimensionality and emerging data complexities. Feature selection has therefore become a very
important part of intrusion detection systems. Feature selection selects a subset of relevant
features and removes irrelevant and redundant features from the dataset, in order to build a robust,
efficient, accurate and lightweight intrusion detection system that is timely enough for real-time
use.
Researchers have proposed many feature selection methods for intrusion detection systems to
deal with these problems. This paper surveys this fast-developing field and addresses the main
contributions of feature selection research proposed for intrusion detection. We showed why
feature selection is vital in IDS. We surveyed the existing feature selection methods for IDS,
categorised as filter, wrapper and hybrid. In Sections 5 and 6, we also presented the performance of
these methods based on different metrics on the KDD Cup’99 dataset, together with the extracted
feature sets, the classifiers used to evaluate them, and the strengths, limitations and future work
of the proposed methods. The following are useful future research issues:
FUTURE RESEARCH
A single classifier for evaluating the extracted feature set may no longer be a good solution for
building a robust IDS. Therefore, designing more sophisticated classifiers by combining multiple
classifiers, or by combining ensemble [7] and hybrid classifiers, may enhance the robustness and
performance of IDS.
After comparing the existing feature selection methods in intrusion detection, we found that
finding an optimal feature set still needs to be researched.
Feature selection algorithms still need improvement in search strategy and evaluation criterion for
building efficient and lightweight intrusion detection systems.
The robustness of the extracted features can be enhanced by using an ensemble of feature selection
methods, combined with appropriate evaluation criteria.
After surveying these many feature selection methods, we cannot say which method performs
best under which classifier for intrusion detection (to the best of our knowledge).
Most of the proposed methods work on two-class classification (normal and attack type) (to the best
of our knowledge). Very little work has been done on multiple-class classification (five classes: four
classes of attack and one normal class) [62][63]. Therefore, the research in many papers can be
extended in the future to multiple-class classification.
Classes in KDD Cup’99 are unbalanced in both the training and test sets, as can be seen in Table 1.
The Normal and DoS classes have enough instances, whereas Probe and R2L, and particularly U2R, have
few instances. These classes (Probe, R2L, U2R) have poor classification rates due to the small number
of instances in the training set [56][31][39]. Developing methods, combined with appropriate
evaluation criteria, that alleviate the effect of the small number of instances is therefore a topic
for future research.
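As a small illustration of the ensemble idea for feature selection mentioned above, the sketch below counts how often each feature number appears across several of the hybrid-method subsets reported in Section 5 and keeps the features chosen by at least a given number of methods. It is an assumption-laden example rather than a recommended procedure: simple vote counting is only one possible aggregation rule, the function name and the vote threshold are hypothetical, and the surveyed subsets were obtained for different attack classes and classifiers, so the outcome is illustrative only.

```python
from collections import Counter

# Feature subsets as reported in Section 5 for five hybrid methods.
reported_subsets = {
    "[53] RBFNN-SM": [2, 24, 23, 29, 32, 34, 33, 36],
    "[54] CFS+SVM+GA": [1, 6, 12, 14, 23, 24, 25, 31, 32, 37, 40, 41],
    "[55] C4.5-PCA-C4.5": [33, 34, 4, 1, 3, 10, 22],
    "[56] RF": [23, 6, 29, 3, 5],
    "[58] C4.5-Chi2": [5, 3, 4, 8, 25],
}


def features_with_min_votes(subsets, min_votes):
    """Return the sorted feature numbers selected by at least min_votes methods."""
    votes = Counter(f for features in subsets.values() for f in set(features))
    return sorted(f for f, count in votes.items() if count >= min_votes)


# Features on which at least three of the five methods agree.
consensus = features_with_min_votes(reported_subsets, 3)
```

With these five subsets, only features 3 and 23 receive three votes, which shows how little the reported subsets overlap.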
We can conclude that some features are truly significant in classifying normal traffic and attack
types, as reported in the literature. Also, there is no generic classifier that best classifies all
the attack types, as seen in this survey. Different researchers use different classifiers to evaluate
the feature set. This paper systematically summarized the contributions of each researcher and also
outlined a number of significant research problems in this field. We hope that this survey will
provide readers with useful insights, a broad overview and new research directions in this field.
REFERENCES
[1] Mitra, P. et al. (2002). Unsupervised Feature Selection Using Feature Similarity. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 24, 301–312
[2] Anderson, J. P. (1980). Computer security threat monitoring and surveillance. Technical Report
98-17, James P. Anderson Co., Fort Washington, Pennsylvania, USA
[3] Denning, D. E. (1987). An intrusion detection model. IEEE Transactions on Software
Engineering, 13(2), 222-232
[4] Wu, S.X. & Banzhaf, W. (2010). The use of computational intelligence in intrusion detection
systems: A review. Applied Soft Computing Journal, 10, 1–35
[5] Lazarevic, A., Ertoz, L., Kumar V., Ozgur A. & Srivastava J. (2003). A comparative study of
anomaly detection schemes in network intrusion detection. In Proc. of the SIAM Conference on Data
Mining
[6] Kumar, S. & Spafford, E. H. (1994). A pattern matching model for misuse intrusion detection. In
Proceedings of the 17th National Computer Security Conference, 11-21
[7] Chebrolu, S. et al. (2005). Feature deduction and ensemble design of intrusion detection systems.
Computer Security, 24( 4), 295–307
[8] Yeung, D.Y. & Ding, Y. (2003). Host-based intrusion detection using dynamic and static
behavioral models. Pattern Recognition, 36, 229-243
[9] KDD Cup 1999 Intrusion detection dataset:
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[10] Mukkamala, S. et al. (2005). Intrusion detection using an ensemble of intelligent paradigms.
Journal of Network and Computer Applications, 28(2), 167–82
[11] Lewis, P. M. (1962). The characteristic selection problem in recognition system. IRE
Transaction on Information Theory, 8, 171-178
[12] John, G.H. et al. (1994). Irrelevant Features and the Subset Selection Problem. Proc. of the 11th
Int. Conf. on Machine Learning, Morgan Kaufmann Publishers, 121-129
[13] Dash, M. & Liu, H. (1997). Feature Selection for Classification. Intelligent Data Analysis, 1(3),
131–56
[14] Blum, A. L. & Langley, P. (1997). Selection of relevant features and examples in machine
learning. Artificial Intelligence, 97(1-2), 245–271
[15] Dash, M. et al. (2002). Feature Selection for Clustering-a Filter Solution. Proc. 2nd Int’l Conf.
Data Mining, 115-122
[16] Włodzisław, W. Tomasz et al. (2003). Feature Selection and Ranking Filters.
[17] Das, S. (2001). Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection. Proc. 18th
Int’l Conf. Machine Learning, 74-81
[18] Liu, H. & Yu, L. (2005). Towards integrating feature selection algorithms for classification and
clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491-502
[19] Kohavi, R. & John, G. H. (1997). Wrappers for Feature Subset Selection. Artificial Intelligence,
97(1-2), 273-324
[20] Xing, E. et al. (2001). Feature Selection for High-Dimensional Genomic Microarray Data. Proc.
15th Int’l Conf. Machine Learning, 601-608
[21] Zhang, L. et al. (2004). Feature Selection for Pattern Classification Problems. Proceedings of
the Fourth International Conference on Computer and Information Technology (CIT’04)
[22] Kuchimanchi, Gopi K. et al. (2004). Dimension Reduction Using Feature Extraction Methods
for Real-time Misuse Detection Systems. Proceedings of the 2004 IEEE Workshop on Information
Assurance and Security United States Military Academy, West Point, NY, 195-202
[23] Zhu, Y. et al. (2005). Modified Genetic Algorithm based Feature Subset Selection in Intrusion
Detection System. Proceedings of ISCIT 2005, 9-12
[24] Zainal, A. et al. (2006). Feature selection using rough set in intrusion detection. In Proc. IEEE
TENCON, 1-4
[25] Wong, Wai-Tak & Lai, Cheng-Yang (2006). Identifying Important Features for Intrusion
Detection Using Discriminant Analysis and Support Vector Machine. Proceedings of the Fifth
International Conference on Machine Learning and Cybernetics, Dalian, 3563-3567
[26] Yang, L. et al. (2006). A Lightweight Intrusion Detection Model Based on Feature Selection and
Maximum Entropy Model. International Conference on Communication Technology (ICCT '06), 1-4
[27] Tamilarasan, A. et al. (2006). Feature Ranking and Selection for Intrusion Detection Using
Artificial Neural Networks and Statistical Methods. Int’l Joint Conf. on Neural Networks (IJCNN’06),
4754-4761
[28] Fadaeieslam, M. J. et al. (2007). Comparison of two feature selection methods in Intrusion
Detection Systems. Seventh International Conference on Computer and Information Technology, 83-86
[29] Sheen, Shina & Rajesh, R. (2008). Network Intrusion Detection using Feature Selection and
Decision tree classifier. IEEE Region 10 Conference, TENCON 2008, 1-4.
[30] Kiziloren, T. & Germen, E. (2009). Anomaly Detection with Self-Organizing Maps and Effects
of Principal Component Analysis on Feature Vectors. Fifth Int’l Conf. on Natural Computation, 509-513
[31] Suebsing, A. & Hiransakolwong, N. (2009). Feature Selection Using Euclidean Distance and
Cosine Similarity for Intrusion Detection Model. Asian Conf. on Intelligent Info. and Database Systems,
86-91
[32] Lee, S. M. et al. (2009). Quantitative Intrusion Intensity Assessment using Important Feature
Selection and Proximity Metrics. 15th IEEE Pacific Rim Int’l Symposium on Dependable Computing,
127-134
[33] Lee, Tsern-Huei & He, Jyun-De (2009). Entropy-Based Profiling of Network Traffic for
Detection of Security Attack. TENCON, 1-5
[34] Xiao, L. et al. (2009). A Two-step Feature Selection Algorithm Adapting to Intrusion Detection.
International Joint Conference on Artificial Intelligence, 618-622
[35] Kok-Chin Khor et al. (2009). From Feature Selection to Building of Bayesian Classifiers: A
Network Intrusion Detection Perspective. American Journal of Applied Sciences, 6 (11), 1948-1959
[36] Bahrololum, M. et al. (2009). Machine Learning Techniques for Feature Reduction in Intrusion
Detection Systems: A Comparison. Fourth International Conference on Computer Sciences and
Convergence Information Technology (ICCIT), 1091-1095
[37] Nguyen, H. et al. (2010). Improving Effectiveness of Intrusion Detection by Correlation Feature
Selection. 2010 International Conference on Availability, Reliability and Security, 17-24
[38] Chen, T. et al. (2010). A Naive Feature Selection Method and Its Application in Network
Intrusion Detection. 2010 International Conference on Computational Intelligence and Security (CIS),
416-420.
[39] Fan, W. et al. (2011). Unsupervised Anomaly Intrusion Detection via Localized Bayesian
Feature Selection. 2011 11th IEEE International Conference on Data Mining, 1032-1937
[40] Xian, J. et al. (2011). An Algorithm Application in Intrusion Forensics Based on Improved
Information Gain. Web Society (SWS), 3rd Symposium on Date of Conference, 100-104
[41] Middlemiss, Melanie J. & Dick, G. (2003). Weighted Feature Extraction using a Genetic
Algorithm for Intrusion Detection, IEEE, 1669- 1675
[42] Mukkamala, S. & Sung, A. H. (2003). Feature Selection for Intrusion Detection Using Neural
Networks and Support Vector Machines. Journal of the Transportation Research Board of the National
Academics, Transportation Research Record No 1822, 33-39
[43] Gao, Hai-Hua et al. (2005). Ant Colony Optimization based network intrusion feature selection
and detection. Proc. of the Fourth Int’l Conf. on Machine Learning and Cybernetics, Guangzhou, 3871-
75
[44] Banković, Z. et al. (2007). Increasing Detection Rate of User-to-Root Attacks Using Genetic
Algorithms. Int’l Conf. on Emerging Security Information, Systems and Technologies, 48-53
[45] Chen, Y. et al. (2007). Toward Building Lightweight Intrusion Detection System Through
Modified RMHC and SVM. ICON, 83-88
[46] Chi-Ho Tsang et al. (2007). Genetic-fuzzy rule mining approach and evaluation of feature
selection techniques for anomaly intrusion detection. Pattern Recognition, 40, 2373-2391.
[47] Wang, W. & Gombault, S. (2008). Efficient Detection of DDoS Attacks with Important
Attributes. Third International Conference on Risks and Security of Internet and Systems: CRiSIS’2008,
61-67
[48] Li, Y. et al. (2009). Building lightweight intrusion detection system using wrapper-based feature
selection mechanisms. Computers and security, 28(6), 466–75
[49] Zulaiha, A. O. et al. (2010). Improving Signature Detection Classification Model Using Features
Selection based on Customized Features. 10th Int’l Conf. on Intelligent Systems Design and
Applications, 1026-31
[50] Gong, S. (2011). Feature Selection Method for Network Intrusion Based on GQPSO Attribute
Reduction. International Conference on Multimedia Technology (ICMT), 6365 - 6368
[51] Li, Y. et al. (2012). An efficient intrusion detection system based on support vector machines
and gradually feature removal method. Expert Systems with Applications, 39, 424–430
[52] Sindhu, Siva S. et al. (2012). Decision tree based light weight intrusion detection using a
wrapper approach. Expert Systems with Applications, 39, 129–141
[53] Ng, W. W. Y. et al. (2003). Dimensionality Reduction for Denial of Service Detection
Problems using RBFNN Output Sensitivity. Proc. of 2nd Int’l Conf. on Machine Learning and
Cybernetics, Wan, 1293-98
[54] Shazzad, K. M. & Park, J. S. (2005). Optimization of Intrusion Detection through Fast Hybrid
Feature Selection. Proc. of 6th Int’l Conf. on Parallel and Distributed Computing, Applications and
Technologies
[55] Chen, Y. et al. (2007). Building Lightweight Intrusion Detection System Based on Principal
Component Analysis and C4.5 Algorithm. ICACT2007, 2109-2112
[56] Lee, S. M. et al. (2007). A Hybrid Approach for Real-Time Network Intrusion Detection
Systems. International Conference on Computational Intelligence and Security, 712-715
[57] Wang, W. et al. (2008). Towards fast detecting intrusions: using key attributes of network traffic.
The Third International Conference on Internet Monitoring and Protection, 86-91
[58] Hong, D. & Haibo, L. (2009). A Lightweight Network Intrusion Detection Model Based on
Feature Selection. 15th IEEE Pacific Rim International Symposium on Dependable Computing, 165-168
[59] Xiang, C. et al. (2009). Robust Observation Selection for Intrusion detection. Sixth
International Conference on Fuzzy Systems and Knowledge Discovery, 269-272
[60] Zaman, S. & Karray, F. (2009). Features Selection for Intrusion Detection Systems Based on
Support Vector Machines. 6th IEEE Consumer Communications and Networking Conference (CCNC),
1- 8
[61] Ming-Yang Su (2011). Real-time anomaly detection systems for Denial-of-Service attacks by
weighted k-nearest-neighbor classifiers. Expert Systems with Applications, 38, 3492–3498
[62] Bruzzone, L. & Serpico, S. B. (2000). A technique for feature selection in multiclass problems.
International Journal of Remote Sensing, 21(3), 549–563
[63] Chiblovskii, B., & Lecerf, L. (2008). Scalable feature selection for multiclass problems. In Proc. of
the European conf. on machine learning and knowledge discovery in databases (ECML PKDD’08), 227