This document discusses feature selection techniques for intrusion detection systems. It begins with background on intrusion detection and challenges related to large datasets. It then describes three common feature selection algorithms: Correlation-based Feature Selection (CFS), Information Gain (IG), and Gain Ratio (GR). The document proposes a fusion model that applies these three algorithms to select features, then uses genetic algorithm and naive Bayes classification to evaluate performance. It conducted experiments on the KDD Cup 99 intrusion detection dataset to compare the proposed fusion method to the individual feature selection algorithms.
Minkowski Distance based Feature Selection Algorithm for Effective Intrusion ...IJMER
Intrusion Detection System (IDS) plays a major role in the provision of effective security to various types of networks. Moreover, Intrusion Detection System for networks need appropriate rule set for classifying network bench mark data into normal or attack patterns. Generally, each dataset is characterized by a large set of features. However, all these features will not be relevant or fully contribute in identifying an attack. Since different attacks need various subsets to provide better detection accuracy. In this paper an improved feature selection algorithm is proposed to identify the most appropriate subset of features for detecting a certain attacks. This proposed method is based on Minkowski distance feature ranking and an improved exhaustive search that selects a better combination of features. This system has been evaluated using the KDD CUP 1999 dataset and also with EMSVM [1] classifier. The experimental results show that the proposed system provides high classification accuracy and low false alarm rate when applied on the reduced feature subsets
Survey of network anomaly detection using markov chainijcseit
Recently an internet threat has been increased. Our motive is detect the intrusion in the network in concise.
The real time issue such as DoS attack in banking, companies, industries and organization have been
increased significantly IDS has been used in both server and host side. The major challenge is to effectively
predict the periods of threats and protect the server from the unauthorized user. In this study, a novel
probabilistic approach is proposed effectively to detect the network intrusions. It uses a Markov chain for
probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming
data is performed on the basis of the network states.
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHMcscpconf
The purpose of this paper is to describe two objective fuzzy genetics-based learning algorithms
and discusses its usage to detect intrusion in a computer network. Experiments were performed
with KDD-cup data set, which have information on computer networks, during normal behavior
and intrusive behavior. The performance of final fuzzy classification system has been
investigated using intrusion detection problem as a high dimensional classification problem.
This task is formulated as optimization problem with two objectives: To minimize the number of
fuzzy rules and to maximize the classification rate. We show a two-objective genetic algorithm
for finding non-dominated solutions of the fuzzy rule selection problem
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIOSR Journals
Abstract: Network security and Intrusion Detection Systems (IDS’s) is an important security related research
area. This paper applies K-star algorithm with filtering analysis in order to build a network intrusion detection
system. For our experimental analysis and as a case study, we have used the new NSL-KDD dataset, which is a
modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 66.0% for the
training set and the remainder for the testing set a 2 class classifications has been implemented. WEKA which is
a java based open source software consists of a collection of machine learning algorithms for Data mining tasks
has been used in the testing process. The experimental results show that the proposed approach is very accurate
with low false positive rate and high true positive rate and it takes less learning time in comparison with other
existing approaches used for efficient network intrusion detection.
Keywords: Information Gain, Intrusion Detection System, Instance-based classifier, K-Star, Weka.
Visualize network anomaly detection by using k means clustering algorithmIJCNCJournal
With the ever increasing amount of new attacks in today’s world the amount of data will keep increasing,
and because of the base-rate fallacy the amount of false alarms will also increase. Another problem with
detection of attacks is that they usually isn’t detected until after the attack has taken place, this makes
defending against attacks hard and can easily lead to disclosure of sensitive information.
In this paper we choose K-means algorithm with the Kdd Cup 1999 network data set to evaluate the
performance of an unsupervised learning method for anomaly detection. The results of the evaluation
showed that a high detection rate can be achieve while maintaining a low false alarm rate .This paper
presents the result of using k-means clustering by applying Cluster 3.0 tool and visualized this result by
using TreeView visualization tool .
Minkowski Distance based Feature Selection Algorithm for Effective Intrusion ...IJMER
Intrusion Detection System (IDS) plays a major role in the provision of effective security to various types of networks. Moreover, Intrusion Detection System for networks need appropriate rule set for classifying network bench mark data into normal or attack patterns. Generally, each dataset is characterized by a large set of features. However, all these features will not be relevant or fully contribute in identifying an attack. Since different attacks need various subsets to provide better detection accuracy. In this paper an improved feature selection algorithm is proposed to identify the most appropriate subset of features for detecting a certain attacks. This proposed method is based on Minkowski distance feature ranking and an improved exhaustive search that selects a better combination of features. This system has been evaluated using the KDD CUP 1999 dataset and also with EMSVM [1] classifier. The experimental results show that the proposed system provides high classification accuracy and low false alarm rate when applied on the reduced feature subsets
Survey of network anomaly detection using markov chainijcseit
Recently an internet threat has been increased. Our motive is detect the intrusion in the network in concise.
The real time issue such as DoS attack in banking, companies, industries and organization have been
increased significantly IDS has been used in both server and host side. The major challenge is to effectively
predict the periods of threats and protect the server from the unauthorized user. In this study, a novel
probabilistic approach is proposed effectively to detect the network intrusions. It uses a Markov chain for
probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming
data is performed on the basis of the network states.
COMPUTER INTRUSION DETECTION BY TWOOBJECTIVE FUZZY GENETIC ALGORITHMcscpconf
The purpose of this paper is to describe two objective fuzzy genetics-based learning algorithms
and discusses its usage to detect intrusion in a computer network. Experiments were performed
with KDD-cup data set, which have information on computer networks, during normal behavior
and intrusive behavior. The performance of final fuzzy classification system has been
investigated using intrusion detection problem as a high dimensional classification problem.
This task is formulated as optimization problem with two objectives: To minimize the number of
fuzzy rules and to maximize the classification rate. We show a two-objective genetic algorithm
for finding non-dominated solutions of the fuzzy rule selection problem
Intrusion Detection System Based on K-Star Classifier and Feature Set ReductionIOSR Journals
Abstract: Network security and Intrusion Detection Systems (IDS’s) is an important security related research
area. This paper applies K-star algorithm with filtering analysis in order to build a network intrusion detection
system. For our experimental analysis and as a case study, we have used the new NSL-KDD dataset, which is a
modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 66.0% for the
training set and the remainder for the testing set a 2 class classifications has been implemented. WEKA which is
a java based open source software consists of a collection of machine learning algorithms for Data mining tasks
has been used in the testing process. The experimental results show that the proposed approach is very accurate
with low false positive rate and high true positive rate and it takes less learning time in comparison with other
existing approaches used for efficient network intrusion detection.
Keywords: Information Gain, Intrusion Detection System, Instance-based classifier, K-Star, Weka.
Visualize network anomaly detection by using k means clustering algorithmIJCNCJournal
With the ever increasing amount of new attacks in today’s world the amount of data will keep increasing,
and because of the base-rate fallacy the amount of false alarms will also increase. Another problem with
detection of attacks is that they usually isn’t detected until after the attack has taken place, this makes
defending against attacks hard and can easily lead to disclosure of sensitive information.
In this paper we choose K-means algorithm with the Kdd Cup 1999 network data set to evaluate the
performance of an unsupervised learning method for anomaly detection. The results of the evaluation
showed that a high detection rate can be achieve while maintaining a low false alarm rate .This paper
presents the result of using k-means clustering by applying Cluster 3.0 tool and visualized this result by
using TreeView visualization tool .
Outstanding to the promotion of the Internet and local networks, interruption occasions to computer
systems are emerging. Intrusion detection systems are becoming progressively vital in retaining
appropriate network safety. IDS is a software or hardware device that deals with attacks by gathering
information from a numerous system and network sources, then evaluating signs of security complexities.
Enterprise networked systems are unsurprisingly unprotected to the growing threats posed by hackers as
well as malicious users inside to a network. IDS technology is one of the significant tools used now-a-days,
to counter such threat. In this research we have proposed framework by using advance feature selection
and dimensionality reduction technique we can reduce IDS data then applying Fuzzy ARTMAP classifier
we can find intrusions so that we get accurate results within less time. Feature selection, as an active
research area in decreasing dimensionality, eliminating unrelated data, developing learning correctness,
and improving result unambiguousness.
Evaluation of network intrusion detection using markov chainIJCI JOURNAL
Day today life internet threat has been increased significantly. There is a need to develop model in order to
maintain security of system. The most effective techniques are Intrusion Detection System (IDS).The
purpose of intrusion system through the security devices detect and deal with it. In this paper, a
mathematical approach is used effectively to predict and detect intrusion in the network. Here we discuss
about two algorithms ‘K-Means + Apriori’, a method which classify normal and abnormal activities in
computer network. In K-Means process, it partitions the training set into K-clusters using Euclidean
distance and introduce an outlier factor, then it build Apriori Algorithm to prune the data by removing
infrequent data in the database. Based on defined state the degree of incoming data is evaluated through
the experiment using sample DARPA2000 dataset, and achieves high detection performance in level of
attack in stages.
Classification Rule Discovery Using Ant-Miner Algorithm: An Application Of N...IJMER
Enormous studies on intrusion detection have widely applied data mining techniques to
finding out the useful knowledge automatically from large amount of databases, while few studies have
proposed classification data mining approaches. In an actual risk assessment process, the discovery of
intrusion detection prediction knowledge from experts is still regarded as an important task because
experts’ predictions depend on their subjectivity. Traditional statistical techniques and artificial
intelligence techniques are commonly used to solve this classification decision making. This paper
proposes an ant-miner based data mining method for discovering network intrusion detection rules from
large dataset. The obtained result of this experiment shows that clearly the ant-miner is superior than
ID3, J48, ADtree, BFtree, Simple cart. Although different classification models have been developed for
network intrusion detection, each of them has its strength and weakness, including the most commonly
applied Support Vector Machine(SVM)method and the clustering based on Self Organized Ant Colony
Network (CSOACN).Our algorithm is implemented and evaluated using a standard bench mark KDD99
dataset. Experiments show that ant-miner algorithm out performs than other methods in terms of both
classification rate and accuracy
A NOVEL INTRUSION DETECTION MODEL FOR MOBILE AD-HOC NETWORKS USING CP-KNNIJCNCJournal
Mobile ad-hoc network security problems are the subject of in depth analysis. A group of mobile nodes area unit connected to a set wired backbone. In MANET, the node themselves implement the network management in a very cooperative fashion. All the nodes area unit accountable to create a constellation that is dynamically, modification it and conjointly the absence of any clear network boundaries. We tend to project a completely unique intrusion detection model for mobile ad-hoc network victimization. CP-KNN (Conformal Prediction K-Nearest Neighbor) algorithmic rule is to classify the audit knowledge for anomaly detection. The non-conformity score worth is employed to cut back the classification period of time for multi level iteration. It is effectively notice anomalies with high true positive rate, low false positive rate and high confidence that the progressive of assorted anomaly detection ways. Additionally it is interfered
by “noisy” knowledge (unclean data), the projected technique is strong, effective and conjointly it retains
its smart detection performance and to avoid the abnormal activity.
Intrusion detection with Parameterized Methods for Wireless Sensor Networksrahulmonikasharma
Current network intrusion detection systems lack adaptability to the frequently changing network environments. Furthermore, intrusion detection in the new distributed architectures is now a major requirement. In this paper, we propose two Adaboost based intrusion detection algorithms. In the first algorithm, a traditional online Adaboost process is used where decision stumps are used as weak classifiers. In the second algorithm, an improved online Adaboost process is proposed, and online Gaussian mixture models (GMMs) are used as weak classifiers. We further propose a distributed intrusion detection framework, in which a local parameterized detection model is constructed in each node using the online Adaboost algorithm. A global detection model is constructed in each node by combining the local parametric models using a small number of samples in the node. This combination is achieved using an algorithm based on particle swarm optimization (PSO) and support vector machines. The global model in each node is used to detect intrusions. Experimental results show that the improved online Adaboost process with GMMs obtains a higher detection rate and a lower false alarm rate than the traditional online Adaboost process that uses decision stumps. Both the algorithms outperform existing intrusion detection algorithms. It is also shown that our PSO, and SVM-based algorithm effectively combines the local detection models into the global model in each node; the global model in a node can handle the intrusion types that are found in other nodes, without sharing the samples of these intrusion types.
AN EFFICIENT INTRUSION DETECTION SYSTEM WITH CUSTOM FEATURES USING FPA-GRADIE...IJCNCJournal
An efficient Intrusion Detection System has to be given high priority while connecting systems with a network to prevent the system before an attack happens. It is a big challenge to the network security group to prevent the system from a variable types of new attacks as technology is growing in parallel. In this paper, an efficient model to detect Intrusion is proposed to predict attacks with high accuracy and less false-negative rate by deriving custom features UNSW-CF by using the benchmark intrusion dataset UNSW-NB15. To reduce the learning complexity, Custom Features are derived and then Significant Features are constructed by applying meta-heuristic FPA (Flower Pollination algorithm) and MRMR (Minimal Redundancy and Maximum Redundancy) which reduces learning time and also increases prediction accuracy. ENC (ElasicNet Classifier), KRRC (Kernel Ridge Regression Classifier), IGBC (Improved Gradient Boosting Classifier) is employed to classify the attacks in the datasets UNSW-CF, UNSW and recorded that UNSW-CF with derived custom features using IGBC integrated with FPA provided high accuracy of 97.38% and a low error rate of 2.16%. Also, the sensitivity and specificity rate for IGB attains a high rate of 97.32% and 97.50% respectively.
an error in that computer program. In order to improve the software quality, prediction of faulty modules is
necessary. Various Metric suites and techniques are available to predict the modules which are critical and
likely to be fault prone. Genetic Algorithm is a problem solving algorithm. It uses genetics as its model of
problem solving. It’s a search technique to find approximate solutions to optimization and search
problems.Genetic algorithm is applied for solving the problem of faulty module prediction and as well as
for finding the most important attribute for fault occurrence. In order to perform the analysis, performance
validation of the Genetic Algorithm using open source software jEdit is done. The results are measured in
terms Accuracy and Error in predicting by calculating probability of detection and probability of false
Alarms
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...gerogepatton
Network intrusion detection often finds a difficulty in creating classifiers that could handle unequal distributed attack categories. Generally, attacks such as Remote to Local (R2L) and User to Root (U2R) attacks are very rare attacks and even in KDD dataset, these attacks are only 2% of overall datasets. So,these result in model not able to efficiently learn the characteristics of rare categories and this will result in
poor detection rates of rare attack categories like R2L and U2R attacks. We even compared the accuracy of KDD and NSL-KDD datasets using different classifiers in WEKA.
Outstanding to the promotion of the Internet and local networks, interruption occasions to computer
systems are emerging. Intrusion detection systems are becoming progressively vital in retaining
appropriate network safety. IDS is a software or hardware device that deals with attacks by gathering
information from a numerous system and network sources, then evaluating signs of security complexities.
Enterprise networked systems are unsurprisingly unprotected to the growing threats posed by hackers as
well as malicious users inside to a network. IDS technology is one of the significant tools used now-a-days,
to counter such threat. In this research we have proposed framework by using advance feature selection
and dimensionality reduction technique we can reduce IDS data then applying Fuzzy ARTMAP classifier
we can find intrusions so that we get accurate results within less time. Feature selection, as an active
research area in decreasing dimensionality, eliminating unrelated data, developing learning correctness,
and improving result unambiguousness.
Evaluation of network intrusion detection using markov chainIJCI JOURNAL
Day today life internet threat has been increased significantly. There is a need to develop model in order to
maintain security of system. The most effective techniques are Intrusion Detection System (IDS).The
purpose of intrusion system through the security devices detect and deal with it. In this paper, a
mathematical approach is used effectively to predict and detect intrusion in the network. Here we discuss
about two algorithms ‘K-Means + Apriori’, a method which classify normal and abnormal activities in
computer network. In K-Means process, it partitions the training set into K-clusters using Euclidean
distance and introduce an outlier factor, then it build Apriori Algorithm to prune the data by removing
infrequent data in the database. Based on defined state the degree of incoming data is evaluated through
the experiment using sample DARPA2000 dataset, and achieves high detection performance in level of
attack in stages.
Classification Rule Discovery Using Ant-Miner Algorithm: An Application Of N...IJMER
Enormous studies on intrusion detection have widely applied data mining techniques to
finding out the useful knowledge automatically from large amount of databases, while few studies have
proposed classification data mining approaches. In an actual risk assessment process, the discovery of
intrusion detection prediction knowledge from experts is still regarded as an important task because
experts’ predictions depend on their subjectivity. Traditional statistical techniques and artificial
intelligence techniques are commonly used to solve this classification decision making. This paper
proposes an ant-miner based data mining method for discovering network intrusion detection rules from
large dataset. The obtained result of this experiment shows that clearly the ant-miner is superior than
ID3, J48, ADtree, BFtree, Simple cart. Although different classification models have been developed for
network intrusion detection, each of them has its strength and weakness, including the most commonly
applied Support Vector Machine(SVM)method and the clustering based on Self Organized Ant Colony
Network (CSOACN).Our algorithm is implemented and evaluated using a standard bench mark KDD99
dataset. Experiments show that ant-miner algorithm out performs than other methods in terms of both
classification rate and accuracy
A NOVEL INTRUSION DETECTION MODEL FOR MOBILE AD-HOC NETWORKS USING CP-KNNIJCNCJournal
Mobile ad-hoc network security problems are the subject of in depth analysis. A group of mobile nodes area unit connected to a set wired backbone. In MANET, the node themselves implement the network management in a very cooperative fashion. All the nodes area unit accountable to create a constellation that is dynamically, modification it and conjointly the absence of any clear network boundaries. We tend to project a completely unique intrusion detection model for mobile ad-hoc network victimization. CP-KNN (Conformal Prediction K-Nearest Neighbor) algorithmic rule is to classify the audit knowledge for anomaly detection. The non-conformity score worth is employed to cut back the classification period of time for multi level iteration. It is effectively notice anomalies with high true positive rate, low false positive rate and high confidence that the progressive of assorted anomaly detection ways. Additionally it is interfered
by “noisy” knowledge (unclean data), the projected technique is strong, effective and conjointly it retains
its smart detection performance and to avoid the abnormal activity.
Intrusion detection with Parameterized Methods for Wireless Sensor Networksrahulmonikasharma
Current network intrusion detection systems lack adaptability to the frequently changing network environments. Furthermore, intrusion detection in the new distributed architectures is now a major requirement. In this paper, we propose two Adaboost based intrusion detection algorithms. In the first algorithm, a traditional online Adaboost process is used where decision stumps are used as weak classifiers. In the second algorithm, an improved online Adaboost process is proposed, and online Gaussian mixture models (GMMs) are used as weak classifiers. We further propose a distributed intrusion detection framework, in which a local parameterized detection model is constructed in each node using the online Adaboost algorithm. A global detection model is constructed in each node by combining the local parametric models using a small number of samples in the node. This combination is achieved using an algorithm based on particle swarm optimization (PSO) and support vector machines. The global model in each node is used to detect intrusions. Experimental results show that the improved online Adaboost process with GMMs obtains a higher detection rate and a lower false alarm rate than the traditional online Adaboost process that uses decision stumps. Both the algorithms outperform existing intrusion detection algorithms. It is also shown that our PSO, and SVM-based algorithm effectively combines the local detection models into the global model in each node; the global model in a node can handle the intrusion types that are found in other nodes, without sharing the samples of these intrusion types.
AN EFFICIENT INTRUSION DETECTION SYSTEM WITH CUSTOM FEATURES USING FPA-GRADIE...IJCNCJournal
An efficient Intrusion Detection System has to be given high priority while connecting systems with a network to prevent the system before an attack happens. It is a big challenge to the network security group to prevent the system from a variable types of new attacks as technology is growing in parallel. In this paper, an efficient model to detect Intrusion is proposed to predict attacks with high accuracy and less false-negative rate by deriving custom features UNSW-CF by using the benchmark intrusion dataset UNSW-NB15. To reduce the learning complexity, Custom Features are derived and then Significant Features are constructed by applying meta-heuristic FPA (Flower Pollination algorithm) and MRMR (Minimal Redundancy and Maximum Redundancy) which reduces learning time and also increases prediction accuracy. ENC (ElasicNet Classifier), KRRC (Kernel Ridge Regression Classifier), IGBC (Improved Gradient Boosting Classifier) is employed to classify the attacks in the datasets UNSW-CF, UNSW and recorded that UNSW-CF with derived custom features using IGBC integrated with FPA provided high accuracy of 97.38% and a low error rate of 2.16%. Also, the sensitivity and specificity rate for IGB attains a high rate of 97.32% and 97.50% respectively.
an error in that computer program. In order to improve the software quality, prediction of faulty modules is
necessary. Various Metric suites and techniques are available to predict the modules which are critical and
likely to be fault prone. Genetic Algorithm is a problem solving algorithm. It uses genetics as its model of
problem solving. It’s a search technique to find approximate solutions to optimization and search
problems.Genetic algorithm is applied for solving the problem of faulty module prediction and as well as
for finding the most important attribute for fault occurrence. In order to perform the analysis, performance
validation of the Genetic Algorithm using open source software jEdit is done. The results are measured in
terms Accuracy and Error in predicting by calculating probability of detection and probability of false
Alarms
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...gerogepatton
Network intrusion detection often finds a difficulty in creating classifiers that could handle unequal distributed attack categories. Generally, attacks such as Remote to Local (R2L) and User to Root (U2R) attacks are very rare attacks and even in KDD dataset, these attacks are only 2% of overall datasets. So,these result in model not able to efficiently learn the characteristics of rare categories and this will result in
poor detection rates of rare attack categories like R2L and U2R attacks. We even compared the accuracy of KDD and NSL-KDD datasets using different classifiers in WEKA.
Data Mining Techniques for Providing Network Security through Intrusion Detec...IJAAS Team
Intrusion Detection Systems are playing major role in network security in this internet world. Many researchers have been introduced number of intrusion detection systems in the past. Even though, no system was detected all kind of attacks and achieved better detection accuracy. Most of the intrusion detection systems are used data mining techniques such as clustering, outlier detection, classification, classification through learning techniques. Most of the researchers have been applied soft computing techniques for making effective decision over the network dataset for enhancing the detection accuracy in Intrusion Detection System. Few researchers also applied artificial intelligence techniques along with data mining algorithms for making dynamic decision. This paper discusses about the number of intrusion detection systems that are proposed for providing network security. Finally, comparative analysis made between the existing systems and suggested some new ideas for enhancing the performance of the existing systems.
COPYRIGHTThis thesis is copyright materials protected under the .docxvoversbyobersby
COPYRIGHT
This thesis is copyright materials protected under the Berne Convection, the copyright Act 1999 and other international and national enactments in that behalf, on intellectual property. It may not be reproduced by any means in full or in part except for short extracts in fair dealing so for research or private study, critical scholarly review or discourse with acknowledgment, with written permission of the Dean School of Graduate Studies on behalf of both the author and XXX XXX University.ABSTRACT
With Fast growing internet world the risk of intrusion has also increased, as a result Intrusion Detection System (IDS) is the admired key research field. IDS are used to identify any suspicious activity or patterns in the network or machine, which endeavors the security features or compromise the machine. IDS majorly use all the features of the data. It is a keen observation that all the features are not of equal relevance for the detection of attacks. Moreover every feature does not contribute in enhancing the system performance significantly. The main aim of the work done is to develop an efficient denial of service network intrusion classification model. The specific objectives included: to analyse existing literature in intrusion detection systems; what are the techniques used to model IDS, types of network attacks, performance of various machine learning tools, how are network intrusion detection systems assessed; to find out top network traffic attributes that can be used to model denial of service intrusion detection; to develop a machine learning model for detection of denial of service network intrusion.Methods: The research design was experimental and data was collected by simulation using NSL-KDD dataset. By implementing Correlation Feature Selection (CFS) mechanism using three search algorithms, a smallest set of features is selected with all the features that are selected very frequently. Findings: The smallest subset of features chosen is the most nominal among all the feature subset found. Further, the performances using Artificial neural networks(ANN), decision trees, Support Vector Machines (SVM) and K-Nearest Neighbour (KNN) classifiers is compared for 7 subsets found by filter model and 41 attributes. Results: The outcome indicates a remarkable improvement in the performance metrics used for comparison of the two classifiers. The results show that using 17/18 selected features improves DOS types classification accuracies as compared to using the 41 features in the NSL-KDD dataset. It was further observed that using an ensemble of three classifiers with decision fusion performs better as compared to using a single classifier for DOS type’s classification. Among machine learning tools experimented, ANN achieved best classification accuracies followed by SVM and DT. KNN registered the lowest classification accuracies. Application: The proposed work with such an improved detection rate and lesser classification time and lar.
SURVEY OF NETWORK ANOMALY DETECTION USING MARKOV CHAINijcseit
Recently an internet threat has been increased. Our motive is detect the intrusion in the network in concise.
The real time issue such as DoS attack in banking, companies, industries and organization have been
increased significantly IDS has been used in both server and host side. The major challenge is to effectively
predict the periods of threats and protect the server from the unauthorized user. In this study, a novel
probabilistic approach is proposed effectively to detect the network intrusions. It uses a Markov chain for
probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming
data is performed on the basis of the network states.
International Journal of Computer Science, Engineering and Information Techno...ijcseit
Recently an internet threat has been increased. Our motive is detect the intrusion in the network in concise.
The real time issue such as DoS attack in banking, companies, industries and organization have been
increased significantly IDS has been used in both server and host side. The major challenge is to effectively
predict the periods of threats and protect the server from the unauthorized user. In this study, a novel
probabilistic approach is proposed effectively to detect the network intrusions. It uses a Markov chain for
probabilistic modelling of abnormal events in network systems. The degree of abnormality of the incoming
data is performed on the basis of the network states.
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIERCSEIJJournal
The widespread use of the Internet has an adverse effect of being vulnerable to cyber attacks. Defensive
mechanisms like firewalls and IDSs have evolved with a lot of research contributions happening in these
areas. Machine learning techniques have been successfully used in these defense mechanisms especially
IDSs. Although they are effective to some extent in identifying new patterns and variants of existing
malicious patterns, many attacks are still left as undetected. The objective is to develop an algorithm for
detecting malicious domains based on passive traffic measurements. In this paper, an anomaly-based
intrusion detection system based on an ensemble based machine learning classifier called Random Forest
with gradient boosting is deployed. NSL-KDD cup dataset is used for analysis and out of 41 features, 32
features were identified as significant using feature discretion. Our observations confirm the conjecture
that both the feature selection and stochastic based genetic operators improves the accuracy and the
effectiveness. The training time is shown to be reduced tremendously by 98.59% and accuracy improved to
98.75%.
Attack Detection Availing Feature Discretion using Random Forest ClassifierCSEIJJournal
The widespread use of the Internet has an adverse effect of being vulnerable to cyber attacks. Defensive
mechanisms like firewalls and IDSs have evolved with a lot of research contributions happening in these
areas. Machine learning techniques have been successfully used in these defense mechanisms especially
IDSs. Although they are effective to some extent in identifying new patterns and variants of existing
malicious patterns, many attacks are still left as undetected. The objective is to develop an algorithm for
detecting malicious domains based on passive traffic measurements. In this paper, an anomaly-based
intrusion detection system based on an ensemble based machine learning classifier called Random Forest
with gradient boosting is deployed. NSL-KDD cup dataset is used for analysis and out of 41 features, 32
features were identified as significant using feature discretion.
FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ...IJNSA Journal
Intrusion Detection Systems (IDS) form a key part of system defence, where it identifies abnormal
activities happening in a computer system. In recent years different soft computing based techniques have
been proposed for the development of IDS. On the other hand, intrusion detection is not yet a perfect
technology. This has provided an opportunity for data mining to make quite a lot of important
contributions in the field of intrusion detection. In this paper we have proposed a new hybrid technique
by utilizing data mining techniques such as fuzzy C means clustering, Fuzzy neural network / Neurofuzzy and radial basis function(RBF) SVM for fortification of the intrusion detection system. The
proposed technique has five major steps in which, first step is to perform the relevance analysis, and then
input data is clustered using Fuzzy C-means clustering. After that, neuro-fuzzy is trained, such that each
of the data point is trained with the corresponding neuro-fuzzy classifier associated with the cluster.
Subsequently, a vector for SVM classification is formed and in the last step, classification using RBF-
SVM is performed to detect intrusion has happened or not. Data set used is the KDD cup 1999 dataset
and we have used precision, recall, F-measure and accuracy as the evaluation metrics parameters. Our
technique could achieve better accuracy for all types of intrusions. The results of proposed technique are
compared with the other existing techniques. These comparisons proved the effectiveness of our
technique.
Improving the performance of Intrusion detection systemsyasmen essam
Intrusion detection systems (IDS) are widely studied by
researchers nowadays due to the dramatic growth in
network-based technologies. Policy violations and
unauthorized access is in turn increasing which makes
intrusion detection systems of great importance. Existing
approaches to improve intrusion detection systems focus on feature selection or reduction since some features are
irrelevant or redundant which when removed improve the
accuracy as well as the learning time.
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...IDES Editor
Intrusion detection in the internet is an active
area of research. Intruders can be classified into two
types, namely; external intruders who are unauthorized
users of the computers they attack, and internal
intruders, who have permission to access the system but
with some restrictions. The aim of this paper is to present
a methodology to recognize attacks during the normal
activities in a system. A novel classification via sequential
information bottleneck (sIB) clustering algorithm has
been proposed to build an efficient anomaly based
network intrusion detection model. We have compared
our proposed method with other clustering algorithms
like X-Means, Farthest First, Filtered clusters, DBSCAN,
K-Means, and EM (Expectation-Maximization)
clustering in order to find the suitability of our proposed
algorithm. A subset of KDDCup 1999 intrusion detection
benchmark dataset has been used for the experiment.
Results show that the proposed method is efficient in
terms of detection accuracy, low false positive rate in
comparison to the other existing methods.
A novel ensemble modeling for intrusion detection system IJECEIAES
Vast increase in data through internet services has made computer systems more vulnerable and difficult to protect from malicious attacks. Intrusion detection systems (IDSs) must be more potent in monitoring intrusions. Therefore an effectual Intrusion Detection system architecture is built which employs a facile classification model and generates low false alarm rates and high accuracy. Noticeably, IDS endure enormous amounts of data traffic that contain redundant and irrelevant features, which affect the performance of the IDS negatively. Despite good feature selection approaches leads to a reduction of unrelated and redundant features and attain better classification accuracy in IDS. This paper proposes a novel ensemble model for IDS based on two algorithms Fuzzy Ensemble Feature selection (FEFS) and Fusion of Multiple Classifier (FMC). FEFS is a unification of five feature scores. These scores are obtained by using feature-class distance functions. Aggregation is done using fuzzy union operation. On the other hand, the FMC is the fusion of three classifiers. It works based on Ensemble decisive function. Experiments were made on KDD cup 99 data set have shown that our proposed system works superior to well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANNs). Our examinations ensured clearly the prominence of using ensemble methodology for modeling IDSs, and hence our system is robust and efficient.
Feature Selection using the Concept of Peafowl Mating in IDSIJCNCJournal
Cloud computing has high applicability as an Internet based service that relies on sharing computing resources. Cloud computing provides services that are Infrastructure based, Platform based and Software based. The popularity of this technology is due to its superb performance, high level of computing ability, low cost of services, scalability, availability and flexibility. The obtainability and openness of data in cloud environment make it vulnerable to the world of cyber-attacks. To detect the attacks Intrusion Detection System is used, that can identify the attacks and ensure information security. Such a coherent and proficient Intrusion Detection System is proposed in this paper to achieve higher certainty levels regarding safety in cloud environment. In this paper, the mating behavior of peafowl is incorporated into an optimization algorithm which in turn is used as a feature selection algorithm. The algorithm is used to reduce the huge size of cloud data so that the IDS can work efficiently on the cloud to detect intrusions. The proposed model has been experimented with NSL-KDD dataset as well as Kyoto dataset and have proved to be a better as well as an efficient IDS.
Feature Selection using the Concept of Peafowl Mating in IDSIJCNCJournal
Cloud computing has high applicability as an Internet based service that relies on sharing computing resources. Cloud computing provides services that are Infrastructure based, Platform based and Software based. The popularity of this technology is due to its superb performance, high level of computing ability, low cost of services, scalability, availability and flexibility. The obtainability and openness of data in cloud environment make it vulnerable to the world of cyber-attacks. To detect the attacks Intrusion Detection System is used, that can identify the attacks and ensure information security. Such a coherent and proficient Intrusion Detection System is proposed in this paper to achieve higher certainty levels regarding safety in cloud environment. In this paper, the mating behavior of peafowl is incorporated into an optimization algorithm which in turn is used as a feature selection algorithm. The algorithm is used to reduce the huge size of cloud data so that the IDS can work efficiently on the cloud to detect intrusions. The proposed model has been experimented with NSL-KDD dataset as well as Kyoto dataset and have proved to be a better as well as an efficient IDS.
A PROPOSED MODEL FOR DIMENSIONALITY REDUCTION TO IMPROVE THE CLASSIFICATION C...IJNSA Journal
Over the past few years, intrusion protection systems have drawn a mature research area in the field of computer networks. The problem of excessive features has a significant impact on
intrusion detection performance. The use of machine learning algorithms in many previous researches has been used to identify network traffic, harmful or normal. Therefore, to obtain the accuracy, we must reduce the dimensionality of the data used. A new model design based on a combination of feature selection and machine learning algorithms is proposed in this paper. This model depends on selected genes from every feature to increase the accuracy of intrusion detection systems. We selected from features content only ones which impact in attack detection. The performance has been evaluated based on a comparison of several known algorithms. The NSL-KDD dataset is used for examining classification. The proposed model outperformed the other learning approaches with accuracy 98.8 %.
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...IJCNCJournal
The paper proposes a two-phase classification method for detecting anomalies in network traffic, aiming to tackle the challenges of imbalance and feature selection. The study uses Information Gain to select relevant features and evaluates its performance on the CICIDS-2018 dataset with various classifiers. Results indicate that the ensemble classifier achieved the highest accuracy, precision, and recall. The proposed method addresses challenges in intrusion detection and highlights the effectiveness of ensemble classifiers in improving anomaly detection accuracy. Also, the quantity of pertinent characteristics chosen by Information Gain has a considerable impact on the F1-score and detection accuracy. Specifically, the Ensemble Learning achieved the highest accuracy of 98.36% and F1-score of 97.98% using the relevant selected features.
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...IJCNCJournal
The paper proposes a two-phase classification method for detecting anomalies in network traffic, aiming to tackle the challenges of imbalance and feature selection. The study uses Information Gain to select relevant features and evaluates its performance on the CICIDS-2018 dataset with various classifiers. Results indicate that the ensemble classifier achieved the highest accuracy, precision, and recall. The proposed method addresses challenges in intrusion detection and highlights the effectiveness of ensemble classifiers in improving anomaly detection accuracy. Also, the quantity of pertinent characteristics chosen by Information Gain has a considerable impact on the F1-score and detection accuracy. Specifically, the Ensemble Learning achieved the highest accuracy of 98.36% and F1-score of 97.98% using the relevant selected features.
Electrically small antennas: The art of miniaturizationEditor IJARCET
We are living in the technological era, were we preferred to have the portable devices rather than unmovable devices. We are isolating our self rom the wires and we are becoming the habitual of wireless world what makes the device portable? I guess physical dimensions (mechanical) of that particular device, but along with this the electrical dimension is of the device is also of great importance. Reducing the physical dimension of the antenna would result in the small antenna but not electrically small antenna. We have different definition for the electrically small antenna but the one which is most appropriate is, where k is the wave number and is equal to and a is the radius of the imaginary sphere circumscribing the maximum dimension of the antenna. As the present day electronic devices progress to diminish in size, technocrats have become increasingly concentrated on electrically small antenna (ESA) designs to reduce the size of the antenna in the overall electronics system. Researchers in many fields, including RF and Microwave, biomedical technology and national intelligence, can benefit from electrically small antennas as long as the performance of the designed ESA meets the system requirement.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
1. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, No 5, May 2013
1725
www.ijarcet.org
Department of computer science and engineering, Sharda University, Greater Noida
Abstract
The security of information and data is a
critical issue in a computer networked environment.
In our society computer networks are used to store
proprietary information and to provide services for
organizations and society. So in order to secure this
valuable information from unknown attacks
(intrusions) need of intrusion detection system arises.
There are many intrusion detection approaches
focused on the issues of feature reduction as some of
the features are irrelevant or redundant which results
in lengthy detection process and degrading the
performance of an IDS. So in order to design
lightweight IDS we investigate the performance of
three feature selection approaches CFS, Information
Gain and Gain Ratio. In this paper we propose a
fusion model by making use of the three standard
algorithms and finally applying genetic algorithm
that identify important reduced input features. We
apply Naive Bayes classifier on the dataset for
evaluating the performance of the proposed method
over the standard ones. The reduced attributes shows
that proposed algorithm give better performance that
is efficient and effective for detecting intrusions.
Keywords- CFS, InfoGain, GainRatio, Genetic
Algorithm, KDDCup99 dataset, NaiveBayes,
Intrusion Detection.
1. Introduction
The rapid development of computer
networks and mostly of the Internet has created many
challenging issues in network and information
security such as intrusions on computer and network
systems. An intrusion is an attempt to compromise
the integrity, confidentiality, availability of a
resource, or to bypass the security mechanisms of a
computer system or network. James Anderson
introduced the concept of intrusion detection in 1980
[1].These security attacks can cause severe disruption
to data and networks. Therefore, Intrusion Detection
system becomes an important part of every computer
or network system. It monitors computer or network
traffic and identify malicious activities that alerts the
system or network administrator against malicious
attacks.Dorothy Denning proposed several models
for IDS in 1987 [2].
Approaches of IDS based on detection are
categorized either as misuse detection or anomaly
detection:
Misuse detection- Misuse intrusion detection uses
well-defined patterns of the attack that exploit
weaknesses in the system to identify the intrusions
[3].
Anomaly detection – Anomaly detection refers to
techniques that define and characterize normal
behaviors of the system, any deviation from this
expected normal behaviors are considered as
intrusions [3].
Approaches of IDS based on location of monitoring
are categorized either as Network based intrusion
detection system (NIDS) [4] and host based intrusion
detection system (HIDS) [5]:
Fusion of Statistic, Data Mining and Genetic
Algorithm for feature selection in Intrusion
Detection
MeghaAggarwal& Amrita
2. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, No 5, May 2013
1726
www.ijarcet.org
Network based Intrusion detection- NIDS detects
intrusion by monitoring network traffic in terms of IP
packet.
Host based Intrusion detection- HIDS are installed
locally on host machines and detects intrusions by
examining system calls, application logs, other host
activities made by each user on a particular machine.
Due to large volumes of data as well as the
complex and dynamic properties of intrusion
behaviors, identifying intrusion traffic from normal
becomes difficult. Due to this, IDS has to meet the
challenges of low detection rate and large
computation. Therefore, Feature selection is a very
important issue and plays a key role in intrusion
detection in order to achieve maximal performance.
Feature selection is the selection of that minimal
cardinality feature subset of original feature set that
retains the high detection accuracy as the original
feature set [6]. Blum and Langley [7] divide the
feature selection methods into three categories named
filter, wrapper [8] and hybrid (embedded) method.
Network based Intrusion detection- NIDS detects
intrusion by monitoring network traffic in terms of IP
packet.
Host based Intrusion detection- HIDS are installed
locally on host machines and detects intrusions by
examining system calls, application logs, other host
activities made by each user on a particular machine.
Due to large volumes of data as well as the
complex and dynamic properties of intrusion
behaviors, identifying intrusion traffic from normal
becomes difficult. Due to this, IDS has to meet the
challenges of low detection rate and large
computation. Therefore, Feature selection is a very
important issue and plays a key role in intrusion
detection in order to achieve maximal performance.
Feature selection is the selection of that minimal
cardinality feature subset of original feature set that
retains the high detection accuracy as the original
feature set [6]. Blum and Langley [7] divide the
feature selection methods into three categories named
filter, wrapper [8] and hybrid (embedded) method.
Filter method:Filter method [9] uses
external learning algorithm to evaluate the
performance of selected features.
Wrapper method: The wrapper method
[10] “Wrap around” the learning Algorithm. It uses
one predetermined classifier to evaluate subsets of
features. This method is computationally more
expensive than the filter method [11] [10].
Hybrid method: The hybrid method [11]
[12] combines wrapper and filter approach to achieve
best possible performance with a particular learning
algorithm.
The rest of this paper is organized as
follows. In Section 2, we review a background of
feature selection. In section 3, a review of standard
feature selection methods are given. Next, the
proposed feature selection algorithm is presented. In
Section 5, the experimental results are reported, and
an implication and future direction of this study are
discussed in the final section.
2. Background
In [13], the author has proposed a new
hybrid feature selection method – a fusion of
Correlation-based Feature Selection, Support Vector
Machine and Genetic Algorithm – to determine an
optimal feature set. Correlation-based Feature
Selection (CFS) is a filter method. It evaluates merit
of the feature subset. A flow chart is given in this
paper that describes the working of the proposed
hybrid algorithm. The hybrid feature selection
method reduced the computational resource while
maintaining the detection and false positive rate
within tolerable range. The proposed algorithm also
reduces the training time and testing time. Faster
training and testing helps to build lightweight
intrusion detection system.
In paper [14], a feature relevance analysis is
performed on KDD 99 training set, which is widely
used by machine learning researchers. Feature
relevance is expressed in terms of information gain,
which gets higher as the feature gets more
discriminative. In order to get feature relevance
measure for all classes in training set, information
gain is calculated on binary classification, for each
feature resulting in a separate information gain per
class.
In [15], the author has proposed an
automatic feature selection based on the filter method
used in machine learning. In particular, we focus on
Correlation Feature Selection (CFS). By transforming
the CFS optimization problem into a polynomial
mixed 0−1 fractional programming problem and by
introducing additional variables in the problem
transformed in such a way, they obtain a new mixed
0 –1 linear programming problem with a number of
constraints and variables that is linear in the number
of full set features. The mixed 0−1 linear
3. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, No 5, May 2013
1727
www.ijarcet.org
programming problem can then be solved by means
of branch-and-bound algorithm. Their feature
selection algorithm was compared experimentally
with the best-first-CFS and the genetic-algorithm-
CFS methods regarding the feature selection
capabilities. The classification accuracy obtained
after the feature selection by means of the C4.5 and
the Bayes Net machines over the KDD CUP’99 IDS
benchmarking data set was also tested.
In paper [16] the author has incorporated
information gain (IG) method for selecting
discriminative features and triangle area based SVM
by combining k- means clustering algorithm and
SVM as a classifier for detecting attacks.
3. Feature Selection Methods
Basically there are two types of feature
selection methods [20]-
Feature Ranking:
(a) Rank features according to some
criterion and selects the top K
features.
(b) A threshold is needed in advance to
select the top K features.
Feature Subset Evaluator:
(a) Selects the minimum subset of
features that does not deteriorate
learning performance.
(b) No threshold necessary.
3.1 Correlation-based Feature Selection
(CFS):
CFS is basically a feature subset evaluator
method of feature selection. It evaluates
merit of the feature subset on the basis of
hypothesis –“Good feature subsets contains
features highly correlated with the class yet
uncorrelated to each other [17]”.With CFS
as attribute evaluator and search strategy
such as best first is used to search the
feature subset in reasonable time. Equation
1 for calculating CFS is
( 1)
cf
s
ff
kr
M
k k k r
Where Msgives the merit of a feature subset
S, k is the number of features present in the
feature subset rcf is average feature-class
correlation and rff is average feature-feature
correlation [17].
3.2 Info Gain (IG):
Info Gain is basically a feature ranking
method of feature selection. This method
evaluates attributes by measuring their
information gain with respect to the class.
Let C be a set of training set samples with
theircorresponding labels. Suppose there are
m classes and thetraining set contains
Cisamples of class I and C is the
totalnumber of samples in the training set
[14]. Expectedinformation needed to
classify a given sample is calculated by:
1 2 m 2
1
(S ,S ...............S ) log (1)
m
i
i
C Ci
I
C C
A feature F with values { f1, f2, …, f v} can
divide the training set into v subsets { C1,
C2, …, Cv } where Ciis the subset which has
the value fjfor feature F. Furthermore let
Cjcontain Cijsamples of class i. Entropy of
the feature F is
ij mj
1
............
(F) * (C ........ C ) (2)
v
ij mj
j
C C
E I
C
Information gain for F can be calculated as:
1(F) (C ............. ) E(F) (3)mGain I C
3.3 Gain Ratio (GR):
Gain Ratio is also a method of feature
ranking for feature selection. The gain ratio
is an extension of info gain, attempts to
overcome the bias. Gain ratio applies
normalization to info gain using a value
defined as
2
1
(C) ( / )log ( / )
v
f i i
i
SplitInfo C C C C
The value represents the potential
information generated by splitting the training
dataset, C, into v partitions, corresponding to the v
outcomes of a test on attribute A [18].
f(F) (F) / SplitInfo (S)GainRatio Gain
4. Genetic Algorithms:
Genetic algorithms are basically
computerized search and optimization
methods that work very parallel to the
principles of natural evolution. Based on
4. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, No 5, May 2013
1728
www.ijarcet.org
Darwin's survival-of-the-fittest principles,
GA's intelligent search procedure finds the
best and fittest design solutions [22].
Potential solutions to the problem to be
solved are encoded as sequences of bits,
characters or numbers. The unit of encoding
is called a gene, and the encoded sequence is
called a chromosome. Each chromosome
represents one possible solution to the
problem. GA is able to select subsets of
various sizes in order to determine the
optimum combination and number of inputs
to network. A chromosome contains the
information about the solution to a problem,
which it represents. Typically, it can be
encoded using a binary string as follows
[23]:
Chromosome 1 1101100100110110
Chromosome 2 1101111000011110
In which a bit value of 1 in the chromosome
representation means that the corresponding
feature is included in the specified subset,
and a value of 0 indicates that the
corresponding feature is not included in the
subset.
The set of chromosomes during a stage of
evolution are called population. An
evaluation function is used to evaluate the
fitness of each chromosome. During the
process of evaluation crossover and
mutation operator are used to simulate the
natural reproduction and mutation of genes.
Genetic algorithm starts with a randomly
generated population, evolves through
selection, crossover, and mutation. Finally,
the best chromosome is picked up as the
final result. This allows reducing the
computational expense on the training
system with near optimal results still
reachable. Research [21] has shown that GA
is one of the most efficient of all feature
selection methods.
5. Proposed Method:
In this approach detection of intrusions will be
accomplished by using a fusion of feature selection
approaches. There are several existing feature
selection approaches but we will use a fusion of
feature selection approaches by incorporating CFS,
Info Gain, Gain Ratio and finally applying genetic
algorithm (GA) for intrusion detection. The proposed
method is discussed below.
Step1: Select features using CFS (defined in
3.1)
1 2 3 4, , , , 41CFS cfs cfs cfs cfs cfsnS f f f f f n
Step2: (i) Select features using Information
Gain (IG). These features are arranged on
the basis of their rank (defined in 3.2)
_ 1 2 3...................., , , 41.IG T IG IG IG IGnS f f f f n
(ii) From set SIG_T ,select top 30
ranked features.
1 2 3 3( 0) 03 ......................., ....., ...IG IG IG IG IGS f f f f
Step3: (i) Select features using Gain Ratio
(GR). These features are arranged on the
basis of their rank (defined in 3.3)
GR_ 1 2 3.........., , . , 41T GR GR GR GRnS f f f f n
(ii) From set SGR_T ,select top 30
ranked features.
(30) 1 2 3 30...................., ........,GR GR GR GR GRS f f f f
Step4: Apply union operation on the sets obtained
from steps (1), (2) and (3).
(30) (30) )(T CFS IG GRS S S S
Step5: Finally applying Genetic algorithm (GA) on
the set ST.
Step6:Evaluate the performance of the set ST using
Naïve Bayes classifier.
6. Experimental Setup:
We used WEKA 3.7.8 a machine learning
tool [19], to compute the feature selection subsets for
5. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, No 5, May 2013
1729
www.ijarcet.org
CFS, IG, GR and the proposed algorithm and also to
measure the classification performance on each of
these feature sets. We have used
“kddcup.data_10_percent” dataset for evaluating the
performance of the proposed method. Each
connection had a label of either normal or attack type
and the attack type can be further classified into four
categories namely DOS, probe, U2R and R2L.
1. Denial of Service Attack (DOS): Attacks
of this type deprive the host or legitimate
user from using the service or resources.
2. Probe Attack: These attacks automatically
scan a network of computers or a DNS
server to find valid IP addresses.
3. Remote to Local (R2L) Attack: In this
type of attack an attacker who does not have
an account on a victim machine gains local
access to the machine and modifies the data.
4. User to Root (U2R) Attack: In this type of
attack a local user on a machine is able to
obtain privileges normally reserved for the
super (root) users.
We have used naive bayes classifier for
evaluating the performance of our proposed
method.
7. Result
Basically we used three standard methods and one
proposed method for feature reduction. The feature
reduction is performed on 41 features and obtained
11, 30, 30 and 17 features.
Table 1: List of features selected by different
feature selection methods
S.
No
Feature
Selection
Method
Num
ber
of
selec
ted
featu
res
Selected Features
1. CFS+BestF
irst
11 2,3,4,5,6,7,8,14,23,30,3
6
2. InfoGain+R
anker
30 1,2,3,4,5,6,8,10,12,13,2
2,23,24,25,26,
27,28,29,30,31,32,33,34
,35,36,37,38,
39,40,41
3. GainRatio+
Ranker
30 2,3,4,5,6,7,8,10,11,12,1
3,14,22,23,
24,25,26,27,29,30,31,32
,33,34,35,
36,37,38,39,40
4. Proposed
Method
17 2,3,4,5,6,7,8,12,14,23,2
4,25,30,31,33
36,37
Table 2: Performance of feature reduction
methods
Feature
Reduction
Methods
No. of
attribut
es
Time
take
n to
build
mod
el
Time
take
n to
test
mod
el
Accurac
y
CFS+BestFirst 11 1.31s 42.5
1s
91.5749
%
InfoGain
+Ranker
30 0.35s 12.8
8s
99.6249
%
GainRatio+Ran
ker
30 0.3s 12.8
5s
99.6421
%
All Features 41 0.34s 17.2
1s
99.6466
%
Proposed
Method
17 0.21s 8.88s 99.6563
%
The reduced feature set obtained in proposed
algorithm is smallest among the standard feature
selection algorithms and it performs better than other
methods in terms of detection rate and computational
time. The figure below shows comparative graph for
classifier accuracy on the reduced features obtained
by (i) CFS+bestfirst (ii) IG+Ranker (iii)
GR+Ranker(iv) Proposed method.
6. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, No 5, May 2013
1730
www.ijarcet.org
Figure 1: Detection rate
Figure 2: Time taken to build model
Figure 3: Time taken to test model
8. Conclusion and Future Work
We have proposed a new method for
attribute selection by making use of the standard
algorithms i.e. CFS, Information Gain, Gain Ratio
and Genetic Algorithm. By using the proposed
algorithm the result improves in terms of reduction in
feature set, reduction in testing and training time and
also gain increase in detection rate. Future work will
include considering the 4 classes of attack .
References:
[1] Anderson, James P., “Computer Security Threat
Monitoring and Surveillance”, James P.
Anderson Co., Fort Washington, Pa., 1980.
[2] Denning, D. E. (1987), “An intrusion detection
model. IEEE Transaction on
SoftwareEngineering”, Software Engineering
13(2), 222-232.
[3] Bezroukov, Nikolai, "Intrusion Detection (general
issues)." Softpanorama: Open Source Software
Educational Society. Nikolai Bezroukov, URL:
http://www.
softpanorama.org/Security/intrusion
detection.shtml , 2003.
[4] Caruana,R. and Frietag,D. “Greedy Attribute
Selection,” Proc. 11th Int’l Conf. Machine
Learning, pp. 28-36, 1994.
86
88
90
92
94
96
98
100
102
CFS+BestFirst
IG+Ranker
GR+Ranker
AllFeatures
ProposedMethod
DETECTIONRATE
FEATURE SELECTION METHODS
Detection
Rate
0
0.2
0.4
0.6
0.8
1
1.2
1.4
Timetakentobuildmodel
FEATURE SELECTION METHODS
Time
taken to
build
model
0
5
10
15
20
25
30
35
40
45
Timetakentotestmodel
Feature Selection Methods
Time taken
to test
model
7. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, No 5, May 2013
1731
www.ijarcet.org
[5] Yeung, D.Y. & Ding, Y. (2003),”Host-based
intrusion detection using dynamic and static
behavioral models”, Pattern Recognition, 36,
229-243.
[6] Mitra, P. et al. (2002),”Unsupervised Feature
Selection Using Feature Similarity”, IEEE
Transactions on Pattern Analysis and Machine
Intelligence, 24, 301–312.
[7] Blum, Avrim, L. and Langley, P.. “Selection of
relevant features and examples in machine
learning”, Artificial Intelligence, 97(1-2):245–
271, 1997.
[8] Kohavi, R. and John, G. (1997),”Wrappers for
Feature Subset Selection. Artificial
Intelligence”, 97 (1-2), 273-324.
[9] Liu, H., Motoda , H. ,” Feature Selection for
Knowledge Discovery and Data Mining”,
Boston: Kluwer Academic, 1998.
[10] Kim, Y.,Street,W. and Menczer,F.(2000)
“Feature Selection for Unsupervised Learning
via Evolutionary Search,” Proc. Sixth ACM
SIGKDD Int’l Conf. Knowledge Discovery and
Data Mining, pp. 365-369.
[11] Das, S. (2001),” Filters, Wrappers and a
Boosting-Based Hybrid for Feature Selection”,
Proc. 18th Int’l Conf. Machine Learning, 74-81.
[12] Xing, E. et al. (2001)”Feature Selection for
High-Dimensional Genomic Microarray Data”,
Proc.15th Int’l Conf.Machine Learning, 601-
608.
[13] Sridevi,R. and Chattemvelli ,R.(2012) “Genetic
algorithm and Artificial immune systems: A
combinational approach for network intrusion
detection ”,International conference on
advances in engineering, science and
management (ICAESM-2012),494-498.
[14] H. GüneşKayacık, A. NurZincir-Heywood
“Selecting Features for Intrusion Detection:
A Feature Relevance Analysis on KDD 99 Intrusion
Detection Datasets”.
[15] Shazzad, K. and Park, J. (2005)”Optimization of
intrusion detection through fast hybrid feature
selection” , IEEE.
[16] T.Pingjie, J.Rong-an (2010) “Feature selection
and design of intrusion detection system based
on k-means and triangle area support vector
machine”, IEEE.
[17] M.A. Hall, “Correlation-Based Feature Selection
for Discrete and Numeric Class Machine Learning”,
Proc. 17th
Int’l Conf Machine Learning, 2000, pp.
359-366.
[18] j.Han ,M Kamber, Data mining : Concepts and
Techniques. San Francisco, Morgan Kauffmann
Publishers(2001).
[19] http://www.cs.waikato.ac.nz/~ml/weka/
[20]Lei Yu and Huan Liu, "Efficient Feature
Selection via Analysis of Relevance and
Redundancy", Journal of Machine Learing Research
5(2004), pp1205-1224.
[21] M. Kudo, J. Sklansky, "Comparison of
algorithms that select features for pattern classifiers",
Pattern Recognition 33 (2000) 25-41.
[22] H. Pohlheim, "Genetic and Evolutionary
Algorithms: Principles, Methods and Algorithms ",
http://www.geatbx.com/doculindex.html.
[23] L.Y. Zhai, L.P. Khoo, and S.C. Fok, "Feature
extraction using rough set theory and genetic
algorithms and application for the simplification of
product quality evaluation", Computers & Industrial
Engineering, 2002, pp. 661-676.
MeghaAggarwalreceived herB.Tech degree with honors in
Computer science and engineeringfrom UPTU university. She is
pursuing M.Tech in computer science and engineering from
Shardauniversity. Her areas of interest are computer networks and
security.
Ms. Amrita is an Assistant Professor in Department of Computer
Science and Engineering at Sharda University, Greater Noida. She
received her M.Tech. in Computer Science from
BanasthaliVidyapith, Rajasthan. She is currently pursuing her
Ph.D. in Computer Science and Engineering from Sharda
University, Greater Noida (U.P.). She has more than 12 years of
experience in Academics, Software Development Industry and
Government Organization.