Seminar Report | Network Intrusion Detection using Supervised Machine Learning Technique with Feature Selection

Network Intrusion Detection using
Supervised Machine Learning Technique
with Feature Selection
SEMINAR REPORT
Submitted by
JOWIN JOHN CHEMBAN
in partial fulfillment for the award of the degree
of
Bachelor of Technology
in
COMPUTER SCIENCE AND ENGINEERING
of
APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
HOLY GRACE ACADEMY OF ENGINEERING
MALA 680 735
NOVEMBER 2019

CERTIFICATE
This is to certify that the seminar report entitled “Network Intrusion
Detection using Supervised Machine Learning Technique with Feature
Selection” is a bona fide record of the work done by Mr. JOWIN JOHN
CHEMBAN, Register No. HGW16CS022, under our supervision, in partial
fulfillment of the requirements for the award of the Degree of Bachelor of
Technology in Computer Science & Engineering from APJ Abdul Kalam
Technological University, Trivandrum, for the years 2016-2020.
Ms. SUJITHA B CHERKOTTU Ms. VIDHU VALSAN A
Asst Professor, Dept. of CSE Asst Professor, Dept. of CSE
Seminar Coordinator Seminar Guide
Ms. SANAM ANTO
Head of Department, Dept. of CSE
Date :

ACKNOWLEDGEMENT
An endeavor over a long period may be successful only with the advice and
guidance of many well-wishers. I take this opportunity to express my gratitude to all who
encouraged me to complete this seminar. I would like to express my deep sense of
gratitude to my respected Principal Dr. THRESIAMMA PHILIP for her inspiration
and for creating an atmosphere in the college to do the seminar.
I would like to thank Ms. SANAM ANTO, Head of Department of Computer
Science and Engineering, for providing permission and facilities to conduct the seminar
in a systematic way, and for guiding me and giving timely advice, suggestions and
whole-hearted moral support in the successful completion of this seminar.
My sincere thanks to the seminar coordinator Ms. SUJITHA B CHERKOTTU,
Assistant Professor in Department of Computer Science and Engineering for their
wholehearted moral support in completion of this seminar.
My sincere thanks to my seminar guide Ms. VIDHU VALSAN A, Assistant
Professor in Department of Computer Science and Engineering for their wholehearted
moral support in completion of this seminar.
Last but not least, I would like to thank all the lecturers, the non-teaching staff
and my friends who have helped me in every possible way in the completion of my
seminar.
Date : JOWIN JOHN CHEMBAN

NETWORK INTRUSION DETECTION USING SUPERVISED MACHINE LEARNING TECHNIQUE WITH FEATURE SELECTION
HGAE DEPARTMENT OF CSE
ABSTRACT
A novel supervised machine learning system is developed to classify network traffic
as malicious or benign. To find the best model in terms of detection success
rate, combinations of supervised learning algorithms and feature selection methods have
been used. This study finds that Artificial Neural Network (ANN) based
machine learning with wrapper feature selection outperforms the support vector machine
(SVM) technique in classifying network traffic. To evaluate performance, the NSL-KDD
dataset is used to classify network traffic with the SVM and ANN supervised
machine learning techniques. A comparative study shows that the proposed model is
more efficient than other existing models with respect to intrusion detection success rate.

TABLE OF CONTENTS
1 INTRODUCTION
2 LITERATURE SURVEY
2.1 IMPORTANCE OF INTRUSION DETECTION SYSTEM (IDS)
2.2 MACHINE LEARNING TECHNIQUES FOR INTRUSION DETECTION
2.3 ANOMALY-BASED NETWORK INTRUSION DETECTION: TECHNIQUES, SYSTEMS AND CHALLENGES
2.4 INCREMENTAL ANOMALY-BASED INTRUSION DETECTION SYSTEM USING LIMITED LABELED DATA
2.5 A DEEP LEARNING APPROACH FOR NETWORK INTRUSION DETECTION SYSTEM
3 PROPOSED SYSTEM
4 APPLICATIONS
5 CONCLUSION
REFERENCES

LIST OF ABBREVIATIONS
NIDS Network Intrusion Detection System
IDS Intrusion Detection System
UTM Unified Threat Management
IPS Intrusion Prevention System
SVM Support Vector Machine
ANN Artificial Neural Network
DIDS Distributed Intrusion Detection System
CMDS Computer Misuse Detection System
ASIM Automated Security Measurement System
AFCERT Air Force’s Computer Emergency Response Team
TCP Transmission Control Protocol
IP Internet Protocol
HIDS Host based Intrusion Detection System
AI Artificial Intelligence
CI Computational Intelligence
ML Machine Learning
kNN k-Nearest Neighbor
MLP Multi-Layer Perceptron
UDP User Datagram Protocol
GA Genetic Algorithms
KDD Knowledge Discovery in Databases
RBF Radial Basis Function
DoS Denial of Service
R2L Remote to Local
U2R User to Root
PRB Probing
AIS Artificial Immune System
NSA Negative Selection Algorithm
CIDF Common Intrusion Detection Framework
IDWG Intrusion Detection Working Group
IDXP Intrusion Detection eXchange Protocol
IDMEF Intrusion Detection Message Exchange Format
OS Operating System
A-NIDS Anomaly based Network Intrusion Detection System
TP True Positive
FP False Positive
TN True Negative
FN False Negative
LAN Local Area Network
SC Service Classifier
ITI Incremental Tree Inducer
NADAL Network Anomaly Detection using Active Learning
STL Self-Taught Learning
SNIDS Signature (misuse) based Network Intrusion Detection System
ADNIDS Anomaly Detection based Network Intrusion Detection System
NB Naïve Bayesian
RF Random Forests
SOM Self-Organizing Maps
DMNB Discriminative Multinomial Naïve Bayes
END Ensembles of Balanced Nested Dichotomies
OPF Optimum Path Forest
DBN Deep Belief Network
UFL Unsupervised Feature Learning
RBM Restricted Boltzmann Machine
SMR Soft Max Regression
CPU Central Processing Unit

LIST OF FIGURES
2.1.1 Number of incidents reported
2.1.2 Vulnerabilities reported
2.1.3 Layered Security approach for reducing risk
2.2.1 Average of detection rates for methods evaluated in Pavel Laskov, Patrick Düssel, Christin Schäfer, and Konrad Rieck. Learning intrusion detection: Supervised or unsupervised?
2.3.1 General CIDF architecture for IDS systems
2.3.2 Generic A-NIDS functional architecture
2.3.3 Classification of the anomaly detection techniques according to the nature of the processing involved in the ‘‘behavioural’’ model considered
2.4.1 The proposed model called NADAL
2.5.1 The two-stage process of self-taught learning: a) Unsupervised Feature Learning on unlabeled data. b) Classification on labeled data.
2.5.2 Various steps involved in our NIDS implementation
2.5.3 Classification accuracy using self-taught learning and soft-max regression for 2-class, 5-class, and 23-class when applied to training data
2.5.4 Precision, Recall, and F-Measure values using self-taught learning and soft-max regression for 2-class when applied to training data
2.5.5 Classification accuracy using self-taught learning and soft-max regression for 2-class and 5-class when applied to test data
2.5.6 Precision, Recall, and F-Measure values using self-taught learning and soft-max regression for 2-class when applied to test data
2.5.7 Precision, Recall, and F-Measure values using self-taught learning and soft-max regression for 5-class when applied to test data
3.1 Proposed supervised machine learning classifier system
3.2 SVM classifier in two-dimensional problem spaces
3.3 Artificial neural network showing the input, output and hidden layers

LIST OF TABLES
2.3.1 Fundamentals of the A-NIDS techniques
2.4.1 ACCURACY AND KAPPA FOR TEN RANDOMIZATIONS: NADAL VS. INCREMENTAL NAIVE BAYESIAN CLASSIFIER
2.5.1 Traffic records distribution in the training
3.1 RESULT OF FEATURE SELECTION
3.2 RESULT OF CLASSIFICATION
3.3 PERFORMANCE COMPARISON WITH EXISTING MODELS

CHAPTER 1
INTRODUCTION
1.1 NIDS using Supervised Machine Learning with Feature Selection
With the widespread use of the Internet and increasing access to online
content, cybercrime is also occurring at an increasing rate. An intruder, sometimes
called a hacker or cracker, is someone who attempts to break into or misuse a system or
network. Intrusion detection is the first step in preventing a security attack. Hence
security solutions such as firewalls, Intrusion Detection Systems (IDS), Unified Threat
Management (UTM) and Intrusion Prevention Systems (IPS) are receiving much attention
in research. An IDS detects attacks by collecting information from a variety of system
and network sources and then analyzing that information for possible security breaches.
An IDS installed on a network/system serves much the same purpose as a burglar
alarm system installed in a house. Through various methods, both detect when an
intruder/attacker/burglar is present, and both subsequently issue some type of warning
or alert. A network-based IDS analyzes the data packets that travel over a network, and
this analysis is carried out in two ways: signature-based detection and anomaly-based
detection. To date, anomaly-based detection lags well behind signature-based detection,
and hence anomaly-based detection remains a major area of research. The challenge with
anomaly-based intrusion detection is that it must deal with novel attacks for which there
is no prior knowledge with which to identify the anomaly. Hence the system needs the
intelligence to distinguish harmless traffic from malicious or anomalous traffic, and for
that reason machine learning techniques have been explored by researchers over the last
few years. An IDS, however, is not an answer to all security-related problems. For
example, an IDS cannot compensate for weak identification and authentication
mechanisms or for weaknesses in the network protocols.
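The contrast between signature-based and anomaly-based analysis can be made concrete with a minimal sketch. It is an illustration only, not the method of this report: `SIGNATURES` is a hypothetical database of byte patterns, the byte count per connection is an assumed feature, and the three-standard-deviation threshold is an arbitrary choice.

```python
from statistics import mean, stdev

# Hypothetical signature database: byte patterns of known attacks (illustrative only).
SIGNATURES = {
    "dir-traversal": b"../../",
    "nop-sled": b"\x90\x90\x90\x90",
}


def signature_match(payload: bytes) -> list[str]:
    """Signature-based check: names of known attack patterns found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


class ByteCountAnomalyDetector:
    """Anomaly-based check: learn a profile of normal traffic, flag large deviations."""

    def __init__(self, threshold: float = 3.0) -> None:
        self.threshold = threshold  # assumed z-score cut-off
        self.mu = 0.0
        self.sigma = 1.0

    def fit(self, normal_byte_counts: list[float]) -> None:
        # The profile is learned from benign traffic only; no attack examples are needed.
        self.mu = mean(normal_byte_counts)
        self.sigma = stdev(normal_byte_counts) or 1.0  # guard against zero spread

    def is_anomalous(self, byte_count: float) -> bool:
        return abs(byte_count - self.mu) / self.sigma > self.threshold


detector = ByteCountAnomalyDetector()
detector.fit([500, 520, 480, 510, 495, 505])
print(signature_match(b"GET /../../etc/passwd HTTP/1.1"))  # ['dir-traversal']
print(detector.is_anomalous(50_000))  # True
print(detector.is_anomalous(515))     # False
```

A new exploit whose bytes are not in `SIGNATURES` passes the first check unnoticed, while the second check flags anything unusual, benign or not. This is the trade-off between missed novel attacks and false alarms that the text describes.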
Research in the field of intrusion detection started in 1980, and the first intrusion
detection model was published in 1987. Over the last few decades, despite huge commercial
investments and substantial research, intrusion detection technology is still
immature and not fully effective. While signature-based network IDS have seen
commercial success and widespread adoption by technology-based organizations
throughout the globe, anomaly-based network IDS have not achieved success on the same
scale. For that reason, anomaly-based detection is currently a major focus of research
and development in the field of IDS, and key issues remain to be solved before
anomaly-based intrusion detection systems can be deployed on a wide scale. However, the
literature today offers only a limited comparison of how intrusion detection performs
when using supervised machine learning techniques.
Anomaly-based network IDS is a valuable technology for protecting target systems and
networks against malicious activities. Despite the variety of anomaly-based network
intrusion detection techniques described in the literature in recent years, security
tools with anomaly detection functionality are just beginning to appear, and some
important problems remain to be solved. Several anomaly-based techniques have been
proposed, including Linear Regression, Support Vector Machines (SVM), Genetic
Algorithms, Gaussian mixture models, the k-nearest neighbor algorithm, the Naive Bayes
classifier and Decision Trees. Among them, the most widely used learning algorithm is
SVM, as it has already proven itself on many different types of problems. One major
issue with anomaly-based detection is that, although all these proposed techniques can
detect novel attacks, they generally suffer from a high false alarm rate. The cause is
the difficulty of generating profiles of practical normal behaviour by learning from the
training data sets. Today, Artificial Neural Networks (ANN) are often trained by the
back-propagation algorithm, which has been around since 1970 as the reverse mode of
automatic differentiation.
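The remark that back-propagation is reverse-mode automatic differentiation can be checked numerically. The sketch below is an illustration, not the ANN used later in this report: it assumes a tiny network with one sigmoid hidden unit, a linear output and a squared-error loss, computes the gradient with a backward pass, and compares it against central finite differences. The weights and the single training example are arbitrary.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(params: list[float], x: float, y: float) -> tuple[float, tuple[float, float]]:
    """Network y_hat = w2 * sigmoid(w1 * x + b1) + b2, loss = 0.5 * (y_hat - y)**2."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    return 0.5 * (y_hat - y) ** 2, (h, y_hat)


def backprop(params: list[float], x: float, y: float) -> list[float]:
    """Reverse-mode pass: push dLoss/d(node) from the output back to every weight."""
    w2 = params[2]
    _, (h, y_hat) = forward(params, x, y)
    d_yhat = y_hat - y          # dL/dy_hat
    d_w2 = d_yhat * h           # dL/dw2
    d_b2 = d_yhat               # dL/db2
    d_h = d_yhat * w2           # dL/dh
    d_z = d_h * h * (1.0 - h)   # chain rule through sigmoid'(z) = h * (1 - h)
    return [d_z * x, d_z, d_w2, d_b2]  # [dL/dw1, dL/db1, dL/dw2, dL/db2]


def finite_difference(params: list[float], x: float, y: float,
                      eps: float = 1e-6) -> list[float]:
    """Central differences, used only to check the back-propagated gradient."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((forward(up, x, y)[0] - forward(down, x, y)[0]) / (2 * eps))
    return grads


params = [0.5, -0.3, 1.2, 0.1]
for analytic, numeric in zip(backprop(params, 2.0, 1.0), finite_difference(params, 2.0, 1.0)):
    print(f"{analytic:.8f}  {numeric:.8f}")
```

The two columns should agree to many decimal places; the backward pass gets all four derivatives from one sweep, whereas finite differences need two extra forward passes per weight, which is why back-propagation scales to large networks.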
A major challenge in evaluating the performance of network IDS is the
unavailability of a comprehensive network-based data set. Most of the proposed
anomaly-based techniques found in the literature were evaluated using the KDD CUP 99
dataset. In this paper we apply two machine learning techniques, SVM and ANN, to
NSL-KDD, which is a popular benchmark dataset for network intrusion detection.
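To show what a single NSL-KDD example looks like to a classifier, the sketch below parses one line, assuming the layout of the NSL-KDD text files: 41 comma-separated features, then the class label, then a difficulty score. Only the first four feature names are spelled out. `ATTACK_CLASS` is a partial, illustrative mapping of a few well-known KDD attack names to the DoS, PRB, R2L and U2R classes, not a complete list, and the sample record is invented to match the layout rather than copied from the dataset.

```python
from dataclasses import dataclass

# Names of the first four of the 41 features; the remaining ones are kept by position.
LEADING_FEATURES = ["duration", "protocol_type", "service", "flag"]

# Partial, illustrative mapping of KDD attack labels to attack classes (not exhaustive).
ATTACK_CLASS = {
    "neptune": "DoS", "smurf": "DoS", "back": "DoS", "teardrop": "DoS",
    "satan": "PRB", "ipsweep": "PRB", "portsweep": "PRB", "nmap": "PRB",
    "guess_passwd": "R2L", "warezclient": "R2L", "ftp_write": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R",
}


@dataclass
class Record:
    features: list[str]  # the 41 raw feature values, as strings
    label: str           # "normal" or an attack name
    difficulty: int      # assumed trailing difficulty column

    @property
    def attack_class(self) -> str:
        if self.label == "normal":
            return "normal"
        return ATTACK_CLASS.get(self.label, "unknown")

    @property
    def is_attack(self) -> bool:
        # Binary target for the benign-vs-malicious classification task.
        return self.label != "normal"


def parse_line(line: str) -> Record:
    fields = line.strip().split(",")
    if len(fields) != 43:
        raise ValueError(f"expected 43 fields, got {len(fields)}")
    return Record(features=fields[:41], label=fields[41], difficulty=int(fields[42]))


# An invented record laid out like an NSL-KDD line (41 features, label, difficulty).
sample_fields = (["0", "tcp", "ftp_data", "SF", "491"] + ["0"] * 17 + ["2", "2"]
                 + ["0.00"] * 4 + ["1.00", "0.00", "0.00", "150", "25", "0.17", "0.03", "0.17"]
                 + ["0.00"] * 3 + ["0.05", "0.00", "normal", "20"])
rec = parse_line(",".join(sample_fields))
print(dict(zip(LEADING_FEATURES, rec.features)))
print(rec.attack_class, rec.is_attack)  # normal False
```

In practice the categorical features (`protocol_type`, `service`, `flag`) still have to be encoded numerically before SVM or ANN training; that step is not shown here.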
The promise of machine learning, and the contributions it has made to date, are
fascinating. Many real-life applications we use today are powered by machine learning,
and it seems likely to play an ever larger role in the coming years. Hence we arrived at
the hypothesis that the challenge of identifying new or zero-day attacks faced by
technology-enabled organizations today can be overcome using machine learning
techniques. Here we developed a supervised machine learning model that can classify
unseen network traffic based on what it has learnt from seen traffic. We used both the
SVM and ANN learning algorithms to find the best classifier with higher accuracy and
success rate.

CHAPTER 2
LITERATURE SURVEY
2.1 IMPORTANCE OF INTRUSION DETECTION SYSTEM (IDS)
Intruders spread across the Internet have become a major threat. Researchers have
proposed a number of techniques, such as firewalls and encryption, to prevent such
penetration and protect computer infrastructure, but intruders have still managed to
penetrate computers. IDS has therefore attracted much attention from researchers: an IDS
monitors computer resources and sends reports on any anomalous activity or strange
patterns. The aim of this paper is to explain the stages in the evolution of the idea of
IDS and its importance to researchers, research centres, security and the military, and
to examine the categories and classifications of intrusion detection systems and where
an IDS can be placed to reduce the risk to the network.
Security is an important issue for the networks of all companies and institutions
today. Intruders keep trying new ways to gain access to the data networks and Web
services of these companies, despite the development of multiple mechanisms, such as
firewalls and encryption, intended to prevent intrusion into the network infrastructure
via the Internet.
IDS is a relatively new technology among the intrusion detection methods that have
emerged in recent years. The main role of an intrusion detection system in a network is
to help computer systems prepare for and deal with network attacks.
Intrusion detection functions include:
• Monitoring and analyzing both user and system activities
• Analyzing system configurations and vulnerabilities
• Assessing system and file integrity
• Recognizing patterns typical of attacks
• Analyzing abnormal activity patterns
• Tracking user policy violations
To help computer systems deal with attacks, an IDS collects information from several
different sources within computer systems and networks and compares this information
with preexisting patterns to determine whether there are attacks or weaknesses.
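One of the functions listed above, assessing system and file integrity, can be illustrated with a minimal host-based sketch: record a trusted SHA-256 hash of each monitored file, then compare current hashes against that baseline. The function names and the monitored file are hypothetical; real integrity checkers also protect the baseline itself and track permissions and ownership, which this sketch does not.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths: list[Path]) -> dict[str, str]:
    """Record a trusted hash for every monitored file."""
    return {str(p): sha256_of(p) for p in paths}


def check_integrity(baseline: dict[str, str]) -> list[str]:
    """Compare current hashes with the baseline and return one alert per change."""
    alerts = []
    for name, expected in baseline.items():
        path = Path(name)
        if not path.exists():
            alerts.append(f"MISSING {name}")
        elif sha256_of(path) != expected:
            alerts.append(f"MODIFIED {name}")
    return alerts


with tempfile.TemporaryDirectory() as tmp:
    monitored = Path(tmp) / "passwd"  # hypothetical monitored file
    monitored.write_text("root:x:0:0\n")
    baseline = build_baseline([monitored])
    print(check_integrity(baseline))  # []
    monitored.write_text("root:x:0:0\nevil:x:0:0\n")
    print(check_integrity(baseline))  # one MODIFIED alert
```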

2.1.1 INTRUSION DETECTION SYSTEMS: A BRIEF HISTORY
The goal of intrusion detection is to monitor network assets to detect anomalous
behaviour and misuse in the network. This concept has been around for decades, but only
relatively recently has it seen a dramatic rise in popularity and incorporation into the
overall information security infrastructure. Intrusion detection was born in 1980 with
James Anderson's paper, Computer Security Threat Monitoring and Surveillance. Since then,
several pivotal events in IDS technology have advanced intrusion detection to its
current state.
James Anderson's seminal paper, written for a government organization, introduced
the notion that audit trails contain vital information that could be valuable in
tracking misuse and understanding user behaviour. With the release of this paper,
the concept of "detecting" misuse and specific user events emerged. His insight into
audit data and its importance led to tremendous improvements in the auditing
subsystems of virtually every operating system. Anderson's hypotheses also provided
the foundation for future intrusion detection system design and development. His work
was the start of host-based intrusion detection and of IDS in general.
In 1983, SRI International and Dr. Dorothy Denning began working on a
government project that launched a new effort into intrusion detection system
development. Their goal was to analyze audit trails from government mainframe
computers and create profiles of users based upon their activities. One year later, Dr.
Denning helped to develop the first model for intrusion detection, the Intrusion
Detection Expert System (IDES), which provided the foundation for the IDS
technology development that was soon to follow.
In 1984, SRI also developed a means of tracking and analyzing audit data
containing authentication information of users on ARPANET, the original Internet.
Soon after, SRI completed a Navy SPAWAR contract with the realization of the first
functional intrusion detection system, IDES. Using her research and development work
at SRI, Dr. Denning published the decisive work, An Intrusion Detection Model, which
revealed the necessary information for commercial intrusion detection system
development. The subsequent iteration of this tool was called the Distributed Intrusion
Detection System (DIDS). DIDS augmented the existing solution by tracking client
machines as well as the servers it originally monitored. Finally, in 1989, the developers
from the Haystack project formed the commercial company, Haystack Labs, and
released the last generation of the technology, Stalker. Crosby Marks says that "Stalker
was a host-based, pattern matching system that included robust search capabilities to
manually and automatically query the audit data." The Haystack advances, coupled
with the work of SRI and Denning, greatly advanced the development of host-based
intrusion detection technologies.

Commercial development of intrusion detection technologies began in the early
1990s. Haystack Labs was the first commercial vendor of IDS tools, with its Stalker
line of host-based products. SAIC was also developing a form of host-based intrusion
detection, called Computer Misuse Detection System (CMDS). Simultaneously, the Air
Force's Cryptologic Support Center developed the Automated Security Measurement
System (ASIM) to monitor network traffic on the US Air Force's network. ASIM made
considerable progress in overcoming scalability and portability issues that previously
plagued network intrusion detection (NID) products. Additionally, ASIM was the first
solution to combine hardware and software in network intrusion detection. ASIM is still
in use and managed by the Air Force's Computer Emergency Response Team
(AFCERT) at locations all over the world. As often happened, the development group
on the ASIM project formed a commercial company in 1994, the Wheel Group. Their
product, Net Ranger, was the first commercially viable network intrusion detection
device.
The intrusion detection market began to gain popularity and truly generate
revenue around 1997. In that year, the security market leader, ISS, developed a
network intrusion detection system called Real Secure. A year later, Cisco recognized
the importance of network intrusion detection and purchased the Wheel Group,
gaining a security solution it could provide to its customers. Similarly, the first
visible host-based intrusion detection company, Centrex Corporation, emerged as a
result of a merger of the development staff from Haystack Labs and the departure of the
CMDS team from SAIC. From there, the commercial IDS world expanded its market base,
and a roller-coaster ride of start-up companies, mergers, and acquisitions ensued.
Network intrusion detection deals with information passing on the wire between
hosts. Network intrusion detection devices, typically referred to as "packet-sniffers,"
intercept packets travelling into and out of the network over various communication
media and protocols, usually TCP/IP. Once captured, the packets are analyzed in a
number of different ways. Some IDS devices simply compare each packet to a signature
database of known attacks and malicious packet "fingerprints", while others look for
anomalous packet activity that might indicate malicious behaviour.
Figure 2.1.1 : Number of incidents reported
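Before a captured packet can be compared with signatures or profiles, its headers have to be decoded. The sketch below is one way to do this with Python's `struct` module, following the standard IPv4 and TCP header layouts (network byte order). The function name is hypothetical, capture itself (raw sockets or pcap files) is out of scope, checksums are not verified, and the test packet is hand-built for illustration.

```python
import socket
import struct


def parse_ipv4_tcp(packet: bytes) -> dict:
    """Decode an IPv4 header and, when the protocol is TCP, the TCP header and payload."""
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _checksum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = (ver_ihl & 0x0F) * 4  # IPv4 header length in bytes
    info = {
        "version": ver_ihl >> 4,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
    if proto == 6:  # IP protocol number 6 = TCP
        src_port, dst_port, seq, _ack, offset_flags = struct.unpack(
            "!HHIIH", packet[ihl:ihl + 14])
        tcp_len = (offset_flags >> 12) * 4  # TCP data offset, in bytes
        info.update(
            src_port=src_port,
            dst_port=dst_port,
            seq=seq,
            flags=offset_flags & 0x01FF,    # low 9 bits: NS ... FIN
            payload=packet[ihl + tcp_len:total_len],
        )
    return info


# Hand-built test packet: IPv4 (no options) + TCP (no options, PSH+ACK) + HTTP bytes.
payload = b"GET / HTTP/1.1\r\n"
tcp = struct.pack("!HHIIHHHH", 40000, 80, 1, 0, (5 << 12) | 0x18, 65535, 0, 0)
ip = struct.pack("!BBHHHBBH4s4s", (4 << 4) | 5, 0, 20 + len(tcp) + len(payload), 1, 0,
                 64, 6, 0, socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
print(parse_ipv4_tcp(ip + tcp + payload))
```

The decoded fields (addresses, ports, flags, payload) are what a signature matcher or a feature extractor would then consume.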
An IDS essentially monitors network traffic for activity that falls within the scope of
banned activity on the network, comparing the collected information with predefined
blueprints of known attacks and vulnerabilities. Its main job is to alert network
administrators so that they can take corrective action, such as blocking access to
vulnerable ports, denying access to specific IP addresses or shutting down services used
to enable attacks. It is thus a front-line weapon in the network administrator's war
against hackers.
2.1.2 CATEGORIES OF INTRUSION DETECTION SYSTEM
Intrusion detection systems are classified into three categories: signature-based
detection systems, anomaly-based detection systems and specification-based detection
systems.
1) Signature based Detection System
Signature-based detection (also called misuse-based detection) is very effective
against known attacks. It depends on receiving regular pattern updates and is
unable to detect previously unknown threats or newly released attacks.
2) Anomaly based Detection System
This type of detection classifies network activity as normal or anomalous,
based on rules or heuristics rather than patterns or signatures. To implement
such a system, we first need to know the normal behaviour of the network.
Unlike misuse-based detection, anomaly-based detection can detect previously
unknown threats, but it is more likely to raise false positives.
3) Specification based Detection System
This type of detection system monitors processes and matches their actual
behaviour against the program's specification, issuing an alert in case of any
abnormal behaviour. The specification must be maintained and updated whenever
a change is made to the monitored programs. Such a system can detect previously
unknown attacks, and its number of false positives can be lower than that of the
anomaly-based approach.
Figure 2.1.2 : Vulnerabilities reported
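A specification-based monitor can be pictured as checking a program's observed behaviour against a model of what it is allowed to do. In the minimal sketch below, the specification is assumed to be a hypothetical set of permitted transitions between consecutive system calls; the call names, the `START` marker and the specification itself are invented for illustration.

```python
# Hypothetical specification for one monitored program: permitted transitions between
# consecutive system calls. "START" marks the beginning of a trace.
ALLOWED_TRANSITIONS = frozenset({
    ("START", "open"),
    ("open", "read"),
    ("read", "read"),
    ("read", "close"),
    ("close", "exit"),
})


def check_trace(trace: list[str],
                spec: frozenset[tuple[str, str]] = ALLOWED_TRANSITIONS) -> list[str]:
    """Return one alert for every step of the trace that the specification does not allow."""
    alerts = []
    previous = "START"
    for step, call in enumerate(trace):
        if (previous, call) not in spec:
            alerts.append(f"step {step}: {previous} -> {call} violates the specification")
        previous = call
    return alerts


print(check_trace(["open", "read", "read", "close", "exit"]))  # []
print(check_trace(["open", "read", "execve", "exit"]))          # two violations
```

Because anything outside the specification is flagged, the monitor can catch attacks it has never seen, but `ALLOWED_TRANSITIONS` must be revised whenever the monitored program changes, as the text notes.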
2.1.3 CLASSIFICATION OF INTRUSION DETECTION SYSTEM
Intrusion detection systems are classified into three types according to where they
are deployed:
1) Host based IDS (HIDS)
This type is installed on a single device, such as a server or workstation,
where data collected from different sources on that machine is analyzed
locally. HIDS can use both anomaly and misuse detection.
2) Network based IDS (NIDS)
NIDS are deployed at strategic points in the network infrastructure. A NIDS
can capture and analyze data to detect known attacks by comparing patterns
or signatures against a database, or detect illegal activities by scanning
traffic for anomalous activity. NIDS are also referred to as “packet-sniffers”
because they capture the packets passing through the communication
media.
3) Hybrid based IDS
A hybrid IDS combines the management and alerting of both network- and
host-based intrusion detection devices, providing the logical complement to
NID and HID: central intrusion detection management.
Figure 2.1.3 : Layered Security approach for reducing risk

2.1.4 CONCLUSION
An intrusion detection system is a part of the defensive operations that
complements the defenses such as firewalls, UTM etc. The intrusion detection system
basically detects attack signs and then alerts. According to the detection methodology,
intrusion detection systems are typically categorized as misuse detection and anomaly
detection systems. From a deployment perspective, they are classified as network-based
or host-based IDS. Current intrusion detection systems collect information
from both network and host resources. In terms of performance, an intrusion detection
system becomes more accurate as it detects more attacks and raises fewer false positive
alarms.
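The performance criterion above, detecting more attacks while raising fewer false alarms, can be stated precisely with the confusion-matrix counts listed among the abbreviations (TP, FP, TN, FN). The sketch below uses the usual definitions; the class and function names are illustrative and the counts in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # attacks correctly flagged
    fp: int  # benign events wrongly flagged (false alarms)
    tn: int  # benign events correctly passed
    fn: int  # attacks missed

    def detection_rate(self) -> float:
        """True positive rate: fraction of attacks that raised an alarm."""
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        """Fraction of benign events that raised an alarm."""
        return self.fp / (self.fp + self.tn)

    def precision(self) -> float:
        """Fraction of alarms that were real attacks."""
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


def confusion_from_labels(y_true: list[int], y_pred: list[int]) -> Confusion:
    """Count outcomes, with 1 meaning 'attack' and 0 meaning 'benign'."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
    )


c = Confusion(tp=90, fp=5, tn=95, fn=10)  # made-up counts
print(c.detection_rate(), c.false_positive_rate(), c.accuracy())  # 0.9 0.05 0.925
```

Reporting the detection rate and the false positive rate together matters because, on traffic that is overwhelmingly benign, a detector can reach high accuracy while still producing many false alarms.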
2.2 MACHINE LEARNING TECHNIQUES FOR INTRUSION
DETECTION
An Intrusion Detection System (IDS) is software that monitors a single computer or a
network of computers for malicious activities (attacks) aimed at stealing or
censoring information or corrupting network protocols. Most techniques used in
today’s IDS are not able to deal with the dynamic and complex nature of cyber-attacks
on computer networks. Hence, efficient adaptive methods, such as various machine
learning techniques, can result in higher detection rates, lower false alarm rates and
reasonable computation and communication costs. In this paper, we study several such
schemes and compare their performance. We divide the schemes into methods based on
classical artificial intelligence (AI) and methods based on computational intelligence
(CI). We explain how various characteristics of CI techniques can be used to build
efficient IDS.
Today, political and commercial entities are increasingly engaging in
sophisticated cyber-warfare to damage, disrupt, or censor information content in
computer networks. In designing network protocols, there is a need to ensure reliability
against intrusions of powerful attackers that can even control a fraction of parties in the
network. The controlled parties can launch both passive (e.g., eavesdropping,
nonparticipation) and active attacks (e.g., jamming, message dropping, corruption, and
forging).
Intrusion detection is the process of dynamically monitoring events occurring in
a computer system or network, analyzing them for signs of possible incidents and often
interdicting the unauthorized access. This is typically accomplished by automatically
collecting information from a variety of systems and network sources, and then
analyzing the information for possible security problems.

Motivation
Traditional intrusion detection and prevention techniques, such as firewalls, access control
mechanisms and encryption, have several limitations in fully protecting networks and
systems from increasingly sophisticated attacks like denial of service. Moreover, most
systems built on such techniques suffer from high false positive and false
negative detection rates and cannot continuously adapt to changing malicious
behaviours. In the past decade, however, several Machine Learning (ML) techniques
have been applied to the problem of intrusion detection with the hope of improving
detection rates and adaptability. These techniques are often used to keep the attack
knowledge bases up-to-date and comprehensive.
Study Approach
In this paper, we study several papers that use ML methods for detecting
malicious behaviour in distributed computer systems. There is a huge body of work in
this area; thus, we decided to carefully select a few papers based on two factors:
diversity and citation count. By diversity we mean that most ML techniques for IDS are
covered, but only one paper is picked from each set of papers that use the same technique.
The papers are also chosen based on their citation count, as this factor largely reflects
how much the corresponding work has influenced the community. All non-survey
papers studied here are cited at least 100 times.
2.2.1 CHALLENGES AND APPROACHES
An IDS generally has to deal with problems such as large network traffic
volumes, highly uneven data distribution, the difficulty of determining decision boundaries
between normal and abnormal behaviour, and a requirement for continuous adaptation
to a constantly changing environment. In general, the challenge is to efficiently capture
and classify various behaviours in a computer network. Strategies for classification of
network behaviours are typically divided into two categories: misuse detection and
anomaly detection.
Misuse detection techniques examine both network and system activity for
known instances of misuse using signature matching algorithms. This technique is
effective at detecting attacks that are already known. However, novel attacks are often
missed, giving rise to false negatives. Alerts may be generated by the IDS, but reacting
to every alert wastes time and resources, leading to instability of the system. To
overcome this problem, an IDS should not start an elimination procedure as soon as the
first symptom has been detected; rather, it should be patient enough to collect alerts
and decide based on their correlation.
Anomaly detection systems rely on constructing a model of user behaviour that
is considered normal. This is achieved by using a combination of statistical or machine
learning methods to examine network traffic or system calls and processes. The
detection of novel attacks is more successful using the anomaly detection approach, as
any deviant behaviour is classified as an intrusion. However, normal behaviour in a
large and dynamic system is not well defined and changes over time. This often
results in a substantial number of false alarms, known as false positives. A network-based
IDS looks at the incoming network traffic for patterns that can signify whether a
person is probing the network for vulnerable computers. Since responding to each alert
consumes relatively large amounts of time and resources, an IDS should not respond to
every alert it generates. Disregarding this fact may result in a self-inflicted
denial-of-service. To overcome this problem, alerts should be aggregated and correlated
in order to produce fewer but more expressive and remarkable alerts.
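Aggregation can be as simple as grouping alerts that share a source and a signature and arrive close together in time. The sketch below is one such hypothetical scheme, not a method from the surveyed papers: the alert fields, the names `Alert` and `MetaAlert`, and the 60-second window are all assumptions. It reports one summarized meta-alert per group instead of one alert per event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds
    src_ip: str
    signature: str


@dataclass
class MetaAlert:
    src_ip: str
    signature: str
    first_seen: float
    last_seen: float
    count: int


def aggregate(alerts: list[Alert], window: float = 60.0) -> list[MetaAlert]:
    """Merge alerts sharing (src_ip, signature) when each arrives within `window`
    seconds of the previous alert in the same group; one meta-alert per group."""
    open_groups: dict[tuple[str, str], MetaAlert] = {}
    result: list[MetaAlert] = []
    for alert in sorted(alerts, key=lambda x: x.timestamp):
        key = (alert.src_ip, alert.signature)
        group = open_groups.get(key)
        if group is not None and alert.timestamp - group.last_seen <= window:
            group.last_seen = alert.timestamp
            group.count += 1
        else:
            group = MetaAlert(alert.src_ip, alert.signature,
                              alert.timestamp, alert.timestamp, 1)
            open_groups[key] = group
            result.append(group)
    return result


raw = [Alert(0, "10.0.0.5", "port-scan"), Alert(10, "10.0.0.5", "port-scan"),
       Alert(15, "10.0.0.9", "brute-force"), Alert(20, "10.0.0.5", "port-scan"),
       Alert(500, "10.0.0.5", "port-scan")]
for meta in aggregate(raw):
    print(meta.src_ip, meta.signature, meta.count)
```

Here five raw alerts collapse into three meta-alerts; the late port-scan starts a new group because it falls outside the window. Real correlation engines also link different alert types into multi-step attack scenarios, which this sketch does not attempt.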
2.2.1.1 MACHINE LEARNING APPROACHES
We divide the ML-based approaches to intrusion detection into two categories:
approaches based on Artificial Intelligence (AI) techniques and approaches based on
Computational Intelligence (CI) methods. AI techniques refer to the methods from the
domain of classical AI, such as statistical modeling, while CI techniques refer to
nature-inspired methods that are used to deal with complex problems that classical methods
are unable to solve. Important CI methodologies are evolutionary computation, fuzzy
logic, artificial neural networks, and artificial immune systems. CI is different from the
well-known field of AI: AI handles symbolic knowledge representation, while CI
handles numeric representation of information. Although the boundary between these
two categories is not always clear and many hybrid methods have been proposed in the
literature, most previous work is mainly designed based on one of the two categories.
Moreover, it would be quite useful to understand how well nature-based techniques
perform in contrast to classical methods.
1) AI-BASED TECHNIQUES
Laskov et al. develop an experimental framework for comparative analysis of
supervised (classification) and unsupervised learning (clustering) techniques for
detecting malicious activities. The supervised methods evaluated in this work
include decision trees, k-Nearest Neighbor (kNN), Multi-Layer Perceptron (MLP),
and Support Vector Machines (SVM). The unsupervised algorithms include γ-
algorithm, k-means clustering, and single linkage clustering. They define two
scenarios for evaluating the aforementioned learning algorithms from both
categories. In the first scenario, they assume that training and test data come from
the same unknown distribution. In the second scenario, they consider the case
where the test data comes from new (i.e., unseen) attack patterns. This scenario
helps us understand how well an IDS can generalize its knowledge to new
malicious patterns, which is essential for an IDS, since today’s sophisticated
adversaries tend to use several intrusion patterns to evade modern IDS.
The results show that the supervised algorithms generally achieve better
classification accuracy on the data with known attacks (the first scenario). Among
these algorithms, the decision tree algorithm achieves the best results (95% true
positive rate, 1% false-positive rate). The next two best algorithms are the MLP and
the SVM, followed by the k-nearest neighbor algorithm. However, if there are
unseen attacks in the test data, then the detection rate of supervised methods
decreases significantly. This is where the unsupervised techniques perform better, as
they show no significant difference in accuracy between seen and unseen attacks.
Figure 2.2.1 shows the average true/false positive rates of all methods evaluated. As
the plots show, the supervised techniques generally perform better, although the
unsupervised methods give more robust results across both scenarios.
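To make the two evaluation scenarios concrete, the sketch below shows one way to build the corresponding train/test splits. It assumes a hypothetical record layout of (features, label) pairs, where the label is either "normal" or an attack-type name; the function name `split_scenarios` and the 30% test fraction are illustrative choices, not part of Laskov et al.'s framework.

```python
import random

def split_scenarios(records, unseen_attacks, test_fraction=0.3, seed=0):
    """Return (train, test) pairs for the two evaluation scenarios.

    records: list of (features, label) tuples; label is "normal" or an attack
    type name (a hypothetical data layout chosen for illustration).
    unseen_attacks: set of attack types withheld from training in scenario 2.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)

    # Scenario 1: training and test data come from the same distribution.
    cut = int(len(shuffled) * (1 - test_fraction))
    scenario1 = (shuffled[:cut], shuffled[cut:])

    # Scenario 2: withheld attack types appear only in the test data.
    known = [r for r in shuffled if r[1] not in unseen_attacks]
    novel = [r for r in shuffled if r[1] in unseen_attacks]
    cut = int(len(known) * (1 - test_fraction))
    scenario2 = (known[:cut], known[cut:] + novel)
    return scenario1, scenario2
```

In the second scenario every record of a withheld attack type lands in the test set, so any detection of it must come from generalization rather than memorization.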
Zanero and Savaresi introduce a two-tier anomaly-based architecture for
IDS in TCP/IP networks based on unsupervised learning. The first tier is an
unsupervised clustering algorithm, which builds small-size patterns from the
payloads of network packets. In other words, each TCP or UDP packet is assigned to
one of two clusters representing normal and abnormal traffic. The second tier is an
optimized traditional anomaly detection algorithm, improved by the availability of data
on the packet payload content. The motivation behind the work is that unsupervised
learning methods are usually better at generalizing attack patterns
than supervised methods; thus, there is hope that such an architecture can resist
polymorphic attacks more efficiently.
Lee and Stolfo build a classifier to detect anomalies in networks using data
mining techniques. They implement two general data mining algorithms that are
essential in describing the normal behaviour of a program or user. They propose an
agent-based architecture for intrusion detection systems, where the learning agents
continuously compute and provide the updated detection models to the agents. They
conduct experiments on Sendmail system call data and network tcpdump data to
demonstrate the effectiveness of their classification models in detecting anomalies.
They finally argue that the most important challenge of using data mining
approaches in intrusion detection is that they require a large amount of audit data in
order to compute the profile rule sets.
Figure 2.2.1 : Average detection rates of the methods evaluated in Pavel Laskov, Patrick
Düssel, Christin Schäfer, and Konrad Rieck. Learning intrusion detection: Supervised or
unsupervised? In Image Analysis and Processing (ICIAP 2005), volume 3617 of Lecture
Notes in Computer Science, pages 50–57. Springer Berlin Heidelberg, 2005, in two
scenarios: test data contains only known attacks (left) and test data contains unknown
attacks (right).
Sommer and Paxson study the imbalance between the extensive amount of
research on ML-based intrusion detection versus the lack of operational
deployments of such systems. They identify challenges particular to network
intrusion detection and provide a set of guidelines for strengthening future research on
ML-based intrusion detection. More specifically, they argue that an anomaly-based
IDS requires outlier detection, while the classic application of ML is a classification
problem that deals with finding similarities between activities. It is true that in some
cases an outlier detection problem can be modeled as a classification problem with
two classes: normal and abnormal. However, in classification one needs
to train a system with training patterns of all classes, while in anomaly detection one
can only train on normal patterns. This means that ML classifiers are better suited to
finding variations of known attacks than to finding previously unknown malicious
activity. This is why ML methods have been applied to spam detection more
effectively than to intrusion detection.
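Sommer and Paxson's distinction can be illustrated with a minimal one-class detector that is fitted on normal patterns only and flags anything that falls outside the learned region. The distance-to-centroid rule and the names `fit_normal_profile` and `is_outlier` are assumptions chosen for brevity; practical anomaly detectors use richer models.

```python
import math

def fit_normal_profile(normal_vectors):
    """Learn a centroid and radius from normal feature vectors only (no attack labels)."""
    dim = len(normal_vectors[0])
    n = len(normal_vectors)
    centroid = [sum(v[i] for v in normal_vectors) / n for i in range(dim)]
    # Radius: the largest distance from the centroid seen in the normal training data.
    radius = max(math.dist(v, centroid) for v in normal_vectors)
    return centroid, radius

def is_outlier(x, centroid, radius):
    """Flag any observation that lies farther from the normal centroid than the radius."""
    return math.dist(x, centroid) > radius
```

No abnormal examples are needed for fitting, which is exactly the setting of anomaly detection; the price is that the detector only knows what "normal" looks like.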
2) CI-BASED TECHNIQUES
In this section, we review several algorithms based on the four core
techniques of computational intelligence.
• Genetic Algorithms (GA)
Genetic algorithms are aimed at finding optimal solutions to problems. Each
potential solution to a problem is represented as a sequence of bits (genes)
called a genome or chromosome. A genetic algorithm begins with a set of
genomes (population) and an evaluation function called fitness function that
measures the quality (goodness) of each genome. The algorithm uses two
reproduction operators called crossover and mutation to create new
descendants (solutions), which are then evaluated. Crossover determines
how various properties of the parents in a population are inherited by the
descendants. Mutation is the spontaneous alteration of a single gene.
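The reproduction cycle described above can be sketched on bit-string genomes as follows. This is a generic GA under stated assumptions (single-point crossover, bit-flip mutation, truncation selection, and a toy fitness that simply counts 1-bits); the parameter values are arbitrary illustrations.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of two parents after a random cut point."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

def mutate(genome, rate, rng):
    """Flip each bit (gene) independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in genome]

def evolve(fitness, length=16, pop_size=20, generations=50, rate=0.02, seed=1):
    """Run a basic GA and return the fittest genome found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives and acts as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            mother, father = rng.sample(parents, 2)
            for child in crossover(mother, father, rng):
                children.append(mutate(child, rate, rng))
        population = parents + children[: pop_size - len(parents)]
    return max(population, key=fitness)
```

For example, `evolve(fitness=sum)` drives the population toward genomes made mostly of 1-bits; in an IDS the fitness function would instead score how well a genome (rule) classifies labelled traffic.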
Sinclair et al. use genetic algorithms and decision trees to create rules for an
intrusion detection expert system, which supports the analyst’s job in
differentiating anomalous network activity from normal network traffic. In
this work, GA is used to evolve simple rules for network traffic. Each rule is
represented by a genome and the initial population of genomes is a set of
random rules. Each genome is comprised of 29 genes: 8 for source IP, 8 for
destination IP, 6 for source port, 6 for destination port, and 1 for protocol.
The fitness function is based on the actual performance of each rule on a
preclassified data set. An analyst marks a data set comprised of connections
as either normal or abnormal. The system uses analyst-created training sets
for rule development and analyst decision support. If a rule completely
matches an abnormal connection, it receives a bonus, and if it
matches a normal connection, it is penalized. Hence, the generations are
biased toward rules that match intrusive connections only. Once the genetic
algorithm reaches a certain number of generations, it stops and the best
genomes (i.e., rules) are selected. The generated rule set can be used as
knowledge inside the IDS for judging whether the network connection and
related behaviours are potential intrusions.
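The reward/penalty fitness used by Sinclair et al. can be sketched as below. For readability the sketch assumes each rule is a tuple of whole fields (source IP, destination IP, source port, destination port, protocol) with `None` as a wildcard, rather than the 29-gene encoding described above; the bonus and penalty values and the sample connections are hypothetical.

```python
WILDCARD = None  # a rule field that matches any value

def matches(rule, connection):
    """A rule matches if every non-wildcard field equals the connection's field."""
    return all(r is WILDCARD or r == c for r, c in zip(rule, connection))

def rule_fitness(rule, labelled_connections, bonus=1.0, penalty=1.0):
    """Reward matches on abnormal connections and penalize matches on normal ones."""
    score = 0.0
    for connection, is_abnormal in labelled_connections:
        if matches(rule, connection):
            score += bonus if is_abnormal else -penalty
    return score
```

A narrow rule that matches only the analyst-marked abnormal connections scores higher than a broad rule that also matches normal traffic, which is how the generations become biased toward rules for intrusive connections.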
The traditional GA tends to converge to a single best solution, the global
maximum. Since the algorithm requires a group of best unique rules, a
nature-inspired technique called niching is used, which creates
subpopulations that converge on local maxima.
Li describes a few disadvantages of the algorithm described above and proposes a
new technique for defining IDS rules. They argue that in order to detect
intrusive behaviours for a local network, network connections should be
used to define normal and abnormal behaviours. An attack can sometimes
be as simple as scanning for available ports on a server or a
password-guessing scheme. Typically, however, attacks are complex and are
generated by automated tools. So, one needs to use temporal and spatial
information of network connections to define IDS rules that can classify complex
anomalous activities using an efficient genetic algorithm.
• Artificial Neural Networks (ANN)
A neural network consists of a collection of processing units called neurons
that are highly interconnected according to a given topology. ANN have the
ability to learn by example and to generalize from limited, noisy, and
incomplete data. They have been successfully employed in a broad spectrum
of data-intensive applications.
Mukkamala et al. describe approaches to intrusion detection using neural
networks and Support Vector Machines (SVM). Their goal is to discover
patterns or features that describe user behaviour to build classifiers for
recognizing anomalies. SVM are supervised learning machines that
map the training vectors into a high-dimensional feature space and label
each vector by its class. SVM maximize the margin (separation) between
different classes, which minimizes an upper bound on the generalization error,
that is, the amount of error in classifying unknown vectors. SVM
classify data by determining a subset of the training data, called support vectors,
that define a separating hyperplane in feature space.
Mukkamala et al. use an SVM for non-linear classification of feature vectors
in an IDS. The SVM is trained with 7312 data points and tested with 6980 test
points from KDD. Each point lies in a 41-dimensional space, and the
training uses a radial basis function (RBF) kernel. The RBF kernel is used to
approximate the non-linear boundary that separates the normal and
abnormal classes. Using this SVM, they reach an accuracy of 99.5% in
classification of test points. They also use three multilayer feed-forward
ANN to classify the same test points. The ANN are trained using the same
7312-point training set. The best result from experimenting with the different
ANN architectures is a detection rate of 99.25%. The authors conclude that
although their SVM IDS shows higher detection rates than their ANN, SVM
can only be used for binary classification, which is a big limitation for IDS
that require multiple classes.
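For reference, the RBF kernel and the resulting SVM decision function can be written as a short sketch. The support vectors, the dual coefficients (each assumed to be already multiplied by its class label) and the bias are assumed to come from an already trained SVM; training is not shown, and the numbers in the usage note are made up.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Return +1 or -1 from sign(sum_i alpha_i * y_i * K(x_i, x) + b).

    dual_coefs[i] is assumed to hold alpha_i * y_i for support vector i.
    """
    score = bias + sum(c * rbf_kernel(sv, x, gamma)
                       for sv, c in zip(support_vectors, dual_coefs))
    return 1 if score >= 0 else -1
```

With support vectors at (0, 0) (coefficient +1) and (3, 3) (coefficient -1), zero bias and gamma = 0.5, points near the origin are classified as +1 and points near (3, 3) as -1; the output is a single sign, which reflects the binary nature of the basic SVM discussed above.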
• Fuzzy Logic
Fuzzy logic is an approach to computing based on degrees of truth rather than
the usual true or false Boolean logic on which modern computers are
based. With fuzzy sets, fuzzy logic allows an object to belong to different
classes at the same time. This makes fuzzy logic well suited to intrusion
detection, because security itself involves fuzziness and the boundary
between normal and anomalous behaviour is not well defined. Moreover, the
intrusion detection problem involves many numeric attributes in collected
data, and various derived statistical measures. Building models directly on
numeric data usually causes high detection errors. A behaviour that deviates
only slightly from a model may not be detected or a small change in normal
behaviour may cause a false positive. With fuzzy logic, it is possible to
model these small deviations to keep the false positive/negative rates small.
Every fuzzy rule has the following general form,
IF condition THEN conclusion [weight],
where condition is a fuzzy expression defined using fuzzy logic operators
such as fuzzy AND and fuzzy OR, conclusion is an atomic expression, and
weight is a real number in [0,1] that expresses the confidence of the rule.
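A single rule of this form can be evaluated with min for fuzzy AND and max for fuzzy OR, which is one common choice of operators. The rule, the membership function `ramp_up` and its breakpoints are hypothetical, and scaling the firing strength by the weight is an assumption about how the weight is applied.

```python
def ramp_up(x, low, high):
    """Membership degree in a 'high' fuzzy set: 0 at or below `low`, 1 at or above `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def fuzzy_and(*degrees):
    return min(degrees)

def fuzzy_or(*degrees):
    return max(degrees)

def evaluate_rule(condition_degree, weight):
    """Degree to which the conclusion holds: firing strength scaled by the rule weight."""
    return condition_degree * weight

# Hypothetical rule:
# IF connection_rate IS high AND failed_logins IS high THEN abnormal [0.8]
def abnormal_degree(connection_rate, failed_logins):
    condition = fuzzy_and(ramp_up(connection_rate, 50, 100),
                          ramp_up(failed_logins, 2, 6))
    return evaluate_rule(condition, 0.8)
```

For a connection rate of 75 and 4 failed logins, both memberships are 0.5, so the rule concludes "abnormal" with degree 0.5 * 0.8 = 0.4 instead of a hard yes/no answer; this graded output is what lets small deviations be modeled.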
Gomez and Dasgupta show that with fuzzy logic, the false alarm rate in
determining intrusive activities can be reduced. They define a set of fuzzy
rules to define the normal and abnormal behaviour in a computer network,
and a fuzzy inference engine to determine intrusions. They use a genetic
algorithm to generate fuzzy classifiers, each of which is a set of fuzzy rules in the
form defined above. Each fuzzy rule is represented by a genome, and the GA
is used to find the best genomes (fuzzy rules) to be added to the fuzzy
classifier. The authors conducted experiments using the KDD evaluation
data to classify 22 different types of attacks into 4 intrusion classes: denial
of service (DoS), unauthorized access from a remote machine (R2L),
unauthorized access to local superuser (root) privileges (U2R), and probing
(PRB). The results show that their algorithm achieves an overall true
positive rate of 98.95% and a false positive rate of 7%.
• Artificial Immune Systems (AIS)
Natural immune systems consist of molecules, cells, and tissues that
establish the body’s resistance to infections caused by pathogens like bacteria,
viruses, and parasites. They distinguish pathogens from self-cells and
eliminate the pathogens. This provides a great source of inspiration for
computer security systems, especially IDS. An artificial immune system is a
computationally intelligent system based on behaviour of the natural
immune systems.
The first immune-inspired model applicable to various computer security
problems was proposed by Hofmeyr and Forrest. Their model is specialized
to detect intrusions in local area networks based on TCP/IP. They build a
database containing normal sequences of system calls that act as the self-
definition of the normal behaviour of a program, and as the basis to detect
anomalies. Each TCP connection is modeled by a triple, which encodes
address of sender, address of receiver and port number of the receiver.
Detectors are generated randomly and filtered through the negative selection
algorithm (NSA). In addition to the NSA, which produces a signal to stimulate or
tolerate the immune response, they use a second signal (called co-stimulation) to
confirm anomalies detected through the NS procedure. In this
system, a human is required to generate this signal manually in order to
reduce false alarms (autoimmunity) of the system.
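Negative selection can be sketched on binary strings as follows: random candidate detectors are discarded if they match any self string, and the survivors are used to flag non-self samples. The r-contiguous-bits matching rule, the string length and the value of r are illustrative assumptions rather than details taken from the description above.

```python
import random

def r_contiguous_match(a, b, r):
    """True if strings a and b agree on at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, n_detectors, length, r, seed=0, max_tries=100000):
    """Negative selection: keep random candidates that match no self string."""
    rng = random.Random(seed)
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors

def is_anomalous(sample, detectors, r):
    """A sample is flagged as non-self if any detector matches it."""
    return any(r_contiguous_match(sample, d, r) for d in detectors)
```

By construction no detector matches a self string, so self samples are never flagged; in practice false alarms arise from normal behaviour that was missing from the self set, which is what the manual co-stimulation signal is meant to catch.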
Kim et al. provide an introduction and analysis of the key developments
within the field of immune-inspired computer security as well as
suggestions for future research. They summarize six immune features that
are desirable for an effective IDS: distributed, multi-layered, self-organized,
lightweight, diverse and disposable. They explain that the human immune
system is distributed through immune networks and it generates unique
antibody sets to provide the first four requirements. It is self-organized
through gene library evolution, negative selection, and clonal selection. Finally, it is
lightweight through approximate binding, memory cells, and gene
expression to increase efficiency.
Zamani et al. describe an artificial immune algorithm for intrusion detection
in distributed systems based on danger theory, an immunological model
based on the idea that the immune system does not discriminate between self
and non-self, but rather between events that cause damage and those that do not. The
authors propose a multi-agent environment that computationally emulates the
behaviour of natural immune systems and is effective in reducing false positive
rates. They demonstrate their model in practice by performing
a case study on the problem of detecting distributed denial-of-service attacks
in wireless sensor networks.
Dasgupta proposes a multi-agent IDS based on AIS. He defines three types
of agents: monitoring agents that roam around the network and monitor
various parameters simultaneously at multiple levels (user to packet level),
communicator agents that play the role of lymphokines, the signals exchanged
between immune cells, and decision/action agents that make
decisions based on collected local warning signals. The role of each type of
agent is unique, though they may work in collaboration. This work
unfortunately does not provide any experimental results making it difficult
for the reader to compare the performance of the proposed system with other
ML-based IDS.
2.2.2 CONCLUSION
We reviewed several influential algorithms for intrusion detection based on
various machine learning techniques. The characteristics of ML techniques make it
possible to design IDS that have high detection rates and low false positive rates while
the system quickly adapts itself to changing malicious behaviours. We divided these
algorithms into two types of ML-based schemes: Artificial Intelligence (AI) and
Computational Intelligence (CI). Although these two categories of algorithms share
many similarities, several features of CI-based techniques, such as adaptation, fault
tolerance, high computational speed and error resilience in the face of noisy
information, meet the requirements of building efficient intrusion detection systems.
2.3 ANOMALY-BASED NETWORK INTRUSION DETECTION:
TECHNIQUES, SYSTEMS AND CHALLENGES
The Internet and computer networks are exposed to an increasing number of
security threats. With new types of attacks appearing continually, developing flexible
and adaptive security-oriented approaches is a severe challenge. In this context,
anomaly-based network intrusion detection techniques are a valuable technology to
protect target systems and networks against malicious activities. However, despite the
variety of such methods described in the literature in recent years, security tools
incorporating anomaly detection functionalities are just starting to appear, and several
important problems remain to be solved.
Noteworthy work has been carried out by CIDF (‘‘Common Intrusion Detection
Framework’’), a working group created by DARPA in 1998 mainly oriented towards
coordinating and defining a common framework in the IDS field. Integrated within
IETF in 2000, and having adopted the new acronym IDWG (‘‘Intrusion Detection
Working Group’’), the group defined a general IDS architecture based on the
consideration of four types of functional modules (Figure 2.3.1):
• E blocks (‘‘Event-boxes’’): This kind of block is composed of sensor
elements that monitor the target system, thus acquiring information events
to be analyzed by other blocks.
• D blocks (‘‘Database-boxes’’): These are elements intended to store
information from E blocks for subsequent processing by A and R boxes.
• A blocks (‘‘Analysis-boxes’’): Processing modules for analyzing events
and detecting potential hostile behaviour, so that some kind of alarm will be
generated if necessary.
• R blocks (‘‘Response-boxes’’): The main function of this type of block is
the execution, if any intrusion occurs, of a response to thwart the detected
menace.
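The flow between the four block types can be sketched as a small pipeline. The event fields, the watched-port analysis rule and the "block" response strings are hypothetical simplifications; real CIDF/IDWG components exchange much richer messages.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    source_ip: str
    dest_port: int

def e_box(raw_records):
    """E block (sensor): turn raw (source_ip, dest_port) records into events."""
    return [Event(src, port) for src, port in raw_records]

@dataclass
class DBox:
    """D block: store events for later processing by A and R blocks."""
    events: list = field(default_factory=list)

    def store(self, events):
        self.events.extend(events)

def a_box(events, watched_ports):
    """A block: raise an alarm for events that hit a watched port (toy analysis rule)."""
    return [e for e in events if e.dest_port in watched_ports]

def r_box(alarms):
    """R block: produce one response action per alarmed source address."""
    return sorted({f"block {a.source_ip}" for a in alarms})
```

For example, storing the events for `[("10.0.0.9", 23), ("10.0.0.3", 80), ("10.0.0.9", 23)]` and analysing them with `watched_ports={23}` yields the single response `["block 10.0.0.9"]`. Keeping the blocks separate mirrors the point of the architecture: sensors, storage, analysis and response can be replaced independently.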
Other key contributions in the IDS field concern the definition of protocols for
data exchange between components (e.g. IDXP, ‘‘Intrusion Detection eXchange
Protocol’’, RFC 4767), and the format considered for this (e.g. IDMEF, ‘‘Intrusion
Detection MEssage Format’’, RFC 4765).
Depending on the information source considered (E boxes in Figure 2.3.1), an
IDS may be either host or network-based. A host-based IDS analyzes events such as
process identifiers and system calls, mainly related to OS information. On the other
hand, a network-based IDS analyzes network related events: traffic volume, IP
addresses, service ports, protocol usage, etc. This paper focuses on the latter type of
IDS.
Depending on the type of analysis carried out (A blocks in Figure 2.3.1),
intrusion detection systems are classified as either signature-based or anomaly-based.
Signature-based schemes (also denoted as misuse-based) seek defined patterns, or
signatures, within the analyzed data. For this purpose, a signature database
corresponding to known attacks is specified a priori. On the other hand, anomaly-based
detectors attempt to estimate the ‘‘normal’’ behaviour of the system to be protected,
and generate an anomaly alarm whenever the deviation between a given observation at
an instant and the normal behaviour exceeds a predefined threshold. Another possibility
is to model the ‘‘abnormal’’ behaviour of the system and to raise an alarm when the
difference between the observed behaviour and the expected one falls below a given
limit.
Figure 2.3.1 : General CIDF architecture for IDS systems
Signature and anomaly-based systems are similar in terms of conceptual
operation and composition. The main differences between these methodologies are
inherent in the concepts of ‘‘attack’’ and ‘‘anomaly’’. An attack can be defined as ‘‘a
sequence of operations that puts the security of a system at risk’’. An anomaly is just
‘‘an event that is suspicious from the perspective of security’’. Based on this
distinction, the main advantages and disadvantages of each IDS type can be pointed
out.
2.3.1 A-NIDS Techniques
Although different A-NIDS approaches exist (Estévez-Tapiador et al., 2004), in
general terms all of them consist of the following basic modules or stages (Figure 2.3.2):
• Parameterization: In this stage, the observed instances of the target system are
represented in a pre-established form.
• Training stage: The normal (or abnormal) behaviour of the system is
characterized and a corresponding model is built. This can be done in very
different ways, automatically or manually, depending on the type of A-NIDS
considered.
• Detection stage: Once the model for the system is available, it is compared with
the (parameterized) observed traffic. If the deviation found exceeds (or is
below, in the case of abnormality models) a given threshold, an alarm is
triggered (Estévez-Tapiador et al., 2004).
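The three stages can be illustrated with a deliberately simple normality model. The feature choice (packet count, distinct source addresses, distinct destination ports), the mean-based model and the relative-deviation threshold are all assumptions made for this sketch.

```python
def parameterize(window):
    """Parameterization: map a traffic window of (src_ip, dst_port) pairs to a fixed vector."""
    return (len(window),
            len({ip for ip, _ in window}),
            len({port for _, port in window}))

def train(normal_windows):
    """Training stage: the model is the per-feature mean of parameterized normal windows."""
    vectors = [parameterize(w) for w in normal_windows]
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def detect(window, model, threshold):
    """Detection stage: alarm if any feature deviates from the model by more than `threshold` (relative)."""
    observed = parameterize(window)
    deviation = max((abs(o - m) / m if m else float(o > 0))
                    for o, m in zip(observed, model))
    return deviation > threshold
```

A window in which one host touches 100 different ports (a port scan) deviates strongly in the distinct-port feature and raises an alarm, while a window resembling the training data stays below the threshold.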
Figure 2.3.2 : Generic A-NIDS functional architecture.
According to the type of processing related to the “behavioural” model of the
target system, anomaly detection techniques can be classified into three main categories
(Lazarevic et al., 2005) (see Figure 2.3.3): statistical-based, knowledge-based, and
machine learning-based. In the statistical-based case, the behaviour of the system is
represented from a random viewpoint. On the other hand, knowledge-based A-NIDS
techniques try to capture the claimed behaviour from available system data (protocol
specifications, network traffic instances, etc.). Finally, machine learning A-NIDS
schemes are based on the establishment of an explicit or implicit model that allows the
patterns analyzed to be categorized.
Figure 2.3.3 : Classification of the anomaly detection techniques according to the
nature of the processing involved in the ‘‘behavioural’’ model considered.
Two key aspects concern the evaluation, and thus the comparison, of the
performance of alternative intrusion detection approaches: these are the efficiency of
the detection process, and the cost involved in the operation. Without underestimating
the importance of the cost, at this point the efficiency aspect must be emphasized. Four
situations exist in this context, corresponding to the relation between the result of the
detection for an analyzed event (“normal” vs. “intrusion”) and its actual nature
(‘‘innocuous’’ vs. ‘‘malicious’’). These situations are: false positive (FP), if the
analyzed event is innocuous (or ‘‘clean’’) from the perspective of security, but it is
classified as malicious; true positive (TP), if the analyzed event is correctly classified as
intrusion/malicious; false negative (FN), if the analyzed event is malicious but it is
classified as normal/innocuous; and true negative (TN), if the analyzed event is
correctly classified as normal/innocuous. It is clear that low FP and FN rates, together
with high TP and TN rates, will result in good efficiency values.
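The four outcomes are usually summarized as rates. The sketch below counts TP, FP, FN and TN for a list of actual and predicted labels and derives the true positive (detection) rate, the false positive rate and the accuracy; the label strings "intrusion" and "normal" are assumptions.

```python
def confusion_counts(actual, predicted, positive="intrusion"):
    """Count TP, FP, FN and TN, treating `positive` as the intrusion label."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def rates(tp, fp, fn, tn):
    """True positive rate, false positive rate and accuracy (0.0 when undefined)."""
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```

For instance, with actual labels `[intrusion, intrusion, normal, normal, normal]` and predictions `[intrusion, normal, intrusion, normal, normal]`, there is one event of each error type, giving a detection rate of 0.5, a false positive rate of 1/3 and an accuracy of 0.6.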
The fundamentals for statistical, knowledge and machine learning-based A-
NIDS, as well as the principal subtypes of each, are described below. The main features
of all are summarized in Table 2.3.1. Above and beyond other possibilities, the
question of efficiency should be a prime consideration in selecting and implementing
A-NIDS methodologies.
1) Statistical-based A-NIDS techniques
In statistical-based techniques, the network traffic activity is captured and a
profile representing its stochastic behaviour is created. This profile is based on
metrics such as the traffic rate, the number of packets for each protocol, the rate
of connections, the number of different IP addresses, etc. Two datasets of
network traffic are considered during the anomaly detection process: one
corresponds to the currently observed profile over time, and the other is for the
previously trained statistical profile.
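One way to compare the currently observed profile with the trained one is sketched below, using the share of packets per protocol as the profile and a chi-square-style statistic as the distance. Both choices, and the small `eps` added to avoid division by zero, are assumptions for illustration; the alarm threshold on the distance would still have to be tuned.

```python
from collections import Counter

def protocol_profile(packets):
    """Fraction of packets per protocol in a traffic window (packets = list of protocol names)."""
    counts = Counter(packets)
    total = sum(counts.values())
    return {proto: n / total for proto, n in counts.items()}

def chi_square_distance(current, trained, eps=1e-6):
    """Pearson-style chi-square statistic between observed and expected protocol shares."""
    protocols = set(current) | set(trained)
    return sum((current.get(p, 0.0) - trained.get(p, 0.0)) ** 2 / (trained.get(p, 0.0) + eps)
               for p in protocols)
```

A window with the same protocol mix as the trained profile gives a distance of zero, while a sudden ICMP flood produces a very large value because ICMP was almost absent from the trained profile.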
Apart from their inherent features for use as anomaly-based techniques,
statistical A-NIDS approaches have a number of virtues. Firstly, they do not
require prior knowledge about the normal activity of the target system; instead,
they have the ability to learn the expected behaviour of the system from
observations. Secondly, statistical methods can provide accurate notification of
malicious activities occurring over long periods of time.
However, some drawbacks should also be pointed out. First, this kind of A-NIDS
is susceptible to being trained by an attacker in such a way that the network
traffic generated during the attack is considered as normal. Second, setting the
values of the different parameters/metrics is a difficult task, especially because
the balance between false positives and false negatives is affected.
Table 2.3.1 : Fundamentals of the A-NIDS techniques
2) Knowledge-based techniques
The so-called expert system approach is one of the most widely used
knowledge-based IDS schemes. However, like other A-NIDS methodologies,
expert systems can also be classified into other, different categories. Expert
systems are intended to classify the audit data according to a set of rules,
involving three steps. First, different attributes and classes are identified from
the training data. Second, a set of classification rules, parameters or procedures
is deduced. Third, the audit data are classified accordingly.
Specification-based anomaly methods are more restrictive in some respects: the
desired model is manually constructed by a human
expert, in terms of a set of rules (the specifications) that seek to determine
legitimate system behaviour. If the specifications are complete enough, the
model will be able to detect illegitimate behavioural patterns. Moreover, the
number of false positives is reduced, mainly because this kind of system avoids
the problem of harmless activities, not previously observed, being reported as
intrusions. Specifications could also be developed by using some kind of formal
tool.
3) Machine learning-based A-NIDS schemes
Machine learning techniques are based on establishing an explicit or implicit
model that enables the patterns analyzed to be categorized. A singular
characteristic of these schemes is the need for labelled data to train the
behavioural model, a procedure that places severe demands on resources.
In many cases, the applicability of machine learning principles coincides with
that for the statistical techniques, although the former is focused on building a
model that improves its performance on the basis of previous results. Hence, a
machine learning A-NIDS has the ability to change its execution strategy as it
acquires new information. Although this feature could make it desirable to use
such schemes for all situations, the major drawback is their resource-intensive
nature.
Several machine learning-based schemes have been applied to A-NIDS. Some
of the most important are cited below, and their main advantages and drawbacks
are identified.
• Bayesian networks
A Bayesian network is a model that encodes probabilistic relationships
among variables of interest. This technique is generally used for intrusion
detection in combination with statistical schemes, a procedure that yields
several advantages, including the capability of encoding interdependencies
between variables and of predicting events, as well as the ability to
incorporate both prior knowledge and data.
However, a serious disadvantage of using Bayesian networks is that their
results are similar to those derived from threshold-based systems, while
considerably higher computational effort is required.
Although the use of Bayesian networks has proved to be effective in certain
situations, the results obtained are highly dependent on the assumptions
about the behaviour of the target system, and so a deviation in these
hypotheses leads to detection errors, attributable to the model considered.
• Markov models
A Markov chain is a set of states that are interconnected through certain
transition probabilities, which determine the topology and the capabilities of
the model. During a first training phase, the probabilities associated with the
transitions are estimated from the normal behaviour of the target system.
The detection of anomalies is then carried out by comparing the anomaly
score obtained for the observed sequences with a fixed threshold.
Markov-based techniques have been extensively used in the context of host
IDS, normally applied to system calls. In all cases, the model derived for the
target system has provided a good approach for the claimed profile, while,
as in Bayesian networks, the results are highly dependent on the
assumptions about the behaviour accepted for the system.
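The Markov-chain procedure above (estimate transition probabilities from normal sequences, then compare an anomaly score with a fixed threshold) can be sketched as follows. Using the mean negative log-probability as the anomaly score and a small floor probability for unseen transitions are assumptions; the system-call names in the test are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_markov(normal_sequences):
    """Estimate transition probabilities P(next | current) from normal event sequences."""
    counts = defaultdict(Counter)
    for seq in normal_sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    model = {}
    for cur, nexts in counts.items():
        total = sum(nexts.values())
        model[cur] = {nxt: n / total for nxt, n in nexts.items()}
    return model

def anomaly_score(seq, model, floor=1e-6):
    """Mean negative log-probability per transition; unseen transitions get `floor`."""
    transitions = list(zip(seq, seq[1:]))
    if not transitions:
        return 0.0
    nll = -sum(math.log(model.get(cur, {}).get(nxt, floor)) for cur, nxt in transitions)
    return nll / len(transitions)

def markov_alarm(seq, model, threshold):
    """Raise an alarm when the anomaly score exceeds a fixed threshold."""
    return anomaly_score(seq, model) > threshold
```

Sequences made of transitions seen during training score low, whereas a sequence containing never-observed transitions receives the floor probability and a high score, which illustrates why the results depend so strongly on how representative the training behaviour is.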
• Neural networks
With the aim of simulating the operation of the human brain, neural
networks have been adopted in the field of anomaly intrusion detection,
mainly because of their flexibility and adaptability to environmental
changes. However, a common characteristic in the proposed variants, from
recurrent neural networks to self-organizing maps (Ramadas et al., 2003), is
that they do not provide a descriptive model that explains why a particular
detection decision has been taken.
• Fuzzy logic techniques
Fuzzy logic is derived from fuzzy set theory under which reasoning is
approximate rather than precisely deduced from classical predicate logic.
Fuzzy techniques are thus used in the field of anomaly detection mainly
because the features to be considered can be seen as fuzzy variables. This
kind of processing scheme considers an observation as normal if it lies
within a given interval.
Although fuzzy logic has proved to be effective, especially against port
scans and probes, its main disadvantage is the high resource consumption
involved. On the other hand, it should also be noted that fuzzy logic is
controversial in some circles, and it has been rejected by some engineers
and by most statisticians, who hold that probability is the only rigorous
mathematical description of uncertainty.
• Genetic algorithms
Genetic algorithms are categorized as global search heuristics, and are a
particular class of evolutionary algorithms that use techniques inspired by
evolutionary biology such as inheritance, mutation, selection and
recombination. Thus, genetic algorithms constitute another type of machine
learning-based technique, capable of deriving classification rules and/or
selecting appropriate features or optimal parameters for the detection
process. The main advantage of this subtype of machine learning A-NIDS is
the use of a flexible and robust global search method that converges to a
solution from multiple directions, whilst no prior knowledge about the
system behaviour is assumed. Its main disadvantage is the high resource
consumption involved.
• Clustering and outlier detection
Clustering techniques work by grouping the observed data into clusters,
according to a given similarity or distance measure. The procedure most
commonly used for this consists of selecting a representative point for each
cluster. Then, each new data point is classified as belonging to a given
cluster according to the proximity to the corresponding representative point.
Some points may not belong to any cluster; these are named outliers and
represent the anomalies in the detection process.
Clustering techniques determine the occurrence of intrusion events only
from the raw audit data, and so the effort required to tune the IDS is
reduced.
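A minimal sketch of this scheme follows: k-means (one possible clustering algorithm, chosen here as an assumption) computes one representative point per cluster, and a new point is assigned to the nearest representative unless it lies farther than a chosen radius, in which case it is reported as an outlier. The naive initialization and the radius value are illustrative.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's k-means with a naive initialization (the first k points)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(coord) / len(group) for coord in zip(*group)]
    return centers

def assign(point, representatives, max_radius):
    """Index of the nearest representative, or None if the point is an outlier."""
    distances = [math.dist(point, r) for r in representatives]
    best = min(range(len(representatives)), key=distances.__getitem__)
    return best if distances[best] <= max_radius else None
```

Only the raw observations are used to build the clusters, which is why the tuning effort is small; the main remaining choice is the radius that separates cluster members from outliers.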
4) Additional considerations on A-NIDS processing.
KDD and data mining
In addition to the above described A-NIDS techniques, there are others that may
help in the task of dealing with the amount of information contained within a
dataset. Two of these techniques are principal component analysis (PCA) and
association rule discovery.
PCA is a technique that is used to reduce the complexity of a dataset. It is not a
detection scheme itself but an auxiliary one. A given data collection (or dataset),
obtained by means of the different sensors in the target environment, becomes
more and more extensive and complex as the number of different services and
speed of the networks grow. To simplify the dataset, PCA performs a change of
basis that transforms the n correlated variables into uncorrelated linear
combinations of the original ones (the principal components) and keeps only
d < n of them. This makes it possible to express the data in a
reduced form, thus facilitating the detection process.
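The PCA reduction can be sketched with NumPy (an assumed dependency) by taking the eigenvectors of the covariance matrix with the d largest eigenvalues as the new basis. This is standard PCA, not a procedure specific to any A-NIDS system.

```python
import numpy as np

def pca_reduce(X, d):
    """Project the rows of X (samples x n variables) onto the d principal components."""
    Xc = X - X.mean(axis=0)                    # center each variable
    cov = np.cov(Xc, rowvar=False)             # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: ascending eigenvalues, symmetric input
    order = np.argsort(eigvals)[::-1][:d]      # indices of the d largest eigenvalues
    components = eigvecs[:, order]             # n x d basis of principal directions
    return Xc @ components, eigvals[order]     # reduced data and variance per component
```

When the original variables are strongly correlated (for example, one traffic metric roughly proportional to another), almost all of the variance falls on the first component, and the projected variables are uncorrelated up to rounding error.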
To conclude the present section, one further remark on A-NIDS techniques is in
order. During recent decades several scientific communities have
contributed to analyzing information from high-volume databases. However, in
the 1990s, KDD (‘‘Knowledge Discovery in Databases’’) burst onto the scene
to ‘‘identify new, valid, potentially useful and comprehensible patterns for
data’’.
2.3.2 AVAILABLE A-NIDS SYSTEMS
This section describes several reported endeavours in the development and
deployment of A-NIDS platforms in real network environments. The analysis is split
into two categories: available platforms, commercial or freeware, and research systems.
Commercial systems tend to use well proven techniques, and so they do not usually
consider the A-NIDS techniques most recently proposed in the specialized literature. In
fact, most of them include a signature-based detection module as the core of the
detection platform.
2.3.2.1 A-NIDS platforms
In recent years, a number of important actions have focused on implementing
A-NIDS techniques in real security platforms. Currently available IDS software tools in
this line include Snort (www.snort.org), Prelude (www.prelude-ids.org), and N@G
(www.ncb.ernet.in/nag).
Although anomaly-based detection techniques are not yet mature, they are
beginning to appear in commercial and open source products. Furthermore, in recent
years, some pioneering systems and businesses in the A-NIDS field have been acquired
by bigger companies, and their products incorporated into more general and integral
network security platforms.
More recent systems make use of a distributed architecture for intrusion
detection by incorporating agents (or sensors), and a central console to supervise the
overall detection process. This is the case of the SecurityFocus DeepSight Threat
Management System – now part of DeepNines BBX Intrusion Prevention which uses a
statistical approach to detect potential Internet threats. Data are collected by distributed
sensors, which include intrusion detection capabilities. The sensors report current
network scans and attacks to the controller, providing a global detection capability.
Most of the platforms perform further analysis on the monitored data, related to
audit, tracing and forensic capabilities. Additionally, they may trigger some kind of
response to detected attacks, namely an interaction with firewalls, the reset of TCP
connections, the use of honey systems, etc.
More advanced platforms include the Protocol Anomaly Detection (PAD)
technique, which is based on the detection of anomalies in the use of protocols. This
kind of analysis is adopted in BarbedWire IDS, DeepNines BBX, N@G, and Strata
Guard. PAD combines specification-based and statistical characterization A-NIDS
techniques to model the behaviour of a given protocol. This can be complemented by
using additional A-NIDS techniques.
2.3.2.2 A-NIDS research-related environments
Although some of the above-mentioned A-NIDS platforms are also usable for
research purposes, others have been specifically developed for this. Unlike
“commercial” A-NIDS systems, research-oriented environments include more
innovative anomaly detection techniques. Conceived as research platforms, these
systems enable the integration of contributed modules performing additional detection
techniques. This is also the case of Snort and Prelude, two of the most widely deployed
NIDS tools today.
Another observed tendency is the consideration of intrusion prevention
procedures or IPS (Intrusion Prevention System), that is, inline IDS schemes that filter
and analyze all the network traffic accessing the target environment. This has two main
consequences. On the one hand, most projects have a structured architecture in which
various detectors can work jointly, typically in a distributed way (e.g. EMERALD,
AAFID, GIDRE). On the other hand, as the detectors are now “pluggable” modules, a
specialization of their functions and capabilities can be observed. Thus, individual
detectors are designed to monitor only a specific protocol or behaviour (e.g. Anagram
targets HTTP payloads), and the global detection capabilities of the platform result
from combining and correlating the information from different detectors.
2.3.3 OPEN ISSUES AND CHALLENGES
Intrusion detection techniques are continuously evolving, with the goal of
improving the security and protection of networks and computer infrastructures.
Despite the promising nature of anomaly-based IDS, as well as its relatively long
existence, there still exist several open issues regarding these systems. Some of the
most significant challenges in the area are:
• Low detection efficiency, especially due to the high false positive rate
usually obtained (Axelsson, 2000). This aspect is generally explained as
arising from the lack of good studies on the nature of the intrusion events.
The problem calls for the exploration and development of new, accurate
processing schemes, as well as better structured approaches to modelling
network systems.
• Low throughput and high cost, mainly due to the high data rates (Gbps)
that characterize current wideband transmission technologies (Kruegel et al.,
2002). Some proposals intended to optimize intrusion detection are
concerned with grid techniques and distributed detection paradigms.
• The absence of appropriate metrics and assessment methodologies, as
well as a general framework for evaluating and comparing alternative IDS
techniques (Stolfo and Fan, 2000; Gaffney and Ulvila, 2001). Due to the
importance of this issue, it is analyzed in greater depth below.
• The analysis of ciphered (encrypted) data, although this is a general problem
faced by all intrusion detection platforms. This problem could, however, be
dealt with simply by locating the detection agents at those functional points
in the system where data are available in plaintext format and where the
corresponding detection analysis can be carried out without special
restrictions.
A-NIDS assessment
One of the main challenges that researchers face when trying to
implement and validate a new intrusion detection method is to assess it and
compare its performance with that of other available approaches. Note that
this task is not restricted to A-NIDS but applies to NIDS in general. The need
for test-beds that provide robust and reliable metrics to quantify NIDS
performance has been pointed out. Although some authors defend a testing
methodology in real environments, most of them advocate an evaluation
procedure in experimental environments.
An advantage of assessment in real environments is that the traffic is
sufficiently realistic; however, this approach is subject to:
(a) The risk of potential attacks
(b) The possible interruption of the system operation due to simulated attacks
On the other hand, the evaluation of NIDS methodologies in experimental
environments involves the generation of synthetic traffic as well as background
traffic representing legitimate users, which is far from being a trivial undertaking.
2.3.4 SUMMARY
The present paper discusses the foundations of the main A-NIDS technologies,
together with their general operational architecture, and provides a classification for
them according to the type of processing related to the “behavioural” model for the
target system. Another valuable aspect of this study is that it describes, in a concise
way, the main features of several currently available IDS systems/platforms. Finally,
the most significant open issues regarding A-NIDS are identified, among which that of
assessment is given particular emphasis.
2.4 INCREMENTAL ANOMALY-BASED INTRUSION
DETECTION SYSTEM USING LIMITED LABELED DATA
With the proliferation of the internet and increased global access to online
media, cybercrime is also occurring at an increasing rate. Currently, both personal users
and companies are vulnerable to cybercrime. A number of tools including firewalls and
Intrusion Detection Systems (IDS) can be used as defense mechanisms. A firewall acts
as a checkpoint which allows packets to pass through according to predetermined
conditions. In extreme cases, it may even disconnect all network traffic. An IDS, on the
other hand, automates the monitoring process in computer networks. The streaming
nature of data in computer networks poses a significant challenge in building IDS. In
this paper, a method is proposed to overcome this problem by performing online
classification on datasets. In doing so, an incremental naive Bayesian classifier is
employed. Furthermore, active learning enables solving the problem using a small set
of labeled data points which are often very expensive to acquire. The proposed method
includes two groups of actions i.e. offline and online. The former involves data
preprocessing while the latter introduces the NADAL online method. The proposed
method is compared to the incremental naive Bayesian classifier using the NSL-KDD
standard dataset.
The proposed method has three advantages:
(1) it overcomes the streaming data challenge;
(2) it reduces the high cost associated with instance labeling; and
(3) it improves accuracy and Kappa compared to the incremental naive Bayesian
approach.
Thus, the method is well-suited to IDS applications.
An attack refers to a set of actions that compromise the confidentiality,
integrity, and availability of resources. A system is considered secure if it can
guarantee these three criteria. Attacks must be identified before they do any harm to the
organization. Even Local Area Networks (LAN) need to be able to withstand such
attacks since network performance is important in terms of bandwidth and other
resources. The most common means of defense against potential attacks involves a
two-layered system. The first layer comprises a firewall which controls access to the
network while the second layer is configured to detect threats that somehow manage to
pass through the firewall and take appropriate action to defend the network. This
second layer is known as an Intrusion Detection System (IDS) which is able to identify
intrusion attempts by monitoring and analyzing network packets and logs. If an
intrusion is detected, the system alerts the network administrator.
With respect to information source, IDS are divided into two categories: host-
based and network-based. Host-based methods tend to monitor and analyze internal
computer operations, for instance by determining the resources that are allowed for
each host as well as illegal access attempts. Network-based systems, in contrast, deal
with intrusion at the network level. Anomalies at this level are often caused by external
attackers whose aim is to gain unauthorized network access, steal information, and
disrupt the network. Anomaly detection systems face particular challenges here: unlike
traditional data packets, which are inherently static, data streams are continuous flows
of data that cannot be stored; they must be analyzed as one unit.
2.4.1 RELATED WORK
Anomaly-based IDS have been extensively studied; however, few studies
present an incremental approach. Incremental methods may be supervised,
semi-supervised, or unsupervised. This paper considers supervised methods, which
model the normality of the data; the problem of anomaly detection is thus converted
into one of classification.
• W.-Y. Yu and H.-M. Lee propose an incremental learning method by cascading
a Service Classifier (SC) using Incremental Tree Inductive (ITI) learning. The
cascading approach includes three steps:
(1) training;
(2) test;
(3) incremental learning.
• In another study, Ren et al. propose a novel anomaly detection system that
dynamically updates normal usage profiles. Upon encountering new behavior,
density-based incremental clustering is used to insert the new behavior into
the old profiles. The authors report lower sensitivity to data disruptions than
Anomaly Detection with Fast Incremental Clustering (ADWICE) profiles. The
approach also improves cluster quality and reduces false alarms; nevertheless,
the method performs poorly on large datasets.
• Other authors propose the Reserved Set-Incremental Support Vector Machine
(RS-ISVM), an improved incremental SVM for intrusion detection. To reduce
the noise caused by large differences between feature values, the authors
propose a modified kernel function, U-RBF, which embeds the feature means and
root-mean-square differences in the RBF kernel. The authors claim that RS-ISVM
alleviates the fluctuation phenomenon in the learning process while providing
better and more reliable performance. However, it suffers from low detection
rates for U2R and R2L attacks and requires a large number of parameters.
Many modern intrusion detection methods focus on feature selection or reduction. This
is because many features may be irrelevant or redundant and may inhibit system
performance. Efficient naive Bayesian classifiers are applied to the reduced dataset to
detect possible intrusions. Experimental results show that the selected features are more
appropriate for designing IDS and result in more effective intrusion detection.
In this paper, the naive Bayesian algorithm is evaluated using the NSL-KDD
dataset to detect four types of attacks: Probe, DoS, U2R, and R2L. Feature reduction
can be performed with three standard feature selection methods: correlation,
information gain, or gain ratio. The proposed method in this study employs feature
vitality-based reduction. The results indicate that the proposed model provides better
performance.
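Of the three standard criteria mentioned above, information gain is perhaps the simplest to illustrate. The sketch below ranks discrete features by information gain; it is not the feature vitality-based reduction used in the cited study, and the feature names and toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected entropy after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


# Toy records; the values are illustrative, not NSL-KDD statistics
protocol = ["tcp", "tcp", "udp", "icmp", "tcp", "icmp"]
flag = ["SF", "S0", "SF", "SF", "S0", "SF"]
labels = ["normal", "attack", "normal", "attack", "attack", "normal"]

scores = {"protocol_type": information_gain(protocol, labels),
          "flag": information_gain(flag, labels)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['flag', 'protocol_type']
```

Here splitting on `flag` isolates a pure "attack" group (S0), so it removes more label uncertainty (about 0.459 bits) than `protocol_type` (about 0.208 bits) and is ranked first.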
2.4.2 NAIVE BAYESIAN CLASSIFICATION
Naive Bayesian classification is a popular method for stream mining because the
model can be updated very easily with new data: its probability estimates are based on
frequency counts, which are simply updated as each new data point arrives. Given this
inherently incremental nature, the algorithm is well suited to stream mining.
Assume m classes, C1, C2, …, Cm. Given a tuple X = (x1, x2, …, xn), the classifier
predicts that X belongs to the class with the highest posterior probability conditioned
on X. That is, X is assigned to Ci if and only if:
$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i, \qquad \text{where } P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)} \tag{1}$$
Since P(X) is constant for all classes, only P(X | Ci) P(Ci) needs to be maximized. If the
prior probabilities are unknown, they are commonly assumed to be equal, i.e.
P(C1) = P(C2) = … = P(Cm), and hence only P(X | Ci) must be maximized. Otherwise, the
priors may be estimated as P(Ci) = |Ci,D| / |D|, where |Ci,D| is the number of training
tuples in D with the label Ci.
For datasets with many features, computing P(X | Ci) directly is expensive. To reduce the
computation, the features are assumed to be conditionally independent of one another
given the class (the "naive" assumption). Thus:
$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i) \tag{2}$$
The individual probabilities P(x1 | Ci), P(x2 | Ci), …, P(xn | Ci) may be estimated from the
training tuples.
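The counting estimates behind Eqs. (1) and (2) are what make the classifier naturally incremental. The following is a minimal sketch, assuming categorical feature values and Laplace smoothing (an assumption not stated above, used so that unseen values do not produce zero probabilities); numeric NSL-KDD attributes would first need to be discretized or modeled, for example, with a per-class Gaussian. The class name `IncrementalNaiveBayes` is hypothetical.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical naive Bayes whose counts are updated one instance at a time (hypothetical sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                      # Laplace smoothing constant (assumption)
        self.n = 0                              # |D|: instances seen so far
        self.class_counts = defaultdict(int)    # |C_i,D| for every class C_i
        self.value_counts = defaultdict(int)    # (class, feature index, value) -> count
        self.feature_values = defaultdict(set)  # feature index -> distinct values seen

    def update(self, x, y):
        """Absorb one labeled instance; only counts are kept, the instance itself is discarded."""
        self.n += 1
        self.class_counts[y] += 1
        for k, v in enumerate(x):
            self.value_counts[(y, k, v)] += 1
            self.feature_values[k].add(v)

    def posteriors(self, x):
        """Return P(C_i | X) for every class seen so far, normalised to sum to 1."""
        log_scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)                        # log P(C_i) = log(|C_i,D| / |D|)
            for k, v in enumerate(x):
                n_vals = len(self.feature_values.get(k, ())) + 1  # +1 leaves mass for unseen values
                count = self.value_counts.get((c, k, v), 0)
                score += math.log((count + self.alpha) / (n_c + self.alpha * n_vals))  # Eq. (2)
            log_scores[c] = score
        top = max(log_scores.values())                            # subtract max for numerical stability
        unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
        total = sum(unnorm.values())                              # plays the role of P(X) in Eq. (1)
        return {c: p / total for c, p in unnorm.items()}

    def predict(self, x):
        """Class with the highest posterior probability, as in Eq. (1)."""
        post = self.posteriors(x)
        return max(post, key=post.get)


nb = IncrementalNaiveBayes()
stream = [(("tcp", "SF"), "normal"), (("tcp", "S0"), "attack"),
          (("udp", "SF"), "normal"), (("tcp", "S0"), "attack")]
for x, y in stream:                 # each instance is seen once and then dropped
    nb.update(x, y)
print(nb.predict(("tcp", "S0")))    # attack
```

Only the counts grow with the stream, so updating the model costs time proportional to the number of features, independent of how many instances have already been seen.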
2.4.3 ACTIVE LEARNING
Instead of requesting the correct labels for all instances, active learning
selectively determines which input instances are labeled. Quite often, this approach
requires considerably fewer labeled instances to learn a concept than typical
supervised methods. In active learning, once an instance is scanned, the selected
strategy decides whether its correct label should be requested; if so, the label is
obtained and the predictive model is trained with the new instance.
In the following, we briefly explain four active learning strategies:
• Random Strategy: The labels of incoming instances are requested at random,
according to the labeling budget.
• Fixed Uncertainty Strategy: Labels are requested for the instances about which
the current classifier is least confident. A constant threshold is used: only
those instances are labeled for which the maximum posterior probability
estimated by the classifier does not exceed the threshold.
• Variable Uncertainty Strategy: The aim is to label the least certain instances
within each time interval. The threshold varies with time: it is lowered after a
label is requested and raised otherwise, so that the labeling budget is spent in a
uniform fashion over time.
• Uncertainty Strategy with Randomization: The variable threshold is randomized
by multiplying it by a random factor. Instances near the decision boundary are
still labeled most often, but labels are occasionally also requested for instances
about which the classifier is confident.
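The last strategy, which NADAL uses, can be sketched as follows. This is an illustrative Python version under stated assumptions: the class name, the parameter names (`budget`, `step`, `delta`), and the simple budget accounting (labels requested so far divided by instances seen) are hypothetical choices, and the threshold is randomized by multiplying it by a factor drawn from a normal distribution with mean 1.

```python
import random


class RandomizedUncertaintyStrategy:
    """Variable-uncertainty active learning with a randomized threshold (hypothetical sketch)."""

    def __init__(self, budget=0.1, step=0.01, delta=1.0, seed=None):
        self.budget = budget    # maximum fraction of instances whose labels may be requested
        self.step = step        # s: relative amount by which the threshold is adjusted
        self.delta = delta      # standard deviation of the random factor eta ~ N(1, delta)
        self.theta = 1.0        # current threshold before randomization
        self.seen = 0           # instances observed so far
        self.requested = 0      # labels requested so far
        self.rng = random.Random(seed)

    def should_query(self, max_posterior):
        """Return True if the true label of the current instance should be requested."""
        self.seen += 1
        if self.requested / self.seen >= self.budget:
            return False                                # budget spent: do not query
        theta_rand = self.theta * self.rng.gauss(1.0, self.delta)
        if max_posterior < theta_rand:
            self.requested += 1
            self.theta *= 1.0 - self.step               # label requested: tighten the threshold
            return True
        self.theta *= 1.0 + self.step                   # no label: relax the threshold
        return False


strategy = RandomizedUncertaintyStrategy(budget=0.2, seed=42)
# Maximum posterior probabilities an incremental classifier might output for five instances
decisions = [strategy.should_query(p) for p in (0.55, 0.99, 0.62, 0.97, 0.51)]
print(decisions)
```

Setting `delta=0` removes the randomization and reduces the sketch to the variable uncertainty strategy, which makes the effect of the random factor easy to compare.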
2.4.4 PROPOSED METHOD
The proposed model, called Network Anomaly Detection using Active Learning
(NADAL) involves an offline and an online step. The selected dataset is preprocessed
in an offline fashion. The NSL-KDD dataset contains instances labeled with the attack
type. During the preprocessing step, the attacks are divided into four categories: DoS,
Probe, R2L, and U2R. Furthermore, there are four classifiers at the respective layers of
attacks. Thus, the preprocessing carried out using Weka selects the appropriate features
for each classifier. The selected features are then given to the feature filtering module
in NADAL.
Figure 2.4.1 illustrates the NADAL framework. In the proposed online method,
each instance is processed at most once to improve the model and is then discarded.
Initially, instance Xt with label yt passes through the feature filtering module, and the
appropriate features for each classifier are selected. At each layer, the naive Bayesian
module incrementally predicts the probability that the instance belongs to the class.
Thereafter, the selected active learning strategy (i.e. uncertainty with randomization) is
called. The output of the strategy determines whether the label of the instance must be
requested. A logical OR gate aggregates the results from the different active learning
modules. If the gate outputs 1, the label is requested and the classifiers are updated
using the instance. Otherwise, the aggregate output module predicts the label according
to the maximum certainty calculated by the classifiers; in this case, ŷt denotes the
predicted label for instance Xt.
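The OR-gate aggregation step can be sketched as a small function. How the "maximum certainty" rule turns per-layer outputs into a final label is not spelled out above, so the sketch assumes that each layer is a binary classifier for its attack category, that a layer's certainty is the larger of p and 1 - p, and that the most certain layer decides between its attack category and "normal". The function name `aggregate_layers` is hypothetical.

```python
from typing import Dict, Optional, Tuple


def aggregate_layers(layer_outputs: Dict[str, Tuple[float, bool]]) -> Tuple[bool, Optional[str]]:
    """Combine the four NADAL layers for one instance (hypothetical sketch).

    layer_outputs maps a layer name ("DoS", "Probe", "R2L", "U2R") to a pair
    (probability that the instance belongs to that attack category,
     query decision of that layer's active learning module).
    Returns (query, predicted_label); predicted_label is None when the label is requested.
    """
    # Logical OR gate: one layer asking for the label is enough.
    if any(query for _, query in layer_outputs.values()):
        return True, None  # request y_t and update every layer's classifier
    # Otherwise, trust the layer whose binary decision is most certain.
    best_layer, best_certainty, best_is_attack = None, -1.0, False
    for layer, (p_attack, _) in layer_outputs.items():
        certainty = max(p_attack, 1.0 - p_attack)
        if certainty > best_certainty:
            best_layer, best_certainty, best_is_attack = layer, certainty, p_attack >= 0.5
    return False, best_layer if best_is_attack else "normal"


print(aggregate_layers({"DoS": (0.97, False), "Probe": (0.20, False),
                        "R2L": (0.10, False), "U2R": (0.05, False)}))  # (False, 'DoS')
```

In the example no layer requests a label, and the DoS layer (certainty 0.97) is more certain than the others (0.80, 0.90, 0.95), so its positive decision becomes the prediction ŷt.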
2.4.4 EVALUATION
The proposed framework in this paper was implemented using Java in NetBeans
8.0.2. Feature selection was performed using Weka and the Wrapper method. The
active learning modules as well as the incremental naive Bayesian module were
implemented by modifying the Java code of Massive Online Analysis (MOA) 2016.04.
The standard NSL-KDD dataset is used for evaluation purposes. The dataset was
randomized via the Randomize functionality in Weka. The accuracy and Kappa values
were then calculated for the framework at four layers: DoS, Probe, U2R, and R2L. The
results were compared to those of the incremental naive Bayesian approach in MOA.
Figure 2.4.1 : The proposed model called NADAL
A. Dataset
As mentioned earlier, this paper uses the standard NSL-KDD dataset for
evaluation purposes. The dataset is a revision of KDD-99 without repetitive
and redundant instances. Each record includes 42 features. The KDDtrain+.txt
file was used, in which the 42nd feature is the label identifying a record as
normal or as an attack. There are four types of attacks: DoS, Probe, R2L, and U2R.
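The preprocessing step that groups attack names into the four categories can be sketched as below. The mapping covers only the 22 attack names of the KDD'99 training data; attack names that appear only in the test data are deliberately reported as "unknown" rather than guessed. Parsing assumes comma-separated records whose 42nd field holds the attack name, as described above; the helper names are hypothetical.

```python
ATTACK_CATEGORIES = {
    "DoS": {"back", "land", "neptune", "pod", "smurf", "teardrop"},
    "Probe": {"ipsweep", "nmap", "portsweep", "satan"},
    "R2L": {"ftp_write", "guess_passwd", "imap", "multihop", "phf", "spy",
            "warezclient", "warezmaster"},
    "U2R": {"buffer_overflow", "loadmodule", "perl", "rootkit"},
}
# Invert the table: attack name -> category
NAME_TO_CATEGORY = {name: cat for cat, names in ATTACK_CATEGORIES.items() for name in names}


def categorize(record_line: str) -> str:
    """Map one comma-separated record to 'normal', an attack category, or 'unknown'."""
    fields = record_line.strip().split(",")
    label = fields[41].rstrip(".")   # 42nd field; original KDD'99 labels end with '.'
    if label == "normal":
        return "normal"
    return NAME_TO_CATEGORY.get(label, "unknown")  # test-set-only names are not mapped here


sample = "0,tcp,private,S0," + ",".join(["0"] * 37) + ",neptune,21"
print(categorize(sample))  # DoS
```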
B. Evaluation Criteria
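The results are evaluated according to accuracy and Kappa. Accuracy is
the percentage of tuples in the dataset that are correctly labeled. In terms of true
positives (TP), true negatives (TN), false positives (FP), and false negatives (FN),
it is calculated as:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$
The Kappa coefficient measures the agreement between two raters who classify
items (here, the classifier's predictions and the true labels), corrected for the
agreement expected by chance. The value is obtained as follows:
$$\kappa = \frac{p_0 - p_c}{1 - p_c} \tag{4}$$
where p0 and pc denote the observed and chance agreement, respectively.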
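Both measures can be computed from a confusion matrix. The sketch below is a minimal illustration rather than MOA's implementation: it takes p0 as the fraction of correctly labeled tuples (the accuracy of Eq. (3)) and pc as the agreement expected by chance from the row and column totals. The function name and the example counts are hypothetical.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = number of instances of true class i predicted as class j."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    p0 = sum(confusion[i][i] for i in range(k)) / total               # observed agreement = accuracy
    row_sums = [sum(confusion[i]) for i in range(k)]                  # true-class totals
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # predicted-class totals
    pc = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2     # chance agreement
    return p0, (p0 - pc) / (1 - pc)                                   # Eq. (4); undefined if pc == 1


# Rows: true normal, true attack; columns: predicted normal, predicted attack
p0, kappa = accuracy_and_kappa([[40, 10], [5, 45]])
print(p0, kappa)
```

For this matrix, p0 = 85/100 = 0.85 and pc = (50·45 + 50·55)/100² = 0.5, so κ = (0.85 − 0.5)/(1 − 0.5) = 0.7. Kappa is lower than accuracy because half of the agreement would be expected even from a classifier guessing according to the class totals.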
C. Implementation Results
The results exhibit a clear improvement in both accuracy and Kappa compared
to the incremental naive Bayesian approach. The results are shown for ten
randomizations of the NSL-KDD dataset.
Table 2.4.1 : ACCURACY AND KAPPA FOR TEN RANDOMIZATIONS: NADAL VS. INCREMENTAL NAIVE BAYESIAN CLASSIFIER
2.4.5 CONCLUSION AND RECOMMENDATIONS
Traditional data packets are inherently static. In contrast, streaming data are
continuously created; they cannot be stored and must be analyzed as a single unit. A
novel network anomaly detection framework was proposed to improve efficiency in
classifying data in an online fashion. Furthermore, active learning was used to reduce
labeling costs. The proposed system was evaluated using the standard NSL-KDD
dataset. Implementation results revealed that the proposed method outperforms the
naive Bayesian approach in terms of both accuracy and Kappa.
2.5 A DEEP LEARNING APPROACH FOR NETWORK
INTRUSION DETECTION SYSTEM
A Network Intrusion Detection System (NIDS) helps system administrators to
detect network security breaches in their organizations. However, many challenges
arise while developing a flexible and efficient NIDS for unforeseen and unpredictable
attacks. We propose a deep learning-based approach for developing such an efficient
and flexible NIDS. We use Self-taught Learning (STL), a deep learning-based
technique, on NSL-KDD - a benchmark dataset for network intrusion. We present the
performance of our approach and compare it with a few previous works. Compared
metrics include accuracy, precision, recall, and f-measure values.
A NIDS monitors and analyzes the network traffic entering into or exiting from
the network devices of an organization and raises alarms if an intrusion is observed.
Based on the methods of intrusion detection, NIDSs are categorized into two classes:
1) Signature (misuse) based NIDS (SNIDS)
2) Anomaly Detection based NIDS (ADNIDS)
In SNIDS, e.g. Snort, attack signatures are pre-installed in the NIDS. A pattern
matching is performed for the traffic against the installed signatures to detect an
intrusion in the network.
In contrast, an ADNIDS classifies network traffic as an intrusion when it
observes a deviation from the normal traffic pattern.
SNIDS is effective in the detection of known attacks and shows high detection
accuracy with lower false-alarm rates. However, its performance suffers when
detecting unknown or new attacks, because only the signatures of attacks known
beforehand can be installed in an IDS.
ADNIDS, on the other hand, is well-suited for the detection of unknown and
new attacks. Although ADNIDS produces high false-positive rates, its
theoretical potential in the identification of novel attacks has caused its wide
acceptance among the research community.
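To make the contrast between the two classes concrete, the toy Python sketch below pairs a substring-based signature check with a simple statistical deviation test. The signatures, the single connections-per-minute feature, and the three-standard-deviation threshold are all hypothetical; real SNIDS such as Snort use a far richer rule language, and real ADNIDS model many features at once.

```python
import statistics

# Hypothetical byte patterns standing in for pre-installed attack signatures
SIGNATURES = [b"/etc/passwd", b"cmd.exe"]


def signature_match(payload: bytes) -> bool:
    """SNIDS-style check: flag the payload if it contains any known signature."""
    return any(sig in payload for sig in SIGNATURES)


def anomaly_score(value: float, baseline: list) -> float:
    """ADNIDS-style check: distance of value from the normal baseline, in standard deviations."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(value - mean) / stdev


# Illustrative connections-per-minute counts observed during normal operation
normal_rate = [12, 15, 11, 14, 13, 12, 16]

print(signature_match(b"GET /../../etc/passwd HTTP/1.1"))  # True: matches a known pattern
print(signature_match(b"GET /index.html HTTP/1.1"))        # False: no signature matches
print(anomaly_score(240, normal_rate) > 3)                 # True: far outside normal behavior
```

The signature check cannot flag a flood of otherwise harmless requests, while the deviation test can; conversely, the deviation test would also fire on a legitimate traffic spike, which illustrates the higher false-positive rate of ADNIDS.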
There are primarily two challenges that arise while developing an efficient and
flexible NIDS for unknown future attacks. First, selecting proper features from the
network traffic dataset for anomaly detection is difficult. The features selected for one
class of attack may not work well for other categories of attacks due to continuously
changing and evolving attack scenarios. Second, labeled traffic datasets from real
networks are rarely available for developing a NIDS. Immense effort is required to
produce such a labeled dataset from the raw network traffic traces collected over a
period of time or in real time. Additionally, to preserve the confidentiality of the internal
organizational network structure as well as the privacy of various users, network
administrators are reluctant to report any intrusion that might have occurred in their
networks.
Various machine learning techniques have been used to develop ADNIDSs,
such as Artificial Neural Networks (ANN), Support Vector Machines (SVM), Naive-
Bayesian (NB), Random Forests (RF), and Self-Organized Maps (SOM). The NIDSs
are developed as classifiers to differentiate the normal traffic from the anomalous
traffic. Many NIDSs perform a feature selection task to extract a subset of relevant
features from the traffic dataset to enhance classification results. Feature selection helps
to prevent incorrect training by removing redundant features and noise. Recently,
deep learning-based methods have been
successfully applied in audio, image, and speech processing applications. These
methods aim to learn a good feature representation from a large amount of unlabeled
data and subsequently apply these learned features on a limited amount of labeled data
in a supervised classification. The labeled and unlabeled data may come from different
distributions. However, they must be relevant to each other.
It is envisioned that deep learning-based approaches can help to overcome
the challenges of developing an efficient NIDS. Unlabeled network traffic data can be
collected from different network sources, and a good feature representation can be
obtained from these datasets using deep learning techniques. These features can then be
applied for supervised classification to a small but labeled traffic dataset consisting of
normal as well as anomalous traffic records. The traffic data for the labeled dataset can
be collected in a confined, isolated, and private network environment. With this
motivation, we use self-taught learning, a deep learning technique based on a sparse
autoencoder and soft-max regression, to develop a NIDS. We verify the usability of the
self-taught learning-based NIDS by applying it to the NSL-KDD intrusion dataset, an
improved version of KDD Cup 99, the benchmark dataset for various NIDS
evaluations.
2.5.1 RELATED WORK
This section presents various recent accomplishments in this area. We only
discuss the works that have used the NSL-KDD dataset for their performance
benchmarking. Therefore, any dataset referred to from this point forward should be
considered as NSL-KDD. This approach allows a more accurate comparison of work
with others found in the literature. Finally, we discuss a few deep learning-based
approaches that have been tried so far for similar kinds of work.
One of the earliest works found in literature used ANN with enhanced resilient
back-propagation for the design of such an IDS. This work used only the training
dataset for training (70%), validation (15%) and testing (15%). As expected, use of
unlabeled data for testing resulted in a reduction of performance.
A more recent work used J48 decision tree classifier with 10-fold cross-
validation for testing on the training dataset. This work used a reduced feature set of 22
features instead of the full set of 41 features.
A similar work evaluated various popular supervised tree-based classifiers and
found that Random Tree model performed best with the highest degree of accuracy
along with a reduced false alarm rate.
Many 2-level classification approaches have also been proposed. One such
work used Discriminative Multinomial Naive Bayes (DMNB) as a base classifier and
Nominal-to-Binary supervised filtering at the second level along with 10-fold cross
validation for testing. This work was further extended to use Ensembles of Balanced
Nested Dichotomies (END) at the first level and Random Forest at the second level. As
expected, this enhancement resulted in an improved detection rate and a lower false
positive rate.
Another 2-level implementation used principal component analysis (PCA) for
the feature set reduction and then SVM (using a Radial Basis Function kernel) for final
classification, which resulted in a high detection accuracy with only the training dataset and
full 41 features set. A reduction in features set to 23 resulted in even better detection
accuracy in some of the attack classes, but the overall performance was reduced. The
authors improved their work by using information gain to rank the features and then a
behavior-based feature selection to reduce the feature set to 20. This resulted in an
improvement in reported accuracy using the training dataset.
The second category to look at used both the training and test datasets. An initial
attempt in this category used fuzzy classification with a genetic algorithm and resulted in
a detection accuracy of 80%+ with a low false positive rate. Another important work
used unsupervised clustering algorithms and found that the performance using only the
training data was reduced drastically when test data was also used.
A similar implementation using the k-point algorithm resulted in a slightly
better detection accuracy and lower false positive rate, using both training and test
datasets.
Another less popular technique, OPF (optimum path forest) which uses graph
partitioning for feature classification, was found to demonstrate a high detection
accuracy in one-third of the time required by the SVM-RBF method.
A deep learning approach with Deep Belief Network (DBN) as a feature
selector and SVM as a classifier resulted in an accuracy of 92.84% when applied on
training data.
2.5.2 SELF-TAUGHT LEARNING & NSL-KDD DATASET
OVERVIEW
1) Self-Taught Learning
Self-taught Learning (STL) is a deep learning approach that consists of two
stages for the classification. First, a good feature representation is learnt from a
large collection of unlabeled data, xu, termed as Unsupervised Feature Learning
(UFL). In the second stage, this learnt representation is applied to labeled data,
xl, and used for the classification task. Figure 2.5.1 shows the architecture
diagram of STL. There are different approaches used for UFL, such as Sparse
Autoencoder, Restricted Boltzmann Machine (RBM), K-Means Clustering, and
Gaussian Mixtures.
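A sparse autoencoder is a neural network that consists of an input, a hidden, and an
output layer. The input and output layers contain N nodes, and the hidden layer
contains K nodes. The target values at the output layer are set equal to the input
values, i.e., $\hat{x}_i = x_i$, as shown in Figure 2.5.1(a). The sparse autoencoder
network finds the optimal values for the weight matrices $W \in \mathbb{R}^{K \times N}$
and $V \in \mathbb{R}^{N \times K}$ and the bias vectors $b_1 \in \mathbb{R}^{K \times 1}$
and $b_2 \in \mathbb{R}^{N \times 1}$, using the back-propagation algorithm while
trying to learn an approximation of the identity function, i.e., an output $\hat{x}$
similar to x. The sigmoid function, $g(z) = \frac{1}{1 + e^{-z}}$, is used for the
activation $h_{W,b}$ of the nodes in the hidden and output layers:
$$h_{W,b}(x) = g(Wx + b) \tag{1}$$
$$J = \frac{1}{2m}\sum_{i=1}^{m} \lVert x_i - \hat{x}_i \rVert^2 + \frac{\lambda}{2}\Big(\sum_{k,n} W_{kn}^2 + \sum_{n,k} V_{nk}^2\Big) + \beta \sum_{j=1}^{K} \mathrm{KL}(\rho \,\Vert\, \hat{\rho}_j) \tag{2}$$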
The cost function to be minimized in the sparse autoencoder using
back-propagation is represented by Eq. (2). The first term is the average of the
sum-of-squares error terms over all m input data. The second term is a weight decay
term, with λ as the weight decay parameter, to avoid over-fitting in training. The last
term in the equation is a sparsity penalty term that constrains the hidden layer to
maintain low average activation values; it is expressed as the Kullback-Leibler (KL)
divergence shown in Eq. (3):
$$\mathrm{KL}(\rho \,\Vert\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat{\rho}_j} \tag{3}$$
where ρ is a sparsity constraint parameter that ranges from 0 to 1 and β controls the
sparsity penalty term. $\mathrm{KL}(\rho \,\Vert\, \hat{\rho}_j)$ attains its minimum value
when $\rho = \hat{\rho}_j$, where $\hat{\rho}_j$ denotes the average activation value of
hidden unit j over all training inputs x. Once we learn optimal values for W and b1 by
applying the sparse autoencoder on unlabeled data, xu, we evaluate the feature
representation $a = h_{W,b_1}(x_l)$ for the labeled data, (xl, y). We use this new feature
representation, a, with the labels vector, y, for the classification task in the second
stage. We use soft-max regression for the classification task, as shown in Figure 2.5.1(b).
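The first stage can be made concrete by evaluating Eqs. (1) to (3) in NumPy. The sketch below only computes the cost and the learned representation a; the gradient computation and the optimizer used for back-propagation are omitted. The default values of λ, ρ, and β, the helper names, and the random data are illustrative assumptions, not the settings of the original work.

```python
import numpy as np


def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))


def sparse_autoencoder_cost(X, W, V, b1, b2, lam=1e-4, rho=0.05, beta=3.0):
    """Evaluate Eq. (2) for inputs X of shape (N, m), one column per example.

    W: (K, N), V: (N, K), b1: (K, 1), b2: (N, 1).
    """
    m = X.shape[1]
    A = sigmoid(W @ X + b1)               # hidden activations, Eq. (1); shape (K, m)
    X_hat = sigmoid(V @ A + b2)           # reconstructions x_hat; shape (N, m)
    reconstruction = np.sum((X - X_hat) ** 2) / (2 * m)
    weight_decay = (lam / 2) * (np.sum(W ** 2) + np.sum(V ** 2))
    rho_hat = A.mean(axis=1)              # average activation of each hidden unit j
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))  # Eq. (3)
    return reconstruction + weight_decay + beta * np.sum(kl)


def learned_features(X, W, b1):
    """Representation a = h_{W,b1}(x) passed to soft-max regression in the second stage."""
    return sigmoid(W @ X + b1)


rng = np.random.default_rng(0)
N, K, m = 6, 3, 20                        # input size, hidden size, number of examples
X = rng.random((N, m))                    # inputs scaled to [0, 1], matching the sigmoid range
W = rng.normal(scale=0.1, size=(K, N))
V = rng.normal(scale=0.1, size=(N, K))
b1, b2 = np.zeros((K, 1)), np.zeros((N, 1))
print(sparse_autoencoder_cost(X, W, V, b1, b2))
a = learned_features(X, W, b1)
print(a.shape)                            # (3, 20)
```

With small random weights every hidden unit starts near an average activation of 0.5, far from ρ = 0.05, so the KL term initially dominates the cost; training would push the hidden units towards sparse activity.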
2) NSL-KDD Dataset
NSL-KDD dataset is an improved and reduced version of the KDD Cup 99
dataset. The KDD Cup dataset was prepared using the network traffic captured
by 1998 DARPA IDS evaluation program. The network traffic includes normal
and different kinds of attack traffic, such as DoS, Probing, user-to-root (U2R),
and remote-to-local (R2L). The network traffic for training was collected for seven
weeks followed by two weeks of traffic collection for testing in raw tcpdump
format. The test data contains many attacks that were not injected during the
training data collection phase to make the intrusion detection task realistic. It is
believed that most of the novel attacks can be derived from the known attacks.
Finally, the training and test data were processed into the datasets of five
million and two million TCP/IP connection records, respectively. The KDD
Cup dataset has been widely used as a benchmark dataset for many years in the
evaluation of NIDS. One of the major drawbacks with the dataset is that it
contains an enormous amount of redundant records both in the training and test
data. It was observed that almost 78% and 75% records are redundant in the
Figure 2.5.1 : The two-stage process of self-taught learning: a) Unsupervised
Feature Learning (UFL) on unlabeled data. b) Classification on labeled data.