NETWORJS3.pdf

Improving Network Security Using Machine Learning Techniques
Shaik Akbar1
, Dr. J.A. Chandulal2
, Dr. K. Nageswara Rao3
, G. Sudheer Kumar4
1
Department of Computer Science and Engineering, SVIET, Nandamuru, Andhra Pradesh, India
2
Department of Computer Science and Engineering, GITAM University, Visakhapatnam, Andhra Pradesh, India
3
Department of Computer Science and Engineering, P.V.P. S. I .Technology, Vijayawada, Andhra Pradesh, India
4
Department of Computer Science and Engineering, SVIET, Nandamuru, Andhra Pradesh, India
(akbar999@gmail.com, chandulal@gitam.edu, hodcse@pvpsiddhartha.ac.in, sudheerkumargedela@gmail.com)
Abstract - Discovery of malicious correlations in
computer networks has been an emergent problem
motivating extensive research in computer science to develop
improved intrusion detecting systems (IDS). In this
manuscript, we present a machine learning approach known
as Decision Tree (C4.5) Algorithm and Genetic Algorithm,
to classify such risky/attack type of connections. The
algorithm obtains into consideration dissimilar features in
network connections and to create a classification rule set.
Every rule in rule set recognizes a particular attack type.
For this research, we implement a GA, C.45 and educated it
on the KDD Cup 99 data set to create a rule set that can be
functional to the IDS to recognize and categorize dissimilar
varieties of assault links. During our study, we have
developed a rule set contain of six rules to classify six
dissimilar attack type of connections that fall into 4 modules
namely DoS, U2R, root to local and probing attacks. The
rule produces works with 93.70% correctness for detecting
the denial of service type of attack connections, and with
significant accuracy for detecting the root to local, user to
root and probe connections. Results from our experiment
have given hopeful results towards applying enhanced
genetic algorithm for NIDS.
Keywords - Intrusion Detection, Genetic Algorithm, C4.5
Algorithm, KDDCup’p99, Computer networks, Data mining
I. INTRODUCTION
In the previous few existences, the information
uprising has lastly come of time. Extra than ever before,
we observe that the Internet has altered our lives. The
potentialities and chances are boundless; unluckily, so too
are the threats and probability of hateful interruptions.
Intruders can be divided into two categories: unknown
and insiders. Outsiders are interlopers who advance your
structure from exterior of your network and who may
possibly assault your outside incidence (i.e. spoil net
servers, advance spam during electronic mail servers, etc.)
They can also effort to go around the firewall and assault
machines on the interior network. Insiders, in difference,
re official customers of your inner network who abuse
advantages, imitate higher advantaged users, or use
rightness information to increase contact from outside
causes Axelsson 2000 [1].
Intrusion detection systems have emerged to identify
exploits which imperil the honesty, discretion or
accessibility of a reserve as an effort to give an answer to
obtainable safety problems. The proceeds of identifying
exploit those efforts to cooperation the discretion, honesty
or accessibility of a source Kruegel C et.al 2002 [2].
Intrusion is the act of breaking the security policy or legal
protections that affect to an information system. An
intruder is somebody efforting to split into or abuse your
system Kayacik et.al 2003 [3]. Stateless firewalls suffer
from numerous significant drawbacks that make them
inadequate to protect networks by themselves.
There are two common groups of assaults which
interruption detection expertise effort to recognize –
irregularity detection and abuse detection, Anomaly
discovery recognizes actions that differ from recognized
models for customers, or factions of customers. Anomaly
discovery usually occupies the creation of information
supports that include the outlines of the experiential
actions. The next common advance to interruption
detection is abuse detection. This advance absorbs the
difference of a user’s actions with the identified
performances of assailant’s difficulty to enter a system
Axelsson 2000[1]. Although irregularity discovery
classically develops entrance examining to identify when
a confident recognized metric has been reached, abuse
discovery advance frequently use a rule-based advance.
When applied to abuse discovery, the systems become
situations for network assaults. The interruption detection
machine recognizes a probable assault if a user’s actions
are established to be dependable with the traditional rules
Bass T. 2000 [4].
KDDCup99 Data set is used for Intrusion Detection
and the development replica is checked on the data set.
The process of Artificial Intelligence for detection of
intrusions is the method to build precise or correct IDS.
To recognize misuse, anomaly detection and detecting
key models are identified by using the rule based, Genetic
Algorithm and C4.5 algorithm techniques.
978-1-4673-1344-5/12/$31.00 ©2012 IEEE

II. ATTRIBUTE SELECTION
The information gets determine used in C4.5
algorithm is utilized to choose the check feature at every
node in the hierarchy. Such a compute is referred to as a
feature choice determine or calculate of the integrity of
split. The feature with the greatest information gain
preferred as the examination attribute for the near node.
This attribute decreases the information essential to
categorize the models in the ensuing dividers. Such an
information-theoretic advance decreases the probable
quantity of checks required to categorize an object and
assurances that an easy tree is produce.
III. EXISTING ALGORITHM: INFORMATION GAIN
Let S be a set of training set models with their
identical tags. Suppose there are m modules and the
training set includes Si models of class ‘I‘ and ‘s’ is the
entire amount of models in the training set. Expected
information essential to categorize a certain model is
intended by:
i=1
I(S1,S2,……Sm) = - ∑ Si / S log2Si (1)
m
A feature F with values {f1,f2, ………fv} can split the
training set into v subsets
In addition let Sj have Sij trials of class i. Entropy of the
feature F is
V
E(F)= ∑ S1j + …….+Smj / S * I(S1j,S2j,…..Smj) (2)
j=1
Information gain for F can be intended as:
Gain(F) = I( S1,S2, …… ,Sm) - E(F) (3)
In this study, information gain is considered for class
tags by using a binary intolerance for all class. That is, for
each class, a dataset instance is measured in-class, if it has
the equivalent label; out-class, if it has a dissimilar label.
Consequently as different to manipulating one
information gain as a general assess on the significance of
the attribute for every class, so guess an information gain
for all class. Thus, this indicates how well the attribute
can categorize the particular class from extra classes.
IV. PROPOSED ENHANCEMENT: GAIN RATIO
CRITERION
The thought of information gain recognized previous
leans to maintain features that have an enormous amount
of ideals. For instance, if we have a feature D that has a
separate value for every record, then Info (D, T) is 0, thus
Gain (D, T) is maximal. To reimburse for this, it was
recommended in [6] to use the subsequent ratio in its
place of gain. Split info is the information due to the split
of T on the foundation of the value of the type feature D,
which is defined by
n
Split Info(x) = -∑ |Ti| / |T|.log2 |Ti| / |T| (4)
i=1
And the gain ratio is then calculated by
GainRatio(D,T) = Gain(D,T)/SplitInfo(D,T) (5)
The gain ratio, positions the sum of helpful
information formed by split, i.e., that shows cooperative
for categorization. If the split is close to slight, split
information will be tiny and this ratio will be unstable. To
avoid this, the gain ratio typical chooses a test to exploit
the ratio above, topic to manage that the information gain
must be great, at least as large as the average gain over all
tests checked.
V. CLASSIFYING AND DETECTING ANOMALIES
Misuse detection is done through applying rules to the
test data. Test data is collected from the KDDCUP Data
set. The test data is stored in the database. The rules are
applied as SQL query to the database. This classified data
under different attack categories as follows:
1) DOS
2) Probe
3) U2R
4) R2L
The C4.5 algorithm generates a decision tree, from the
origin node, by selecting one outstanding feature with the
maximum information gain as the examination for the
present node. In this work, Enhanced C4.5, by selecting
one enduring feature with the highest information gain
ratio as the check for present node is considered a
afterward edition of the C4.5 algorithm, will be used to
build the decision trees for categorization. From the table
3 it is clear that Enhanced C4.5 outperforms the classical
C4.5 algorithm Split info is the information owing to the
split of T on the foundation of the rate of the definite
feature D, which is clear by
n
Split Info(x) = -∑ |Ti| / |T|.log2 |Ti| / |T| (4)
i=1
2012 IEEE International Conference on Computational Intelligence and Computing Research

And the gain ratio is then calculated by
GainRatio (D,T) = Gain(D,T) / SplitInfo(D,T) (5)
In Enhanced C4.5 the gain ratio, states the amount of
helpful data created by split, i.e., that shows obliging for
categorization. If the split is near-trifling, split
information will be little and this ratio will be unstable.
To avoid this, the gain ratio form chooses a check to
exploit the ratio above, topic to the restriction that the
information gain must be great, at slightest as great as the
standard gain over all assessments studied.
VI. CONCLUSIONS OVERALL PERFORMANCE FOR
C4.5 ALGORITHM VS ENHANCED C4.5
ALGORITHM
This table1 illustrates the overall detection rate and
false positive rate for C4.5 and Enhanced C4.5 algorithm.
Enhanced C4.5 gives improved accuracy for DoS, Probe,
R2L and U2R categories compared to C4.5 algorithm.
TABLE 1
OVERALL DETECTION RATE AND FALSE POSITIVE RATE FOR C4.5
AND ENHANCED C4.5 ALGORITHM
Sl.
No
Attack
Category
Detection
Rate (%)
(C4.5)
Detection
Rate (%)
(Enhanced
C4.5)
False
Positive
(%)
(Enhanced
C4.5)
1 DoS 90.6 92.92 0.085
2 Probe 84.0 88.29 0.152
3 U2R 83.6 84.00 0.220
4 R2L 53.7 66.91 0.398
Average Success
Rate
77.975 83.03 0.213
VII. MODEL RESULT SCREEN SHOTS
Fig. 1. Showing KDDCUP Decision Tree Data Set
Fig. 2. Entropy and Gain Ratio Values of All Attributes

VIII. FUTURE DETECTION GENETIC ALGORITHM
OVERVIEW
List illustrates the chief steps of the prepared detection
algorithm as well as the training process. It primary
produces the first population and loads the network audit
data. Then the original population is developed for an
amount of creations. In every formation, the qualities of
the rules are initially considered, and then amounts of
best-fit rules are chosen. The training technique creates by
arbitrarily producing an original population of rules (Step
1). Step 2 estimates the whole amount of records in the
audit data. Step 3 estimates the fitness of every rule and
choose the best-fit rules into novel population. Step 4
guesses the rank selection of individuals. Step 5-7 apply
the crossover and mutation operators to each rule in the
novel population. Step 8 prefers the top greatest
chromosomes into novel population. Finally, Step9
confirms and chooses whether to stop the training
procedure or to go into the next creation to maintain the
advance process.
A. Solution Steps of the Detection Algorithm
Algorithm: Rule set creation with Genetic Algorithm
Input: Amount of productions, Set Binary String,
Population range, Crossover possibility, Mutation
possibility.
Output: A position of chosen attributes.
Step 1) Initialize the Population randomly
Step 2) Total number of Records in the Training Set
Step 3) Estimate Fitness = f(x)/ f (sum)
Where f (x) is the fitness of individual x and f is
the entire fitness of all individuals
Step 4) Rank Selection Ps(i) = r(i) / rsum
Where Ps(i) is probability of selection individual
r(i) is rank of individuals
rsum is sum of all fitness values.
Step 5) For every Chromosome in the novel Population
Step 6) Apply regular Crossover operator to the
Chromosome
Step 7) Apply Mutation operator to the Chromosome
Step 8) Choose the top greatest 60% of Chromosomes
into new population
Step 9) if the amount of creations is not attained, go to
Step 3.
IX. EXPERIMENTAL RESULTS
From the above implementation we have successfully
generate some rules that classify the stated attack
connections and for applying Genetic Algorithm on
selected feature set and find the fitness value for each
generation. This section reports four different attack
categories that can recognize.
TABLE 2
ENHANCED RULE BASED GA - DETECTION RATE FOR DOS,
R2L, U2R, PROBE ATTACKS
Sl.
No
Attack
Category
Detection Rate
(%)
False Positive
(%)
1 DoS 93.70 0.063
2 R2L 88.85 0.112
3 U2R 92.50 0.075
4 Probe 95.33 0.055
Average Success Rate 92.595 0.076
TABLE 3
OVERALL PERFORMANCE COMPARISONS OF G.A VS ENHANCED G.A.
The graph in figure 3 shows the performance of G.A
and Enhanced G.A in terms of accuracy for the DoS,
R2L, U2R, Probe.
0
20
40
60
80
100
DoS Probe U2R R2L
Attack Categories
Detection
Rate
Detection Rate (%)
(Hoffman)
Detection Rate (%)
(Selvakani)
Detection Rate (%)
(Enhanced G.A)
Fig. 3. Shows the performance of G.A and Enhanced G.A
Sl.
No
Attac
k
Categ
ory
Detecti
on
Rate
(%)
(Hoff
man)
Detecti
on Rate
(%)
(Selvak
ani)
Detecti
on
Rate
(%)
(Enhan
ced
G.A)
False
Positive
(%)
(Enhanc
ed G.A)
1 DoS 82.9 86.7 93.70 0.063
2 Probe 75.3 79.1 95.33 0.112
3 U2R 73.1 71.2 92.50 0.075
4 R2L 85.3 83.3 88.85 0.055
Average
Success Rate
79.15 80.075 92.595 0.076

TABLE 4
PERFORMANCE COMPARISON OF ENHANCED G.A VS ENHANCED
C4.5
The graph in figure 4 shows the performance of improved
G.A and enhanced C4.5 in terms of accuracy for the DoS,
R2L, U2R, Probe categories.
Fig. 4. Shows the Performance of Enhanced G.A an Enhanced C4.5
Algorithm
X. CONCLUSION AND FUTURE WORK
The improved Genetic Algorithm is a well proper
method for Intrusion Detection compared to enhanced
C4.5 algorithm. Obtain different classification rules for
Intrusion Detection through Genetic Algorithm. The
future Genetic Algorithm presents the Intrusion Detection
System for detecting DoS, R2L, U2R, Probe from
KDDCUP99 Dataset. The outputs of the experiments are
satisfactory with an average success rate of 92.595% and
the overall results of the technique implemented are good.
In Future we have to implement with other features and
different classification methods.
REFERENCES
[1] Axelsson S. 2000, Intrusion Detection Systems: A Survey and
Taxonomy, Technical Report, Dept.of Computer Engineering,
Chalmers University.
[2] Kruegel C and F Valeru. 2002, Stateful Intrusion Detection for
High-Speed Networks, Proceddings of the IEEE Symposium on
Research on Security and Privacy, 285-293.
[3] Kayacik G., Zincir-Heywood N., Heywood M.2003, On the
Capability of an SOM-based Intrusion Detection System,
proceedings of International Joint Conference on Neural Networks.
[4] Bass T.2000, Intrusion detection systems and multisensor data
fusion, Communications of the ACM, Vol.43, 99-105.
[5] S.Selvakani k, Rengan S Rajesh, “Integrated Intrusion Detection
System Using Soft Computing”, IJNS, Vol.10, No.2, pp.87-92,
March 2010.
[6] Bridges S.M and Vaughn R.B, “Fuzzy Data Mining and Genetic
Algorithms Applied to Intrusion Detection”, Proceedings of 12th
Annual Candian Information Technology Security Symposium,
pp.109-122, 2000.
[7] Crosbie Mark and Gene Spafford 1995, “Applying Genetic
Programming to Intrusion Detection”. In Proceeding of 1995
AAAIFall Symposium on Genetic Programming, pp.1-8
Cambridge, Massachusetts.
[8] Chitur. A, “Model Generation for an Intrusion Detection System
using Genetic Algorithms”, High School Hornors Thesis,
http”//www/cs columibi.edu/ids/publications/gaidsthesis 01.pdf
accessed in 2006.
[9] C. Xiang and S.M. Lim, “Design of multiple-level hybrid classifier
for intrusion detection system,” in IEEE Transaction on System
Man, Cybernetics Part A, Cybernetics, Vol.2, No.28, Mystic, CT,
pp. 117-122, May, 2005.
[10] J. Shavlik and M. Shavlik, “Selection, combination, and evaluation
of effective software sensors for detecting abnormal computer
usage, “Proceedings of the First International Conference on
Network security, Seattle, Washington, USA,pp. 56-67, May 2003.
Sl.
N
o
Attac
k
Categ
ory
Detection
Rate (%)
(Enhance
d G.A)
False
Positive
(%)
(Enhanc
ed G.A)
Detec
tion
Rate
(%)
(Enh
anced
C4.5)
False
Positiv
e (%)
(Enha
nced
C.4.5)
1 DoS 93.70 0.063 92.92 0.085
2 Probe 95.33 0.112 88.29 0.152
3 U2R 92.50 0.075 84.00 0.220
4 R2L 88.85 0.055 66.91 0.398
Average
Success
Rate
92.595 0.076 83.03 0.213
0
20
40
60
80
100
DoS Probe U2R R2L
Attack Categories
Detection
Rate
Detection Rate (%) (Enhanced
G.A)
Detection Rate (%) (Enhanced
C4.5)

NETWORJS3.pdf

Recommended

Recommended

More Related Content

Similar to NETWORJS3.pdf

Similar to NETWORJS3.pdf (20)

Recently uploaded

Recently uploaded (20)

NETWORJS3.pdf