A Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network Datasets
Jabez¹ and Anadha Mala²
¹Sathyabama University, India
²St. Joseph’s College of Engineering, India
E-mail: ¹jabezme@gmail.com
ABSTRACT: Intrusion detection aims at distinguishing attack data from normal data in a
network pattern database. It is an indispensable part of an information security system.
Because of the variety of network data behaviours and the rapid development of attack
methods, it is necessary to develop a fast machine-learning-based intrusion detection
algorithm with a high detection rate and a low false-alarm rate. In this correspondence, we
propose a novel method that combines fuzzy set theory with a genetic approach to detect
intrusion data in a network database. The genetic approach is an evolutionary optimization
technique that uses directed graph structures instead of the strings of a genetic algorithm
or the trees of genetic programming. This enhances its representation ability and yields
compact programs through the reusability of nodes in the graph structure. Combining fuzzy
set theory with the genetic approach gives a new method that can deal with a mixed database
containing both discrete and continuous attributes, and that can extract many important
association rules to enhance intrusion detection ability. The proposed method is therefore
flexible and can be applied to both misuse and anomaly detection in data
intrusion-detection problems. An incomplete database contains missing data in some tuples;
the proposed method applies rules to extract these tuples as well. The Genetic-Fuzzy method
thus provides a data intrusion detection system for recovering data. The Genetic-Fuzzy
rules involve the following steps:
• Build the process data model as a mathematical representation of normal data.
• Improve the process data model, which improves the model of normal data; the model
should represent the underlying truth of normal data.
• Use cluster centers (centroids) and the distances from the centroids to convert the
data to training data.
Keywords: Intrusion, Centroids, Tuples.
1. INTRODUCTION
Many kinds of systems have been developed over the Internet, such as online shopping,
Internet banking, stock and foreign exchange trading, and online auctions. However, due to
the open nature of the Internet, the security of our computer systems and data is
2. 354 ICSEMA–2012
always at risk. The extensive growth of the Internet has made data intrusion detection a
critical component of infrastructure protection mechanisms. Data intrusion detection can be
defined as identifying a set of malicious actions that threaten the integrity,
confidentiality and availability of data resources.
Intrusion detection is traditionally divided into two categories: misuse detection and
anomaly detection. Misuse detection mainly searches for specific patterns or sequences of
data and user behaviour that match well-known intrusion scenarios. Anomaly detection, in
contrast, builds models of normal data and detects intrusions as data that deviate
significantly from the normal data, using various data mining approaches. The advantage of
anomaly-based data intrusion detection is that it can detect novel intrusions that have not
been observed before.
A significant challenge in providing an effective defense mechanism for a network perimeter
is having the ability to detect intrusions and implement countermeasures. Components of
the network perimeter defense capable of detecting intrusions are referred to as Intrusion
Detection Systems (IDS).
IDSs are further classified as signature-based (also known as misuse systems) or
anomaly-based. Signature-based systems attempt to match observed activities against
well-defined patterns, also called signatures. Anomaly-based systems look for any evidence
of activities that deviate from what is considered normal system use. These systems are
capable of detecting attacks for which a well-defined pattern does not exist (such as a new
attack or a variation of an existing attack). A hybrid IDS is capable of both using
signatures and detecting anomalies.
While accuracy is the essential requirement of an Intrusion Detection System (IDS), its
extensibility and adaptability are also critical in today’s data computing environment.
Currently, building an effective IDS is an enormous knowledge engineering task. Experts
rely on their intuition and experience to select the statistical measures for anomaly
detection. They first analyze and categorize attack scenarios and data vulnerabilities, and
then hand-code the corresponding rules and patterns for misuse detection. Due to the manual
and ad hoc nature of this development process, such IDSs have limited extensibility and
adaptability.
A basic premise for intrusion detection is that when audit mechanisms are enabled to record
system data events, distinct evidence of legitimate activities and intrusions will be
manifested in the audit data. Because of the large amount of audit records and the variety
of system features, efficient and intelligent data analysis tools are required to discover
the behaviour of system activities.
The KDD99 Cup dataset and the Defence Advanced Research Projects Agency (DARPA)
datasets provided by MIT Lincoln Laboratory are widely used as training and testing
datasets for the evaluation of IDSs. An evolutionary neural network has been introduced
for audit data at the system-call level.
Parikh and Chen discussed a classification system using several sets of neural networks
for specific classes and also proposed a technique for cost minimization in
intrusion-detection problems. Data mining generally refers to the process of extracting
useful rules from large stores of data.
The recent rapid development of data mining has contributed a wide variety of algorithms
suitable for network-intrusion-detection problems. Intrusion detection can be thought of as
a classification problem: we wish to classify each audit record into one of a discrete set
of possible categories, normal or a particular kind of intrusion. As one of the most popular
data mining methods for a wide range of applications, rule mining is used to discover new
rules or correlations among a set of attributes in a dataset. The relationship between
datasets can be represented as rules. A rule is expressed as X ⇒ Y, where X and Y each
contain a set of attributes. This means that if a tuple satisfies X, it is also likely to
satisfy Y.
The most popular model for mining rules from databases is the Apriori algorithm. This
algorithm measures the importance of rules with two factors: support and confidence.
However, it may suffer from large computational complexity when extracting rules from a
dense database.
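The two measures mentioned above can be illustrated with a minimal Python sketch. It uses the usual definitions (support of X ⇒ Y as the fraction of tuples containing both X and Y, confidence as support(X ∪ Y) / support(X)); the record contents and the function names are hypothetical and are not taken from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(Y | X), computed as support(X and Y) / support(X)."""
    sup_x = support(transactions, antecedent)
    if sup_x == 0:
        return 0.0  # the rule never fires, so its confidence is taken as 0
    return support(transactions, set(antecedent) | set(consequent)) / sup_x


# Hypothetical audit records, each reduced to a set of attribute=value items.
records = [
    {"proto=tcp", "flag=SF", "class=normal"},
    {"proto=tcp", "flag=S0", "class=attack"},
    {"proto=tcp", "flag=S0", "class=attack"},
    {"proto=udp", "flag=SF", "class=normal"},
]

# Rule: flag=S0 => class=attack
rule_support = support(records, {"flag=S0", "class=attack"})
rule_confidence = confidence(records, {"flag=S0"}, {"class=attack"})
```

On this toy data, two of the four records contain both sides of the rule, and every record with flag=S0 is an attack, so the rule has support 0.5 and confidence 1.0.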
To discover interesting rules from a dense database, the genetic algorithm (GA) and
genetic programming (GP) have been applied to rule mining. In GA-based methods, the rules
evolve over generations, and the individuals or the population itself represent the
association relationships. However, it is not easy for a GA to extract a sufficient number
of interesting rules, because each rule is represented as one individual of the GA.
GP improves the interpretability of GA by replacing the gene structures with tree
structures, which enables higher representation ability for association rules. As an
extension of GA and GP, genetic network programming, which represents its solutions using
directed graph structures, has been proposed. Originally, Genetic-Fuzzy was applied to
dynamic problems, based on inherent features of the graph structure such as the reusability
of nodes (like Automatically Defined Functions (ADFs) in GP), a compact structure without
bloat, and applicability to partially observable Markov decision processes. To extend the
applicable fields of Genetic-Fuzzy, a rule mining technique using Genetic-Fuzzy has been
developed.
The advantage of this rule mining method is that it extracts a sufficient number of
important rules for the user’s purpose rather than all the rules meeting the criteria. Like
most existing rule mining algorithms, conventional rule mining based on Genetic-Fuzzy is
able to extract rules whose attributes take binary values. However, in real-world
applications, databases are more likely to be composed of both binary and continuous values.
This paper describes a novel fuzzy rule mining method based on Genetic-Fuzzy and its
application to intrusion data detection. By combining fuzzy set theory with Genetic-Fuzzy,
the proposed method can deal with a mixed database that contains both discrete and
continuous attributes. Such mixed databases are common in real-world applications, and
Genetic-Fuzzy can extract rules that include both discrete and continuous attributes
consistently.
The combination of association rule mining with fuzzy set theory has been applied more
frequently in recent years. The original idea comes from dealing with quantitative
attributes in a database, where discretizing the quantitative attributes into intervals
leads to under- or overestimating the values that are near the interval borders. This is
called the sharp boundary problem. Fuzzy sets help to overcome this problem by allowing
different degrees of membership. Compared with traditional association rules with crisp
sets, fuzzy rules also provide a good linguistic explanation.
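The sharp boundary problem can be made concrete with a small Python sketch. It assumes piecewise-linear membership functions for three linguistic terms (low, middle, high); the breakpoints `a`, `b`, `c`, the crisp borders, and the attribute values are hypothetical choices for illustration, not the membership functions used in the paper.

```python
def crisp_label(value, low_max, mid_max):
    """Assign exactly one interval label; values near a border flip abruptly."""
    if value <= low_max:
        return "low"
    if value <= mid_max:
        return "middle"
    return "high"


def fuzzy_memberships(value, a, b, c):
    """Degrees of membership in low/middle/high for breakpoints a < b < c.

    'low' is 1 up to a and falls linearly to 0 at b, 'middle' peaks at b,
    and 'high' rises linearly from b and stays at 1 from c onward.
    """
    def clamp(x):
        return max(0.0, min(1.0, x))

    low = clamp((b - value) / (b - a))
    high = clamp((value - b) / (c - b))
    middle = 1.0 - low - high
    return {"low": low, "middle": middle, "high": high}


# Two nearly identical values of a hypothetical continuous attribute.
near_border = (14.9, 15.1)

crisp = [crisp_label(v, low_max=15, mid_max=25) for v in near_border]
fuzzy = [fuzzy_memberships(v, a=10, b=20, c=30) for v in near_border]
```

With crisp intervals the two values receive different labels ("low" and "middle"), whereas their fuzzy memberships differ only slightly (about 0.51 versus 0.49 for "low"), which is the behaviour the paragraph above attributes to fuzzy sets.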
2. OVERVIEW OF THE PROPOSED APPROACH
The proposed approach is a novel fuzzy rule mining method based on Genetic-Fuzzy, applied
to intrusion data detection. By combining fuzzy set theory with the genetic approach, the
proposed method can deal with a mixed database that contains both discrete and continuous
attributes. Such mixed databases are common in real-world applications, and Genetic-Fuzzy
can extract rules that include both discrete and continuous attributes consistently.
Compared with traditional association rules with crisp sets, fuzzy rules provide a good
linguistic explanation. Here, the concept of Genetic-Fuzzy rule mining is introduced in
detail. The fuzzy membership values are used for fuzzy rule extraction, and a sub-attribute
utilization mechanism is proposed to avoid information loss. In addition, a new
Genetic-Fuzzy structure for rule mining is built to carry out the rule extraction step.
In addition, a new fitness function is given that provides the flexibility of mining more
new rules and of mining rules with higher accuracy, in order to adapt to different kinds of
detection. After the class-association rules have been extracted, they are used for
classification. In this paper, two kinds of classifiers are built, for misuse detection and
anomaly detection respectively, in order to classify new data correctly.
For misuse detection, the normal-pattern rules and intrusion-pattern rules are extracted
from the training dataset, and classifiers are built according to these extracted rules. For
anomaly detection, we focus on extracting as many normal-pattern rules as possible. The
extracted normal-pattern rules are used to detect novel or unknown intrusions by evaluating
the deviation from the normal behaviour. The decision rules are provided for both categorical
and continuous features.
The relations between categorical and continuous features are handled naturally, without
any forced conversions between these two types of features. A simple overfitting handling
step is used to improve the learning results. In the specific case of network intrusion
detection, we use adaptable initial weights to make the trade-off between the detection and
false-alarm rates. The experimental results show that our algorithm has a very low
false-alarm rate with a high detection rate, and that its run speed in the learning stage is
faster than the published run speeds of the existing algorithms.
Features of the proposed method are summarized as follows:
1. Genetic-Fuzzy rule mining can deal with both discrete and continuous attributes in
the database, which is practically useful for real network-related databases.
2. Sub-attribute utilization considers all discrete and continuous attribute values as
information, which helps to avoid data loss and supports effective rule mining in
Genetic-Fuzzy.
3. The proposed fitness function contributes to mining more new rules with higher
accuracy.
4. The proposed framework for intrusion detection can be flexibly applied to both
misuse and anomaly detection with specifically designed classifiers.
5. Experienced knowledge of intrusion patterns is not required before the training.
6. High Detection Rates (DRs) are obtained in both misuse detection and anomaly
detection.
Table 1: Rules Applied in Genetic-Fuzzy
Thus, the judgment node transfers the packet to the processing node and receives the ACK,
after which the Genetic-Fuzzy rules are applied to the attributes. The matching probability
with the extracted rules produces the misuse detection result, the probability with respect
to the normal rule pool produces the anomaly detection result, and finally the detection
rate is calculated.
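The flow described above (rule matching, misuse decision, anomaly decision, detection rate) can be sketched in Python as follows. This is only an illustration under stated assumptions: rules are modelled as simple attribute/value conditions rather than fuzzy rules, the matching degree is taken as the fraction of satisfied conditions, the anomaly threshold is arbitrary, and the detection rate is computed as the share of actual intrusions that are flagged. None of these choices, names or data values come from the paper.

```python
def match_degree(record, rule):
    """Fraction of a rule's attribute conditions that the record satisfies."""
    satisfied = sum(1 for attr, val in rule.items() if record.get(attr) == val)
    return satisfied / len(rule)


def average_match(record, rule_pool):
    """Average matching degree of a record against every rule in a pool."""
    return sum(match_degree(record, r) for r in rule_pool) / len(rule_pool)


def classify_misuse(record, normal_rules, intrusion_rules):
    """Misuse detection: choose the class whose rule pool matches better."""
    normal = average_match(record, normal_rules)
    intrusion = average_match(record, intrusion_rules)
    return "intrusion" if intrusion > normal else "normal"


def classify_anomaly(record, normal_rules, threshold):
    """Anomaly detection: flag records that match the normal pool poorly."""
    matched = average_match(record, normal_rules)
    return "intrusion" if matched < threshold else "normal"


def detection_rate(predicted, actual):
    """Share of actual intrusions that were flagged (assumed definition)."""
    on_intrusions = [p for p, a in zip(predicted, actual) if a == "intrusion"]
    if not on_intrusions:
        return 0.0
    return on_intrusions.count("intrusion") / len(on_intrusions)


# Hypothetical rule pools and labelled connections.
normal_rules = [{"proto": "tcp", "flag": "SF"}, {"proto": "udp", "flag": "SF"}]
intrusion_rules = [{"flag": "S0"}, {"proto": "tcp", "flag": "REJ"}]
connections = [
    {"proto": "tcp", "flag": "SF"},
    {"proto": "tcp", "flag": "S0"},
    {"proto": "icmp", "flag": "REJ"},
]
actual = ["normal", "intrusion", "intrusion"]

misuse_pred = [classify_misuse(c, normal_rules, intrusion_rules) for c in connections]
anomaly_pred = [classify_anomaly(c, normal_rules, threshold=0.5) for c in connections]
misuse_dr = detection_rate(misuse_pred, actual)
```

The misuse classifier needs both rule pools, while the anomaly classifier uses only the normal rule pool, which mirrors the distinction drawn in the text.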
3. PROPOSED ALGORITHM
Fig. 1: Structure of the Proposed Approach
Step 1: Record the System Calls
• Special programs such as strace are used
• They collect process IDs and system call numbers
• System call numbers are determined by their order in the call file.
Step 2: Convert the Data to the Training Data
• The lists of process IDs and system calls are converted to strings of length n
• n is 6, 10, or 14
• A sliding window is taken across the data.
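Step 2 can be sketched in Python as follows. The trace values and the function name are hypothetical; only the window lengths (6, 10 or 14) come from the text, and the sketch assumes the window advances one call at a time.

```python
def sliding_windows(calls, n):
    """Return every length-n window of consecutive system-call numbers."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


# Hypothetical system-call numbers recorded for one process.
trace = [4, 2, 66, 66, 4, 138, 66, 5]

windows = sliding_windows(trace, 6)
```

A trace of 8 calls yields 3 windows of length 6; a trace shorter than n yields none.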
Step 3: Build the Process Data Model
• The process data model is a mathematical representation of normal behaviour
• Improving the process data model improves the model of normal behaviour
• It should represent the underlying truth of the normal data.
Fig. 2: Overall Architecture Flow
Step 4: Compare New Process Data with the Process Data Model
• New process data is converted to a form that can be compared against the process
data model
• Our form is also a set of strings
• This new data is compared with the model and later classified in Step 5 as normal or
abnormal behaviour.
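Steps 3 and 4 can be sketched in Python under a simplifying assumption: the process data model is taken to be the set of windows observed in known-normal traces, and the comparison returns the fraction of a new trace's windows that are absent from that set. The text only says that both forms are sets of strings, so this representation, the function names and the trace values are all hypothetical.

```python
def window_set(calls, n):
    """Set of all length-n windows of consecutive system-call numbers."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def build_model(normal_traces, n):
    """Process data model: every window seen in the known-normal traces."""
    model = set()
    for trace in normal_traces:
        model |= window_set(trace, n)
    return model


def anomaly_signal(model, new_trace, n):
    """Fraction of the new trace's windows that never occur in the model."""
    new = window_set(new_trace, n)
    if not new:
        return 0.0  # trace too short to form a window
    return len(new - model) / len(new)


# Hypothetical traces of system-call numbers.
normal_traces = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, 2, 3, 4, 5, 6, 7, 9],
]
model = build_model(normal_traces, n=6)

seen_before = anomaly_signal(model, [1, 2, 3, 4, 5, 6, 7, 8], n=6)
unusual = anomaly_signal(model, [1, 2, 3, 4, 5, 6, 99, 8], n=6)
```

A trace already covered by the model gives a signal of 0, while the trace containing the unseen call number 99 has two of its three windows missing from the model.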
Step 5: Determine an Intrusion
• Hard limits are applied to the intrusion signal to determine whether new process data
represents normal or abnormal behaviour
• One and a half times the maximum self-test signal is considered a true negative.
Anything less is a false negative
• Genetic network programming is applied
• The probability of loss is found
• Intrusion detection is performed.
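The hard limit in Step 5 can be sketched in Python as follows. The factor of 1.5 applied to the maximum self-test signal comes from the text; how the limit maps to normal or abnormal behaviour is not spelled out there, so treating signals at or above the limit as abnormal is an assumption of this sketch, as are the function names and signal values.

```python
def intrusion_limit(self_test_signals, factor=1.5):
    """Hard limit: `factor` times the largest signal seen on self-test data."""
    return factor * max(self_test_signals)


def is_abnormal(signal, limit):
    """Assumed decision rule: signals at or above the limit are abnormal."""
    return signal >= limit


# Hypothetical signals obtained by testing the model on its own normal data.
self_test = [0.02, 0.05, 0.04]
limit = intrusion_limit(self_test)

decisions = {signal: is_abnormal(signal, limit) for signal in (0.06, 0.30)}
```

Here the limit is 1.5 × 0.05 = 0.075, so a new signal of 0.06 is treated as normal and a signal of 0.30 as abnormal.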
4. RESULTS AND DISCUSSION
The effectiveness and efficiency of the proposed method are studied using the KDD99 Cup
and DARPA98 databases. The features of the proposed method are summarized as follows.
Genetic-Fuzzy rule mining can deal with both discrete and continuous attributes in the
database, which is practically useful for real network-related databases. Sub-attribute
utilization considers all discrete and continuous attribute values as information, which
helps to avoid data loss and supports effective rule mining in Genetic-Fuzzy. The proposed
fitness function contributes to mining more new rules with higher accuracy. The proposed
framework for intrusion detection can also be flexibly applied to both misuse and anomaly
detection with specifically designed classifiers. Experienced knowledge of intrusion
patterns is not required before the training. Finally, high Detection Rates (DRs) are
obtained in both misuse detection and anomaly detection.
Fig. 3: (a) Trained Input vs. Attacks; (b) Detection Rate vs. Category
Tests were conducted on different types of trained input versus attacks, and the results
are shown in Figure 3a. The experiment was repeated several times and provided a 100%
result. Figure 3b shows that the proposed method also achieves a high detection rate.
5. CONCLUSION
A Genetic-Fuzzy rule mining method with sub-attribute utilization, together with
classifiers based on the extracted rules, has been proposed. The method can consistently use
and combine discrete and continuous attributes in a rule and efficiently extract many good
rules for classification. As an application, intrusion-detection classifiers for both misuse
detection and anomaly detection have been developed, and their effectiveness is confirmed
using the KDD99 Cup and DARPA98 data. In misuse detection, the results show that the
proposed method achieves a high DR and a low positive false rate (PFR), which are two
important criteria for security systems. In anomaly detection, the results show a high DR
and a reasonable PFR even without pre-experienced knowledge, which is an important
advantage of the proposed method. To analyze the proposed method on the intrusion-detection
problem in detail, Genetic-Fuzzy data mining is compared with crisp data mining, and the
result clarifies the necessity of introducing fuzzy membership functions into
Genetic-Fuzzy-based data mining. The important function of the proposed method is to
efficiently extract many rules that are statistically significant and can be used for
several purposes. For misuse detection, the matching of a new connection with the normal
rules and with the intrusion rules is calculated, and the connection is classified into the
normal class or the intrusion class. For anomaly detection, only the rules of the normal
connections are used to calculate the deviation of a new connection from the normal data.
Therefore, the many rules extracted by Genetic-Fuzzy cover the spaces of the classes widely.