46 102-112





Applying Genetic Algorithm in Intrusion Detection System: A Comprehensive Review

Shaveta (1), Er. Abhinav Bhandari (2) and Dr. Krishan Kumar Saluja (3)
(1) Research Scholar, Department of Computer Engineering, UCOE, Punjabi University, Patiala, India, er.shaveta89@gmail.com
(2) Assistant Professor, Department of Computer Engineering, UCOE, Punjabi University, Patiala, India, bhandarinitj@gmail.com
(3) Associate Professor, Department of Computer Science and Engineering, S.B.S.C.E.T, Ferozepur, India, k.saluja@rediffmail.com

Abstract: Information systems and networks are subject to electronic attacks. When network attacks hit, organizations are thrown into crisis mode. From the IT department to call centers, to the board room and beyond, all are fraught with danger until the situation is under control. Traditional methods used to counter these threats (e.g. firewalls, antivirus software, password protection) do not provide complete security to the system. This encourages researchers to develop Intrusion Detection Systems capable of detecting and responding to such events. This review paper presents a comprehensive study of Genetic Algorithm (GA) based Intrusion Detection Systems (IDS). It provides a brief overview of rule-based IDS, elaborates the implementation issues of Genetic Algorithm and presents a comparative analysis of existing studies.

Index Terms: False Positive, Fitness Function, Genetic Algorithm (GA), Intrusion, Intrusion Detection System (IDS)

I. INTRODUCTION

The Internet was originally designed with functionality, not security, in mind. The TCP/IP protocol suite, the most widely used protocol suite for data communication, works on the assumption that none of the hosts participating in the communication has malicious intent. Such design flaws open up the Internet to many opportunities for intrusion.
Intrusion is a set of actions aimed at compromising the security goals (confidentiality, integrity, availability) of a computing or networking resource [1]. Intrusion techniques may include exploiting software bugs and system misconfigurations, password cracking, sniffing unsecured traffic, or exploiting design flaws in specific protocols [5]. An intruder is any user or group of users who initiates such intrusive actions. Intruders can be divided into two groups, external and internal. External intruders do not have authorized access to the system and attack by using various penetration techniques. Internal intruders have access permission but wish to perform unauthorized activities [6].

Attacks are growing rapidly and becoming more sophisticated. Attempts to breach information security rise every day, helped by vulnerability assessment tools that are widely available on the Internet, both for free and for commercial use. Tools such as SubSeven, Back Orifice, Nmap and L0phtCrack can all be used to scan, identify, probe and penetrate systems. With the help of such tools even the best security measures can be breached. The key targets of attackers include banks, law firms and corporations. According to the report published by Symantec Corporation [14] for November 2013, the number of targeted attacks increased, 438 new vulnerabilities were discovered (bringing the total for the year to 5965), two zero-day vulnerabilities were discovered and 42 million identities were exposed. A successful targeted attack on a large company can cost it $2.4 million in direct financial losses and additional costs. For a medium-sized or small company, a targeted attack can mean about $92,000 in damages, almost twice as much as an average attack [10].

DOI: 02.ITC.2014.5.46 © Association of Computer Electronics and Electrical Engineers, 2014 Proc. of Int. Conf. on Recent Trends in Information, Telecommunication and Computing, ITC

Attention therefore turns to Intrusion Detection Systems, which monitor network traffic to identify misuse, unauthorized use and abuse of resources, and perform actions as defined by security policies. Intrusion detection systems perform the following functions:
• Monitoring and analysis of user and system activity
• Auditing of system configurations and vulnerabilities
• Assessing the integrity of critical system and data files
• Statistical analysis of activity patterns based on matching to known attacks
• Abnormal activity analysis and operating system audit

Most existing IDSs face challenges such as low detection rates and high false alarm rates, and can therefore obstruct legitimate users from accessing network resources. These problems are due to the sophistication of the attacks and their intended similarity to normal behavior. To overcome these problems, a Genetic Algorithm based Intrusion Detection System is employed to improve the detection of rare and complicated attacks.

The rest of the paper is organized as follows: Section 2 provides a brief introduction to Intrusion Detection Systems. Section 3 describes the implementation issues of Genetic Algorithm.
Section 4 describes the technique of applying Genetic Algorithm to Intrusion Detection Systems. Section 5 presents the related work and a comparative analysis of existing studies. Finally, the discussion is concluded.

II. INTRUSION DETECTION SYSTEM

Intrusion detection is the process of identifying and responding to events that violate computer security policies, acceptable use policies or standard security practices. An Intrusion Detection System (IDS) is a security system which implements the process of intrusion detection and reports intrusions accurately to the appropriate authority. The IDS monitors packets from various network connections in order to detect intrusive activity [1]. If an intrusion is detected, the IDS logs a message to the system audit file for later analysis by network security experts, stops the offending connections to end the intruder's attack, or performs some other action defined by the organization's rules and practices to provide security, handle the intrusion and recover from the damage caused by the breach [1]. These systems do not behave identically at all times, and false alarms can occur.

A. Components of IDS

The basic architecture of an intrusion detection system is explained below [2] [16] and presented in figure 1:
• Data source: Data sources fall into four categories, namely host-based monitors, network-based monitors, application-based monitors and target-based monitors.
• Data gathering device (sensor): It is responsible for collecting data from the monitored system.
• Analysis engine (detector): This component takes information from the sensors and examines the data in order to detect attacks. The analysis engine can use various analysis approaches, e.g. misuse/signature based detection or anomaly/statistical detection.
• Knowledge base: It is a database which contains information collected by the sensors, but in preprocessed format (e.g. a knowledge base of attacks and their signatures, filtered data, data profiles, etc.). This information is usually provided by network and security experts.
• Configuration device: It provides information about the current state of the intrusion detection system.
• Response manager: The response manager acts only when an intrusion is detected and performs the necessary action as defined by the security policies of the organization. These actions can be either automated (active) or involve human interaction (inactive).
Figure 1. Basic Architecture of Intrusion Detection System (components: data source (monitored system), data gathering (sensors), analysis engine, knowledge base, configuration, response component; flows: raw data, events, system state, actions)

B. Characteristics of IDS

An IDS must have the following characteristics [2]:
• Prediction performance: Typical measures for evaluating the predictive performance of an IDS include detection rate and false alarm rate. Detection rate is the ratio of the number of correctly detected attacks to the total number of attacks. The false alarm (or false positive) rate is the ratio of the number of normal connections incorrectly classified as attacks to the total number of normal connections. A good IDS must therefore have a high detection rate and a low false positive rate.
• Time performance: The total time an IDS takes to generate an alarm comprises processing time and propagation time, and should be as short as possible. The processing time depends on the processing speed of the IDS, which is the rate at which the IDS processes audit events. If this rate is not sufficiently high, real-time processing of security events may not be feasible. The propagation time is the time needed for processed information to reach the security analyst. Both times need to be as short as possible to give the security analyst sufficient time to react to an attack before much damage has been done, and to stop an attacker from modifying audit information or altering the IDS itself.
• Fault tolerance: An IDS should be robust, dependable and resistant to attacks, and should be able to recover quickly. This characteristic is very important for the proper functioning of IDSs, since most commercial IDSs run on operating systems and networks that are vulnerable to different types of attacks. In addition, an IDS should withstand scenarios in which an adversary causes it to generate a large number of false or misleading alarms. Such alarms can easily degrade the availability of the system, and the IDS should be able to overcome them quickly.
• Dynamic reconfiguration: An IDS must be dynamically reconfigurable so that the time spent on reconfiguring the system is as short as possible.

C. Taxonomy of IDSs

IDSs are generally classified [9] as shown in figure 2:

Figure 2. Taxonomy of IDSs (IDS classification: by location: host-based IDS, network-based IDS; by detection model: misuse detection, anomaly detection)

By location (or by scope of protection):
Intrusion Detection Systems can be divided into the following two types depending on where they look for intrusive actions:
• Host-based IDS (HIDS): A host-based IDS loads a piece of software on the system to be monitored. This software evaluates information associated with the system, including the contents of the operating system, system files and application files. If any critical file is deleted or modified, an alert message is sent to the administrator for further investigation.
• Network-based IDS (NIDS): A network-based IDS identifies intrusive activities by analyzing the stream of packets that travel across the network.

By detection model:
Intrusion Detection Systems can also be classified into the following categories on the basis of the detection approach:
• Misuse detection (or signature based detection): These systems work by matching user activity against stored signatures of known attacks. They use a predefined knowledge base to check whether a new network connection matches an entry in it. If it does, the IDS considers the connection a possible attack and blocks it.
• Anomaly detection (or behavior detection): In this case, the system learns the characteristics of normal user activity and then uses those characteristics to judge whether a new user's activity is normal or not.

III. GENETIC ALGORITHM

The Genetic Algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings called chromosomes), each with an associated fitness value, into a new population of offspring objects using operations patterned after naturally occurring genetic operations, such as crossover and mutation [8]. Genetic Algorithm is inspired by the natural search and selection processes leading to the survival of the fittest [13]. In the last few years, genetic algorithms have emerged as practical, robust optimization and search methods.
Genetic Algorithms represent an intelligent exploitation of random search to solve optimization problems. GAs, although randomized, exploit historical information to direct the search into regions of better performance within the search space.

A. Working Principle of GA

The working principle of GA is explained as follows [17]. A Genetic Algorithm begins with a set of candidate solutions to the problem. Each solution is represented by a chromosome-like data structure. Solutions from one population are selected and used to generate a new population, in the hope that the new population will be better than the old one. Solutions are selected according to their fitness: the more suitable they are, the more chances they have to reproduce. This is repeated until some condition (e.g. a fixed number of generations is reached, or the best solution stops improving) is satisfied. The pseudo-code for GA is shown below.

Pseudo-code:
BEGIN
  INITIALISE population with random candidate solutions;
  EVALUATE each candidate;
  REPEAT UNTIL (termination condition) is satisfied DO
    1. SELECT parents;
    2. RECOMBINE pairs of parents;
    3. MUTATE the resulting offspring;
    4. SELECT individuals for the next generation;
END

B. Encoding of solutions as chromosomes

Before using a genetic algorithm to solve any problem, it is necessary to encode the potential solutions to that problem in a form which can be processed by a computer [17]. One common approach is to encode the solutions as binary strings: sequences of 1's and 0's, where each digit represents the value of some aspect of the solution. Each solution is represented in the form of a chromosome. Different positions in a chromosome are referred to as genes and are changed randomly within a range during the process of evolution.

Example:
A gene may look like: 1101
A chromosome may look like:
  Gene1  Gene2  Gene3  Gene4
  1101   1001   1111   1011
Binary string representation of the above chromosome: 1101100111111011

Other methods of encoding include encoding values as integers or real numbers, as arbitrary elements (E11 E3 E7…E1 E15), as a list of rules (R1 R2 R3…R22 R23), or as any data structure. The choice of encoding method depends on the attributes of the problem to be solved.

C. Steps involved in basic Genetic Algorithm

The steps involved in GA are explained below [17] and the overall flow chart is presented in figure 3:
Step 1: [Start] Generate a random population of 'n' chromosomes, each representing a different solution to the problem.
Step 2: [Fitness] Evaluate the fitness f(x) of each chromosome 'x' in the population.
Step 3: [New population] Generate a new population by repeating the following steps until the new population is complete:
  a. [Selection] Select two parent chromosomes from the population according to their fitness (the higher the fitness, the greater the chance of selection).
  b. [Crossover] With a crossover probability, cross over the parents to generate new offspring. Crossover can be one-point or multi-point. If no crossover is performed, the offspring are exact copies of the parents.
  c. [Mutation] With a mutation probability, mutate the new offspring (i.e. randomly flip some bits).
  d. [Accepting] Place the new offspring in the new population.
Step 4: [Replace] Use the new population for a further run of the algorithm.
Step 5: [Test] If the end condition is satisfied, stop and return the best solution in the current population.
Step 6: [Loop] Go to Step 2.

Figure 3. Overall flow of GA (flow chart: Start; generate random population; apply fitness function; optimization criteria met? Yes: result; No: selection, crossover, mutation, then the fitness function is applied again)
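The steps above can be sketched in Python as follows. This is only a minimal illustration, not the implementation of any study reviewed here: the fitness function (counting 1 bits), the parameter values and all function names are assumptions chosen to keep the example small. The 16-bit chromosome mirrors the four 4-bit genes in the encoding example.

```python
import random

def split_genes(bitstring, gene_len=4):
    """Split a binary-string chromosome into fixed-length genes."""
    return [bitstring[i:i + gene_len] for i in range(0, len(bitstring), gene_len)]

def fitness(chromosome):
    """Toy fitness (an assumption for illustration): the number of 1 bits."""
    return sum(chromosome)

def select_parent(population, rng):
    """Step 3a: fitness-proportionate (roulette-wheel) selection of one parent."""
    weights = [fitness(c) + 1 for c in population]  # +1 avoids all-zero weights
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(p1, p2, rate, rng):
    """Step 3b: one-point crossover; otherwise offspring are copies of the parents."""
    if rng.random() >= rate:
        return p1[:], p2[:]
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chromosome, rate, rng):
    """Step 3c: flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]

def run_ga(n=20, length=16, generations=50,
           crossover_rate=0.8, mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population of n chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    best = max(population, key=fitness)          # Step 2: evaluate fitness
    for _ in range(generations):
        if fitness(best) == length:              # Step 5: end condition
            break
        new_population = []
        while len(new_population) < n:           # Step 3: build new population
            p1 = select_parent(population, rng)
            p2 = select_parent(population, rng)
            c1, c2 = crossover(p1, p2, crossover_rate, rng)
            new_population.append(mutate(c1, mutation_rate, rng))
            new_population.append(mutate(c2, mutation_rate, rng))
        population = new_population[:n]          # Step 4: replace
        best = max(population + [best], key=fitness)  # keep the best seen so far
    return best
```

Calling `run_ga()` returns the fittest 16-bit chromosome found, and `split_genes("1101100111111011")` recovers the four genes of the example chromosome. Keeping the best chromosome across generations is a design choice of this sketch; the basic algorithm described above returns the best solution in the current population.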
A genetic algorithm is straightforward in principle, but applying it can be complex in practice. The values of various parameters (for example, mutation rate, crossover rate, population size, chromosome size, number of generations, and selection process) need to be chosen by considering the attributes of the problem being solved. A Genetic Algorithm is used to solve a problem when alternative solutions are too slow or too complicated, when an exploratory tool is required to examine new approaches, or when the benefits of GA meet key problem requirements.

The advantages of using Genetic Algorithm are [8]:
• Always gives an answer
• Answer gets better with time
• Inherently parallel
• Easily re-trainable
• Multiple ways to speed up and improve a GA-based application as knowledge about the problem domain is gained
• Easy to exploit previous or alternate solutions
• The different operators used in a genetic algorithm help it avoid getting stuck in local maxima

D. Limitations of Genetic Algorithm

Genetic algorithms are efficient, but in practice they have certain limitations:
• It is not always easy to find a fitness function.
• Representing a problem space in genetic algorithms is very complex.
• It is a tough task to choose the optimal parameters for a genetic algorithm.
• Genetic algorithms need a large number of fitness function evaluations.
• It is not easy to configure a genetic algorithm based system.

IV. GENETIC ALGORITHM BASED SYSTEM MODEL

Genetic Algorithm can be used in different ways in intrusion detection systems. If the Intrusion Detection System is modelled as a rule-based system, then GA can be considered a tool to generate rules for the rule-based IDS. The goal of the system is not to evolve a single best rule (global optimum), but to create a set of rules which is good enough to detect attacks. The system works by analysing network connections. Figure 4 describes the overall flow of GA based IDS.
The system works in two phases: a training phase and a testing phase.

A. Training phase

In this phase, a set of classification rules is generated from network audit data using Genetic Algorithm in an offline environment. The training data set contains analysed logs of connections which clearly distinguish between normal connections and attacks. Examples of such data sets include KDD Cup 99 and DARPA. The records from the training data set are represented in the form of chromosomes. Each chromosome is a rule within which certain features of a connection are encoded as a fixed-length vector. A fitness function is then applied to each chromosome in order to evaluate its goodness. If a chromosome helps to identify an attack correctly, it is considered good (or fit); otherwise it is considered bad. Crossover and mutation operations are applied to the good chromosomes in order to produce a new generation. This entire process is repeated using the newly generated population. Evolution continues until a solution is reached (i.e. a set of rules capable of detecting attacks is generated). The generated rules are stored in a rule base in the following form:

if { condition } then { act }

For example, a rule can be defined as [1]:

if {the connection has following information: source IP address; destination IP address:; destination port number: 21; connection time: 10.1 seconds} then {stop the connection}

Explanation: if there exists a network connection request with the given source IP address, destination IP address, destination port number 21 and connection time 10.1 seconds, then stop the connection establishment, since the IP address is recognized by the IDS as a blacklisted IP address. Thus, a service request initiated from it is rejected.

The steps involved in the training phase are [1]:
1. Encoding of connections: Consider the following case [13], where six features of a network connection are used to identify an attack.
The dataset used in this case is the DARPA dataset, which contains 7 features of a connection including the attack name. Normal connections contain no attack name. Each chromosome is a rule within which the 7 features are encoded as a fixed-length vector, and each feature is encoded as one or more genes of different types, as shown in the table below.

TABLE I. CHROMOSOME REPRESENTATION OF A RULE

Sr. no | Feature              | Feature Explanation                           | Format  | Number of Genes
-------|----------------------|-----------------------------------------------|---------|----------------
1.     | Duration             | Time period of the connection                 | H:M:S   | 3
2.     | Protocol             | Protocol used for making connection           | Numeric | 1
3.     | Source Port          | Application that the attacker system is running | Numeric | 1
4.     | Destination Port     | Application that the target system is running | Numeric | 1
5.     | Source IP            | Attacker system's IP address                  | a.b.c.d | 4
6.     | Destination IP       | Target system's IP address                    | a.b.c.d | 4
7.     | Attack name and type | Name of the attack                            | string  | 1

Each rule uses an if-then clause with a "condition" part and an "outcome" part. The first 6 features are connected via logical AND to form the "condition" part, while the attack name is the "outcome", which gives the classification of the network record (during training) or of the connection (during intrusion detection) if the rule is matched. For example, consider the following rule [13]:

if (duration="0:0:1" and protocol="finger" and source_port=18982 and destination_port=79 and source_ip="" and destination_ip="") then (attack_name="neptune")

The above rule expresses that if a network packet originates from the source IP address and port 18982, is sent to the destination IP address and port 79 using the protocol finger, and the connection duration is 1 second, then most likely it is a network attack of type neptune that may eventually put the destination host out of service. The above rule can be represented as follows:

{0, 0, 1, 2, 18982, 79, 9, 9, 9, 9, 172, 16, 112, 50, 1}

2. Evaluating each chromosome using fitness function: During the training phase, chromosomes are evaluated in order to determine their goodness. If a chromosome correctly classifies an attack, it is considered good; otherwise it is bad and is not selected for crossover to produce offspring.
Thus, a chromosome which detects more attacks has a higher fitness value and a higher chance of selection. The fitness models proposed by various researchers include the support and confidence model, the reward-penalty model and the weighted sum model.
3. Selection: Different selection methods can be used to choose the chromosomes, e.g. fitness-proportion selection, roulette-wheel selection, rank selection, local selection, tournament selection and steady state selection [6].
4. Crossover: With a crossover probability, cross over the parents to generate new offspring. Crossover can be one-point or multi-point. If no crossover is performed, the offspring are exact copies of the parents.
5. Mutation: Each gene in a chromosome may or may not change, depending on the mutation rate. Mutation maintains the population diversity that the search needs.

B. Testing phase

In this phase, the rules stored in the rule base are used to detect whether a real-time network connection is a normal connection or an intrusive attack. If the characteristics of a new connection match the 'condition' section of some pre-defined rule in the rule base, the connection is considered an attack; otherwise it is considered a normal connection. If an attack is detected, the IDS performs the necessary actions defined by the security policies of the organization. The algorithm for GA-based IDS is presented below.

Algorithm: Intrusion Detection [1]
Input: Inflowing network connection
Output: Decision whether the connection is intrusive or not
  loop forever {fetch incoming packet}
    for each rule in rule-base
      match rule with network connection (analysis console)
      if rule matches then
        mark current connection as an intrusion (and generate an alarm as per security policies)
      end if
    end for each
  end loop forever
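The matching step of the algorithm above can be sketched in Python as follows. This is a hedged illustration rather than the implementation from [1] or [13]: the function names are hypothetical, the `WILDCARD` convention is an assumption that does not appear in the text above, and the IP addresses in the usage example are read off the gene values of the example chromosome. The 15-gene layout follows Table I.

```python
# Gene layout per Table I: duration (h, m, s), protocol, source port,
# destination port, source IP (4 genes), destination IP (4 genes), attack code.
WILDCARD = -1  # hypothetical convention: a gene with this value matches anything

def encode_connection(duration, protocol, src_port, dst_port, src_ip, dst_ip):
    """Encode a connection as the 14 condition genes of a chromosome."""
    h, m, s = (int(x) for x in duration.split(":"))
    return [h, m, s, protocol, src_port, dst_port,
            *(int(octet) for octet in src_ip.split(".")),
            *(int(octet) for octet in dst_ip.split("."))]

def rule_matches(rule, connection):
    """True if every condition gene equals the connection's gene (logical AND)."""
    condition = rule[:-1]  # the last gene is the outcome (attack code)
    return all(g == WILDCARD or g == c for g, c in zip(condition, connection))

def detect(rule_base, connection):
    """Return the attack code of the first matching rule, or None if normal."""
    for rule in rule_base:
        if rule_matches(rule, connection):
            return rule[-1]
    return None

# The example chromosome from the training phase; attack code 1 stands for neptune.
rule_base = [[0, 0, 1, 2, 18982, 79, 9, 9, 9, 9, 172, 16, 112, 50, 1]]

suspicious = encode_connection("0:0:1", 2, 18982, 79, "9.9.9.9", "172.16.112.50")
ordinary = encode_connection("0:0:5", 2, 40000, 80, "9.9.9.9", "172.16.112.50")
```

Here `detect(rule_base, suspicious)` returns the attack code 1, while `detect(rule_base, ordinary)` returns `None`. A real system would raise an alarm or drop the connection at that point, as the security policy dictates.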
Figure 4. Overall flow of GA based IDS (flow chart: Start; evolution of rules using Genetic Algorithm, fed by the training dataset; analysis of new connections from the testing dataset using rules from the rule-base; decision "Attack detected?" with Yes leading to Alert)

V. RELATED WORK

Intrusion Detection Systems have undergone rapid changes and use newly evolved techniques to generate better results. Genetic Algorithm can be used in different ways in Intrusion Detection Systems. The Genetic Algorithm based intrusion detection approach discussed in this review paper focuses on a rule-based Intrusion Detection System which uses only Genetic Algorithm to generate knowledge. For this purpose, network connections are analysed to describe normal and abnormal behaviour in the network. This section briefly summarizes some of the GA based IDSs and presents a comparative analysis of existing studies in table 2.

The earliest effort to use GAs for intrusion detection dates back to 1995, when Crosbie and Spafford [12] applied multiple agent technology and GP (Genetic Programming) to detect network anomalies. Each agent monitors one parameter of the network audit data, and GP is used to find the set of agents that collectively determine anomalous network behavior. This method has the advantage of using many small autonomous agents, but communication among them remains a problem. The training process can also be time consuming if the agents are not appropriately initialized.

Wei Li [11] proposes a GA-based method to detect anomalous network behavior. This implementation of genetic algorithm is unique in that it considers both temporal and spatial information of network connections when encoding the connection information into IDS rules. This may lead to increased detection rates. However, no experimental results are available yet.

Ren Hui Gong, Mohammad Zulkernine and Purang Abolmaesumi [13] present a method of applying Genetic Algorithm to intrusion detection. Seven network features, including both categorical and quantitative data fields, are used when encoding and deriving the rules. A simple but efficient and flexible fitness function, i.e. the support-confidence framework, is used to judge the quality of each rule. Depending on the choice of fitness function weight values, the generated rules can be used either to detect network intrusions in general or to classify the types of intrusions precisely. The method has been implemented in Java using the third party package ECJ. The implementation has been tested using subsets of the 1998 DARPA dataset. Experimental results show that the proposed method works efficiently and is flexible enough to be used in different ways.
However, some limitations of the method are also observed. First, the generated rules are biased towards the training dataset. This issue may be resolved by carefully selecting either the number of generations in the training phase or the number of top best-fit rules in the intrusion detection phase. Second, while the support-confidence framework is simple to implement and improves the accuracy of the final rules, it requires the whole training data to be loaded into memory before any computation. For large training datasets, this is neither efficient nor feasible. Some form of caching may solve the problem.

Anup Goyal and Chetan Kumar [3] describe a GA based IDS that classifies different types of network attacks with a very low false positive rate (0.2%) and almost 100% detection rate. The algorithm takes into consideration different features of network connections, such as type of protocol, network service on the destination and status of the connection, to generate a classification rule set. Each rule in the rule set identifies a particular attack. The fitness function is designed to be biased towards individuals that correctly classify only the attack connections. The experiments are performed on the KDD Cup 99 data set. The generated rule set consists of six rules that can be applied to the IDS to identify and classify six different types of attack connections that fall into two classes, namely Denial of Service (DoS) and Probing attacks. The GAlib C++ library, which is designed for developing GAs, is used to implement the proposed system.

Bader and Nasereddin [5] discuss a technique of using Genetic Algorithm for Intrusion Detection Systems. This implementation considers both temporal and spatial information of network connections when encoding the connection information into IDS rules. The network traffic used for implementing GA is a pre-classified data set that differentiates normal network connections from anomalous ones.
This data set is gathered using network sniffers (programs that record network traffic without doing anything harmful) such as Tcpdump or Snort. The data set is manually classified based on the knowledge of experts. The rules generated are good enough for filtering new network traffic. The attributes of network connections used for generating rules are: source IP address, destination IP address, source port number, destination port number, duration, state, protocol, number of bytes sent by the originator and number of bytes sent by the responder.

B. Uppalaiah, K. Anand, B. Narsimha, S. Swaraj and T. Bharat [4] suggest an intrusion detection system that uses genetic algorithm to generate a rule set for eight types of attacks belonging to four categories. The proposed architecture uses the KDD Cup 99 dataset. The dataset contains 41 features, of which only 3 have been used to specify each entry. The architecture of the system and the software implementation of the proposed technique are also discussed. The system created a specified set of rules and achieved high detection rates for DoS (Denial of Service), R2L (Remote to Local), U2R (User to Root) and Probe attacks. The average success rate achieved during experiments is 83.65%. The proposed system is flexible enough for use in different application areas. It is implemented using C# in the .NET suite.

Firas Alabsi and Reyadh Naoum [7] recommend a new fitness function using a Reward-Penalty technique to evaluate the chromosomes efficiently. 5% of the KDD Cup 99 data has been used for the proposed system. The proposed fitness function works on the principle that reward and penalty are proportionate to the strength and weakness of chromosomes. To validate the new fitness function, the results of the reward-penalty model based fitness function are compared with the results of the support-confidence model based fitness function. The results closely match each other.
The system has been built by using Vb.Net 2010 and SQL server 2008. A.A. Ojugo, A.O. Eboka, O.E. Okonta, R.E Yoro (Mrs) and F.O. Aghware [1] present a genetic algorithm based approach which uses rules derived from network audit data for network intrusion detection. The fitness function utilized is based on the support-confidence framework. The fitness function is simple, efficient and flexible. The training and testing data set used is the DARPA 1998 MIT Lincoln laboratory. The study implemented GA based IDS using C (programming language) in Linux operating system platform. However, some limitations of the method are also witnessed. First, the generated rules are biased to the training dataset. This issue may be resolved by carefully selecting either the number of generations in the training phase or the number of top best-fit rules in the intrusion detection phase. Second, while the support-confidence framework is simple to implement and provides improved accuracy to final rules, it requires the whole training data to be loaded into memory before any computation. For large training datasets, it is neither efficient nor feasible. The use of some sorts of cache technologies may solve the problem. V. Moraveji Hashmei, Z. Muda and W. Yassin [15] present a genetic algorithm based intrusion detection system. Software implementation of the proposed system is presented. The system is flexible enough to be used in different application environments, if proper attack taxonomy and proper training dataset exist. High detection rate and low false positive rates are the highlights of the proposed system. The proposed system can
be applied for intrusion detection without any of the complementary techniques that are commonly used with other soft-computing techniques. The KDDCUP’99 dataset is used for the training phase.

Bharat S. Dhak and Shrikant Lade [6] present a genetic algorithm based intrusion detection technique that detects malicious packets on the network and ultimately helps to block the respective IP addresses. The genetic algorithm process is discussed in detail. Training is done on predefined data rules. Testing is done on the entries that the machine's firewall writes to the pfirewall.log file. The proposed system can be integrated with any IDS to improve its efficiency and performance.

M. Sadiq Ali Khan [18] designed a rule-based Intrusion Detection System that detects DoS (Denial of Service) and Probing attacks by formulating the contributing parameters as rules. A genetic algorithm is used to devise these rules. The study uses the KDD-99 data set with a reduced set of attributes, obtained through Principal Component Analysis. By running the GA more than 2000 times, the proposed system achieved 91% accuracy in detecting network attacks.

TABLE II. COMPARISON OF EXISTING STUDIES ON GA BASED IDS

Reference: A.A. Ojugo, A.O. Eboka, O.E. Okonta, R.E Yoro (Mrs), F.O. Aghware [1]
Detection Approach: Misuse analysis
Fitness Function (F): Support and confidence model; F = w1 * support + w2 * confidence
Explanation of Fitness Function Used: For a rule "if A then B", support = |A and B| / N and confidence = |A and B| / |A|, where N = number of connections in the training data, |A| = number of connections matching condition A, |A and B| = number of connections matching the rule "if A then B", and w1, w2 = weights that balance the two terms.
Remarks: Uses 7 network features, so processing millions of connections requires high processing speed and sufficient cache; 97% of the attacks are detected correctly by this system.

Reference: B. Uppalaiah, K. Anand, B. Narsimha, S. Swaraj, T. Bharat [4]
Detection Approach: Misuse analysis
Fitness Function (F): Fitness = f(x) / f(sum)
Explanation of Fitness Function Used: f(x) is the fitness of entity x and f(sum) is the total fitness of all entities.
Remarks: Uses only 3 network features; average success rate of 83.65%; the process is faster and can be applied to high speed networks.

Reference: Bharat S. Dhak, Shrikant Lade [6]
Detection Approach: Misuse analysis
Fitness Function (F): F = weight * packet_size
Explanation of Fitness Function Used: packet_size is the actual packet data size prescribed by the incoming packet data stream, and weight is the vector applied to each chromosome.
Remarks: The experiment focuses on generating a list of vulnerable IP addresses; achieved 96% accuracy.

Reference: Firas Alabsi, Reyadh Naoum [7]
Detection Approach: Misuse analysis
Fitness Function (F): Reward-penalty model based; F = 2 + ((AB - A)/(AB + A)) + (AB/X) - (A/Y)
Explanation of Fitness Function Used: For a rule "if A then B", (AB - A)/(AB + A) = strength of a record; AB/X = ratio of the strength of the record to the strength of the strongest record; A/Y = ratio of the weakness of the record to the weakness of the weakest record.
Remarks: Uses 5 network features; the fitness function rewards good chromosomes and penalizes bad ones; a comparison between the newly proposed fitness function and other existing fitness functions is presented.

Reference: Wei Li [11]
Detection Approach: Anomaly detection
Fitness Function (F): Weighted sum model based; F = 1 - penalty
Explanation of Fitness Function Used: The fitness is determined by calculating the general outcome, the absolute difference and the penalty values.
Remarks: Considers both temporal and spatial features of a network connection to detect an attack; no experimental results.

Reference: V. Moraveji Hashmei, Z. Muda, W. Yassin [15]
Detection Approach: Misuse analysis
Fitness Function (F): F = (a/A) - (b/B)
Explanation of Fitness Function Used: a = number of correctly detected attacks; A = total number of attacks in the training dataset; b = number of normal connections falsely detected as attacks; B = total number of normal connections.
Remarks: Uses only 3 network features; fast processing, so it can be applied to high speed networks; detection rate of 95.62% with a false alarm rate of 4.37%; can be used without any complementary technique.
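Three of the fitness functions in Table II are specified completely enough to write out directly. The Python sketch below is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the counts are assumed to be precomputed from labelled training data, and the guards against division by zero are added for convenience rather than taken from the cited papers.

```python
from typing import List


def support_confidence_fitness(n_total: int, n_a: int, n_a_and_b: int,
                               w1: float, w2: float) -> float:
    """F = w1 * support + w2 * confidence for a rule 'if A then B' (row [1]).

    n_total   -- N, number of connections in the training data
    n_a       -- |A|, connections matching condition A
    n_a_and_b -- |A and B|, connections matching the whole rule
    """
    support = n_a_and_b / n_total if n_total else 0.0
    confidence = n_a_and_b / n_a if n_a else 0.0
    return w1 * support + w2 * confidence


def relative_fitness(values: List[float]) -> List[float]:
    """f(x) / f(sum): each entity's share of the total fitness (row [4])."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]


def detection_vs_false_alarm_fitness(a: int, A: int, b: int, B: int) -> float:
    """F = a/A - b/B (row [15]).

    a -- correctly detected attacks      A -- all attacks in the training data
    b -- normal connections flagged      B -- all normal connections
    """
    detection_rate = a / A if A else 0.0
    false_alarm_rate = b / B if B else 0.0
    return detection_rate - false_alarm_rate


if __name__ == "__main__":
    # Made-up counts and weights, chosen only to exercise the functions.
    print(support_confidence_fitness(n_total=1000, n_a=50, n_a_and_b=40,
                                     w1=0.2, w2=0.8))
    print(relative_fitness([1.0, 3.0]))
    print(detection_vs_false_alarm_fitness(a=90, A=100, b=5, B=200))
```

The weights `w1` and `w2` are left as required arguments because the table describes them only as a way to balance support against confidence. Suitable values would depend on the application.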
VI. CONCLUSION

Three factors affect the effectiveness of a genetic algorithm: the choice of fitness function, the representation of individuals, and the values of the GA parameters. The determination of these factors often depends on the application. Designing an accurate fitness function is the major challenge in solving a particular problem. Different models for designing fitness functions have been discussed in this paper. Using GA for intrusion detection has proven to be a cost-effective approach. A major advantage of the technique is that, in the real world, the types of intrusions change and become more complicated very rapidly, and a GA based detection system can upload and update new rules as new intrusions become known. Therefore, it is cost effective and adaptive.

REFERENCES

[1] A.A. Ojugo, A.O. Eboka, O.E. Okonta, R.E Yoro (Mrs), F.O. Aghware, “Genetic Algorithm Rule-Based Intrusion Detection System (GAIDS)”, Journal of Emerging Trends in Computing and Information Sciences, Vol. 3, pp. 1182-1194, Aug. 2012.
[2] Aleksandar Lazarevic, Vipin Kumar, Jaideep Srivastava, “Intrusion Detection a survey”, unpublished.
[3] Anup Goyal, Chetan Kumar, “GA-NIDS: A Genetic Algorithm based Network Intrusion Detection System”, 2008.
[4] B. Uppalaiah, K. Anand, B. Narsimha, S. Swaraj, T. Bharat, “Genetic Algorithm Approach to Intrusion Detection System”, IJCST, Vol. 3, Issue 1, Jan.-March 2012.
[5] Bader and Nasereddin, “Using Genetic Algorithm in Network Security”, IJRRAS, Vol. 5, pp. 148-154, Nov. 2010.
[6] Bharat S. Dhak, Shrikant Lade, “An Evolutionary Approach to Intrusion Detection System using Genetic Algorithm”, ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 2, Issue 12, Dec. 2012.
[7] Firas Alabsi and Reyadh Naoum, “Fitness Function for Genetic Algorithm used in Intrusion Detection System”, International Journal of Applied Science and Technology, Vol. 2, pp. 632-637, April 2012.
[8] GA tutorial, available at: http://www.vit.ac.in/academicresearch/res701/RES701DUMP/Evolutionary%20Algorithms/GATutorial.pdf
[9] Kamal Kishore Prasad, Samarjeet Borah, “Use of Genetic Algorithms in Intrusion Detection Systems: An Analysis”, International Journal of Applied Research and Studies (iJARS), ISSN: 2278-9480, Volume 2, Issue 8, Aug. 2013.
[10] Kaspersky Lab, Global Corporate IT Security Risks: 2013, May 2013.
[11] Wei Li, “Using Genetic Algorithm for Network Intrusion Detection”, 2004.
[12] M. Crosbie, E. Spafford, “Applying Genetic Programming to Intrusion Detection”, Proceedings of the AAAI Fall Symposium, 1995.
[13] Ren Hui Gong, Mohammad Zulkernine, Purang Abolmaesumi, “A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection”, Proceedings of the Sixth International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and First ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN’05), IEEE, 2005.
[14] Ben Nahorney, Symantec Intelligence Report: November 2013.
[15] V. Moraveji Hashmei, Z. Muda and W. Yassin, “Improving Intrusion Detection using Genetic Algorithm”, International Technology Journal, 12(11), pp. 2167-2173, 2013.
[16] Mohammad Sazzadul Hoque, Md. Abdul Mukit and Md. Abu Naser Bikas, “An Implementation of Intrusion Detection System using Genetic Algorithm”, International Journal of Network Security & Its Applications (IJNSA), Vol. 4, No. 2, March 2012.
[17] RC Chakraborty, Fundamentals of Genetic Algorithm: AI Course, June 2010, available at: http://www.myreaders.info/09-Genetic_Algorithms.pdf
[18] M. Sadiq Ali Khan, “Rule based Network Intrusion Detection using Genetic Algorithm”, International Journal of Computer Applications (0975 – 8887), Volume 18, No. 8, March 2011.