Network security based on immune principles


Seminar report presented by Renjith P. Ravindran, student of SOE, CUSAT, CSE department, batch 2008-2012, under the guidance of Sudheep Elayidom, HOD of CSE Department.

Published in: Technology

  3. 3. Division of Computer Engineering School of Engineering Cochin University of Science & Technology Kochi-682022 CERTIFICATE Certified that this is a bonafide record of the seminar work titled NETWORK SECURITY BASED ON IMMUNE PRINCIPLES presented by Renjith P. Ravindran of VII semester Computer Science & Engineering in the year 2010 in partial fulfillment of the requirements for the award of Degree of Bachelor of Technology in Computer Science & Engineering of Cochin University of Science & Technology Dr. David Peter S Head of the Division Mr. Sudheep Elayidom Seminar Guide
  4. 4. ACKNOWLEDGEMENT Many people have contributed to the success of this. Although a single sentence hardly suffices, I would like to thank Almighty God for blessing us with His grace. I extend my sincere and heartfelt thanks to Dr. David Peter, Head of Division, Computer Science and Engineering, for providing us the right ambience for carrying out this work. I am profoundly indebted to my seminar guide, Mr. Sudheep Elayidom for innumerable acts of timely advice, encouragement and I sincerely express my gratitude to him. I express my immense pleasure and thankfulness to all the teachers and staff of the Department of Computer Science and Engineering, CUSAT for their cooperation and support. Last but not the least, I thank all others, and especially my classmates who in one way or another helped me in the successful completion of this work. RENJITH P. RAVINDRAN i
  5. 5. ABSTRACT Intrusion detection for computer systems can be seen as a problem of pattern classification, but the system must deal with some intrinsic characteristics that make it very difficult to detect intrusions directly using classical pattern recognition methods. For example, normal and anomalous states are distinguished using features that are multi-dimensional, and there is extreme asymmetry in the amount of data available for these two sets of states. Furthermore, the patterns involved cannot be recognized by linear methods. The natural human immune system faces the same difficulties, but successfully protects the body against a vast variety of foreign pathogens. It is a self-adaptive and self-learning classifier that can recognize and classify threats by learning, memory, and association. The Artificial Immune System, inspired by the biological immune system, is emerging as a new field of computational intelligence research. Research on the immune system has revealed many properties, including distributed computation, self-organization, and lightweight operation, that meet the requirements of intrusion detection. Applying immunology to intrusion detection is therefore a promising new area.
  7. 7. LIST OF FIGURES AND TABLES 1 Two-dimensional representation of a universe of strings page 18 2 Lifecycle of a detector page 26 3 Different represent of string in different nodes page 28 4 Flowchart for detection of IP packets page 34 5 Comparison of two methods page 36 6 Results of applying the usefulness-criteria page 42 iv
  8. 8. Network Security Based on Immune Principles, 2010 CHAPTER-1 INTRODUCTION Modern computer systems are plagued by security vulnerabilities. Whether it is a UNIX buffer overflow or a bug in Microsoft Internet Explorer, our applications and operating systems are full of security flaws on many levels. From the viewpoint of traditional computer security, it should be possible to eliminate such problems through more extensive use of formal methods and better software engineering. We believe that such an approach is unlikely to succeed. The traditional view rests on three key assumptions: 1. Security policy can be explicitly and correctly specified, 2. Programs can be correctly implemented, and 3. Systems can be correctly configured. Although these statements might be true theoretically, in practice all are false. Computers are not static systems: vendors, system administrators, and users constantly change the state of a system. Programs are added and removed, and configurations are changed. Formal verification of a statically defined system is time consuming and hard to do correctly; formal verification of a dynamic system is impractical. Without formal verification, tools such as encryption, access controls, firewalls, and audit trails all become fallible, making perfect implementation of a security policy impossible even if a correct policy could be devised in the first place. Once we accept that our security policies, our implementations, and our configurations will have flaws, we must also accept that we will have imperfect security. This does not mean that we must be content with no security at all. As in the physical world, better security can be achieved with additional resources and better design. So, the real question is: how can we achieve better security than we currently have? Division of Computer Engineering, School of Engineering, CUSAT Page 1
  9. 9. Researchers believe it is possible to build better computer security systems by adopting design principles that are more appropriate for the imperfect, uncontrolled, and open environments in which most computers currently exist. As a case in point, we look to natural immune systems, which solve a similar problem, but in a radically different way from traditional computer security. For example, consider the human immune system. It is composed of many unreliable, short-lived, and imperfect components. It is autonomous. It is not “correct,” because it sometimes makes mistakes. However, in spite of these mistakes, it functions well enough to help keep most of us alive for 70+ years, even though we encounter potentially deadly parasites, bacteria, and viruses every day. The analogy between computer security problems and biological processes was recognized as early as 1987, when the term “computer virus” was introduced by Adleman. However, past work has concentrated on isolated ideas and mechanisms from the immune system and how they might be applied to concrete computer security problems, without explaining the overall framework. The success of the immune system is due in large part to its organization, and an understanding of the immune system can help us design a robust, practical “computer immune system.” Such a system would incorporate many elements of current security systems, augmenting them with an adaptive response layer. Parts of this layer might be directly analogous to mechanisms present in the immune system; others will likely be quite different from those found in biology, even if they are based on similar principles to those found in the human body.
  10. 10. CHAPTER-2 IMMUNE PRINCIPLES 2.1 Overview The immune system defends the body against harmful diseases and infections. It is capable of recognizing virtually any foreign cell or molecule and eliminating it from the body. To do this, it must perform pattern recognition tasks to distinguish molecules and cells of the body (called “self”) from foreign ones (called “nonself”). Thus, the problem that the immune system faces is that of distinguishing self from dangerous nonself. The number of foreign molecules that the immune system can recognize is unknown, but it has been estimated to be greater than 10^16. These foreign proteins (kinds of molecules) must be distinguished from an estimated 10^5 different proteins of self, so recognition must be highly specific. These are staggering numbers, especially when one considers that the human genome, which encodes the “program” for constructing the immune system, only contains about 10^5 genes. The architecture of the immune system is multilayered, with defences provided at many levels. The outermost layer, the skin, is the first barrier to infection. A second barrier is physiological, where conditions such as pH and temperature provide inappropriate living conditions for some foreign organisms (pathogens). Once pathogens have entered the body, they are handled by the innate immune system and by the adaptive immune response. The innate immune system consists primarily of circulating scavenger cells such as macrophages that ingest extracellular molecules and materials, clearing the system of both debris and pathogens. The adaptive immune response (also called “the acquired immune response”) is the most sophisticated and involves many different types of cells and molecules. It is called “adaptive” because it is responsible for immunity that is adaptively acquired during the lifetime of the organism.
Because the adaptive immune system offers the most potential from a computer security viewpoint, let us focus on it. The adaptive immune system can be viewed as a distributed detection system which consists primarily of white blood cells, called lymphocytes. Lymphocytes
  11. 11. function as small independent detectors that circulate through the body in the blood and lymph systems. Lymphocytes can be viewed as negative detectors, because they detect nonself patterns and ignore self patterns. Detection, or recognition, of nonself occurs when molecular bonds are formed between a pathogen and receptors that cover the surface of the lymphocyte. The more complementary the molecular shape and electrostatic surface charge between pathogen and lymphocyte receptor, the stronger the bond (or the higher the affinity). Detection is approximate; hence, a lymphocyte will bind with several different kinds of (structurally related) pathogens. The ability to detect most pathogens requires a huge diversity of lymphocyte receptors. This diversity is partly achieved by generating lymphocyte receptors through a genetic process that introduces a huge amount of randomness. Generating receptors randomly could result in lymphocytes that detect self instead of nonself, which would then likely cause autoimmune problems in which the immune system attacks the body. Autoimmune disorders are rare because lymphocytes are self-tolerant, i.e. they do not recognize self. Tolerance of self is achieved through a process called clonal deletion: lymphocytes mature in an organ called the thymus, through which most self proteins circulate; if they bind to these self proteins while maturing, they are eliminated. Even though receptors are randomly generated, there are not enough lymphocytes in the body to provide complete coverage of the space of all pathogen patterns; one estimate is that there are 10^8 different lymphocyte receptors in the body at any given time, which must detect potentially 10^16 different foreign patterns. The immune system has several mechanisms for addressing this problem, mechanisms which make the immune response more dynamic and more specific.
Protection is made dynamic by the continual circulation of lymphocytes through the body, and by a continual turnover of the lymphocyte population. Lymphocytes are typically short-lived (a few days) and are continually replaced by new lymphocytes with new randomly generated receptors. Dynamic protection increases the coverage provided by the immune system over time. Protection is made more specific by learning and memory. If the immune system detects a pathogen that it has not encountered before, it undergoes a primary response, during which it “learns” the structure of the specific pathogen, i.e. it evolves a set of
  12. 12. lymphocytes with high affinity for that pathogen, through a process called affinity maturation. This is a Darwinian process of variation and selection resembling a genetic algorithm. High-affinity lymphocytes (those that bind most tightly with available pathogens) are stimulated to reproduce in great numbers, and the resulting lymphocytes have a large number of mutations. These new (mutated) lymphocytes then compete for pathogens with their parents and with other clones. Affinity maturation produces a large number of lymphocytes that have high affinity for a particular pathogen, which accelerates its detection and elimination. Speed of response is important in the immune system because most pathogens are replicating and will cause increasing damage as their numbers increase. Speed of response to previously encountered pathogens is generally high, because the information encoded in adapted lymphocytes is retained as immune memory. On subsequent encounters with the same antigen pattern the immune system mounts a secondary response. In this case, the adapted lymphocytes eliminate the pathogens so rapidly that the symptoms of the infection are not noticeable by the individual. Even with all of these mechanisms, the coverage provided by the immune system is necessarily incomplete. The consequence is an immune system that is vulnerable to particular pathogens. However, not all individuals will be vulnerable to the same pathogens to the same degree, because each individual has a unique immune system. This diversity of immune systems across a population greatly enhances the survival of the population as a whole. One way in which immune systems differ from one individual to the next is by having different lymphocyte populations, and hence, different detector sets. Another key component that gives an immune system its uniqueness is the variation in a molecule called the Major Histocompatibility Complex (MHC).
MHC molecules enable the immune system to detect intracellular pathogens (e.g., viruses) that reside inside cells. Intracellular pathogens are problematic because the inside of a cell is not “visible” to lymphocytes; that is, lymphocytes can only bind to structures on the surface of cells. MHC molecules bind to protein fragments called peptides (which could be viral) within a cell and transport the peptides to the surface, effectively displaying the contents of the cell to passing lymphocytes. The set of proteins to which an MHC molecule can bind is dependent on the structure of the
  13. 13. MHC, which is genetically determined. Each person has only a limited number of MHC types and so is vulnerable to particular pathogens that cannot be readily transported by the available MHC types. However, as a whole, a population is far less vulnerable, because each individual has a different set of MHC types, and so is vulnerable to different pathogens. To summarize, the natural immune system has many features that are desirable from a computer science standpoint. The system is massively parallel and its functioning is truly distributed. Individual components are disposable and unreliable, yet the system as a whole is robust. Previously encountered infections are detected and eliminated quickly, while novel intrusions are detected on a slower time scale, using a variety of adaptive mechanisms. The system is autonomous, controlling its own behaviour both at the detector and effector levels. Each immune system detects infections in slightly different ways, so pathogens that are able to evade the defences of one immune system cannot necessarily evade those of every other immune system.
2.2 Organizing Principles
Although the system described in the previous section is appealing, it is not immediately obvious how to use the immune system as a model for building successful computer security systems. There are several fundamental differences between biology and computer systems. First, we desire an electronic system, built out of digital signals, not one constructed from cells and molecules. Further, we would like to avoid recreating all of the elaborate genetic controls, cell signalling, and other aspects of the immune system that are dictated by the physical constraints under which it evolved. Finally, the immune system is oriented towards problems of survival, which is only one of many considerations in computer security.
Thus, the task of creating a useful system based on the immune-system analogy is a difficult one. In spite of these difficulties, a study of the immune system reveals a useful set of organizing principles that should guide the design of dynamic computer security systems.
  14. 14. Distributability: Lymphocytes in the immune system are able to determine locally the presence of an infection. No central coordination takes place, which means there is no single point of failure. Distributed, mobile-agent architectures for security have also been proposed; the human immune system, however, provides a good example of a highly distributed architecture that greatly enhances robustness.
Multi-layered: In the immune system, no one mechanism confers complete security. Rather, multiple layers of different mechanisms are combined to provide high overall security. This too is not a new concept in computer security, but it is believed to be important and should be emphasized in system design.
Diversity: By making systems diverse, security vulnerabilities in one system are less likely to be widespread. There are two ways in which systems can be diverse: the protection systems can be unique (as in natural immune systems) or the protected systems can be diversified.
Disposability: No single component of the human immune system is essential; that is, any cell can be replaced. The immune system can manage this because cell death is balanced by cell production. Although we do not currently have self-reproducing hardware, death and reproduction at the process/agent level is certainly possible and would have some advantages if it could be controlled.
Autonomy: The immune system does not require outside management or maintenance; it autonomously classifies and eliminates pathogens, and it repairs itself by replacing damaged cells. Although we do not expect (or necessarily want) such a degree of independence from our computers, as network and CPU speeds increase, and as the use of mobile code spreads, it will be increasingly important for computers to manage most security problems automatically.
Adaptability: The immune system learns to detect new pathogens, and retains the ability to recognize previously seen pathogens through immune memory. A computer immune system should be similarly adaptable, both learning to recognize new intrusions and remembering the signatures of previous attacks.
  15. 15. No secure layer: Any cell in the human body can be attacked by a pathogen, including those of the immune system itself. However, because lymphocytes are also cells, lymphocytes can protect the body against other compromised lymphocytes. In this way, mutual protection can stand in for a secure code base.
Dynamically changing coverage: The immune system makes a space/time tradeoff in its detector set: it cannot maintain a set of detectors (lymphocytes) large enough to cover the space of all pathogens, so instead at any time it maintains a random sample of its detector repertoire, which circulates throughout the body. This repertoire is constantly changing through cell death and reproduction.
Identity via behaviour: In cryptography, identity is proven through the use of a secret. The human immune system, in contrast, does not depend on secrets; instead, identity is verified through the presentation of peptides, or protein fragments. Because proteins can be thought of as “the running code” of the body, peptides serve as indicators of behaviour.
Anomaly detection: The immune system has the ability to detect pathogens that it has never encountered before, i.e. it performs anomaly detection. The ability to detect intrusions or violations that are not already known is an important feature of any security system.
Imperfect detection: By accepting imperfect detection, the immune system increases the flexibility with which it can allocate resources. For example, less specific lymphocytes can detect a wider variety of pathogens but will be less efficient at detecting any specific pathogen.
The numbers game: The human immune system replicates detectors to deal with replicating pathogens. It must do so; otherwise, the pathogens would quickly overwhelm any defences. Computers are subject to a similar numbers game, driven by hackers freely trading exploit scripts on the Internet, by denial-of-service attacks, and by computer viruses. For example, the success of one hacker can quickly lead to the compromise of thousands of hosts. Clearly, the pathogens in the computer security world are playing the numbers game; traditional systems, however, are not.
  16. 16. These properties can be thought of as design principles for a computer immune system. Many of them are not new, and some have been integral features of computer security systems; however, no existing computer security system incorporates more than a few of these ideas. Although the exact biological implementation may or may not prove useful, we believe that these properties of natural immune systems can help us design more secure computer systems.
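As a rough illustration of the "dynamically changing coverage" principle described in this chapter, the toy Python simulation below shows how a small detector repertoire with random turnover can, over time, cover more of the pattern space than it holds at any single instant. It is only a sketch under strong assumptions that are not in the report: each detector covers exactly one pattern, patterns are plain integers, and the names `simulate_coverage`, `universe_size`, `repertoire_size`, and `turnover` are hypothetical.

```python
import random


def simulate_coverage(universe_size=1000, repertoire_size=50,
                      turnover=10, steps=30, seed=0):
    """Toy model of a detector repertoire with random turnover.

    Assumption (not from the report): one detector covers one pattern id.
    Returns two lists of coverage fractions, one entry per time step:
    instantaneous coverage and cumulative coverage since the start.
    """
    rng = random.Random(seed)
    # Initial random sample of the detector repertoire.
    repertoire = [rng.randrange(universe_size) for _ in range(repertoire_size)]
    seen = set(repertoire)  # every pattern covered at some point so far

    instantaneous, cumulative = [], []
    for _ in range(steps):
        instantaneous.append(len(set(repertoire)) / universe_size)
        cumulative.append(len(seen) / universe_size)
        # Cell death and reproduction: replace some detectors at random.
        for idx in rng.sample(range(repertoire_size), turnover):
            repertoire[idx] = rng.randrange(universe_size)
        seen.update(repertoire)
    return instantaneous, cumulative


if __name__ == "__main__":
    inst, cum = simulate_coverage()
    print("coverage at the last step:", inst[-1])
    print("cumulative coverage over time:", cum[-1])
```

The instantaneous coverage can never exceed `repertoire_size / universe_size`, while the cumulative figure keeps growing as detectors are replaced. This is the space/time tradeoff the principle describes, in a deliberately simplified form.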
  17. 17. CHAPTER-3 RESEARCH OVERVIEW In the last few years, Forrest et al., Dasgupta et al., and Kim et al. applied artificial immune theory to Intrusion Detection Systems (IDS) and achieved comparatively good experimental results. Forrest et al. introduced immune principles into computer security. They developed a negative selection algorithm (NSA) based on the principles of self/nonself discrimination in the human immune system. This algorithm defines ‘self’ as the normal behaviour patterns of a monitored system. It generates a number of random patterns. If any randomly generated pattern matches a self pattern, it fails to become a detector and is removed. Otherwise, it becomes a ‘detector’ and is used to monitor subsequent access patterns. During the monitoring stage, if a ‘detector’ matches any access pattern, a new abnormal access is considered to have occurred. This algorithm operates on binary strings, and adopts the R-Contiguous Matching Function (RCMF) to determine the degree of match between antibody and antigen. In 1999, Dasgupta et al. proposed a general model of a mobile agent system for computer security. These agents imitate the function of immune cells. This multi-agent system monitors several parameters at multiple levels (user level, system/resource level, process level, and packet level) to provide defence in depth. Through the agents’ interaction, some complex decision-making tasks are performed. In 2001, Dasgupta et al. presented a new model of intrusion detection based on immune theory and genetic algorithms. They gave a novel kind of detector, which can characterize different levels of abnormality. Kim et al. showed that it is costly to use negative selection alone. Inspired by the human immune system, they presented an artificial immune system model for network intrusion detection.
Their two-level (physical level and logical level) intrusion detection system has two parts and involves three processes: gene library evolution, negative selection, and clonal selection. These three processes are coordinated across the network to provide an effective intrusion detection
  18. 18. system that is distributed, self-organizing, and lightweight. In 2002, Kim et al. proposed the dynamic clonal selection algorithm (DynamiCS). Through the lifecycle mechanism of the immune cells, the detectors can identify as abnormal a behaviour that was normal in the past, so this system is much more adaptive to dynamically changing environments.
  19. 19. CHAPTER-4 CORE ALGORITHMS This chapter discusses the main algorithms developed to bring in the features provided by the human immune system. The chapter also discusses various applications of each algorithm.
4.1 NEGATIVE SELECTION
Negative selection was introduced in 1994 (Forrest et al., 1994), and is now often called negative detection. It is a loose abstract model of biological negative selection that concentrates on the generation of change detectors. These detectors are intended to detect when elements of a set of self strings have changed from an established norm. The algorithm of (Forrest et al., 1994), which assumes all strings are binary, is as follows.
      1. Create a set of self strings, S, by some means.
      2. Create a set of randomly generated strings, R0.
      3. For each r0 ∈ R0, form a set, R, of those r0 that do not strongly match any s ∈ S. A strong match is defined by a matching function m(r0, s), such that: (i) m(r0, s) Ω θ; (ii) Ω is an operator, such as ≥, that defines whether a high or low value of m(r0, s) indicates greater similarity between the strings; and (iii) θ defines a threshold. An example of the use of m(x, y) will be given below.
      4. For each r ∈ R, ensure that no s ∈ S matches above (or ‘below’, depending on the form of Ω) the threshold.
      5. Return to step 4 while change detection of S is required.
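The five steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation of Forrest et al.: it assumes the k-contiguous bits rule as the matching function m, with Ω taken as ≥ and θ = k, and the function names (`contiguous_match`, `censor`, `changed`) and the `max_tries` cut-off are hypothetical.

```python
import random


def contiguous_match(a: str, b: str) -> int:
    """Length of the longest run of positions at which a and b agree."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def censor(self_set, n_detectors, length, k, rng, max_tries=100_000):
    """Censoring stage (steps 1 to 3): keep random binary strings that do
    not strongly match any self string, i.e. m(r0, s) < k for all s."""
    detectors = []
    for _ in range(max_tries):  # assumed cut-off so the loop always ends
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if all(contiguous_match(candidate, s) < k for s in self_set):
            detectors.append(candidate)
    return detectors


def changed(strings, detectors, k):
    """Protection stage (steps 4 and 5): return the monitored strings that
    at least one detector matches at or above the threshold k."""
    return [s for s in strings
            if any(contiguous_match(d, s) >= k for d in detectors)]


if __name__ == "__main__":
    rng = random.Random(1)
    self_set = ["01101001", "11110000"]
    detectors = censor(self_set, n_detectors=5, length=8, k=6, rng=rng)
    print("detectors:", detectors)
    print("flagged self strings:", changed(self_set, detectors, k=6))
```

By construction no detector can flag an unmodified self string, so `changed(self_set, detectors, k)` returns an empty list until a monitored string drifts far enough from self to match a detector. Calling `changed` repeatedly on the current state of the monitored strings corresponds to the loop in step 5.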
  20. 20. Steps 1, 2 and 3 form the censoring stage of the algorithm, and steps 4 and 5 form the protection stage. The R0 strings are censored until the desired repertoire, R, of detector strings is obtained. In (Forrest et al., 1994), the matching function was the k-contiguous bits function, such that m(r0, s) returned the largest number of contiguous, matching bits, Ω was ≥, and θ = k. In later work another matching function was used, as described below. Wierzchon has pointed out that even a simple thresholded Hamming distance rule, mH(x, y), would be suitable (Wierzchon, 2000); however, whereas Hamming distance is irreflexive, with mH(x, x) = 0, requiring Ω to be ≤, the k-contiguous bits distance is reflexive, with mC(x, x) = |x|. For example, the k-contiguous bits measure operates as follows.
      01101001
      10101010
        ____     <--- four matching bits
      m(r0, s) = 4
Typical Applications:
• Change detection (Forrest et al., 1994), in which a negative detection AIS was used to detect changes made to machine code programs by computer viruses, as well as distributed change detection (Forrest and Hofmeyr, 2000) and network security (Hofmeyr and Forrest, 2000).
• Dasgupta and Forrest’s comparative application of negative selection
  21. 21. to ‘tool breakage detection’ in a milling operation (Dasgupta and Forrest, 1995), and their work on benchmark problems (Dasgupta and Forrest, 1999), indicate that it has significant advantages over ANN and some standard fault detection methods.
• Fault detection and diagnosis, such as work by (Ishida, 1993), (Ishiguru et al., 1994) and Bradley and Tyrell’s application to build hardware fault-tolerant systems (Bradley and Tyrell, 2002).
• Deaton et al.’s work (Deaton et al., 1997) that proposed a means of implementing the negative selection algorithm in a DNA computer.
• Network intrusion detection (Anchor et al., 2002), which also combined negative selection with multi-objective evolutionary programming.
4.2 CLONAL SELECTION AND HYPER-MUTATION
Whereas negative selection centres on the detection of nonself, in clonal selection the focus is on how B-cells (and T-cells) can adapt to match and kill the invader. The immune system’s ability to adapt its B-cells to new types of antigen is powered by processes known as clonal selection and affinity maturation by hyper-mutation. In Artificial Immune System (AIS) literature, the shorthand ‘clonal selection’ is often used to refer to both processes. According to Burnet, biological clonal selection (Burnet, 1959) occurs to the degree that a B-cell matches an antigen. A strong match causes a B-cell to be cloned many times, and a weak match results in little cloning. These ‘clones’ are mutated from the original B-cell at a rate inversely proportional to the match strength: a strongly matching B-cell mutates little and thus retains its match to the antigen to a slightly greater or lesser extent; a weakly matching cell mutates much more. As mentioned above, since 1959 there have been improvements to Burnet’s theory, with respect to the
  22. 22. way antigens are recognized, whether as nonself or as dangerous, but his basic principles of clonal selection and affinity maturation by hyper-mutation are sufficient for the purposes of AIS clonal selection.
De Castro and Von Zuben’s CLONALG
The artificial form of clonal selection has been popularized mainly by de Castro and Von Zuben, beginning with an algorithm they called CSA (de Castro and Von Zuben, 1999), which was then modified and renamed CLONALG (de Castro and Von Zuben, 2000). CLONALG currently exists in two forms (de Castro and Von Zuben, 2002), one for optimization tasks and one for pattern matching; however, the pattern matching algorithm has only recently been more fully investigated (White and Garrett, 2003). CLONALG has the advantage of being simple to implement, relative to previous work such as Fukuda’s. The basic CLONALG optimization algorithm is as follows:
      1. Initialization: Create an initial, random population of antibodies, P0. Repeat steps 2 to 7 until a predefined termination condition is met.
      2. Evaluation and Selection 1: Select a subset, F, of the fittest antibodies from Pt according to some fitness function, f(abi).
      3. Cloning: For each abi ∈ F, create a set of clones, Ci, such that |Ci| = nc(abi). The set of all clones is C = ∪i Ci.
      4. Mutation: Mutate each clone c ∈ C by a function am(f(abi), ρ). Add the mutated clones, C', to Pt to give Pt'.
      5. Evaluation and Selection 2: Select a subset, F', of the fittest antibodies from Pt'.
      6. Diversity: Add r randomly generated new B-cells to F' to give a new population Pt''.
      7. Death: Retain only the best |Pt| members of Pt'' to give Pt+1; all other B-cells are considered to have died.
  23. 23. The effect, on average, is that each successive generation of B-cells tends to be closer to the state space optima than the previous generation. The addition of random members at each generation provides diversity (to search unexplored areas of the state space) for still better approximations to the optima, but the evolutionary pressure lessens as all antibodies tend to their (local) optima. De Castro presents results that show convergence can be on several optima at the same time (multi-modal optimization) (de Castro and Von Zuben, 2000).
Typical Applications: Typical applications of clonal selection include the following:
• CLONALG has been applied to uni-modal, combinatorial and multi-modal optimization, and to the problem of initializing the centres of radial basis functions (de Castro and Von Zuben, 2001).
• The use of CLONALG has been suggested for various types of pattern recognition (de Castro and Von Zuben, 2002) and shown to work by (White and Garrett, 2003).
• Cutello and Nicosia have applied their clonal selection method to the
  24. 24. Network Security Based on Immune Principles, 2010 graph colouring problem, to two NP-complete problems, and multiple character recognition problem (Cutello and Nicosia, 2002; Nicosia et al., 2003). • Hart et al.,1998, use a clonal selection-based method in automated scheduling. • Greensmith and Cayzerhave recently suggested the use of AIRS in document classification (Greensmith and Cayzer, 2003). Division of Computer Engineering, School of Engineering, CUSAT Page 17
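The seven CLONALG optimization steps listed earlier in this chapter can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not de Castro and Von Zuben's implementation: the real-valued antibodies, the `sphere` objective, the fixed clone count per antibody, and the rank-based mutation scale are all hypothetical choices made for brevity.

```python
import random


def clonalg(fitness, dim=2, pop_size=20, n_select=5, n_clones=5,
            n_random=2, generations=50, rho=0.5, seed=0):
    """CLONALG-style maximiser; every default value is an illustrative assumption."""
    rng = random.Random(seed)

    def new_antibody():
        return [rng.uniform(-5.0, 5.0) for _ in range(dim)]

    # Step 1: random initial population P0.
    pop = [new_antibody() for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: select the subset F of the fittest antibodies.
        selected = sorted(pop, key=fitness, reverse=True)[:n_select]
        # Steps 3-4: clone each member of F and mutate the clones.
        # Assumption: fitter antibodies (lower rank) get smaller mutations.
        clones = []
        for rank, ab in enumerate(selected):
            scale = rho * (rank + 1) / n_select
            for _ in range(n_clones):
                clones.append([x + rng.gauss(0.0, scale) for x in ab])
        # Step 5: pool Pt with the mutated clones (Pt') and reselect the fittest (F').
        fittest = sorted(pop + clones, key=fitness, reverse=True)[:pop_size]
        # Step 6: add r random newcomers for diversity (Pt'').
        fittest += [new_antibody() for _ in range(n_random)]
        # Step 7: keep only the best |Pt| members; the rest "die".
        pop = sorted(fittest, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)


def sphere(ab):
    """Hypothetical objective with a single optimum (value 0) at the origin."""
    return -sum(x * x for x in ab)


best = clonalg(sphere)
print(best, sphere(best))
```

Because the parents stay in the pool at step 5, the best antibody found so far is never lost; the random newcomers of step 6 survive only if they are competitive, which is one of several reasonable readings of the step list.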
CHAPTER-5 SEMINAL APPROACH

Original work on immune-based Intrusion Detection Systems (IDS) and network security was done by Forrest et al. In their research they put forward an architecture for an Artificial Immune System (ARTIS) and implemented the methodology in an intrusion detection system called LISYS (Lightweight Intrusion Detection SYStem).

5.1 DEFINING THE PROBLEM

All discrimination between self and nonself in the Immune System (IS) is based upon chemical bonds that form between protein chains. To preserve generality, they modelled protein chains as binary strings of fixed length l. The IS must distinguish self from nonself based on proteins; ARTIS addresses a similar problem, which is defined as follows. The set of all strings of length l forms a universe, U, which is partitioned into two disjoint subsets called self S and nonself N (i.e., U = S ∪ N, S ∩ N = ∅). ARTIS faces a discrimination or classification task: Given an arbitrary string from U, classify it as either normal (corresponding to self) or anomalous (corresponding to nonself). ARTIS can make two kinds of discrimination errors: A false positive occurs when a self string is classified as anomalous, and a false negative occurs when a nonself string is classified as normal. The IS also makes similar errors: A false negative occurs when the IS fails to detect and fight off pathogens, and a false positive error occurs when the IS attacks the body (known as an autoimmune response). In the body, both kinds of errors are harmful, so the IS has apparently evolved to minimize those errors; similarly, the goal of ARTIS is to minimize both kinds of errors.

When real-world problems are mapped to this abstraction, self and nonself may not be disjoint, because some strings may characterize both self and nonself. In this case, the categorization of strings as either one or the other will lead to unavoidable errors. We do not consider that case here. However, it illustrates the importance of choosing the right characteristic for the application domain: It is essential to choose the equivalent of proteins that can be used to reliably discriminate between self and nonself.

5.2 DETECTORS

Natural immune systems consist of many different kinds of cells and molecules that have been identified and studied experimentally. ARTIS is simplified by introducing one basic type of detector modelled on the class of immune cells called lymphocytes. This detector combines properties of B-cells, T-cells, and antibodies. ARTIS is similar to the IS in that it consists of a multitude of mobile detectors circulating around a distributed environment. We model the distributed environment with a graph G = (V, E); each vertex v ∈ V contains a local set of detectors (called a detection node), and detectors migrate from one vertex to the next via the edges. The graph model also provides a notion of locality: Detectors can only interact with other detectors at the same vertex.

Lymphocytes have hundreds of thousands of identical receptors on their surface (and hence are termed monoclonal). These receptors bind to regions (epitopes) on pathogens. Binding depends on chemical structure and charge, so receptors are likely to bind to a few similar kinds of epitopes. The greater the likelihood of a bond occurring, the higher the affinity between the receptor and epitope. In ARTIS, both epitopes and receptors are modelled as binary strings of fixed length l, and chemical binding between them is modelled as approximate string matching. In effect, each detector is associated with a binary string, which represents its receptors.
Obvious approximate matching rules include Hamming distance and edit distance, but we have adopted a more immunologically plausible rule called r-contiguous bits (Percus et al., 1993): Two strings match if they have r contiguous bits in common. The value r is a threshold and determines the specificity of the detector, which is an indication of the size of the subset of strings that a single detector can match. For example, if r = l, the matching is completely specific, that is, the detector will match only a single string (itself), but if r = 0, the matching is completely general, that is, the detector will match every single string of length l.

Figure 1: A two-dimensional representation of a universe of strings. Each string can belong to one of two sets: self or nonself. In this diagram, each point in the plane represents a string; if the point lies within the shaded area it is self, otherwise nonself.

A consequence of a partial matching rule with a threshold, such as r-contiguous bits, is that there is a trade-off between the number of detectors used and their specificity: as the specificity of the detectors increases, so does the number of detectors required to achieve a certain level of detection. The optimal r is one that minimizes the number of detectors needed but still gives good discrimination.

A lymphocyte becomes activated when its receptors bind to epitopes. Activation changes the state of the lymphocyte and triggers a series of reactions that can lead to elimination of the pathogens. A lymphocyte will only be activated when the number of its receptors binding to epitopes exceeds a threshold. Chemical bonds between receptors and epitopes are not long-lasting, so to be activated, a lymphocyte must bind sufficient receptors within a short period of time. We model this with activation thresholds: A detector must match at least T strings within a given time period to be activated. This is implemented by allowing the detector to accumulate matches but decaying the match count over time, i.e., there is a probability γmatch that the match count will be reduced by one at any timestep. This models the probability of a bond between a receptor and an epitope decaying. Once a detector has been activated, its match count is reset to zero.

5.3 TRAINING THE DETECTION SYSTEM

Lymphocytes are called negative detectors because they are trained to bind to nonself; i.e., when a lymphocyte is activated, the IS responds as if nonself were detected. This simple form of learning is known as tolerization, because the lymphocytes are trained to be tolerant of self. Lymphocytes are created with randomly generated receptors and so could bind to either self or nonself. One class of lymphocytes, T-cells, is tolerized in a single location, the thymus, which is an organ just behind the breastbone. Immature T-cells develop in the thymus, and if they are activated during development, they die through programmed cell death (apoptosis). Most self proteins are expressed in the thymus, so T-cells that survive to maturation and leave the thymus will be tolerant of all those self proteins. This process is called negative selection, because the T-cells that are not activated are the ones selected to survive.

Lymphocytes are trained to perform anomaly detection. The IS uses a training set of self (proteins present in the thymus) to produce detectors that can distinguish between self and nonself. This clearly will not work if nonself is frequently expressed in the thymus, because then the IS will also be tolerant to that nonself. The underlying assumption is that self occurs frequently compared to nonself.
This assumption is the basis of most anomaly detection systems, which define normal as the most frequently occurring patterns or behaviours.
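The r-contiguous bits rule of Section 5.2 and the negative selection process of this section can be sketched together in Python. This is a simplified illustration, assuming that candidates are censored against the whole self set at once rather than over a timed tolerization period; the string length, the value of r, and the three self strings are hypothetical.

```python
import random


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return r <= 0  # r = 0 is completely general


def generate_detectors(self_set, length, r, count, rng, max_tries=100_000):
    """Negative selection: keep random strings that match no self string."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == count:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)  # survived tolerization
    return detectors


def is_anomalous(string, detectors, r):
    """A string is flagged as nonself if any detector matches it."""
    return any(r_contiguous_match(d, string, r) for d in detectors)


rng = random.Random(1)
self_set = {"00000000", "00001111", "11110000"}  # hypothetical self strings
detectors = generate_detectors(self_set, length=8, r=4, count=10, rng=rng)
print(detectors)
print([is_anomalous(s, detectors, 4) for s in self_set])  # all False by construction
```

No surviving detector can fire on the self strings it was trained against, which is why the final line reports no false positives on the training set; nonself strings that no detector happens to cover remain undetected.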
5.4 MEMORY

The IS has an adaptive response that enables it to learn protein structures that characterize pathogens it encounters and "remember" those structures so that future responses to the same pathogens will be very rapid and efficient. This is called memory-based detection, because the IS "remembers" the structures of known pathogens to facilitate future responses. When the IS encounters pathogens of a type it has not encountered before, it mounts a primary response, which may take several weeks to eliminate the infection; during the primary response the IS is learning to recognize previously unseen foreign patterns. When the IS subsequently encounters the same type of pathogens, it mounts a secondary response, which is usually so efficient that there are no clinical indications of a re-infection. The secondary response illustrates the efficacy of memory-based detection.

Memory-based detection in the IS has another important property: It is associative (Smith et al., 1998). Memory detection allows the IS to detect new pathogens that are structurally related to ones previously encountered. This concept underlies immunization, where inoculation with a harmless form of pathogen A (such as an attenuated virus) induces a primary response that generates a population of memory cells that are cross-reactive with a harmful kind of pathogen B. This population of memory cells will ensure that the IS mounts a secondary response to any infections of B.

Primary responses are slow because there may be very few lymphocytes that bind to a new type of pathogen, so the immune response will not be very efficient. To increase efficiency, activated lymphocytes clone themselves so that there is an exponentially growing population of lymphocytes that can detect the pathogens. The higher the affinity between a lymphocyte's receptors and the pathogen epitopes, the more likely it is that the lymphocyte will be activated.
Hence, the lymphocytes that are replicating are those with the highest affinity for the pathogens present. During this time, the pathogens are also replicating, so there is a race between pathogen replication and lymphocyte replication.
The IS improves its chances in this race through a class of lymphocytes called B-cells, which are subject to high mutation rates (known as somatic hypermutation) when cloning (ARTIS currently does not model this aspect). Hypermutation combined with clonal expansion is an adaptive process known as affinity maturation. Once the infection is eliminated, the IS retains a population of memory cells: long-lived lymphocytes that have a high affinity for the pathogen. This population of memory cells is of sufficient size and specificity to enable the very rapid secondary response when a re-infection occurs.

ARTIS uses a similar form of memory-based detection. When multiple detectors at a node are activated by the same nonself string s, they enter a competition to become memory detectors. Those detectors that have the closest match (under r-contiguous bits) with s will be selected to become memory detectors. These memory detectors make copies of themselves, which then spread out to neighbouring nodes. Consequently, a representation of the string s is distributed throughout the graph; future occurrences of s will be detected very rapidly in any node because detectors that match s exist at every node. In addition, memory detectors have lowered activation thresholds so that they will be activated far more rapidly by future reoccurrences of previously encountered nonself strings, i.e., they are much more sensitive to those strings. This mimics the rapid secondary response seen in the IS.

5.5 CO-STIMULATION

Unfortunately, tolerization in the IS is not as straightforward as described earlier. Some self proteins are never expressed in the thymus, and so lymphocytes that are tolerized centrally in the thymus may bind to these proteins and precipitate an autoimmune reaction.
This does not happen in practice because T-cells require costimulation to be activated: In addition to binding to proteins (called signal one), a T-cell must be costimulated by a second signal. This second signal is usually a chemical signal which occurs when the body is damaged in some way. The second signal can come either from cells of the IS or other cells of the body. When a T-cell receives signal one in the absence of signal two, it dies. Hence, auto-reactive T-cells (those that bind to self) will be eliminated in healthy tissues. However, if the tissues are damaged, auto-reactive T-cells could survive. But they would only survive while the damage persisted; as soon as they left the area of damage, they would receive signal one in the absence of signal two and die. Moreover, they would have a high likelihood of dying before they ever reached the area of tissue damage because of the healthy tissue passed through on the way.

Likewise, we cannot assume that in ARTIS a detector will encounter every string s ∈ S during its tolerization period, so it is possible that detectors will mature that match some strings in S. ARTIS implements a crude form of costimulation. Ideally, the second signal should be provided by other components of the system, but our first approximation is to use a human operator to provide the second signal. When a detector d is activated by a string s, it sends a signal to a human operator who is given a time period Ts (called the costimulation delay) in which to decide if s is really nonself. If the operator decides that s is indeed nonself, a second signal is returned to d. If the operator decides that s is actually self, no signal is sent to d and d dies off and is replaced by a new, immature detector. Consequently, a human operator need make no response in the case of false positives; the system will automatically correct itself to prevent similar false positives in future.

5.6 THE LIFECYCLE OF A DETECTOR

If detectors lived indefinitely and only died off when they failed to receive costimulation, most detectors would only be immature once.
Any nonself strings that occurred during the period of immaturity of these detectors would not be detected in the future because all detectors would be tolerant of them and would remain tolerant. In the IS this is not a problem because lymphocytes are typically short-lived (a few days) and so new, immature lymphocytes are always present, i.e., the population of lymphocytes is dynamic. ARTIS introduces a similar measure: Each detector has a probability Pdeath of dying once it has matured. When it dies, it is replaced by a new randomly-generated, immature detector. Ultimately, every detector dies sooner or later, unless it is a memory detector.

In the IS, memory cells are long-lived so that the patterns that they encode are not lost over time. For example, exposure to measles early in life confers life-long protection against the disease. Similarly, memory detectors in ARTIS are long-lived: They can die only as a consequence of a lack of costimulation. A problem with this mechanism is that eventually all detectors could become memory detectors, with a loss of the advantages conferred by dynamic detector populations. To combat this problem, we limit the number of memory detectors to some fraction md of the total detectors. If a new detector wins a competition and becomes a memory detector, and the fraction of memory detectors has reached the limit, then the least-recently-used (LRU) memory detector is demoted to an ordinary mature detector (consequently it once more has a finite lifespan). We demote the LRU detector because it has not been activated for the longest time period of any memory detector, and hence we assume that it is the least useful memory detector.

An additional benefit of a dynamic detector population is that the system can adapt to changing self sets. As the self set changes, it will tolerize new immature detectors, and mature detectors that were causing false positives will die either from lack of costimulation or from age. Eventually, all detectors will be tolerant of self, provided self does not change too quickly. If self changes rapidly compared to the lifespan of a detector, there will be a sizeable portion of detectors that are immature, because mature detectors will continually die due to lack of costimulation.
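The detector lifecycle of Sections 5.2 to 5.6 (tolerization, activation threshold with decaying match count, random death, and LRU demotion of memory detectors) can be sketched as a small state machine. This is a rough sketch under stated assumptions rather than the ARTIS implementation: the `Detector` fields, the `step` and `promote_to_memory` helpers, the state names, and every parameter value are hypothetical, and costimulation is left to the caller.

```python
import random
from dataclasses import dataclass


@dataclass
class Detector:
    pattern: str
    state: str = "immature"      # immature -> mature -> memory
    age: int = 0
    match_count: int = 0
    last_activated: int = -1


def step(det, matched, now, rng, tolerization_period=3, threshold=2,
         match_decay=0.1, p_death=0.05):
    """Advance one detector by one timestep; returns 'ok', 'activated' or 'dead'."""
    det.age += 1
    if det.state == "immature":
        if matched:
            return "dead"                        # matched during tolerization
        if det.age >= tolerization_period:
            det.state = "mature"
        return "ok"
    if det.match_count and rng.random() < match_decay:
        det.match_count -= 1                     # bonds decay over time
    if matched:
        det.match_count += 1
    needed = 1 if det.state == "memory" else threshold   # memory is more sensitive
    if det.match_count >= needed:
        det.match_count = 0                      # reset once activated
        det.last_activated = now
        return "activated"                       # caller then awaits costimulation
    if det.state == "mature" and rng.random() < p_death:
        return "dead"                            # memory detectors are exempt
    return "ok"


def promote_to_memory(det, detectors, memory_fraction=0.25):
    """Promote det; if the memory cap is reached, demote the LRU memory detector."""
    memory = [d for d in detectors if d.state == "memory"]
    if memory and len(memory) >= memory_fraction * len(detectors):
        lru = min(memory, key=lambda d: d.last_activated)
        lru.state = "mature"                     # finite lifespan again
    det.state = "memory"


rng = random.Random(0)
d = Detector("0110")
for t in range(3):                               # quiet tolerization period
    step(d, matched=False, now=t, rng=rng)
print(d.state)                                   # mature
print([step(d, True, now=t, rng=rng, match_decay=0.0, p_death=0.0) for t in (3, 4)])
```

A caller would replace any detector reported as `dead` with a fresh immature one, and would also kill an `activated` detector whose costimulation delay expires without a second signal.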
Figure 2: Lifecycle of a detector

5.7 REPRESENTATIONS

Molecules of the Major Histocompatibility Complex (MHC) play an important role in the IS, because they transport peptides (fragments of protein chains) from the interior regions of a cell and present these peptides on the cell's surface. This mechanism enables roving IS cells to detect infections inside cells without penetrating the cell membrane. There are many variations of MHC, each of which binds a slightly different class of peptides. Each individual in a population is genetically capable of making a small set of these MHC types (about ten), but the set of MHC types varies in different individuals. Consequently, individuals in a population are capable of recognizing different profiles of peptides, providing an important form of population-level diversity.
We speculate that MHC plays a crucial role in protecting a population of individuals from holes in the detection coverage of nonself. A hole is a nonself string for which no valid detectors can be generated (D'haeseleer, 1996): A nonself string a ∈ N is a hole if and only if every u ∈ U that matches a also matches some self string s ∈ S. Holes can exist for any approximate match rule with a constant probability of matching (such as r-contiguous bits) (D'haeseleer, 1996), and it is reasonable to assume that they will exist in the biological realm of receptor binding, because binding between receptors in the IS and peptides is approximate. Moreover, pathogens will always be evolving so that they are more difficult to detect (they evolve towards becoming holes in the detection coverage). Those pathogens that are harder to detect will be the ones that survive better and hence are naturally selected.

In the IS, each type of MHC can be regarded as a different way of representing a protein (depending on which peptides it presents); in effect, the IS uses multiple representations, or views, of proteins. Multiple representations can reduce the overall number of holes, because different representations will induce different holes.

In ARTIS, each detection node uses a different representation: It filters incoming strings through a randomly-generated permutation mask. For example, given the strings S1 = 01101011, S2 = 00010011, and a permutation Ʌ defined by the randomly-generated permutation mask 1-6-2-5-8-3-7-4, these strings become Ʌ(S1) = 00111110 and Ʌ(S2) = 00001011. Using the r-contiguous bits rule with r = 3, S1 matches S2 because the last 3 positions are the same, but under the new representation, Ʌ(S1) does not match Ʌ(S2). Having a different representation for each detection node is equivalent to changing the "shape" of the detectors, while keeping the "shape" of the self set constant.
Consequently, where one node fails to detect a nonself string, another node could succeed.
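The permutation-mask example above can be checked with a short Python sketch. The helper names `permute` and `r_contiguous_match` are illustrative; the mask is read as "output position i takes the input bit at position mask[i]", which is the reading that reproduces the permuted strings given in the text.

```python
def permute(bits: str, mask) -> str:
    """Output position i takes the bit at (1-based) position mask[i] of the input."""
    return "".join(bits[p - 1] for p in mask)


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return r <= 0


mask = [1, 6, 2, 5, 8, 3, 7, 4]
s1, s2 = "01101011", "00010011"
p1, p2 = permute(s1, mask), permute(s2, mask)
print(p1, p2)                                   # 00111110 00001011
print(r_contiguous_match(s1, s2, 3))            # True: the last three bits agree
print(r_contiguous_match(p1, p2, 3))            # False: longest agreeing run is 2
```

The same pair of strings therefore matches under one representation and not under another, which is how different nodes end up with different holes.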
Figure 3: The problem of holes can be reduced by using different representations in different nodes

5.8 MAPPING OF ARTIS TO NETWORK SECURITY

The first and, in a sense, most important step when applying ARTIS to a domain is to choose the equivalent of proteins. For example, in computer security, there are many different levels at which we can monitor performance and behaviour; characteristics at each of these levels represent potential proteins. We have chosen to monitor network traffic, and so our protein is a "datapath triple", which consists of a source Internet Protocol (IP) address, a destination IP address, and a TCP service (or port) by which two computers communicate. Essentially, a protein is a connection between two computers. Currently, we monitor only the start of TCP connections, i.e., we monitor TCP SYN packets. We have chosen this level (network connections) because the environment is naturally distributed (there are many computers communicating), and because other researchers have successfully implemented anomaly detection by monitoring network connections (Heberlein et al., 1990).

In ARTIS, the connection information is compressed to a single 49-bit string that unambiguously defines the connection. Self is then the set of "normally occurring" connections observed over time on the LAN.
Thus, self is defined in terms of frequencies: We assume implicitly that any connection that occurs frequently over a long period of time is part of the self set. A connection can occur between two internal computers on the LAN or between an internal computer on the LAN and an external computer outside the LAN. Each connection is represented by a 49-bit string (whether internal or external). Similarly, nonself is also a set of connections (using the same 49-bit representation), the difference being that nonself consists of those connections, potentially an enormous number, that are not normally observed on the LAN.

In ARTIS, the environment is modelled with a graph, where each vertex defines a locality corresponding to a detection node. For the NID (Network Intrusion Detection) application, each vertex corresponds to a computer within the LAN (an internal computer), and the network represents a fully connected graph, because we have assumed the network is broadcast. Broadcast LANs have the convenient property that every location (computer) sees every packet passing through the LAN.

In summary, LISYS is an implementation of ARTIS as described in the sections above. The binary strings are mapped from datapath triples, and the environment is a network of computers, where each internal computer corresponds to a vertex in the graph, running a detection node. ARTIS has not yet incorporated mobile detectors, replication, or response.
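To make the mapping from a datapath triple to a fixed-length binary string concrete, here is a hedged Python sketch. The text states only that the triple is compressed to 49 bits; the field layout used below (8-bit internal host, 32-bit remote address, 1-bit direction flag, 8-bit service class) is a hypothetical layout chosen because it sums to 49, and the `SERVICE_CLASS` table, the function name, and the example addresses are likewise assumptions.

```python
import ipaddress

# Hypothetical mapping from TCP port to an 8-bit service class; 255 means "other".
SERVICE_CLASS = {21: 0, 22: 1, 23: 2, 25: 3, 80: 4}


def encode_triple(internal_ip, remote_ip, port, internal_is_server):
    """Pack a datapath triple into a 49-character bit string (assumed layout)."""
    host = int(ipaddress.IPv4Address(internal_ip)) & 0xFF    # last octet only
    remote = int(ipaddress.IPv4Address(remote_ip))           # full 32-bit address
    service = SERVICE_CLASS.get(port, 255)
    value = (host << 41) | (remote << 9) | (int(internal_is_server) << 8) | service
    return format(value, "049b")


bits = encode_triple("10.0.0.50", "203.0.113.7", 23, True)
print(bits, len(bits))
```

Strings produced this way could be fed directly to an r-contiguous bits matcher. Note that a catch-all service class makes this particular sketch lossy for unlisted ports, so it does not by itself achieve the unambiguous encoding the text describes.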
CHAPTER-6 DERIVATIVE APPROACH

Many researchers have investigated Artificial Immune Systems after the initial work by Forrest et al. Here I present one such work, by Yu-Fang Zhang, GuiHua Sun, and Zhong-Yang Xiong.

Building on previous research, they introduce a novel method of intrusion detection based on an artificial immune system that adapts to diverse network environments. The encoding of antibody and antigen makes use of constraint-based detectors and an any-r intervals matching rule. In reality, most IP packets are normal. If every packet must be checked by a large number of detectors, access for normal IP packets is slowed down. For this reason, the self-pattern class is proposed. The self-pattern class describes the common characteristics of normal packets and is mined through a clustering algorithm. When an IP packet accesses the host, it is first checked against the self-patterns. If a self-pattern class matches the IP packet, the packet is allowed to access the host; otherwise, the doubtful IP packet needs to be checked by the detectors. With the self-pattern class, the access time of normal IP packets is improved.

6.1 ENCODING OF ANTIGEN-ANTIBODY

The encoding of antigen and antibody is an important part of designing an intrusion detection model. Forrest et al. developed the R-Contiguous Matching Function (RCMF). Both antibody and antigen are implemented as binary strings. The antibody matches the antigen if the binary strings have the same bits in at least r contiguous places. The algorithm makes it convenient to detect a mass of non-self antigens with a few antibodies. Consequently, it can achieve a better detection rate. However, comparing antigen and antibody requires a bit-by-bit sliding window.
Related definitions are as follows.

Definition 1 antigen: An antigen is a decimal string, which consists of n features taken from the IP packet by feature selection, e.g. Ag = (x1…xi…xn). Ag denotes the antigen; n is the number of features; xi stands for the i-th feature in the IP packet (Ag). For example, a packet from an IP address to an internal machine with host ID 50 through port 23 will be represented as (131, 13, 216, 191, 50, 23, …).

Definition 2 antibody set: The basic structure of the antibody set is as follows: D = {d | d = <s, age, life, count>}. D represents the antibody set and d an antibody; s denotes the pattern of this antibody; age is the age of this antibody; life is the lifespan of this antibody; count is the affinity of this antibody, namely the number of antigens matched by this antibody.

The constraint-based detectors are used to represent the pattern of an antibody (s). Its definition is given in rule R:

R: If x1 ∈ [l1, h1] ∧ … ∧ xn ∈ [ln, hn] then nonself

Here (x1…xi…xn) is a feature vector, and li, hi specify the lower and upper values for the feature xi in the condition part of rule R. Whether feature xi of the antigen is matched by the antibody is given in formula (1):

\[
g(x_i, d.[l_i, h_i]) =
\begin{cases}
1, & x_i \in d.[l_i, h_i] \\
0, & x_i \notin d.[l_i, h_i]
\end{cases}
\qquad (1 \le i \le n,\ i \in \mathbb{N}) \tag{1}
\]

Where,
• 1 denotes matching and 0 denotes non-matching;
• d denotes the detector; x denotes the antigen (IP packet);
• xi is the i-th feature in this antigen (x);
• d.[li, hi] stands for the lower and upper values for the i-th feature in this detector.

The antibodies consist of immature antibodies, mature antibodies, and memory antibodies. An antibody that has not yet passed through self-tolerance is an immature antibody; one that has passed self-tolerance but has not yet been activated by an antigen is a mature antibody. A memory antibody derives from an activated mature antibody.

An any-r intervals matching rule is used to determine a match between antibody and antigen. That is, if at least r features of the antigen fall into the corresponding intervals of an antibody, then that antibody is said to match the antigen. The mechanism is given in formula (2). The variables are the same as in formula (1).

\[
f_m(x, d) =
\begin{cases}
1, & \sum_{i=1}^{n} g(x_i, d.[l_i, h_i]) \ge r \\
0, & \text{otherwise}
\end{cases}
\qquad (1 \le i \le n,\ i \in \mathbb{N}) \tag{2}
\]

For example, the pattern (s) of the antibody (d) is: If x1 ∈ [5,21] ∧ x2 ∈ [56,68] ∧ x3 ∈ [12,136] then nonself. The antigen is (20, 79, 132) and the threshold r is 2. Here x1 = 20 and x3 = 132 fall inside their intervals while x2 = 79 does not, so two features match and, by the rule above, this antibody matches this antigen.

6.2 WORK MECHANISM OF DETECTORS AND ALGORITHM

Only mature and memory detectors are responsible for detecting the IP packets. When an IP packet accesses the host, the active detector population must check it. If it is a self-antigen, it is allowed to access the host; otherwise, it is refused.

Step 1. Acquire the antigen from the IP packet by data pre-processing.
Step 2. Monitor the antigen by the self-pattern classes. If one self-pattern class matches the antigen then go to step 8, else continue to step 3.
Step 3. Use memory detectors to detect the antigen. Memory detectors with higher affinities have priority in detecting the antigen. If one memory detector matches the antigen then go to step 5, else go to step 4.
Step 4. Detect the antigen by the mature detectors. Mature detectors with higher affinities have priority in detecting the antigen. If one mature detector matches the antigen then go to step 5, else continue to step 8.
Step 5. Judge whether the antigen is a non-self antigen through co-stimulation. If co-stimulation succeeds then go to step 6, else turn to step 7.
Step 6. This antigen is an abnormal IP packet, and therefore it is refused access to the host. In addition, add 1 to the affinity of this detector. Turn to step 9.
Step 7. This self-antigen was mistakenly detected by this detector, so this detector is divided using the split-detector method.
Step 8. This antigen is a normal IP packet, and therefore it is allowed to access the host.
Step 9. Add 1 to the age of all detectors.
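Formulas (1) and (2) and the detection steps above can be sketched in Python. This is an illustrative sketch, not Zhang et al.'s code: the `Antibody` class and function names are hypothetical, self-pattern classes are assumed to be interval patterns that match only when every feature fits (the text does not say how they match), and the split-detector method of step 7 is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Antibody:
    intervals: list      # pattern s: [(l1, h1), ..., (ln, hn)]
    age: int = 0
    count: int = 0       # affinity: number of antigens matched so far


def g(x_i, interval):
    """Formula (1): 1 if the feature lies inside the interval, else 0."""
    low, high = interval
    return 1 if low <= x_i <= high else 0


def any_r_match(antigen, intervals, r):
    """Formula (2): match if at least r features fall inside their intervals."""
    return sum(g(x, iv) for x, iv in zip(antigen, intervals)) >= r


def by_affinity(detectors):
    return sorted(detectors, key=lambda d: d.count, reverse=True)


def detect(antigen, self_patterns, memory, mature, r, costimulate):
    """Steps 2-9 for one antigen; returns True if the packet is allowed."""
    allowed = True
    # Step 2 (assumption: a self-pattern matches only if every feature fits).
    is_self = any(any_r_match(antigen, p, len(antigen)) for p in self_patterns)
    if not is_self:
        # Steps 3-4: memory detectors first, then mature, highest affinity first.
        candidates = by_affinity(memory) + by_affinity(mature)
        hit = next((d for d in candidates
                    if any_r_match(antigen, d.intervals, r)), None)
        if hit is not None and costimulate(antigen):   # step 5
            hit.count += 1                             # step 6: reject the packet
            allowed = False
        # Step 7 (split-detector after a false alarm) is not modelled here.
    for d in memory + mature:                          # step 9
        d.age += 1
    return allowed


ab = Antibody([(5, 21), (56, 68), (12, 136)])
print(any_r_match((20, 79, 132), ab.intervals, 2))     # True: x1 and x3 fit
print(detect((20, 79, 132), [], [], [ab], 2, lambda a: True))  # False: rejected
```

The first call reproduces the worked example from Section 6.1; the second runs the same antigen through the flow with no self-patterns and a co-stimulation callback that always confirms, so the packet is rejected and the detector's affinity increases by one.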
Figure 4: The flow of detecting an IP packet

The mature detectors and memory detectors must comply with the lifecycle mechanism. A mature detector dies after a certain period of time if it does not encounter non-self antigens for activation and evolution. After a mature detector is activated, it becomes a memory detector, which has a longer lifespan than mature detectors. When the number of memory detectors reaches the limit, formula (4) is used to decide which memory detector will be removed.

\[
f_{ac}(d) = c_1 \times d.\mathit{count} - c_2 \times d.\mathit{age} \tag{4}
\]

where,
• c1, c2 denote coefficients (weights),
• d represents the memory detector,
• d.count is the affinity of the memory detector,
• d.age stands for the age of the memory detector.

The memory detector with the minimal value of f_ac(d) is removed from the memory detector population and becomes a mature detector.

6.3 EXPERIMENTS AND RESULTS

The experimental data comes from the KDD CUP competition data set. This data is part of the data collected for the MIT Lincoln Labs 1998 DARPA Intrusion Detection Evaluation Program and is considered benchmark data for evaluating intrusion detection systems. It contains about 4,900,000 records, each with 41 attributes.

Two experiments were conducted. In the first experiment, the system was trained with the training data set, which consists of 60638 selected records: 60032 normal IP packets and 606 abnormal IP packets. In the second experiment, the system was tested with the testing data set, which consists of 4 groups of data sets with 30,000 records per group.

The performance of an intrusion detection system may be evaluated in terms of TP rate and FP rate. The TP (True Positive) rate is calculated as the number of abnormal patterns detected by the system, divided by the total number of abnormal patterns. An FP (False Positive) occurs when the system wrongly classifies a normal pattern as abnormal. In this experiment, the FP rate is calculated as the number of false positives produced by the system, divided by the total number of self-antigens.

In both experiments, the TP and FP rates are compared between the traditional method and the new method proposed by Zhang et al. In the traditional method, called Method I, both antibody and antigen are implemented as binary strings and RCMF (R-Contiguous Matching Function) is used to determine a match between antibody and antigen.
The method presented here is called Method II. Division of Computer Engineering, School of Engineering, CUSAT Page 35
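As a rough illustration of two calculations described in this chapter, the Python sketch below evaluates Formula (4) to pick the memory detector to be demoted, and computes the TP and FP rates as defined above. The class and function names, the coefficient values c1 = c2 = 1.0 and all sample numbers are assumptions made for illustration only; they are not taken from Zhang et al.

```python
from dataclasses import dataclass


@dataclass
class MemoryDetector:
    name: str   # hypothetical label, used only to identify the detector
    count: int  # d.count: affinity of the memory detector
    age: int    # d.age: age of the memory detector


def f_ac(d, c1=1.0, c2=1.0):
    """Formula (4): c1 * d.count - c2 * d.age (c1, c2 are assumed weights)."""
    return c1 * d.count - c2 * d.age


def detector_to_demote(memory_detectors, c1=1.0, c2=1.0):
    """Return the memory detector with the minimal f_ac value."""
    return min(memory_detectors, key=lambda d: f_ac(d, c1, c2))


def tp_rate(detected_abnormal, total_abnormal):
    """Abnormal patterns detected, divided by all abnormal patterns."""
    return detected_abnormal / total_abnormal


def fp_rate(false_positives, total_self):
    """Normal patterns flagged as abnormal, divided by all self-antigens."""
    return false_positives / total_self


# Made-up detectors: f_ac is 30, -20 and 0 respectively.
detectors = [
    MemoryDetector("d1", count=40, age=10),
    MemoryDetector("d2", count=5, age=25),
    MemoryDetector("d3", count=12, age=12),
]
print(detector_to_demote(detectors).name)  # d2

# Made-up counts, not experimental results.
print(tp_rate(90, 100))  # 0.9
print(fp_rate(2, 100))   # 0.02
```

With equal weights, an old detector that has matched few antigens scores lowest and is demoted first; raising c1 relative to c2 would favour keeping detectors with high affinity regardless of age.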
  43. 43. Figure 5: Comparison of the TP and FP rates between Method I (Forrest et al.) and Method II (Zhang et al.)
Figure 5 shows that when Method I is used, the TP and FP rates do not become steady until about 800 seconds, and the first mature detector is not generated until about 125 seconds. When Method II is adopted, the TP and FP rates are already steady after about 650 seconds, and the first mature detector has already been generated after about 70 seconds. So Method II, presented by Zhang et al., reaches steady TP and FP rates faster than Method I. There are two reasons for this. First, the decimal string encoding of antigens and antibodies and any-r-interval matching are adopted, which makes matching much faster than RCMF. Second, the self-pattern class is introduced, which accelerates the handling of normal IP packets; since most IP packets are normal data, the total running speed of training is much faster.
  44. 44. CHAPTER-7 DANGER THEORY: AN ALTERNATIVE TO NEGATIVE DETECTION
This section considers an emerging AIS technique for intrusion detection that is based on an abstraction of Matzinger’s danger theory (Matzinger, 2002). Danger theory AIS is given a section to itself because it is a distinct, fast-growing alternative or addition to negative detection.
7.1 BACKGROUND IMMUNOLOGY
By the mid-1990s, immunology had made several modifications to the self-nonself theory. The original self-nonself theory did not fit experimental observations, such as the absence of an immune response to bacteria in our gut, or to air, both of which are clearly nonself. Matzinger suggested a radical, alternative approach. What if, instead of responding directly to a nonself element of some sort, the immune system responds to cells that are under stress; cells that are raising an alarm to some form of danger, such as an attack from a bacterium or virus? Matzinger characterizes this danger theory as a means of detecting some self from some non-self, thus explaining why there is no immune response to harmless nonself, but why there is usually an immune response to harmful self, such as cancerous cells. This is the basis of danger theory.
Cells can die in two ways: via apoptosis, a normal death that has been requested by the body’s internal signalling system, or via necrosis, a form of unexpected death caused by something going wrong with the cell, which often causes an inflammatory response. Matzinger suggested that the immune system is particularly activated by cell necrosis. Thus, under the danger theory, the immune response is contextualized to the location in which necrosis is occurring and is no longer a system-wide response. Despite Matzinger’s departure from more established theories, such as the work of Janeway, e.g. (Medzhitov and Janeway, 2002), both require two signals for an immune response to be initiated.
This helps to avoid false positive reactions in nature (which cause autoimmune effects), and may be of use in AIS.
  45. 45. However, the major problem with Matzinger’s theory is that the exact nature of the danger signal(s) is still unclear.
7.2 THE DEVELOPMENT OF DANGER THEORY AIS
It appears that danger theory might help intrusion detection systems by focusing attention only on those internal or external events that are likely to be harmful, generally a smaller subset of events than the nonself subset. Whether or not this danger theory has validity in any natural immune system response is of little importance here, because danger theory AIS (DT-AIS) is an abstraction of this process for purely computational purposes; it is not a model of the immune system. Several researchers are pursuing this approach (Aickelin and Cayzer, 2002; Williamson, 2002; Burgess, 1998).
It may seem pointless for DT-AIS simply to exchange the self-nonself dichotomy for a danger-nondanger dichotomy, but there are some important differences. The concepts of self and nonself are both relative to the self strings, which are not necessarily the full set of true self, may change over time, and may contain many features or attributes. However, the concepts of ‘dangerous’ and ‘non-dangerous’ are grounded in undesirable events, and a detection system based on these concepts should, in principle, only report the specific attributes or features that are causing concern. These danger signals may be positive, e.g. the presence of an event or state such as high memory or disk activity, frequency of file changes, unusual signals (e.g. SIGABRT in UNIX), and the presence of nonself (Aickelin and Cayzer, 2002); or they may be negative, such as the absence of such signals or states. In addition, the possibility of requiring two signals to be present before an immune response is initiated seems to be a way of minimizing false positives.
Secker et al., 2003, describe one possible application, email mining and spam detection. Their paper includes some pseudo code, but as yet the system has not been fully implemented, mainly because the nature of the danger signals is unclear.
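The two-signal idea can be sketched in a few lines. The Python fragment below is only an illustrative abstraction: the event fields, the thresholds and the choice of danger indicators (high memory use, a burst of file changes, SIGABRT) are assumptions loosely based on the examples given in this section, not a published DT-AIS algorithm. An alert is raised only when a detector match (signal one) coincides with at least one danger signal (signal two).

```python
from dataclasses import dataclass


@dataclass
class Event:
    detector_match: bool       # signal one: a detector recognised the event
    memory_use: float          # fraction of memory in use, 0.0 to 1.0
    file_changes_per_min: int  # observed rate of file changes
    got_sigabrt: bool          # an unusual signal was observed


# Hypothetical thresholds chosen only for this example.
MEMORY_LIMIT = 0.9
FILE_CHANGE_LIMIT = 100


def danger_signal(e):
    """Signal two: any of the assumed danger indicators is present."""
    return (e.memory_use > MEMORY_LIMIT
            or e.file_changes_per_min > FILE_CHANGE_LIMIT
            or e.got_sigabrt)


def should_alert(e):
    """Respond only when both signals are present."""
    return e.detector_match and danger_signal(e)


harmless_match = Event(True, 0.30, 2, False)
harmful_match = Event(True, 0.95, 2, False)
danger_without_match = Event(False, 0.95, 500, True)

print(should_alert(harmless_match))        # False
print(should_alert(harmful_match))         # True
print(should_alert(danger_without_match))  # False
```

Under this gating, a detector match on its own is not enough to raise an alert, which is how the two-signal requirement is expected to reduce false positives.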
  46. 46. 7.3 ASSESSMENT OF ITS PROMISE
In the natural immune system, it is not immediately clear which signals are danger signals (if any), and this already appears to be the major problem in implementing a DT-AIS too. Another drawback is that a DT-AIS will apparently have to wait for damage to occur to self at least once before any protective steps can be taken, because it requires examples of dangerous states. This is not the case with negative detection, which can give a response based solely on positive examples of self. Issues of scaling also need to be considered. DT-AIS is a more complex theory than standard negative selection (e.g. the algorithm in Secker et al., 2003) because it is based on a much expanded version of Burnet’s original work in negative selection and the proliferation of B-cells (Burnet, 1959). It is also unclear how best to involve a human in the loop (if at all). Nevertheless, the possibility of using DT-AIS to obtain key features from a set of attributes is an interesting problem, with relevance in many areas. If DT-AIS can find an automated solution to this problem, it would have great benefits within, and beyond, AIS.
  47. 47. CHAPTER-8 EVALUATION OF ARTIFICIAL IMMUNE SYSTEMS
Although AIS is a relatively young field, advancing on many fronts, some central themes have become apparent. The question is, are these AIS delivering anything useful, or are they just another addition to the increasingly long line of approaches that are biologically inspired? These approaches include established paradigms such as genetic and evolutionary computation (GEC), artificial neural networks (ANN) and various forms of artificial life, as well as less established topics such as ant colony dynamics (Dorigo, 1992; Dorigo, 1999) and cell membrane computing (Paun, 2002). There have been several surveys of AIS, and a few comparisons between AIS and other methods, e.g. (Dasgupta, 1997; Dasgupta, 1999; Perelson and Weisbuch, 1997). Unfortunately, most of this work is now somewhat out-of-date; only Perelson has reviewed three of the main themes of AIS, but did so from an immunological point of view; none have discussed danger theory, and there has been no specific attempt to assess the usefulness of AIS, with a view to its future development, which is the central focus here. It would appear that answers to the following questions are of value as an introduction to and critique of AIS, and its relationship to other paradigms:
• How have the various types of AIS developed, and what are their positive and negative features?
• What are the criteria for deciding whether methods, such as AIS, are useful?
• If ‘distinctiveness’ is one criterion of usefulness, how distinct are AIS from other paradigms, such as GEC and ANN?
• If ‘effectiveness’ is another criterion of usefulness, what can AIS do uniquely, better or faster than other methods?
• Having applied the assessment criteria, are AIS useful? What does this suggest for the future development of AIS?
  48. 48. DEFINING ‘USEFUL’ IN TERMS OF ‘DISTINCT’ AND ‘EFFECTIVE’
An algorithm may be distinctly different from other algorithms but ineffective, or it may be highly effective but be reducible to other, existing paradigms, and therefore lacking in distinctiveness. However, if a method is both distinct and effective, then it offers a truly useful means of computation. These criteria of ‘distinctiveness’ and ‘effectiveness’ will be used for assessing the usefulness of AIS. They may also prove to be relevant in testing the usefulness of other computational methods.
WHAT MAKES A RESEARCH PARADIGM DISTINCTIVE?
With a biologically inspired paradigm, such as AIS, there may be a temptation to appeal to its source of inspiration as an indication of its distinctiveness; this appears to be a mistake. Firstly, it is possible to envisage methods that have different sources of inspiration, but which result in mathematically identical methods. Secondly, a single source of inspiration, such as immunology, has given rise to several types of AIS. Thirdly, even biologically implausible, or unlikely, ideas can inspire distinct mathematical algorithms. The source of inspiration is not a reliable test of distinctiveness. Given a supposed new method, the test questions are:
• D.1 Does the new method contain unique symbols, or can the features of this method be transformed into the features of another method, without affecting the dynamics of the new method?
• D.2 Are the new method’s symbols organized in novel expressions, or can its expressions be transformed to become the same as those of some other method, without affecting its dynamics?
• D.3 Does the new method contain unique processes that are applied to its expressions, or can its processes be transformed to become identical to those of some other method, without affecting its dynamics?
If the answer is ‘no’ to all these questions then there is almost certainly nothing distinctive in the supposedly new method. The more questions that can be answered ‘yes’, the more likely that the method is distinct in some way from the existing method.
  49. 49. WHAT MAKES A RESEARCH PARADIGM EFFECTIVE?
A similar set of questions can help when assessing the effectiveness of an AIS method, although they are more obvious because assessing effectiveness is standard scientific practice:
• E.1 Does the method provide unique means of obtaining a set of results?
• E.2 Does the method provide better results than existing methods, when applied to a shared benchmark task?
• E.3 Does the method allow a set of results to be obtained more quickly than another method, on a benchmark test?
The same types of experiments should be run in both cases: there is no point, for example, in comparing the average best fitness in one test with the best fitness in another test.
Figure 6: Results of applying the ‘usefulness criteria’ to established types of AIS. Panels: Negative Detection Methods (overall distinct: yes; overall effective: yes); Clonal Selection Methods.
  50. 50. For a method to be effective, the answer must be ‘yes’ to at least one of the questions; unfortunately there is a complication with AIS methods.
From the above results, negative detection appears to be distinct, the closest analogy probably being positive-only learning in ILP. Comparison with many more techniques is required, but this will come with time as the field of AIS matures. Although it has been shown that there are application areas in which negative detection is uniquely able to work, the evidence could be more convincing. The discovery of new application areas in which negative detection can triumph will not only strengthen its position, it will undoubtedly help its theoretical development; perhaps negative databases (Esponda and Forrest, 2004) will prove to be one example.
Clonal selection has also been presented as a distinct method, albeit one that is an immune-inspired form of GEC. The type of mutation and the method of reproduction are both novel, as is the two-stage evaluation-selection operation per generation. In terms of effectiveness, CLONALG provides excellent results for optimization problems, but are they superior to those of other methods? This is still unclear. A comparison to GEC may help. GEC is widely studied despite there being deterministic methods that can optimize faster and better under some circumstances. GECs provide an excellent, general framework for optimization and, furthermore, their study is worthwhile because investigating evolutionary processes is interesting, and can be simpler than deterministic methods. These arguments also hold true for clonal selection, which is a form of GEC.
The distinctiveness of immune network methods is inherited from clonal selection, and they have also been shown to be distinct from other, similar methods, such as classifier systems. Hunt and Cooke’s now dated work is still one of the most effective, although de Castro’s AINET may prove to be a better replacement for a variety of applications, and Mohr and Timmis’ work on using network models for GPS data is notable.
  51. 51. CHAPTER-9 CONCLUSION
Perhaps the biggest difficulty faced by AIS is that it has few application types for which it is undisputedly the most effective method. Despite the many points in its favour, this single point is enough to allow it to be ignored by many. Although two important areas have been identified in which AIS is unique in its ability to provide solutions, further impressive demonstrations of effectiveness will be required if AIS is to be pushed to the forefront. In a related point, researchers in both negative (and positive) detection (González and Dasgupta, 2003; Gomez et al., 2003), and various types of immune network (de Castro and Von Zuben, 2001; Nasraoui et al., 2002b), are beginning to produce algorithms that are hybrids of AIS with other methods. This is perhaps an indication that generalist versions of current AIS methods alone are not powerful enough for some tasks, but it should not be taken to mean that tailored applications of AIS cannot produce excellent results.
It is notable that there is a strong relationship between clonal selection and immune networks; in future, clonal selection may be classified as a form of network modelling with the degree of networking set to zero (Garrett, 2003). This would leave two main areas of AIS research, negative/danger detection and immune networks: one that focuses on detecting antigen or danger and the other that focuses on destroying it. The immune system itself has much more to offer than this: the innate immune system, for example, has been all but ignored, as has the immune system’s use of parallel attack processes. As a whole, the immune system can be viewed as an adaptive, homeostatic control system (Somayaji, 2002; Bersini, 1991), and this suggests interesting possibilities, perhaps for a more unified AIS, and certainly for new applications in control.
  52. 52. In conclusion then, is AIS a useful paradigm? It has reliably provided three distinct types of method, and in most cases has produced highly effective results; however, these two aspects have not yet been reliably combined. Therefore, in terms of the criteria of usefulness, the current answer appears to be, ‘almost.’ However, it is only now that AIS is beginning to mature: the annual ICARIS conferences, which began in 2002 to provide a forum exclusively for AIS research, have already provided a good deal of stimulation to the field; negative selection and danger theory appear to have a way forward from the ‘detector impasse’, and both clonal selection and immune networks are exploring new ground with encouraging results. It seems extremely likely that AIS will become increasingly useful over the next few years.
  53. 53. REFERENCES
[1] Anil Somayaji, Steven Hofmeyr, and Stephanie Forrest, "Principles of a Computer Immune System", in 1997 New Security Paradigms Workshop, Langdale, Cumbria, UK, ACM, 1998.
[2] S. Hofmeyr and S. Forrest, "Architecture for an Artificial Immune System", Evolutionary Computation, vol. 7, 2000.
[3] S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri, "Self-nonself discrimination in a computer", in Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, 1994.
[4] Yu-Fang Zhang, Gui-Hua Sun, and Zhong-Yang Xiong, "A Novel Method of Intrusion Detection Based on Artificial Immune System", in Proceedings of the Fifth International Conference on Machine Learning and Cybernetics, Dalian, 13-16 August 2006.
[5] Simon M. Garrett, "How Do We Evaluate Artificial Immune Systems?", Evolutionary Computation, vol. 13, no. 2, MIT.
[6] D. Dasgupta, "Immunity-Based Intrusion Detection System: A General Framework", in Proceedings of the 22nd National Information System Security Conference, pp. 147-160, 1999.
[7] I. Kim and P. Bentley, "An Artificial Immune Model for Network Intrusion Detection", in Proceedings of the 7th European Congress on Intelligent Techniques and Soft Computing, 1999.
[8] F. González and D. Dasgupta, "Anomaly detection using real-valued negative selection", Genetic Programming and Evolvable Machines, 4(4):383–403, 2003.