A Study and Comparative analysis of Conditional Random Fields for Intrusion detection
 




International Journal of Research in Computer Science
eISSN 2249-8265, Volume 2 Issue 4 (2012) pp. 31-38
© White Globe Publications
www.ijorcs.org

Deepa Guleria (1), M.K. Chavan (2)
(1) PG Scholar, VPCOE Baramati. Email: deepa.guleria@gmail.com
(2) Asstt Professor, VPCOE Baramati. Email: chavan_manik@yahoo.com

Abstract: Intrusion detection systems are an important component of the defensive measures that protect computer systems and networks from abuse. Intrusion detection plays a key role in computer security and is a prime area of research. Because computer networks and hacking techniques are complex and dynamic, detecting malicious activity remains a challenging task for security experts: currently available defense systems suffer from low detection capability and a high number of false alarms. An intrusion detection system must reliably detect malicious activities in a network and must perform efficiently to cope with large amounts of network traffic. In this paper we study machine learning and data mining techniques for solving intrusion detection problems in computer networks, compare the various approaches with conditional random fields, and address the two issues of accuracy and efficiency using Conditional Random Fields and the Layered Approach.

Keywords: Intrusion Detection System, Conditional Random Fields, Network Security, Decision tree

I. INTRODUCTION

An intrusion detection system monitors the activities of a given environment and decides whether these activities are malicious (intrusive) or legitimate (normal) based on system integrity, confidentiality and the availability of information resources. Intrusion detection, as defined by the Sysadmin, Audit, Networking, and Security (SANS) Institute, is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource [1]. Detecting intrusions in networks and applications has become one of the most critical tasks in preventing their misuse by attackers. The cost of protecting these valuable resources is often negligible compared with the actual cost of a successful intrusion, which strengthens the need to develop more powerful intrusion detection systems.

There are two types of IDS, depending on their mode of deployment and the data used for analysis: Network Intrusion Detection Systems (NIDS) and Host Intrusion Detection Systems (HIDS). A NIDS monitors packets from the network; it is an independent platform that identifies intrusions by examining network traffic and multiple hosts. A HIDS analyzes the audit data of the operating system and monitors the inbound and outbound packets of the device only. It alerts the user or administrator when suspicious activity is detected [7]. Intrusion detection systems can also be classified as signature based or anomaly based, depending on the attack detection method. Signature-based systems are trained by extracting specific patterns (or signatures) from previously known attacks, while anomaly-based systems learn from normal data collected when there is no anomalous activity. The first approach is called Misuse Detection and leads to signature-based IDS, while the second is called Anomaly Detection and leads to behavior-based IDS. Signature-based systems have very high detection accuracy, but they fail when an attack is previously unseen. Behavior-based (anomaly-based) IDS, on the other hand, may be able to detect new, unseen attacks but suffer from low detection accuracy [7]. Another approach to detecting intrusions is to train a system on both the normal and the known anomalous patterns and then classify the test data. Such a system incorporates the advantages of both signature-based and anomaly-based systems and is known as a Hybrid System.

Hybrid systems can be very efficient, subject to the classification method used, and can also label unseen or new instances, because they assign one of the known classes to every test instance. This is possible because during training the system learns features from all the classes. The only concern with the hybrid method is the availability of labeled data. Further, a single system has limited attack detection coverage and cannot reliably detect a wide variety of attacks.
We introduce hybrid intrusion detection systems based on conditional random fields, which can detect a wide variety of attacks and which result in very few false alarms. To improve the efficiency of the system, we then integrate the layered framework.

II. APPROACHES TO IMPLEMENT IDS

Intrusion detection has been an active field of research since the 1980s, starting with the influential paper by Anderson [7]. Researchers have proposed various intrusion detection methods and frameworks to protect a computer system or network from attacks. Techniques such as association rules, clustering, Naïve Bayes, Support Vector Machines, Neural Networks, and others have been developed to detect intrusions. This section provides a brief literature review of these technologies and related frameworks. These methods can be broadly divided into three major categories:

A. Pattern Matching

Pattern matching is the simplest type of attack detection technique and rests on the concept of string matching. Using pattern matching, IDSs generally match text (audit records) or binary sequences against known attack signatures, looking for a specific attack signature that may be present in an audit record. The limitation of the approach is that it can recognize only known attacks, so it requires continuous updates of attack signatures to identify new attacks. Pattern matching is well suited to misuse detection. The Snort system is based on pattern matching.

B. Statistical Methods

Statistical modeling is among the earliest methods used for detecting intrusions in electronic information systems. It is assumed that an intruder's behavior is noticeably different from that of a normal user, and statistical models are used to aggregate the user's behavior and distinguish an attacker from a normal user. The techniques are applicable to other subjects, such as user groups and programs. Two statistical models have been proposed for anomaly detection: NIDES/STAT and Haystack.

C. Data Mining and Machine Learning

Data mining and machine learning methods focus on analyzing the properties of the audit patterns rather than identifying the process that generated them. These methods include approaches for mining association rules, classification and cluster analysis.

Clustering: For unsupervised intrusion detection, data clustering methods can be applied. These methods involve computing a distance between numeric features, so they cannot easily deal with symbolic attributes, which results in inaccuracy. In addition, clustering methods consider the features independently and are unable to capture the relationship between different features of a single record, which results in lower accuracy [9].

Data Mining: Data mining (DM), also called Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using association rules. Data mining approaches derive association rules and frequent episodes from available sample data, not from human experts. Using these rules, Lee et al. developed a data mining framework for intrusion detection [8]. In particular, system usage behaviors are recorded and analyzed to generate rules that can recognize misuse attacks. The drawback of such frameworks is that they tend to produce a large number of rules and thereby increase the complexity of the system.

Bayesian Classifiers: A Bayesian network is a model that encodes probabilistic relationships among variables of interest. This technique is generally used for intrusion detection in combination with statistical schemes, a procedure that yields several advantages, including the capability of encoding interdependencies between variables and of predicting events, as well as the ability to incorporate both prior knowledge and data. However, a serious disadvantage of Bayesian networks is that their results are similar to those derived from threshold-based systems, while considerably higher computational effort is required.

Decision Trees: Decision trees are among the most commonly used supervised learning algorithms in IDS because of their simplicity, high detection accuracy and fast adaptation. Decision trees used for intrusion detection select the best features for each decision node during tree construction based on some well-defined criterion [11]. One such criterion is the gain ratio, which is used in C4.5.

A decision tree is composed of three basic elements:
1. A decision node, specifying a test attribute.
2. An edge or branch, corresponding to one of the possible attribute values, that is, one of the test attribute outcomes.
3. A leaf, also called an answer node, which contains the class to which the object belongs.

Artificial Neural Networks: Neural networks are known for good performance in learning system-call sequences. Once the neural net is trained on a set of representative command sequences of a user, the net constitutes the profile of the user, and the fraction of incorrectly predicted events then measures, in some sense, the variance of the user's behavior from that profile. Neural networks can work effectively with noisy data, but they require a large amount of training data, and it is often hard to select the best possible architecture for the network [12].

Support Vector Machines: Support vector machines map real-valued input feature vectors to a higher-dimensional feature space through a nonlinear mapping and have been used for detecting intrusions. They can provide real-time attack detection capability, deal with high-dimensional data and perform multi-class classification. Like the pattern matching and statistical methods, these methods assume independence among consecutive events and hence do not consider the order of occurrence of events for attack detection [17].

Markov Models: Markov chains and hidden Markov models are sets of states interconnected through certain transition probabilities, which determine the topology and the capabilities of the model. During a first training phase, the probabilities associated with the transitions are estimated from the normal behavior of the target system. Anomalies are then detected by comparing the anomaly score (associated probability) obtained for the observed sequences with a fixed threshold. Markov chains and hidden Markov models can be used when dealing with a sequential representation of audit patterns. Hidden Markov models have been shown to be effective in modeling sequences of system calls of a privileged process, which can be used to detect anomalous traces [13]. However, modeling system calls alone may not always provide accurate classification, as various connection-level features are ignored. Further, hidden Markov models cannot model long-range dependencies between the observations.

III. CHALLENGES AND REQUIREMENT FOR INTRUSION DETECTION SYSTEM

It is important that an intrusion detection system detects attacks at an early stage in order to minimize their impact. The major challenges and requirements for building intrusion detection systems are:
i. The system must be able to detect as many attacks as possible without giving false alarms, i.e. the system must be accurate in detecting attacks.
ii. The system must be able to handle large amounts of data without affecting performance and without dropping data.
iii. A system must not only detect an attack, but also be able to identify the type of attack.
iv. A system must be resistant to attacks, since a system that can be exploited during an attack may not be able to detect attacks reliably.
v. The challenge is to build a system that is scalable and can be easily customized to the specific requirements of the environment where it is deployed.

IV. CONDITIONAL RANDOM FIELDS

A. Conditional Probability

Conditional probability is used to compute the probability of an event Y given some other event X. Formally it is defined as:

$$P(Y \mid X) = \frac{P(X \cap Y)}{P(X)}$$

where $P(X) > 0$. From this definition we can read that if the event X occurs in the same space as the event Y, and there are no other events that may affect the occurrence of Y, then the conditional probability of Y given X is the relative proportion of outcomes that satisfy Y among those that satisfy X.

B. Conditional Random Field Framework

Conditional random fields (CRFs) [15] are a probabilistic framework for labeling and segmenting sequential data, based on the conditional approach described in the previous paragraph. A CRF is a form of undirected graphical model that defines a single log-linear distribution over label sequences given a particular observation sequence. The primary advantage of CRFs over hidden Markov models (HMMs) is their conditional nature, which relaxes the independence assumptions that HMMs require in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem [14], a weakness exhibited by maximum entropy Markov models (MEMMs) [16] and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world sequence labeling tasks.

CRFs were first proposed by Lafferty and his colleagues in 2001 [15], and the idea behind the model came mainly from the MEMM (Maximum Entropy Markov Model). The critical difference between CRFs and MEMMs is that a MEMM uses per-state exponential models for the conditional probabilities of next states given the current state, while a CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence. Therefore, the weights of different features at different states can be traded off against each other. Conditional models are probabilistic systems used to model the conditional distribution over a set of random variables. Such models have been used extensively in natural language processing tasks. Conditional models offer a better framework, as they do not make any unwarranted assumptions about the observations and can be used to model rich overlapping features among the visible observations [6].
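The difference between per-state and whole-sequence normalization described above can be written out as follows. This is a sketch in the usual linear-chain notation, which is an assumption here rather than the paper's own: $y_t$ is the label at position $t$ (with $y_0$ a fixed start symbol), $f_k$ are feature functions with weights $\lambda_k$, and $T$ is the sequence length.

```latex
% MEMM: one locally normalized exponential model per position;
% each factor sums to one over the candidate next label y'.
p_{\mathrm{MEMM}}(y \mid x) = \prod_{t=1}^{T}
  \frac{\exp\left(\sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t)\right)}
       {\sum_{y'} \exp\left(\sum_{k} \lambda_k f_k(y_{t-1}, y', x, t)\right)}

% CRF: a single exponential model, normalized once over all
% complete label sequences y'_{1:T}.
p_{\mathrm{CRF}}(y \mid x) =
  \frac{\exp\left(\sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t)\right)}
       {\sum_{y'_{1:T}} \exp\left(\sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, x, t)\right)}
```

Because the MEMM normalizes at every position, probability mass leaving a state must be split among its successors regardless of the observation, which is the source of the label bias problem. The single global normalizer of the CRF is what lets feature weights at different positions be traded off against each other.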
Lafferty, McCallum and Pereira define a CRF on observations and random variables as follows:

Let $X$ be the random variable over data sequences to be labeled and $Y$ the corresponding label sequence. In addition, let $G = (V, E)$ be a graph such that $Y$ is indexed by the vertices of $G$. Then $(X, Y)$ is a CRF when, conditioned on $X$, the random variables $Y_v$ obey the Markov property with respect to the graph:

$$p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v)$$

where $w \sim v$ means that $w$ and $v$ are neighbors in $G$, i.e., a CRF is a random field globally conditioned on $X$. For simple sequence (or chain) modeling, as in our case, the joint distribution over the label sequence $Y$ given $X$ has the following form:

$$p_\theta(y \mid x) \propto \exp\left( \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) \right) \tag{1}$$

where $x$ is the data sequence, $y$ is a label sequence, and $y|_S$ is the set of components of $y$ associated with the vertices or edges in subgraph $S$. In addition, the features $f_k$ and $g_k$ are assumed to be given and fixed. The parameter estimation problem is then to find the parameters $\theta = (\lambda_1, \lambda_2, \ldots; \mu_1, \mu_2, \ldots)$ from the training data $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$ with the empirical distribution $\tilde{p}(x, y)$.

[Figure 1: Graphical Representation of a CRF. Labels y1, y2, y3, y4 above observations x1, x2, x3, x4.]

The graphical structure of a conditional random field is shown in Figure 1, where x1, x2, x3, x4 represent an observed sequence of length four and every event in the sequence is correspondingly labeled as y1, y2, y3, y4. The prime advantage of conditional random fields is that they are discriminative models, which directly model the conditional distribution $p(y \mid x)$. Generative models such as Markov chains and hidden Markov models, which model the joint distribution, have two disadvantages. First, the joint distribution is not required, since the observations are completely visible and the interest is in finding the correct class, which is given by the conditional distribution $p(y \mid x)$. Second, inferring the conditional probability $p(y \mid x)$ from the joint distribution using Bayes' rule requires the marginal distribution $p(x)$, which is difficult to estimate because the amount of training data is limited and the observation $x$ contains highly dependent features. As a result, strong independence assumptions are made to reduce complexity, which reduces accuracy.

[Figure 2: Conditional Random Fields for Network Intrusion Detection. (a) Attack event: labels attack, attack, attack, attack, attack over the features duration = 0, protocol = icmp, service = echo_i, flag = SF, src_byte = 8. (b) Normal event: labels normal, normal, normal, normal, normal over the features duration = 0, protocol = tcp, service = smtp, flag = SF, src_byte = 4854.]

In Figure 2, the observation features 'duration', 'protocol', 'service', 'flag' and 'source bytes' are used to discriminate between attack and normal events. The features take some value for every connection, and these values are then used to determine the most likely sequence of labels: <attack, attack, attack, attack, attack> or <normal, normal, normal, normal, normal>. During training, feature weights are learnt; during testing, features are evaluated for the given observation, which is then labeled accordingly. The figure shows that every input feature is connected to every label, which indicates that all the features in an observation determine the final labeling of the entire sequence. Thus, a conditional random field can model dependencies among different features in an observation. Present intrusion detection systems do not consider such relationships.

The task of intrusion detection can be compared to many problems in machine learning, natural language processing, and bioinformatics. CRFs have proven to be very successful in such tasks, as they do not make any unwarranted assumptions about the data. Hence, we explore the suitability of CRFs for building efficient and accurate intrusion detection systems.
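To make equation (1) concrete, the following minimal Python sketch scores label sequences for a short chain and normalizes over all candidate sequences by brute force. It is an illustration under stated assumptions, not the authors' implementation: the weights are hand-picked rather than learnt, only the 'protocol' and 'service' values shown in Figure 2 are used as features, and exhaustive enumeration is only workable for toy sequence lengths.

```python
import itertools
import math

LABELS = ("normal", "attack")

# Hypothetical, hand-set weights (assumptions for illustration, not learnt
# from data and not taken from the paper).
# Vertex features g_k with weights mu_k: (feature name, value, label) -> weight
VERTEX_WEIGHTS = {
    ("protocol", "icmp", "attack"): 1.5,
    ("service", "echo_i", "attack"): 1.0,
    ("protocol", "tcp", "normal"): 1.0,
    ("service", "smtp", "normal"): 1.2,
}
# Edge features f_k with weights lambda_k: (previous label, label) -> weight
EDGE_WEIGHTS = {
    ("normal", "normal"): 0.5,
    ("attack", "attack"): 0.5,
}


def score(labels, observations):
    """Sum of weighted edge and vertex features: the exponent in equation (1)."""
    total = 0.0
    for t, (label, obs) in enumerate(zip(labels, observations)):
        for name, value in obs.items():
            total += VERTEX_WEIGHTS.get((name, value, label), 0.0)
        if t > 0:
            total += EDGE_WEIGHTS.get((labels[t - 1], label), 0.0)
    return total


def conditional_distribution(observations):
    """p(y | x) for every label sequence, normalized by enumerating all of them."""
    sequences = list(itertools.product(LABELS, repeat=len(observations)))
    unnormalized = [math.exp(score(seq, observations)) for seq in sequences]
    z = sum(unnormalized)  # partition function Z(x)
    return {seq: u / z for seq, u in zip(sequences, unnormalized)}


# Three consecutive events carrying the attack-side values from Figure 2(a).
observations = [{"protocol": "icmp", "service": "echo_i"}] * 3
dist = conditional_distribution(observations)
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

The proportionality sign in equation (1) hides the normalizer, which the sketch computes explicitly as `z`. For realistic sequence lengths that sum is computed with dynamic programming rather than by enumeration.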
C. Inference in CRF

For general graphs, the problem of exact inference in CRFs is intractable. However, there exist special cases for which exact inference is feasible:

• If the graph is a chain or a tree, message passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forward-backward and Viterbi algorithms for HMMs.
• If the CRF contains only pair-wise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions.

D. Detecting network intrusions using layered Approach

The low detection rates caused by the imbalanced network intrusion dataset have motivated researchers to propose different approaches. Current research proposes a staged or layered approach to detect network intrusions efficiently. The recent work of Gupta, Nath and Kotagiri [6] treated the attack categories as layers and selected different features for each layer. The dataset was therefore divided into five attack categories for training and testing each layer. The test data passed through the cascaded layers to determine the category to which a record belonged.

[Figure 3: Integrating the Layered Framework. All features enter a cascade of four layers (Probe, DoS, R2L, U2R), each with its own feature selection. At each layer, a connection judged normal passes on to the next layer, and is allowed after the last one; otherwise it is blocked.]

Conditional Random Fields (CRFs) were used in the layered approach as proposed in [6]. A three-layer system is described to ensure complete security, viz. availability, confidentiality and integrity, with each layer corresponding to one aspect of security. In the system, every layer is trained separately with the normal instances and with the attack instances belonging to a single attack class, and the features involved differ from layer to layer. The paper explains which features should or should not be used, but it does not present the complete feature list for each layer. The staged and layered approaches above used classifiers of the same type or of different types for detecting network intrusions. They handle the attacks separately so that the attack categories affect each other as little as possible in classification or detection tasks. Since every layer in the Layered framework is independent, the feature sets of the four layers are not disjoint. The final goal is to improve both the attack detection accuracy and the efficiency of the system. Hence, integrating CRFs with the Layered Approach can yield a single system that is both efficient and accurate.

V. EXPERIMENTAL METHODOLOGY

The Data Set

The data set used for the entire course of research is the DARPA KDD99 benchmark data set [4], also known as the "DARPA Intrusion Detection Evaluation data set", which not only includes a large quantity of network traffic but also covers a wide variety of attacks. Its creators set up an environment to collect TCP/IP dumps from a host located on a simulated military network. Each TCP/IP connection is described by 41 discrete and continuous features and labeled as either normal or as an attack. Attacks fall into the following four main classes:

A. Denial of service (DOS)

In this type of attack an attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine. Examples are Apache2, Back, Land, Smurf, Teardrop.
    • 36 Deepa Guleria, M.K.ChavanB. Remote to user (R2L) difficult to choose a particular method to implement an intrusion detection system over the other. This paper In this type of attack an attacker who does not have has drawn the conclusions on the basis ofan account on a remote machine sends packets to that implementations performed using various techniques.machine over a network and exploits some New techniques keep emerging which will remove thevulnerability to gain local access as a user of that drawbacks of the previous methods of implementation.machine. Examples are Dictionary, Ftp_write, Guest, In this paper, a new efficient and robust hybridImap, Named. intrusion detection systems using conditional randomC. User to root (U2R) field was discussed. The CRFs are very effective in improving the attack detection rate and decreasing the In this type of attacks an attacker starts out with FAR. Feature selection and implementing the Layeredaccess to a normal user account on the system and is framework significantly reduce the time required toable to exploit system vulnerabilities to gain root train and test the model. The sequence labelingaccess to the system. Examples are Eject, Loadmo methods such as the CRFs can be very effective indule, Ps, Xterm, Perl. detecting attacks and decreasing the false alarm rate. Compared approach with some well-known methodsD. Probing and found that most of the present methods for In this type of attacks an attacker scans a network intrusion detection fail to reliably detect R2L and U2Rof computers to gather information or find known attacks, while integrated system can effectively andvulnerabilities. Examples are Ipsweep, Mscan, Satan, efficiently detect such attacks Finally, system has theNmap. advantage that the number of layers can be increased VI. 
CONCLUSION or decreased depending upon the environment in which the system is deployed, giving flexibility to the Thus we conclude that there are various approaches network administrators. The areas for future researchand techniques to implement an intrusion detection include the use of Layered CRF method for extractingsystem based on its type and mode of deployment. features that can aid in the development of signaturesEach of the approaches to implement an intrusion for signature-based systems. This can further bedetection system has its own advantages and extended to implement pipelining of layers indisadvantages. This is apparent from the discussion of multicore processors, which is likely to result in verycomparison among the various methods. Thus it is high performance.Techniques Method Parameters Advantages Disadvantages A support vector The effectiveness of SVM 1. Able to model complex 1. High algorithmic machine constructs a lies in the selection of nonlinear decision complexity and hyper plane or set of kernel and soft margin boundaries. extensive memory hyper planes in a high parameters. For kernels, 2. Highly accurate. requirements of the or infinite dimensional different pairs of (C, γ) 3. Provide real-time required quadratic Support space, which can be values are tried and the one detection capability programming in Vector used for classification, with the best cross- 4. Deal with large large-scale tasks. Machine regression or other validation accuracy is dimensionality of 2. The choice of the tasks. picked. Trying data. kernel is difficult exponentially growing 5. Can be used for binary-class 3. The speed is slow sequences of C is a as well as multiclass both in training and practical method to classification. testing. identify good parameters. The cluster with the Convert d based on the Can work in near linear time. 1. Observation must be shortest distance is statistical information of numeric. selected, and if that the training set from which 2. 
Consider the features distance is less than the clusters were created. independently and are some constant W 1. Let d1 be the instance unable to capture the (cluster width) then after conversion. relationship between the instance is 2. Find a cluster which is different features of aClustering assigned to that closest to d1 under the single record which cluster. metric M (i.e. a cluster in results in lower the cluster set, such that accuracy. for all C1 in S, dist (C, d1) <= dist (C1, d1). 3. Classify d1 according to the label of C(either Normal or anomalous. www.ijorcs.org
    • Artificial Neural Network
      Description: An ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.
      Method: An ANN uses a cost function C, an important concept in learning: it measures how far a particular solution is from an optimal solution to the problem to be solved.
      Advantages:
        1. Able to implicitly detect complex nonlinear relationships between dependent and independent variables.
        2. High tolerance to noisy data.
        3. Availability of multiple training algorithms.
      Disadvantages:
        1. Greater computational burden.
        2. Requires long training time.
        3. Hard to select the best possible architecture for a neural network.
        4. Requires a large amount of data for training.
    • Bayesian Method
      Description: Based on Bayes' rule, using the joint probabilities of sample observations and classes, the algorithm attempts to estimate the conditional probabilities of classes given an observation.
      Method: In Bayes, all model parameters (i.e., class priors and feature probability distributions) can be approximated with relative frequencies from the training set.
      Advantages:
        1. Exhibits high accuracy and speed when applied to large databases.
        2. Capable of encoding interdependencies between variables and of predicting events.
        3. Able to incorporate both prior knowledge and data.
      Disadvantages:
        1. Assuming strict independence between the features in observations results in lower attack detection accuracy.
        2. Lack of available probability data.
        3. A fully connected Bayesian network is complex and difficult to train.
        4. Higher computational effort is required.
    • Decision Tree
      Description: A decision tree builds a binary classification tree. Each node corresponds to a binary predicate on one attribute; one branch corresponds to the positive instances of the predicate and the other to the negative instances.
      Method: Decision tree induction uses parameters such as a set of candidate attributes and an attribute selection method.
      Advantages:
        1. Construction does not require any domain knowledge.
        2. Can handle high-dimensional data.
        3. Representation is easy to understand.
        4. Able to process both numerical and categorical data.
        5. High speed of operation and high attack detection accuracy.
      Disadvantages:
        1. Output attribute must be categorical.
        2. Limited to one output attribute.
        3. Decision tree algorithms are unstable.
        4. Trees created from numeric datasets can be complex.
    • Hidden Markov Models
      Description: Markov chains and hidden Markov models can be used when dealing with a sequential representation of audit patterns.
      Method: During a first training phase, the probabilities associated with the transitions are estimated from the normal behavior of the target system. The detection of anomalies is then carried out by comparing the anomaly score (associated probability) obtained for the observed sequences with a fixed threshold.
      Advantages:
        1. Modeling the ordering property of events results in higher detection accuracy.
        2. Effective in modeling sequences of system calls of a privileged process.
      Disadvantages:
        1. May not always provide accurate classification, as various connection-level features are ignored.
        2. HMMs become very complex for long-range dependencies in observations.
        3. Results in inaccuracy, as the correlation among features is lost.
    • Layered Conditional Random Fields
      Description: Conditional Random Fields are discriminative, undirected graphical models which are used for sequence tagging. They do not make any unwarranted assumptions about the data.
      Method:
        For training: the forward-backward algorithm is used, which has a complexity of O(K^2 T), where K is the number of states and T is the length of the sequence.
        For testing: the Viterbi algorithm is used, which has the same complexity.
      Advantages:
        1. CRFs do not assume observation features to be independent.
        2. Not prohibitively expensive in testing.
        3. CRF training is feasible in many real-world settings.
        4. The integrated system (CRF and Layered Approach) achieves significant improvement both in the time required to train and test the system and in the attack detection accuracy (F-value).
        5. CRFs are robust to noise in training data.
        6. CRFs avoid the label bias problem.
        7. CRFs avoid a fundamental limitation of maximum entropy Markov models (MEMMs).
      Disadvantages:
        1. Computational expense of training.
        2. A complete list of features for each level is not available.
    • VII. REFERENCES
      [1] SANS Institute - Intrusion Detection FAQ, http://www.sans.org/resources/idfaq/, 2010.
      [2] Autonomous Agents for Intrusion Detection, http://www.cerias.purdue.edu/research/aafid/, 2010.
      [3] CRF++: Yet Another CRF Toolkit, http://crfpp.sourceforge.net/, 2010.
      [4] KDD Cup 1999 Intrusion Detection Data, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, 2010.
      [5] Overview of Attack Trends, http://www.cert.org/archive/pdf/attack_trends.pdf, 2002.
      [6] K.K. Gupta, B. Nath, and R. Kotagiri, “Layered Approach Using Conditional Random Fields for Intrusion Detection,” IEEE Trans. Dependable and Secure Computing, vol. 7, no. 1, pp. 35-49, 2010.
      [7] J.P. Anderson, Computer Security Threat Monitoring and Surveillance, http://csrc.nist.gov/publications/history/ande80.pdf, 2010.
      [8] W. Lee and S. Stolfo, “Data Mining Approaches for Intrusion Detection,” Proc. Seventh USENIX Security Symp. (Security ’98), pp. 79-94, 1998.
      [9] H. Shah, J. Undercoffer, and A. Joshi, “Fuzzy Clustering for Intrusion Detection,” Proc. 12th IEEE Int’l Conf. Fuzzy Systems (FUZZ-IEEE ’03), vol. 2, pp. 1274-1278, 2003.
      [10] C. Kruegel, D. Mutz, W. Robertson, and F. Valeur, “Bayesian Event Classification for Intrusion Detection,” Proc. 19th Ann. Computer Security Applications Conf. (ACSAC ’03), pp. 14-23, 2003.
      [11] N.B. Amor, S. Benferhat, and Z. Elouedi, “Naive Bayes vs. Decision Trees in Intrusion Detection Systems,” Proc. ACM Symp. Applied Computing (SAC ’04), pp. 420-424, 2004.
      [12] H. Debar, M. Becke, and D. Siboni, “A Neural Network Component for an Intrusion Detection System,” Proc. IEEE Symp. Research in Security and Privacy (RSP ’92), pp. 240-250, 1992.
      [13] Y. Du, H. Wang, and Y. Pang, “A Hidden Markov Models-Based Anomaly Intrusion Detection Method,” Proc. Fifth World Congress on Intelligent Control and Automation (WCICA ’04), vol. 5, pp. 4348-4351, 2004.
      [14] A. McCallum, “Efficiently Inducing Features of Conditional Random Fields,” Proc. 19th Ann. Conf. Uncertainty in Artificial Intelligence (UAI ’03), pp. 403-410, 2003.
      [15] J. Lafferty, A. McCallum, and F. Pereira, “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” Proc. 18th Int’l Conf. Machine Learning (ICML ’01), pp. 282-289, 2001.
      [16] A. McCallum, D. Freitag, and F. Pereira, “Maximum Entropy Markov Models for Information Extraction and Segmentation,” Proc. 17th Int’l Conf. Machine Learning (ICML ’00), pp. 591-598, 2000.
      [17] D.S. Kim and J.S. Park, “Network-Based Intrusion Detection with Support Vector Machines,” Proc. Information Networking, Networking Technologies for Enhanced Internet Services Int’l Conf. (ICOIN ’03), pp. 747-756, 2003.
      [18] C. Sutton and A. McCallum, “An Introduction to Conditional Random Fields for Relational Learning,” Introduction to Statistical Relational Learning, 2006.
      www.ijorcs.org
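The comparison table states that CRF testing uses the Viterbi algorithm with complexity O(K^2 T), where K is the number of states and T is the sequence length. The following is a minimal sketch of where that cost comes from, not the implementation used in the paper or in CRF++. The function name `viterbi` and the score matrices `emit` and `trans` are hypothetical, and scores are assumed to be additive (log-space) potentials for a linear chain.

```python
from typing import List, Sequence


def viterbi(emit: Sequence[Sequence[float]],
            trans: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for a linear chain.

    emit[t][k]  : score of label k at position t (T x K), hypothetical input
    trans[j][k] : score of moving from label j to label k (K x K)
    Scores are additive (log-space). Runtime is O(K^2 * T).
    """
    T = len(emit)
    if T == 0:
        return []
    K = len(emit[0])
    best = list(emit[0])          # best[k]: best score of a path ending in k
    back: List[List[int]] = []    # back[t-1][k]: best predecessor of k at t

    for t in range(1, T):                     # T - 1 positions
        new_best = []
        pointers = []
        for k in range(K):                    # K current labels
            # K candidate predecessors, hence K^2 work per position
            j_star = max(range(K), key=lambda j: best[j] + trans[j][k])
            new_best.append(best[j_star] + trans[j_star][k] + emit[t][k])
            pointers.append(j_star)
        best = new_best
        back.append(pointers)

    # Trace back from the best final label.
    last = max(range(K), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With two labels (say 0 for normal and 1 for attack), `emit` would hold one row of per-label scores per connection record and `trans` a 2 x 2 matrix of label-to-label scores. The two nested loops over labels inside the loop over positions give the K^2 T term; the forward-backward pass used in training has the same loop structure with sums in place of maxima.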