1. INTRODUCTION:.doc.doc


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

1. INTRODUCTION:.doc.doc

  1. 1. Survey Reports on Four Selected Research Papers on Data Mining Based Intrusion Detection System Authors: Zillur Rahman, S S Ahmedur Rahman, Lawangeen Khan Abstract--- Security is a big issue now a days alarm when a possible intrusion occurs in the and a lot of effort and finance is being invested system. into this sector. Improving Network Intrusion Detection is one of the most prominent field in Many IDS can be described with three this area. Applying data mining into network fundamental functional components – Information intrusion detection is not a very old concept Source, Analysis, and Response. Different sources of information and events based on information and now it has been proved that data mining can automate the network intrusion detection are gathered to decide whether intrusion has taken field with a greater efficiency. This paper tries place. This information is gathered at various to give a details view on four papers in this levels like system, host, application, etc. Based on area. Three of them are very current papers analysis of this data, they can detect the intrusion based on two common practices – Misuse from 2005 and one is from 2001. The paper from 2001 discusses about ADAM, which was detection and Anomaly detection. one of the leading data mining based intrusion detection system. One of other three papers The main motivation behind using intrusion discusses about various types of data mining detection in data mining is automation. Pattern of the normal behavior and pattern of the intrusion based intrusion detection system models and tries to draw the advantages of data mining can be computed using data mining. To apply data based network intrusion detection systems over mining techniques in intrusion detection, first, the misuse detection systems. The other two papers collected monitoring data needs to be discusses about two different models: one preprocessed and converted to the format suitable for mining processing. Next, the reformatted data discusses packet-based vs. session based modeling for intrusion detection and other will be used to develop a clustering or paper presented a model named SCAN, which classification model. Data mining techniques can can work with good accuracy even with be applied to gain insightful knowledge of incomplete dataset. This paper is based on intrusion prevention mechanisms. They can help detect new vulnerabilities and intrusions, discover these four papers and we hope that this paper can give the reader a good insight view into this previous unknown patterns of attacker behaviors, promising area. and provide decision support for intrusion management. They also identify several research issues such as performance, delay and many 1. INTRODUCTION: others that are involved in incorporating Data Mining in Intrusion Detection Systems. Intrusion Detection System (IDS) is an important detection used as a countermeasure to One of the most pervasive issues in the computing preserve data integrity and system availability world today is the problem of security – how to from attacks. IDS is a combination of software protect their networks and systems from and hardware that attempts to perform intrusion intrusions, disruptions, and other malicious detection. Intrusion detection is a process of activities from unwanted attackers. However in gathering intrusion related knowledge occurring the paper [1] they report the findings of their in the process of monitoring the events and research in the area of anomaly-based intrusion analyzing them for sign or intrusion. It raises detection systems using data-mining techniques to create a decision tree model of their network
  2. 2. using the 1999 DARPA Intrusion Detection Clustering: Evaluation data set. Clustering is the process of labeling data and assigning it into groups. Clustering algorithms can In the paper [3] authors have described some group new data instances into similar groups. existing ID systems in the first part and in the These groups can be used to increase the second part they have shown how data mining can performance of existing classifiers. High quality be useful in this field. To do so, they have clusters can also assist human expert with introduced ADAM (Audit Data Analysis and labeling. Clustering techniques can be categorized Mining). At the first section they have identified into the following classes: two kinds of ID systems currently in use. One 1) Pair-wise clustering. kind is signature-based IDS (example includes P- 2) Central clustering. BEST, USTAT and NSTAT). Another kind of Pair-wise clustering (similarity based clustering) IDS is statistics and data mining-based IDS unifies similar data instances based on a data pair- (example includes IDES, NIDES and EMERALD, wise distance measure. On the other hand, central Haystack and JAM). As the main part of this clustering, also called centroid-based or model- paper is the ADAM, so I‘m not describing the based clustering, models each cluster by its other IDS in this summary part but will be ―centroid‖. In terms of runtime complexity, explaining their design and experiences with centroid-based clustering algorithms are more ADAM. efficient than similarity-based clustering algorithms. Clustering discovers complex Currently most of the systems deployed assume intrusions occurred over extended periods of time the availability of complete and clean data for the and different spaces, correlating independent purpose of intrusion detection. In this paper [4] network events. It is used to detect hybrids of the authors have showed that this assumption is attack in the cluster. Clustering is an unsupervised not valid and they presented SCAN (Stochastic machine learning mechanism for finding patterns Clustering Algorithm for Network Anomaly in unlabeled data with many dimensions. Most of detection), that has the ability to detect intrusions the clustering techniques discussed above are the with high accuracy even when audit data is not basic steps involved in identifying intrusion. complete. They have used Expectation- These steps are as follows: Maximization Algorithm to cluster the incoming Find the largest cluster, i.e., the one with audit data and compute the missing values of audit the most number of instances, and label it data. They have validated their test using the 1999 normal. DARPA/ Lincoln Laboratory intrusion detection Sort the remaining clusters in an evaluation dataset. ascending order of their distances to the largest cluster. 2. EXPLOITING EFFICIENT DATA Select the first K1 clusters so that the MINING TECHNIQUES TO ENHANCE number of data instances in these clusters INTRUSION DETECTION SYSTEMS sum up to ¼ ´N, and label them as normal, where ´ is the percentage of Data Mining and Intrusion Detection: normal instances. Data Mining is assisting various applications for Label all the other clusters as attacks. required data analysis. Recently, data mining is becoming an important component in intrusion Classification: detection system. Different data mining Classification is similar to clustering in that it also approaches like classification, clustering, partitions customer records into distinct segments association rule, and outlier detection are called classes. But unlike clustering, classification frequently used to analyze network data to gain analysis requires that the end-user/analyst know intrusion related knowledge. ahead of time how classes are defined. It is necessary that each record in the dataset used to build the classifier already have a value for the
  3. 3. attribute used to define classes. As each record Association Rule: has a value for the attribute used to define the The Association rule is specifically designed for classes, and because the end-user decides on the use in data analyses. The association rule attribute to use, classification is much less considers each attribute/value pair as an item. An exploratory than clustering. The objective of a item set is a combination of items in a single classifier is not to explore the data to discover network request. The algorithm scans through the interesting segments, but to decide how new dataset trying to find item sets that tend to appear records should be classified. Classifications in many network data. The objective behind using algorithms can be classified into three types: association rule based data mining is to derive Extensions to linear discrimination (e.g., multi-feature (attribute) correlations from a multilayer perceptron, logistic database table. A typical and widely-used discrimination). example of association rule mining is Market Decision tree. Basket Analysis. Association rules provide Rule-based methods. information of this type in the form of "if-then" As compared to the clustering technique, statements. In association analysis the antecedent classification technique is less popular in the and consequent are sets of items (called Item sets) domain of intrusion detection. The main reason that are disjoint. The first number is called the for this phenomenon is the large amount of data support for the rule. The support is simply the needed to be collected to apply classification. To number of transactions that include all items in the build the traces and form the normal and abnormal antecedent and consequent parts of the rule. The groups, significant amount of data need to be other number is known as the confidence of the analyzed to ensure its proximity. Using the rule. Many association rule algorithms have been collected data as empirical models, false alarm developed in the last decades, which can be rate in such case is significantly lower when classified into two categories: compared to clustering. Classification approach candidate-generation-and-test approach can be useful for both misuse detection and such as Apriori. anomaly detection, but it is more commonly used pattern-growth approach. for misuse detection. The challenging issues of association rule algorithms are multiple scans of Outlier Detection: transaction databases and a large number Outlier detection has many applications, such as of candidates. Apriori was the first data cleaning, fraud detection and network scalable algorithm designed for intrusion. The existence of outliers indicates that association-rule mining algorithm. individuals or groups that have very different Basic steps for incorporating association rule for behavior from most of the individuals of the intrusion detection as follows. dataset. Many times, outliers are removed to First network data need to be formatted improve accuracy of the estimators. into a database table where each row is an Most anomaly detection algorithms require a set audit record and each column is a field of of purely normal data to train the model. the audit records. Commonly used outlier techniques in intrusion There is evidence that intrusions and user detection are Mahalanobis distance, detection of activities shows frequent correlations outliers using Partitioning around medias (PAM), among network data. and Bay‘s algorithm for distance-based outliers. Also rules based on network data can Outliers detection is very useful in anomaly based continuously merge the rules from a new run to intrusion detection. Outlier detection approaches the aggregate rule set of all previous runs. can useful for detecting any unknown attacks. This is the primary reason that makes outlier Data Mining Based IDS: detection a popular approach for intrusion Besides expert systems, state transition analysis, detection systems. and statistical analysis, data mining is becoming one of the popular techniques for detecting
  4. 4. intrusion. They explore several such available IDS are capable for detecting both known and which use data mining technique for intrusion unknown intrusions. detection. IDS can be classified on the basis of their strategy of detection. There are two IIDS (Intelligent Intrusion Detection System categories under this classification: Misuse Architecture): detection and Anomaly detection. It is distributed and network-based modular architecture to monitor activities across the whole Misuse Detection Based IDS: network. It is based on both anomaly and misuse Misuse detection searches for known patterns of detection. In IIDS multiple sensors, both anomaly attack. One disadvantage of this strategy is that it and misuse detection sensors serving as experts, can only detect intrusions which are based on monitor activities on individual workstations, known patterns. Example misuse detection activities of users, and network traffic by systems that use data mining include JAM (Java detecting intrusion from and within network Agents for Meta learning), MADAM ID (Mining traffic and at individual client levels. Currently, Audit Data for Automated Models for Intrusion the IIDS architecture runs in a high speed cluster Detection), and Automated Discovery of Concise environment. Predictive Rules for Intrusion Detection. JAM (developed at Columbia University) uses data RIDS-100(Rising Intrusion Detection System): mining techniques to discover patterns of RIDS make the use of both intrusion detection intrusions. It then applies a meta-learning technique, misuse and anomaly detection. classifier to learn the signature of attacks. The Distance based outlier detection algorithm is used association rules algorithm determines for detection deviational behavior among relationships between fields in the audit trail collected network data. For misuse detection, it records, and the frequent episodes algorithm has very vast set of collected data pattern which models sequential patterns of audit events. can be matched with scanned network data for Features are then extracted from both algorithms misuse detection. This large amount of data and used to compute models of intrusion behavior. pattern is scanned using data mining classification The classifiers build the signature of attacks. So Decision Tree algorithm. thus, data mining in JAM builds misuse detection model. 3. PACKET VS SESSION BASED Anomaly Detection Based IDS: MODELING FOR INTRUSION DETECTION Misuse detection can not detect the attack which SYSTEM signatures have not been defined. Anomaly detection addresses this shortcoming. In anomaly In this survey they report the findings of their detection, the system defines the expected research in the area of anomaly-based intrusion network behavior in advance. Any significant detection systems using data-mining techniques to deviations from the problem are then reported as create a decision tree model of their network using possible attacks. Such deviations are not the 1999 DARPA Intrusion Detection Evaluation necessarily actual attacks. The Data mining data set. technique are used to extract implicit, unknown, and useful information from data. Applications of Misuse Detectors: data mining to anomaly detection include ADAM Misuse detectors are similar to modern anti-virus (Audit Data Analysis and Mining), IDDM systems where an attack‘s signature is stored in (Intrusion Detection using Data Mining), and the IDS for later use if needed. EBays. Anomaly Detector: IDS Using both Misuse and Anomaly Detection: Anomaly detectors determine what is ―normal‖ in Following are the IDS‘s that use both misuse and a particular system or network and if an abnormal anomaly intrusion detection techniques. Thus they or anomalous event occurs, then it is flagged for further inspection.
  5. 5. The technique used is classification-tree individual rows represent network packets while techniques to accurately predict probable attack the columns represent the associated variables sessions as well as probable attack packets. found in that particular network packet Overall, the goal is to model network traffic into network sessions and packets and identify those Data Preparation: instances that have a high-probability of being an They studied the data sets‘ patterns and modeled attack and can be labeled as a ―suspect session‖ or the traffic patterns around a generated target ―suspect packet.‖ Then, they compare and contrast variable, ―TGT‖. This variable is a Boolean value the two techniques at the end to see which (true/false) where a true value is returned for TGT methodology is more accurate and suitable to their if the packet is a known attack packet as needs. The ultimate goal would be to create a designated by the studies done in the Lincoln layered system of network filters at the LAN Labs‘ tests. They used this variable as a predictor gateway so all incoming and outgoing traffic must target variable for setting the stage when they pass through these signature- and anomaly-based applied a new, separate data set to the model and filters before being allowed to successfully pass scored the new data set against this model. through the gateway. Eventually, they made two separate data sets – Problem with IDS: one for the packet-based modeling and the other (1). False Negative. for the session based modeling – and used the (2). False Positive. models in separate decision trees to see the results attained. For the session-based data set, they False Negative: molded it to become ―shorter‖ and ―wider‖ by False negatives are the situation when intrusive adding more variables (columns) and reducing the activities that are not anomalous result in a number of observations (rows) through the use of flagged event. Structure Query Language (SQL) constructs to mesh the associated packets into sessions, thereby False Positive: reducing the size and complexity of the data while False positives occur when anomalous activities maintaining the essential components of the data that are not intrusive are flagged as intrusive. set. Since the victim computer was involved in an attack session, its sending ―TGT‖ value was reset The author contented that it is better to focus on to ―1‖ to describe that it‘s receiving an attack anomaly detection techniques in the IDS field to from an outside source. model normal behavior in order to detect these novel attacks. The authors described two methods Classification Tree Modeling: for learning anomaly detectors: rule learning They used a predictive, classification tree method (LERAD) and clustering (CLAD). Both methods for their model. The advantage with using this present interesting advantages and possibilities for method over traditional pattern-recognition future research. methods is that the classification tree is an intuitive tool and relatively easy to interpret. Dynamic Binding: They started their research towards developing a Data Modeling and Assessment: robust network modeler through the use of data- They strived to accomplish their data-mining mining techniques using the MIT Lincoln Lab goals through the use of supervised learning data sets from their work in 1999. The MIT techniques on the binary target variable, ―TGT‖. researchers created a synthetic network They also used the concept of over-sampling, environment test bed to create their data sets. which allowed us to highlight the infrequent and These data sets – which contained both attack- unusual attacks that occurred during the 1999 filled and attack-free data – allowed us to look at IDEVAL. From the initial data set they over network traffic stored in tcpdump format where sampled the labeled attack sessions and created a the attack types, duration, and times were known new data set. and published. When looking at the data sets, the
  6. 6. From these revised data sets they further deleted a few extraneous variables – date/time, source IP Now let‘s give an example, suppose in library address, and destination IP address variables. All many customers who buy book also buy pencil other header variables were kept, including the leads to an association rule bookpencil. There source and destination port numbers, in order to are also two parameters involved with this: allow the system to model the important facets of support and confidence. The association rule the network better. They then partitioned both bookpencil has 30% support means that 30% data sets using stratified random sampling and customer buy at least one item from book and allocated 67% of the observations (network pencil. The rule has 10% confidence means that sessions) to the training data set and 33% to the 10% customers buy both of them. The most validation data set. The TGT variable was then difficult part of the association rules discovering specified to form subsets of the original data to algorithm is to find the item sets X U Y (in this improve the classification precision of their example book U pencil), that have strong support. model. They created a decision tree using the Chi- Square splitting criteria and modeled the data There are total two phases in this experiment from the IDEVAL. Their decision tree model model. In the first phase they trained the classifier. determined that the number of leaves required to This phase takes place only once offline before maximize the ―profit‖ of the model was around using the system. In the second phase they use the four or five leaves. Their model also shows that trained classifier to detect intrusions. after five leaves of depth, the tree maximizes its profit. Phase 1: 4. ADAM: DETECTING INTRUSIONS BY DATA MINING: ADAM uses a combination of association rules and classification to detect any attack in a TCPdump audit trails. First, ADAM collects normal, known frequent datasets by mining into this model. Secondly, it runs an on-line algorithm to find last frequent connections and compare them with the known mined data and discards those which seem to be normal. With the suspicious ones it then uses a classifier which is In the first step, a database of attack-free normal previously trained to classify the suspicious frequent itemsets, those have a minimum support, connections as known type of attack, unknown are created. This database serves as profile, which type of attack or a false alarm. is compared later against frequent datasets found. The profile database is populated with frequent As this ADAM model uses association rule in itemsets with a specific format for the attack-free audit data to improve classification task, so it will portion of the data. The algorithm used in this step be better if I describe the association rule before can be a conventional association data mining entering into the main design. algorithm although they used a customized algorithm for better speed. So, in the first step Suppose there are two sets of attributes X and Y. they are creating a profile of normal behavior. Association rule uses some notation and let me describe those notations. In the second step they again use the training data, XY: means that Y happens if X happens. the normal profile and an on-line algorithm for X∩Y = Ø: means that X and Y don‘t contain the association rule, whose output consists of frequent same thing itemsets that may be attacks. These suspicious ||Y||: means that Y contains at least one item itemsets along with a set of features extracted
  7. 7. from the data by a feature selection module are is based on an improved version of the used as training data for the classifier which is Expectation–Maximization (EM) algorithm. The based on decision tree. Now let‘s look how the EM algorithm‘s speed of convergence was dynamic on-line association rule algorithm works. accelerated by using a combination of Bloom This algorithm is driven by a sliding window of filters, data summaries and associated arrays. The tunable size. The algorithm outputs the itemsets clustering approach that they have proposed can those have received strong support with the be used to process incomplete and unlabelled data. profile within the specific window size time. They The salient features of SCAN are: compare any datasets with profile database, if there is match then it is normal. Otherwise they Improvements in speed of convergence: We leave a counter that will track the support of the use a combination of data summaries, itemset. If the support exceeds a certain threshold Bloom filters and arrays to accelerate the then the itemset is flagged as suspicious. Then the rate of convergence of the EM algorithm. classifier labels that suspicious dataset as known Ability to detect anomalies in the absence of attack, false alarm or unknown attack. complete audit data: We use SCAN to detect anomalies in network traffic in the Phase 2: absence of complete audit data. The improved EM clustering algorithm—which forms the core of SCAN—is used to compute the missing values in the audit dataset. The advantage of using the EM algorithm is that unlike other imputation techniques, EM computes the maximum likelihood estimates in parametric models based on prior information. System Model and Design: SCAN has three major components: Online Sampling and Flow Aggregation Clustering and Data Reduction Anomaly Detection In this phase our classifier is already trained and Those three parts are described in details below: can categorized any attack as known, unknown or false alarm. Here also the same dynamic on-line algorithm is used to produce suspicious data and along with feature selection module, profile those suspicious data are sent to the trained classifier. Classifier then outputs which kind of attack it is. If it is false alarm then the classifier excludes those from the attack list and don‘t sent those to system monitoring officer. 5. DETECTING DENIAL-OF-SERVICE ATTACKS WITH INCOMPLETE AUDIT DATA SCAN (Stochastic Clustering Algorithm for Figure: SCAN System Model Network anomaly detection) is an unsupervised network anomaly detection scheme. At the core of SCAN is a stochastic clustering algorithm, which
  8. 8. Clustering and Data Reduction: Online Sampling and Flow Aggregation: Clustering is a data mining technique, which can In this module, network traffic is sampled and group similar types of data in same cluster. The classified into flows. In the context of SCAN, a clustering technique in intrusion detection works flow is all the connections with a particular as follows: when incoming data are coming, it is destination IP address and a port number likely that most of them will be normal traffic. So, combination. The measurement and charactera- the cluster that is bigger in size is less likely to be tion of the sampled data proceeds in time slices. an intrusion. The goal of clustering is to divide They processed the sampled packets into flows into natural groups. The instances contained connection records. Because of its compact size as in a cluster are considered to be similar to one compared to other types of records (e.g., packet another according to some metric based on the logs), connection records are appropriate for data underlying domain from which the instances are analysis. Connection records provide the drawn. A stochastic clustering method (such as following fields: the EM algorithm) assigns a data point to each (SrcIP,SrcPort,DstIP,DstPort,ConnStatus,Proto, group with a certain probability. Such statistical Duration) approaches make sense in practical situations where no amount of training data is sufficient to make a completely firm decision about cluster memberships. Stochastic clustering algorithms aim to find the most likely set of clusters given the training data and prior expectations. The underlying model, used in the stochastic clustering algorithm, is called a finite mixture model. A mixture is a set of probability distributions—one for each cluster— that models the attribute values of the members of Figure: Online Sampling and Flow Aggregation that cluster. Algorithm EM Algorithm based Clustering Algorithm: Each field can be used to correlate network traffic Input to the EM algorithm is the Dataset and data and extract intrinsic features of each initial estimates and it outputs the clustered set of connection. They employed a very simple flow points. It starts with an initial estimate or initial aggregation algorithm. Bloom filters and guess for the parameters of the mixture model for associated arrays are used to store, retrieve and each cluster and then iteratively applies a process. run membership queries on incoming flow data in There are two steps: a given time slice. When a packet is sampled, we 1. Expectation step first identify the flow the packet belongs to by 2. Maximization step. executing a membership query in the Bloom filter In the E-step, the EM algorithm first finds the based on the flow ID. If an existing flow ID expected value of the complete-data log- cannot be found, a new flow ID is generated based likelihood, given the observed data and the on the information from the connection record. parameter estimates. This is called the insertion phase. If the flow ID exists, the corresponding flow array is updated. If In the M-step, the EM algorithm maximizes the the flow ID does not exist, we run the flow ID expectation computed in the E-step. through each of the k hash functions (that form a part of the Bloom filter), and use the result as an The two steps of the EM algorithm are repeated offset into the bit vector, turning on the bit we until an increase in the log-likelihood of the data, find at that position. If the bit is already set, they given the current model, is less than the accuracy left it on. threshold ε. The EM algorithm is guaranteed to converge to a local maximum, which may or may
  9. 9. not be the same as the global maximum. Handling Missing Data in a Dataset: Typically, the EM algorithm is executed multiple There are three ways to handle missing data in times with different initial settings for the connection records: parameter values. Then a given data point is Missing Completely At Random (MCAR) assigned to the cluster with the largest log- Missing At Random (MAR) likelihood score. Non-ignorable. The technique of using EM algorithm to handle missing data has been discussed in another paper in details and is out of scope of this paper. However the technique in summary is: In the E step their system computes the expected value of the complete data‘s log likelihood value based on the complete data cases and in M step the system inserts that value into that incomplete dataset. Anomaly Detection: In this model, elements of the network flow that are anomalous are generated from the anomalous distribution (A), and normal traffic elements are generated from the normal distribution (N). Therefore, the process of detecting anomalies involves determining whether the incoming flow Figure: EM Algorithm belongs to distribution A or distribution N. They calculated the likelihood of the two cases to Data Summaries for Data Reduction: determine the distribution to which the incoming After the EM algorithm gives output of clustered flow belongs to. They assumed that for normal set of data points then this step is executed to traffic, there exists a function FN which takes as build summary for each time slice and reduce data parameters the set of normal elements Nt, at time set. Basically this is an enhancement they have t, and outputs a probability distribution PNt over presented in their report. Their enhancement the data T generated by distribution T. Similarly, consists of three stages: they have a function FA for anomalous elements 1. At the end of each time slice they have which takes as parameters the set of anomalous made a pass over the connection dataset elements At and outputs a probability distribution and built data summaries. Data PAt . Assuming that p is the probability with summaries are important in EM which an element xt belongs to N, the likelihood algorithm framework as they reduce the Lt of the distribution T at time t is defined as I/O time by avoiding repeated scans over the data. This enhances the speed of convergence of the EM algorithm. 2. In this stage, they made another pass over the dataset to built cluster candidates The likelihood values are often calculated by using the data summaries collected in the taking the natural logarithm of the likelihood previous step. After the frequently rather than the likelihood itself because likelihood appearing data have been identified, the values are often very small numbers that are close algorithm builds all cluster candidates to zero. They calculated the log-likelihood as during this pass. The clusters are built follows: and hashed into bloom filter. 3. In this stage clusters are selected from the set of candidates.
  10. 10. They measured the likelihood of each flow, xt , belonging to a particular group by comparing the difference LLt (T)−LLt−1 (T). If this difference is greater than some value α, we declare the flow anomalous. They repeat this process for every flow in the time interval. 6. TESTING METHODOLOGY: Paper I: In this paper the authors did not use any testing Overall, out of the approximately 500,000 packets methodology. They described different kinds of with a 1.0000 probability, there are at least 50,000 data mining techniques and rules to implement packets that require further study. Retraining of various kinds of data mining based IDS. our model and readjusting the model‘s prior probabilities will allow us to see if those Paper II: remaining packets are truly attack packets or just The authors of this paper used MIT Lincoln Lab simply false alarms. 1999 intrusion detection evaluation (IDEVAL) data sets. From this data set they over-sampled the Session-Based Results labeled attack sessions and created a new data set They also scored the UCF data with the session- that contained a mix of 43.6% attack sessions and based network model and found that 56.4% non-attack sessions. After refining their approximately 32.9% of the sessions were network model using this data, they also have identified as having a probability of 1.0000 of collected hours of traffic from the University of being an attack session. Central Florida (UCL) network and looked at the TCP/IP header information and ran that Conversely, a session that scored a probability information against our newly created model. lower than 1.0000 does not also necessarily mean Each day of our UCF data set contained 20-30 that session is a ―good‖ session and poses no million packets, averaging 300 packets captured threat to their campus networks. per second. The vast majority of the sessions captured had a Packet-Based Results low or non-existent probability of being an attack They scored the UCF data with the packet-based session. Their studies showed that more than network model and found that approximately sixty-six percent of the sessions captured have an 2.5% of the packets were identified as having a attack probability of 0.0129. probability of 1.0000 of being an attack packet. Most of the anomalous sessions that were tagged Conversely, a packet that scored a probability of in the high-probability area were web-based port 0.0000 does not necessarily mean that packet is a 80 traffic. Overall, out of the approximately ―good‖ packet and poses no threat to their campus 30,000 sessions that were highly probable attack networks. The following figure shows that more sessions, there are at least 10,000 sessions that than seventy percent of the packets captured have require further study. Retraining of their model an attack probability of 0.0000 and ninety-seven and readjusting the model‘s prior probabilities percent of the packets have an attack probability will allow to see if those remaining sessions are of 0.5000 or less. truly attack sessions or just simply false alarms. Paper III The authors of this paper evaluated SCAN using the 1999 DARPA intrusion detection evaluation
  11. 11. data. The dataset consists of five weeks of TCPDump data. Data from week 1 and 3 consist of normal attack-free network traffic. Week 2 data consists of network traffic with labeled attacks. The week 4 and 5 data are the ―Test Data‖ and contain 201 instances of 58 different unlabelled attacks, 177 of which are visible in the inside TCPDump data. They trained SCAN on the DARPA dataset using week 1 and 3 data, then evaluated the detector on weeks 4 and 5. Evaluation was done in an off-line manner. Simulation Results with Complete Audit Data An IDS is evaluated on the basis of accuracy and efficiency. To judge the efficiency and accuracy of SCAN, they used Receiver-Operating Characteristic (ROC) curves. An ROC curve, which graphs the rate of detection versus the false positive ratio. The performance of SCAN at detecting a SSH Process Table attack is shown in Fig. 5. The attack is similar to the Process Table attack in that the goal of the attacker is to cause the SSHD daemon to spawn the maximum number of processes that the operating system will allow. In Fig. 6 the performance of SCAN at detecting a SYN flood attack is evaluated. A SYN flood attack is a type of a DoS attack. It occurs when an attacker makes a large number of half-open Paper IV: connections to a given port of a server during a The authors of this paper discussed that ADAM short period of time. This causes the data structure participated in DARPA 1999 intrusion detection in the ‗tcpd‘ daemon in the server to overflow. evaluation. It focused on detecting DOS and Due to the buffer overflow, the system will be PROBE attacks from tcpdump data and performed unable to accept any new incoming connections quite well. The following Figures 3 and 4 show till the queue empties. They compared the the results of DARPA 1999 test data. performance of SCAN with that of the K-Nearest Neighbors (KNN) clustering algorithm. Comparisons of the ROC curves seen in Fig. 5 and 6 suggests that SCAN outperforms the KNN algorithm. The two techniques are similar to each other in that both are unsupervised techniques that work with unlabelled data.
  12. 12. Figures 5 and 6 show the results of the recent 1999 DARPA Intrusion Detection Evaluation, administered by MIT Lincoln Labs. ADAM entered the competition and performed extremely well. We aimed to detect intrusions of the type Denial of Service (DOS) and Probe attacks. In those categories, as shown in Figure 5, ADAM ranked third, after EMERALD and the University of California Santa Barbara's STAT system. The 7. CONCLUSION: attacks detected by those systems and missed by ADAM were those that fell below our thresholds, In this report we have studied the details of four being attacks that involved usually only one papers in this area. We tried to make summary of connection and that can be best identified by those four papers, their system models, their signature-based systems. ADAM came very close technologies and their validation methods. We did to EMERALD in overall attack identification, as not go through all the cross-references given in shown in Figure 6. those papers rather we kept the scope of this paper limited into these four papers only. We strongly believe that this paper will be able to give the reader a view about what is currently developing in this area and how data mining is evolving into the field of network intrusion detection. 8. BIBLIOGRAPHY: [1] Chang-Tien Lu, Arnold P. Boedihardjo, Prajwal Manalwar, ―Exploiting Efficient Data Mining Techniques to Enhance Intrusion Detection Systems. Information Reuse and Integration, Conf, 2005. IRI -2005 IEEE International Conference on. [2] Caulkins, B.D.; Joohan Lee; Wang, M, ―Packet- vs. session-based modeling for intrusion detection system‖, Information Technology: Coding and Computing, 2005. ITCC 2005. International Conference on Volume 1, 4-6 April 2005 Page(s):116 - 121 Vol. 1 Digital Object Identifier 10.1109/ITCC.2005.222 . [3] Patcha, A.; Park, J.-M., ―Detecting denial-of-service attacks with incomplete audit data‖, Computer Communications and Networks, 2005. ICCCN 2005. Proceedings. 14th International Conference on 17-19 Oct. 2005 Page(s):263 - 268 Digital Object Identifier 10.1109/ICCCN.2005.1523864. [4] Daniel Barbara, Julia Couto, Sushil Jajodia, Leonard Popyack, Ningning Wu, ―ADAM: Detecting Intrusions by Data Mining‖, Proceedings of the 2001 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 5-6 June 2001.