Computer security - A machine learning approach


Published on

An abridged version of my M.Sc. thesis.

Published in: Technology, Education
1 Comment
1 Like
  • Hi Sandeep! I wanted to download it to read offline. Will you allow me to do so?
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Computer security - A machine learning approach

  1. 1. Computer Security: A Machine Learning Approach We analyze two learning algorithms, NBTree and VFI, for the task of detecting intrusions. SANDEEP V. SABNANI AND ANDREAS FUCHSBERGER Produced by the Information Security Group at Royal Holloway, University of London in conjunction with TechTarget. Copyright © 2008 TechTarget. All rights reserved.
  2. 2. Royal Holloway series Machine learning approaches ABSTRACT Information Security is one of the key areas today as securing computers especially against novel attacks becomes a daunting task. Intrusion detection is a method by which unauthorised access to one’s assets is detected. In this paper, we present an application Sandeep V. of the field of machine learning to computer security, particularly to intrusion detection. Sabnani We analyse two learning algorithms (NBTree and VFI) for the task of detecting intrusions Information Security Group and compare their relative performances. We then comment on the suitability of NBTree Royal Holloway, University of London Egham, Surrey, TW20 0EX, algorithm over VFI for the intrusion detection task based on its high accuracy and high United Kingdom recall. We finally state the usefulness of machine learning to the field of computer security. Andreas Fuchsberger Computer Security: Information Security Group A Machine Learning Approach Royal Holloway, University of London Egham, Surrey, TW20 0EX, United Kingdom 1. INTRODUCTION There have been quite a few approaches Computer security has become a chal- which prevent and/or detect known attacks. lenging task these days with the rapid Novel or unknown attacks on the other hand This article was prepared by students and staff involved with the award-winning M.Sc. growth of the internet and the increasing are more difficult to detect and have received in Information Security offered by the complexity of communication protocols. considerable attention in the recent past. Information Security Group at Royal Holloway, New and complicated attack methods are Another important problem in the field of University of London. The student was judged to have produced an outstanding M.Sc. thesis being constantly developed by attackers computer security has been that of insider on a business-related topic. The full thesis thereby compromising the confidentiality, threats. Many of the insider threats may be is available as a technical report on the Royal Holloway website integrity and/or availability of one’s data. unintentional, nevertheless it has become CERT has reported 8064 new vulnerabili- essential to ensure that insider behaviour ties in the year 2006 and the number has is in sync with the security policy of the For more information about the Information Security Group at Royal Holloway or on the been increasing significantly over the past organisation. M.Sc. in Information Security, please visit few years [1]. These issues can prove to be quite expen- • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 2
  3. 3. Royal Holloway series Machine learning approaches sive for organisations to handle. The task 2. MACHINE LEARNING of detecting intrusions involves the formula- 2.1 Basic Concepts tion of efficient rules which require a high Learning can be described in many ways level of domain expertise and analysis of including acquisition of new knowledge, large amounts of data, which might make enhancement of existing knowledge, repre- the process slow and unreliable with human sentation of knowledge, organisation of experts. For checking compliance with a knowledge and discovery of facts through Every computer security policy, an administrator may have experiments [11]. When such learning is per- action can to be extremely cautious as normal user formed with the help of computer programs, be modeled behaviour may change over time. it is referred to as machine learning. Every computer action can be modeled as a function Machine learning is a field related to artifi- cial intelligence which deals with construct- as a function with sets of inputs and outputs. with sets of ing computer programs that automatically A learning task may be considered as the inputs and improve with experience [12]. The learning estimation of this function by observing the outputs. experience is provided in the form of data sets of inputs and outputs. (We use the term and actual learning is achieved with the help estimation as the exact function may not of algorithms. The two main tasks that are be determinate.) The function estimating addressed by machine learning are the process usually consists of a search in the ability to learn more about the given data hypothesis space (i.e. the space of all such and to make predictions about new data possible functions that might represent the based on learning outcomes from the learn- input and output sets under consideration). ing experience [9]; both of which are difficult The authors in [14] formally describe the and time-consuming for human analysts. function approximation process. Consider a Machine learning is thus, well-suited to set of input instances X = ( x1, x2, x3 . . . xn). problems that depend on rare, expensive Let f be a function which is to be guessed and unreliable human experts. by the learner. Let h be the learner’s This paper presents the intrusion detec- hypothesis about f. tion problem and a machine learning based Also, we assume a priori that both f and h solution to it. belong to a class of functions H. The function • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 3
  4. 4. Royal Holloway series Machine learning approaches f maps the input instances in X as, an important issue for machine learning. The ∋ X h→H h(X) learning element may be trained in different ways [3]. For classification problems like intru- A machine learning task may thus be sion detection, knowledge may be learned in defined as a search in this space H. This a supervised, unsupervised or semi-super- search results in approximating the relevant vised manner. In this paper, we use super- h, based on the training instances (i.e. the vised learning in which the learner is provided The inputs and set X). with training examples with the associated outputs to a The approximation is then checked against classes or values for the attribute to be machine learning a set of test instances which are then used predicted. task may be of to indicate the correctness of h. The search different kinds. requires algorithms which are efficient and Choosing a learning which best-fit the training data [12]. algorithm 2.2 Inputs and Outputs The inputs and outputs to a machine Training the algorithm learning task may be of different kinds. using training data Generally, they are in the form of numeric or nominal attributes. For instance, an attribute Evaluating the algorithm like temperature if used as a numeric attrib- by running it on test data ute, may have values like 25o C, 28o C, etc. On the other hand, if it is used as a nominal Figure 1. Life Cycle of a Machine Learning Task attribute, it may take values from a fixed set (like high, medium, low). In many cases, the 2.4 Defining a Machine Learning Task output may also be a boolean value (like yes In general, a machine learning task can be and no). defined formally in terms of three elements, viz. the learning experience E, the tasks T 2.3 Production of Knowledge and the performance element P. The way in which knowledge is learned is TM Mitchell in Machine Learning [12] • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 4
  5. 5. Royal Holloway series Machine learning approaches defines a learning task more precisely as becomes difficult to create separate sets follows: for training and testing. In such cases, some “A computer program is said to learn from data might be held over for testing, and the experience E with respect to some class of remaining used for training. This is called the tasks T and performance measure P, if its holdout procedure [20]. However, the data in performance at tasks in T, as measured this case might be distributed in an uneven by P, improves with experience E.” way in the training and test sets and might Once the not represent the output classes in the algorithm is 2.5 Life Cycle of a Machine correct proportions. Stratification [4] and selected, the Learning Task cross-validation [5] can be used to Figure 1 shows the life cycle of a machine next step is to circumvent this problem. learning task. Depending on the nature of train the algorithm the knowledge to be learned, different 2.6 Benefits of Machine Learning by providing it types of algorithms may be chosen at The field of machine learning has been with a set of different times. Also, the type of inputs found to be extremely useful in the following training instances. and outputs are also instrumental in areas relating to software engineering [12]: choosing an algorithm. 1. Data mining problems where large Once the algorithm is selected, the next databases may contain valuable implicit reg- step is to train the algorithm by providing it ularities that can be discovered automatically. with a set of training instances. The training 2. Difficult to understand domains where instances are used to build a model that rep- humans might not have the knowledge to resents the target concept to be learned (i.e. develop effective algorithms. the hypothesis). This model is then evaluated 3. Domains in which the program is using the set of test instances. required to adapt to dynamic conditions. In the case where a large amount of data is available, the general approach is to con- In the case of traditional intrusion detec- struct two independent sets, one for training tion systems, the alerts generated are and the other for testing. On the other hand, analysed by human analysts who evaluate if a limited amount of data is available, it them and take suitable actions. However, • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 5
  6. 6. Royal Holloway series Machine learning approaches this is an extremely onerous task as the set for intrusion detection obtained from the number of alerts generated may be quite UCI machine learning repository. The KDD large and the environment may change Cup ’99 data set is a version of a data set continuously [16]. This makes machine used at the DARPA Intrusion Detection learning well suited for intrusion detection. Evaluation program ( ideval/data/data index.html). 3. MACHINE LEARNING APPLIED The data set consists of TCP dump data TO COMPUTER SECURITY for a simulated Air Force LAN. In addition to In addition to 3.1 Intrusion Detection as normal LAN simulation, attacks were also a Machine Learning Task simulated and the corresponding TCP data normal LAN A machine learning task can be formally was captured. The attacks were launched simulation, defined as shown in section 2.4. We use on three UNIX machines, Windows NT attacks were also this notation to formulate intrusion detection hosts and a router along with background simulated and the as a machine learning task. traffic. Every record in the data set repre- corresponding Thus, for intrusion detection, we have, sents a TCP connection. Each connection 1. Task: To detect intrusions in an TCP data was was labeled as normal or as a specific accurate manner. attack type [15]. The attacks fall into one captured. 2. Experience: A dataset with instances of the following categories: representing normal as well as attack data. • DOS attacks (Denial of Service attacks) 3. Performance Measure: Accuracy • R2L attacks (unauthorised access from in terms of correct classification of intru- a remote machine) sion events and normal events and other • U2R attacks (unauthorised access to statistical metrics including precision, super user privileges) recall, F-measure and kappa statistic • Probing attacks which are described in section 3.5. A detailed description and format of the 3.2 Data Set Description dataset can be found in [17]. The data set used for evaluation in this paper is a subset of the KDD Cup ’99 data • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 6
  7. 7. Royal Holloway series Machine learning approaches 3.3 Algorithms 3.3.2 VFI The following algorithms were used in the The VFI4 algorithm is a classification experiments carried out for this paper. algorithm based on the voting frequency intervals. In VFI, each training instance is 3.3.1 NBTree represented as a vector of features along The NBTree algorithm is a hybrid between with a label that represents the class of decision-tree classifiers and Naive Bayes the instance. Feature intervals are then The NBTree classifiers. It represents the learned knowl- constructed for each feature. An interval algorithm is a edge in the form of a tree which is constructed represents a set of values for a given feature hybrid between recursively. However, the leaf nodes are where the same subset of class values Naive Bayes categorizers rather than nodes are observed. Thus, two adjacent intervals decision-tree predicting a single class [6]. For continuous represent different classes. classifiers and attributes, a threshold is chosen so as to A detailed explanation of both the above Naive Bayes limit the entropy measure. The utility of a algorithms can be found in [17]. classifiers. node is evaluated by discretizing the data and computing the fivefold cross-validation 3.4 Experimental Analysis accuracy estimation using Naive Bayes at The experiments done in this paper the node. The utility of the split is the weighted consist of the evaluation of the performance sum of utility of the nodes and this depends of NBTree and VFI algorithms for the task on the number of instances that go through of classifying novel intrusions. The dataset that node. The NBTree algorithm tries to described in section 3.2 was used in the approximate whether the generalisation experiments. accuracy of Naive Bayes at each leaf is Weka [20], a machine learning toolkit was higher than a single Naive Bayes classifier used for the implementation of the algo- at the current node. A split is said to be rithms described in sections 3.3.1 and significant if the relative reduction in error 3.3.2. Due to the limitation in the available is greater that 5% and there are at least memory and processing power, it was not 30 instances in the node [6]. possible to use the full dataset described in section 3.2. Instead a reduced subset was • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 7
  8. 8. Royal Holloway series Machine learning approaches used and 10-fold cross-validation (explained negative. The value b is referred to as a false in section 2.5) was used to overcome this negative and c is known as false positive. limitation. In the context of intrusion detection, a true positive is an instance which is normal and 3.5 Evaluation Metrics is also classified as normal by the intrusion In order to analyse and compare the detector. A true negative is an instance performance of the above mentioned which is an attack and is classified as an In the context algorithms, metrics like the classification attack. of intrusion accuracy, precision, recall, F-Measure and detection, a kappa statistic were used. These metrics are 3.5.1 Classification Accuracy derived from a basic data structure known Classification accuracy is the most basic true positive is as the confusion matrix. A sample confusion measure of the performance of a learning an instance which matrix for a two-class problem is shown in method. It determines the percentage of is normal and is Table 1. correctly classified instances. From the also classified confusion matrix, we can say that: as normal by the Predicted Predicted intrusion detector. Accuracy = a+d Class Class Positive Negative a+b+c+d Actual Class a b Positive This metric gives the number of instances from the dataset which are classified correctly Actual Class c d i.e. the ratio of true positives and true nega- Negative tives to the total number of instances. Table 1. Confusion Matrix for a two-class 3.5.2 Precision, Recall and F-Measure problem (Expected predictions) Precision gives the percentage of slots in the hypothesis that are correct, whereas In this confusion matrix, the value a is called recall gives the percentage of reference a true positive and the value d is called a true slots for which the hypothesis is correct. • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 8
  9. 9. Royal Holloway series Machine learning approaches Referring from the confusion matrix, we figure and deducts it from the predictor’s can define precision and recall for our success and expresses the result as a pro- purposes as [19]: portion of the total for a perfect predictor [20]. Precision = a In addition to the above statistical metrics, a+c the time taken to build the model was also Recall = a considered as a performance indicator. a+b 3.6 Attribute Selection The precision of an intrusion detection Attribute Selection is the process of learner would thus indicate the proportion of identifying and removing much of the redun- correctly classified positive instances to the dant and irrelevant information possible. The total number of predicted positive instances experiments conducted in this paper use the and recall would indicate the proportion of information gain attribute selection method correctly classified positive instances to the which is described in [17]. total number of actual positive instances. The F-measure is another metric defined 3.7 Summary of Experiments as the weighted harmonic mean of precision Based on the above algorithms, attribute and recall [8] to address a problem identified selection methods and the type of cross-vali- in [7], which may be present in any classifica- dation, the following table 2 shows a summary tion scenario. of the experiments conducted. F-measure = 2*Precision*Recall Feature Reduction Cross Precision+Recall Algorithm Method Validation NBTree — 10-fold 3.5.3 Kappa Statistic The Kappa statistic is used to measure VFI — 10-fold the agreement between predicted and NBTree Information Gain 10-fold observed categorizations of a dataset, while VFI Information Gain 10-fold correcting for agreements that occur by chance. It takes into account the expected Table 2. Summary of Experiments • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 9
  10. 10. Royal Holloway series Machine learning approaches 4. EVALUATION Metric Value The results of the experiments described Time taken to in section 3.4 are discussed in this section. build the model 0.92s A comparison between NBTree and VFI Accuracy 86.58 % methods is also made based on the values Average Precision 41.27 % of the metrics defined in section 3.5. Average Recall 80.54 % For an IDS, For an IDS, the accuracy indicates how Average F-Measure 44.05 % the accuracy correct the algorithm is in identifying normal Kappa Statistic 79.50 % indicates how and adversary behaviour. Table 5. Results of VFI with all attributes correct the Metric Value algorithm is in Metric Value Time taken to Time taken to identifying normal build the model 1115.05s and adversary build the model 0.2s Accuracy 99.94 % Average Precision 90.33 % Accuracy 75.81 % behaviour. Average Precision 35.71 % Average Recall 92.72 % Average Recall 75.82 % Average F-Measure 91.14 % Average F-Measure 37.43 % Kappa Statistic 99.99 % Kappa Statistic 66.21 % Table 3. Results of NBTree with all attributes Table 6. Results of VFI with selected attributes using Metric Value information gain measure Time taken to build the model 38.97s Recall would indicate the proportion of Accuracy 99.89 % correctly classified normal instances from Average Precision 94.54 % the total number of actual normal instances Average Recall 90.84 % whereas precision would indicate the num- Average F-Measure 92.28 % ber of correctly classified normal instances Kappa Statistic 99.82 % from the total number of instances identified Table 4. Results of NBTree with selected attributes using as normal by the IDS. The Kappa statistic information gain measure is a general statistical indicator and the • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 10
  11. 11. Royal Holloway series Machine learning approaches F-Measure is related to the problem men- The graphs in Figures 2 and 3 show a rel- tioned in section 3.5.2. In addition to these, ative performance of NBTree and VFI for the the time taken by the learning algorithm for intrusion detection task on the dataset under model construction is also important as it consideration. Figure 2 shows the compari- may have to handle extremely large amounts son with all attributes under consideration of data. and figure 3 depicts the comparison for attributes selected using the information There are tremen- gain measure. dous differences As per the definitions in section 3.5.2, in the precision a good IDS should have a recall that is as and recall values high as possible. A high precision is also desired. From our results, we see that the of NBTree and classification accuracy of NBTree is better VFI where the than that of VFI in both cases. There are NBTree exhibits tremendous differences in the precision and a relatively higher recall values of NBTree and VFI where the precision and Figure 2. NBTree v/s VFI - All Attributes NBTree exhibits a relatively higher precision and higher recall. Also, when all attributes higher recall. are used, NBTree has a lower precision value than the case when selected attributes are used. In both these cases, the recall is more or less the same. Also, the F- Measure value is high for NBTree in comparison to VFI. NBTree is seen to have a better performance as compared to the VFI in both the cases and it can thus be said that it is more suited to the intrusion detection task on the given data set. Figure 3. NBTree v/s VFI - Selected Attributes • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 11
  12. 12. Royal Holloway series Machine learning approaches 5. CONCLUSIONS assigning users to roles and vice-versa is Ron Condon Based on the experiments done in this described in [18]. Learning algorithms can also UK bureau chief paper and their corresponding results, we be used to develop applications which, for can state the following: instance, can check whether people in an Ron Condon • Machine learning is an effective method- organisation are adhering to the defined has been writing ology which can be used in the field of com- security policy. about develop- puter security. • It is possible to analyse huge quantities ments in the IT industry for • The inherent nature of machine learning of audit data by using machine learning more than 30 algorithms makes them more suited to the techniques, which is otherwise an extremely years. In that intrusion detection field of information secu- difficult task. time, he has charted the evolution from big main- rity. However, it is not limited to intrusion frames, to minicomputers and PCs in detection. The authors in [10] have developed Finally it can be said that in order to realise the 1980s, and the rise of the Internet a tool using machine learning to infer access the full potential of machine learning to the over the last decade or so. In recent control policies where policy requests and field of computer security, it is essential to years he has specialized in information security. He has edited daily, weekly responses are generated by using learning experiment with various machine learning and monthly publications, and has algorithms. These are effective with new pol- schemes towards addressing security-related written for national and regional icy specification languages like XACML [13]. problems and choose the one which is the newspapers, in Europe and the U.S. Similarly, a classifier-based approach to most appropriate to the problem at hand. m • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 12
  13. 13. Royal Holloway series Machine learning approaches REFERENCES [1] CERT Vulnerability Statistics 1995 - 2006. remediation.html, 2007. [2] G. Demiroz and H. A. Guvenir. Classification by voting feature intervals. In European Conference on Machine Learning, pages 85-92, 1997. [3] T. Dietterich and P. Langley. Machine learning for cognitive networks:technology assessments and research challenges, Draft of May 11, 2003. , 2003. [4] M. A. Hall. Correlation-based Feature Selection for Machine Learning. PhD thesis, University of Waikato, Department of Computer Science, 1999. [5] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 1137-1145, 1995. [6] R. Kohavi. Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In Proceedings of the Second Inter- national Conference on Knowledge Discovery and Data Mining, pages 202-207, 1996. [7] M. Kubat and S. Matwin. Addressing the curse of imbalanced training sets: one-sided selection. In Proc. 14th Interna- tional Conference on Machine Learning, pages 179-186. Morgan Kaufmann, 1997. [8] J. Makhoul, F. Kubala, R. Schwartz, and R. Weischedel. Performance measures for information extraction., 1999. [9] M. A. Maloof, editor. Machine Learning and Data Mining for Computer Security. Springer, 2006. [10] E. Martin and T. Xie. Inferring access-control policy properties via machine learning. In POLICY ‘06: Proceedings of the Seventh IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY’06), pages 235-238, Washington, DC, USA, 2006. IEEE Computer Society. [11] R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors. Machine Learning: An Artificial Intelligence Approach. Tioga • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 13
  14. 14. Royal Holloway series Machine learning approaches Publishing Company, 1983. [12] T. M. Mitchell. Machine Learning. McGraw Hill, 1997. [13] T. Mose. Oasis, extensible access control markup language, (xacml) version 2.0. http://docs.oasis-, 2005. [14] N. J. Nilsson. Introduction to Machine Learning - an early draft of a proposed book., 1996. [15] U. Of California. The UCI KDD Archive, University of California., 1999. [16] T. Pietraszek. Using adaptive alert classification to reduce false positives in intrusion detection. Recent Advances in Intrusion Detection, 3224:102-124, 2004. [17] S. V. Sabnani. Computer security: A machine learning approach. Master’s thesis, Royal Holloway, University of London, 2007. [18] S. Sheng and S. L. Osborn. A classifier-based approach to user-role assignment for web applications. In Secure Data Management, pages 163-171, 2004. [19] S. Tesink. Improving intrusion detection systems through machine learning. sis-tesink.pdf, 2007. [20] I. H. Witten and E. Frank. Data Mining - Practical MachineLearning Tools and Techniques, Second Edition. Elsevier, 2005. [21] S. Wolthusen. Lecture 11 - Intrusion Detection and Prevention, Notes on Network Security, Royal Holloway University of London, 2006. • MACHINE LEARNING • COMPUTER SECURITY • EVALUATION • CONCLUSIONS APPLICATIONS 14