Prediction Methods for Mitigating
   Computer Security Threats

              Errin W. Fulp



     Department of Computer...
Outline


    Overview of data mining methods
          Machine learning tools, techniques, and tasks
          Preprocess...
What is Data Mining



   Extracting hidden patterns from data
       Can be used to uncover existing hidden patterns
    ...
Steps in the Process

    Standard data-oriented view of Knowledge Discovery in Databases

         selection      preproc...
Preprocessing Data
                          Once the objective is determined, assemble the data
                         ...
Types of Data Mining



                   transformed data            patterns



                                Data Mi...
Classification

     Arrange data into predefined groups, developed from training
            Learn a model (classifier) from...
Clustering

    Arrange data into groups, but the groups are not predefined
                                               ...
Regression

    Model the data with the least error
          Useful for forecasting and prediction
    As applied to secu...
Association Rule Learning


    Searches for relationships between variables
        Learn rules that capture normal behav...
Interpreting the Results

    Final step of the process, evaluate the patterns discovered
        Not all are valid or may...
When Applied to Computer Security




   Two major issues...
       Large data sets
       Rare events




               ...
Security and Large Data Sets


    Security typically involves large data sets
         Sendmail “11,500 system calls per ...
Security and Rare Events


    Rare event processing is often required
        We hope security events are infrequent...
 ...
Rare Events in Other Areas



    Insurance risk modeling [PRA00]
    E-commerce and web mining, “Online merchants convert...
Example Security Application: Who is Doing What?


    Given a computer network, discover what computers are doing
       ...
A New Approach

   Given a set of computer network trace data, is it possible to
   identify the application protocols (e....
Motifs

    A motif is a pattern of interconnections occurring in complex
    networks at numbers that are significantly hi...
Applying this Idea to Application Identification



                                             easy                      ...
Initial Experiments


    Sources of data
        Dartmouth College campus wireless network, Fall 2003
        OSDI Con...
Motif Profile Results




             AIM           DNS              HTTP                 Kazaa
                          ...
So What is the Problem?




For Further Reading I

[AWG+ 93]   C. Apte, S. M. Weiss, G. Grout, Chidanand Apte, Sholom Weiss, and Gordon Grout.
       ...
For Further Reading II
[LSM98]     Wenke Lee, Salvatore J. Stolfo, and Kui W. Mok.
            Mining audit data to build ...
Prediction Methods for Mitigating Computer Security Threats

  1. Prediction Methods for Mitigating Computer Security Threats
       Errin W. Fulp
       Department of Computer Science
  2. Outline
       Overview of data mining methods
         Machine learning tools, techniques, and tasks
         Preprocessing, data mining, and interpretation
         Prediction or knowledge discovery
       When applied to computer security
         Large data sets and rare events (at least we hope...)
         Methods for addressing each concern
       Example application, function discovery in computer networks
         Who is doing what in a computer network?
         Identify the application based on the pattern of interactions
  3. What is Data Mining
       Extracting hidden patterns from data
         Can be used to uncover existing hidden patterns
         ...but it cannot uncover patterns not already in the data
       Typically two major objectives
         Knowledge discovery - determine facts about the data
         Forecasting or predictions - predict future events
       Both are relevant to computer security
  4. Steps in the Process
       Standard data-oriented view of Knowledge Discovery in Databases
         Data -> (selection) -> Target Data -> (preprocessing) -> Preprocessed Data
           -> (transformation) -> Transformed Data -> (data mining) -> Patterns
           -> (interpretation) -> Knowledge
       Let's divide into a process-oriented view
         Preprocessing -> (transformed data) -> Data Mining -> (patterns) -> Interpretation
  5. Preprocessing Data
       Once the objective is determined, assemble the data
         Again, can only uncover existing patterns
       Clean the data, removing noise and accounting for missing data
         Remove unwanted data that hinders data analysis... but what is noise with regard to security...
         Do we really want to remove outliers?
       Reduce and transform data into important feature vectors
       [Figure: syslog messages (host, facility, level, tag, time, message) are preprocessed and transformed into an encoded tag sequence, plotted as tag number versus time (seconds)]
  6. Types of Data Mining
       Preprocessing -> (transformed data) -> Data Mining -> (patterns) -> Interpretation
       Data Mining
         Classification
         Clustering
         Regression
         Rule Learning
  7. Classification
       Arrange data into predefined groups, developed from training
         Learn a model (classifier) from labeled training data
         Examples include k-nearest neighbor and support vector machines
         Typically training is slow, but classification is fast
       When applied to security (specifically IDS) [CBK]
         1. Cluster training data using algorithm
         2. For new data, distance to closest cluster is anomaly score
       Assumption: normal data instances belong to specific cluster(s) in the data, while anomalous instances do not. Normal data is closest to the centroid.
       Can also perform semi-supervised training
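The two-step scoring described on this slide can be sketched as follows. This is a minimal illustration, not the method from [CBK]: it assumes Euclidean distance, and the centroid coordinates and test points are made-up values standing in for the output of whatever clustering algorithm is used on the training data.

```python
import math

def nearest_centroid_distance(point, centroids):
    """Anomaly score: Euclidean distance from point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)

# Hypothetical centroids, e.g. obtained by clustering normal training data.
centroids = [(0.0, 0.0), (10.0, 10.0)]

# A point near a centroid scores low; a point far from both scores high.
normal_score = nearest_centroid_distance((0.5, -0.5), centroids)
odd_score = nearest_centroid_distance((5.0, 5.0), centroids)
```

A threshold on the score (chosen from validation data) would then separate normal from anomalous instances.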
  8. Clustering
       Arrange data into groups, but the groups are not predefined
         No training data required, therefore no training time...
       [Figure: an attack graph and its cluster representation]
       Examples include k-means clustering and fuzzy clustering
       Have difficulty with higher dimensional data [CBK]
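As a rough illustration of k-means clustering, one of the examples named on this slide, the sketch below implements the basic assign-and-update loop (Lloyd's algorithm) in plain Python. The data points, the starting centroids, and the fixed iteration count are assumptions chosen for the example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            groups[idx].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = []
        for old, group in zip(centroids, groups):
            if group:
                updated.append(tuple(sum(axis) / len(group) for axis in zip(*group)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        centroids = updated
    return centroids, groups

# Made-up two-dimensional data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, groups = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

No labels are supplied anywhere, which is the point of the slide: the groups emerge from the data alone.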
  9. Regression
       Model the data with the least error
         Useful for forecasting and prediction
       As applied to security, regression typically has two steps
         1. Fit regression model to the data
         2. For each test instance, residual determines anomaly score
       Presence of anomalies can influence the robustness of the model
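The two regression steps can be sketched with an ordinary least-squares line fit. The choice of a straight-line model, the absolute residual as the score, and the training values are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residual_score(x, y, slope, intercept):
    """Anomaly score: absolute difference between observed and predicted y."""
    return abs(y - (slope * x + intercept))

# Step 1: fit the model to (made-up) training data lying on y = 2x + 1.
train_x = [0, 1, 2, 3, 4]
train_y = [1, 3, 5, 7, 9]
slope, intercept = fit_line(train_x, train_y)

# Step 2: score test instances by their residual.
score_expected = residual_score(5, 11, slope, intercept)
score_anomalous = residual_score(5, 30, slope, intercept)
```

If the training data itself contained the anomalous point, the fitted line would be pulled toward it and the residual would shrink, which is the robustness concern raised in the last bullet.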
  10. Association Rule Learning
       Searches for relationships between variables
         Learn rules that capture normal behavior; any test instance that is not covered is an anomaly (one-class) [EEGPP06, LSM98]
       For multi-class
         Learn rules from training data using algorithm; each rule has a confidence value
         For each test instance find the best rule; the inverse of the confidence is the anomaly score
       Example rules
         if UDP is AVERAGE ∧ TCP is AVERAGE then ICMP is AVERAGE
         if SYN is AVERAGE ∧ FIN is AVERAGE then ICMP is AVERAGE
         if ICMP is AVERAGE ∧ UDP is AVERAGE ∧ TCP is AVERAGE ∧ SYN is AVERAGE then FIN is AVERAGE
         if UDP is AVERAGE ∧ FIN is AVERAGE then SYN is AVERAGE
         if UDP is AVERAGE ∧ SYN is AVERAGE then ICMP is AVERAGE
         if SYN is AVERAGE then ICMP is AVERAGE
         if ICMP is AVERAGE ∧ FIN is AVERAGE then SYN is AVERAGE
         if UDP is AVERAGE ∧ TCP is AVERAGE ∧ SYN is AVERAGE ∧ FIN is AVERAGE then ICMP is AVERAGE
         if UDP is AVERAGE ∧ SYN is AVERAGE then FIN is AVERAGE
         if ICMP is AVERAGE ∧ TCP is AVERAGE ∧ SYN is AVERAGE then FIN is AVERAGE
         if ICMP is AVERAGE ∧ SYN is AVERAGE then FIN is AVERAGE
         ...
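One way to read the scoring on this slide is sketched below. The rule shapes are taken from the slide, but the confidence values are invented, and treating "best rule" as the highest-confidence rule that covers the instance is an assumption rather than the definition used in [EEGPP06] or [LSM98].

```python
import math

# Each rule: (antecedent attribute values, consequent, confidence).
# Confidence values here are hypothetical.
RULES = [
    ({"UDP": "AVERAGE", "TCP": "AVERAGE"}, ("ICMP", "AVERAGE"), 0.9),
    ({"SYN": "AVERAGE"}, ("ICMP", "AVERAGE"), 0.6),
]

def anomaly_score(instance, rules):
    """Inverse confidence of the best covering rule; infinity if uncovered."""
    covering = [
        conf
        for antecedent, (attr, value), conf in rules
        if all(instance.get(k) == v for k, v in antecedent.items())
        and instance.get(attr) == value
    ]
    if not covering:
        return math.inf  # no rule covers the instance: treat as an anomaly
    return 1.0 / max(covering)

normal = {"UDP": "AVERAGE", "TCP": "AVERAGE", "SYN": "AVERAGE", "ICMP": "AVERAGE"}
unusual = {"UDP": "HIGH", "TCP": "AVERAGE", "SYN": "AVERAGE", "ICMP": "AVERAGE"}
uncovered = {"UDP": "HIGH", "TCP": "AVERAGE", "SYN": "AVERAGE", "ICMP": "HIGH"}

scores = [anomaly_score(i, RULES) for i in (normal, unusual, uncovered)]
```

The three scores increase in order: a strong rule covers the first instance, only a weaker rule covers the second, and nothing covers the third.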
  11. Interpreting the Results
       Final step of the process, evaluate the patterns discovered
         Not all are valid or may have a validity time period
       Standard measures: accuracy, precision, recall, and F-score
         Unbalanced test sets are a concern
       Overfitting: excellent job of fitting the data, but not predicting
         Find patterns in training-set not present in test set
       [Figure: data points with an overfit model and the correct model]
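The concern about unbalanced test sets can be made concrete with a small calculation. The sketch below computes the four standard measures for a made-up test set of 1000 instances containing 10 attacks; both detectors are hypothetical.

```python
def evaluate(actual, predicted, positive=1):
    """Return (accuracy, precision, recall, F-score) for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return accuracy, precision, recall, f_score

# 10 attacks (label 1) among 1000 instances.
actual = [1] * 10 + [0] * 990

# A detector that never raises an alarm still looks accurate.
baseline = evaluate(actual, [0] * 1000)

# A detector that finds 8 of the 10 attacks and raises 4 false alarms.
detected = evaluate(actual, [1] * 8 + [0] * 2 + [1] * 4 + [0] * 986)
```

The do-nothing detector reaches 99% accuracy with zero recall, which is why precision, recall, and F-score are reported alongside accuracy when the rare class is the one of interest.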
  12. When Applied to Computer Security
       Two major issues...
         Large data sets
         Rare events
  13. Security and Large Data Sets
       Security typically involves large data sets
         Sendmail “11,500 system calls per message” [WGZ08]
         1998 MIT network data, 7 weeks is about 5 million connections
         Must be processed quickly and accurately
       Data oriented solutions
         Discretization, feature selection [FFH08], feature construction (principal component analysis) [WGZ04], and sampling [PP07]
       Method oriented solutions
         Parallel data mining (high-performance data mining)
  14. Security and Rare Events
       Rare event processing is often required
         We hope security events are infrequent...
         Are there enough examples for supervised learning?
         Black swan theory (hard to predict, high consequence, and easy to see afterwards)
         Bulk anomalies (worms) are the opposite... [CBK]
       Standard approaches do not work well with rare events [JAK01]
         Normal events may be similar, but rare events are often different
         Many techniques attempt to model normal behavior and look for variations
         Over-sample rare class, down-size large class, artificial cases
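Of the three remedies in the last bullet, over-sampling the rare class is the simplest to sketch. The version below duplicates rare examples at random, with replacement, until the two classes are the same size. The function name, the fixed seed, and the two-class setting are assumptions made for the example.

```python
import random

def oversample(samples, labels, rare_label, seed=0):
    """Randomly duplicate rare-class samples until the classes are balanced."""
    rare = [s for s, label in zip(samples, labels) if label == rare_label]
    if not rare:
        raise ValueError("no examples of the rare class to duplicate")
    common_count = sum(1 for label in labels if label != rare_label)
    rng = random.Random(seed)  # fixed seed keeps the example repeatable
    extra = [rng.choice(rare) for _ in range(max(0, common_count - len(rare)))]
    return samples + extra, labels + [rare_label] * len(extra)

# Made-up data: 5 rare events (label 1) among 100 instances.
samples = list(range(100))
labels = [1] * 5 + [0] * 95
balanced_samples, balanced_labels = oversample(samples, labels, rare_label=1)
```

Duplication adds no new information about the rare class, so it should be applied to training data only; the test set keeps its original imbalance.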
  15. Rare Events in Other Areas
       Insurance risk modeling [PRA00]
       E-commerce and web mining, “Online merchants convert an average of 2%-3% of their site visitors into buyers”
       Churn analysis, “number of customers that end relationship with a company in a given period” [NGK+ 06]
       Hardware faults, for example new disk failures [AWG+ 93]
       Airline No-Show predictions [LHC03]
  16. Example Security Application: Who is Doing What?
       Given a computer network, discover what computers are doing
         Specifically what applications or types of applications
       Identifying an application is important for two reasons
         Management of network resources
         Compliance with security policies
       However current methods do not always work
         Port numbers are unreliable
         Payloads can be encrypted
         Current in-the-dark methods can be defeated
  17. A New Approach
       Given a set of computer network trace data, is it possible to identify the application protocols (e.g. HTTP, AIM, DNS) that hosts are using, based on interaction patterns?
       Three different views of the same network
         Physical
         Logical
         Application
  18. Motifs
       A motif is a pattern of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks
       Motifs have been applied to several complex networks
         Gene regulation, neural networks, ecosystem food webs, electronic circuits (forward logic chips, digital fractional multipliers), and World Wide Web
       Certain motifs can be linked to specific functions
  19. Applying this Idea to Application Identification
       [Figure: pipeline of Parse data -> Construct application graphs -> Create motif profiles -> Nearest neighbor classification -> Interpret results, with evolutionary attribute weighting; steps carry informal effort notes (easy, time consuming, talk to grad student..., ugggh...)]
       Preprocessing
         Collect data, parse into connection information
         Find all order 3 and 4 motifs and build motif profiles
       k-nearest-neighbor classification (for training and testing)
       Interpret results, possibly weight features to improve performance
  20. Initial Experiments
       Sources of data
         Dartmouth College campus wireless network, Fall 2003
         OSDI Conference 2006
         Lawrence Berkeley National Lab 2004/2005
       Create a profile per application
         Application x profile = 1.000 0.662 0.650 0.632 0.585
         Application y profile = 0.900 0.672 0.50 0.772 0.85
       Given new application, find best matching profile
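Finding the best matching profile can be sketched as a 1-nearest-neighbor lookup. The two stored profiles are the example values from this slide; the new profile is a made-up observation, and Euclidean distance is an assumed similarity measure that may differ from the one used in the experiments.

```python
import math

# Example motif profiles from the slide, one per application.
profiles = {
    "x": [1.000, 0.662, 0.650, 0.632, 0.585],
    "y": [0.900, 0.672, 0.50, 0.772, 0.85],
}

def best_match(new_profile, known_profiles):
    """Return the name of the stored profile closest to new_profile."""
    return min(known_profiles,
               key=lambda name: math.dist(new_profile, known_profiles[name]))

# Hypothetical profile measured for an unidentified application.
observed = [0.95, 0.66, 0.63, 0.65, 0.60]
best = best_match(observed, profiles)
```

With several labeled profiles per application, the same lookup extends to k-nearest-neighbor by taking a majority vote over the k closest profiles.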
  21. Motif Profile Results
       [Figure: two sets of result panels, one panel per application: AIM, DNS, HTTP, Kazaa, MSDS, Netbios, SSH]
       Results very good compared to traditional graph statistics
       Although there is a problem with AIM and SSH...
       So what is the problem...?
  22. So What is the Problem?
  23. For Further Reading I
       [AWG+ 93] C. Apte, S. M. Weiss, and G. Grout. Predicting defects in disk drive manufacturing: A case study. In Proceedings of the IEEE CAIA93, pages 212–218, 1993.
       [CBK] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. To appear in ACM Computing Surveys, September 2009.
       [EEGPP06] Aly ElSemary, Janica Edmonds, Jesús González-Pino, and Mauricio Papa. Applying data mining of fuzzy association rules to network intrusion detection. In Proceedings of the IEEE Workshop on Information Assurance, 2006.
       [FFH08] Errin W. Fulp, Glenn A. Fink, and Jereme N. Haack. Predicting computer system failures using support vector machines. In Proceedings of the Workshop on Analysis of System Logfiles, 2008.
       [JAK01] Mahesh V. Joshi, Ramesh C. Agarwal, and Vipin Kumar. Mining needle in a haystack: classifying rare classes via two-phase rule induction. In SIGMOD ’01: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 91–102, 2001.
       [LHC03] Richard D. Lawrence, Se June Hong, and Jacques Cherrier. Passenger-based predictive modeling of airline no-show rates. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 397–406, 2003.
  24. For Further Reading II
       [LSM98] Wenke Lee, Salvatore J. Stolfo, and Kui W. Mok. Mining audit data to build intrusion detection models. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 1998.
       [NGK+ 06] Scott A. Neslin, Sunil Gupta, Wagner Kamakura, Junxiang Lu, and Charlotte H. Mason. Defection detection: Measuring and understanding the predictive accuracy of customer churn models. Journal of Marketing Research, 43:204–211, 2006.
       [PP07] Animesh Patcha and Jung-Min Park. An adaptive sampling algorithm with applications to denial-of-service attack detection. In Proceedings of the IEEE International Conference on Computer Communications and Networks, pages 11–16, 2007.
       [PRA00] Edwin P. D. Pednault, Barry K. Rosen, and Chidanand Apte. Handling imbalanced data sets in insurance risk modeling. Technical Report RC-21731, IBM, 2000.
       [WGZ04] Wei Wang, Xiaohong Guan, and Xiangliang Zhang. A novel intrusion detection method based on principle component analysis in computer security. In Proceedings of the International Symposium on Neural Networks, pages 657–662, 2004.
       [WGZ08] Wei Wang, Xiaohong Guan, and Xiangliang Zhang. Processing of massive audit data streams for real-time anomaly intrusion detection. Computer Communications, 31(1):58–72, 2008.
