A TRANSDUCTIVE SCHEME BASED INFERENCE TECHNIQUES FOR NETWORK FORENSIC ANALYSIS

Network forensics is part of the security infrastructure and has become a research focus of forensic investigation. However, many challenges remain in conducting network forensics: networks produce large amounts of data; the evidence extracted from the collected data must be comprehensible; and evidence analysis methods must be efficient. To address these problems, this paper develops a network intrusion forensics system based on a transductive scheme that can efficiently detect and analyze computer crime in networked environments and extract digital evidence automatically. At the end of the paper, we evaluate our method in a series of experiments on the KDD Cup 1999 dataset. The results demonstrate that our methods are effective for real-time network forensics and can provide comprehensible aid for a forensic expert.


  1. A TRANSDUCTIVE SCHEME BASED INFERENCE TECHNIQUES FOR NETWORK FORENSIC ANALYSIS. BY: AKSHAYA ARUNAN, M1 NE [IT], GECBH. 22-Jul-16
  2. OUTLINE: Objective; Introduction; Literature Survey; Proposed System; Conclusion; References
  3. OBJECTIVE: To develop a Network Intrusion Forensics System based on a “transductive scheme” that can efficiently detect and analyze computer crime and extract digital evidence.
  4. INTRODUCTION: Rapid development of network connectivity; growing complexity and scale; an increase in the number of crimes; every connected system is a potential target for malicious attack.
  5. These attacks can affect physical or digital assets, funds, consumer confidence, and national security, and can lead to loss of life.
  6. Network Forensics. Goal: to discover the source of security breaches or other information assurance problems [1]. Evidence is captured from networks, and its interpretation is substantially based on knowledge of network attacks. This allows forensic determinations to be made from the observed traffic [2].
  7. LITERATURE SURVEY: tcpdump [4],[5]; Wireshark [5]; Artificial Neural Network [1]; Support Vector Machine [5],[6]
  8. tcpdump: A free, open-source packet analyzer that runs from the command line. Main functions: prints the contents of network packets; displays TCP/IP and other packets being transmitted or received; can read packets from a network interface card; can write packets to standard output or a file.
  9. Wireshark: Wireshark is a free and open-source packet analyzer. It is similar to tcpdump but has a graphical front-end plus some integrated sorting and filtering options. It is used for: network troubleshooting; analysis; software and communications protocol development; education.
  10. Artificial Neural Network [1]: An ANN is an interconnected group of nodes, akin to the vast network of neurons in a brain. ANNs can be used to infer a function from observations and for data processing. Example application: robotics.
  11. Figure: a neural network with INPUT, HIDDEN, and OUTPUT layers. Each node represents an artificial neuron, and each arrow represents a connection from the output of one neuron to the input of another.
  12. Support Vector Machine [5],[6]: SVMs are supervised learning models that analyze data and recognize patterns. An SVM constructs a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Hyperplane: a subspace of one dimension less than its ambient space.
  13. Disadvantages. ANN and SVM: these methods were designed to find features for network forensics and are effective in reducing processing time, but they are insufficient for forensic analysis. tcpdump and Wireshark: these tools are designed to help debug network problems and are not specialized for forensic analysis.
  14. PROPOSED SYSTEM: First, we propose an efficient TCM-KNN [3] based inference technique, which is much more effective than single or multiple traffic thresholds. Second, to boost the real-time network forensic performance of TCM-KNN, we apply a simulated annealing (SA) algorithm [10], which reduces the computational cost and makes the method more suitable for real network environments.
  15. Transductive Confidence Machines for K-Nearest Neighbors (TCM-KNN): a commonly used machine learning and data mining method; effective in fraud detection, pattern recognition, and outlier detection. The confidence measure used in TCM is based on universal tests for randomness, or their approximation.
  16. Transductive scheme based network forensics: We develop a network intrusion forensics system based on a transductive scheme (NIFSTC) that can efficiently detect and analyze network crime and extract digital evidence.
  17. NIFSTC consists of the following components: Network Traffic Capturer; Instance Selection and Feature Extractor; TCM-KNN Based Network Forensic Analyzer; Evidence Analyzer.
  18. Figure: NIFSTC system architecture.
  19. Traffic capturer: Network traffic capture is the first step of the NIFSTC system. It prepares for traffic analysis and provides the base information for the other components of the forensics system. The traditional packet capture library, Libpcap [4], provides implementation-independent access to the underlying packet capture facility.
  20. Problems when using Libpcap: On a heavily loaded network, captured data is transferred by the kernel to user processes through system calls and memory copies. On a high-throughput network, the total number of CPU cycles this consumes is not negligible. System overhead: too many memory copy operations consume a large amount of CPU and memory resources.
  21. To improve the packet capture performance of NIFSTC, it is necessary to reduce the intermediate steps during packet transmission, bypass the OS kernel, and eliminate the kernel’s memory copy.
  22. Approach: an efficient user-level packet capture mechanism based on a semi-polling driven technique [7],[8]. With the semi-polling driven mechanism: 1) interrupt frequency is lowered; 2) processing performance for short messages is significantly improved.
  23. TCM-KNN based network forensic analyzer: TCM-KNN is an algorithm that effectively combines TCM [9] and the KNN algorithm. In the KNN algorithm, we denote by $D_i^y$ the sorted sequence (in ascending order) of the distances of point $i$ from the other points with the same classification $y$. In this paper, we use Euclidean distance to calculate the distances between points.
  24. We assign to every point a measure called the individual strangeness measure, which defines the strangeness of the point in relation to the rest of the points. In our case, the strangeness measure for a point $i$ belonging to the normal class is defined as $\alpha_i = \sum_{j=1}^{k} D_{ij}^{y}$ (1), where $\alpha_i$ is the measure computed for anomaly detection, $D_{ij}^{y}$ is the $j$th shortest distance in the sequence $D_i^y$, and $k$ is the number of neighbors used.
  25. Equation (1) is used to compute the p-value as follows: $p(\alpha_{new}) = \frac{\#\{i : \alpha_i \ge \alpha_{new}\}}{n+1}$ (2), where $\#$ denotes the cardinality of the set and $\alpha_{new}$ is the strangeness value of the test point. The event that $\alpha_{new}$ is among the $j$ largest strangeness values occurs with probability at most $j/(n+1)$. The p-value (non-universal tests, Proedrou et al.) measures how well the data support a null hypothesis; a smaller p-value gives greater evidence against it.
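
Equations (1) and (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and two simplifying assumptions are made. The count in equation (2) runs over the n normal training points only, and their strangeness values are computed without adding the test point to the neighbor pool.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def strangeness(point, others, k):
    """Equation (1): sum of the k shortest distances from point to others."""
    distances = sorted(euclidean(point, other) for other in others)
    return sum(distances[:k])


def p_value(new_point, normal_points, k):
    """Equation (2): share of training points at least as strange as new_point.

    Assumption: only the n training points are counted in the numerator.
    """
    n = len(normal_points)
    alphas = [
        strangeness(p, normal_points[:i] + normal_points[i + 1:], k)
        for i, p in enumerate(normal_points)
    ]
    alpha_new = strangeness(new_point, normal_points, k)
    count = sum(1 for alpha in alphas if alpha >= alpha_new)
    return count / (n + 1)


# Toy "normal" traffic: four corners of a unit square plus its center.
normal = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
p_typical = p_value((0.5, 0.5), normal, k=2)
p_outlier = p_value((10.0, 10.0), normal, k=2)
```

A test point that sits inside the normal cluster gets a large p-value, while a distant point gets a small one and would be flagged as anomalous once the p-value falls below a chosen threshold.
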
  26. Feature extractor: Extracts features from the network traffic captured by the Traffic Capturer component. A group of features is a data structure characterizing network traffic; the data structure used for network event analysis is the connection log. Some of the secondary attributes are: 1) TCP flags; 2) connection duration; 3) volume of data passed in each direction.
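
As a rough illustration of a connection log entry, the sketch below summarizes the packets of one connection into the three secondary attributes named above. The record layout, field names, and the packet tuple format are assumptions made for this example; the slides do not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionRecord:
    """One connection-log entry (hypothetical layout)."""
    src: str
    dst: str
    start: float
    end: float
    tcp_flags: set = field(default_factory=set)
    bytes_src_to_dst: int = 0
    bytes_dst_to_src: int = 0

    @property
    def duration(self):
        """Connection duration in the same unit as the timestamps."""
        return self.end - self.start


def build_record(packets):
    """Summarize packets of one connection.

    Each packet is (timestamp, sender, receiver, flags, length); the sender
    of the first packet is treated as the connection initiator.
    """
    first = packets[0]
    record = ConnectionRecord(src=first[1], dst=first[2],
                              start=first[0], end=first[0])
    for timestamp, sender, _receiver, flags, length in packets:
        record.end = max(record.end, timestamp)
        record.tcp_flags.update(flags)
        if sender == record.src:
            record.bytes_src_to_dst += length
        else:
            record.bytes_dst_to_src += length
    return record


packets = [
    (0.0, "10.0.0.5", "10.0.0.9", {"SYN"}, 40),
    (0.1, "10.0.0.9", "10.0.0.5", {"SYN", "ACK"}, 40),
    (0.5, "10.0.0.5", "10.0.0.9", {"ACK"}, 100),
]
record = build_record(packets)
```
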
  27. Simulated annealing based instance selection: SA is a local search technique that simulates the physical process of “annealing” [10] and can deal with highly non-linear problems. It begins with a random solution and, at each step of the process, searches the neighborhood for the next solution. Moves are controlled by a probability function. The acceptance of a downhill move depends on: the reduction in the value of the objective function; the elapsed search time.
  28. Instance selection keeps the most contributing examples and omits the useless ones. To apply SA, two important problems should be addressed: specification of the representation of the solutions; definition of the fitness function.
  29. 1) Representation: Let TR be the training dataset of instances. The search space for instance selection is made up of the subsets of TR. Each subset is coded as a chromosome using a binary representation: a chromosome consists of genes with two possible states, 0 and 1. If a gene is 1, its associated instance is included in the subset of TR represented by the chromosome; if it is 0, the instance is excluded. Result: the selected chromosome gives the reduced training dataset for TCM-KNN.
  30. 2) Fitness function: Let S be a subset of instances of TR to evaluate, coded by a chromosome x. Three measures are considered: true positives (TP), false positives (FP), and the percentage of training dataset reduction. The fitness function therefore combines three values: detect_rate, the detection rate obtained with S; fal_rate, the associated false positive rate; and reduce_rate, the reduction of instances of S with regard to TR: $F(x) = C \cdot (\mathit{detect\_rate} - \mathit{fal\_rate}) + (1 - C) \cdot \mathit{reduce\_rate}$ (3)
  31. $\mathit{reduce\_rate} = \frac{|TR| - |S|}{|TR|} \times 100$ (4), where |TR| is the number of instances in the original training dataset, |S| is the number of instances in the reduced training dataset obtained with SA, and C is an adjustment constant set by experience. The objective of SA is to maximize the fitness function, that is, to maximize the detection rate while minimizing the FP rate and the number of instances retained.
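
The binary representation and equations (3) and (4) can be combined into a short simulated annealing sketch. This is a generic SA loop under stated assumptions, not the authors' implementation: the cooling schedule, the parameter defaults, and the `evaluate` callback (which stands in for running TCM-KNN on the selected subset and must return `detect_rate` and `fal_rate` in percent) are all hypothetical.

```python
import math
import random


def reduce_rate(mask):
    """Equation (4): percentage of training instances removed by the mask."""
    total = len(mask)
    return (total - sum(mask)) / total * 100.0


def fitness(mask, evaluate, c=0.5):
    """Equation (3): C * (detect_rate - fal_rate) + (1 - C) * reduce_rate."""
    detect_rate, fal_rate = evaluate(mask)
    return c * (detect_rate - fal_rate) + (1 - c) * reduce_rate(mask)


def anneal(n_instances, evaluate, c=0.5, t0=10.0, cooling=0.95,
           steps=500, seed=0):
    """Search for a binary mask (1 = keep instance) that maximizes fitness."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_instances)]
    current_fit = fitness(current, evaluate, c)
    best, best_fit = current[:], current_fit
    temperature = t0
    for _ in range(steps):
        # Neighbor: flip one randomly chosen gene.
        neighbor = current[:]
        i = rng.randrange(n_instances)
        neighbor[i] = 1 - neighbor[i]
        neighbor_fit = fitness(neighbor, evaluate, c)
        delta = neighbor_fit - current_fit
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the loss grows and as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = neighbor, neighbor_fit
            if current_fit > best_fit:
                best, best_fit = current[:], current_fit
        temperature *= cooling
    return best, best_fit


def toy_evaluate(mask):
    """Stand-in for TCM-KNN: only the first three instances aid detection."""
    detect_rate = 100.0 * sum(mask[:3]) / 3
    return detect_rate, 0.0


best_mask, best_fitness = anneal(10, toy_evaluate)
```

In the toy setting, keeping exactly the three useful instances gives a detection rate of 100 and a reduction rate of 70, so the fitness with C = 0.5 is 85. In practice `evaluate` would be far more expensive, which is why the number of steps and the cooling rate matter.
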
  32. Evidence analyzer: Can connect distant and incomplete abnormal events. A set of evidence-analyzing utilities examines different aspects of correlated events efficiently; these utilities are then integrated into the NIFSTC system. The evidence analyzer has two work modes: 1) count mode; 2) weighted analysis mode.
  33. The evidence analyzer produces an undirected evidence graph: nodes represent attribute values; node size reflects weight; an edge represents a relationship between two attribute values. An evidence graph is shown in the figure.
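
One way such an undirected evidence graph could be built is sketched below. The slides do not define the two work modes, so this example assumes that count mode weights each node by how often its attribute value occurs in abnormal events, and that an edge links two attribute values seen in the same event. The event format and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_evidence_graph(events):
    """Build node and edge weights from abnormal events (assumed count mode).

    Each event is a dict mapping attribute name to value. A node is an
    (attribute, value) pair; an undirected edge is a sorted pair of nodes.
    """
    node_weight = Counter()
    edge_weight = Counter()
    for event in events:
        # Sorting by attribute name gives each undirected edge one key.
        nodes = sorted(event.items())
        node_weight.update(nodes)
        for a, b in combinations(nodes, 2):
            edge_weight[(a, b)] += 1
    return node_weight, edge_weight


events = [
    {"src_ip": "10.0.0.5", "dst_port": "80"},
    {"src_ip": "10.0.0.5", "dst_port": "22"},
]
node_weight, edge_weight = build_evidence_graph(events)
```

Here the source address appears in both events, so its node carries twice the weight of either port node, which corresponds to a larger node in the drawn graph.
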
  34. Figure: Evidence Graph.
  35. CONCLUSION: TCM-KNN offers a modern and precise approach to detecting network crime and analyzing forensic data. The evidence analyzer outputs a set of evidence items together with their corresponding weight values.
  36. REFERENCES
      1) S. Mukkamala, A. Sung, “Identifying significant features for network forensic analysis using artificial intelligent techniques”, Int’l Journal of Digital Evidence, 2003.
      2) M.I. Cohen, “PyFlag: An advanced network forensic framework”, Digital Investigation (Elsevier), 2008.
      3) Y. Li, L. Guo, “An active learning based TCM-KNN algorithm for supervised network intrusion detection”, Computers & Security (Elsevier), 2007.
      4) Libpcap, http://www.tcpdump.org/release/libcap-0.7.2.tar.gz, 2002.
      5) Wikipedia, www.wikipedia.com
      6) E. Eskin, A. Arnold, M. Prerau, L. Portnoy, S. Stolfo, “A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data”, in D. Barbara and S. Jajodia (Eds.), Applications of Data Mining in Computer Security, Kluwer, 2002.
  37. 7) Z.H. Tian, B.X. Fang, X.C. Yun, “User-level message passing mechanism based on semi-polling driven in RTLinux”, Journal of Software, 2004.
      8) Z.H. Tian, M.Z. Hu, B. Li, “Semi-polling based interrupt mitigation for high performance packet processing”, High Technology Letters, 2005.
      9) A. Gammerman, V. Vovk, “Prediction algorithms and confidence measure based on algorithmic randomness theory”, Theoretical Computer Science, 2002.
      10) Aarts, E. and van Laarhoven, “Simulated annealing: A pedestrian review of the theory and some applications”, in J. Kittler and P.A. Devijver (Eds.), Pattern Recognition and Applications, Springer-Verlag, Berlin, 1987.
