Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and                    Applications (IJE...
Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and                    Applications (IJE...
Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and                    Applications (IJE...
Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and                              Applica...
Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and                         Applications...
Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and                      Applications (I...
Upcoming SlideShare
Loading in …5



Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.1495-1500 Performance Comparison Of Different Clustering Algorithms With ID3 Decision Tree Learning Method For Network Anomaly Detection Sonika Tiwari*, Prof. Roopali Soni** *(Department of Computer Science, OIST Bhopal) ** (Department of Computer Science, OIST Bhopal)ABSTRACTThis paper proposes a combinatorial method the signature of known attacks. Anomaly detectionbased on different clustering algorithms with ID3 keeps a profile of normal system behavior anddecision tree classification for the classification of interprets any significant deviation from this normalnetwork anomaly detection. The idea is to detect profile as malicious activity. One of the strengths ofthe network anomalies by first applying any anomaly detection is the ability to detect newclustering algorithm to partition it into a number attacks. Anomaly detection’s most serious weaknessof clusters and then applying ID3 algorithm for is that it generates too many false alarms. Anomalythe decision that whether an anomaly has been detection falls into two categories: superviseddetected or not. An ID3 decision tree is anomaly detection and unsupervised anomalyconstructed on each cluster. A special algorithm detection. In supervised anomaly detection, theis used to combine results of the two algorithms instances of the data set used for training the systemand obtain final anomaly score values. The are labelled either as normal or as specific attackthreshold rule is applied for making decision on type. The problem with this approach is that labelingthe test instance normality or abnormality. Here the data is time consuming. Unsupervised anomalywe are comparing the result performance of the detection, on the other hand, operates on unlabeledbest clustering algorithm for the detection of the data. The advantage of using unlabeled data is thatnetwork anomalies. The algorithms that we shall the unlabeled data is easy and inexpensive to obtain.apply here are k-mean algorithm, hierarchical The main challenge in performing unsupervisedclustering, expected maximization clustering. All anomaly detection is distinguishing the normal datathese algorithms are first applied on the data sets patterns from attack data patterns.consisting of a captured network ARP traffic to Recently, clustering has been investigated as onegroup them into a number of clusters and then approach to solving this problem. As attack databy applying ID3 decision tree classification on patterns are assumed to differ from normal dataeach of the clustering algorithm for the detection patterns, clustering can be used to distinguish attackof the network anomalies and compare the data patterns from normal data patterns. Clusteringperformance of each clustering algorithm. network traffic data is difficult because: 1. of high data volumeI. INTRODUCTION 2. of high data dimension It is important for companies to keep their 3. the distribution of attack and normal classes iscomputer systems secure because their economical skewedactivities rely on it. Despite the existence of attack 4. the data is a mixture of categorical andprevention mechanisms such as firewalls, most continuous datacompany computer networks are still the victim of 5. of the pre-processing of the data required.attacks. According to the statistics of CERT [1], thenumber of reported incidents against computer Network anomaly detectionnetworks has increased from 252 in 1990 to 21756 As we explained earlier, detectors needin 2000 and to 137529 in 2003. This happened models or rules for detecting intrusions. Thesebecause of misconfiguration of firewalls or because models can be built off-line on the basis of earliermalicious activities are generally cleverly designed network traffic data gathered by agents. Once theto circumvent the firewall policies. It is therefore model has been built, the task of detecting andcrucial to have another line of defence in order to stopping intrusions can be performed online. One ofdetect and stop malicious activities. This line of the weaknesses of this approach is that it is notdefence is intrusion detection systems (IDS). adaptive. This is because small changes in trafficDuring the last decades, different approaches to affect the model globally. Some approaches tointrusion detection have been explored. The two anomaly detection perform the model constructionmost common approaches are misuse detection and and anomaly detection simultaneously on-line. Inanomaly detection. In misuse detection, attacks are some of these approaches clustering has been used.detected by matching the current traffic pattern with One of the advantages of online modelling is that it is less time consuming because it does not require a 1495 | P a g e
  2. 2. Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.1495-1500separate training phase. Furthermore, the model 3: else if all tuples in S have the same class valuereflects the current nature of network traffic. The thenproblem with this approach is that it can lead to 4: Return a leaf with that specific class value.inaccurate models. This happens because this 5: elseapproach fails to detect attacks performed 6: Determine attribute A with the highestsystematically over a long period of time. These information gain in S.types of attacks can only be detected by analysing 7: Partition S in m parts S(a1), ..., S(am) such thatnetwork traffic gathered over a long period of time. a1, ..., am are the different values of A.The clusters obtained by clustering network traffic 8: Return a tree with root A and m branches labeleddata off-line can be used for either anomaly, such that branch i contains ID3(R − {A} ,C,detection or misuse detection. For anomaly S(ai)).detection, it is the clusters formed by the normal 9: end ifdata that are relevant for model construction. Formisuse detection, it is the different attack clusters II. RELATED WORKthat are used for model construction. Paper: A Novel Unsupervised Classification Approach for Network Anomaly Detection by K Clustering is a division of data into groups Means Clustering and ID3 Decision Treeof similar objects. Each group, called cluster, Learning Methods[8].consists of objects that are similar amongst them anddissimilar compared to objects of other groups. Author: Yasser Yasami, Saadat Pour Mozaffari,Representing data by fewer clusters necessarily loses Computer Engineering Department Amirkabircertain fine details, but achieves simplification. It University of Technology (AUT) Tehran, Iran.represents many data objects by few clusters, and Abstract: This paper presents a novel host-basedhence, it models data by its clusters. combinatorial method based on k-Means clusteringCluster analysis is the organization of a collection of and ID3 decision tree learning algorithms forpatterns (usually represented as a vector of unsupervised classification of anomalous andmeasurements, or a point in a multidimensional normal activities in computer network ARP into clusters based on similarity. Patterns The k-Means clustering method is first applied towithin a valid cluster are more similar to each other the normal training instances to partition it into kthan they are to a pattern belonging to a different clusters using Euclidean distance similarity. An ID3cluster. It is important to understand the decision tree is constructed on each cluster.difference between clustering (unsupervised Anomaly scores from the k-Means clusteringclassification) and discriminate analysis algorithm and decisions of the ID3 decision trees are(supervised classification). In supervised extracted. A special algorithm is used to combineclassification, we are provided with a collection of results of the two algorithms and obtain finallabelled (preclassified) patterns; the problem is to anomaly score values. The threshold rule is appliedlabel a newly encountered, yet unlabeled, pattern. for making decision on the test instance normality orTypically, the given labeled (training) patterns are abnormality.used to learn the descriptions of classes which in turn Conclusion: The proposed method is compared withare used to label a new pattern. In the case of the individual k-Means and ID3 methods and theclustering, the problem is to group a given collection other proposed approaches based on markovianof unlabeled patterns into meaningful clusters. In a chains and stochastic learning automata in terms ofsense, labels are associated with clusters also, but the overall classification performance defined overthese category labels are data driven; that is, they five different performance measures. Results on realare obtained solely from the data [2,3,4]. evaluation test bed network data sets show that: the proposed method outperforms the individual k-ID3 Algorithm Means and the ID3 compared to the other The ID3 algorithm (Inducing Decision approaches.Trees) was originally introduced by Quinlan in [11] Paper: Privacy Preserving ID3 overand is described below in Algorithm 1. Here we Horizontally, Vertically and Grid Partitionedbriefly recall the steps involved in the algorithm. For Data [7].a thorough discussion of the algorithm we refer theinterested reader to [10]. Author: Bart Kuijpers, Vanessa Lemmens, BartRequire: R, a set of attributes. Moelans Theoretical Computer Science, HasseltRequire: C, the class attribute. University & Transnational University Limburg,Require: S, data set of tuples. Belgium.1: if R is empty then Abstract: This consider privacy preserving decision2: Return the leaf having the most frequent value in tree induction via ID3 in the case where the trainingdata set S. data is horizontally or vertically distributed. Furthermore, we consider the same problem in the 1496 | P a g e
  3. 3. Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.1495-1500case where the data is both horizontally and factors: size of dataset, number of clusters, type ofvertically distributed, a situation we refer to as grid dataset and type of software used.partitioned data. We give an algorithm for privacy Conclusion: The main conclusion that can bepreserving ID3 over horizontally partitioned data concluded is the performance comparison ofinvolving more than two parties. For grid partitioned different clustering, we discuss two different evaluation methods Paper: Dynamic Network Evolution: Models,for preserving privacy ID3, namely, first merging Clustering, Anomaly Detection[9].horizontally and developing vertically or firstmerging vertically and next developing horizontally. Author: Cemal Cagatay Bilgin and B¨ulent YenerNext to introducing privacy preserving data mining Rensselaer Polytechnic Institute, Troy NY, 12180.over grid-partitioned data, the main contribution of Abstract: Traditionally, research on graph theorythis paper is that we show, by means of a focused on studying graphs that are static. However,complexity analysis that the former evaluation almost all real networks are dynamic in nature andmethod is the more efficient. large in size. Quite recently, research areas forConclusion: Here the datasets when partitioned studying the topology, evolution, applications ofhorizontally, vertically and after that the clustering complex evolving networks and processes occurringalgorithm is applied performs better performance in them and governing them attracted attention fromthan on the whole datasets. researchers. In this work, we review the significantPaper: A comparison of clustering method for contributions in the literature on complex evolvingunsupervised anomaly detection in network networks; metrics used from degree distribution totraffic[5]. spectral graph analysis, real world applications from biology to social sciences, problem domains fromAuthor: Koffi Bruno Yao. anomaly detection, dynamic graph clustering toAbstract: Network anomaly detection aims at community detection.detecting malicious activities in computer network Conclusion: Many real world complex systems cantraffic data. In this approach, the normal profile of be represented as graphs. The entities in thesethe network traffic is modelled and any significant system represent the nodes or vertices and links ordeviation from this normal profile is interpreted as edges connect a pair or more of the nodes. Wemalicious. While supervised anomaly detection encounter such networks in almost any applicationmodels the normal traffic behaviour on the basis of domain i.e. computer science, sociology, chemistry,an attack free data set, unsupervised anomaly biology, anthropology, psychology, geography,detection works on a data set which contains both history, engineering.normal and attack data. Clustering has recently beeninvestigated as one way of approaching the issues of III. Proposed SCHEMEunsupervised anomaly detection.Conclusion: The main goal of the paper has been to Algorithm :1 K-mean Clustering Algorithminvestigate the efficiency of different classical 1) Pick a number (K) of cluster centers (atclustering algorithms in clustering network traffic random)data for unsupervised anomaly detection. The 2) Assign every item to its nearest clusterclusters obtained by clustering the network traffic center (e.g. using Euclidean distance)data set are intended to be used by a security expert 3) Move each cluster center to the mean of itsfor manual labelling. A second goal has been to assigned itemsstudy some possible ways of combining these 4) Repeat steps 2, 3 until convergencealgorithms in order to improve their performance. (change in cluster assignments less than aPaper: Comparisons between Data Clustering threshold).Algorithms [6]. Algorithm : 2 Hierarchical ClusteringAuthor: Osama Abu Abbas Computer Science Bottom upDepartment, Yarmouk University, Jordan 1) Start with single-instance clustersAbstract: Clustering is a division of data into group s 2) At each step, join the two closest clustersof similar objects. Each group, called a cluster, 3) Design decision: distance between clustersconsists of objects that are similar between Top downthemselves and dissimilar compared to objects of 1) Start with one universal clusterother groups. This paper is intended to study 2) Find two clustersand compare different data clustering algorithms. 3) Proceed recursively on each subsetThe algorithms under investigation are: k-means 4) Can be very fastalgorithm, hierarchical clustering algorithm, self-organizing maps algorithm, and expectationmaximization clustering algorithm. All thesealgorithms are compared according to the following 1497 | P a g e
  4. 4. Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.1495-1500Algorithm 3: Density Based Clustering IV. DATASET USED Here we are using a number of attributes 1) select a point p and contains a class attribute to find whether an 2) Retrieve all points density-reachable from p wrt  and MinPts.anamoly has been or not. 3) If p is a core point, a cluster is formed. Node {A, B, C} 4) If p is a border point, no points are density-reachable from pLoad DBSCAN and {Hign,Low} visits the next point of the database. Transmission {TCP, UDP} 5) Continue the prsocess until all of the points have been processed. Address {,} MacAlgorithm 4: Proposed ID3 Algorithm Class_anomaly {yes, no}Input Layer A,low,,udp,no A,low,,udp,no  Define P1, P2, …., Pn Parties.(Horizontally C,low,,udp,yes C,low,,udp,yes partitioned). A,high,,tcp,yes A,high,,tcp,yes  Each Party contains R set of attributes A1, A2, …., B,high,,tcp,yes B,high,,tcp,yes AR. A,low,,udp,no B,low,,tcp,yes  C the class attributes contains c class values C1, C2, C,low,,udp,yes A,low,,udp,no …., Cc. A,high,,tcp,yes C,low,,udp,yes  For party Pi where i = 1 to n do B,high,,tcp,yes A,high,,tcp,yes  If R is Empty Then B,low,,udp,yes B,high,,tcp,yes  Return a leaf node with class value A,low,,udp,no B,low,,udp,yes  Else If all transaction in T(Pi) have the same class A,high,,tcp,yes A,low,,udp,no Then B,low,,udp,yes A,high,,tcp,yes  Return a leaf node with the class value A,low,,udp,no C,high,,tcp,no A,high,,tcp,yes A,high,,tcp,yes  Else  Calculate Expected Information classify the given sample for each party Pi individually. A,low,,udp,no,cl A,low,,udp,no,cluste  Calculate Entropy for each attribute (A1, A2, …., AR) uster0 r0 of each party Pi. C,low,,udp,yes,c C,low,,udp,yes,clust  Calculate Information Gain for each attribute (A1, luster0 er0 A2,…., AR) of each party Pi A,high,,tcp,yes,c A,high,,tcp,yes,clust  Calculate Total Information Gain for each attribute luster1 er0 of all parties (TotalInformationGain( )). B,high,,tcp,yes,c B,high,,tcp,yes,clust  ABestAttribute  MaxInformationGain( ) uster1 er0  Let V1, V2, …., Vm be the value of attributes. A,low,,udp,no,cl B,low,,tcp,yes,cluste ABestAttribute partitioned P1, P2,…., Pn parties into m uster0 r0 parties C,low,,udp,yes,c A,low,,udp,no,cluste  P1(V1), P1(V2), …., P1(Vm) luster0 r0  P2(V1), P2(V2), …., P2(Vm) A,high,,tcp,yes,c C,low,,udp,yes,clust  . . luster1 er0  . . B,high,,tcp,yes,c A,high,,tcp,yes,clust  Pn(V1), Pn(V2), …., Pn(Vm) luster1 er0  Return the Tree whose Root is labelled ABestAttribute B,low,,udp,yes,c B,high,,tcp,yes,clust and has m edges labelled V1, V2, …., Vm. Such that luster0 er0 for every i the edge Vi goes to the Tree A,low,,udp,no,cl B,low,,udp,yes,clust  NPPID3(R – ABestAttribute, C, (P1(Vi), P2(Vi), …., uster0 er0 Pn(Vi))) A,high,,tcp,yes,c A,low,,udp,no,cluste  End. luster1 r0 B,low,,udp,yes,c A,high,,tcp,yes,clust luster0 er0 A,low,,udp,no,cl C,high,,tcp,no,cluste uster0 r1 A,high,,tcp,yes,c A,high,,tcp,yes,clust luster1 er0 1498 | P a g e
  5. 5. Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.1495-1500 V. RESULT ANALYSIS In Table 2. Our proposed work performs less means square error as compared to the existing algorithm. ID3_Relative HP_Relative number_of_instances absolute error absolute error 14 60% NP 25 63.22% 60.13% 50 64.53% 63.24% 100 64.28% 63.63% 200 65.06% 64.28% Table 3. In Table 3. It was observed that the proposed algorithm has less absolute error than the existing algorithm. Clustering Time Mean Mean with (ms) absolute absolute proposed id3 error error K-mean with 47 0.0714 14.2105 % proposed id3 Hierarchical 31 0.0357 36.5854 % number_of_instances id3_time(ms) HP_time(ms) with proposed id3 14 78 15 EM with 43 0.0238 5.4119 % 25 93 15 proposed id3 50 110 16 Table 4. 100 125 31 As shown in the table 4 is the comparative study of different clustering algorithm with our 200 150 32 proposed algorithm. Table 1. As shown in Table 1. is the time needed for Clustering Time Mean Mean the decision of any dataset. It was observerd that the with existing (ms) absolute absolute existing id3takes more time as compared our id3 error error proposed work. K-mean with 65 0.0914 20.2105 % Where, existing id3 HP is the proposed horizontal partioned based ID3. Hierarchical 50 0.0557 45.5854 % Relative absolute error can be calculated with existing id3 as: EM with 60 0.0438 7.4119 % Mean squared error can be calculated existing id3 Table 5 as: VI. CONCLUSION with The clustering algorithms are used to Actual target values: a1 a2 … an divide any datasets into a number of clusters, this Predicted target values: p1 p2 … pn time clustering algorithms are combined with ID3 number_of_instan ID3 Mean HP Mean algorithm to detect the network anomaly detectionces absolute absolute and the performance is compared with the other error error clustering algorithms. The proposed algorithm implemented here provides a way of classifying and provides better leaning of the network anomalies 14 0.2857 NP and normal activities in computer network ARP 25 0.24 0.237 traffic. 50 0.24 0.24 100 0.23 0.22 200 0.235 0.23 Table 2. 1499 | P a g e
  6. 6. Sonika Tiwari, Prof. Roopali Soni / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue 5, September- October 2012, pp.1495-1500 noise. Proceedings of 2nd internationalREFERENCES Conference on Knowledge Discovery and [1] A comparative Study of Anomaly Data Mining, 1996. Detection Schemes in Network Intrusion [13] A. Wespi, G. Vigna and L.Deri. Recent Detection, A. Lazarevic, L. Ertoz, V. Advances in Intrusion Detection. 5th Kumar, A. Ozgur, J. Srivastava. International Symposium, Raid 2002 [2] Keogh E., Chakrabarti K., Pazzani M., Zurich, Switzerland, October 2002 and Mehrotra S., “Dimensionality Proceedings. Springer. Reduction for Fast Similarity Search in [14] G. Qu, S. Hariri, and M. Yousif, “A New Large Time Series Databases,” Knowledge Dependency and Correlation Analysis for and Information Systems, vol. 3, pp. 263- Features,” IEEE Trans. Knowledge and 286, 2001. Data Eng., vol. 17, no. 9, pp. 1199-1207, [3] Lepere R. and Trystram D., “A New Sept. 2005. Clustering Algorithm for Large [15] J. Kittler, M. Hatef, R.P.W. Duin, and J. Communication Delays,” in Proceedings Matas, “On Combining Classifiers,” IEEE of 16th IEEE-ACM Annual International Trans. Pattern Analysis and Machine Parallel and Distributed Processing Intelligence, vol. 20, no. 3, pp. 226-239, Symposium (IPDPS’02), Fort Lauderdale, Mar. 1998. USA, 2002. [4] Li C. and Biswas G., “Unsupervised Learning with Mixed Numeric and Nominal Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 4, pp. 673-690, 2002. [5] A comparison of clustering method for unsupervised anomaly detection in network traffic, Koffi Bruno Yao. [6] Comparisons between Data Clustering Algorithms,Osama Abu Abbas Computer Science Department, Yarmouk University, Jordan [7] Privacy Preserving ID3 over Horizontally, Vertically and Grid Partitioned Data,Bart Kuijpers, Vanessa Lemmens, Bart Moelans Theoretical Computer Science, Hasselt University & Transnational University Limburg, Belgium. [8] A Novel Unsupervised Classification Approach for Network Anomaly Detection by K Means Clustering and ID3 Decision Tree Learning Methods,Yasser Yasami, Saadat Pour Mozaffari, Computer Engineering Department Amirkabir University of Technology (AUT) Tehran, Iran. [9] Dynamic Network Evolution: Models, Clustering, Anomaly Detection,Cemal Cagatay Bilgin and B¨ulent Yener Rensselaer Polytechnic Institute, Troy NY, 12180., [10] Wenke Lee and S. J. Stolfo. Data Mining Approaches for Intrusion Detection, 1998. [11] Stefano Zanero and Sergio M. Savaresi. Unsupervised learning techniques for an intrusion detection system, ACM March 2004. [12] Martin Ester, Hans-Peter Kriegel,Jorg Sander,Xiaowei Xu. A density-based clustering algorithm for discovering Clusters in Large Spatial databases with 1500 | P a g e