Published on

IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. Mitchell D’silva, Deepali Vora / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 1, January -February 2013, pp.1267-1275 Comparative Study of Data Mining echniques toEnhance Intrusion Detection Mitchell D’silva, Deepali Vora M.E. Information Technology, Vidyalankar Institute of Technology, Mumbai, India. Asst. Professor Information Technology Department, Vidyalankar Institute of Technology,Mumbai, India.Abstract Today, Intrusion Detection Systems learned before. Intrusion detection techniques arehave been employed by majority of the of two types namely; Misuse detection andorganizations to safeguard the security of Anomaly detection. Firewalls are used for intrusioninformation systems. Firewalls that are used detection but they often fail in detecting attacks thatfor intrusion detection possess certain take place from within the organization. Todrawbacks which are overcome by various overcome this drawback of firewalls, different datadata mining approaches. Data mining mining techniques are used that handle intrusionstechniques play a vital role in intrusion detection occurring from within the organization. Data miningby analysing the large volumes of network data techniques have been successfully used forand classifying it as normal or anomalous. intrusion detection in different application areas likeSeveral data mining techniques like bioinformatics, stock market, web analysis etc.Classification, Clustering and Association rules These methods extract previous unknownare widely used to enhance intrusion significant relationships and patterns from largedetection. Among them clustering is preferred databases. The extracted patterns are then used as aover classification since it does not require basis to identify new attacks. Data Mining basedmanual labelling of the training data and the IDS‘s require less expert knowledge yetsystem need not be aware of the new attacks. provides good performance and security. TheseThis paper discusses three different systems are capable of detecting known as wellclustering algorithms namely K-Means as unknown attacks from the network. DifferentClustering, Y-Means Clustering and Fuzzy C- data mining techniques likeMeans Clustering. K-Means clustering results classification, clustering and associationin degeneracy and is not suitable for large rule mining can be used for analyzing the networkdatabases. Y-Means is an improvement over K- traffic and thereby detecting intrusions. Amongmeans that eliminates empty clusters. Fuzzy the above, clustering algorithms are widely usedC-Means is based on the fuzzy logic that allows for intrusion detection as they do not requirean item to belong to more than one cluster manual classification of training data. This paperand concentrates on the minimization of the discusses the three clustering algorithms namelyobjective function that examines the quality of K-Means, Y-Means and Fuzzy C-Means describingpartitioning. The performance and efficiency how each technique overcomes the drawbacks ofof Fuzzy C- Means clustering is the best in the previous one. A comparison between the threeterms of intrusion detection than the other two clustering techniques is made that shows whichtechniques. Finally, the paper discusses the technique is the most efficient in terms of intrusioncomparison of the three clustering algorithms. detection.Keywords— Intrusion detection, K-Means II. INTRUSION DETECTION SYSTEMClustering, Y-Means Clustering, Fuzzy C-Means Intrusion detection system plays anClustering, Data Mining, Firewall. important role in detecting malicious activities in computer systems. The following discusses theI. INTRODUCTION various terms related to intrusion detection. Internet is widely spread in each corner ofthe world, computers all over are exposed to diverse A. Intrusionintrusions from the World Wide Web. To protect Intrusion is a type of malicious activitythe computers from these unauthorized attacks, that tries to deny the security aspects of a computereffective intrusion detection systems (IDS‘s) need system. It is defined as any set of actions thatto be employed. Traditional instance based learning attempts to compromise the integrity, confidentialitymethods for Intrusion Detection can only detect or availability of any resource [1].known intrusions since these methods classifyinstances based on what they have learned. They i) Data integrity: It ensures that the data beinghardly detect the intrusions that they have not transmitted by the sender is not altered during its 1267 | P a g e
  2. 2. Mitchell D’silva, Deepali Vora / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 1, January -February 2013, pp.1267-1275transmission until it reaches the intended receiver. B. Probe – Surveillance and probingIt maintains and assures the accuracy and Attacker examines a network to discoverconsistency of the data from its transmission to well-known vulnerabilities of the target machine.reception. These network investigations are reasonably valuable for an attacker who is planning an attack inii) Data confidentiality: It ensures that the data future. For example: port-scan, ping- sweep, etc [2].being transmitted through the network is accessibleto only those receivers who are authorized to C. R2L – Remote to Localreceive the respective data. It assures that the data Unauthorized attackers gain local access ofhas not been read by unauthorized users. the target machine from a remote machine and theniii) Data availability: The network or a system exploit the target machine‘s vulnerabilities. Forresource ensures that the required data is accessible example: guessing password etc [2].and usable by the authorized system users upondemand or whenever they need it. D. U2R – User to Root Target machine is already attacked, butB. Intrusion Detection the attacker attempts to gain access with super-user Intrusion detection is the process of privileges. For example: buffer overflow attacks etcmonitoring and analyzing the events occurring in a [2].computer system in order to detect maliciousactivities taking place through the network. ID is an IV.TECHNIQUES FOR INTRUSIONarea growing in significance as more and more DETECTIONsensitive data are stored and processed in networked Each malicious activity or attack has asystems. specific pattern. The patterns of only some of the attacks are known whereas the other attacks onlyC. Intrusion Detection System show some deviation from the normal patterns. Intrusion Detection system is a Therefore, the techniques used for detectingcombination of hardware and software that detects intrusions are based on whether the patterns of theintrusions in the network. IDS monitor all the attacks are known or unknown. The two mainevents taking place in the network by gathering and techniques used are:analyzing information from various areas withinthe network. It identifies possible security breaches, A. Anomaly Detectionwhich include attacks from within and outside the It is based on the assumption thatorganization and hence can detect the signs of intrusions always reflect some deviations fromintrusions. The main objective of IDS is to alarm normal patterns. The normal state of the network,the system administrator whenever any suspicious traffic load, breakdown, protocol and packet sizeactivity is detected in the network. are defined by the system administrator in advance. Thus, anomaly detector compares the In general, IDS makes two assumptions current state of the network to the normal behaviorabout the data set used as input for intrusion and looks for malicious behavior. It can detect bothdetection as follows: known and unknown attacks.i) The amount of normal data exceeds the abnormal B. Misuse Detectionor attack data quantitatively. It is based on the knowledge of knownii) The attack data differs from the normal data patterns of previous attacks and systemqualitatively. vulnerabilities. Misuse detection continuously compares current activity to known intrusionIII. MAJOR TYPES OF INTRUSION patterns to ensure that any attacker is not attemptingATTACKS to exploit known vulnerabilities. To accomplish Most intrusions occur via network by using this task, it is required to describe each intrusionthe network protocols to attack their target systems. pattern in detail. It cannot detect unknown attacks.These kinds of connections are labeled as abnormalconnections and the remaining connections as V. NEED OF DATA MINING INnormal connections. Generally, there are four INTRUSION DETECTIONcategories of attacks as follows: Data Mining refers to the process of extracting hidden, previously unknown and usefulA. DoS – Denial of Service information from large databases. It extracts Attacker tries to prevent legitimate users patterns and concentrates on issues relating to theirfrom accessing the service in the target machine. It is a convenient way of extracting patterns andFor example: ping-of-death, SYN flood etc [2]. focuses on issues relating to their feasibility, utility, efficiency and scalability. Thus data mining 1268 | P a g e
  3. 3. Mitchell D’silva, Deepali Vora / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 1, January -February 2013, pp.1267-1275techniques help to detect patterns in the data set and column is a field of the audit records.use these patterns to detect future intrusions in ii) The intrusions and user activities shows frequentsimilar data. The following are a few specific things correlations among the network data. Consistentthat make the use of data mining important in an behaviors‘ in the network data can be captured inintrusion detection system: association rules.i) Manage firewall rules for anomaly detection. ii) iii) Rules based on network data can continuouslyAnalyze large volumes of network data. merge the rules from a new run to aggregate rule setiii) Same data mining tool can be applied to of all previous runs.different data sources. iv) Thus with the association rule, we get theiv) Performs data summarization and visualization. capability to capture behavior for correctly detectingv) Differentiates data that can be used for deviation intrusions and henceanalysis. vi) Clusters the data into groups such lowering the false alarm rate.that it possess high intra-class similarity and lowinter-class similarity. C. Clustering It is an unsupervised machine learningVI. DATA MINING TECHNIQUES FOR mechanism for discovering patterns in unlabeledIDS data. It is used to label data and assign it into Data mining techniques play an important clusters where each cluster consists of members thatrole in intrusion detection systems. Different data are quite similar. Members from different clustersmining techniques like classification, clustering, are different from each other. Hence clusteringassociation rule mining are usedfrequently to methods can be useful for classifying network dataacquire information about intrusions by for detecting intrusions. Clustering can be appliedobserving and analyzing the network data. The on both Anomaly detection and Misuse detection.following describes the different data mining The basic steps involved in identifying intrusion aretechniques: follows [3]: i) Find the largest cluster, which consists ofA. Classification maximum number of instances and label it as It is a supervised learning technique. A normal.classification based IDS will classify all the ii) Sort the remaining clusters in an ascending ordernetwork traffic into either normal or malicious. of their distances to the largest cluster.Classification technique is mostly used for anomaly iii) Select the first K1 clusters so that the number ofdetection. The classification process is as follows: data instances in these clusters sum up to ¼`N andi) It accepts collection of items as input. label them as normal, where ` is the percentage ofii) Maps the items into predefined groups or normal instances.classes defined by some attributes. iv) Label all other clusters as malicious.iii) After mapping, it outputs a classifier that can v) After clustering, heuristics are used toaccurately predict the class to which a new item automatically label each cluster as either normal orbelongs. malicious. The self-labeled clusters are then used to detect attacks in a separate testB. Association Rule dataset. This technique searches a frequently From the three data mining techniques discussedoccurring item set from a large dataset. Association above clustering is widely used for intrusionrule mining determines association rules and/or detection because of the following advantages overcorrelation relationships among large set of data the other techniques:items. The mining process of association rule can be i) Does not require the use of a labeled data set fordivided into two steps as follows: training ii) No manual classification of training datai) Frequent Item set Generation needs to be done iii) Need not have to be aware ofGenerates all set of items whose support is greater new types of intrusions in order for the system tothan the specified threshold called as minsupport. be able to detect them.ii) Association Rule GenerationFrom the previously generated frequent item sets, it VII. CLUSTERING TECHNIQUES USEDgenerates the association rules in the form of ―if IN IDSthen‖ statements that have confidence greater than Several clustering algorithms have beenthe specified threshold called as minconfidence. used for intrusion detection. All these algorithms reduce the false positive rate and increase the The basic steps for incorporating detection rate of the intrusions. The detection rate isassociation rule for intrusion detection are as defined as the number of intrusion instancesfollows [3]: detected by the system divided by the totali) The network data is arranged into a database table number of intrusion instances present in the datawhere each row represents an audit record and each set. The false positive rate is defined as total 1269 | P a g e
  4. 4. Mitchell D’silva, Deepali Vora / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 1, January -February 2013, pp.1267-1275number of normal instances that were The newly calculated mean is assigned as the newincorrectly classified as intrusions defined by the number of normal instances. Some of the vi) Repeat step (iii) until the cluster centroids do notclustering techniques such as K-Means Clustering, change. vii) Label the clusters as normal andY-Means Clustering and Fuzzy C-Means Clustering abnormal depending on the number of data items inare discussed below. each cluster.A. K-Means Clustering K-Means algorithm is a hard partitionedclustering algorithm widely used due to itssimplicity and speed. It uses Euclidean distance asthe similarity measure. Hard clustering means thatan item in a data set can belong to one and onlyone cluster at a time. It is a clustering analysisalgorithm that groups items based on their featurevalues into K disjoint clusters such that the items inthe same cluster have similar attributes and those indifferent clusters have different attributes. TheEuclidean distance function used tocompute the distance (i.e. similarity) betweentwo items is given as follows: (1) where, p = (p1, p2,…, pm) and q = (q1,q2,…,qm) are the two input vectors with mquantitative attributes. The algorithm is applied totraining datasets which may contain normal andabnormal traffic without being labeled previously.The main idea of this approach is based on theassumption that normal and abnormal traffic formdifferent clusters. The data may also containoutliers, which are the data items that areverydifferent from the other items in the clusterand hence do not belong to any cluster. An outlieris found by comparing the radiuses of the dataitems; that is, if the radius of a data item is greaterthan a given threshold, it is considered as an outlier.But this does not disturb the K-means clusteringprocess as long as the number of outliers is small. Figure 1: Flowchart of K-Means ClusteringK-Means Clustering Algorithm is as follows: An essential problem of the K-means clusteringi) Define the number of clusters K. For example, method is to determine an initial partition and theif K=2, we appropriate number of clusters K. It alsoassume that normal and abnormal traffic in the sometimes leads to degeneracy which means thattraining data form two different clusters. the clustering process may end up with someii) Initialize the K cluster centroids. This can be empty clusters. The figure 2 shown belowdone by randomly selecting K data items from the represents the generation of empty set.iii) Compute the distance from each item to thecentroids of all the clusters by using the Euclideandistance metric which isused to find the similarity between the items in dataset.iv) Assign each item to the cluster with the nearestcentroid. In this way all the items will beassigned to different clusters such that each clusterwill have items with similar attributes.v) After all the items have been assigned to differentclusters re-calculate the means of modified clusters. 1270 | P a g e
  5. 5. Mitchell D’silva, Deepali Vora / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 1, January -February 2013, pp.1267-1275 Depending on the value of K specified by the user, the items in a data set are assigned to the nearest clusters depending on the distance between the item and the centroid of each cluster ii) Splitting clusters: According to the Cumulative Standardized Normal Distribution Function, 99% of the instances of the cluster lie within the circle of radius 2.32 σ where σ is the standard deviation of the data. Therefore the threshold t = 2.32 σ is chosen. The area within the circle is called as the Confident Area of the cluster. Thus, all the points in the cluster that lie outside the Confident Area are considered as outliers [5]. These outliers are then removed from their current clusters and assigned as new centroids. The newly formed centroids then may attract some items from its adjacent clusters and thereby form new clusters. The splitting of clusters will continue until no outlier exists. Splitting turns clusters into finer grains thereby increasing the number of clusters and making the items within the same cluster more similar to each other.Figure 2: Generation of Degeneracy [4]B. Y-Means Clustering Y-means is another clustering algorithmused for intrusion detection. This techniqueautomatically partitions a data set into a reasonablenumber of clusters so as to classify the data itemsinto normal and abnormal clusters. The mainadvantage of Y-Means clustering algorithm is that itovercomes the three shortcomings of K-meansalgorithm namely dependency on the initialcentroids, dependency on the number of clusters anddegeneracy. Y-means clustering eliminates thedrawback of empty clusters. The main difference Figure 3: Before splitting the clusters [4]between Y-Means and K- Means is that the numberof clusters (K) in Y-Means is a self- definedvariable instead of a user-defined constant. Ifthe value of K is too small, Y-Means increases thenumber of clusters by splitting clusters. On theother hand, if value of K is too large, it decreasesthe number of clusters by merging nearby clusters.Y-Means determines an appropriate value of Kby splitting and linking clusters evenwithout any knowledge of item distribution. Thismakes Y-means an efficient clustering technique forintrusion detection since the network log data israndomly distributed and the value of K is difficultto obtain manually. Y-means uses Euclideandistance to evaluate the similarity between twoitems in the data set. Y- Means clustering has 3main steps:i) Assigning items to K clusters: Figure 4: After splitting the clusters [4] 1271 | P a g e
  6. 6. Mitchell D’silva, Deepali Vora / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 1, January -February 2013, pp.1267-1275iii) Linking clusters:When two adjacent clusters have an overlap, theycan be merged into a larger cluster. The mergingthreshold is set to2.32 σ i.e. whenever there are any data items in onecluster‘s Confident Area that also lie in anothercluster‘s Confident Area, the two clusters can bemerged [5]. When two clusters are merged theircentroids are kept intact and no new centroid iscreated i.e. the new cluster has twocentroids. Theadvantage is that the clusters can be of arbitraryshapes suchas chain-like shapes. Figure 6: Flowchart of Y-Means Clustering C. Fuzzy C-Means Clustering Fuzzy C-Mean (FCM) is an unsupervised clustering algorithm based on fuzzy set theory that allows an element to belong to more than one cluster. The degree of membership of each dataFigure 5: Linking clusters [4] item to the cluster is calculated which decides the cluster to which that data item is supposed toY-Means Clustering Algorithm is as follows: belong. For each item, we have a coefficient thati) Define the number of clusters K. specifies the membership degree (uij) of being inii) Initialize the K cluster centroids. This is done by the kth cluster as follows:randomly selecting K items from the data set.iii) Compute the distance from each item to thecentroids of all the clusters by using the Euclideandistance metric which is used to find the similarity (2)between items in data set.iv) Assign each item to the cluster with the nearest where, dij – distance of ith item from jth cluster dikcentroid. In this way all the items will beassigned to different clusters such that each cluster – distance of ith item from kth cluster andhas all items with similar attributes. m – fuzzification factorv) After all the items have been assigned to differentclusters re-calculate the means of modified clusters. The existence of a data item in more thanThe newly calculated mean is assigned as the new one cluster depends on the fuzzification value ―m‖centroid. defined by the user in the range of [0, 1] whichvi) Check if there is degeneracy. If yes then remove determines the degree of fuzziness in the cluster.the empty clusters and goto step (vii). Thus, the items on the edge of a cluster may be invii) Find the outliers, if any, of the cluster and then the cluster to a lesser degree than the items in thesplit the clusters. center of the cluster. When m reaches the value ofviii) Check if there is degeneracy. If yes then 1 the algorithm works like a crisp partitioningremove the empty clusters and goto step (ix). algorithm and for larger values of m theix) Link the overlapping clusters if any. overlapping of clusters tends to be more [2]. Thex) Label the clusters as normal and abnormal. main objective of fuzzy clustering algorithm is to 1272 | P a g e
  7. 7. Mitchell D’silva, Deepali Vora / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 1, January -February 2013, pp.1267-1275partition the data into clusters so that the VIII. COMPARISON OF K-MEANS, Y-similarity of data items within each cluster is MEANS AND FUZZY C-MEANSmaximized and the similarity of data items in CLUSTERINGdifferent clusters is minimized. Moreover, it This section presents the comparison of themeasures the quality of partitioning that divides a three clustering techniques namely K-Means, Y-dataset into C clusters. FCM algorithm focuses Means and Fuzzy C-Means clustering. Theon minimizing the value of the following comparison is made by taking into account variousobjective function [6]: criteria like the performance, efficiency, detection rate, false positive rate, purity etc. Each technique has some good features to overcome the (3) drawbacks of the other technique and so on.where, m is any real number greater than 1 uij is the Table 1: Comparison of K-Means, Y-Means anddegree of membership of xi in the cluster j xi is the Fuzzy C- Means clustering techniquesith of d-dimensional measured data vj is the d- 2 Criteria K-Means Y-Means Fuzzy C-dimensional center of the cluster||*|| is any Meansnorm expressing the similarity between anymeasured data and the center. Input Number of Number of Number ofFuzzy C-Means Clustering Algorithm is as follows: clusters Kclusters K clusters c suchi) Randomly select ―c‖ cluster centers. such that such that that c<mii) Initialize the fuzzy membership matrix ‗uij‘ K<m K<musing: Set of data items Set of data Set of data (x1, x2,..,xm) items items (x1,x2,..,xm) (x1,x2,..,xm) Set of clusteriii) Calculate the fuzzy centers ‗vj‘ using: centers (v1,v2,..,vc)iv) Update the membership matrix i.e goto step(ii) Membership Does not Does not Membershipv) If ||u(k+1) - u(k) || < threshold then stop value exist exist value uijelse goto step (iii), where k is the iteration step. Output Set of K Set of non- Set of c clusters empty clusters where where each clusters each cluster has cluster haswhere each more similar similar items cluster has items similar items Computation Simple and Involves Involves the Time straight- splitting and calculation of forward solinking of several formulas requires lessclusters so so requires more time requires more time time Purity of Low High High clusterFigure 7: Flowchart of Fuzzy C-MeansClustering 1273 | P a g e
  8. 8. Mitchell D’silva, Deepali Vora / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 1, January -February 2013, pp.1267-1275Criteria K-Means Y-Means Fuzzy C- Criteria K-Means Y-Means Fuzzy C- Means MeansEmpty May or may No No Dis- Need to Does not Performancecluster not generate advantages determine an measure thedepends on thegeneration appropriate quality of initial number number of clusters of clustersEfficiency Works well Works well Works well clusters for small datafor small as for small as Time Time sets well as well as large Degeneracy consuming consuming large data sets data sets Does not work well with non-Number of One One One or more globularclusters an than one cluster clustersitem belongs Does not measure theOverall Depends on Does not Depends on quality ofPerformance the initialdepend on the the initial clusters number ofinitial number number of clusters ―K‖ of clusters ―K‖ clusters ―c‖Shape of Works well Works well Works wellcluster for compact for both for both and globularglobular and globular and IX. CONCLUSION clusters nonglobular nonglobular Data mining techniques are widely used because of their capability to drastically improve the clusters clusters performance and usability of intrusion detection systems. Different data mining techniques like classification, clustering and associationDetection Highest Higher High rule mining are very helpful in analyzing theRate network data. Since large amount of network traffic needs to be collected for intrusion detection,False Lowest Low Lower clustering is more suitable than classification in thePositive Rate domain of intrusion detection as it does not require labeled data set thereby reducing manual efforts.Advantages Simple and No Measures the Data mining techniques can detect known as well as fast Degeneracy quality of unknown attacks. Data mining technology helps to partitioning understand normal behavior inside the data and use Works well for Does not this knowledge for detecting unknown intrusions. compact anddepend on the Items can Three clustering algorithms namely K-means, Y- globular initial number belong to more means and Fuzzy C-means have been discussed. Clusters of clusters K than one cluster Each of these has both advantages and disadvantages and are an improvement over the Works well Works well for other. Among these Fuzzy C-Means clustering can with globular globular and be considered as an efficient algorithm for intrusion and nonglobular detection since it allows an item to belong to more nonglobular clusters than one cluster and also measures the quality of clusters partitioning. The technique can be used for large data sets as well as data sets that have overlapping items. Moreover it does not generate any empty clusters and has the highest purity thereby creating clusters that consists of highly similar items. The main advantage of Fuzzy C-Means clustering for intrusion detection is the high detection rate and lower false positive rate that it offers. Although Fuzzy C-Means is an efficient technique it is time consuming. The performance of intrusion detection 1274 | P a g e
  9. 9. Mitchell D’silva, Deepali Vora / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 1, January -February 2013, pp.1267-1275systems can be still improved by combining the Engineering Research and Applicationsfeatures of Fuzzy C- Means clustering technique (IJERA), pp.961-964, ISSN: 2248-9622with some other technique so that it reduces the Vol. 2, Issue 2, Mar-Aprtime required by Fuzzy C-Means for the clustering 2012.process and also increases the detection rate and [13] Gerhard Munz and Sa Li, Georg Carle,decreases the false positive rate thereby making the ―Traffic Anomaly Detection Using K-intrusion detection system more accurate and Means Clustering‖,effective. [14] Raymond Chi-Wing Wong and Ada Wai- Chee Fu, ―Association Rule Mining and itsREFERENCES Application to MPIS‖ [1] Ye Qing, Wu Xiaoping and Huang Gaofeng, ―An Intrusion Detection Approach based on Data Mining‖, 2nd International Conference on Future Computer and Communication, pp. no. 695 – 698, 2010 IEEE. [2] Disha Sharma, ―Fuzzy Clustering as an Intrusion Detection Technique‖, International Journal of Computer Science & Communication Networks, pp. no. 69 – 75, Vol 1 (1), September-October 2011. [3] Deepthy K Denatious and Anita John, ―Survey on Data Mining Techniques to Enhance Intrusion Detection‖, International Conference on Computer Communication and Informatics, Jan 2012. [4] Yu Guan, ―Y-Means: A Dynamic Clustering Algorithm‖. [5] Sivanadiyan Sabari Kannan, ―Y-means Clustering vs N-CP Clustering with canopies for Intrusion Detection‖. [6] Ming-Chuan Hung and Don-Lin Yang, ―An Efficient Fuzzy C-Means Clustering Algorithm‖, pg. no. 225-232, 2001 IEEE. [7] Yu Guan and Ali A. Ghorbani, Nabil Belacel, ―Y-Means: A Clustering Method for Intrusion Detection‖, CCECE 2003 CCGEI 2003, Montreal May 2003 IEEE. [8] E. Kesavulu Reddy, V. Naveen Reddy and P. Govinda Rajulu ―A Study of Intrusion Detection in Data Mining‖, Proceedings of the World Congress on Engineering, July 2011, Vol III, London, UK. [9] Reema Patel, Amit Thakkar and Amit Ganatra, ―A Survey and Comparative Analysis of Data Mining Techniques for Network Intrusion Detection Systems‖, International Journal of Soft Computing and Engineering (IJSCE), pp. no. 265 – 271, ISSN: 2231-2307, Volume- 2, Issue-1, March 2012. [10] Intrusion Detection Systems: Definition, Need and Challenges [11] Harley Kozushko, ―Intrusion Detection: Host-Based and Network-Based Intrusion Detection Systems‖, Independent Study, September 11, 2003. [12] Manish Joshi, ―Classification, Clustering and Intrusion Detection System‖, International Journal of 1275 | P a g e