Ms. R. S. Landge, Mr. A. P. Wadhe / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 3, Issue 3, May-Jun 2013, pp. 430-435

Review of Various Intrusion Detection Techniques based on Data Mining Approach

Ms. Radhika S. Landge, M.E (CSE), G.H. Raisoni College of Engineering & Management, Amravati
Mr. Avinash P. Wadhe, App M.E (CSE), G.H. Raisoni College of Engineering & Management, Amravati

Abstract

Over the past several years, the Internet environment has become more complex and untrusted. Enterprise networked systems are inevitably exposed to the increasing threats posed by hackers as well as by malicious users internal to a network. IDS technology is one of the important tools used nowadays to counter such threats. Various IDS techniques have been proposed, which identify such threats or attacks and raise alarms for them. IDS are an essential component of any network that is to be secured. Traditional IDS are unable to manage various newly arising attacks. To deal with these new problems of networks, data mining based IDS are opening new research avenues. Data mining provides a wide range of techniques to classify these attacks. The paper provides a study of the various data mining based intrusion detection techniques.

1. INTRODUCTION

The Internet has spread to every corner of the world, and computers everywhere are exposed to diverse intrusions from the World Wide Web. To protect computers from these unauthorized attacks, effective intrusion detection systems (IDS) need to be employed. Traditional instance based learning methods for intrusion detection can only detect known intrusions, since these methods classify instances based on what they have learned. They hardly detect intrusions that they have not learned before. Intrusion detection techniques are of two types, namely misuse detection and anomaly detection. Firewalls are used for intrusion detection, but they often fail to detect attacks that take place from within the organization.
To overcome this drawback of firewalls, different data mining techniques are used that handle intrusions occurring from within the organization. Data mining techniques have been successfully used for intrusion detection in different application areas like bioinformatics, stock market, web analysis etc. These methods extract previously unknown significant relationships and patterns from large databases. The extracted patterns are then used as a basis to identify new attacks. Data mining based IDS require less expert knowledge yet provide good performance and security. These systems are capable of detecting known as well as unknown attacks on the network. Different data mining techniques like classification, clustering and association rules can be used for analyzing the network traffic and thereby detecting intrusions. This paper gives a review of various data mining based techniques for intrusion detection as well as some proposed techniques and systems.

2. LITERATURE REVIEW

An intrusion detection system plays an important role in detecting malicious activities in computer systems. The following discusses the various terms related to intrusion detection.

Intrusion is a type of malicious activity that tries to deny the security aspects of a computer system. It is defined as any set of actions that attempts to compromise the integrity, confidentiality or availability of any resource.

i) Data integrity: It ensures that the data being transmitted by the sender is not altered during its transmission until it reaches the intended receiver. It maintains and assures the accuracy and consistency of the data from its transmission to its reception.
ii) Data confidentiality: It ensures that the data being transmitted through the network is accessible only to those receivers who are authorized to receive the respective data.
It assures that the data has not been read by unauthorized users.
iii) Data availability: The network or a system resource ensures that the required data is accessible and usable by the authorized system users upon demand or whenever they need it.

Intrusion detection is the process of monitoring and analyzing the events occurring in a computer system in order to detect malicious activities taking place through the network. ID is an area growing in significance as more and more sensitive data are stored and processed in networked systems.

An intrusion detection system is a combination of hardware and software that detects intrusions in the network. An IDS monitors all the events taking place in the network by gathering and analyzing information from various areas within the network. It identifies possible security breaches, which include attacks from within and outside the organization, and hence can detect the signs of intrusions. The main objective of IDS is to alarm the system
administrator whenever any suspicious activity is detected in the network. In general, an IDS makes two assumptions about the data set used as input for intrusion detection:
i) The amount of normal data exceeds the abnormal or attack data quantitatively.
ii) The attack data differs from the normal data qualitatively.

2.1. Major Types of Attacks

Most intrusions occur via the network, using network protocols to attack their target systems. These kinds of connections are labeled as abnormal connections and the remaining connections as normal connections. Generally, there are four categories of attacks as follows:

A. DoS – Denial of Service: The attacker tries to prevent legitimate users from accessing the service on the target machine. For example: ping-of-death, SYN flood etc.
B. Probe – Surveillance and probing: The attacker examines a network to discover well-known vulnerabilities of the target machine. These network investigations are reasonably valuable for an attacker who is planning an attack in future. For example: port-scan, ping-sweep, etc.
C. R2L – Remote to Local: Unauthorized attackers gain local access to the target machine from a remote machine and then exploit the target machine's vulnerabilities. For example: password guessing etc.
D. U2R – User to Root: The target machine is already attacked, but the attacker attempts to gain access with super-user privileges. For example: buffer overflow attacks etc.

3. METHODOLOGY

3.1. Techniques for Intrusion Detection

Each malicious activity or attack has a specific pattern. The patterns of only some of the attacks are known, whereas the other attacks only show some deviation from the normal patterns. Therefore, the techniques used for detecting intrusions are based on whether the patterns of the attacks are known or unknown. The two main techniques used are:

A. 
Anomaly Detection: It is based on the assumption that intrusions always reflect some deviations from normal patterns. The normal state of the network, traffic load, breakdown, protocol and packet size are defined by the system administrator in advance. Thus, the anomaly detector compares the current state of the network to the normal behavior and looks for malicious behavior. It can detect both known and unknown attacks.

B. Misuse Detection: It is based on the knowledge of known patterns of previous attacks and system vulnerabilities. Misuse detection continuously compares current activity to known intrusion patterns to ensure that no attacker is attempting to exploit known vulnerabilities. To accomplish this task, each intrusion pattern must be described in detail. It cannot detect unknown attacks.

3.2. Advantages and Disadvantages of Anomaly Detection and Misuse Detection

The main disadvantage of misuse detection approaches is that they detect only the attacks they are trained to detect. Novel or unknown attacks, or even variants of common attacks, often go undetected. The main advantage of anomaly detection approaches is the ability to detect novel or unknown attacks against software systems, variants of known attacks, and deviations from normal usage of programs, regardless of whether the source is a privileged internal user or an unauthorized external user. The disadvantage of the anomaly detection approach is that well-known attacks may not be detected, particularly if they fit the established profile of the user. Once an attack is detected, it is often difficult to characterize its nature for forensic purposes.
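The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature set, the use of packet size as the monitored feature, and the three-standard-deviation threshold are hypothetical choices made for the example, not taken from any of the surveyed systems.

```python
from statistics import mean, stdev

# Hypothetical signature list for the misuse detector (illustrative only).
KNOWN_SIGNATURES = {"ping-of-death", "syn-flood", "port-scan"}


def misuse_detect(event_label):
    """Misuse detection: flag an event only if it matches a known pattern."""
    return event_label in KNOWN_SIGNATURES


def build_profile(normal_packet_sizes):
    """Summarise normal behaviour as (mean, standard deviation)."""
    return mean(normal_packet_sizes), stdev(normal_packet_sizes)


def anomaly_detect(packet_size, profile, k=3.0):
    """Anomaly detection: flag a value lying more than k standard
    deviations away from the normal profile (k is an assumed threshold)."""
    mu, sigma = profile
    return abs(packet_size - mu) > k * sigma
```

With a profile built from normal packet sizes such as `[500, 520, 480, 510, 490]`, an oversized packet is flagged by `anomaly_detect` even though no signature exists for it, while `misuse_detect` returns `False` for any label outside its signature set. Widening or narrowing `k` moves the detector between missed attacks and false alarms.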
Finally, a high false positive rate may result for a narrowly trained detection algorithm or, conversely, a high false negative rate may result for a broadly trained anomaly detection approach.

3.3. Need of Data Mining in Intrusion Detection

Data mining refers to the process of extracting hidden, previously unknown and useful information from large databases. It is a convenient way of extracting patterns and focuses on issues relating to their feasibility, utility, efficiency and scalability. Thus data mining techniques help to detect patterns in the data set and use these patterns to detect future intrusions in similar data. The following are a few specific things that make the use of data mining important in an intrusion detection system:
i) Manages firewall rules for anomaly detection.
ii) Analyzes large volumes of network data.
iii) The same data mining tool can be applied to different data sources.
iv) Performs data summarization and visualization.
v) Differentiates data that can be used for deviation analysis.
vi) Clusters the data into groups such that they possess high intra-class similarity and low inter-class similarity.

3.4. Data Mining Techniques for Intrusion Detection Systems

Data mining techniques play an important role in intrusion detection systems. Different data mining techniques like classification, clustering and association rule mining are used frequently to acquire information about intrusions by observing and analyzing the network data. The following describes the different data mining techniques:
A. Classification: It is a supervised learning technique. A classification based IDS will classify all the network traffic as either normal or malicious. The classification technique is mostly used for anomaly detection. The classification process is as follows:
i) It accepts a collection of items as input.
ii) It maps the items into predefined groups or classes defined by some attributes.
iii) After mapping, it outputs a classifier that can accurately predict the class to which a new item belongs.

B. Association Rule: This technique searches for frequently occurring item sets in a large dataset. Association rule mining determines association rules and/or correlation relationships among a large set of data items. The mining process of association rules can be divided into two steps as follows:
i) Frequent item set generation: generates all sets of items whose support is greater than the specified threshold, called minsupport.
ii) Association rule generation: from the previously generated frequent item sets, it generates the association rules in the form of "if-then" statements that have confidence greater than the specified threshold, called minconfidence.

The basic steps for incorporating association rules for intrusion detection are as follows:
i) The network data is arranged into a database table where each row represents an audit record and each column is a field of the audit records.
ii) The intrusions and user activities show frequent correlations among the network data. Consistent behaviors in the network data can be captured in association rules.
iii) The rules from a new run can be merged continuously into the aggregate rule set of all previous runs.
iv) Thus, with association rules, we get the capability to capture behavior for correctly detecting intrusions and hence lowering the false alarm rate.

C. 
Clustering: It is an unsupervised machine learning mechanism for discovering patterns in unlabeled data. It is used to label data and assign it to clusters, where each cluster consists of members that are quite similar. Members of different clusters are different from each other. Hence clustering methods can be useful for classifying network data for detecting intrusions. Clustering can be applied to both anomaly detection and misuse detection. The basic steps involved in identifying intrusions are as follows:
i) Find the largest cluster, which consists of the maximum number of instances, and label it as normal.
ii) Sort the remaining clusters in ascending order of their distances to the largest cluster.
iii) Select the first K1 clusters so that the number of data instances in these clusters sums up to ηN, and label them as normal, where η is the percentage of normal instances and N is the total number of instances.
iv) Label all other clusters as malicious.
v) After clustering, heuristics are used to automatically label each cluster as either normal or malicious. The self-labeled clusters are then used to detect attacks in a separate test dataset.

Of the three data mining techniques discussed above, clustering is widely used for intrusion detection because of the following advantages over the other techniques:
i) It does not require the use of a labeled data set for training.
ii) No manual classification of training data needs to be done.
iii) The system need not be aware of new types of intrusions in order to be able to detect them.

3.5. Where to do Intrusion Detection

Depending on the monitored system, the source of input information can be a host, a network, or both host and network. Thus IDS is further classified into three categories as follows:

i) Network-based intrusion detection system (NIDS)
It is an independent platform that identifies intrusions by examining network traffic and monitors multiple hosts.
Network intrusion detection systems gain access to network traffic by connecting to a network hub, a network switch configured for port mirroring, or a network tap.

ii) Host-based intrusion detection system (HIDS)
It consists of an agent on a host that identifies intrusions by analyzing system calls, application logs, file-system modifications (binaries, password files, capability databases, access control lists, etc.) and other host activities and state. In a HIDS, sensors usually consist of a software agent.

iii) Hybrid intrusion detection system (Hybrid IDS)
It complements the HIDS with the ability to monitor the network traffic for a specific host; it is different from the NIDS, which monitors all network traffic. In computer security, a Network Intrusion Detection System (NIDS) is an intrusion detection system that attempts to discover unauthorized access to a computer network by analysing traffic on the network for signs of malicious activity.

3.6. New techniques introduced for IDS based on data mining

3.6.1 Multi Agent Based Approach For Network Intrusion Detection

A multi-agent based approach is used for network intrusion detection, built on an adaptive NIDS. Here a larger number of agents is used, which continuously monitor the data to check for any intruder that might have entered the system. Each agent is trained accordingly so that it can check for any type of intruder entering the system. There are five types of agent based on three data mining techniques, which are clustering, association rules and sequential association rules approaches. The problem is that current NIDS are tuned specifically to detect known service level network attacks. Attempts to expand beyond this
limited realm typically result in an unacceptable level of false positives. At the same time, enough data exists or could be collected to allow network administrators to detect these policy violations. Unfortunately, the data is so voluminous, and the analysis process so time consuming, that the administrators don't have the resources to go through it all and find the relevant knowledge, save for the most exceptional situations, such as after the organization has taken a large loss and the analysis is done as part of a legal investigation. In other words, network administrators don't have the resources to proactively analyze the data for policy violations, especially in the presence of a high number of false positives that cause them to waste their limited resources. An adaptive NIDS based on data mining techniques is proposed. However, unlike most current research, in which only one engine is used for the detection of various attacks, the system is built from multiple agents, which are entirely different in both their training and detection processes. After training with normal traffic for a network behavior, when a new type of attack comes, the system can detect such an anomaly by distinguishing it from normal traffic.

3.6.2 Intrusion detection using fuzzy logic and data mining

The method extracts fuzzy classification rules from numerical data, applying a heuristic learning procedure. The learning procedure initially classifies the input space into non-overlapping activation rectangles corresponding to different output intervals. There are no overlapping or inhibition areas. However, the disadvantage listed is the high false positive rate, which is a primary weakness of all IDS.
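To give a feel for what a fuzzy classification rule over numerical data looks like, the following Python sketch evaluates a single rule. It is a minimal illustration and not the heuristic learning procedure described above: the feature names (`conn_per_sec`, `distinct_ports`), the ramp-shaped membership function, the breakpoints and the use of the minimum as fuzzy AND are all assumptions chosen for the example.

```python
def ramp_up(x, low, high):
    """Membership degree rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def rule_port_scan(conn_per_sec, distinct_ports):
    """IF connection rate is HIGH AND distinct ports is HIGH
    THEN the traffic is suspicious.

    Fuzzy AND is taken as the minimum of the two membership degrees;
    the breakpoints below are hypothetical.
    """
    rate_high = ramp_up(conn_per_sec, 10, 100)
    ports_high = ramp_up(distinct_ports, 5, 50)
    return min(rate_high, ports_high)
```

The rule returns a degree between 0 and 1 instead of a hard yes/no answer, so a threshold on that degree decides when an alarm is raised. Setting the threshold low catches more attacks but raises the false positive rate.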
Researchers describe approaches to address three types of issues: accuracy, efficiency, and usability. The first issue, improving accuracy, is addressed by using data mining programs to analyze audit data and extract features that can distinguish normal activities from intrusions. The second issue, efficiency, is improved by analyzing the computational costs of features, and a multiple-model cost-based approach is used to produce detection models with low cost and high accuracy. The third issue, improved usability, is addressed by using adaptive learning algorithms to facilitate model construction and incremental updates; unsupervised anomaly detection algorithms are used to reduce the reliance on labelled data.

Researchers developed the Fuzzy Intrusion Recognition Engine (FIRE) using fuzzy sets and fuzzy rules. FIRE uses simple data mining techniques to process the network input data and generate fuzzy sets for every observed feature. The fuzzy sets are then used to define fuzzy rules to detect individual attacks. FIRE does not establish any sort of model representing the current state of the system, but instead relies on attack specific rules for detection: it creates and applies fuzzy logic rules to the audit data to classify it as normal or anomalous. Dickerson et al. found that the approach is particularly effective against port scans and probes. The primary disadvantage of this approach is the labour intensive rule generation process. The research work shown in Figure 3.5 can be considered an extension of the above work that automates the rule generation process.

Figure 3.5 An ID model using neural networks and fuzzy logic

The model combines neural networks and fuzzy logic. This system works by mapping a template graph and a user action graph to determine patterns of misuse. The output of this mapping process is used by the central strategic engine to determine whether an intrusion has taken place or not.
The major drawback is that rules for new types of attack need to be given by an external security officer, i.e. it does not automate the rule generation process, and the larger number of components prevents it from working fast.

3.6.3 Data Mining And Real Time IDSs

Even though offline processing has a number of significant advantages, data mining techniques can also be used to enhance IDSs in real time. Lee et al. were among the first to address the important and challenging issues of accuracy, efficiency, and usability of real-time IDSs. They implemented feature extraction and construction algorithms for labeled audit data, using measures such as entropy, conditional entropy, relative entropy, information gain, and information cost to capture intrinsic characteristics of normal data and to guide the process of building and evaluating anomaly detection models. A serious limitation of their approaches (as well as of most existing IDSs) is that they only do intrusion detection at the network or system level. However, with the rapid growth of e-Commerce and e-Government applications, there is an urgent need to do intrusion detection at the application level.
This is because many attacks may focus on applications that have no effect on the underlying network or system activities.

4. SURVEY OF APPLIED TECHNIQUES

In this section we present a survey of data mining techniques that have been applied to IDSs by various research groups.

4.1. Machine Learning

Machine learning is the study of computer algorithms that improve automatically through experience. Applications range from data mining programs that discover general rules in large data sets to information filtering systems that automatically learn users' interests. In contrast to statistical techniques, machine learning techniques are well suited to [...] Clustering and classification are probably the two most popular machine learning problems. Techniques that address both of these problems have been applied to IDSs.

4.1.1 Classification Techniques

In a classification task in machine learning, the task is to take each instance of a dataset and assign it to a particular class. A classification based IDS attempts to classify all traffic as either normal or malicious. The challenge in this is to minimize the number of false positives (classification of normal traffic as malicious) and false negatives (classification of malicious traffic as normal). Five general categories of techniques have been tried to perform classification for intrusion detection purposes:

a) Neural Networks: The application of neural networks for IDSs has been investigated by a number of researchers. Neural networks provide a solution to the problem of modeling the users' behavior in anomaly detection because they do not require any explicit user model. Neural networks for intrusion detection were first introduced as an alternative to statistical techniques in the IDES intrusion detection expert system to model [...].
The researcher McHugh has pointed out that advanced research issues on IDSs should involve the use of pattern recognition and learning by example approaches, for one reason: the capability of learning by example allows the system to detect new types of intrusion. A different approach to anomaly detection based on neural networks is proposed by Lee et al. While previous works have addressed the anomaly detection problem by analyzing the audit records produced by the operating system, in this approach anomalies are detected by looking at the usage of network protocols.

b) Fuzzy Logic: Fuzzy logic is derived from fuzzy set theory, dealing with reasoning that is approximate rather than precisely deduced from classical predicate logic. An enhancement of the fuzzy data mining approach has also been applied by Florez et al. The authors use fuzzy data mining techniques to extract patterns that represent normal behavior for intrusion detection. Luo also attempted classification of the data using fuzzy logic rules.

c) Genetic Algorithm: Genetic algorithms were originally introduced in the field of computational biology. Since then, they have been applied in various fields with promising results. Fairly recently, researchers have tried to integrate these algorithms with IDSs.

d) Support Vector Machine: Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. SVMs attempt to separate data into multiple classes. Mukkamala, Sung, et al. used a more conventional SVM approach. They used five SVMs, one to identify normal traffic and one to identify each of the four types of malicious activity in the KDD Cup dataset. Eskin et al. and Honig et al. used an SVM in addition to their clustering methods for unsupervised learning.
The achieved performance was comparable to or better than both of their clustering methods.

4.1.2 Clustering Techniques

Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Clustering is the classification of similar objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Machine learning typically regards data clustering as a form of unsupervised learning. Clustering is useful in intrusion detection because malicious activity should cluster together, separating itself from non-malicious activity. Clustering provides some significant advantages over the classification techniques already discussed, in that it does not require the use of a labeled data set for training.

4.2 Existing Systems

In this section, we present some of the implemented systems that apply data mining techniques in the field of intrusion detection.

a) The MINDS System: The Minnesota Intrusion Detection System (MINDS) uses data mining techniques to automatically detect attacks against computer networks and systems. While the long-term objective of MINDS is to address all aspects of
intrusion detection, the system currently focuses on two specific issues:

b) EMERALD (SRI): EMERALD is a software-based solution that utilizes lightweight sensors distributed over a network or series of networks for real-time detection of anomalous or suspicious activity. EMERALD sensors monitor activity both on host servers and in network traffic streams. By using highly distributed surveillance and response monitors, EMERALD provides a wide range of information security coverage, real-time monitoring and response, and protection of informational assets.

c) IDSs in the Open Market: Various systems that employ data mining techniques have already been released as parts of commercial security packages.

5. CONCLUSION

The application of data mining in intrusion detection systems has been an emerging trend in recent years. Data mining techniques can extract characteristics of sample data, which reduces the difficulties involved in the collection of training data and thereby achieves active defence for the intrusion detection system. The traditional intrusion detection system cannot do all of these. It is necessary to describe this indeterminacy because the data of network traffic and host audit, and the detection process of the intrusion detection system, are indeterminable. This paper describes the distinction of attack degree for the above reason. Data mining plays a major role in a wide variety of application areas. The sequence representation of data in network traffic is uncertain. There are limitations in the application of intrusion detection technology. The flexibility of the system is not good enough to analyze the huge amount of data based upon the proposed method. There is still scope for research in this area.

REFERENCES

Mrs. Sneha Kumari, Dr. Maneesh Shrivastava, "A Study Paper on IDS Attack Classification Using Various Data Mining Techniques", International Journal of Advanced Computer Research, Volume-2, Number-3, Issue-5, September 2012.
Mitchell D'silva, Deepali Vora, "Comparative Study of Data Mining Techniques to Enhance Intrusion Detection", International Journal of Engineering Research and Applications (IJERA), Vol. 3, Issue 1, January-February 2013.
S.A. Joshi, Varsha S. Pimprale, "Network Intrusion Detection System (NIDS) based on Data Mining", International Journal of Engineering Science and Innovative Technology (IJESIT), Volume 2, Issue 1, January 2013.
Reema Patel, Amit Thakkar, Amit Ganatra, "A Survey and Comparative Analysis of Data Mining Techniques for Network Intrusion Detection Systems", International Journal of Soft Computing and Engineering (IJSCE), Volume-2, Issue-1, March 2012.
Ankita Agarwal, "Multi Agent Based Approach For Network Intrusion Detection Using Data Mining Concept", Journal of Global Research in Computer Science, 3 (3), March 2012.
Miss. Prajkta P. Chapke, Prof. A.B. Raut, "Intrusion Detection System using Fuzzy logic and Data Mining Technique", International Journal of Advanced Research in Computer Science and Software Engineering, 2 (12), December 2012.
Monali Shetty, Prof. N.M. Shekokar, "Data Mining Techniques for Real Time Intrusion Detection Systems", International Journal of Scientific & Engineering Research, Volume 3, Issue 4, April 2012.
Alok Ranjan, Dr. Ravindra S. Hegadi, Prasanna Kumara, "Emerging Trends in Data Mining for Intrusion Detection", International Journal of Advanced Research in Computer Science, Volume 3, No. 2, March-April 2012.

Ms. Radhika S. Landge received her B.E in Computer Science & Engineering from Sant Gadgebaba Amravati University (SGBAU) and is pursuing an M.E (CSE) at G.H. Raisoni College of Engineering & Management, Amravati. Her research interests include network security and data mining.

Prof. Avinash P. Wadhe received the B.E from SGBAU Amravati University and the M-Tech (CSE) from G.H. Raisoni College of Engineering, Nagpur (an Autonomous Institute). He is currently an Assistant Professor with the G.H. Raisoni College of Engineering and Management, Amravati, SGBAU Amravati University. His research interests include network security, data mining and fuzzy systems. He has contributed to more than 20 research papers. He has been awarded a young investigator award at an international conference.