Published on

IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide


  1. 1. Indr Jeet Rajput, Deshdeepak Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue4, July-August 2012, pp.1752-1755 Data Mining Based Database Intrusion Detection System: A Survey *Indr Jeet Rajput, **Deshdeepak Shrivastava *Computer Engineering Department, Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat India-395007 ** ITM Universe, Gwalior, IndiaAbstract Significant security problem for 2. CURRENT USED TECHNIQUESdatabase system is unfriendly encroach a user or Traditional method for intrusion detectionsoftware. Intruder is one of the most publicized based on signature based method. For this extensivethreats to security. Database is a most valuable knowledge of signature of previously known attacksasset of any organization or companies. This is necessary. In this monitors events are matched withpaper presents the feature of data mining based the signature to detect intrusion. The feature is extractdatabase intrusion detection system. In addition from various audit database and then comparing thesethe paper gives general guidance for open feature with to a set of attack signature provide byresearch area and future direction. The objective human expert for intrusion detection. The signatureof this survey is to give the researcher a broad database has to be manually updated for each newoverview of the work that has been done at the type of intrusion that is detected.collaboration between intrusion detection and The important boundaries with suchdata mining. method are that they can not detect novel attack that has a no previous signature. Another issue is that itKeywords: Data mining, Clustering, Intrusion need extensive training data for the associatedDetection system, Anomaly Detection, Classification, technique and it may be possible they producing lotsAssociation. of false alarm. These all problem guide to an increasing interest in intrusion detection technique1. INTRODUCTION that based on data mining Data mining has attracted a lot of attention Intrusion detection technique can bedue to increased, generation, transmission and classified into Anomaly and misuse detection. Thestorage of hugs volume data and an imminent need anomaly detection takes its decision on profile of afor extracting useful information and knowledge user normal behavior. It analyzed user’s currentfrom them. In recent year’s research have started session and compares it with the profile representinglooking into the possibility of using data mining his normal behavior. An alarm raised if significanttechniques in the emerging field of computer security deviation is found during the comparison of sessionespecially in the challenging problem of intrusion data and user profile. This type of system is useful fordetection. Intrusion is commonly defined as a set of detection of previously unknown attacks. In contrast,actions that attempt to violate the integrity, misuse detection model takes decision based onconfidentiality or availability of a system. Intrusion comparison of user’s session or commands with thedetection is the process of finding important events rules or signature of attach previously used byoccurring in a computer system and analyzing them attacks. The main advantage of misuse detection isfor possible presence of intrusion. Intrusion detection that it can accurately and efficiently detectis a second line of defense, when all the prevention occurrence of known attacks.technique is compromised and an intrusion haspotentially entered into the system. In general, that 3. DATA MINING TECHNIQUEare two types of attacks: (i) Inside attack are the ones 3.1 Misuse detection or Signature based: Inin which an intruder has all the privilege to access the signature based approach a signature of known attackapplication or the system, but it perform malicious is generated. The generated attack signature has beenactions. (ii) Outside attack are the ones in which the kept in for intrusion detection. A signature is aintruder does not have proper rights to access the feature of an intruder. This approach detects onlysystem. Detecting inside attack is usually more known attacks. The problem with this approach isdifficult compare to outside attack. that it is not capable to detect new attack introduced by intruder that has a no signature in database. In signature based false negative alarm rate increase. Chung et al. [1] present DEMIDS, misuse detection 1752 | P a g e
  2. 2. Indr Jeet Rajput, Deshdeepak Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue4, July-August 2012, pp.1752-1755system for relational database systems. This method can be used as a training database. It is generatingassumes that the legitimate users show some level of profile of either user level, role level, or organizationconsistency in using the database system. If this level for intrusion detection. The database schemaassumption does not hold, it results in a large number and the purpose to achieve a meaningful task wouldof false positives. Lee et al. [2] designed a signature- make the user follow a particular sequence ofbased database intrusion detection system (DIDS) database operations. Therefore, sequence can be awhich detects intrusions by matching new SQL effective way of representing user profile. When astatements against a known set of transaction new transaction come it generate the profile usingfingerprints. However, generating the complete set of same granularity that can be used for trainingfingerprints for all transactions and maintaining its database development, if any deviation found inconsistency is a rigorous activity. Moreover, if any of newly generated profile than that transaction to bethe legitimate transaction fingerprints are missing, it consider as a malicious transaction. The newlycan cause many false alarms. The main problem with generated sequence compared with the sequencesthis approach is that it is difficult to ensure that the stored for normal profile for identifying maliciousfingerprints thus learned are indeed precise and transaction. It uses active window of a size of equalcomplete. to the sequence to be taken for comparison and Alarm threshold which decided the sequence is valid3.2 Anomaly or Profile based: In profile based or malicious. The active window and alarm thresholdintrusion detection approach, a profile of normal user are the parameter to decide the efficiency of theis used for intrusion detection. This approach is concept. The problem with this concept is how tosuitable for finding unknown attack in database. The decide the value of this parameter.profile of normal user is stored in database forintrusion detection. The problem with this approach 3.3 Association rule or dependency mining:is it requires more training data set. In this approach Association refers to the correlation between items infalse negative alarm increased. Another problem is a transaction. This approach work on datathat significant time and effort is required for dependency, in which one item is modify anothertraining. Zhong et al. [3] use query templates to mine item refer with this also modify. Hu et al [6]user profiles. They developed an elementary determine dependency among data items where datatransaction-level user profiles. A constrained query dependency refers to the access correlations amongtemplate is a four tuple <OP,F, T, C> where OP is the data items. These data dependencies are generated intype of the SQL query, F is the set of attributes, T is the form of classification rules, i.e., before one datathe set of tables, and C is the constrained condition item is updated in the database, which other dataset. There is, however, no provision for handling items probably need to be read and after this datavarious levels of granularity of access in their query item is updated, which other data items are mosttemplate. Bertino et al. [4] proposed a database IDS likely to be updated by the same transactions.that has similarity with role-based access control Transactions that do not follow any of the mined data(RBAC) model in profile granularity. They use c- dependency rules are marked as malicioustuple (represented by query type, number of tables transactions. The problem with this concept is thataccessed, number of attributes accessed), m-tuple they consider only those attribute that appeared more(query type, accessed table name, number of frequently either they are sensitive or not. They treatattributes accessed from individual table), and f-tuple all the attributes at the same level and of equal(querytype, accessed table name, accessed attribute importance, which is not always the case in realname) to represent a query. They build role profiles applications. In this approach there is no concept forusing a classifier which is then used to detect attribute sensitivity. Some attribute may be accessedanomalous behavior. The approach Presented in [13] less frequently but their modification made a moreis query based which does not detect transaction level inconsistency in database. Wang et al [7] havedependency resulting some of the database attacks proposed a weighted association rule miningmay undetected. It can easily detect the attributes technique in which they assign numerical weights towhich are to be referred together, but it cannot detect each item to reflect interest/intensity of the itemthe queries which are to be executed together. These within the transaction It is calling as weightedproblems resume by Rao et al. [11], this approach association rule (WAR).WAR not only improves theextracts the correlation among queries of the confidence of the rules, but also provide a mechanismtransaction. In this approach database log is read to to do more effective target marketing by identifyingextract the list of tables accessed by transaction and or segmenting customers based on their potentiallist of attributes read and written by transaction. degree of loyalty or volume of purchases They firstKundu et al. [5], who propose an IDS that uses inter- ignore the weight and find the frequent itemsets fromtransactional as well as intra-transactional features the unweighted data and then introduce weight duringfor intrusion detection. It supports selection of profile rule generation. Problem with this concept is thatand transactional feature granularity as well. In this they consider weight of attribute only rule generation.approach a profile of normal user is generated, that The rule is generated by using frequent itemset, if 1753 | P a g e
  3. 3. Indr Jeet Rajput, Deshdeepak Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue4, July-August 2012, pp.1752-1755some attribute access less frequently they can’t accessed infrequently may not be captured at all inconsider in frequent itemsets, that is no rule for such the dependency rules. Use of weighted data miningattribute. If this attribute is more sensitive then algorithm reduces the problem to some extent butmodification made by some intruder they can’t be cannot obliterate it. The major drawback of thisdetected. Tao et al [8] use weighted support for detection method is that the weights of attributesdiscovering the significant itemsets during the must be assigned manually.frequent itemset finding phase. In this paper addressthe issues of discovering significant binary 3.4 Clustering Based: Cluster analysis divide datarelationships in transaction datasets in a weighted into meaningful or useful clusters in such a way thatsetting Traditional model of association rule mining intra-cluster similarity maximized while inter-clusteris adapted to handle weighted association rule mining similarity minimized It is a unsupervised learningproblems where each item is allowed to have a technique. It is basically four types: (i) Partitioningweight In order to tackle this challenge, we made Algorithm: in which two approach k-means and k-adaptation on the traditional association rule mining medoid. The k-means work for numerical data set.model under the “significant – weighted support” The main problem with k-means approach first ismetric framework instead of the “large – support” how to decide k values and second is determining theframework used in previous works. In this new optimal number of clusters [12]. The k-medoid worksproposed model, the iterative generation and pruning for categorical data sets. (ii)Hierarchical Algorithm:of significant itemsets is justified by a “weighted In hierarchical clustering data are not gets clustereddownward closure property”. Srivastava et al [9] at ones instead stepwise procedures are followed forhave recently proposed the use of weighted clustering the datasets. It resolve the problem of howassociation rule mining for speeding up web access to decide k value, but it has a disadvantage to specifyby prefetching the URLs. These pages may be kept in a termination condition.(iii) Density Baseda server’s cache to speed up web access. Existing Algorithm: it is based on a simple assumption thattechniques of selecting pages to be cached do not clusters are dense regions in the data space that arecapture a user’s surfing patterns correctly. It use a separated by regions of lower density. Their generalWeighted Association Rule (WAR) mining technique idea is to continue growing the given cluster as longthat finds pages of the user’s current interest and as the density in the neighborhood exceeds somecache them to give faster net access. This approach threshold. M. Ester et al [13] DBSCAN, main twocaptures both user’s habit and interest as compared to advantages: first, it is not sensitive to the order ofother approaches where emphasis is only on habit. data input; second, it can find clusters of arbitraryData mining techniques can be used to mine these shape in spatial database with noise. The problemlogs and extract association rules between the URLs with DBSCAN is that it not consider many smallrequested by the users. The association rules will be clusters that generated after clustering that containof the form X → Y where X and Y are URLs. It number of normal record, that result in high falsemeans if a user accesses URL X then he would be alarm rate. (iv) Grid-based Algorithm: It is segmentsaccessing URL Y most likely. A. Srivastava et al [10] the data space into pieces like grid-cells. Grid-basedDatabase intrusion detection system is design using clustering algorithms treat these grid-cells as datavarious approach here with we explain using data points, so grid-based clustering is moremining techniques. In this work, we have identified computationally efficient than other forms ofsome of the limitations of the existing intrusion clustering. Typical examples are STING [14].detection systems in general, and their incapability intreating database attributes at different levels of 4. SOME COMMAN IDS RESEARCHsensitivity in particular. In every database, some of ISSUEthe attributes are considered more sensitive to We have identified a number of researchmalicious modifications compared to others. Here issues in the intrusion detection area. We can seewith we explain an algorithm for finding here, developing intrusion detection system is notdependencies among important data items in a only about finding suitable detection algorithm butrelational database management system. Any also deciding about what data to collect, how to adapttransaction that does not follow these dependency the IDS to the resources of the target system, how torules are identified as malicious. The importance of test the IDS, and so on. Some of this issue is: (i)this approach is it minimizes the number of false Foundation. (ii) Data collection. (iii) Detectionpositive alarm. This approach generates more rules as Methods.(iv) Response. (v) Social Aspects. (vi)compared to non-weighted approach. So there is a Operational Aspects. (vii) Testing and Evaluationneed for a mechanism to find out which of the new (vii) IDS environment and Architecture. (ix) IDSrules are useful for detecting malicious transactions. security.Such a mechanism helps in discarding redundantrules. However, the main problem with attributedependency mining is the identification of propersupport and confidence values. The attributes that are 1754 | P a g e
  4. 4. Indr Jeet Rajput, Deshdeepak Shrivastava / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 2, Issue4, July-August 2012, pp.1752-1755CONCLUSION [9] A. Srivastava, A. Bhosale, S. Sural, Here we have explained various data mining “Speeding up Web Access Using Weightedapproach for database intrusion detection. A Association Rules”, Lecture Notes insignature based approach suitable when pattern of Computer Science, Springer Verlag,attack is known. It is applicable for known attack. Proceedings of International Conference onAssociation rule mining approach in which a Pattern Recognition and Machinedependency rules to be used for intrusion detection. Intelligence (PReMI05), pp. 660-665Here we have also discussed some future issue for (2005).intrusion detection system development. [10] Srivastava, A, Sural S., and Majumdar, AK.: “Database Intrusion Detection UsingACKNOWLEDGMENT Weighted Sequence Mining”, Journal of I will thank my teacher assistant professor Computers, vol. 1, no. 4 (2006)Udai Pratap Rao and Dr. Dhiren R. Patel here for [11] Rao, U.P., sahani, G.J., Patel, D.R.,fervent help when I have some troubles in paper “Detection of Malicious Activity in Rolewriting. I will also thank my class mates in laboratory Based Access Control (RBAC) Enabledfor their concern and support both in study and life. Databases”. In Proceeding of Journal of Information Assurance and Security 5REFERENCES (2010) 611-617. [1] C. Y. Chung, M. Gertz and K. Levitt, [12] Witcha Chimphlee, et. al. “Un-supervised “DEMIDS: A Misuse Detection System for clustering methods for identifying rare Database Systems”, In Proceedings of the events in anomaly detection”. In Proc. of Integrity and Internal Control in Information world academy of science engg. And Tech System, Pages 159-178, 1999. ( PWASET), Vol.8, Oct2005, pp. 253-258. [2] S.Y. Lee, W. L. Low and P. Y. Wong, [13] M. Ester, H.-P Kriegel, J. Sander and X. Xu, “Learning Fingerprints for a Database “A density-based algorithm for discovering Intrusion Detection System”, In Proceedings clusters in large spatial database with noise“, of the 7th European Symposium on Research in Proc. of KDD-96,pp.226-231,1996. in Computer Security, Pages 264-280, 2002. [14]. W. Wang, J. Yang, and R. Muntz, STING: [3]. Zhong, Y., Qin, X.: Database Intrusion “A Statistical Information Grid Approach to Detection Based on User Query Frequent Spatial Data Mining”, proceedings of 23rd Itemsets Mining with Item Constraints. In: International Conference on Very Large Proceeding of the 3rd international Data Bases, pp. 186-195, 1997. conference on information security, pp. 224–225 (2004) [4]. Bertino, E., Terzi, E., Kamra, A., Vakali, A: “Intrusion Detection in RBAC-Administered Databases”. In: Proceedings of the 21st annual computer security applications conference (ACSAC), pp. 170–182 (2005) [5] A. Kundu, S. Sural, A. K. Majumdar, “Database Intrusion Detection Using Sequence Alignment”. International Journal of information security volume 9, number 3, 179-191, DOI: 10.1007/s 10207-010-0102- 5. [6] Y. Hu, B. Panda, “A Data Mining Approach for Database Intrusion Detection”, Proceedings of the ACM Symposium on Applied Computing, pp. 711-716 (2004). [7] W. Wang, J. Yang, P. S. Yu, “Efficient Mining of Weighted Association Rules”, Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 270-274 (2000). [8] F. Tao, F. Murtagh, M. Farid, “Weighted Association Rule Mining using Weighted Support and Significance Framework”, Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 661-666 (2003). 1755 | P a g e