International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976 – 6480 (Print), ISSN 0976 – 6499 (Online), Volume 5, Issue 2, February (2014), pp. 52-60, © IAEME: www.iaeme.com/ijaret.asp
Journal Impact Factor (2014): 4.1710 (Calculated by GISI), www.jifactor.com

A NOVEL APPROACH FOR FREQUENT PATTERN MINING

Sanjay Patel, Vishwakarma Government Engineering College, Ahmedabad, Gujarat, India
Nitin Raval, M.E. Student, Gujarat Technological University, Ahmedabad, Gujarat, India
Dr. K. Kotecha, Institute of Technology, Nirma University, Ahmedabad, Gujarat, India

ABSTRACT

Frequent pattern mining is an important task in data mining. It plays an essential role in several data mining techniques such as clustering, classification and association rule mining (ARM). Most existing methods are Apriori-like and adopt a candidate-generate-and-test approach. However, those methods may encounter serious problems such as candidate generation and multiple database scans. Moreover, in real-time scenarios transactions are added, deleted and modified constantly, yet these methods do not support iterative and incremental mining. In this work, a novel and efficient pattern-growth method has been developed for mining frequent patterns from large databases. It extends the idea of the FP-Tree to improve storage compression and allows frequent pattern mining without generating candidate item sets. The proposed algorithms allow incremental and iterative mining.

KEYWORDS: Data Mining, Knowledge Discovery, Anti Monotone, Iterative Mining, Incremental Mining.

I. INTRODUCTION

Large amounts of data are collected every day in all fields of science, business, medicine, the military, etc. That means we are data rich but information poor. The processing power available for evaluating and analyzing the data has not grown at the same rate as the data itself. Due to this phenomenon, a tremendous volume of data is still kept without being studied. Data mining, a research field that tries to ease this problem, proposes some solutions for the extraction of significant and potentially useful patterns from these large amounts of data. It is also called KDD
(Knowledge Discovery from Data) [10]. Data mining aims to find valid, novel, potentially useful and ultimately understandable patterns in data. In general, many kinds of patterns can be discovered from data. For example, association rules can be mined for market basket analysis, classification rules can be found for accurate classifiers, and clusters and outliers can be identified for customer relationship management [10].

ARM consists of two main steps: the first is frequent pattern generation and the second is generating rules from the frequent patterns. A well-known algorithm for frequent pattern mining is Apriori, a classical algorithm that requires candidate generation and multiple database scans to find frequent patterns [1] [2] [3]. To overcome the limitations of Apriori, Han et al. proposed a data structure, the frequent pattern tree or FP-Tree, and an algorithm called FP-growth that allows mining of frequent item sets without generating candidate item sets. FP-growth still has limitations: it requires two database scans and does not support iterative and incremental mining [4] [5]. The CATS-tree requires only one database scan and also supports iterative mining, but merging and splitting of nodes create bottlenecks [6]. To overcome this limitation, researchers have proposed the CAN tree [8] [9], which uses a canonical order to construct the tree and enables incremental mining.

The proposed algorithms enable frequent pattern mining with different supports without rebuilding the tree structure. They also allow mining with a single pass over the database as well as efficient insertion or deletion of transactions at any time.

This paper is organized as follows. Section II discusses the related background of frequent pattern mining. Section III discusses our proposed tree method. Section IV compares the existing algorithms. Section V shows the experimental results, and the conclusion is given in Section VI.

II. RELATED BACKGROUNDS

A. Apriori Algorithm

Apriori is a well-known algorithm for frequent pattern mining. It uses a generate-and-test approach: the first step generates candidate item sets from the given database, and the second tests whether each candidate is frequent. Any item set that does not meet the minimum support threshold is removed. An important property of Apriori is anti-monotonicity: all nonempty subsets of a frequent item set must also be frequent. The drawbacks of Apriori are candidate generation and multiple database scans [1] [2] [3] [10].

B. FP-growth Algorithm

FP-growth overcomes the limitations of Apriori, such as the huge number of candidates generated and the need to scan the database again and again. It uses a divide-and-conquer approach. It compresses the database representing frequent patterns into an FP-Tree, which retains the item set association information. Construction of the FP-Tree uses two passes. In the first pass, database D is scanned and the frequent 1-itemsets are generated and sorted in order of descending support count. In the second pass, the FP-Tree is constructed: the root node is created as "null" and, using the linked-list concept, the tree is built from the support counts. To mine the frequent patterns, take an initial suffix pattern from the FP-Tree and then construct its conditional pattern base, that is, the "sub-database which contains the set of prefix paths in the FP-Tree co-occurring with the suffix pattern". Mining is then performed recursively on the tree. The drawbacks are that the FP-Tree may not fit in memory, is expensive to build, and does not support incremental mining [4] [5] [10].

C. ECLAT Algorithm

ECLAT uses TIDs (transaction IDs). It works on the vertical data format; horizontal transactional data can be transformed into the vertical data format [10].
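The transformation from horizontal to vertical format mentioned for ECLAT can be illustrated with a minimal sketch. The toy transactions and the helper names `to_vertical` and `support` are assumptions made for illustration and do not come from the paper. The sketch only shows the standard idea that, in the vertical format, the support of an item set is the size of the intersection of its items' TID sets.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Convert {tid: items} (horizontal) into {item: set of tids} (vertical)."""
    vertical = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

def support(vertical, itemset):
    """Support count = size of the intersection of the items' TID sets."""
    tidsets = [vertical.get(item, set()) for item in itemset]
    return len(set.intersection(*tidsets)) if tidsets else 0

# Hypothetical toy database, used only for illustration.
transactions = {
    "T1": ["a", "b", "c"],
    "T2": ["a", "c"],
    "T3": ["b", "c"],
}
vertical = to_vertical(transactions)
print(sorted(vertical["c"]))          # → ['T1', 'T2', 'T3']
print(support(vertical, ["a", "c"]))  # → 2
```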
D. CATS-tree and FELINE Algorithm

The CATS tree is the Compressed and Arranged Transaction Sequences tree. It is an extension of the FP-tree and requires only a single data scan. It contains all elements of the FP-tree and supports interactive mining. However, the tree is expensive to construct: swapping and/or merging of nodes incurs extra cost, and the algorithm needs to traverse both upward and downward to include frequent items [4] [6].

E. CAN tree

To overcome limitations of existing algorithms, such as the extra cost of swapping and/or merging nodes, researchers have proposed the CAN tree. In the CAN tree, items are arranged according to some canonical order, which is unaffected by frequency changes. The frequency of a node in the CAN tree is at least as high as the sum of the frequencies of all its children. It requires only a single data scan and supports iterative and incremental mining. However, tree construction still requires more memory [8] [9].

F. Variant of CAN tree

A variant of the CAN tree is called CANTries. It reduces the size of nodes and requires only a single data scan. The structure of CANTries is quite similar to that of the CAN tree, except that nodes along the same path are combined into a mega-node if they have the same frequency. It overcomes the memory problem of the CAN tree [8] [9].

III. PROPOSED ALGORITHM

The FP-growth algorithm works in two passes. In the first pass, database D is scanned and the frequent 1-itemsets are generated and sorted in order of descending support count, as shown in figure 1. In the second pass, the FP-tree is constructed: the root node is created as "null" and, using the linked-list concept, the tree is built from the support counts, as shown in figure 2. Finally, patterns are mined using the conditional pattern base and the conditional FP-tree.

Figure 1. First pass of FP-growth
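The first pass of FP-growth described above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy transactions, the function names `first_pass` and `reorder`, and the alphabetical tie-break between items of equal support are not specified in the paper.

```python
from collections import Counter

def first_pass(transactions, min_sup):
    """Count item supports and list frequent items in descending support order."""
    counts = Counter(item for items in transactions for item in set(items))
    frequent = {item: c for item, c in counts.items() if c >= min_sup}
    # Assumption: ties between equally frequent items are broken alphabetically.
    order = sorted(frequent, key=lambda item: (-frequent[item], item))
    return frequent, order

def reorder(items, order):
    """Keep only the frequent items of one transaction, in the global order."""
    rank = {item: i for i, item in enumerate(order)}
    return sorted((i for i in set(items) if i in rank), key=rank.__getitem__)

# Hypothetical toy database, used only for illustration.
transactions = [["b", "a"], ["a", "c"], ["a", "b", "d"]]
frequent, order = first_pass(transactions, min_sup=2)
print(order)                            # → ['a', 'b']
print(reorder(["d", "b", "a"], order))  # → ['a', 'b']
```

Because the item order depends on the support counts, any update that changes those counts can change the order, which is why the tree has to be rebuilt after updates.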
Figure 2. Second pass of FP-growth

Rather than the descending support order used in FP-growth, the proposed algorithm adopts a canonical order, namely lexicographic order, which can also be called alphabetical order. For example, the transaction 45, 23, 10, 6, 8, 52 becomes 6, 8, 10, 23, 45, 52 after sorting. The proposed algorithm works as follows. In the first pass, shown in figure 3, all transactions are sorted into alphabetical order. In the second pass, shown in figure 4, the tree is generated as in FP-growth and frequent patterns are mined as in FP-growth. The algorithm requires only one scan and enables insertion, deletion, and modification of transactions at any time without starting from scratch. The proposed algorithms also enable frequent pattern mining with different supports without rebuilding the tree structure.

Figure 3. First pass of our proposed algorithm
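The sorting step of the first pass can be sketched in a few lines. The function name `canonical` is a hypothetical helper, and removing duplicate items within a transaction is an assumption. Note that the numeric example in the text is ordered by value: if the same items were stored as strings, a purely lexicographic sort would place "10" before "6".

```python
def canonical(transaction):
    """Sort one transaction's items into a fixed order that ignores frequency."""
    # Assumption: duplicate items inside a transaction are dropped.
    return sorted(set(transaction))

print(canonical([45, 23, 10, 6, 8, 52]))  # → [6, 8, 10, 23, 45, 52]
print(canonical(["C", "M", "E", "A"]))    # → ['A', 'C', 'E', 'M']
```

Since this order does not depend on support counts, a transaction maps to the same path no matter when it is processed.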
Figure 4. Second pass of our proposed algorithm

To illustrate how the proposed tree works, take the following database as an example.

TID     LIST OF ITEMS
D001    C, M, E, A
D002    C, M, E
D003    M, A
D004    C, E
D005    C, E, A

Table 1: Transaction Database

First, the items of each transaction are sorted into alphabetical order, and then the tree is formed according to that order, as shown in figure 5. From the final tree in figure 5, the frequency of every item is found. Suppose min_sup = 2. The frequencies are A: 3, C: 4, E: 4, M: 3. Items whose frequency is less than min_sup are then removed from the tree. Here no item is removed, because every item meets the minimum support threshold. The resulting tree is the final tree shown in figure 5.
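The construction on Table 1 can be reproduced with a small prefix-tree sketch. The `Node` class and the functions `insert` and `item_counts` are illustrative names chosen here, not the authors' implementation. The sketch omits the linked-list header table and keeps only the counts.

```python
class Node:
    """One tree node: an item, its count, and children keyed by item."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

def insert(root, transaction):
    """Add one transaction along the path given by alphabetical order."""
    node = root
    for item in sorted(set(transaction)):
        if item not in node.children:
            node.children[item] = Node(item)
        node = node.children[item]
        node.count += 1

def item_counts(root):
    """Sum the node counts per item over the whole tree."""
    totals = {}
    stack = list(root.children.values())
    while stack:
        node = stack.pop()
        totals[node.item] = totals.get(node.item, 0) + node.count
        stack.extend(node.children.values())
    return totals

# The transaction database of Table 1.
database = {
    "D001": ["C", "M", "E", "A"],
    "D002": ["C", "M", "E"],
    "D003": ["M", "A"],
    "D004": ["C", "E"],
    "D005": ["C", "E", "A"],
}
root = Node()
for items in database.values():
    insert(root, items)

print(sorted(item_counts(root).items()))
# → [('A', 3), ('C', 4), ('E', 4), ('M', 3)]
```

The totals match the frequencies given in the text, so with min_sup = 2 no item is removed.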
Figure 5. Construction of proposed tree

After that, the frequent items are mined according to the FP-growth approach as follows.

Step 1: The conditional pattern base is formed according to the ascending order of the items.
Step 2: The conditional FP-tree is generated in the same order as in step 1 by removing from the conditional pattern base the items whose frequency is less than min_sup.
Step 3: Finally, frequent patterns are generated from the conditional FP-tree.
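The three steps can be sketched as a recursive pattern-growth routine. This is a simplified illustration, not the authors' code: the function name `mine` is hypothetical, the sorted transactions of Table 1 are fed in directly as the initial pattern base instead of being read off a tree, and infrequent items are skipped during counting instead of being physically removed.

```python
from collections import Counter

def mine(pattern_base, min_sup, suffix=()):
    """Grow frequent patterns from a list of (prefix_path, count) pairs."""
    counts = Counter()
    for path, count in pattern_base:
        for item in path:
            counts[item] += count
    results = {}
    for item in sorted(counts):          # ascending (alphabetical) order
        if counts[item] < min_sup:       # step 2: ignore infrequent items
            continue
        pattern = (item,) + suffix
        results[pattern] = counts[item]  # step 3: record a frequent pattern
        # Step 1: conditional pattern base = the part of each path before item.
        conditional = [(path[:path.index(item)], count)
                       for path, count in pattern_base if item in path]
        results.update(mine(conditional, min_sup, pattern))
    return results

# Table 1, with each transaction already sorted alphabetically.
base = [(("A", "C", "E", "M"), 1), (("C", "E", "M"), 1), (("A", "M"), 1),
        (("C", "E"), 1), (("A", "C", "E"), 1)]
patterns = mine(base, min_sup=2)
print(len(patterns))               # → 12
print(patterns[("C", "E")])        # → 4
print(patterns[("A", "C", "E")])   # → 2
```

Because the paths stay in the same alphabetical order at every level, calling `mine` again with a different min_sup reuses the same input without any re-sorting.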
IV. COMPARISON

Figure 6 compares the existing frequent pattern mining algorithms on various parameters.

Figure 6. Comparison

V. EXPERIMENTAL RESULTS

In these experiments, transaction databases generated by IBM [11] are used on a computer system with a Core 2 Duo 2.0 GHz processor, a 160 GB hard disk and 2 GB of RAM. The goal of the experiments is to evaluate the performance of the proposed algorithm against existing algorithms.

Figure 7 compares the time required by Apriori, FP-growth and the Extension of FP-growth for different min_sup values. The results show that the Extension of FP-growth requires the least time compared to Apriori and FP-growth. The Apriori algorithm works on the principle of candidate generation and testing, so it requires the maximum execution time.
Figure 7

Figure 8 shows that whenever the database is updated, the FP-growth algorithm requires more memory because the tree must be rebuilt from the start, whereas the Extension of FP-growth requires less memory because it allows incremental mining. Figure 9 shows that when mining with different min_sup values, FP-growth requires more time because the tree must be rebuilt from the start, whereas the Extension of FP-growth requires less time because it allows iterative mining.

Finally, all experiments show that with the FP-growth algorithm, any modification requires the tree generation procedure to be restarted from scratch. In the Extension of FP-growth algorithm, if a transaction is added, deleted or modified, the change can be made directly in the existing tree because the tree uses alphabetical order. So for a database that grows incrementally, the Extension of FP-growth algorithm is better than the existing algorithms it was compared with.

Figure 9
Figure 10
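The in-place update that alphabetical order makes possible can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary-based node layout and the functions `add`, `delete` and `item_support` are hypothetical, and a modification is treated as a delete followed by an add.

```python
def new_node():
    return {"count": 0, "children": {}}

def add(root, transaction):
    """Insert one transaction along its alphabetical path."""
    node = root
    for item in sorted(set(transaction)):
        node = node["children"].setdefault(item, new_node())
        node["count"] += 1

def delete(root, transaction):
    """Remove one stored transaction in place, pruning emptied nodes."""
    items = sorted(set(transaction))
    last = root
    for item in items:
        last = last["children"].get(item) if last else None
    # Transactions ending at a node = its count minus its children's counts.
    ends_here = 0 if last is None else last["count"] - sum(
        child["count"] for child in last["children"].values())
    if not items or ends_here < 1:
        raise KeyError(f"transaction {items} is not stored in the tree")
    node = root
    for item in items:
        child = node["children"][item]
        child["count"] -= 1
        if child["count"] == 0:
            del node["children"][item]  # the rest of the path is empty too
            return
        node = child

def item_support(node, item):
    """Total count of an item over all nodes below the given node."""
    total = 0
    for name, child in node["children"].items():
        total += (child["count"] if name == item else 0)
        total += item_support(child, item)
    return total

root = new_node()
for t in (["C", "M", "E", "A"], ["C", "M", "E"], ["M", "A"],
          ["C", "E"], ["C", "E", "A"]):
    add(root, t)

add(root, ["A", "B"])     # a new transaction arrives
delete(root, ["M", "A"])  # transaction D003 is removed
print(item_support(root, "A"), item_support(root, "M"), item_support(root, "B"))
# → 3 2 1
```

Neither update touches any path other than the one belonging to the affected transaction, so no rescan of the database and no reordering of nodes is needed.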
VI. CONCLUSION

A novel approach has been implemented to provide efficient and powerful tree support for incremental mining. The Extension of FP-growth algorithm captures the transactions of the database and arranges nodes according to alphabetical order, which is unaffected by changes in item frequency. By exploiting this property, the tree can be easily maintained when database transactions are updated. The Extension of FP-growth does not require merging and/or splitting of tree nodes. It avoids rescanning the entire updated database or constructing a tree from scratch for incremental updating.

REFERENCES

[1] Agrawal R, Imielinski T, Swami AN, "Mining Association Rules between Sets of Items in Large Databases", SIGMOD, June 1993.
[2] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules", Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994.
[3] R. Agrawal, Mannila H, Toivonen H, Verkamo AI, "Fast Discovery of Association Rules", Quest Project at IBM Almaden Research Centre and research at the University of Helsinki, 1994.
[4] Cheung W., "Frequent Pattern mining without candidate generation or support constraint", Master's thesis, University of Alberta, 2002.
[5] Jiawei Han, Jian Pei, and Yiwen Yin, "Mining Frequent Patterns without Candidate Generation", Simon Fraser University, 2002.
[6] William Cheung and Osmar R. Zaiane, "Incremental Mining of Frequent Patterns without candidate Generation or Support Constraint", IDEAS'03.
[7] Christian Borgelt, "An Implementation of the FP-growth Algorithm", OSDM'05.
[8] Q. I. Khan, T. Hoque and C. K. Leung, "CANTree: A Tree structure for Efficient Incremental mining of frequent patterns", ICDM '05.
[9] Sanjay Patel and Dr. Ketan Kotecha, "Incremental Frequent Pattern Mining using Graph based approach", International Journal of Computers & Technology, March-April 2013.
[10] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques" (book).
[11] http://www.almaden.ibm.com/cs/quest//syndata.html#assocSynData
[12] M. Karthikeyan, M. Suriya Kumar and Dr. S. Karthikeyan, "A Literature Review on the Data Mining and Information Security", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 141 - 146, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[13] R. Manickam, D. Boominath and V. Bhuvaneswari, "An Analysis of Data Mining: Past, Present and Future", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 1 - 9, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[14] R. Lakshman Naik, D. Ramesh and B. Manjula, "Instances Selection Using Advance Data Mining Techniques", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, 2012, pp. 47 - 53, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[15] Rinal H. Doshi, Dr. Harshad B. Bhadka and Richa Mehta, "Development of Pattern Knowledge Discovery Framework Using Clustering Data Mining Algorithm", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013, pp. 101 - 112, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
