International Journal of Advanced Research in Engineering and Technology (IJARET)
ISSN 0976 - 6480 (Print)
ISSN 0976 - 6499 (Online)
Volume 5, Issue 2, February (2014), pp. 52-60
© IAEME: www.iaeme.com/ijaret.asp
Journal Impact Factor (2014): 4.1710 (Calculated by GISI)
www.jifactor.com
A NOVEL APPROACH FOR FREQUENT PATTERN MINING
Sanjay Patel
Vishwakarma Government Engineering College, Ahmedabad, Gujarat, India
Nitin Raval
M.E. Student, Gujarat Technological University, Ahmedabad, Gujarat, India
Dr. K. Kotecha
Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
ABSTRACT
Frequent pattern mining is an important task in data mining. It plays an essential role in
several data mining techniques such as clustering, classification and association rule mining (ARM).
Most existing methods are Apriori-like and adopt a candidate-generate-and-test approach. However,
these methods may encounter serious problems such as costly candidate generation and multiple
database scans. Moreover, in real-time scenarios transactions are added, deleted and modified
constantly, yet these methods do not support iterative and incremental mining. In this work, a novel
and efficient pattern-growth method has been developed for mining various frequent patterns from
large databases. It extends the idea of the FP-Tree to improve storage compression and allows
frequent pattern mining without generation of candidate item sets. The proposed algorithms allow
incremental and iterative mining.
KEYWORDS: Data Mining, Knowledge Discovery, Anti Monotone, Iterative Mining, Incremental
Mining.
I. INTRODUCTION
Large amounts of data are collected every day in all fields of science, business, medicine,
the military, etc. We are therefore data rich but information poor: the processing power available
for evaluating and analyzing the data has not grown at the same rate as the data itself. Due
to this phenomenon, a tremendous volume of data is still kept without being studied. Data mining, a
research field that tries to ease this problem, proposes solutions for the extraction of
significant and potentially useful patterns from these large amounts of data. It is also called KDD
(Knowledge Discovery from Data) [10]. The aim of data mining is to find valid, novel, potentially
useful and ultimately understandable patterns in data. In general, many kinds of patterns can
be discovered from data. For example, association rules can be mined for market basket analysis,
classification rules can be found for accurate classifiers, and clusters and outliers can be
identified for customer relationship management [10]. ARM consists of two main steps: the first is
frequent pattern generation, and the second is rule generation from the frequent patterns. A
well-known algorithm for frequent pattern mining is Apriori. Apriori is a classical algorithm that
requires candidate generation and multiple database scans to find frequent patterns [1] [2] [3]. To
overcome the limitations of Apriori, Han et al. proposed a data structure, the frequent pattern tree
or FP-Tree, and an algorithm called FP-growth that allows mining of frequent item sets without
generating candidate item sets. Some limitations remain: FP-growth requires two database scans and
does not support iterative and incremental mining [4] [5]. The CATS-tree requires only one database
scan and also supports iterative mining, but merging and splitting of nodes create bottlenecks [6].
To overcome this limitation, researchers have proposed the CAN tree [8] [9]. It uses a canonical
order to construct the tree, which enables incremental mining. The proposed algorithms enable
frequent pattern mining with different supports without rebuilding the tree structure. They also
allow mining with a single pass over the database as well as efficient insertion or deletion of
transactions at any time.
This paper is organized as follows. Section II discusses the related background of frequent
pattern mining. Section III discusses our proposed tree method. Section IV shows a comparison of
existing algorithms. Section V shows the experimental results, and the conclusion is given in
Section VI.
II. RELATED BACKGROUNDS
A. Apriori Algorithm
Apriori is a well-known algorithm for frequent pattern mining. It uses a generate-and-test
approach. The first step is to generate candidate item sets from the given database; each candidate
is then tested, and any item set that does not meet the minimum support threshold is removed.
An important property of Apriori is anti-monotonicity: all nonempty subsets of a frequent item set
must also be frequent. The drawbacks of Apriori are candidate generation and multiple database
scans [1] [2] [3] [10].
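The generate-and-test loop described above can be sketched as follows. This is a minimal illustration, not the implementation evaluated in this paper; the function name `apriori` and the small example database are assumptions introduced here.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(item set): support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]

    # First scan: count the support of every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(current)

    k = 2
    while current:
        # Generate: join frequent (k-1)-item sets into k-item candidates and
        # keep only those whose (k-1)-subsets are all frequent (anti-monotone).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Test: one more full database scan to count the candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(current)
        k += 1
    return frequent


# Illustrative database (hypothetical values chosen for this sketch).
example_db = [["C", "M", "E", "A"], ["C", "M", "E"], ["M", "A"], ["C", "E"], ["C", "E", "A"]]
frequent_sets = apriori(example_db, min_sup=2)
```

Each pass of the `while` loop performs one additional scan of the database, which is the cost the drawbacks above refer to.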
B. FP-growth Algorithm
FP-growth overcomes the limitations of Apriori, such as the huge number of candidates generated and
the need to scan the database again and again. It uses a divide-and-conquer approach. It compresses
the database representing frequent patterns into an FP-tree, which retains the item set association
information. Construction of the FP-tree uses two passes. In the first pass, the database D is
scanned to generate the frequent 1-itemsets, which are sorted in order of descending support count.
In the second pass, the FP-tree is constructed: a root node labelled "null" is created, and the
transactions are inserted following the support-count order, with nodes of the same item linked
together in a linked list. To mine the frequent patterns, take an initial suffix pattern from the
FP-tree and then construct its conditional pattern base, that is, the "sub-database which contains
the set of prefix paths in the FP-tree co-occurring with the suffix pattern". Mining is then
performed recursively on the tree. The drawbacks are that the FP-tree may not fit in memory, the
FP-tree is expensive to build, and it does not support incremental mining [4] [5] [10].
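The two-pass construction and the conditional pattern base can be sketched as below. This is a simplified illustration under stated assumptions: the names `FPNode`, `build_fp_tree` and `conditional_pattern_base` are introduced here, and ties in support are broken by item name only to make the ordering deterministic.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.next = None  # next node holding the same item (linked list)


def build_fp_tree(transactions, min_sup):
    # Pass 1: count item supports and keep the frequent items.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_sup}
    ordered = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: position for position, item in enumerate(ordered)}

    root = FPNode(None, None)  # the "null" root
    header = {}  # item -> head of that item's linked list

    # Pass 2: insert every transaction along its support-ordered path.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                child.next = header.get(item)
                header[item] = child
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(header, item):
    """Prefix paths that co-occur with `item`, each with the suffix node's count."""
    base = []
    node = header.get(item)
    while node is not None:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
        node = node.next
    return base
```

Because the insertion order depends on the support counts found in the first pass, any update that changes those counts can invalidate the existing paths, which is why the tree does not lend itself to incremental mining.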
C. ECLAT Algorithm
ECLAT uses transaction identifiers (TIDs) and the vertical data format; a horizontal transaction
database can be transformed into the vertical data format [10].
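A minimal sketch of the horizontal-to-vertical transformation, followed by support counting through TID-set intersection, is given below. The function names `to_vertical` and `eclat` and the depth-first search order are assumptions made for illustration.

```python
def to_vertical(transactions):
    """Map each item to the set of TIDs of the transactions that contain it."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(vertical, min_sup):
    """Depth-first TID-set intersection; returns {item tuple: support count}."""
    frequent = {}

    def extend(prefix, candidates):
        for index, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect with the TID sets of the items that come later.
            suffix = []
            for other, other_tids in candidates[index + 1:]:
                common = tids & other_tids
                if len(common) >= min_sup:
                    suffix.append((other, common))
            extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_sup),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


# Illustrative horizontal database keyed by TID (hypothetical values).
horizontal = {1: ["C", "M", "E", "A"], 2: ["C", "M", "E"], 3: ["M", "A"], 4: ["C", "E"], 5: ["C", "E", "A"]}
vertical = to_vertical(horizontal)
```

In the vertical format the support of an item set is simply the size of the intersection of its items' TID sets, so no further scan of the original transactions is needed.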
D. CATS-tree and FELINE Algorithm
The CATS tree is the Compressed and Arranged Transaction Sequences tree. It is an
extension of the FP-tree. It uses only a single data scan, contains all elements of the FP-tree, and
supports interactive mining. However, the tree is expensive to build: swapping and/or merging of
nodes incurs extra cost, and the algorithm needs to traverse the tree both upward and downward to
include frequent items [4] [6].
E. CAN tree
To overcome the limitations of existing algorithms, such as the extra cost of swapping and/or
merging nodes, researchers have proposed the CAN tree. In the CAN tree, items are arranged
according to some canonical order, which is unaffected by frequency changes. The frequency of a
node in the CAN tree is at least as high as the sum of the frequencies of all its children. It uses
only a single data scan and supports iterative and incremental mining. However, tree construction
still requires more memory [8] [9].
F. Variant of CAN-tree
A variant of the CAN tree is called CANTries. It reduces the size of the nodes and uses only a
single data scan. The structure of CANTries is quite similar to that of the CAN tree, except that
nodes along the same path are combined into a mega-node if they have the same frequency. It
overcomes the memory problem of the CAN tree [8] [9].
III. PROPOSED ALGORITHM
The FP-growth algorithm works in two passes. In the first pass, the database D is scanned to
generate the frequent 1-itemsets, sorted in order of descending support count, as shown in Figure 1.
In the second pass, the FP-tree is constructed: a root node labelled "null" is created, and the tree
is built following the support-count order with linked lists, as shown in Figure 2. Finally,
patterns are mined using the conditional pattern base and the conditional FP-tree.
Figure 1. First pass of FP-growth
Figure 2. Second pass of FP-growth
In the proposed algorithm, rather than the descending support order used in FP-growth, a
canonical order has been adopted, namely lexicographic order, which can also be called alphabetical
order. For example, take a transaction such as 45, 23, 10, 6, 8, 52; sorting it into this order,
with numeric item identifiers compared by value, gives 6, 8, 10, 23, 45, 52.
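The ordering of this example transaction can be checked with a short snippet. It also illustrates one assumption worth stating explicitly: the result 6, 8, 10, 23, 45, 52 is obtained when item identifiers are compared as numbers; comparing the same identifiers as text gives a different, but equally fixed, canonical order.

```python
transaction = [45, 23, 10, 6, 8, 52]

# Item identifiers compared by numeric value, as in the example above.
by_value = sorted(transaction)

# The same identifiers compared as character strings.
as_text = sorted(str(item) for item in transaction)

print(by_value)  # [6, 8, 10, 23, 45, 52]
print(as_text)   # ['10', '23', '45', '52', '6', '8']
```

Either ordering is usable as long as the same rule is applied to every transaction, since what matters is that the order does not depend on item frequencies.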
The proposed algorithm works as follows. In the first pass, shown in Figure 3, all transactions
are sorted in alphabetical order. In the second pass, shown in Figure 4, the tree is generated as in
FP-growth, and frequent patterns are mined as in FP-growth. It requires only one scan and enables
insertion, deletion, and modification of transactions at any time without starting from scratch. The
proposed algorithms also enable frequent pattern mining with different supports without rebuilding
the tree structure.
Figure 3. First pass of our proposed algorithm
Figure 4. Second pass of our proposed algorithm
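The insertion, deletion, and modification of transactions in a tree ordered canonically can be sketched as follows. This is an illustrative data structure written for this discussion, not the authors' implementation; the class names `Node` and `CanonicalTree` and the treatment of modification as a deletion followed by an insertion are assumptions.

```python
class Node:
    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


class CanonicalTree:
    def __init__(self):
        self.root = Node()

    def insert(self, transaction):
        """Add one transaction along its canonically ordered path."""
        node = self.root
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child

    def delete(self, transaction):
        """Remove one stored transaction; raise KeyError if it is not stored."""
        path = []
        node = self.root
        for item in sorted(set(transaction)):
            node = node.children.get(item)
            if node is None:
                raise KeyError("transaction is not stored in the tree")
            path.append(node)
        if not path:
            return
        last = path[-1]
        # A transaction ends at `last` only if its count exceeds its children's sum.
        if last.count <= sum(child.count for child in last.children.values()):
            raise KeyError("transaction is not stored in the tree")
        for node in path:
            node.count -= 1
        for node in reversed(path):  # prune nodes that are no longer used
            if node.count == 0:
                del node.parent.children[node.item]

    def modify(self, old, new):
        self.delete(old)
        self.insert(new)

    def item_support(self):
        """Total support of every item, summed over all nodes holding it."""
        totals = {}
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            totals[node.item] = totals.get(node.item, 0) + node.count
            stack.extend(node.children.values())
        return totals
```

No node is ever swapped or merged during an update, because the position of an item on a path depends only on the canonical order and not on its current frequency.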
To illustrate how the proposed tree works, take the following database as an example.
TID     LIST OF ITEMS
D001    C,M,E,A
D002    C,M,E
D003    M,A
D004    C,E
D005    C,E,A
Table 1: Transaction Database
First, the items of each transaction are sorted in alphabetical order, and the tree is then
built according to that order, as shown in Figure 5.
From the final tree in Figure 5, the frequency of every item is found. Suppose
min_sup = 2:
A: 3, C: 4, E: 4, M: 3
Items whose frequency is less than min_sup are then removed from the tree. Here no item is
removed, because all items meet the minimum support threshold. The resulting tree is therefore the
final tree shown in Figure 5.
Figure 5. Construction of proposed tree
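The worked example can be reproduced with a short script that builds the tree from Table 1, prints it as indented text, and recomputes the item frequencies. The nested-dictionary representation and the helper names `build`, `render` and `item_frequencies` are assumptions made for this sketch; the printed layout is only a textual stand-in and is not claimed to match the figure exactly.

```python
TABLE_1 = {
    "D001": ["C", "M", "E", "A"],
    "D002": ["C", "M", "E"],
    "D003": ["M", "A"],
    "D004": ["C", "E"],
    "D005": ["C", "E", "A"],
}


def build(transactions):
    """Build a tree of the form {item: [count, children]} in alphabetical order."""
    root = {}
    for items in transactions:
        level = root
        for item in sorted(items):
            entry = level.setdefault(item, [0, {}])
            entry[0] += 1
            level = entry[1]
    return root


def render(level, depth=0):
    """Return the tree as a list of indented 'item:count' lines."""
    lines = []
    for item in sorted(level):
        count, children = level[item]
        lines.append("  " * depth + f"{item}:{count}")
        lines.extend(render(children, depth + 1))
    return lines


def item_frequencies(level, totals=None):
    """Sum the counts of all nodes that hold the same item."""
    totals = {} if totals is None else totals
    for item, (count, children) in level.items():
        totals[item] = totals.get(item, 0) + count
        item_frequencies(children, totals)
    return totals


tree = build(TABLE_1.values())
print("\n".join(render(tree)))

min_sup = 2
frequencies = item_frequencies(tree)
kept = {item: n for item, n in frequencies.items() if n >= min_sup}
print(kept)
```

With min_sup = 2 every item is kept, which agrees with the frequencies A: 3, C: 4, E: 4, M: 3 stated above.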
After that, the frequent items are mined according to the FP-growth approach
as follows.
Step 1:
The conditional pattern base is formed following the ascending order of the
items.
Step 2:
The conditional FP-tree is generated in the same order as in Step 1 by
removing the items whose frequency is less than min_sup from the
conditional pattern base.
Step 3:
Finally, the frequent patterns are generated from the conditional FP-tree.
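The three steps above can be sketched with a recursive pattern-growth routine. As a simplification, this sketch works directly on conditional pattern bases, stored as lists of (prefix, count) pairs, instead of materialising each conditional FP-tree; the function name `mine` and this representation are assumptions and not the authors' code.

```python
from collections import Counter


def mine(pattern_base, min_sup, suffix=()):
    """pattern_base: list of (items in ascending order, count) pairs.

    Returns {item tuple in ascending order: support count}.
    """
    support = Counter()
    for items, count in pattern_base:
        for item in items:
            support[item] += count

    frequent = {}
    for item in sorted(support):  # Step 1: ascending item order
        if support[item] < min_sup:
            continue  # Step 2: drop items below min_sup
        pattern = (item,) + suffix  # Step 3: emit the frequent pattern
        frequent[pattern] = support[item]

        # Conditional pattern base of `item`: the prefix that precedes it
        # on every path that contains it.
        conditional = []
        for items, count in pattern_base:
            if item in items:
                prefix = items[: items.index(item)]
                if prefix:
                    conditional.append((prefix, count))
        frequent.update(mine(conditional, min_sup, pattern))
    return frequent


# Transactions of a small example database, each sorted alphabetically.
paths = [
    (("A", "C", "E", "M"), 1),
    (("C", "E", "M"), 1),
    (("A", "M"), 1),
    (("C", "E"), 1),
    (("A", "C", "E"), 1),
]
patterns = mine(paths, min_sup=2)
```

Because the item order is fixed, the same `paths` can be mined again with a different `min_sup` value without rebuilding anything, for example `mine(paths, min_sup=3)`.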
IV. COMPARISON
Figure 6 shows a comparison of existing frequent pattern mining algorithms across various
parameters.
Figure 6. Comparison
V. EXPERIMENTAL RESULTS
In these experiments, transaction databases generated by IBM [11] are used on a computer
system with a Core 2 Duo 2.0 GHz processor, a 160 GB hard disk and 2 GB of RAM.
The goal of the experiments is to evaluate the performance of the proposed algorithm against
existing algorithms. Figure 7 compares Apriori, FP-growth and the Extension of FP-growth in terms of
the time required for different min_sup values. The results show that the Extension of FP-growth
requires the least time compared with Apriori and FP-growth. The Apriori algorithm works on the
principle of candidate generate-and-test, so it requires the maximum execution time.
Figure 7
Figure 8 shows that whenever the database is updated, the FP-growth algorithm requires more
memory because the tree must be built from the start, whereas the Extension of FP-growth algorithm
requires less memory than FP-growth because it allows incremental mining.
Figure 9 shows that when mining with different min_sup values, the FP-growth algorithm requires
more time because the tree must be built from the start, whereas the Extension of FP-growth
algorithm requires less time than FP-growth because it allows iterative mining.
Finally, all experiments show that with FP-growth, any modification to the database requires the
tree generation procedure to be restarted from scratch. In the Extension of FP-growth algorithm, if
a transaction is added, modified or deleted, the change can be made directly in the existing tree
because it uses alphabetical order. So for a database of incrementally growing size, the Extension
of FP-growth algorithm performs better than the existing algorithms compared here.
Figure 9
Figure 10
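The reason an update forces FP-growth to rebuild its tree, while a tree in alphabetical order can be updated in place, can be illustrated by comparing the two item orders before and after an update. The helper names and the added transactions below are hypothetical and serve only as an example.

```python
from collections import Counter


def support_descending_order(transactions):
    """Item order used by FP-growth: descending support, ties by item name."""
    support = Counter(item for t in transactions for item in set(t))
    return sorted(support, key=lambda item: (-support[item], item))


def canonical_order(transactions):
    """Alphabetical item order, independent of the support counts."""
    return sorted({item for t in transactions for item in t})


before = [["C", "M", "E", "A"], ["C", "M", "E"], ["M", "A"], ["C", "E"], ["C", "E", "A"]]
after = before + [["M", "A"], ["M", "A"]]  # two newly inserted transactions

print(support_descending_order(before))  # ['C', 'E', 'A', 'M']
print(support_descending_order(after))   # ['A', 'M', 'C', 'E']
print(canonical_order(before))           # ['A', 'C', 'E', 'M']
print(canonical_order(after))            # ['A', 'C', 'E', 'M']
```

In this example the support-descending order changes after only two insertions, so paths built under the old order no longer match the new one, whereas the alphabetical order is unchanged and the existing paths remain valid.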
VI. CONCLUSION
A novel approach has been implemented to provide efficient and powerful tree support
for incremental mining. The Extension of FP-growth algorithm captures the transactions of the
database and arranges nodes according to alphabetical order, which is unaffected by changes in item
frequency. By exploiting this property, the tree can be easily maintained when the database
transactions are updated. The Extension of FP-growth does not require merging and/or splitting of
tree nodes. It avoids rescanning the entire updated database or constructing a tree from scratch
for incremental updating.
REFERENCES
[1] R. Agrawal, T. Imielinski and A. N. Swami, "Mining Association Rules between Sets of Items in
Large Databases", SIGMOD, June 1993.
[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules", Proceedings of
the 20th VLDB Conference, Santiago, Chile, 1994.
[3] R. Agrawal, H. Mannila, H. Toivonen and A. I. Verkamo, "Fast Discovery of Association Rules",
Quest Project at IBM Almaden Research Centre and research at the University of Helsinki, 1994.
[4] W. Cheung, "Frequent Pattern Mining without Candidate Generation or Support
Constraint", Master's thesis, University of Alberta, 2002.
[5] Jiawei Han, Jian Pei and Yiwen Yin, "Mining Frequent Patterns without Candidate
Generation", Simon Fraser University, 2002.
[6] William Cheung and Osmar R. Zaiane, "Incremental Mining of Frequent Patterns without
Candidate Generation or Support Constraint", IDEAS'03.
[7] Christian Borgelt, "An Implementation of the FP-growth Algorithm", OSDM'05.
[8] Q. I. Khan, T. Hoque and C. K. Leung, "CANTree: A Tree Structure for Efficient
Incremental Mining of Frequent Patterns", ICDM '05.
[9] Sanjay Patel and Dr. Ketan Kotecha, "Incremental Frequent Pattern Mining using Graph
based approach", International Journal of Computers & Technology, March-April 2013.
[10] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques" (book).
[11] http://www.almaden.ibm.com/cs/quest//syndata.html#assocSynData
[12] M. Karthikeyan, M. Suriya Kumar and Dr. S. Karthikeyan, "A Literature Review on the Data
Mining and Information Security", International Journal of Computer Engineering &
Technology (IJCET), Volume 3, Issue 1, 2012, pp. 141 - 146, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.
[13] R. Manickam, D. Boominath and V. Bhuvaneswari, "An Analysis of Data Mining: Past,
Present and Future", International Journal of Computer Engineering & Technology (IJCET),
Volume 3, Issue 1, 2012, pp. 1 - 9, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[14] R. Lakshman Naik, D. Ramesh and B. Manjula, "Instances Selection Using Advance Data
Mining Techniques", International Journal of Computer Engineering & Technology (IJCET),
Volume 3, Issue 2, 2012, pp. 47 - 53, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[15] Rinal H. Doshi, Dr. Harshad B. Bhadka and Richa Mehta, "Development of Pattern
Knowledge Discovery Framework Using Clustering Data Mining Algorithm", International
Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013,
pp. 101 - 112, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.