Modified Bit-Apriori Algorithm for Frequent Item-Sets in Data Mining

Poster Paper
Proc. of Int. Conf. on Advances in Information Technology and Mobile Communication 2013

J Karthikeyan¹ and Dr. Udaykumar²
¹ Research Scholar, Hindustan University, Chennai, India. Email: karthikeyan_world@hotmail.com
² ACOE, Hindustan University, Chennai, India. Email: aukumar71@gmail.com
© 2013 ACEEE. DOI: 03.LSCS.2013.2.66

Abstract - Mining frequent item-sets is one of the most important concepts in data mining; it is a fundamental, initial task of data mining. Apriori [2] is the most popular and frequently used algorithm for finding frequent item-sets. Other algorithms, viz. Eclat [3] and FP-growth [4], are also used to find frequent item-sets. To improve the time efficiency of the Apriori algorithm, Jiemin Zheng introduced the Bit-Apriori [1] algorithm, with the following modifications with respect to Apriori [2]:
1) support counting is implemented by performing a bitwise "And" operation on binary strings;
2) special equal-support pruning.
In this paper, to improve the time efficiency of Bit-Apriori [1], a novel algorithm that deletes infrequent items during trie2 and subsequent tries is proposed and demonstrated with an example.

Index Terms - Data mining; frequent item-sets; Apriori; Bit-Apriori; trie2.

I. INTRODUCTION
In recent years, the size of databases has increased rapidly. This has led to growing interest in the development of tools capable of automatically extracting knowledge from data. The term data mining, or knowledge discovery in databases, has been adopted for the field of research dealing with the automatic discovery of implicit information or knowledge within databases. The implicit information within databases, mainly the interesting association relationships [5] among sets of objects that lead to association rules, may disclose useful patterns for decision support, financial forecasting, marketing policies, even medical diagnosis, and many other applications [7]. In frequent pattern mining, the challenge is the large number of result patterns: as the minimum threshold becomes lower, an exponentially large number of item-sets is generated. Effective pruning [1] of unimportant patterns during the mining process has therefore become one of the main topics in frequent pattern mining. Hence, the main aim is to optimize the process of finding frequent patterns so that it is efficient and scalable and can detect the important patterns that can be used in various ways of extracting knowledge from data. The study of frequent item-set mining is well acknowledged in frequent pattern mining because of its broad applications to association rules and other data mining tasks. An attempt is made in the present work to prune unimportant patterns in item-set mining.

II. RELATED WORK
A. Apriori Algorithm
In computer science and data mining, Apriori is a classic algorithm for learning association rules [8]. Apriori is designed to operate on databases containing transactions and is commonly used in association rule mining [3]. Apriori uses a "bottom-up" approach, in which frequent subsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data [9][10]. The algorithm terminates when no further successful extensions are found. Apriori [2] uses breadth-first [3] search and a tree structure to count [6][12][13] candidate item-sets efficiently. It generates candidate item-sets of length k from item-sets of length k-1, and then prunes the candidates that have an infrequent sub-pattern [11]. According to the downward closure lemma, the candidate set contains all frequent k-length item-sets. After that, it scans the transaction database to determine the frequent item-sets among the candidates.

Apriori [2], though historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation produces large numbers of subsets (the algorithm attempts to load the candidate set with as many candidates as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset $S$ only after all $2^{|S|}-1$ of its proper subsets. The pseudocode for Apriori is shown in Table I.

B. Bit-Apriori Algorithm
Bit-Apriori [1] uses the data structure and techniques of the Apriori algorithm. The main differences between Apriori and Bit-Apriori lie in candidate item-set generation and the support counting approach; these two steps consume much of the time and memory in the Apriori [2] algorithm. Given a set of item-sets, the algorithm attempts to find subsets that are common to at least a minimum number C of the item-sets. In Apriori, the time required for mining [14][15] frequent k-item-sets grows significantly as k increases. Bit-Apriori [1], however, performs much better because it has no candidate generation and needs to traverse the trie only once. The pseudocode for Bit-Apriori is shown in Table II.
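The level-wise procedure described in Section II-A (join, downward-closure prune, one database scan per level) can be illustrated with a minimal Python sketch of classic Apriori. It is not the pseudocode of Table I; the function name `apriori` and the representation of each transaction as a string of single-letter items are assumptions made only for this illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Hypothetical sketch of classic Apriori.

    Returns {frozenset(itemset): support_count} for every item-set whose
    support count is at least min_count.
    """
    # Level 1: count single items in one scan and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in set(t):
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Candidate generation: join pairs of frequent (k-1)-item-sets whose
        # union has exactly k items.
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Prune: by downward closure, every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # One scan of the database counts the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            tset = set(t)
            for c in candidates:
                if c <= tset:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result
```

Run on the ten transactions of Table IV with a minimum support count of 4, this sketch returns 15 frequent item-sets, the largest being {A, C, E, G} with support 4. Every level repeats the join, the subset check and a full scan, which is the cost that Bit-Apriori's bitwise support counting is meant to reduce.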
III. PROBLEM STATEMENT
To find frequent item-sets, both the Apriori [2] and Bit-Apriori [1] algorithms search for elements in the entire item-set space, from 1-item-sets up to N-item-sets. When the total support count of an item is zero or less than the minimum support count, the item is not required in subsequent iterations. However, Apriori and Bit-Apriori still consider these elements while forming tries. Hence there is scope for improvement by eliminating such items during trie formation. A new algorithm is proposed to improve performance, resource utilization, and time efficiency.

IV. PROPOSED ALGORITHM
A new algorithm has been developed that deletes the infrequent items during trie2 and subsequent iterations. Removing infrequent items improves computation time. The Apriori and Bit-Apriori algorithms do not remove the infrequent items during trie2 and subsequent iterations. In the graph (trie), the proposed algorithm checks whether a node has a child: if it does, the node is traversed; otherwise the node is ignored and treated as infrequent. Such nodes are not considered in further iterations of the proposed algorithm. This reduces computation time as the occurrence of infrequent items in the given dataset increases. The pseudocode for the proposed algorithm is shown in Table III.

[TABLE I. THE PSEUDOCODE FOR FINDING FREQUENT ITEM-SETS USING APRIORI ALGORITHM]
[TABLE II. THE PSEUDOCODE FOR BIT-APRIORI]
[TABLE III. THE PSEUDOCODE FOR THE PROPOSED ALGORITHM]

To demonstrate the process of the proposed algorithm, an example is given below. As shown in Table IV, the example database is in the second column. The database contains ten transactions.

TABLE IV. THE EXAMPLE DATABASE
TID | Items  | Ordered frequent items
1   | ABDEFL | AE
2   | AGO    | GA
3   | CEI    | CE
4   | ACDEG  | GCAE
5   | ABCEGK | GCAE
6   | EH     | E
7   | ABCEFJ | CAE
8   | ACD    | CA
9   | ACEGM  | GCAE
10  | ACEGN  | GCAE

Suppose the support threshold min_sup is 40%. The support of each item is counted, and infrequent items are deleted, during the first scan of the database. The support of each item is as follows:
A:8, B:3, C:7, D:3, E:8, F:2, G:5, H:1, I:1, J:1, K:1, L:1, M:1, N:1, O:1
Since the minimum support count is 4 (40% of 10 transactions), the frequent items are A, C, E and G. They are sorted into a non-decreasing list according to their supports, and items with the same support are sorted in lexicographic order, giving the order G, C, A, E. In Step 2 of Bit-Apriori, all frequent 2-item-sets are found, as shown in Table V. The trie with the binary string shown in each leaf is then established, as shown in Fig. 1.
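The worked example above can be reproduced with a short Python sketch. It is a hedged illustration, not the authors' pseudocode (Table III): the function name `bit_trie_mine`, the use of Python integers as binary strings, and the Eclat-style rule that a trie node's children are formed by AND-ing it with its right siblings are assumptions, and the equal-support pruning of Bit-Apriori [1] is omitted. Infrequent items are dropped after the first scan, and a node with no right sibling (every node ending in E, the last item in the order) simply produces no children.

```python
import math


def bit_trie_mine(transactions, min_sup_pct):
    """Hypothetical sketch: frequent item-sets from per-item binary strings.

    Binary strings are Python ints; TID 1 is the most significant bit, so
    format(mask, f"0{n}b") prints the string in TID order as in Table V.
    """
    n = len(transactions)
    min_count = math.ceil(min_sup_pct * n / 100)  # 40% of 10 transactions -> 4

    # First scan: build one binary string per item.
    bits = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            bits[item] = bits.get(item, 0) | (1 << (n - 1 - tid))

    def support(mask):
        return bin(mask).count("1")

    # Keep only frequent items, sorted by non-decreasing support, ties lexicographic.
    order = sorted((i for i in bits if support(bits[i]) >= min_count),
                   key=lambda i: (support(bits[i]), i))
    rank = {item: r for r, item in enumerate(order)}

    # Trie level 1: one node per frequent item, keyed by its ordered item tuple.
    level = {(i,): bits[i] for i in order}
    frequent = dict(level)
    while level:
        # Group nodes by parent prefix; a node is joined only with right siblings.
        groups = {}
        for itemset, mask in level.items():
            groups.setdefault(itemset[:-1], []).append((itemset, mask))
        next_level = {}
        for siblings in groups.values():
            siblings.sort(key=lambda node: rank[node[0][-1]])
            for a, (items_a, mask_a) in enumerate(siblings):
                # A node with no right sibling (e.g. any node ending in E) has no
                # children, so the inner loop does no work for it.
                for items_b, mask_b in siblings[a + 1:]:
                    mask = mask_a & mask_b  # support counting via bitwise AND
                    if support(mask) >= min_count:
                        next_level[items_a + (items_b[-1],)] = mask
        frequent.update(next_level)
        level = next_level
    return order, frequent, n


if __name__ == "__main__":
    # Example database of Table IV.
    db = ["ABDEFL", "AGO", "CEI", "ACDEG", "ABCEGK",
          "EH", "ABCEFJ", "ACD", "ACEGM", "ACEGN"]
    order, frequent, n = bit_trie_mine(db, 40)
    print(order)  # ['G', 'C', 'A', 'E']
    for itemset, mask in frequent.items():
        if len(itemset) == 2:  # the frequent 2-item-sets of Table V
            print(itemset, format(mask, f"0{n}b"))
```

With min_sup = 40%, the item order comes out as G, C, A, E, the six 2-item-set strings match Table V (for example, {G, C} gives 0001100011 and {A, E} gives 1001101011), and 15 frequent item-sets are found in total, ending with {G, C, A, E} at support 4. Keeping each binary string as an integer makes each support count a single AND followed by a bit count.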
[Fig. 1. Trie After Generation(2)]

TABLE V. FREQUENT 2-ITEM-SETS
TID | Ordered items | {G, C} | {G, A} | {G, E} | {C, A} | {C, E} | {A, E}
1   | AE            | 0      | 0      | 0      | 0      | 0      | 1
2   | GA            | 0      | 1      | 0      | 0      | 0      | 0
3   | CE            | 0      | 0      | 0      | 0      | 1      | 0
4   | GCAE          | 1      | 1      | 1      | 1      | 1      | 1
5   | GCAE          | 1      | 1      | 1      | 1      | 1      | 1
6   | E             | 0      | 0      | 0      | 0      | 0      | 0
7   | CAE           | 0      | 0      | 0      | 1      | 1      | 1
8   | CA            | 0      | 0      | 0      | 1      | 0      | 0
9   | GCAE          | 1      | 1      | 1      | 1      | 1      | 1
10  | GCAE          | 1      | 1      | 1      | 1      | 1      | 1

During subsequent iterations, element 'E' can be ignored and treated as a non-frequent item-set, since the E nodes have no children in the trie. Computation time can be reduced considerably when elements like 'E' occur more often among the frequent items. After all iterations are completed, the final output of the binary strings is shown in Fig. 2.

[Fig. 2. Trie After Completion]

V. EXPERIMENTAL RESULTS
The proposed algorithm was tested on different datasets, and the experimental results are shown in Fig. 3. The proposed algorithm consumes considerably less time than the Bit-Apriori and Apriori algorithms. An interesting finding is that when non-frequent item-sets occur more often, the execution time is reduced drastically. The experimental results show that the proposed algorithm decreases not only the computation time but also the resources used; the execution times are given in Table VI.

[Fig. 3. Execution Time Of Algorithms]

TABLE VI. COMPARISON OF EXECUTION TIME BETWEEN APRIORI / BIT-APRIORI / MODIFIED BIT-APRIORI (execution time in seconds)
Dataset | Apriori | Bit-Apriori | Modified Bit-Apriori
pumsb   | 4.5     | 1.32        | 0.98

VII. CONCLUSIONS
In this paper, the modified Bit-Apriori technique improves the performance of Bit-Apriori by eliminating the search of infrequent item-sets, which also improves computational efficiency significantly. Experimental results have shown that the modified Bit-Apriori algorithm outperforms the fast Bit-Apriori, especially when non-frequent item-sets occur more often. When the database is large, Bit-Apriori may suffer from memory scarcity due to the large number of bitwise operations. Future work can be done in the direction of replacing the bitwise operations.

REFERENCES
[1] Zheng J., D. Zhang, S.C.H. Leung, X. Zhou, "An efficient algorithm for frequent itemsets in data mining", in Proceedings of the 7th International Conference on Service Systems and Service Management (ICSSSM), 28-30 June 2010.
[2] Agrawal R., R. Srikant, "Fast algorithms for mining association rules", in Proceedings of the International Conference on Very Large Data Bases, pp. 487-499, 1994.
[3] Zaki M.J., S. Parthasarathy, M. Ogihara, W. Li, "New algorithms for fast discovery of association rules", in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283-296, 1997.
[4] Han J., J. Pei, Y. Yin, "Mining frequent patterns without candidate generation", in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 1-12, 2000.
[5] Park J.S., M.S. Chen, P.S. Yu, "An effective hash based algorithm for mining association rules", ACM SIGMOD, pp. 175-186, 1995.
[6] Brin S., R. Motwani, J.D. Ullman, S. Tsur, "Dynamic itemset counting and implication rules for market basket data", in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 255-264, 1997.
[7] Brin S., R. Motwani, C. Silverstein, "Beyond market baskets: generalizing association rules to correlations", in Proceedings of the ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, pp. 265-276, 1997.
[8] Toivonen H., "Sampling large databases for association rules", in Proceedings of the 22nd VLDB Conference, Mumbai, India, pp. 134-145, 1996.
[9] Savasere A., E. Omiecinski, S.B. Navathe, "An efficient algorithm for mining association rules in large databases", in Proceedings of the 21st International Conference on Very Large Data Bases (VLDB'95), Zurich, pp. 432-444, 1995.
[10] Tsay Y.J., J.Y. Chiang, "CBAR: an efficient method for mining association rules", Knowledge-Based Systems, 18 (2-3), pp. 99-105, 2005.
[11] Liu G., H. Lu, W. Lou, Y. Xu, J.X. Yu, "Efficient mining of frequent patterns using ascending frequency ordered prefix-tree", Data Mining and Knowledge Discovery, 9 (3), pp. 249-274, 2004.
[12] Grahne G., J. Zhu, "Fast algorithms for frequent itemset mining using FP-trees", IEEE Transactions on Knowledge and Data Engineering, 17 (10), pp. 1347-1362, 2005.
[13] Zaki M.J., "Scalable algorithms for association mining", IEEE Transactions on Knowledge and Data Engineering, 12 (3), pp. 372-390, 2000.
[14] Zaki M.J., K. Gouda, "Fast vertical mining using diffsets", in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 326-335, 2003.
[15] Dong J., M. Han, "BitTableFI: an efficient mining frequent itemsets algorithm", Knowledge-Based Systems, 20 (4), pp. 329-335, 2007.
