Hardware-Enhanced Association Rule Mining with Hashing and Pipelining

Ying-Hsiang Wen, Jen-Wei Huang, and Ming-Syan Chen, Fellow, IEEE

Abstract—Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity of those steps that need to load candidate itemsets or database items into the hardware is proportional to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation. As shown by the experiment results, HAPPI significantly outperforms the previous hardware approach and the software algorithm in terms of execution time.

Index Terms—Hardware enhanced, association rule.

1 INTRODUCTION

Data mining technology is now used in a wide variety of fields. Applications include the analysis of customer transaction records, web site logs, credit card purchase information, and call records, to name a few. The interesting results of data mining can provide useful information, such as customer behavior, for business managers and researchers. One of the most important data mining applications is association rule mining [11], which can be described as follows: Let I = {i_1, i_2, ..., i_n} denote a set of items; let D denote a set of database transactions, where each transaction T is a set of items such that T ⊆ I; and let X denote a set of items, called an itemset. A transaction T contains X if and only if X ⊆ T. An association rule is an implication of the form X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The rule X ⇒ Y has support s percent in the transaction set D if s percent of the transactions in D contain X ∪ Y. The rule X ⇒ Y holds in the transaction set D with confidence c percent if c percent of the transactions in D that contain X also contain Y. The support of the rule X ⇒ Y is given by

$$s = \frac{|\{T \in D \mid X \cup Y \subseteq T\}|}{|D|} \times 100\%,$$

where |·| indicates the number of transactions.
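Read as code, the support computation is straightforward. The following minimal Python sketch is ours, not the paper's; the toy database is chosen so that TID 100 and TID 300 match the worked examples later in the paper, while the other two transactions are made up for illustration:

```python
def support(itemset, transactions):
    """Percentage of transactions that contain every item of `itemset`."""
    matches = sum(1 for t in transactions if itemset <= t)
    return 100.0 * matches / len(transactions)

# Toy database (TIDs 100-400); TIDs 100 and 300 mirror the paper's examples.
D = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(support({"B", "E"}, D))  # 75.0: {B, E} appears in 3 of 4 transactions
```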
The confidence of the rule X ⇒ Y is given by

$$c = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} \times 100\%.$$

A typical example of an association rule is that 80 percent of customers who purchase beef steak and goose liver paste would also prefer to buy bottles of red wine. Once we have found all frequent itemsets that meet the minimum support requirement, calculating the confidence of each rule is trivial. Therefore, we only need to focus on methods of finding the frequent itemsets in the database. The Apriori [2] approach was the first to address this issue. Apriori finds frequent itemsets by scanning the database to check the frequencies of candidate itemsets, which are generated by merging frequent subitemsets. However, Apriori-based algorithms suffer from a bottleneck because they generate too many candidate itemsets. DHP [16] proposed a hash table scheme, which effectively reduces the number of candidate itemsets. In addition, several mining techniques, such as TreeProjection [1], the FP-growth algorithm [12], partitioning [18], sampling [19], and the Hidden Markov Model [5], have also received a significant amount of research attention. With the increasing amount of data, it is important to develop more efficient algorithms to extract knowledge from the data. However, the volume of data is increasing much faster than CPU execution speeds, which strongly affects the performance of software algorithms. Several works [7], [8] have proposed parallel computing schemes to execute operations simultaneously on multiprocessors.
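As a software reference point for the hardware discussion that follows, here is a compact sketch of the Apriori iteration just described. It is our illustration, not the paper's code; it reuses the support helper and toy database D from the previous snippet, with min_support given in percent:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise mining: merge frequent subitemsets into candidates,
    then scan the database to keep only the frequent ones."""
    items = {i for t in transactions for i in t}
    frequent = [frozenset([i]) for i in sorted(items)
                if support(frozenset([i]), transactions) >= min_support]
    all_frequent, k = list(frequent), 2
    while frequent:
        # Join step: merge frequent (k-1)-itemsets that share k-2 items.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = [c for c in candidates
                    if support(c, transactions) >= min_support]
        all_frequent += frequent
        k += 1
    return all_frequent

# With the toy D above and a 50 percent minimum support, the largest
# frequent itemset found is {B, C, E}.
```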
The performance, however, cannot improve linearly as the number of parallel nodes grows. Therefore, some researchers have tried to use hardware devices to accomplish data mining tasks. In [15], Liu et al. proposed a parallel matrix hardware architecture, which can efficiently generate candidate 2-itemsets, for high-throughput data stream applications. Baker and Prasanna [3], [4] designed scalable hardware for association rule mining by utilizing the systolic array proposed in [13] and [14]. The architecture utilizes parallel computing techniques to execute a large number of pattern matching operations at the same time. Other hardware architectures [6], [9], [10], [20] have been designed to speed up the K-means clustering algorithm.

Generally speaking, Apriori-based hardware schemes require loading the candidate itemsets and the database into the hardware. Since the capacity of the hardware is fixed, if the number of items in the database is larger than the hardware capacity, the data items must be loaded separately. Therefore, the process of comparing candidate itemsets with the database needs to be executed several times. Similarly, if the number of candidate itemsets is larger than the capacity of the hardware, the pattern matching procedure has to be separated into many rounds. Clearly, having to load the candidate itemsets and the database into the hardware multiple times is undesirable for any hardware design. Since the time complexity of those steps that need to load candidate itemsets or database items into the hardware is proportional to the number of candidate itemsets and the number of items in the database, this procedure is very time consuming. In addition, numerous candidate itemsets and a huge database may cause a bottleneck in the system.

In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. That is, we identify certain parts of the mining process that are suitable for, and will benefit from, hardware implementation, and perform hardware-enhanced mining. Explicitly, we incorporate the pipeline methodology into the HAPPI architecture to compare itemsets and collect useful information that enables us to reduce the number of candidate itemsets and items in the database simultaneously. As shown in Fig. 1, there are three hardware modules in our system. First, when the database is fed into the hardware, the candidate itemsets are compared with the items in the database by the systolic array. Candidate itemsets that have a higher frequency than the minimum support value are viewed as frequent itemsets. Second, at the same time, we determine how often each item in a transaction occurs in the candidate itemsets; these frequencies are called trimming information. From this information, infrequent items in the transactions can be eliminated by the trimming filter, since they are not useful in generating frequent itemsets. Third, we generate itemsets from transactions and hash them into the hash table, which is then used to filter out unnecessary candidate itemsets. After the hardware compares candidate itemsets with the items in the database, the trimming information is collected and the hash table is built. This information helps us to reduce the number of items in the database and the number of candidate itemsets.
Based on the trimming information, items are trimmed if their corresponding occurrence frequencies are not larger than the length of the current candidate itemsets. In addition, after the candidate itemsets are generated by merging frequent subitemsets, they are sent to the hash table filter. If the number of itemsets in the corresponding bucket of the hash table is less than the minimum support, the candidate itemsets are pruned. As such, HAPPI solves the bottleneck problem mentioned earlier through the cooperation of these three hardware modules.

To achieve these goals, we devise the following five procedures in the HAPPI architecture: support counting, transaction trimming, hash table building, candidate generation, and candidate pruning. Moreover, we derive several formulas to decide the optimal design in order to reduce the overhead induced by the pipeline scheme, as well as the ideal number of hardware modules to achieve the best utilization. The execution times of sequential processing and pipeline processing are also analyzed in this paper.

We conduct several experiments to evaluate the performance of the HAPPI architecture. In addition, we implement the work of Baker and Prasanna [3] and a software algorithm, DHP [16], for comparison purposes. The experiment results show that HAPPI significantly outperforms the previous approach in execution time, especially when the number of items in the database is large and the minimum support value increases. Moreover, the performance of HAPPI is better than that of the previous approach [3] when the systolic array contains different numbers of hardware cells. In fact, by using only 25 hardware cells in the systolic array, we can achieve the same performance as more than 800 hardware cells in the previous approach. The advantages of the HAPPI architecture are that it provides more computing power and saves space costs for mining association rules in hardware designs. The scale-up experiment also shows that HAPPI outperforms the previous approach on different numbers of transactions in the database. Indeed, our architecture is a good example to demonstrate the methodology of performance enhancement by hardware. We implement our architecture on a commercial FPGA board, and it can easily be realized as a custom ASIC. With the progress of IC process technology, the performance of HAPPI will improve further. In view of the fast increase in the amount of data in various emerging mining applications (e.g., network application mining, data stream mining, and bioinformatics data mining), it is envisioned that hardware-enhanced mining is an important research direction to explore for future data mining tasks.

The remainder of the paper is organized as follows: We discuss related works in Section 2. The preliminaries are presented in Section 3. The HAPPI architecture is described in Section 4. Next, we show several experiments conducted on HAPPI in Section 5. Finally, we present our conclusions in Section 6.

Fig. 1. System architecture.
2 RELATED WORKS

In this section, we discuss two previous works that use a systolic array architecture to enhance the performance of data mining.

The Systolic Process Array (SPA) architecture is proposed in [10] to perform K-means clustering. SPA accelerates the processing speed by utilizing several hardware cells to calculate the distances in parallel. Each cell corresponds to a cluster and stores the centroid of the cluster in local memory. The data flows linked by each cell include the data object, the minimum distance between the object and its closest centroid, and the closest centroid of the object. The cell computes the distance between the centroid and the input data object. Based on the resulting distance, the cell updates the minimum distance and the closest centroid of the data object. Therefore, the system can obtain the closest centroid of each object from SPA. The centroids are recomputed and updated by the system, and the new centroids are sent to the cells. The system continuously updates the clustering results.

In [3], the authors implemented a systolic array with several hardware cells to speed up the Apriori algorithm. Each cell performs an ALU (larger than, smaller than, or equal to) operation, which compares the incoming item with the items in the memory of the cell. This operation generates frequent itemsets by comparing candidate itemsets with the items in the database. Since all the cells can execute their own operations simultaneously, the performance of the architecture is better than that of a single processor. However, the number of cells in the systolic array is fixed. If the number of candidate itemsets is larger than the number of hardware cells, the pattern matching procedure has to be separated into many rounds, and loading candidate itemsets and the database into the hardware multiple times is costly. As reported in [3], the performance is only about four times faster than that of some software algorithms. Hence, there is much room to improve the execution time.

3 PRELIMINARIES

The hash table scheme proposed in DHP [16] improves the performance of Apriori-based algorithms by filtering out infrequent candidate itemsets. In addition, DHP employs an effective pruning scheme to eliminate infrequent items in transactions. We summarize these two schemes below.

In the hash table scheme, a hash function is applied to all candidate k-itemsets, which are generated from frequent subitemsets. Each candidate k-itemset is mapped to a hash value, and itemsets with the same hash value are put into the same bucket of the hash table. If the number of itemsets in a bucket is less than the minimum support threshold, the occurrence frequency of every candidate itemset in that bucket must also be less than the minimum support threshold. As a result, these candidate itemsets cannot be frequent and are removed from the system. On the other hand, if the number of itemsets in the bucket is larger than the minimum support threshold, the candidate itemsets in it are carried to the real frequency testing process, which scans the database.

The hash table H_k for filtering candidate k-itemsets is built by hashing the k-itemsets generated from each transaction. A hash table contains n buckets, where n is an arbitrary number. When an itemset is hashed to bucket i, the number of itemsets in that bucket is increased by one. The number of itemsets in each bucket thus represents the accumulated frequency of the itemsets whose hash values are assigned to that bucket.
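In software terms, building H_k and filtering candidates against it might look as follows. This is our sketch, not DHP's code: the modulus hash is a stand-in, since the paper only gives its hash function by example in Fig. 2, and the bucket count is illustrative:

```python
from itertools import combinations

NUM_BUCKETS = 7  # illustrative; Fig. 2 uses 7 buckets, the hardware 65,536

def bucket_of(itemset):
    # Stand-in hash function; any deterministic map to a bucket index works.
    return hash(tuple(sorted(itemset))) % NUM_BUCKETS

def build_hash_table(transactions, k):
    """Build H_k: count, per bucket, the k-itemsets of all transactions."""
    table = [0] * NUM_BUCKETS
    for t in transactions:
        for itemset in combinations(sorted(t), k):
            table[bucket_of(itemset)] += 1
    return table

def filter_candidates(candidates, table, min_count):
    """Keep only candidates whose bucket count reaches the minimum support."""
    return [c for c in candidates if table[bucket_of(c)] >= min_count]
```

The Fig. 2 example below walks through exactly this flow for k = 2.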
After candidate k-itemsets have been generated, they are hashed and assigned to buckets of H_k. If the number of itemsets in a bucket is less than the minimum support, the candidate itemsets in this bucket are removed.

The example in Fig. 2 demonstrates how to build H2 and how to use it to filter out candidate 2-itemsets. After we scan the transaction TID = 100, the 2-itemsets {AC}, {AD}, and {CD} are hashed into the buckets. According to the hash function shown in Fig. 2, the hash values of {AC}, {AD}, and {CD} are 6, 0, and 6, respectively. As a result, the numbers of itemsets in the buckets indexed by 6, 0, and 6 are each increased by one. After all the transactions in the database have been scanned, the frequent 1-itemsets are found, i.e., L1 = {A, B, C, E}. In addition, the numbers of itemsets in the buckets of H2 are 3, 1, 2, 0, 3, 1, 3, and the minimum support frequency is 2. Thus, the candidate 2-itemsets in buckets 1, 3, and 5 should be pruned. If we generate candidate 2-itemsets from L1 * L1 directly, the original set of candidate 2-itemsets is C2 = {AB, AC, AE, BC, BE, CE}. After filtering out unnecessary candidate itemsets by checking H2, the new C'2 becomes {AC, BC, BE, CE}. Therefore, the number of candidate itemsets can be reduced.

Fig. 2. The process of building H2 and using H2 to filter out C2.

The pruning scheme, which is able to filter out infrequent items in the transactions, will be implemented in hardware. The theoretical background of the pruning scheme rests on the following two theorems, which were presented in [17]:

Theorem 1. A transaction can only be used to support the set of frequent (k+1)-itemsets if it contains at least (k+1) candidate k-itemsets.

Theorem 2. An item in a transaction can be trimmed if it does not appear in at least k of the candidate k-itemsets contained in the transaction.
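Theorem 2 translates directly into the per-item counter scheme elaborated next. A minimal sketch (ours, with illustrative names), traced on the two transactions of the upcoming example:

```python
def trim_transaction(transaction, candidates_k, k):
    """Drop items occurring in fewer than k of the candidate k-itemsets
    contained in this transaction (Theorem 2)."""
    count = {item: 0 for item in transaction}
    for c in candidates_k:
        if c <= transaction:            # candidate is contained in the transaction
            for item in c:
                count[item] += 1
    return {item for item in transaction if count[item] >= k}

C2 = [frozenset(s) for s in ("AC", "BC", "BE", "CE")]
print(trim_transaction({"A", "C", "D"}, C2, 2))       # set(): TID 100 vanishes
print(trim_transaction({"A", "B", "C", "E"}, C2, 2))  # {'B', 'C', 'E'}
```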
Based on Theorem 2, whether an item can be trimmed depends on how many candidate itemsets in the current transaction contain this item. The transaction trimming module is based on the frequencies of all candidate itemsets in an individual transaction. Therefore, we can handle every transaction independently, regardless of the other transactions in the database. A counter array a[ ] is used to record the number of times each item in a transaction occurs in the candidate k-itemsets; that is, counter a[i] represents the frequency of the ith item in the transaction. If a candidate k-itemset is a subset of the transaction, the counters of the corresponding items that appear in this candidate itemset are increased by one. After the comparison with all the candidate k-itemsets, if the value of a counter is less than k, the corresponding item in the transaction is trimmed, as shown in Fig. 3.

For example, transaction TID = 100 has three items {A, C, D}. Counter a[0] represents A, a[1] represents C, and a[2] represents D. Transaction TID = 300 has four items {A, B, C, E}. Counter a[0] corresponds to A, a[1] to B, a[2] to C, and a[3] to E, respectively. After the comparison with the set of candidate 2-itemsets, the values of the counter array for TID = 100 are (1, 1, 0), and the values of the counter array for TID = 300 are (1, 2, 2, 2). Since all values of the counter array for TID = 100 are less than 2, all corresponding items are trimmed from transaction TID = 100. On the other hand, because the value of a[0] for TID = 300 is less than 2, item A is trimmed from the transaction. Therefore, transaction TID = 300 becomes {B, C, E}.

4 HAPPI ARCHITECTURE

As noted earlier, Apriori-based hardware schemes have to load candidate itemsets and the database into the hardware to execute the comparison process. Too many candidate itemsets and a huge database would cause a performance bottleneck. To solve this problem, we propose the HAPPI architecture for efficient hardware-enhanced association rule mining. We incorporate the pipeline methodology into the HAPPI architecture to perform pattern matching and collect useful information to reduce the number of candidate itemsets and items in the database simultaneously. In this way, HAPPI effectively solves the bottleneck problem. In Section 4.1, we introduce our system architecture. In Section 4.2, the pipeline scheme of the HAPPI architecture is presented. The transaction trimming scheme is given in Section 4.3. Then, we describe the hardware design of the hash table filter in Section 4.4. Finally, we derive some properties for performance evaluation in Section 4.5.

4.1 System Architecture

As shown in Fig. 4, the HAPPI architecture consists of a systolic array, a trimming filter, and a hash table filter. There are several hardware cells in the systolic array, and each cell can perform the comparison operation. Based on the comparison results, the cells update the support counters of candidate itemsets and the occurrence frequencies of items in the trimming information. A trimming filter then removes infrequent items from the transactions according to the trimming information. In addition, we build a hash table by hashing the itemsets generated from each transaction. The hash table filter then prunes unsuitable candidate itemsets.
Fig. 3. An example of transaction trimming.

Fig. 4. The HAPPI architecture: (a) systolic array, (b) trimming filter, and (c) hash table filter.

To find frequent k-itemsets and generate candidate (k+1)-itemsets efficiently, we devise five procedures in the HAPPI architecture using the three hardware modules: the systolic array, the trimming filter, and the hash table filter. The procedures are support counting, transaction trimming, hash table building, candidate generation, and candidate pruning. The work flow is shown in Fig. 5; a software sketch of one round is given below.
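The sketch strings the five procedures together for one round. It is our illustration of the flow in Fig. 5, not the paper's implementation: count_support and generate_candidates are simple software stand-ins for the systolic-array operations, while trim_transaction, build_hash_table, and filter_candidates are the earlier sketches:

```python
def count_support(candidates, database):
    """Software stand-in for the systolic array's support counting."""
    return {c: sum(1 for t in database if c <= t) for c in candidates}

def generate_candidates(frequent_k, k_next):
    """Join frequent (k_next-1)-itemsets into k_next-itemset candidates."""
    return {a | b for a in frequent_k for b in frequent_k if len(a | b) == k_next}

def happi_round(candidates_k, database, k, min_count):
    """One round of Fig. 5: frequent k-itemsets in, pruned (k+1)-candidates out."""
    # 1. Support counting (systolic array).
    counts = count_support(candidates_k, database)
    frequent_k = [c for c in candidates_k if counts[c] >= min_count]
    # 2. Transaction trimming (trimming filter, Theorem 2).
    database = [trim_transaction(t, candidates_k, k) for t in database]
    # 3. Hash table building for H_{k+1} (hash value generators + updater).
    table = build_hash_table(database, k + 1)
    # 4. Candidate generation (systolic injection and stalling in hardware).
    candidates_next = generate_candidates(frequent_k, k + 1)
    # 5. Candidate pruning against H_{k+1}.
    candidates_next = filter_candidates(candidates_next, table, min_count)
    return frequent_k, candidates_next, database
```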
The support counting procedure finds frequent itemsets by comparing candidate itemsets with the transactions in the database. By loading candidate k-itemsets and streaming transactions into the systolic array, the frequencies with which candidate itemsets occur in the transactions can be determined. Note that if the number of candidate itemsets is larger than the number of hardware cells in the systolic array, the candidate itemsets are separated into several groups. Some of the candidate itemsets are loaded into the hardware cells, and the database is fed into the systolic array. Afterward, the other candidate itemsets are loaded into the systolic array group by group. To complete the comparison with all the candidate itemsets, the database has to be examined several times. To reduce the overhead of this repeated loading, we design two additional hardware modules, namely, a trimming filter and a hash table filter. Infrequent items in the database are eliminated by the trimming filter, and the number of candidate itemsets is reduced by the hash table filter. Therefore, the time required for the support counting procedure can be effectively reduced. After all the candidate k-itemsets have been compared with the transactions, their frequencies are sent back to the system. The frequent k-itemsets are obtained as the candidate k-itemsets whose occurrence frequencies are larger than the minimum support.

While the transactions are being compared with the candidate itemsets, the corresponding trimming information is collected: the occurrence frequency of each item that is contained in the candidate itemsets in the transactions is recorded and updated in the trimming information. After the comparison of candidate itemsets with the database, the trimming information is complete. The occurrence frequencies and the corresponding transactions are then transmitted to the trimming filter, and infrequent items are trimmed from the transactions according to the occurrence frequencies in the trimming information. Then, the hash table building procedure generates (k+1)-itemsets from the trimmed transactions. These (k+1)-itemsets are hashed into the hash table for processing. Next, the candidate generation procedure is also executed by the systolic array. The frequent k-itemsets are fed into the systolic array for comparison with the other frequent k-itemsets, and the candidate (k+1)-itemsets are generated by the systolic injection and stalling techniques, similar to [3]. The candidate pruning procedure then uses the hash table to filter out candidate (k+1)-itemsets that cannot be frequent. Afterward, the flow reverts to the support counting procedure: the pruned candidate (k+1)-itemsets are loaded into the systolic array for comparison with the transactions that have already been trimmed. The above five procedures are executed repeatedly until all frequent itemsets have been found.

4.2 Pipeline Design

We observe that the transaction trimming and hash table building procedures are blocked by the support counting procedure. The transaction trimming procedure has to obtain the trimming information before it can execute the trimming process, and this information is not complete until the support counting procedure has compared all the transactions with all the candidate itemsets. In addition, the hash table building procedure has to get the trimmed transactions from the trimming filter after all the transactions have been trimmed. This problem can be resolved by applying the pipeline scheme, which utilizes the three hardware modules simultaneously in the HAPPI framework.
First, we divide the database into N_pipe parts. One part of the transactions in the database is streamed into the systolic array, and the support counting process is performed on all candidate itemsets. After being compared with all the candidate itemsets, these transactions and their trimming information are passed to the trimming filter, and the systolic array then processes the next group of transactions. After items have been trimmed from a transaction by the trimming filter, the transaction is passed to the hash table filter, as shown in Fig. 6, and the trimming filter can deal with the next transaction. In this way, all the hardware modules can be utilized simultaneously. Although the pipelined architecture improves the system's performance, it increases the computational overhead because the candidate itemsets have to be loaded into the systolic array multiple times. The performance of the pipeline scheme and the improved design of the HAPPI architecture are discussed in Section 4.5.

Fig. 5. The procedure flow of one round.

Fig. 6. A diagram of the pipeline procedures.
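The effect of the overlap can be checked with a small timing model. This is our sketch, not the paper's analysis: it ignores the candidate-reloading overhead t_overhead treated in Section 4.5, and the per-stage cycle totals are made up for illustration:

```python
def pipeline_makespan(stage_cycles, n_pipe):
    """Finish time of three chained stages over n_pipe equal database chunks.
    stage_cycles[i]: total cycles of stage i over the whole database."""
    per_chunk = [c / n_pipe for c in stage_cycles]
    finish = [0.0] * len(per_chunk)   # finish time of each stage's latest chunk
    for _ in range(n_pipe):
        start = 0.0
        for i, cost in enumerate(per_chunk):
            start = max(start, finish[i])  # wait for prior stage and prior chunk
            finish[i] = start + cost
            start = finish[i]
    return finish[-1]

# Illustrative totals: support counting 9e6, trimming 1e6, hash building 3e6.
print(pipeline_makespan([9e6, 1e6, 3e6], n_pipe=1))   # 13000000.0 (sequential)
print(pipeline_makespan([9e6, 1e6, 3e6], n_pipe=30))  # ~9.13e6 (Case 1 of Sec. 4.5)
```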
4.3 Transaction Trimming

While the support counting procedure is being executed, the whole database is streamed into the systolic array. However, not all the transactions are useful for generating frequent itemsets. Therefore, we filter out items in the transactions according to Theorem 2 so that the database is reduced. In the HAPPI architecture, the trimming information records, for each item in a transaction, the frequency with which it appears in the candidate itemsets. The support counting and trimming information collecting operations are similar, since both need to compare candidate itemsets with transactions. Therefore, in addition to the transactions in the database, their corresponding trimming information is also fed into the systolic array, in another pipe, while the support counting process is being executed. As shown in Fig. 7, a trimming vector is embedded in each hardware cell of the systolic array to record the items that match the candidate itemsets. The ith flag in the trimming vector is set to true if the ith item in the transaction matches the candidate itemset. After the candidate itemset has been compared with all the items in a transaction, if the candidate itemset is a subset of the transaction, the incoming corresponding trimming information is accumulated according to the trimming vector. Since transactions and trimming information are input in different pipes, the support counters and the trimming information can be updated simultaneously in a hardware cell.

In Fig. 7a, the candidate itemset {BC} is stored in the candidate memory, and a transaction {A, B, C, D, E} is about to be fed into the cell. The resulting trimming vector after comparing {BC} with all the items in the transaction is shown in Fig. 7b. Because items B and C match the candidate itemset, the trimming vector becomes (0, 1, 1, 0, 0). Meanwhile, the corresponding trimming information is fed into the trimming register, and the trimming information is updated from (0, 1, 1, 0, 1) to (0, 2, 2, 0, 1).

Fig. 7. An example of streaming a transaction and the corresponding trimming information into the cell. (a) Stream a transaction into the cell. (b) Stream trimming information into the cell.

After passing through the systolic array, transactions and their corresponding trimming information are passed to the trimming filter. The filter trims off items whose frequencies are less than k. In the example in Fig. 8, the trimming information of the transaction {A, B, C, D, E} is (2, 2, 2, 1, 2) and the current k is 2. Therefore, item D should be trimmed, and the new transaction becomes {A, B, C, E}. In this way, the size of the database can be reduced. The trimmed transactions are sent to the hash table filter module for hash table building.

Fig. 8. The trimming filter.

4.4 Hash Table Filtering

To build a hardware hash table filter, we use a hash value generator and a hash table updating module. The former generates all the k-itemset combinations of the transactions and puts the k-itemsets into the hash function to create the corresponding hash values. As shown in Fig. 9, the hash value generator comprises a transaction memory, a state machine, an index array, and a hash function. The transaction memory stores all the items of a transaction. The state machine is the controller that flexibly generates control signals for different lengths (k = 2, 3, ...). The control signals are fed into the index array. To generate a k-itemset, the first k entries in the index array are utilized. The values in the index array are indices into the transaction memory: the item selected by the ith entry of the index array is the ith item in a k-itemset. By changing the values in the index array, the state machine can generate different combinations of k-itemsets from the transaction. The procedure starts by loading a transaction into the transaction memory. Then, the values in the index array are reset, and the state machine starts to generate control signals. The values in the index array are changed by the different states. Each item in the generated itemset is passed to the hash function through the multiplexer. The hash function takes some bits from the incoming k-itemsets to calculate the hash values.

Fig. 9. The hash value generator.
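In software, the index-array state machine can be transcribed directly. The sketch below is ours; it reproduces the enumeration order of the example that follows:

```python
def k_itemsets(transaction_memory, k):
    """Yield all k-itemsets of a transaction, advancing an index array
    the way the state machine of Fig. 9 does."""
    n = len(transaction_memory)
    idx = list(range(k))                 # initial indices 0, 1, ..., k-1
    while True:
        yield tuple(transaction_memory[i] for i in idx)
        # Find the rightmost index entry that can still be advanced.
        j = k - 1
        while j >= 0 and idx[j] == n - k + j:
            j -= 1
        if j < 0:
            return
        idx[j] += 1
        for m in range(j + 1, k):        # reset the entries to its right
            idx[m] = idx[m - 1] + 1

print(list(k_itemsets(["A", "C", "E", "F", "G"], 3))[:5])
# [('A','C','E'), ('A','C','F'), ('A','C','G'), ('A','E','F'), ('A','E','G')]
```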
Consider the example in Fig. 9, where we assume the current k is 3. The first three entries in the index array are used in this case. The transaction {A, C, E, F, G} is loaded into the transaction memory. The values in the index array are initialized to 0, 1, and 2, respectively, so the first itemset generated is {ACE}. Then, the state machine changes the values in the index array: the following index triples are (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), and so on. Therefore, the corresponding itemsets are {ACF}, {ACG}, {AEF}, {AEG}, and so on.

The hash values generated by the hash value generator are passed to the hash table updating module. To speed up the process of hash table building, we utilize N_parallel hash value generators so that hash values can be generated simultaneously. In addition, the hash table is divided into several parts to increase the throughput of hash table building. Each part of the hash table covers a range of hash values, and the controller passes each incoming hash value to the buffer it belongs to. These hash values are taken as indexes of the hash table to accumulate the values in the table, as shown in Fig. 10. In the figure, there are four parallel hash value generators. The size of the whole hash table is 65,536, and it is divided into four parts; thus, the range of each part is 16,384. If an incoming hash value is 5, it belongs to the first part of the hash table, and the controller passes the value to buffer 1. If there are parallel accesses to the hash table at the same time, only one access can be executed; the others are delayed and handled as soon as possible. The delayed itemsets are stored in the buffer temporarily, and whenever the access port of the hash table is free, they are put into the hash table.

Fig. 10. The parallel hash table building module.

After all the candidate k-itemsets have been generated, they are pruned by the hash table filter. Each candidate itemset is hashed by the hash function. By querying the number of itemsets in the bucket with the corresponding hash value, the candidate itemset is pruned if that number does not meet the minimum support criterion. Therefore, the number of candidate itemsets can be reduced effectively with the help of the hash table filter.

4.5 Performance Analysis

In this section, we derive some properties of our system, with and without the pipeline scheme, to investigate the total execution time. Suppose the number of candidate k-itemsets is N_cand-k and the number of frequent k-itemsets is N_freq-k. There are N_cell hardware cells in the systolic array. |T| represents the average number of items in a transaction, and |D| is the total number of items in the database. As shown in Fig. 5, the time needed to find frequent k-itemsets and candidate (k+1)-itemsets includes the time required for support counting, transaction trimming, hash table building, candidate generation, and candidate pruning.

First, the execution time of the support counting procedure is related to the number of times the candidate itemsets and the database are loaded into the systolic array. That is, if N_cand-k is larger than N_cell, the candidate itemsets and the database must be input into the systolic array ⌈N_cand-k / N_cell⌉ times. Each time, at most N_cell candidate itemsets are loaded into the systolic array, so the number of items in these candidate k-itemsets is at most k × N_cell. In addition, all items in the database need |D| cycles to be streamed into the systolic array.
Therefore, the execution cycle of the support counting procedure is, at most,

$$t_{\text{sup}} = \lceil N_{\text{cand-}k} / N_{\text{cell}} \rceil \times (k \times N_{\text{cell}} + |D|).$$

Second, the transaction trimming procedure eliminates infrequent items and receives incoming items at the same time. A transaction item and the corresponding trimming information are fed into the trimming filter during each cycle. After the whole database has passed through the trimming filter, the transaction trimming procedure is finished. Thus, the execution cycle depends on the number of items in the database:

$$t_{\text{trim}} = |D|.$$

Third, the hash table building procedure consists of the hash value generation and hash table updating processes. Because these processes can be executed simultaneously, the execution time is determined by the process that generates the hash values. The execution time of hash value generation consists of the time taken by transaction loading and by hash value generation from the transactions. The overall transaction loading time is |D| cycles. In addition, there are |T| items in a transaction on average, so the number of (k+1)-itemset combinations from a transaction is $\binom{|T|}{k+1}$, and each (k+1)-itemset requires (k+1) cycles to be generated. The average number of transactions in the database is |D| / |T|. Therefore, the execution time of hash value generation from the whole database is $(k+1) \times \binom{|T|}{k+1} \times \frac{|D|}{|T|}$ cycles. Because we have designed a parallel architecture, the procedure can be executed by N_parallel hardware modules simultaneously. The execution cycle of the hash table building procedure is

$$t_{\text{hash}} = |D| + (k+1) \times \binom{|T|}{k+1} \times \frac{|D|}{|T|} \times \frac{1}{N_{\text{parallel}}}.$$
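These three cycle counts can be transcribed into a small estimator (our sketch; the input values in the usage line are illustrative, and math.comb supplies the binomial term):

```python
import math

def cycle_estimates(n_cand_k, n_cell, k, d_items, t_avg, n_parallel):
    """Upper-bound cycle counts t_sup, t_trim, t_hash from Section 4.5."""
    loads = math.ceil(n_cand_k / n_cell)       # candidate reload rounds
    t_sup = loads * (k * n_cell + d_items)
    t_trim = d_items
    t_hash = d_items + (k + 1) * math.comb(t_avg, k + 1) * (d_items / t_avg) / n_parallel
    return t_sup, t_trim, t_hash

# Illustrative: 3,000 candidate 2-itemsets, 500 cells, |D| = 10^7 items, |T| = 10.
t_sup, t_trim, t_hash = cycle_estimates(3000, 500, 2, 10**7, 10, 4)
print(t_sup + t_trim + t_hash)   # t_seq of Property 1 below
```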
The fourth procedure is candidate generation. Frequent k-itemsets are compared with other frequent k-itemsets to generate candidate (k+1)-itemsets. The execution time of candidate generation consists of the time required to load the frequent k-itemsets (at most k × N_cell each time), the time taken to pass the frequent k-itemsets (k × N_freq-k) through the systolic array, and the time needed to generate the candidate (k+1)-itemsets (N_cand-(k+1)). Similar to the support counting procedure, if N_freq-k is larger than N_cell, the frequent k-itemsets have to be separated into several groups, and the comparison process has to be executed several times. Thus, the execution cycle is, at most,

$$t_{\text{candidate generation}} = \lceil N_{\text{freq-}k} / N_{\text{cell}} \rceil \times (k \times N_{\text{cell}} + k \times N_{\text{freq-}k}) + N_{\text{cand-}(k+1)}.$$

Finally, the candidate pruning procedure has to hash the candidate (k+1)-itemsets and query the hash table H_{k+1}. Each (k+1)-itemset requires (k+1) cycles to be generated, and the hash table can accept one input query during each cycle. Thus, the execution time needed to hash the candidate (k+1)-itemsets ((k+1) × N_cand-(k+1)) and query the hash table (N_cand-(k+1)) is

$$t_{\text{candidate pruning}} = (k+1) \times N_{\text{cand-}(k+1)} + N_{\text{cand-}(k+1)}.$$

Since the size of the database that we consider is much larger than the number of candidate k-itemsets, we can neglect the execution time of candidate generation and pruning. Therefore, the time required for one round of sequential execution, t_seq, is the sum of the time taken by the support counting, transaction trimming, and hash table building procedures, as shown in Property 1.

Property 1. $t_{\text{seq}} = t_{\text{sup}} + t_{\text{trim}} + t_{\text{hash}}$.

The pipeline scheme incorporated in the HAPPI architecture divides the database into N_pipe parts and inputs them into the three modules. However, this scheme causes some overhead, t_overhead, because the candidate itemsets have to be reloaded multiple times in the support counting procedure:

$$t_{\text{overhead}} = \lceil N_{\text{cand-}k} / N_{\text{cell}} \rceil \times (k \times N_{\text{cell}}) \times N_{\text{pipe}}.$$

Therefore, the support counting procedure has to take t_overhead into account, and its execution time in the pipeline scheme becomes

$$t'_{\text{sup}} = t_{\text{sup}} + t_{\text{overhead}}.$$

The execution time of the pipeline scheme, t_pipe, is analyzed according to the following two cases:

Case 1. If the execution time of the support counting procedure is longer than that of the hash table building procedure, the other procedures finish their operations before the support counting procedure does. However, the transaction trimming and hash table building procedures have to wait for the last part of the data from the support counting procedure. Therefore, the total execution time is t'_sup plus the time required to process the last part of the database with the trimming filter and the hash table filter:

$$t_{\text{pipe}} = t'_{\text{sup}} + (t_{\text{trim}} + t_{\text{hash}}) \times \frac{1}{N_{\text{pipe}}}.$$

Case 2. If the execution time of the support counting procedure is less than that of the hash table building procedure, the other procedures are completed before the hash table building procedure. Since the hash table building procedure has to wait for the data from the support counting and transaction trimming procedures, the total execution time is t_hash plus the time required to process the first part of the database with the support counting procedure and the trimming filter:

$$t_{\text{pipe}} = (t'_{\text{sup}} + t_{\text{trim}}) \times \frac{1}{N_{\text{pipe}}} + t_{\text{hash}}.$$

Summarizing the above two cases, the execution time t_pipe can be presented as Property 2.

Property 2. $t_{\text{pipe}} = \max(t'_{\text{sup}}, t_{\text{hash}}) + \min(t'_{\text{sup}}, t_{\text{hash}}) \times \frac{1}{N_{\text{pipe}}} + t_{\text{trim}} \times \frac{1}{N_{\text{pipe}}}$.

To achieve the minimal value of t_pipe, we consider the following two cases:

1. If t'_sup is larger than t_hash, the execution time t_pipe is dominated by t'_sup. To decrease the value of t'_sup, we can increase the number of hardware cells in the systolic array until t'_sup is equal to t_hash.
Therefore, the optimal value of N_pipe to reach the minimal t_pipe is

$$N_{\text{pipe}} = \sqrt{\frac{t_{\text{trim}} + t_{\text{hash}}}{\lceil N_{\text{cand-}k} / N_{\text{cell}} \rceil \times k \times N_{\text{cell}}}}.$$

2. If t'_sup is smaller than t_hash, the execution time t_pipe is mainly taken up by t_hash. To decrease the value of t_hash, we can increase N_parallel until t_hash is equal to t'_sup. As a result, the optimal value of N_pipe to achieve the minimal t_pipe in this case is

$$N_{\text{pipe}} = \frac{t_{\text{hash}} - t_{\text{sup}}}{\lceil N_{\text{cand-}k} / N_{\text{cell}} \rceil \times k \times N_{\text{cell}}}.$$

To decide the values of N_cell and N_parallel in the HAPPI architecture, we have to know the value of N_cand-k. However, N_cand-k varies with k. Therefore, we decide these values according to our experimental experience. Generally speaking, for many types of real-world data, the number of candidate 2-itemsets is the largest. Thus, we focus on the case k = 2, since its execution time is the largest of all the passes. For a data set with |T| = 10 and |D| = 1 million, if the minimum support is 0.2 percent, N_cand-k is about 3,000. Assume that there are 500 hardware cells in the systolic array. To accelerate the hash table building procedure, we can increase N_parallel; based on Property 2, the best value of N_parallel is 4. In addition, after the transactions are trimmed by the trimming filter, we can obtain the current number of items in the database, and N_cand-k can be obtained after the candidate k-itemsets are pruned by the hash table filter. Therefore, we can calculate the values of t_sup and t_hash before starting the support counting and hash table building procedures. Since t_overhead is small relative to the size of the database under consideration, t_sup can be viewed as t'_sup. Based on the formulas derived above, we can then obtain the best value of N_pipe to minimize t_pipe.
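Following the two cases, the choice of N_pipe can be scripted (our sketch; as argued above, t_sup stands in for t'_sup):

```python
import math

def best_n_pipe(t_sup, t_trim, t_hash, n_cand_k, n_cell, k):
    """Pick N_pipe according to the two cases of Section 4.5."""
    reload_cost = math.ceil(n_cand_k / n_cell) * k * n_cell  # overhead per pipe part
    if t_sup >= t_hash:   # Case 1: support counting dominates
        n = math.sqrt((t_trim + t_hash) / reload_cost)
    else:                 # Case 2: hash table building dominates
        n = (t_hash - t_sup) / reload_cost
    return max(1, round(n))
```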
For example, when the support counting procedure is dealing with candidate 2-itemsets and the hash table building procedure is about to build H3, we divide the database into 30 parts, i.e., N_pipe = 30. By applying the pipeline scheme to these three hardware modules, the hardware utilization increases and the waste due to blocking can be reduced.

5 EXPERIMENT RESULTS

In this section, we conduct several experiments on a number of synthetic data sets to evaluate the performance of the HAPPI architecture. We also implement an approach mainly based on [3], abbreviated as the Direct Comparison (DC) method, for comparison purposes. Although a hardware algorithm was also proposed in [4], its performance improvement was found to be much smaller than that of DC, whereas HAPPI outperforms DC by orders of magnitude. Moreover, we implement a software algorithm, DHP [16], denoted by SW_DHP, as the baseline. The software algorithm is executed on a PC with a 3-GHz Pentium 4 CPU and 1 Gbyte of RAM. Both HAPPI and DC are implemented on the Altera Stratix 1S40 FPGA board with a 50-MHz clock rate and 10 Mbytes of SDRAM. The hardware modules are coded in Verilog. We use ModelSim to simulate the Verilog code and verify the functions of our design. In addition, we use the Altera Quartus II IDE to build hardware modules and synthesize them into hardware circuits. Finally, the hardware circuit image is sent to the FPGA board. There is a synthesized CPU on the FPGA board, and we implement a software program on this NIOS II CPU to verify the results and to record the hardware execution time. At first, the program downloads data from the database into the memory on the FPGA; then, the data is fed into the hardware modules. The bandwidth consumed from the memory to the chip is 16 bits/cycle in our design. Since the data is transferred on the bus, the maximum bandwidth is limited by the bandwidth of the bus on the FPGA, which is generally 32 bits/cycle; it can be upgraded to 64 bits/cycle on some modern FPGA boards. After the execution of the hardware modules, we can acquire the outcomes and the execution cycles. The following experimental results are based on execution cycles on the FPGA board. In addition, in our hardware design, the critical path is in the hash table building module. There are many logic combinations in this module, so the synthesized hardware core is complex. This limits the maximum clock frequency of our hardware design to 58.6 MHz. However, since the maximum clock frequency on the Altera Stratix 1S40 is 50 MHz, our design meets the hardware requirement. In future work, we intend to increase the clock frequency of the hardware architecture by optimizing this bottleneck module.

In the hardware implementation of the HAPPI architecture, N_parallel is set to 4, the number of hardware cells in the systolic array is 500, there are 65,536 buckets in the hash table, and N_pipe is assigned according to the methodology in Section 4.5. The method used to generate synthetic data is described in Section 5.1. The performance comparison of several schemes in the HAPPI architecture and DC is discussed in Section 5.2. Section 5.3 presents the performance analysis of different distributions of frequent itemsets. Finally, in Section 5.4, the results of some scale-up experiments are discussed.

5.1 Generation of Synthetic Data

To obtain reliable experimental results, we employ methods similar to those used in [16] to generate synthetic data sets.
These data sets are generated with the following parameters: T represents the average number of items in a transaction of the database, I denotes the average length of the maximal potentially frequent itemsets, D is the number of transactions in the database, L is the number of maximal potentially frequent itemsets, and N is the number of different items in the database. Table 1 summarizes the parameters used in our experiments. In the following experiment data sets, L is set to 3,000. To evaluate the performance of HAPPI, we conduct several experiments with different data sets. We use TxIyDz to represent a data set with T = x, I = y, and D = z. In addition, the sensitivity analysis and the scale-up experiments are also explored with different data sets. Note that the y-axis of the following figures shows the execution cycle in logarithmic scale.

TABLE 1. Summary of the Parameters Used.

5.2 Performance Evaluation

Initially, we conduct experiments to evaluate the performance of several schemes in the HAPPI architecture and DC. The testing data sets are T10I4D100K with different numbers of items in the database, and the minimum support is set to 0.5 percent. As shown in Fig. 11, the four different schemes are:

1. the DC scheme,
2. the systolic array with a trimming filter,
3. the combined scheme made up of the systolic array, the trimming filter, and the hash table filter, and
4. the overall HAPPI architecture with the pipeline design.

Fig. 11. The execution cycles of several schemes.
With the trimming filter, the execution time improves by about 10 percent to 70 percent compared with DC. As the number of different items in the database increases, the improvement due to the trimming filter grows. The reason is that, if the number of different items grows, the number of infrequent itemsets also increases; therefore, the filtering rate achieved by the trimming filter becomes more remarkable. Moreover, the execution cycle of the combined scheme, with the help of the hash table filter, is about 25 to 51 times faster. The HAPPI architecture with the pipeline scheme is 47 to 122 times better than DC. Note that the main improvement in execution time comes from the hash table filter. We not only implemented an efficient hardware module for the hash table filter in the HAPPI architecture but also designed the pipeline scheme to let the hash table filter work together with the other hardware modules. The pipeline and the parallel design are two helpful properties of the hardware architecture, and we utilize these hardware design techniques to accelerate the overall system. As shown in Fig. 11, although the combined scheme already provides a substantial performance boost, the overall HAPPI architecture with the pipeline design improves performance further. In summary, the HAPPI architecture outperforms DC, especially when the number of different items in the database is large.

5.3 Sensitivity Analysis

In the second experiment, we generate several data sets with different distributions of frequent itemsets to examine the sensitivity of the HAPPI architecture. The experiment results for several synthetic data sets with various minimum supports are shown in Figs. 12 and 13. The results show that, no matter what combination of the parameters T and I is used, the HAPPI architecture consistently outperforms DC and SW_DHP. Specifically, the execution time of HAPPI is less than that of DC and that of SW_DHP by several orders of magnitude, and the margin grows as the minimum support increases. As Fig. 12 shows, HAPPI has a better performance enhancement ratio over DC on the T5I2D100K data set than on the T20I8D100K data set. The reason is that the hash table filter is especially effective in eliminating infrequent candidate 2-itemsets, as reported in the experimental results of DHP [16]; thus, SW_DHP also performs better on these two data sets. In addition, the execution time t_pipe is mainly related to the number of times, ⌈N_cand-k / N_cell⌉, the database is reloaded. Since N_cand-k can be substantially reduced when k is small, the overall performance enhancement is remarkable. Since the average size of the maximal potentially frequent itemsets of the data set T20I8D100K is 8, the performance of the HAPPI architecture there is only 2.65 times faster. However, most data in the real world contains short frequent itemsets, that is, small T and I. It is noted that DC provides only a small enhancement over SW_DHP on short frequent itemsets, while HAPPI retains its better performance. Therefore, the HAPPI architecture can perform well on real-world data sets.

Fig. 13 demonstrates that the improvement of the HAPPI architecture over DC becomes more noticeable with increasing minimum support. This is because more of the long itemsets are eliminated under a large minimum support; therefore, the improvement due to the hash table filter increases. In comparison with DC, the overall performance is outstanding.
Fig. 12. The execution time of data sets with different T and I.

Fig. 13. The execution cycles with various minimum supports.

5.4 Scale-up Experiment

According to the performance analysis in Section 4.5, the execution time t_pipe is mainly related to the number of times, ⌈N_cand-k / N_cell⌉, the database is reloaded into the systolic array. Therefore, if N_cell increases, less time is needed to stream the database, and the overall execution time shrinks. Fig. 14 illustrates the scaling performance of the HAPPI architecture and DC, where the y-axis is also in logarithmic scale. The execution cycles of both HAPPI and DC decrease linearly with the increasing number of hardware cells, and we observe that HAPPI outperforms DC for all numbers of hardware cells in the systolic array. The most important result is that we can utilize only 25 hardware cells in the systolic array and still achieve the same performance as the 800 hardware cells used in DC. The benefit of the HAPPI architecture is more computing power at lower cost for data mining in hardware.

Fig. 14. The execution cycles with different numbers of hardware units.

Fig. 15. The execution cycles with various numbers of transactions, z(K).

We also conduct experiments with different numbers of transactions in the synthetic data sets to explore the scalability of the HAPPI architecture. The generated data sets are T10I4, and the minimum support is set to 0.5 percent. As shown in Fig. 15, the execution time of HAPPI increases linearly as the number of transactions in the synthetic data sets increases, and HAPPI outperforms DC for all numbers of transactions in the database. Furthermore, Fig. 15 shows the good scalability of HAPPI and DC. This feature is especially important because
the size of applications is growing much faster than the speed of CPUs. Thus, hardware-enhanced data mining techniques are imperative.

6 CONCLUSION

In this work, we have proposed the HAPPI architecture for hardware-enhanced association rule mining. The bottleneck of Apriori-based hardware schemes is related to the number of candidate itemsets and the size of the database. To solve this problem, we apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information that reduces the number of candidate itemsets and items in the database simultaneously. HAPPI prunes infrequent items in the transactions and gradually reduces the size of the database by utilizing the trimming filter. In addition, HAPPI effectively eliminates infrequent candidate itemsets with the help of the hash table filter. Therefore, the bottleneck of Apriori-based hardware schemes can be solved by the HAPPI architecture. Moreover, we derive some properties to analyze the performance of HAPPI, and we conduct a sensitivity analysis of various parameters that provides many insights into the HAPPI architecture. HAPPI outperforms the previous approach, especially with an increasing number of different items in the database and with increasing minimum support values. Also, HAPPI increases computing power and saves the costs of data mining in hardware design as compared to the previous approach. Furthermore, HAPPI possesses good scalability.

ACKNOWLEDGMENTS

The authors would like to thank Wen-Tsai Liao at Realtek for his helpful comments on improving this paper. The work was supported in part by the National Science Council of Taiwan under Contract NSC93-2752-E-002-006-PAE.

REFERENCES

[1] R. Agarwal, C. Aggarwal, and V. Prasad, "A Tree Projection Algorithm for Generation of Frequent Itemsets," J. Parallel and Distributed Computing, 2000.
[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int'l Conf. Very Large Databases (VLDB), 1994.
[3] Z.K. Baker and V.K. Prasanna, "Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs," Proc. 13th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), 2005.
[4] Z.K. Baker and V.K. Prasanna, "An Architecture for Efficient Hardware Data Mining Using Reconfigurable Computing Systems," Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM '06), pp. 67-75, Apr. 2006.
[5] C. Besemann and A. Denton, "Integration of Profile Hidden Markov Model Output into Association Rule Mining," Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining (KDD '05), pp. 538-543, 2005.
[6] C.W. Chen, J. Luo, and K.J. Parker, "Image Segmentation via Adaptive K-Mean Clustering and Knowledge-Based Morphological Operations with Biomedical Applications," IEEE Trans. Image Processing, vol. 7, no. 12, pp. 1673-1683, 1998.
[7] S.M. Chung and C. Luo, "Parallel Mining of Maximal Frequent Itemsets from Databases," Proc. 15th IEEE Int'l Conf. Tools with Artificial Intelligence (ICTAI), 2003.
[8] S. Cong, J. Han, J. Hoeflinger, and D. Padua, "A Sampling-Based Framework for Parallel Data Mining," Proc. 10th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '05), June 2005.
[9] M. Estlick, M. Leeser, J. Szymanski, and J. Theiler, "Algorithmic Transformations in the Implementation of K-Means Clustering on Reconfigurable Hardware," Proc. Ninth Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), 2001.
[10] M. Gokhale, J. Frigo, K. McCabe, J. Theiler, C. Wolinski, and D. Lavenier, "Experience with a Hybrid Processor: K-Means Clustering," J. Supercomputing, pp. 131-148, 2003.
[11] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
[12] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation," Proc. ACM SIGMOD '00, pp. 1-12, May 2000.
[13] H. Kung and C. Leiserson, "Systolic Arrays for VLSI," Proc. Sparse Matrix, 1976.
[14] N. Ling and M. Bayoumi, Specification and Verification of Systolic Arrays. World Scientific Publishing, 1999.
[15] W.-C. Liu, K.-H. Liu, and M.-S. Chen, "High Performance Data Stream Processing on a Novel Hardware Enhanced Framework," Proc. 10th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD '06), Apr. 2006.
[16] J.S. Park, M.-S. Chen, and P.S. Yu, "An Effective Hash Based Algorithm for Mining Association Rules," Proc. ACM SIGMOD '95, pp. 175-186, May 1995.
[17] J.S. Park, M.-S. Chen, and P.S. Yu, "Using a Hash-Based Method with Transaction Trimming for Mining Association Rules," IEEE Trans. Knowledge and Data Eng., vol. 9, no. 5, pp. 813-825, Sept./Oct. 1997.
[18] A. Savasere, E. Omiecinski, and S. Navathe, "An Efficient Algorithm for Mining Association Rules in Large Databases," Proc. 21st Int'l Conf. Very Large Databases (VLDB '95), pp. 432-444, Sept. 1995.
[19] H. Toivonen, "Sampling Large Databases for Association Rules," Proc. 22nd Int'l Conf. Very Large Databases (VLDB '96), pp. 134-145, 1996.
[20] C. Wolinski, M. Gokhale, and K. McCabe, "A Reconfigurable Computing Fabric," Proc. Int'l Conf. Eng. of Reconfigurable Systems and Algorithms (ERSA), 2004.

Ying-Hsiang Wen received the BS degree in computer science from the National Chiao Tung University and the MS degree in electrical engineering from the National Taiwan University, Taipei, in 2006. His research interests include data mining, video streaming, and multimedia SoC design.

Jen-Wei Huang received the BS degree in electrical engineering from the National Taiwan University, Taipei, in 2002, where he is currently working toward the PhD degree in computer science. He is familiar with the data mining area. His research interests include data mining, mobile computing, and bioinformatics; among these, web mining, incremental mining, mining data streams, time series issues, and sequential pattern mining are his special interests. In addition, some of his research concerns mining general temporal association rules, sequential clustering, data broadcasting, progressive sequential pattern mining, and bioinformatics.

Ming-Syan Chen received the BS degree in electrical engineering from the National Taiwan University, Taipei, and the MS and PhD degrees in computer, information, and control engineering from the University of Michigan, Ann Arbor, in 1985 and 1988, respectively. He was the chairman of the Graduate Institute of Communication Engineering (GICE), National Taiwan University, from 2003 to 2006. He is currently a distinguished professor jointly appointed by the Electrical Engineering Department, the Computer Science and Information Engineering Department, and GICE, National Taiwan University. He was a research staff member at the IBM T.J. Watson Research Center, New York, from 1988 to 1996. He served as an associate editor of the IEEE Transactions on Knowledge and Data Engineering from 1997 to 2001 and is currently on the editorial boards of the Very Large Data Base (VLDB) Journal and Knowledge and Information Systems. His research interests include database systems, data mining, mobile computing systems, and multimedia networking, and he has published more than 240 papers in his research areas. He is a recipient of the National Science Council (NSC) Distinguished Research Award, the Pan Wen Yuan Distinguished Research Award, the Teco Award, the Honorary Medal of Information, and the K.-T. Li Research Breakthrough Award for his research work, as well as the IBM Outstanding Innovation Award for his contribution to a major database product. He has also received numerous awards for his teaching, inventions, and patent applications. He is a fellow of the ACM and the IEEE.