Hardware enhanced association rule mining with hashing and pipelining



784 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 20, NO. 6, JUNE 2008

Hardware-Enhanced Association Rule Mining with Hashing and Pipelining

Ying-Hsiang Wen, Jen-Wei Huang, and Ming-Syan Chen, Fellow, IEEE

Abstract—Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity of those steps that need to load candidate itemsets or database items into the hardware is in proportion to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation.
As shown by the experiment results, HAPPI significantly outperforms the previous hardware approach and the software algorithm in terms of execution time.

Index Terms—Hardware enhanced, association rule.

1 INTRODUCTION

Data mining technology is now used in a wide variety of fields. Applications include the analysis of customer transaction records, web site logs, credit card purchase information, and call records, to name a few. The interesting results of data mining can provide useful information, such as customer behavior, for business managers and researchers. One of the most important data mining applications is association rule mining [11], which can be described as follows: Let I = {i1, i2, ..., in} denote a set of items; let D denote a set of database transactions, where each transaction T is a set of items such that T ⊆ I; and let X denote a set of items, called an itemset. A transaction T contains X if and only if X ⊆ T. An association rule is an implication of the form X ⇒ Y, where X ⊆ I, Y ⊆ I, and X ∩ Y = ∅. The rule X ⇒ Y has support s percent in the transaction set D if s percent of the transactions in D contain X ∪ Y. The rule X ⇒ Y holds in the transaction set D with confidence c percent if c percent of the transactions in D that contain X also contain Y. The support of the rule X ⇒ Y is given by

    s percent = |{T ∈ D | X ∪ Y ⊆ T}| / |D| × 100 percent,

where |·| indicates the number of transactions. The confidence of the rule X ⇒ Y is given by

    c percent = supp(X ∪ Y) / supp(X) × 100 percent.

A typical example of an association rule is that 80 percent of customers who purchase beef steak and goose liver paste would also prefer to buy bottles of red wine. Once we have found all frequent itemsets that meet the minimum support requirement, the calculation of confidence for each rule is trivial. Therefore, we only need to focus on methods of finding the frequent itemsets in the database. The Apriori [2] approach was the first to address this issue. Apriori finds frequent itemsets by scanning the database to check the frequencies of candidate itemsets, which are generated by merging frequent subitemsets.
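The support and confidence definitions above can be checked with a small software sketch. The toy transactions and item names below are made up for illustration and are not taken from the paper:

```python
# Sketch of the support/confidence definitions above.
# The four toy transactions are hypothetical example data.
transactions = [
    {"steak", "pate", "wine"},
    {"steak", "pate", "wine"},
    {"steak", "pate"},
    {"bread", "wine"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(x, y, db):
    """supp(X ∪ Y) / supp(X), per the definition in the text."""
    return support(x | y, db) / support(x, db)

print(support({"steak", "pate", "wine"}, transactions))        # 0.5
print(confidence({"steak", "pate"}, {"wine"}, transactions))   # 0.6666...
```

Here the rule {steak, pate} ⇒ {wine} has 50 percent support (two of four transactions contain all three items) and about 67 percent confidence (two of the three transactions containing {steak, pate} also contain wine).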
However, Apriori-based algorithms have encountered bottlenecks because they have too many candidate itemsets. DHP [16] proposed a hash table scheme, which effectively reduces the number of candidate itemsets. In addition, several mining techniques, such as TreeProjection [1], the FP-growth algorithm [12], partitioning [18], sampling [19], and the Hidden Markov Model [5], have also received a significant amount of research attention.

With the increasing amount of data, it is important to develop more efficient algorithms to extract knowledge from the data. However, the volume of data is increasing much faster than CPU execution speeds, which has a strong influence on the performance of software algorithms. Several works [7], [8] have proposed parallel computing schemes to execute operations simultaneously on multiprocessors.

The authors are with the National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei 106, Taiwan. E-mail: {winshung, jwhuang}@arbor.ee.ntu.edu.tw, mschen@cc.ee.ntu.edu.tw.

Manuscript received 25 Feb. 2007; revised 9 Aug. 2007; accepted 8 Oct. 2007; published online 11 Feb. 2008. For information on obtaining reprints of this article, please send e-mail to tkde@computer.org, and reference IEEECS Log Number TKDE-0086-0207. Digital Object Identifier no. 10.1109/TKDE.2008.39.

1041-4347/08/$25.00 © 2008 IEEE. Published by the IEEE Computer Society.
Fig. 1. System architecture.

The performance, however, cannot improve linearly as the number of parallel nodes grows. Therefore, some researchers have tried to use hardware devices to accomplish data mining tasks. In [15], Liu et al. proposed a parallel matrix hardware architecture, which can efficiently generate candidate 2-itemsets, for high-throughput data stream applications. Baker and Prasanna [3], [4] designed scalable hardware for association rule mining by utilizing the systolic array proposed in [13] and [14]. The architecture utilizes parallel computing techniques to execute a large number of pattern matching operations at the same time. Other hardware architectures [6], [9], [10], [20] have been designed to speed up the K-means clustering algorithm.

Generally speaking, Apriori-based hardware schemes require loading the candidate itemsets and the database into the hardware. Since the capacity of the hardware is fixed, if the number of items in the database is larger than the hardware capacity, the data items must be loaded separately. Therefore, the process of comparing candidate itemsets with the database needs to be executed several times. Similarly, if the number of candidate itemsets is larger than the capacity of the hardware, the pattern matching procedure has to be separated into many rounds. Clearly, it is infeasible for any hardware design to load the candidate itemsets and the database into the hardware multiple times. Since the time complexity of those steps that need to load candidate itemsets or database items into the hardware is in proportion to the number of candidate itemsets and the number of items in the database, this procedure is very time consuming. In addition, numerous candidate itemsets and a huge database may cause a bottleneck in the system.

In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. That is, we identify certain parts of the mining process that are suitable for, and will benefit from, hardware implementation and perform hardware-enhanced mining. Explicitly, we incorporate the pipeline methodology into the HAPPI architecture to compare itemsets and collect useful information that enables us to reduce the number of candidate itemsets and items in the database simultaneously. As shown in Fig. 1, there are three hardware modules in our system. First, when the database is fed into the hardware, the candidate itemsets are compared with the items in the database by the systolic array. Candidate itemsets that have a higher frequency than the minimum support value are viewed as frequent itemsets. Second, we determine the frequency with which each item occurs in the candidate itemsets in the transactions at the same time. These frequencies are called trimming information. From this information, infrequent items in the transactions can be eliminated by the trimming filter, since they are not useful in generating frequent itemsets. Third, we generate itemsets from transactions and hash them into the hash table, which is then used to filter out unnecessary candidate itemsets. After the hardware compares candidate itemsets with the items in the database, the trimming information is collected and the hash table is built. This useful information helps us to reduce the number of items in the database and the number of candidate itemsets. Based on the trimming information, items are trimmed if their corresponding occurrence frequencies are not larger than the length of the current candidate itemsets. In addition, after the candidate itemsets are generated by merging frequent subitemsets, they are sent to the hash table filter. If the number of itemsets in the corresponding bucket of the hash table is less than the minimum support, the candidate itemsets are pruned. As such, HAPPI solves the bottleneck problem mentioned earlier through the cooperation of these three hardware modules.

To achieve these goals, we devise the following five procedures in the HAPPI architecture: support counting, transaction trimming, hash table building, candidate generation, and candidate pruning. Moreover, we derive several formulas to decide the optimal design in order to reduce the overhead induced by the pipeline scheme and the ideal number of hardware modules to achieve the best utilization. The execution time of sequential processing versus pipeline processing is also analyzed in this paper.

We conduct several experiments to evaluate the performance of the HAPPI architecture. In addition, we implement the work of Baker and Prasanna [3] and a software algorithm, DHP [16], for comparison purposes. The experiment results show that HAPPI outperforms the previous approach in execution time significantly, especially when the number of items in the database is large and the minimum support value increases. Moreover, the performance of HAPPI is better than that of the previous approach [3] when the systolic array contains different numbers of hardware cells. In fact, by using only 25 hardware cells in the systolic array, we can achieve the same performance as more than 800 hardware cells in the previous approach. The advantages of the HAPPI architecture are that it has more computing power and saves space costs for mining association rules in hardware design. The scale-up experiment also shows that HAPPI outperforms the previous approach for different numbers of transactions in the database. Indeed, our architecture is a good example to demonstrate the methodology of performance enhancement by hardware. We implement our architecture on a commercial FPGA board. It can easily be realized in a custom ASIC. With the progress in IC process technology, the performance of HAPPI will be further improved. In view of the fast increase in the amount of data in various emerging mining applications (e.g., network application mining, data stream mining, and bioinformatics data mining), it is envisioned that hardware-enhanced mining is an important research direction to explore for future data mining tasks.

The remainder of the paper is organized as follows: We discuss related works in Section 2. The preliminaries are presented in Section 3. The HAPPI architecture is described in Section 4. Next, we show several experiments conducted on HAPPI in Section 5. Finally, we present our conclusions in Section 6.
2 RELATED WORKS

In this section, we discuss two previous works that use a systolic array architecture to enhance the performance of data mining.

The Systolic Process Array (SPA) architecture is proposed in [10] to perform K-means clustering. SPA accelerates the processing speed by utilizing several hardware cells to calculate the distances in parallel. Each cell corresponds to a cluster and stores the centroid of the cluster in local memory. The data flows linked by each cell include the data object, the minimum distance between the object and its closest centroid, and the closest centroid of the object. The cell computes the distance between the centroid and the input data object. Based on the resulting distance, the cell updates the minimum distance and the closest centroid of the data object. Therefore, the system can obtain the closest centroid of each object from SPA. The centroids are recomputed and updated by the system, and the new centroids are sent to the cells. The system continuously updates the clustering results.

In [3], the authors implemented a systolic array with several hardware cells to speed up the Apriori algorithm. Each cell performs an ALU (larger than, smaller than, or equal to) operation, which compares the incoming item with the items in the memory of the cell. This operation generates frequent itemsets by comparing candidate itemsets with the items in the database. Since all the cells can execute their own operations simultaneously, the performance of the architecture is better than that of a single processor. However, the number of cells in the systolic array is fixed. If the number of candidate itemsets is larger than the number of hardware cells, the pattern matching procedure has to be separated into many rounds. It is infeasible to load candidate itemsets and the database into the hardware multiple times. As reported in [3], the performance is only about four times faster than some software algorithms. Hence, there is much room to improve the execution time.

3 PRELIMINARIES

The hash table scheme proposed in DHP [16] improves the performance of Apriori-based algorithms by filtering out infrequent candidate itemsets. In addition, DHP employs an effective pruning scheme to eliminate infrequent items in transactions. We summarize these two schemes below.

In the hash table scheme, a hash function is applied to all of the candidate k-itemsets generated from frequent subitemsets. Each candidate k-itemset is mapped to a hash value, and itemsets with the same hash value are put into the same bucket of the hash table. If the number of candidate itemsets in a bucket is less than the minimum support threshold, the number of occurrences of these candidate itemsets in the database must also be less than the minimum support threshold. As a result, these candidate itemsets cannot be frequent and are removed from the system. On the other hand, if the number of candidate itemsets in the bucket is not less than the minimum support threshold, the itemsets are carried to the real frequency testing process, which scans the database.

Fig. 2. The process of building H2 and using H2 to filter out C2.

The hash table for filtering candidate k-itemsets, Hk, is built by hashing the k-itemsets generated by each transaction. A hash table contains n buckets, where n is an arbitrary number. When an itemset is hashed to bucket i, the number of itemsets in that bucket is increased by one. The number of itemsets in each bucket thus represents the accumulated frequency of the itemsets whose hash values are assigned to that bucket. After candidate k-itemsets have been generated, they are hashed and assigned to buckets of Hk. If the number of itemsets in a bucket is less than the minimum support, the candidate itemsets in this bucket are removed.

The example in Fig. 2 demonstrates how to build H2 and how to use it to filter out candidate 2-itemsets. After we scan the transaction TID = 100, the 2-itemsets {AC}, {AD}, and {CD} are hashed into the buckets. According to the hash function shown in Fig. 2, the hash values of {AC}, {AD}, and {CD} are 6, 0, and 6, respectively. As a result, the numbers of itemsets in the buckets indexed by 6, 0, and 6 are increased by one. After all the transactions in the database have been scanned, the frequent 1-itemsets are found, i.e., L1 = {A, B, C, E}. In addition, the numbers of itemsets in the buckets of H2 are <3, 1, 2, 0, 3, 1, 3>, and the minimum support frequency is 2. Thus, the candidate 2-itemsets in buckets 1, 3, and 5 should be pruned. If we generate candidate 2-itemsets from L1 * L1 directly, the original set of candidate 2-itemsets C2 is {AB, AC, AE, BC, BE, CE}. After filtering out unnecessary candidate itemsets by checking H2, the new set C2' becomes {AC, BC, BE, CE}. Therefore, the number of candidate itemsets can be reduced.

The pruning scheme, which is able to filter out infrequent items in the transactions, will be implemented in hardware. The theoretical background of the pruning scheme is based on the following two theorems, which were presented in [17]:

Theorem 1. A transaction can only be used to support the set of frequent (k+1)-itemsets if it consists of at least (k+1) candidate k-itemsets.

Theorem 2. An item in a transaction can be trimmed if it does not appear in at least k of the candidate k-itemsets contained in the transaction.
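The hash table filtering steps above can be sketched in software. The four-transaction database and the hash function h(x, y) = (10·order(x) + order(y)) mod 7 below are assumptions, chosen so that the bucket counts reproduce the <3, 1, 2, 0, 3, 1, 3> stated in the text; the figure itself is not available in this excerpt:

```python
from itertools import combinations

# Sketch of the DHP-style hash table filter described above.
# The database and hash function are assumed, chosen to reproduce the
# bucket counts <3, 1, 2, 0, 3, 1, 3> given in the text for H2.
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
order = {x: i + 1 for i, x in enumerate("ABCDE")}  # A=1, B=2, ...

def h(pair):
    """Hash a 2-itemset into one of 7 buckets."""
    x, y = sorted(pair, key=order.get)
    return (order[x] * 10 + order[y]) % 7

# Build H2: hash every 2-itemset of every transaction and count per bucket.
buckets = [0] * 7
for t in db:
    for pair in combinations(sorted(t, key=order.get), 2):
        buckets[h(pair)] += 1
print(buckets)  # [3, 1, 2, 0, 3, 1, 3]

# Filter C2 = L1 * L1: keep a candidate only if its bucket count can
# reach the minimum support.
min_sup = 2
L1 = ["A", "B", "C", "E"]
C2 = list(combinations(L1, 2))
C2_filtered = [c for c in C2 if buckets[h(c)] >= min_sup]
print(C2_filtered)  # [('A', 'C'), ('B', 'C'), ('B', 'E'), ('C', 'E')]
```

With these assumptions, the filter prunes {AB}, {AE} (buckets 5 and 1, each with count 1), leaving exactly the C2' = {AC, BC, BE, CE} given in the text.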
Fig. 3. An example of transaction trimming.

Based on Theorem 2, whether an item can be trimmed or not depends on how many candidate itemsets in the current transaction contain this item. The transaction trimming module is based on the frequencies of all candidate itemsets in an individual transaction. Therefore, we can handle every transaction independently, regardless of the other transactions in the database. A counter array a[] is used to record the number of times that each item in a transaction occurs in the candidate k-itemsets. That is, counter a[i] represents the frequency of the ith item in the transaction. If a candidate k-itemset is a subset of the transaction, the counters of the corresponding items that appear in this candidate itemset are increased by one. After the comparison with all the candidate k-itemsets, if the value of a counter is less than k, the corresponding item in the transaction is trimmed, as shown in Fig. 3. For example, transaction TID = 100 has three items {A, C, D}. Counter a[0] represents A, a[1] represents C, and a[2] represents D. Transaction TID = 300 has four items {A, B, C, E}. Counter a[0] corresponds to A, a[1] to B, a[2] to C, and a[3] to E, respectively. After the comparison with the set of candidate 2-itemsets, the values of the counter array for TID = 100 are <1, 1, 0> and the values of the counter array for TID = 300 are <1, 2, 2, 2>. Since all values of the counter array for TID = 100 are less than 2, all corresponding items are trimmed from transaction TID = 100. On the other hand, because the value of a[0] for TID = 300 is less than 2, item A is trimmed from the transaction. Therefore, transaction TID = 300 becomes {B, C, E}.

4 HAPPI ARCHITECTURE

As noted earlier, Apriori-based hardware schemes have to load candidate itemsets and the database into the hardware to execute the comparison process. Too many candidate itemsets and a huge database would cause a performance bottleneck. To solve this problem, we propose the HAPPI architecture for efficient hardware-enhanced association rule mining. We incorporate the pipeline methodology into the HAPPI architecture to perform pattern matching and collect useful information to reduce the number of candidate itemsets and items in the database simultaneously. In this way, HAPPI effectively solves the bottleneck problem.

In Section 4.1, we introduce our system architecture. In Section 4.2, the pipeline scheme of the HAPPI architecture is presented. The transaction trimming scheme is given in Section 4.3. Then, we describe the hardware design of the hash table filter in Section 4.4. Finally, we derive some properties for performance evaluation in Section 4.5.

4.1 System Architecture

Fig. 4. The HAPPI architecture: (a) systolic array, (b) trimming filter, and (c) hash table filter.

As shown in Fig. 4, the HAPPI architecture consists of a systolic array, a trimming filter, and a hash table filter. There are several hardware cells in the systolic array, and each cell can perform the comparison operation. Based on the comparison results, the cells update the support counters of candidate itemsets and the occurrence frequencies of items in the trimming information. A trimming filter then removes infrequent items from the transactions according to the trimming information. In addition, we build a hash table by hashing the itemsets generated by each transaction. The hash table filter then prunes unsuitable candidate itemsets.

To find frequent k-itemsets and generate candidate (k+1)-itemsets efficiently, we devise five procedures in the HAPPI architecture using the three hardware modules: the systolic array, the trimming filter, and the hash table filter. The procedures are support counting, transaction trimming, hash table building, candidate generation, and candidate pruning. The work flow is shown in Fig. 5. The support counting procedure finds frequent itemsets by comparing candidate itemsets with transactions in the database. By loading candidate k-itemsets and streaming transactions into the systolic array, the frequencies with which candidate itemsets occur in the transactions can be determined.
Fig. 5. The procedure flow of one round.

Note that if the number of candidate itemsets is larger than the number of hardware cells in the systolic array, the candidate itemsets are separated into several groups. Some of the candidate itemsets are loaded into the hardware cells while the database is fed into the systolic array. Afterward, the other candidate itemsets are loaded into the systolic array one by one. To complete the comparison with all the candidate itemsets, the database has to be examined several times. To reduce the overhead of this repeated loading, we design two additional hardware modules, namely, a trimming filter and a hash table filter. Infrequent items in the database are eliminated by the trimming filter, and the number of candidate itemsets is reduced by the hash table filter. Therefore, the time required for the support counting procedure can be effectively reduced.

After all the candidate k-itemsets have been compared with the transactions, their frequencies are sent back to the system. The frequent k-itemsets can be obtained from the candidate k-itemsets whose occurrence frequencies are larger than the minimum support. While the transactions are being compared with the candidate itemsets, the corresponding trimming information is collected. The occurrence frequency of each item that is contained in the candidate itemsets in the transactions is recorded and updated in the trimming information. After the candidate itemsets have been compared with the database, the occurrence frequencies and the corresponding transactions are transmitted to the trimming filter, and infrequent items are trimmed from the transactions according to the occurrence frequencies in the trimming information. Then, the hash table building procedure generates (k+1)-itemsets from the trimmed transactions. These (k+1)-itemsets are hashed into the hash table for later processing. Next, the candidate generation procedure is executed by the systolic array. The frequent k-itemsets are fed into the systolic array for comparison with the other frequent k-itemsets. The candidate (k+1)-itemsets are generated by systolic injection and stalling techniques similar to those in [3]. The candidate pruning procedure then uses the hash table to filter out candidate (k+1)-itemsets that cannot possibly be frequent. Then, the procedure reverts to the support counting procedure. The pruned candidate (k+1)-itemsets are loaded into the systolic array for comparison with the transactions that have already been trimmed. The above five processes are executed repeatedly until all frequent itemsets have been found.

4.2 Pipeline Design

We observe that the transaction trimming and the hash table building procedures are blocked by the support counting procedure. The transaction trimming procedure has to obtain trimming information to execute the trimming process, but this information is not complete until the support counting procedure has compared all the transactions with all the candidate itemsets. In addition, the hash table building procedure has to get the trimmed transactions from the trimming filter after all the transactions have been trimmed. This problem can be resolved by applying the pipeline scheme, which utilizes the three hardware modules simultaneously in the HAPPI framework. First, we divide the database into N_pipe parts. One part of the transactions in the database is streamed into the systolic array, and the support counting process is performed against all candidate itemsets. After comparing these transactions with all the candidate itemsets, the transactions and their trimming information are passed to the trimming filter first. The systolic array then processes the next group of transactions. After items have been trimmed from a transaction by the trimming filter, the transaction is passed to the hash table filter, as shown in Fig. 6, and the trimming filter can deal with the next transaction. In this way, all the hardware modules can be utilized simultaneously. Although the pipelined architecture improves the system's performance, it increases the computational overhead because the candidate itemsets have to be loaded into the systolic array multiple times. The performance of the pipeline scheme and the improved design of the HAPPI architecture are discussed in Section 4.5.

Fig. 6. A diagram of the pipeline procedures.

4.3 Transaction Trimming

While the support counting procedure is being executed, the whole database is streamed into the systolic array. However, not all the transactions are useful for generating frequent itemsets. Therefore, we filter out items in the transactions according to Theorem 2 so that the database is reduced. In the HAPPI architecture, the trimming information records the frequency with which each item in a transaction appears in the candidate itemsets. The support counting and trimming information collecting operations are similar, since they both need to compare candidate itemsets with transactions. Therefore, in addition to the transactions in the database, their corresponding trimming information is fed into the systolic array in another pipe while the support counting process is being executed. As shown in Fig. 7, a trimming vector is embedded in each hardware cell of the systolic array to record items that are matched with candidate itemsets.
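The counter-array trimming of Theorem 2 can be sketched in software as follows, using the filtered candidate 2-itemsets {AC, BC, BE, CE} from the Section 3 example. This is a behavioral sketch only, not the hardware trimming filter itself:

```python
# Software sketch of counter-array trimming (Theorem 2 / Fig. 3).
def trim(transaction, candidates, k):
    """Return the items of `transaction` that survive trimming."""
    items = sorted(transaction)
    a = [0] * len(items)  # a[i]: times items[i] occurs in matched candidates
    for cand in candidates:
        if set(cand) <= set(items):          # candidate is in the transaction
            for item in cand:
                a[items.index(item)] += 1    # bump counters of its items
    # An item is kept only if it appears in at least k candidate k-itemsets.
    return {x for x, cnt in zip(items, a) if cnt >= k}

C2 = [("A", "C"), ("B", "C"), ("B", "E"), ("C", "E")]
print(trim({"A", "C", "D"}, C2, 2))       # set()  - whole transaction trimmed
print(trim({"A", "B", "C", "E"}, C2, 2))  # {'B', 'C', 'E'}  - item A trimmed
```

As in the text's example, transaction TID = 100 ({A, C, D}) loses all of its items, while TID = 300 ({A, B, C, E}) loses only item A and becomes {B, C, E}.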
Fig. 7. An example of streaming a transaction and the corresponding trimming information into the cell. (a) Stream a transaction into the cell. (b) Stream trimming information into the cell.

The ith flag in the trimming vector is set to true if the ith item in the transaction matches the candidate itemset. After the candidate itemset has been compared with all the items in a transaction, if the candidate itemset is a subset of the transaction, the incoming corresponding trimming information is accumulated according to the trimming vector. Since transactions and trimming information are input in different pipes, support counters and trimming information can be updated simultaneously in a hardware cell.

In Fig. 7a, the candidate itemset {BC} is stored in the candidate memory, and a transaction {A, B, C, D, E} is about to be fed into the cell. The resultant trimming vector after comparing {BC} with all the items in the transaction is shown in Fig. 7b. Because items B and C are matched with the candidate itemset, the trimming vector becomes <0, 1, 1, 0, 0>. Meanwhile, the corresponding trimming information is fed into the trimming register, and the trimming information is updated from <0, 1, 1, 0, 1> to <0, 2, 2, 0, 1>.

Fig. 8. The trimming filter.

After passing through the systolic array, transactions and their corresponding trimming information are passed to the trimming filter. The filter trims off items whose frequencies are less than k. As the example in Fig. 8 shows, the trimming information of the transaction {A, B, C, D, E} is <2, 2, 2, 1, 2> and the current k is 2. Therefore, item D should be trimmed, and the new transaction becomes {A, B, C, E}. In this way, the size of the database can be reduced. The trimmed transactions are sent to the hash table filter module for hash table building.

4.4 Hash Table Filtering

To build a hardware hash table filter, we use a hash value generator and a hash table updating module. The former generates all the k-itemset combinations of the transactions and puts the k-itemsets into the hash function to create the corresponding hash values.

Fig. 9. The hash value generator.

As shown in Fig. 9, the hash value generator comprises a transaction memory, a state machine, an index array, and a hash function. The transaction memory stores all the items of a transaction. The state machine is the controller that generates control signals of different lengths (k = 2, 3, ...) flexibly. Then, the control signals are fed into the index array. To generate a k-itemset, the first k entries in the index array are utilized. The values in the index array are the indices of the transaction memory. The item selected by the ith entry of the index array is the ith item in a k-itemset. By changing the values in the index array, the state machine can generate different combinations of k-itemsets from the transaction.

The procedure starts by loading a transaction into the transaction memory. Then, the values in the index array are reset, and the state machine starts to generate control signals. The values in the index array are changed by the different states. Each item in the generated itemset is passed to the hash function through the multiplexer. The hash function takes some bits from the incoming k-itemsets to calculate the hash values.
Consider the example in Fig. 9. We assume the current k is 3, so the first three entries in the index array are used. The transaction {A, C, E, F, G} is loaded into the transaction memory. The values in the index array are initiated to 0, 1, and 2, respectively, so that the first itemset generated is ACE. Then, the state machine changes the values in the index array: the following values will be (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), and so on. Therefore, the corresponding itemsets are ACF, ACG, AEF, AEG, and so on.

The hash values generated by the hash value generator are passed to the hash table updating module. To speed up the process of hash table building, we utilize Nparallel hash value generators so that hash values can be generated simultaneously. In addition, the hash table is divided into several parts to increase the throughput of hash table building. Each part of the hash table covers a range of hash values, and the controller passes each incoming hash value to the buffer it belongs to. These hash values are taken as indexes of the hash table to accumulate the values in the table, as shown in Fig. 10. There are four parallel hash value generators. The size of the whole hash table is 65,536, and it is divided into four parts, so the range of each part is 16,384. If the incoming hash value is 5, it belongs to the first part of the hash table, and the controller passes the value to buffer 1. If there are parallel accesses to the hash table at the same time, only one access can be executed; the others are delayed and handled as soon as possible. The delayed itemsets are stored in the buffer temporarily, and whenever the access port of the hash table is free, the delayed itemsets are put into the hash table.

Fig. 10. The parallel hash table building module.

After all the candidate k-itemsets have been generated, they are pruned by the hash table filter. Each candidate itemset is hashed by the hash function. By querying the number of itemsets in the bucket with the corresponding hash value, the candidate itemset is pruned if the number of itemsets in the bucket does not meet the minimum support criterion. Therefore, the number of candidate itemsets can be reduced effectively with the help of the hash table filter.

4.5 Performance Analysis

In this section, we derive some properties of our system with and without the pipeline scheme to investigate the total execution time. Suppose the number of candidate k-itemsets is Ncand-k and the number of frequent k-itemsets is Nfreq-k. There are Ncell hardware cells in the systolic array, |T| represents the average number of items in a transaction, and |D| is the total number of items in the database. As shown in Fig. 5, the time needed to find frequent k-itemsets and candidate (k+1)-itemsets includes the time required for support counting, transaction trimming, hash table building, candidate generation, and candidate pruning.

First, the execution time of the support counting procedure is related to the number of times candidate itemsets and the database are loaded into the systolic array. That is, if Ncand-k is larger than Ncell, the candidate itemsets and the database must be input into the systolic array ⌈Ncand-k / Ncell⌉ times. Each time, at most Ncell candidate itemsets are loaded into the systolic array, so the number of items in these candidate k-itemsets is at most k × Ncell. In addition, all items in the database need |D| cycles to be streamed into the systolic array. Therefore, the execution cycle of the support counting procedure is, at most,

t_sup = ⌈Ncand-k / Ncell⌉ × (k × Ncell + |D|).

Second, the transaction trimming procedure eliminates infrequent items and receives incoming items at the same time. A transaction item and the corresponding trimming information are fed into the trimming filter during each cycle. After the whole database has been passed through the trimming filter, the transaction trimming procedure is finished. Thus, the execution cycle depends on the number of items in the database:

t_trim = |D|.

Third, the hash table building procedure consists of the hash value generation and the hash table updating processes. Because these processes can be executed simultaneously, the execution time is determined by the process that generates the hash values, which consists of the time taken by transaction loading and by hash value generation from transactions. The overall transaction loading time is |D| cycles. In addition, there are |T| items in a transaction on average, so the number of (k+1)-itemset combinations from a transaction is C(|T|, k+1), and each (k+1)-itemset requires (k+1) cycles to be generated. The average number of transactions in the database is |D| / |T|. Therefore, the execution time of hash value generation from the whole database is (k+1) × C(|T|, k+1) × (|D| / |T|) cycles. Because we have designed a parallel architecture, the procedure can be executed by Nparallel hardware modules simultaneously. The execution cycle of the hash table building procedure is

t_hash = |D| + (k+1) × C(|T|, k+1) × (|D| / |T|) × (1 / Nparallel).

The fourth procedure is candidate generation. Frequent k-itemsets are compared with other frequent k-itemsets to generate candidate (k+1)-itemsets. The execution time of candidate generation consists of the time required to load frequent k-itemsets (at most, k × Ncell each time), the time taken to pass frequent k-itemsets (k × Nfreq-k) through the systolic array, and the time needed to generate candidate (k+1)-itemsets (Ncand-(k+1)). Similar to the support counting procedure, if Nfreq-k is larger than Ncell, the frequent k-itemsets have to be separated into several groups and the
comparison process has to be executed several times. Thus, the execution cycle is, at most,

t_candidate-generation = ⌈Nfreq-k / Ncell⌉ × (k × Ncell + k × Nfreq-k) + Ncand-(k+1).

Finally, the candidate pruning procedure has to hash candidate (k+1)-itemsets and to query the hash table H(k+1). Each (k+1)-itemset requires (k+1) cycles to be generated, and the hash table can accept one input query during each cycle. Thus, the execution time needed to hash candidate (k+1)-itemsets ((k+1) × Ncand-(k+1)) and query the hash table (Ncand-(k+1)) is

t_candidate-pruning = (k+1) × Ncand-(k+1) + Ncand-(k+1).

Since the size of the database that we consider is much larger than the number of candidate k-itemsets, we can neglect the execution time of candidate generation and pruning. Therefore, the time required for one round of the sequential execution, t_seq, is the sum of the time taken by the support counting, transaction trimming, and hash table building procedures, as shown in Property 1.

Property 1. t_seq = t_sup + t_trim + t_hash.

The pipeline scheme incorporated in the HAPPI architecture divides the database into Npipe parts and inputs them into the three modules. However, this scheme causes some overhead, t_overhead, because the candidate itemsets are reloaded multiple times in the support counting procedure:

t_overhead = ⌈Ncand-k / Ncell⌉ × (k × Ncell) × Npipe.

Therefore, the support counting procedure has to take t_overhead into account. The execution time of the support counting procedure in the pipeline scheme becomes

t'_sup = t_sup + t_overhead.

The execution time of the pipeline scheme, t_pipe, is analyzed according to the following two cases.

Case 1. If the execution time of the support counting procedure is longer than that of the hash table building procedure, the other procedures finish their operations before the support counting procedure. However, the transaction trimming and hash table building procedures have to wait for the last part of the data from the support counting procedure. Therefore, the total execution time is t'_sup plus the time required to process the last part of the database with the trimming filter and the hash table filter:

t_pipe = t'_sup + (t_trim + t_hash) × (1 / Npipe).

Case 2. If the execution time of the support counting procedure is less than that of the hash table building procedure, the other procedures are completed before the hash table building procedure. Since the hash table building procedure has to wait for the data from the support counting and transaction trimming procedures, the total execution time is t_hash plus the time required to process the first part of the database with the support counting and the trimming filter:

t_pipe = (t'_sup + t_trim) × (1 / Npipe) + t_hash.

Summarizing the above two cases, the execution time t_pipe can be presented as Property 2.

Property 2.

t_pipe = max(t'_sup, t_hash) + min(t'_sup, t_hash) × (1 / Npipe) + t_trim × (1 / Npipe).

To achieve the minimal value of t_pipe, we consider the following two cases:

1. If t'_sup is larger than t_hash, the execution time t_pipe is dominated by t'_sup. To decrease the value of t'_sup, we can increase the number of hardware cells in the systolic array until t'_sup is equal to t_hash. Therefore, the optimal value for Npipe to reach the minimal t_pipe is

Npipe = sqrt( (t_trim + t_hash) / (⌈Ncand-k / Ncell⌉ × k × Ncell) ).

2. If t'_sup is smaller than t_hash, the execution time t_pipe is mainly taken up by t_hash. To decrease the value of t_hash, we can increase Nparallel until t_hash is equal to t'_sup. As a result, the optimal value for Npipe to achieve the minimal t_pipe in this case is

Npipe = (t_hash − t_sup) / (⌈Ncand-k / Ncell⌉ × k × Ncell).

To decide the values of Ncell and Nparallel in the HAPPI architecture, we have to know the value of Ncand-k. However, Ncand-k varies with k, so we decide these values according to our experimental experience. Generally speaking, for many types of data in the real world, the number of candidate 2-itemsets is the largest. Thus, we can focus on the case k = 2, since its execution time is the largest of all the passes. For a data set with |T| = 10 and |D| = 1 million, if the minimum support is 0.2 percent, Ncand-k is about 3,000. Assume that there are 500 hardware cells in the systolic array. To accelerate the hash table building procedure, we can increase Nparallel; based on Property 2, the best number of Nparallel is set to 4. In addition, after the transactions are trimmed by the trimming filter, we can obtain the current number of items in the database, and the number of Ncand-k can be obtained after candidate k-itemsets are pruned by the hash table filter. Therefore, we can calculate the values of t_sup and t_hash before starting the support counting and the hash table building procedures. Since t_overhead is small compared to the size of the database under consideration, t_sup can be viewed as t'_sup. Based on the formulas derived above, we can get the best value of Npipe to minimize t_pipe. When the support counting procedure is dealing with candidate 2-itemsets and the hash table building procedure is about to build H3, we divide the database into 30 parts, i.e., Npipe = 30. By applying the pipeline scheme to these three hardware modules, the hardware utilization increases and the wastage due to blocking is reduced.
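To make the cost model concrete, the formulas above can be evaluated in software. The sketch below is our own illustration, not part of the original system; it plugs in the parameters of the worked example, with C(n, r) computed by `math.comb`. It will not necessarily reproduce the reported Npipe = 30, since Ncand-k and |D| change from pass to pass in the real system:

```python
from math import ceil, comb, sqrt

def t_sup(n_cand, n_cell, k, d_items):
    """Support counting: reload candidate groups and stream the database."""
    return ceil(n_cand / n_cell) * (k * n_cell + d_items)

def t_trim(d_items):
    """Transaction trimming: one database item per cycle."""
    return d_items

def t_hash(d_items, t_avg, k, n_parallel):
    """Hash table building: load transactions, then generate (k+1)-subsets
    across n_parallel hash value generators."""
    n_trans = d_items / t_avg
    return d_items + (k + 1) * comb(t_avg, k + 1) * n_trans / n_parallel

def optimal_n_pipe(n_cand, n_cell, k, trim, hash_):
    """Case 1 of the analysis: minimize t'_sup + (t_trim + t_hash)/N_pipe."""
    reload_cost = ceil(n_cand / n_cell) * k * n_cell
    return sqrt((trim + hash_) / reload_cost)

# Worked example: |T| = 10, |D| = 1,000,000 items, k = 2,
# Ncand-2 ~ 3,000, Ncell = 500, Nparallel = 4.
d, t_avg, k = 1_000_000, 10, 2
ts, tt, th = t_sup(3000, 500, k, d), t_trim(d), t_hash(d, t_avg, k, 4)
print(ts, tt, th, optimal_n_pipe(3000, 500, k, tt, th))
```

The same functions can be re-run per pass as the trimming filter shrinks |D| and the hash table filter shrinks Ncand-k.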
TABLE 1. Summary of the Parameters Used

5 EXPERIMENT RESULTS

In this section, we conduct several experiments on a number of synthetic data sets to evaluate the performance of the HAPPI architecture. We also implement an approach mainly based on [3], abbreviated as the Direct Comparison (DC) method, for comparison purposes. Although a hardware algorithm was proposed in [4], its performance improvement is found to be much less than that of DC. However, HAPPI significantly outperforms DC by orders of magnitude. Moreover, we implement a software algorithm, DHP [16], denoted by SW_DHP, as the baseline. The software algorithm is executed on a PC with a 3-GHz P-4 CPU and 1 Gbyte of RAM.

Both HAPPI and DC are implemented on the Altera Stratix 1S40 FPGA board with a 50-MHz clock rate and 10 Mbytes of SDRAM. The hardware modules are coded in Verilog. We use ModelSim to simulate the Verilog code and verify the functions of our design. In addition, we use the Altera Quartus II IDE to build hardware modules and synthesize them into hardware circuits. Finally, the hardware circuit image is sent to the FPGA board. There is a synthesized CPU on the FPGA board, on which we implement a software program on NIOS II to verify the design and to record the hardware execution time. At first, the program downloads data from the database into the memory on the FPGA; then, the data is fed into the hardware modules. The bandwidth consumed from the memory to the chip is 16 bits/cycle in our design. Since the data is transferred on the bus, the maximum bandwidth is limited by the bandwidth of the bus on the FPGA, which is generally 32 bits/cycle; it can be upgraded to 64 bits/cycle on some modern FPGA boards. After the execution of the hardware modules, we can acquire the outcomes and execution cycles. The following experimental results are based on execution cycles on the FPGA board. In addition, in our hardware design, the critical path is in the hash table building module. There are many logic combinations in this module, so the synthesized hardware core is complex. This makes the maximum clock frequency of our hardware design bound to 58.6 MHz. However, since the maximum clock frequency on the Altera Stratix 1S40 is 50 MHz, our design meets the hardware requirement. In our future work, we are going to increase the clock frequency of the hardware architecture by optimizing this bottleneck module.

In the hardware implementation of the HAPPI architecture, Nparallel is set to 4, the number of hardware cells in the systolic array is 500, there are 65,536 buckets in the hash table, and Npipe is assigned according to the methodology in Section 4.5. The method used to generate synthetic data is described in Section 5.1. The performance comparison of several schemes in the HAPPI architecture and DC is discussed in Section 5.2. Section 5.3 presents the performance analysis of different distributions of frequent itemsets. Finally, in Section 5.4, the results of some scale-up experiments are discussed.

5.1 Generation of Synthetic Data

To obtain reliable experimental results, we employ methods similar to those used in [16] to generate synthetic data sets. These data sets are generated with the following parameters: T represents the average number of items in a transaction of the database, I denotes the average length of the maximal potentially frequent itemsets, D is the number of transactions in the database, L is the number of maximal potentially frequent itemsets, and N is the number of different items in the database. Table 1 summarizes the parameters used in our experiments. In the following experimental data sets, L is set to 3,000. To evaluate the performance of HAPPI, we conduct several experiments with different data sets. We use TxIyDz to represent that T = x, I = y, and D = z. In addition, the sensitivity analysis and the scale-up experiments are also explored with different data sets. Note that the y-axis of the following figures is the execution cycle in logarithmic scale.

5.2 Performance Evaluation

Initially, we conduct experiments to evaluate the performance of several schemes in the HAPPI architecture and DC. The testing data sets are T10I4D100 with different numbers of items in the database, and the minimum support is set to 0.5 percent. As shown in Fig. 11, the four different schemes are

1. the DC scheme,
2. the systolic array with a trimming filter,
3. the combined scheme made up of the systolic array, the trimming filter, and the hash table filter, and
4. the overall HAPPI architecture with the pipeline design.

Fig. 11. The execution cycles of several schemes.
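Scheme 3's hash table filter follows the DHP-style pruning described in Section 4.4. A compact software sketch of the idea (our own illustration, not the hardware design; the helper names and the toy hash function are assumptions) builds the bucket table from (k+1)-subsets of the transactions and then prunes candidates against it:

```python
from itertools import combinations

TABLE_SIZE = 65_536  # number of buckets, matching the hardware configuration

def demo_hash(itemset):
    """Toy deterministic hash for the example (not the hardware hash)."""
    return sum(ord(ch) for item in itemset for ch in item)

def build_hash_table(transactions, k, hash_fn):
    """Count every (k+1)-subset of every transaction into a bucket table."""
    table = [0] * TABLE_SIZE
    for trans in transactions:
        for subset in combinations(sorted(trans), k + 1):
            table[hash_fn(subset) % TABLE_SIZE] += 1
    return table

def hash_filter(candidates, table, min_count, hash_fn):
    """Keep only candidates whose bucket count can meet the minimum support."""
    return [c for c in candidates
            if table[hash_fn(tuple(sorted(c))) % TABLE_SIZE] >= min_count]

# Toy run: build H2 (2-subsets, k = 1) and prune candidate 2-itemsets.
db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "D"}]
h2 = build_hash_table(db, 1, demo_hash)
kept = hash_filter([("A", "B"), ("C", "D")], h2, 2, demo_hash)
print(kept)  # [('A', 'B')] -- ('C', 'D') falls in an under-full bucket
```

As in DHP, a bucket count below the minimum support proves every itemset hashing there is infrequent, while a full bucket only admits the candidate for exact counting.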
Fig. 12. The execution time of data sets with different T and I.

Fig. 13. The execution cycles with various minimum supports.

With the trimming filter, the execution time improves by about 10 percent to 70 percent compared to DC. As the number of different items in the database increases, the improvement due to the trimming filter grows. The reason is that, if the number of different items grows, the number of infrequent itemsets also increases, so the filtering rate achieved by the trimming filter becomes more remarkable. Moreover, the execution cycle of the combined scheme with the help of the hash table filter is about 25 to 51 times faster, and the HAPPI architecture with the pipeline scheme is 47 to 122 times better than DC. Note that the main improvement in execution time results from the hash table filter. We not only implemented an efficient hardware module for the hash table filter in the HAPPI architecture but also designed the pipeline scheme to let the hash table filter work together with the other hardware modules. The pipeline and the parallel design are two helpful properties of the hardware architecture, and we utilize these hardware design skills to accelerate the overall system. As shown in Fig. 11, although the combined scheme provides much of the performance boost, the performance still benefits further from the overall HAPPI architecture with the pipeline design. In summary, the HAPPI architecture outperforms DC, especially when the number of different items in the database is large.

5.3 Sensitivity Analysis

In the second experiment, we generate several data sets with different distributions of frequent itemsets to examine the sensitivity of the HAPPI architecture. The experimental results of several synthetic data sets with various minimum supports are shown in Figs. 12 and 13. The results show that no matter what combination of the parameters T and I is used, the HAPPI architecture consistently outperforms DC and SW_DHP. Specifically, the execution time of HAPPI is less than that of DC and that of SW_DHP by several orders of magnitude, and the margin grows as the minimum support increases. As Fig. 12 shows, HAPPI has a better performance enhancement ratio over DC on the T5I2D100K data set than on the T20I8D100K data set. The reason is that the hash table filter is especially effective in eliminating infrequent candidate 2-itemsets, as reported in the experimental results of DHP [16]; thus, SW_DHP also performs better on these two data sets. In addition, the execution time t_pipe is mainly related to the number of times, ⌈Ncand-k / Ncell⌉, the database is reloaded. Since Ncand-k can be substantially reduced when k is small, the overall performance enhancement is remarkable. Since the average size of the maximal potentially frequent itemsets of the data set T20I8D100K is 8, the performance of the HAPPI architecture is only 2.65 times faster. However, most data in the real world contains short frequent itemsets, that is, T and I are small. It is noted that DC yields only a little enhancement over SW_DHP with short frequent itemsets, while HAPPI possesses better performance. Therefore, the HAPPI architecture can perform well with real-world data sets.

Fig. 13 demonstrates that the improvement of the HAPPI architecture over DC becomes more noticeable with increasing minimum support. This is because more long itemsets are eliminated with a large minimum support, so the improvement due to the hash table filter increases. In comparison with DC, the overall performance is outstanding.

5.4 Scale-up Experiment

According to the performance analysis in Section 4.5, the execution time t_pipe is mainly related to the number of times, ⌈Ncand-k / Ncell⌉, the database is reloaded into the systolic array. Therefore, if Ncell increases, less time is needed to stream the database and the overall execution time is shorter. Fig. 14 illustrates the scaling performance of the HAPPI architecture and DC, where the y-axis is also in logarithmic scale. The execution cycles of both HAPPI and DC decrease linearly with an increasing number of hardware cells, and HAPPI outperforms DC for different numbers of hardware cells in the systolic array. The most important result is that we utilize only 25 hardware cells in the systolic array but achieve the same performance as the 800 hardware cells used in DC. The benefit of the HAPPI architecture is more computing power at lower cost for data mining in hardware design.

We also conduct experiments with different numbers of transactions in the synthetic data sets to explore the scalability of the HAPPI architecture. The generated data sets are T10I4, and the minimum support is set to 0.5 percent. As shown in Fig. 15, the execution time of HAPPI increases linearly as the number of transactions in the synthetic data sets increases, and HAPPI outperforms DC for different numbers of transactions in the database. Furthermore, Fig. 15 shows the good scalability of both HAPPI and DC. This feature is especially important because the size of applications is growing much faster than the speed of the CPU. Thus, hardware-enhanced data mining techniques are imperative.
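For readers who wish to approximate such TxIyDz workloads, a drastically simplified generator in the spirit of the method in [16] is sketched below. This is our own approximation, not the generator used in the experiments; the actual method draws pattern sizes and weights from particular distributions, which we replace with simple uniform and Gaussian choices:

```python
import random

def generate_db(T, I, D, L, N, seed=0):
    """Toy TxIyDz-style generator: D transactions of about T items, drawn
    from L potentially frequent patterns of about I items over N items."""
    rng = random.Random(seed)
    # Each pattern is a small itemset; its size is roughly Gaussian around I.
    patterns = [rng.sample(range(N), max(1, int(rng.gauss(I, 1))))
                for _ in range(L)]
    db = []
    for _ in range(D):
        trans = set()
        while len(trans) < T:          # fill the transaction with patterns
            trans.update(rng.choice(patterns))
        db.append(sorted(trans)[:T])   # clip to roughly the target length
    return db

# A small T10I4 workload over 1,000 distinct items.
db = generate_db(T=10, I=4, D=100, L=50, N=1000)
```

Embedding whole patterns into transactions is what plants frequent itemsets in the data; varying T and I reshapes their length distribution, as in the sensitivity analysis above.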
Fig. 14. The execution cycles with different numbers of hardware units.

Fig. 15. The execution cycles with various numbers of transactions, z(K).

6 CONCLUSION

In this work, we have proposed the HAPPI architecture for hardware-enhanced association rule mining. The bottleneck of Apriori-based hardware schemes is related to the number of candidate itemsets and the size of the database. To solve this problem, we apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information that reduces the number of candidate itemsets and the number of items in the database simultaneously. HAPPI prunes infrequent items in the transactions and reduces the size of the database gradually by utilizing the trimming filter. In addition, HAPPI effectively eliminates infrequent candidate itemsets with the help of the hash table filter. Therefore, the bottleneck of Apriori-based hardware schemes can be solved by the HAPPI architecture. Moreover, we derive some properties to analyze the performance of HAPPI and conduct a sensitivity analysis of various parameters to provide many insights into the architecture. HAPPI outperforms the previous approach, especially with an increasing number of different items in the database and with increasing minimum support values. Also, HAPPI increases computing power and saves the costs of data mining in hardware design as compared to the previous approach. Furthermore, HAPPI possesses good scalability.

ACKNOWLEDGMENTS

The authors would like to thank Wen-Tsai Liao at Realtek for his helpful comments on improving this paper. The work was supported in part by the National Science Council of Taiwan under Contract NSC93-2752-E-002-006-PAE.

REFERENCES

[1] R. Agarwal, C. Aggarwal, and V. Prasad, "A Tree Projection Algorithm for Generation of Frequent Itemsets," J. Parallel and Distributed Computing, 2000.
[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int'l Conf. Very Large Databases (VLDB), 1994.
[3] Z.K. Baker and V.K. Prasanna, "Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs," Proc. 13th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), 2005.
[4] Z.K. Baker and V.K. Prasanna, "An Architecture for Efficient Hardware Data Mining Using Reconfigurable Computing Systems," Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM '06), pp. 67-75, Apr. 2006.
[5] C. Besemann and A. Denton, "Integration of Profile Hidden Markov Model Output into Association Rule Mining," Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining (KDD '05), pp. 538-543, 2005.
[6] C.W. Chen, J. Luo, and K.J. Parker, "Image Segmentation via Adaptive K-Mean Clustering and Knowledge-Based Morphological Operations with Biomedical Applications," IEEE Trans. Image Processing, vol. 7, no. 12, pp. 1673-1683, 1998.
[7] S.M. Chung and C. Luo, "Parallel Mining of Maximal Frequent Itemsets from Databases," Proc. 15th IEEE Int'l Conf. Tools with Artificial Intelligence (ICTAI), 2003.
[8] S. Cong, J. Han, J. Hoeflinger, and D. Padua, "A Sampling-Based Framework for Parallel Data Mining," Proc. 10th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '05), June 2005.
[9] M. Estlick, M. Leeser, J. Szymanski, and J. Theiler, "Algorithmic Transformations in the Implementation of K-Means Clustering on Reconfigurable Hardware," Proc. Ninth Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), 2001.
[10] M. Gokhale, J. Frigo, K. McCabe, J. Theiler, C. Wolinski, and D. Lavenier, "Experience with a Hybrid Processor: K-Means Clustering," J. Supercomputing, pp. 131-148, 2003.
[11] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
[12] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation," Proc. ACM SIGMOD '00, pp. 1-12, May 2000.
[13] H. Kung and C. Leiserson, "Systolic Arrays for VLSI," Proc. Sparse Matrix, 1976.
[14] N. Ling and M. Bayoumi, Specification and Verification of Systolic Arrays. World Scientific Publishing, 1999.
[15] W.-C. Liu, K.-H. Liu, and M.-S. Chen, "High Performance Data Stream Processing on a Novel Hardware Enhanced Framework," Proc. 10th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD '06), Apr. 2006.
[16] J.S. Park, M.-S. Chen, and P.S. Yu, "An Effective Hash Based Algorithm for Mining Association Rules," Proc. ACM SIGMOD '95, pp. 175-186, May 1995.
[17] J.S. Park, M.-S. Chen, and P.S. Yu, "Using a Hash-Based Method with Transaction Trimming for Mining Association Rules," IEEE Trans. Knowledge and Data Eng., vol. 9, no. 5, pp. 813-825, Sept./Oct. 1997.
[18] A. Savasere, E. Omiecinski, and S. Navathe, "An Efficient Algorithm for Mining Association Rules in Large Databases," Proc. 21st Int'l Conf. Very Large Databases (VLDB '95), pp. 432-444, Sept. 1995.
[19] H. Toivonen, "Sampling Large Databases for Association Rules," Proc. 22nd Int'l Conf. Very Large Databases (VLDB '96), pp. 134-145, 1996.
[20] C. Wolinski, M. Gokhale, and K. McCabe, "A Reconfigurable Computing Fabric," Proc. Int'l Conf. Eng. of Reconfigurable Systems and Algorithms (ERSA), 2004.

Ying-Hsiang Wen received the BS degree in computer science from the National Chiao Tung University and the MS degree in electrical engineering from the National Taiwan University, Taipei, in 2006. His research interests include data mining, video streaming, and multimedia SoC design.

Jen-Wei Huang received the BS degree in electrical engineering from the National Taiwan University, Taipei, in 2002, where he is currently working toward the PhD degree in computer science. His research interests include data mining, mobile computing, and bioinformatics; among these, web mining, incremental mining, mining data streams, time series issues, and sequential pattern mining are his special interests. In addition, some of his research concerns mining general temporal association rules, sequential clustering, data broadcasting, progressive sequential pattern mining, and bioinformatics.

Ming-Syan Chen received the BS degree in electrical engineering from the National Taiwan University, Taipei, and the MS and PhD degrees in computer, information, and control engineering from the University of Michigan, Ann Arbor, in 1985 and 1988, respectively. He was the chairman of the Graduate Institute of Communication Engineering (GICE), National Taiwan University, from 2003 to 2006. He is currently a distinguished professor jointly appointed by the Electrical Engineering Department, the Computer Science and Information Engineering Department, and GICE, National Taiwan University. He was a research staff member at the IBM T.J. Watson Research Center, New York, from 1988 to 1996. He served as an associate editor of the IEEE Transactions on Knowledge and Data Engineering from 1997 to 2001 and is currently on the editorial board of the Very Large Data Base (VLDB) Journal and Knowledge and Information Systems. His research interests include database systems, data mining, mobile computing systems, and multimedia networking, and he has published more than 240 papers in these areas. He is a recipient of the National Science Council (NSC) Distinguished Research Award, the Pan Wen Yuan Distinguished Research Award, the Teco Award, the Honorary Medal of Information, and the K.-T. Li Research Breakthrough Award, as well as the IBM Outstanding Innovation Award for his contribution to a major database product. He also received numerous awards for his teaching, inventions, and patent applications. He is a fellow of the ACM and the IEEE.