IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661 Volume 3, Issue 5 (July-Aug. 2012), PP 26-30
www.iosrjournals.org

Efficient Parallel Pruning of Associative Rules with Optimized Search

1 K. Sangeetha, 2 Dr. P. S. Periasamy, 3 S. Prakash
1 Assistant Professor (SG), S.N.S. College of Technology, Coimbatore
2 Professor, K.S.R. College of Engineering, Tiruchengode
3 Assistant Professor (SG), Sri Shakthi Institute of Engineering and Technology, Coimbatore

Abstract: The main focus of this research work is to propose an improved association rule mining algorithm that minimizes the number of candidate sets while generating association rules, with efficient pruning time and search space optimization. Relative association with a reduced candidate item set reduces the overall execution time. The scalability of this work is measured against the number of item sets used in the transactions and the size of the data set. Further, a fuzzy based rule mining principle is adopted in this work to obtain more informative associative rules and frequent items with increased sensitivity. The requirement for sensitive items is a semantic connection between the components of the item-value pairs. The effectiveness of the item-value pairs reduces the search space to its optimum. Optimality of the search space indicates the trade-off between pruning time and the size of the data set.

I. Introduction
The rapid growth of information extraction from large transactional data sets has fueled the demand for knowledge discovery and for associative relations between items. For identifying the most frequently transacted items and generating associative rules between various items, the Apriori algorithm is one of the most sought-after solutions for association rule mining: it scans the transaction items without missing any item. As a consequence, the generation of candidate item sets consumes more time when generating associative rules.
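
To make the cost of candidate generation concrete, the following is a minimal sketch of textbook level-wise Apriori, written under our own assumptions; it is not the authors' implementation, and the function name `frequent_itemsets` is hypothetical. It shows the one-scan-per-level counting and the join and prune steps that the later sections try to speed up.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori sketch: returns {frozenset: support fraction}."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    # Level 1 candidates: every distinct item as a one-item set.
    candidates = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One full scan of the data per level to count each candidate.
        counts = {c: sum(1 for t in tx if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: unions of two frequent k-item sets that have k + 1 items.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: keep a candidate only if all of its k-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        k += 1
    return frequent
```

Every level triggers another pass over all transactions, which is why reducing the candidate set matters for large data sets.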
To overcome the slowness of associative rule pruning, various strategies have been discussed in the literature to improve the speed of rule formation. The approaches presented in the literature adopt multifold iteration over the transaction data sets, which affects the time and search space for pruning the more sensitive items and their relative associations. Some sampling methods are available, but these processes again affect performance and lead to missed transactions.

In this framework, many algorithms have been proposed in the literature for the efficient generation of frequent item sets since the problem was first introduced. To reduce the size of the candidate item sets, the Direct Hashing and Pruning (DHP) algorithm uses a hash table, which results in efficient pruning of item sets. The Partition algorithm decreases input/output (I/O) by scanning the database only twice.

II. Literature Review
Ken Sun et al (2008) introduced a new focus for Association Rule Mining (ARM) algorithms. Their proposal uses w-support, which does not require preassigned weights, but the method is constraint-based in the sense that all rules must fulfill a predefined set of conditions, such as support and confidence. However, the main goal of this algorithm is to reduce the number of generated rules.

Jens Teubner et al (2011) explored how to accelerate the computation of frequent item sets using field-programmable gate arrays (FPGAs). A pipelined solution was introduced to improve performance. It uses the minimum count as a threshold, so it is a constraint-based algorithm.

Zhaonian Zou et al (2010) investigated the problem of mining uncertain graph data, focusing especially on mining frequent subgraph patterns in an uncertain graph database. The frequent subgraph patterns are mined based on a minimum support value.
Claudia Marinica et al (2010) proposed a new interactive approach to prune and filter discovered rules, using ontologies to improve the integration of user knowledge in the post-processing task.

Alok Sharma et al (2008) proposed a new method to reduce the search space. Prior to the dimensionality-reduction transformation, it applies an additional rotational transform that rotates the feature vectors in the original feature space around their respective class centroids in such a way that the overlap between the classes in the reduced feature space is further minimized.

Elena Baralis et al (2009) proposed the IMine index, a general and compact structure that provides tight integration of item set extraction in a relational database management system.

E. Hüllermeier et al (2007) proposed an algorithm that adapts the Apriori algorithm to the number of items in the attributes. It is not easy to extend the algorithm to higher-dimensional cases.

III. Problem Definition
Normally, the principle of association rule mining is to mine a set of shared, highly correlated attributes/features among a huge number of records in a given database for knowledge discovery. Fuzzy ARM algorithms are used on large datasets for fast and efficient performance.

Generating fuzzy association rules with a fuzzy ARM algorithm is not simple. The first process is the conversion of a crisp dataset, which consists of crisp binary and numerical attributes, into a fuzzy dataset containing crisp binary and fuzzy binary attributes. The second process is to calculate the frequency of an item set. A crisp algorithm uses only the presence or absence of the item set in a transaction, whereas a fuzzy ARM algorithm must also take into account the degree to which the item set belongs to a particular transaction of the dataset. This becomes a tedious process.

The problems of scalability and higher memory requirements are addressed in this research work by deploying a parallel pruning technique at different levels of item sets (one-item sets, two-item sets, etc.). From the recent literature we learned that only Apriori and its adaptations are used for generating association rules. Thus, the Fuzzy based Optimal Search Space Pruning (FOSSP) is compared with the existing fuzzy Apriori, and the execution time is recorded as in Fig. 1.

IV. Objective
The objective is to minimize the number of candidate sets and to enhance the association rule mining algorithm while creating association rules, by evaluating the maximal information associated with each item that occurs in a given set of transactions. The initial work starts with the evaluation of weighted association rule mining in terms of item-value relational metrics. Then the number of item metrics is taken into account in association rule mining with a reduced candidate item set.
This may decrease not only the number of item sets generated but also the overall execution time of the algorithm. Any valued attribute is treated as an item-value relational metric and is used to derive the minimal number of association rules, which increases the information content of the rules.

The research work evaluates the scalability of FOSSP (for example, on a car purchase data set and a bank transaction data set) by considering transaction time, the number of item sets used in the transactions, and memory utilization. In addition, the work moves in the direction of a fuzzy based item value for the rule mining principle on associative rules of the complete item sets. To evaluate the item-value relativity metric of scalable association mining, optimal search on parallel pruning is planned for deployment, as it can hold more associative information.

V. Scalable Association Rule Mining Using Parallel Pruning
FOSSP, presented in this work, first analyzes the scalability issues of association rule mining in large data sets. A parallel pruning technique is deployed in FOSSP to mine the large transactional items simultaneously at different levels of item sets, which improves the execution speed for generating frequent items and association rules. The Apriori algorithm is enhanced by increasing the efficiency of the candidate pruning phase, that is, by reducing the number of candidates generated for further verification. The FOSSP pruning technique uses information associated with the number of items to estimate overlapping items in the transactions. The basic elements considered in the development of FOSSP are the number of transactions, the average size of a transaction, the average size of the maximal large item sets, the number of items, and the distribution of occurrences of large item sets.

The parallel pruning in FOSSP provides an improvement over Apriori in generating frequent items and rules for transaction data.
It generates all candidates based on n-level frequent item sets on the sorted database, and removes all item sets that can no longer be supported by the transactions that still have to be processed. Thus FOSSP no longer has to maintain the covers of all past item sets sequentially. The algorithm for the parallel pruning technique, which generates informative rules and strong frequent items, is presented below.

5.1 Framework of FOSSP Algorithm
Input: Number of transactions and items, large data sets
Output: Candidate items, number of informative rules, frequent items, execution time
Steps of Procedure
a. Initialize the number of items and transactions from the large data sets
b. Generate candidate item sets with the information requirement
c. Reduce the candidate items using relative item values
d. With the probability ratio, generate frequent item sets (i.e., those that satisfy minimum support)
e. Parallel prune the frequent items at different levels of the item set
f. With conditional probability on the parallel pruned item levels, generate strong association rules
g. Calculate the execution time of frequent item set and informative association rule generation
h. Sort the item sets based on frequency and information association
i. Merge the more associated rules of item pairs
j. Discard the infrequent item-value pairs
k. Perform Fuzzy Parallel Pruning (PP)
l. Iterate steps c to f until the required scalability mining results are achieved

Fuzzy PP algorithm:
For each t ∈ T
    Search the whole transaction and return all the items
    Membership Function (mF) = {a ∈ A | 0 ≤ a ≤ 1}
    mF = 1 if 0 ≤ a ≤ 1; mF = 0 otherwise
    Perform mapping function
End
where T is the total set of transactions, t a transaction instance, A the complete item set, and a the items of a transaction instance.

For B = (y1, y2, …, yn)
    fuzzy set (B, n) = {n(y1)/y1, …, n(yn)/yn}
    Scan the transformed database
    Evaluate the support against the predefined Min Support value
End
where B is the candidate item set, y1, y2, …, yn the frequent item set of transaction instances, and n the number of instances.

In FOSSP the candidate item reduction object is updated in each iteration to determine the items to process. In the Apriori association mining algorithm, each data item read needs to be matched against all candidates to determine the set of candidates whose counts will be incremented. It is not possible to statically partition the reduction object so that different processes update disjoint portions of the collection, which made parallel pruning in FOSSP more efficient. However, as the pruning of transaction items is carried out in parallel, the search space for frequent item generation and for item-value pair based, maximally informative, sensitive association rules becomes complex. To overcome this, the next section describes the optimization of the search space using a fuzzy rule set.

VI. Optimization Of Search Space Using Fuzzy Rule Set
Traditional fuzzy ARM exploits a data-driven pre-processing approach which automates the formation of fuzzy partitions for numerical attributes. It therefore converts the given data set to a fuzzy data set with less human intervention, even for very large datasets.
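
The fuzzification and support evaluation in the Fuzzy PP listing of Section V can be illustrated with a short sketch. It rests on assumptions of ours that the paper does not state: a triangular membership function, the minimum as the way to combine membership degrees within an item set, and the hypothetical names `triangular` and `fuzzy_support`.

```python
def triangular(x, low, peak, high):
    """Triangular membership degree of x, in [0, 1] (assumed shape)."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def fuzzy_support(itemset, fuzzy_transactions):
    """Fuzzy support of an item set.

    Each transaction is a dict mapping a fuzzy item to its membership
    degree. The degree of the whole item set in one transaction is taken
    as the minimum degree of its items (an assumption); items missing
    from a transaction count as degree 0. The support is the mean of
    these degrees over all transactions.
    """
    total = 0.0
    for t in fuzzy_transactions:
        total += min(t.get(item, 0.0) for item in itemset)
    return total / len(fuzzy_transactions)


# Hypothetical fuzzified data: degrees would come from a membership
# function such as triangular() applied to the numerical attributes.
fuzzy_db = [
    {"price=high": 0.8, "age=young": 0.5},
    {"price=high": 0.2},
    {"price=high": 1.0, "age=young": 1.0},
]
support = fuzzy_support(["price=high", "age=young"], fuzzy_db)
```

An item set would then be kept as frequent when `support` reaches the predefined minimum support value, in the same way as in the crisp case.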
Numerical attributes in real data sets are converted to fuzzy sets, which comprise split data sets with boundary limits. The item values at the split boundaries can carry an uncertainty factor, which affects the quality and accuracy of fuzzy association rule mining. In addition, the search space of fuzzy-modeled association rule mining needs more memory to accommodate larger transaction data sets. FOSSP, whose parallel pruning technique is described in Section V, utilizes a fuzzy rule controlled feedback scheme to optimize the search space for more effective association rule generation. The following sections describe various techniques to evaluate the scalability of association rule mining and the resulting optimal search space for efficient item pruning.

6.1 Partitioning Fuzzy Domain Set
In presenting the optimal search space approach for the fuzzy association rule mining process, fuzzy partition domains are made based on the user-defined item-value attribute on the original dataset. To evaluate the fuzzy data set for informative association rule mining, the support and confidence metrics are redefined based on the fuzzy binary attributes. The generation of fuzzy association rules is directly affected by the fuzzy measures adopted in the parallel pruning approach. The dataset is logically divided into p disjoint horizontal partitions P1, P2, …, Pp. Each partition is as large as can fit in the available optimal memory space. The partitions are equal-sized, though each partition could be of any arbitrary size as well.

6.2 Optimal Search Fuzzy Feedback Scheme for Informative Rule Generation
The optimal search space approach for fuzzy association rule mining deploys iterative feedback on the rule set generation. The parallel pruning of multi-level item sets is split with the fuzzy data set to obtain the rules from the respective partitioned domains, whereas the feedback scheme gets into each partitioned domain.
Within the partitioned domain, the initial rules generated for item-value attributes are governed by the optimal search based feedback scheme to identify the sensitivity of the fuzzy binary value in one domain to another. The optimal fuzzy feedback scheme minimizes the number of rules generated in each partitioned domain of multiple outliers, which are divided into groups.
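
The horizontal partitioning of Section 6.1, combined with counting candidates in each partition concurrently, can be sketched as follows. This is an illustration under our own assumptions, not the FOSSP implementation: the function names are hypothetical, transactions are crisp sets, and a thread pool is used only to keep the example self-contained (in CPython, a process pool would be needed for a real speedup on CPU-bound counting).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_horizontal(transactions, p):
    """Split the list into p disjoint, near-equal contiguous partitions."""
    size, extra = divmod(len(transactions), p)
    parts, start = [], 0
    for i in range(p):
        # The first `extra` partitions take one additional transaction.
        end = start + size + (1 if i < extra else 0)
        parts.append(transactions[start:end])
        start = end
    return parts


def count_in_partition(partition, candidates):
    """Count how many transactions of one partition contain each candidate."""
    counts = Counter()
    for t in partition:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def parallel_support(transactions, candidates, p):
    """Global support of each candidate, counted partition by partition."""
    parts = split_horizontal(transactions, p)
    total = Counter()
    with ThreadPoolExecutor(max_workers=p) as pool:
        # Each worker scans only its own partition; the partial counts
        # are merged afterwards, so no shared state is updated concurrently.
        for partial in pool.map(count_in_partition, parts, [candidates] * p):
            total.update(partial)
    return {c: total[c] / len(transactions) for c in candidates}
```

Because the partitions are disjoint, the per-partition counts simply add up to the global count, which is what lets the scan run independently on each partition.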

VII. Experimental Results And Discussions on FOSSP
The experimental evaluation of FOSSP identifies the results for performance metrics such as scalability, search space optimality, informative associative rule sets, and candidate set reduction. The scalability evaluation is made on the size of the data set used and its pruning time for generating frequent items and association rule sets, with parallel pruning of multi-level item sets deployed simultaneously. The optimality of the search space for parallel pruning is measured by varying the large items using fuzzy rule appropriation.

For experiments on the scalability issue, samples of a banking data set obtained from local governmental banking streams, with transaction data sized in gigabytes (GB), are used. The total number of distinct items was 1000 and the average number of items in a transaction was 15.

[Figure: "No. of Iterations vs. Execution Time". x-axis: Number of Iterations (8-16, 16-32, 32-48, 48-60); y-axis: Execution Time; series: FOSSP (Proposed), Fuzzy Apriori (Existing).]
Fig. 1. Comparison of execution time with FOSSP and Fuzzy Apriori

A confidence value of 90% and a support value of 50% are given as input. Normally, when the number of iterations for item pruning increases, the execution time gradually increases. The execution time for parallel pruning is shown in Fig. 1 to evaluate the performance of the proposed technique against the existing Apriori rule generation.

In general, when the data size for item pruning increases, the execution time gradually increases. In terms of scalability, FOSSP executes 2 times faster than fuzzy Apriori models. Though the scalability of parallel pruning is considerably higher, the execution time requirement increases with growth in the size of unique items, as shown in Fig. 2.

[Figure: "Data Size vs. Time". x-axis: Data size (MB), 1 to 9; y-axis: Time (sec); series: fuzzy based Apriori (Existing), FOSSP (Proposed).]
Fig. 2. Scalability evaluation with FOSSP and Fuzzy Apriori

Usually, when the item set for pruning increases, the search space also gradually increases. Further datasets from a machine learning repository (Car Purchase data set, Bank transaction data set) are extracted and enlarged to GBs with a greater number of unique items. The performance of FOSSP in terms of scalability, as well as the search space requirements on each of these data sets, is depicted in Fig. 3. The optimal value of memory for the search space and the maximum size of the data set, the minimal number of generated rules covering the most possible information of the data set, and candidate set reduction are evaluated.

The car dataset, with 20 distinct items and an average of 6 to 8 items per transaction, is used for the experimental evaluation of FOSSP. The total size of the dataset is 2 GB and a confidence level (C) of 90% is used. The support counts tested with the transactions for frequent item pruning are 70%, 85%, 93%, and 62%. The execution time of FOSSP is reduced by a factor of 2 to 4 compared to fuzzy Apriori, and memory utilization is reduced by a factor of nearly 2 to 3 for the 2 GB car purchase data set. On the car purchase data set, the experimental results show that FOSSP performs better than Fuzzy Apriori.
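
The role of the confidence threshold in the experiments (for example C = 90%) can be shown with a small sketch of standard rule generation from frequent item sets. This is the textbook procedure, assumed here for illustration rather than taken from the paper; `strong_rules` is a hypothetical name, and the input is assumed to contain every subset of each frequent item set, which holds for any complete Apriori-style result.

```python
from itertools import combinations


def strong_rules(frequent, min_confidence):
    """Derive rules A -> B with confidence = support(A and B) / support(A).

    frequent: dict mapping frozenset item sets to their support.
    Returns a list of (antecedent, consequent, confidence) tuples.
    """
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical supports, not values from the paper's data sets.
example = {
    frozenset({"a"}): 0.8,
    frozenset({"b"}): 0.6,
    frozenset({"a", "b"}): 0.6,
}
rules_90 = strong_rules(example, 0.9)
```

With these made-up supports, the rule a -> b has confidence 0.75 and is rejected at the 90% threshold, while b -> a has confidence 1.0 and is kept.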

[Figure: "Number of Itemsets vs. Search Space". x-axis: Number of Itemsets (24, 32, 36, 42); y-axis: Search Space (Bytes); series: FOSSP, Fuzzy Apriori.]
Fig. 3. Comparison of search space with FOSSP and Fuzzy Apriori

The performance of the FOSSP approach is evaluated with various values of support (S) ranging from 25% to 40%. From the results it is concluded that the proposed FOSSP approach, which derives effective item-value pair based strong association rules with an optimal search space, performs 25% faster than fuzzy adapted variants of Apriori (Fig. 3), based on the user-defined support value. With other dataset samples, the support value is approximated at 34%, at which the optimal number of item sets is generated.

From these experiments, it is observed that the FOSSP approach performs most efficiently (more accurate rules) and speedily at the optimal support value, which occurs in the range of 15% to 20% for the car dataset. Another purpose was to reduce the number of parallel pruning passes over the data transaction partitions in FOSSP to just one partition for support values of 20% to 40% on the car data set and 10% to 40% on the bank data set, keeping in mind that the main memory is utilized in the best manner possible, without any thrashing. Furthermore, with the fuzzy based optimal search feedback scheme, it was observed that more informative rules for all the attributes, with more sensitive frequent items, were obtained.

VIII. Conclusion
The fuzzy based optimal search pruning technique presented in this research work evaluated frequent items with more sensitive item-value pairs. The rules obtained with FOSSP generated an appropriate candidate item set that contributes to the improved extraction of maximally informative association rules from large transactional data sets.
Parallel pruning of item sets at multiple levels of the complete items (one-item sets, two-item sets, …, n-item sets) decreased the execution time of FOSSP rule mining, as the frequent items for all the levels are obtained simultaneously. The fuzzy rule is modeled to drive parallel pruning with an optimal search space, and it reduced the trade-off between the scalability of data sets and the search space for larger items.

In fuzzy Apriori, the search space for pruning grows with larger data sets, which affects the performance of association rule mining; FOSSP, however, provided an optimal search size for larger data sets. The experimental results show that FOSSP works better in terms of time reduction when contrasted with the fuzzy Apriori model.

References
[1] Ken Sun and Fengshan Bai, "Mining Weighted Association Rules without Preassigned Weights", IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 4, pp. 489-495, April 2008.
[2] Jens Teubner, Rene Mueller, and Gustavo Alonso, "Frequent Item Computation on a Chip", IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 8, pp. 1169-1181, August 2011.
[3] Zhaonian Zou, Jianzhong Li, Hong Gao, and Shuo Zhang, "Mining Frequent Subgraph Patterns from Uncertain Graph Data", IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 9, pp. 1203-1218, September 2010.
[4] Claudia Marinica and Fabrice Guillet, "Knowledge-Based Interactive Postmining of Association Rules Using Ontologies", IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 6, pp. 784-797, June 2010.
[5] Alok Sharma and Kuldip K. Paliwal, "Rotational Linear Discriminant Analysis Technique for Dimensionality Reduction", IEEE Transactions on Knowledge and Data Engineering, Vol. 20, No. 10, pp. 1336-1347, October 2008.
[6] Elena Baralis, Tania Cerquitelli, and Silvia Chiusano, "IMine: Index Support for Item Set Mining", IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 4, pp. 493-506, April 2009.
[7] E. Hüllermeier and Y. Yi, "In Defense of Fuzzy Association Analysis", IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, Vol. 37, No. 4, pp. 1039-1043, July 2007.
[8] H. Verlinde, M. De Cock, and R. Boute, "Fuzzy Versus Quantitative Association Rules: A Fair Data-Driven Comparison", IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, Vol. 36, No. 3, pp. 679-683, June 2006.