Vikram Rajpoot, Prof. Shailendra ku. Shrivastava, Prof. Abhishek Mathur / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 2, Issue 4, July-August 2012, pp. 2210-2215

An Efficient Constraint Based Soft Set Approach for Association Rule Mining

Vikram Rajpoot (M.Tech, Department of IT, SATI Vidisha), Prof. Shailendra ku. Shrivastava (Head, Department of IT, SATI), Prof. Abhishek Mathur (Asst. Prof., Department of IT, SATI)

ABSTRACT
In this paper, we present an efficient approach for mining association rules that is based on soft sets and uses an initial support as a constraint. The initial support constraint is applied first to filter out false frequent items and rarely occurring items. Deleting these items improves the structure of the dataset, so results are produced faster, more accurately and with less memory than with the approach proposed in "A soft set approach for association rules mining" [1]. After these items are deleted, the improved dataset is transformed into a Boolean-valued information system. Since the "standard" soft set deals with such information systems, a transactional dataset can be represented as a soft set. Using the concept of parameter co-occurrence in a transaction, we define the notion of regular association rules between two sets of parameters, together with their support, confidence and properties, using soft set theory. The results show that our approach can produce strong association rules faster, with the same accuracy and less memory space.

Keywords: Association rules mining, Boolean-valued information systems, Soft set theory, Items co-occurrence, red_sup constraint.

I. INTRODUCTION

1.1 Association rule
Association rule mining is one of the most popular data mining techniques and has received considerable attention, particularly since the publication of the AIS and Apriori algorithms [2,3]. Association rules are particularly useful for discovering relationships among data in huge databases and are applicable to many different domains, including market basket and risk analysis in commercial environments, epidemiology, clinical medicine, fluid dynamics, astrophysics and crime prevention.

Association rules are considered interesting if they satisfy certain constraints, i.e. predefined minimum support (min_sup) and minimum confidence (min_conf) thresholds. For a rule X→Y, support and confidence are calculated as:

$$\mathrm{Support}(X \rightarrow Y) = \frac{|\{T \in D : X \cup Y \subseteq T\}|}{N} \qquad (1)$$

$$\mathrm{Confidence}(X \rightarrow Y) = \frac{|\{T \in D : X \cup Y \subseteq T\}|}{|\{T \in D : X \subseteq T\}|} \qquad (2)$$

where N is the total number of transactions in D, X is the antecedent and Y is the consequent. The rule X→Y has support s% in the transaction set D if s% of the transactions in D contain X ∪ Y, and it has confidence c% if c% of the transactions in D that contain X also contain Y. The goal of association rule mining is to find all rules whose support and confidence exceed user-specified thresholds. Many association rule mining algorithms have been proposed; the method was developed particularly for the analysis of transactional databases.

A huge number of association rules can be found in a transactional dataset. Rules that satisfy both the minimum support threshold and the minimum confidence threshold are called strong association rules; the rest are discarded.

1.2 Soft set
Soft set theory [7], proposed by Molodtsov in 1999, is a new general method for dealing with uncertain data. Soft sets are also called (binary, basic, elementary) neighborhood systems. A standard soft set may be redefined as a classification of objects into two distinct classes, which confirms that a soft set can deal with a Boolean-valued information system. Molodtsov [7] pointed out that one of the main advantages of soft set theory is that it is free from the inadequacy of the parameterization tools, unlike the theory of fuzzy sets [8]. Since a "standard" soft set (F,E) over the universe U can be represented by a Boolean-valued information system, a soft set can be used to represent a transactional dataset. Therefore, one application of soft set theory in data mining is mining association rules. However, little research has been done on this application.
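Equations (1) and (2) translate directly into code. The following Python sketch is only an illustration, not the authors' MATLAB implementation; the function names and the four-transaction toy dataset are hypothetical. Each transaction is modelled as a frozenset of item names, so "T contains X" is the subset test `X <= T`.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support_count(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions T with itemset ⊆ T."""
    return sum(1 for t in transactions if itemset <= t)


def support(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """Equation (1): share of the N transactions that contain X ∪ Y."""
    return support_count(x | y, transactions) / len(transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """Equation (2): share of the transactions containing X that also contain Y.

    Assumes X occurs in at least one transaction (otherwise this divides by zero).
    """
    return support_count(x | y, transactions) / support_count(x, transactions)


# Hypothetical toy dataset (not from the paper), N = 4.
TOY = [
    frozenset({"bread", "milk"}),
    frozenset({"bread"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "milk", "eggs"}),
]

X, Y = frozenset({"bread"}), frozenset({"milk"})
print(support(X, Y, TOY))     # 0.5  (2 of 4 transactions contain {bread, milk})
print(confidence(X, Y, TOY))  # 0.6666666666666666  (2 of the 3 bread transactions)
```

The paper's own example reports supports as raw counts rather than fractions; `support_count` gives that form directly.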
Definition: A pair (F,E) is called a soft set over U, where F is a mapping given by

$$F : E \rightarrow P(U) \qquad (3)$$

In other words, a soft set over U is a parameterized family of subsets of the universe U. For e ∈ E, F(e) may be considered as the set of e-elements of the soft set (F,E), or as the set of e-approximate elements of the soft set. Clearly, a soft set is not a (crisp) set. To illustrate this idea, consider the following example.

Example. Consider a soft set (F,E) that describes the "attractiveness of houses" that Mr. X is considering purchasing. Suppose that there are six houses in the universe U under consideration,

U = {h1, h2, h3, h4, h5, h6},

and E = {e1, e2, e3, e4, e5} is a set of decision parameters, where e1 stands for the parameter "expensive", e2 for "beautiful", e3 for "wooden", e4 for "cheap" and e5 for "in the green surroundings".

Consider the mapping F : E → P(U) from equation (3), given by "houses (.)", where (.) is to be filled in by one of the parameters e ∈ E. Suppose that

F(e1) = {h2, h4}, F(e2) = {h1, h3}, F(e3) = {h3, h4, h5}, F(e4) = {h1, h3, h5}, F(e5) = {h1}.

Therefore F(e1) means "houses (expensive)", whose functional value is the set {h2, h4}. Thus, we can view the soft set (F,E) as a collection of approximations, as below.

Fig. 1 Soft set example

Each approximation has two parts, a predicate p and an approximate value set v. For example, for the approximation "expensive houses = {h2, h4}", the predicate is the name "expensive houses" and the approximate value set (or value set) is {h2, h4}. Thus, a soft set (F,E) can be viewed as a collection of approximations:

(F,E) = {p1 = v1, p2 = v2, p3 = v3, ..., pn = vn}

Tabular representation of the soft set:

U    e1  e2  e3  e4  e5
h1   0   1   0   1   1
h2   1   0   0   0   0
h3   0   1   1   1   0
h4   1   0   1   0   0
h5   0   0   1   1   0
h6   0   0   0   0   0

Fig. 2 Soft set in Boolean system

The rest of the paper is organized as follows. Section 2 describes previous related work, Section 3 describes our proposed approach, Section 4 describes the implementation and results of the proposed CSS approach, and Section 5 concludes the paper.

II. RELATED WORK
In the previous paper, "A soft set approach for association rule mining" [1], soft set theory is applied directly to the Boolean-valued information system, which may contain a large number of false frequent items as well as rare items whose support is below the initial support. Because of such items, the previous approach is slow in generating results. These false frequent items and rare items can never be frequent, and no interesting rule is generated from them; they are removed only later in the process, when the frequent patterns are generated with the help of min_sup. Keeping them in the input transactions until then increases the time and space complexity, so the previous approach has high time and space complexity. Other earlier methods for finding association rules in a transaction dataset are based on rough sets [16,18]. In these methods, rough sets are used to find association rules on the basis of a decision table: first the conditional attributes are found, the decision table is constructed from them, and the decision table is then used to find the association rules
in the IF-THEN context. Rough-set-based association rule mining finds rules with less response time than traditional association rule mining techniques [14,15]. However, in the rough set based approach the decision table has to be maintained, and deriving the association rules from that decision table is also time-consuming.

III. PROPOSED WORK
In our proposed approach, we reduce the dataset with the help of an initial red_sup. As a result, false frequent items and rare items are deleted from the input transaction dataset, and the response time of rule generation is reduced. The algorithm of our proposed work is described below.

3.1 Proposed CSS algorithm
Input: transaction dataset D (N is the total number of transactions, n is the total number of items present), initial red_sup (initial reduced support), min_sup (minimum support threshold), min_conf (minimum confidence threshold).
Output: strong association rules.
Algorithm
Step 1: Scan the dataset D for all transactions 1 to N.
Step 2: Calculate the support of all items present in the transaction dataset.
Step 3: For every item in the dataset, if the item's support is less than the initial red_sup, delete that item from the transaction dataset.
Step 4: Convert the reduced dataset obtained in Step 3 into a Boolean-valued information system S = (U, A, V{0,1}, F).
Step 5: Apply the soft set (F,E) to the Boolean-valued information system S.
Step 6: Apply the principle of parameter co-occurrence and calculate the count of the various itemsets.
Step 7: Generate association rules from the frequent patterns and check each one against the min_conf threshold to determine whether the rule is strong.
Step 8: End.

3.2 Proposed method example
Fig. 3 shows the input transaction dataset, which contains 10 transactions. Suppose the initial red_sup is 2, min_sup is also 2 and min_conf is 40%. We perform the steps of our CSS algorithm on this dataset and show the result of each step in a figure after the step is applied, which gives a clear view of the operation performed by each step.

TID  Items
1    Canada, Iran, USA, crude, ship
2    Canada, Iran, USA, crude, Coffee, ship
3    USA, earn
4    USA, jobs, cpi
5    USA, jobs, cpi
6    USA, earn, corn, cpi
7    Canada, sugar, tea
8    Canada, USA, Africa, trade, acq
9    Canada, USA, trade, acq
10   Canada, USA, earn

Fig. 3 Transaction dataset

The first step of our proposed algorithm scans the transaction dataset, and the second step computes the support of each item present in it. The result of the second step is shown below.

sup{Canada} = 6    sup{USA} = 9     sup{Iran} = 2
sup{trade} = 2     sup{acq} = 2     sup{sugar} = 1
sup{tea} = 1       sup{earn} = 3    sup{crude} = 2
sup{corn} = 1      sup{Africa} = 1  sup{coffee} = 1
sup{cpi} = 3       sup{ship} = 2    sup{jobs} = 2

Fig. 4 Support of various items

Next, we apply Step 3 of our approach and delete from the transaction dataset every item whose support is less than the red_sup threshold. Since the red_sup threshold is 2, the result of Step 3 is the reduced dataset shown in Fig. 5.
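Steps 1 to 3 amount to a single counting pass followed by a filter. The Python sketch below assumes the dataset is held as a dict from TID to item list; the names `item_supports` and `reduce_dataset` are hypothetical and not taken from the paper. Run on the Fig. 3 data, it reproduces the item supports of Fig. 4 and the reduced dataset of Fig. 5.

```python
from collections import Counter
from typing import Dict, List

Dataset = Dict[int, List[str]]

# Fig. 3 transaction dataset, keyed by TID.
FIG3: Dataset = {
    1: ["Canada", "Iran", "USA", "crude", "ship"],
    2: ["Canada", "Iran", "USA", "crude", "Coffee", "ship"],
    3: ["USA", "earn"],
    4: ["USA", "jobs", "cpi"],
    5: ["USA", "jobs", "cpi"],
    6: ["USA", "earn", "corn", "cpi"],
    7: ["Canada", "sugar", "tea"],
    8: ["Canada", "USA", "Africa", "trade", "acq"],
    9: ["Canada", "USA", "trade", "acq"],
    10: ["Canada", "USA", "earn"],
}


def item_supports(dataset: Dataset) -> Counter:
    """Steps 1-2: one scan over all N transactions, counting each item once per transaction."""
    counts: Counter = Counter()
    for items in dataset.values():
        counts.update(set(items))
    return counts


def reduce_dataset(dataset: Dataset, red_sup: int) -> Dataset:
    """Step 3: delete every item whose support is below red_sup.

    Transactions are kept even when they shrink (TID 7 keeps only "Canada").
    """
    supports = item_supports(dataset)
    return {
        tid: [item for item in items if supports[item] >= red_sup]
        for tid, items in dataset.items()
    }


supports = item_supports(FIG3)
reduced = reduce_dataset(FIG3, red_sup=2)
print(sorted(i for i, s in supports.items() if s < 2))  # ['Africa', 'Coffee', 'corn', 'sugar', 'tea']
print(reduced[7], reduced[2])  # ['Canada'] ['Canada', 'Iran', 'USA', 'crude', 'ship']
```

Because the filter only needs the item counts from the first scan, the reduction costs one extra pass over the data before any itemset is formed.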
TID  Items
1    Canada, Iran, USA, crude, ship
2    Canada, Iran, USA, crude, ship
3    USA, earn
4    USA, jobs, cpi
5    USA, jobs, cpi
6    USA, earn, cpi
7    Canada
8    Canada, USA, trade, acq
9    Canada, USA, trade, acq
10   Canada, USA, earn

Fig. 5 Reduced transaction dataset

The support of the items sugar, tea, Africa, corn and coffee is 1, which is smaller than the predefined red_sup threshold, so these items are deleted from the original transaction dataset. After deleting them, we obtain a more accurate dataset that contains no false frequent items and no rare items.

Next, we apply Step 4 of our algorithm and convert the reduced dataset from Step 3 into a Boolean-valued information system. In Step 5, the soft set is applied to the Boolean-valued information system obtained in Step 4. The result of Step 5 is shown below.

(F,E) = { Canada = {1,2,7,8,9,10}, USA = {1,2,3,4,5,6,8,9,10}, Iran = {1,2}, trade = {8,9}, acq = {8,9}, earn = {3,6,10}, crude = {1,2}, cpi = {4,5,6}, ship = {1,2}, jobs = {4,5} }

Fig. 6 Soft set representation

After the soft set is applied in Step 5, we apply parameter co-occurrence to obtain the support of the various combinations of items, and delete the itemsets whose support is less than min_sup. The result of Step 6 is shown below.

coo(u1) = Canada, Iran, USA, crude, ship
coo(u2) = Canada, Iran, USA, crude, ship
coo(u3) = USA, earn
coo(u4) = USA, jobs, cpi
coo(u5) = USA, jobs, cpi
coo(u6) = USA, earn, cpi
coo(u7) = Canada
coo(u8) = Canada, USA, trade, acq
coo(u9) = Canada, USA, trade, acq
coo(u10) = Canada, USA, earn

Fig. 7 Parameter co-occurrence

With the help of parameter co-occurrence, we calculate the support of the various itemsets, shown below.

Sup{Canada} = {u1,u2,u7,u8,u9,u10} = 6
Sup{USA} = {u1,u2,u3,u4,u5,u6,u8,u9,u10} = 9
Sup{Iran} = {u1,u2} = 2
Sup{Canada,USA} = {u1,u2,u8,u9,u10} = 5
Sup{Canada,Iran} = {u1,u2} = 2
Sup{Canada,Iran,USA} = {u1,u2} = 2
Sup{crude} = {u1,u2} = 2
Sup{ship} = {u1,u2} = 2
Sup{earn} = {u3,u6,u10} = 3
Sup{jobs} = {u4,u5} = 2
Sup{cpi} = {u4,u5,u6} = 3
Sup{trade} = {u8,u9} = 2
Sup{acq} = {u8,u9} = 2

Fig. 8 Support of itemsets

In the last step, we generate association rules from the frequent patterns produced in Step 6 and check whether each rule satisfies the min_conf threshold. Rules that satisfy the min_conf threshold are strong association rules and are accepted; rules that do not are rejected. For example, consider the rule USA, Canada → ship:

Conf(USA, Canada → ship) = Sup{USA, Canada, ship} / Sup{USA, Canada} = 2/5 = 40%

The confidence of the rule USA, Canada → ship is therefore 40%, which equals the min_conf threshold, so this rule is a strong association rule and is accepted. All other rules are generated in the same manner, their confidence is calculated, and the min_conf threshold decides whether each rule is strong.
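Steps 4 to 7 can be sketched the same way, starting from the Fig. 5 data. The paper does not say how candidate itemsets are enumerated in Step 6, so this Python sketch assumes a brute-force enumeration of the non-empty subsets of each co-occurrence set coo(u); all function names are hypothetical, and the code illustrates the steps rather than reproducing the authors' MATLAB implementation. The support of an itemset is taken as the size of the intersection of F(e) over its parameters e, which equals the number of objects whose co-occurrence set contains the itemset.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Fig. 5 reduced dataset (output of Step 3 with red_sup = 2), keyed by object u.
FIG5: Dict[int, List[str]] = {
    1: ["Canada", "Iran", "USA", "crude", "ship"],
    2: ["Canada", "Iran", "USA", "crude", "ship"],
    3: ["USA", "earn"],
    4: ["USA", "jobs", "cpi"],
    5: ["USA", "jobs", "cpi"],
    6: ["USA", "earn", "cpi"],
    7: ["Canada"],
    8: ["Canada", "USA", "trade", "acq"],
    9: ["Canada", "USA", "trade", "acq"],
    10: ["Canada", "USA", "earn"],
}


def boolean_system(dataset):
    """Step 4: one 0/1 row per object u, one column per parameter (item)."""
    params = sorted({p for items in dataset.values() for p in items})
    rows = {u: {p: int(p in items) for p in params} for u, items in dataset.items()}
    return params, rows


def soft_set(params, rows):
    """Step 5: F(e) is the set of objects whose row has value 1 for parameter e."""
    return {e: {u for u, row in rows.items() if row[e] == 1} for e in params}


def coo(rows, u) -> FrozenSet[str]:
    """Parameter co-occurrence of object u, as listed in Fig. 7."""
    return frozenset(p for p, v in rows[u].items() if v == 1)


def itemset_support(itemset, F, universe) -> int:
    """Size of the intersection of F(e) over e in itemset."""
    objs = set(universe)
    for e in itemset:
        objs &= F[e]
    return len(objs)


def frequent_itemsets(rows, F, min_sup) -> Dict[FrozenSet[str], int]:
    """Step 6 (assumed brute force): every non-empty subset of some coo(u) is a candidate."""
    candidates = set()
    for u in rows:
        params = sorted(coo(rows, u))
        for k in range(1, len(params) + 1):
            candidates.update(frozenset(c) for c in combinations(params, k))
    freq = {}
    for s in candidates:
        n = itemset_support(s, F, rows.keys())
        if n >= min_sup:
            freq[s] = n
    return freq


def strong_rules(freq, min_conf):
    """Step 7: split each frequent itemset into X -> Y and keep rules with conf >= min_conf."""
    rules = []
    for itemset, sup_xy in freq.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                x = frozenset(combo)
                # x is also a candidate with support >= sup_xy, so it is in freq
                conf = sup_xy / freq[x]
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules


params, rows = boolean_system(FIG5)
F = soft_set(params, rows)
freq = frequent_itemsets(rows, F, min_sup=2)
rules = strong_rules(freq, min_conf=0.4)
print(freq[frozenset({"Canada", "USA"})])  # 5
print((frozenset({"Canada", "USA"}), frozenset({"ship"}), 0.4) in rules)  # True
```

Storing F as sets of object IDs turns every support computation into a set intersection, which yields the same counts as reading them off the co-occurrence lists in Fig. 8.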
IV. EXPERIMENT RESULT
In this section, we compare the proposed CSS method for association rule mining with the algorithm of [1]. The proposed CSS approach and the previous soft set approach [1] are executed on a dataset derived from the widely used Reuters-21578 collection [20]. The dataset contains 30 transactions with TIDs 1 to 30 and 10 items labelled P1 to P10. The algorithm of the proposed approach is implemented in MATLAB.

We first show the execution time graphs of the CSS approach and the soft set approach. In the execution time graphs, the X-axis indicates the six functions of the approaches and the Y-axis indicates time in seconds. We then show a bar graph of the memory used by the CSS approach and the soft set approach, and finally give a table that compares execution times as the min_sup and min_conf thresholds change.

Fig. 9 Execution time comparison (min_sup = 2)

Fig. 10 Execution time comparison (min_sup = 3)

The memory bar graph represents the memory used in the process by (1) the soft set approach and (2) the CSS approach.

Fig. 11 Memory comparison

Table 1 gives a tabular comparison of the execution time of the soft set and CSS approaches as the min_sup threshold changes.

Min_sup  Min_conf  Soft set approach (s)  CSS approach (s)
2        0.6       0.07794                0.06225
3        0.6       0.06438                0.03994
4        0.6       0.06323                0.02759
5        0.6       0.05507                0.0189

Table 1 Result analysis (execution time in seconds)

These results show that the proposed CSS approach is faster than the previous soft set approach at every tested min_sup; for example, at min_sup = 5 it takes 0.0189 s against 0.05507 s.

V. CONCLUSION
The soft set approach for association rule mining [1] is a new method for finding association rules, and with the help of soft sets it can handle the uncertainty present in the dataset. However, this approach has higher time and space complexity, and it may produce some inaccurate results because of false frequent items and rare items that can never be frequent. In our proposed approach, we first remove these items from the input transaction dataset with the help of the initial red_sup, and then convert the reduced dataset into a Boolean-valued information system. Next, we apply the soft set to handle the uncertainty of the information system. We then apply parameter co-occurrence on the soft set to generate the counts of the various itemsets, and finally generate the resulting strong association rules. Because false frequent items and rare items are deleted, our approach has lower space and time complexity: it generates results in less time and with less memory, with the same accuracy as the previous approach [1].

REFERENCES
[1] T. Herawan, M.M. Deris, A soft set approach for association rule mining, Knowledge-Based Systems 24 (2011) 186–195.
[2] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of the ACM SIGMOD International Conference on the Management of Data, 1993, pp. 207–216.
[3] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 1994, pp. 487–499.
[4] M. Mat Deris, N.F. Nabila, D.J. Evans, M.Y. Saman, A. Mamat, Association rules on significant rare data using second support, International Journal of Computer Mathematics 83 (1) (2006) 69–80.
[5] A.H.L. Lim, C.S. Lee, Processing online analytics with classification and association rule mining, Knowledge-Based Systems 23 (3) (2010) 248–255.
[6] Y.L. Chen, C.H. Weng, Mining fuzzy association rules from questionnaire data, Knowledge-Based Systems 22 (1) (2009) 46–56.
[7] D. Molodtsov, Soft set theory - first results, Computers and Mathematics with Applications 37 (1999) 19–31.
[8] L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338–353.
[9] P.K. Maji, A.R. Roy, R. Biswas, An application of soft sets in a decision making problem, Computers and Mathematics with Applications 44 (2002) 1077–1083.
[10] D. Chen, E.C.C. Tsang, D.S. Yeung, X. Wang, The parameterization reduction of soft sets and its applications, Computers and Mathematics with Applications 49 (2005) 757–763.
[11] A.R. Roy, P.K. Maji, A fuzzy soft set theoretic approach to decision making problems, Journal of Computational and Applied Mathematics 203 (2007) 412–418.
[12] Y. Zou, Z. Xiao, Data analysis approaches of soft sets under incomplete information, Knowledge-Based Systems 21 (2008) 941–945.
[13] Z. Kong, L. Gao, L. Wang, S. Li, The normal parameter reduction of soft sets and its algorithm, Computers and Mathematics with Applications 56 (2008) 3029–3037.
[14] R. Feldman, Y. Aumann, A. Amir, A. Zilberstein, W. Klosgen, Maximal association rules: a new tool for mining for keyword co-occurrences in document collections, in: Proceedings of KDD 1997, 1997, pp. 167–170.
[15] A. Amir, Y. Aumann, R. Feldman, M. Fresco, Maximal association rules: a tool for mining associations in text, Journal of Intelligent Information Systems 25 (3) (2005) 333–345.
[16] J.W. Guan, D.A. Bell, D.Y. Liu, The rough set approach to association rule mining, in: Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03), 2003, pp. 529–532.
[17] P.K. Maji, R. Biswas, A.R. Roy, Soft set theory, Computers and Mathematics with Applications 45 (2003) 555–562.
[18] Y. Bi, T. Anderson, S. McClean, A rough set model with ontologies for discovering maximal association rules in document collections, Knowledge-Based Systems 16 (2003) 243–251.