J. Malar Vizhi, Dr. T. Bhuvaneswari / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 2, Issue 4, July-August 2012, pp. 1197-1203

Data Quality Measurement With Threshold Using Genetic Algorithm

J. Malar Vizhi (1) and Dr. T. Bhuvaneswari (2)
(1) Research scholar, Bharathiar University, Coimbatore, Tamil Nadu, India.
(2) Assistant Professor in the Department of Computer Science, Government College for Women, Burgoor, Tamil Nadu, India.

ABSTRACT
Our basic idea is to employ association rules for the purpose of data quality measurement. Strong rule generation is an important area of data mining. We propose a Genetic Algorithm to generate high quality association rules with four metrics: confidence, completeness, interestingness and comprehensibility. These metrics are combined into an objective fitness function, which evaluates the quality of each rule. The advantage of using a genetic algorithm to discover high-level prediction rules is that it performs a global search and copes better with attribute interaction than the greedy rule induction algorithms often used in data mining. Association Rule Mining is one of the most applicable techniques in data mining. Association rules that satisfy the thresholds specified by the user are referred to as strong association rules and are considered interesting.

Keywords: Association Rule, Threshold, Michigan approach, Genetic Algorithm, Data Quality Mining.

1. INTRODUCTION
Data mining is one of the most instrumental tools for discovering knowledge from transactions [1, 2]. The most important application of data mining is discovering association rules, which is one of the most important methods for pattern recognition in unsupervised systems. These methods find all possible rules in the database. Measures such as support and confidence are used to indicate high quality rules. Data mining techniques can be employed in order to improve data quality. Poor data quality is always a problem in practical applications. Data Quality Mining can be defined as the deliberate application of data mining techniques for the purpose of data quality measurement and improvement [3].

Data Quality Mining is one of the most important tasks in the data mining community and is an active research area because the data being generated and stored in the databases of organizations are already enormous and continue to grow very fast. This large amount of stored data normally contains valuable hidden knowledge which, if harnessed, could be used to improve the decision making process of an organization. However, the volume of the archival data often exceeds several gigabytes and sometimes even terabytes. Such an enormous volume of data is beyond the manual analysis capability of human beings. Thus, there is a clear need for automatic methods of extracting knowledge from data that is not only highly accurate but also comprehensible and interesting to the user.

We describe a first approach to employing association rules for the purpose of data quality mining. Data mining algorithms like Association Rule Mining perform an exhaustive search to find all rules satisfying some constraints [4]. The Genetic Algorithm is a new approach for association rule mining (ARM). Finding the frequent rules is the most resource consuming phase in association rule mining and always requires some extra comparisons against the whole database [5]. Although the Genetic Algorithm is good at searching for undetermined solutions, it is still rare to see it used to mine association rules. We apply a Genetic Algorithm to the rules generated by association rule mining, based on four objectives: confidence, completeness, comprehensibility and interestingness.

A common approach to rule generation has two steps:
1. Applying the minimum support to find all frequent itemsets in a database.
2. Applying the minimum confidence constraint on the frequent itemsets to form the rules.

The process of discovering interesting and unexpected rules from large datasets is known as ARM. The major aim of ARM is to find the set of all subsets of items or attributes that frequently occur in many database records or transactions and, additionally, to extract rules on how a subset of items influences the presence of another subset. An association rule (AR) is a way of describing a relationship that can be observed between database attributes, of the form: if the set of attribute values A is found together in a database record, then it is likely that the set B will be present also. A rule of this form, A→B, is of interest only if it meets at least two threshold requirements: support and confidence. The support defines the number of database records within which the association can be observed. The confidence in the rule is the ratio of its support to that of its antecedent.
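The support and confidence thresholds described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy transactions, the function names, and the choice of relative support (a fraction rather than a record count) with a "greater than or equal" comparison are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy market-basket data (not from the paper).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset(t) for t in (
        {"tea", "coffee"}, {"tea", "coffee"}, {"tea", "coffee", "milk"},
        {"tea", "milk"}, {"coffee"}, {"coffee", "milk"}, {"milk"}, {"bread"},
    )
]

def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)

def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(A and B together) / support(A); 0.0 if A never occurs."""
    a = frozenset(antecedent)
    sup_a = support(a, transactions)
    if sup_a == 0:
        return 0.0
    return support(a | frozenset(consequent), transactions) / sup_a

def is_strong(antecedent: Iterable[str], consequent: Iterable[str],
              transactions: List[FrozenSet[str]],
              min_sup: float, min_conf: float) -> bool:
    """A rule is 'strong' when it meets both user-specified thresholds."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(both, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)
```

With this toy data, the rule tea→coffee passes thresholds of 0.3 (support) and 0.7 (confidence), while raising the confidence threshold to 0.8 rejects it, which shows how the user-chosen thresholds decide what counts as a strong rule.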
ARM aims to uncover all such relationships that are present in a database for specified thresholds of support and confidence. A rule is extracted if and only if both its confidence and its support are greater than two thresholds, the minimum confidence and the minimum support, respectively. Therefore, users need to specify these two thresholds appropriately before they start the mining job.

In general, each measure is associated with a threshold that can be controlled by the user. Rules that do not meet the threshold are considered uninteresting, and hence are not presented to the user as knowledge. Association rules that satisfy both a user-specified minimum confidence and a user-specified minimum support are referred to as strong association rules, and are considered interesting.

2. RELATED WORKS
Existing algorithms for mining association rules are mainly based on the approach suggested by Agrawal et al. [6, 7]. Apriori [7], SETM [8], AIS [7], Pincer search [9], DIC [10] etc. are some of the popular algorithms based on this approach. These algorithms work on a binary database, termed a market basket database. In preparing the market basket database, every record of the original database is represented as a binary record where the fields are defined by a unique value of each attribute in the original database. The fields of this binary database are often termed items. For a database having a huge number of attributes, each containing many distinct values, the total number of items will be huge. Existing algorithms try to measure the quality of a generated rule by considering only one evaluation criterion, i.e., confidence factor or predictive accuracy. This criterion evaluates the rule depending on the number of occurrences of the rule in the entire database: the more occurrences, the better the rule. The generated rule may involve a large number of attributes, thereby making it difficult to understand [11]. If the generated rules are not understandable to the user, the user will never use them. Again, since more importance is given to rules that satisfy a large number of records, these algorithms may extract rules from the data that can be easily predicted by the user. It would be better for the user if the algorithms could generate some of the rules that are actually hidden inside the data. These algorithms do not give any importance to rare events, i.e., interesting rules [12, 13]. In the present work we used the comprehensibility and the interestingness measures of the rules in addition to predictive accuracy.

The task of mining association rules over market basket data was first introduced by Agrawal. According to Zaki [1999], the mining task involves generating all association rules in the database that have a support greater than the minimum support (the rules are frequent) and have a confidence greater than the minimum confidence (the rules are strong).

Traditionally, ARM was predominantly used in market-basket analysis, but it is now widely used in other application domains including customer segmentation, catalogue design, store layout, and telecommunication alarm prediction [14].

3. ASSOCIATION RULE
The process of discovering interesting and unexpected rules from large data sets is known as association rule mining. Association rule mining is one of the most challenging areas of data mining; it was introduced by Agrawal to discover the associations or co-occurrences among the different attributes of a dataset. Several algorithms like Apriori (Agrawal), SETM (Houtsma and Swami), AprioriTID (Agrawal and Srikant), DIC (Brin), Partition algorithm (Savasere), Pincer search (Lin and Kedem), FP-tree (Han) etc. have been developed to meet the requirements of this problem. These algorithms work basically in two phases: frequent itemset generation and rule generation. Since the first phase is the most time consuming, all the above mentioned algorithms focus on the first phase. A set of attributes is termed a frequent set if the occurrence of the set within the dataset is more than a user-specified threshold called minimum support. After discovering the frequent itemsets, in the second phase rules are generated with the help of another user parameter called minimum confidence.

The typical approach is to make strong simplifying assumptions about the form of the rules and limit the measure of rule quality to simple properties such as support and confidence [7]. Support and confidence limit the level of interestingness of the generated rules. Completeness and accuracy are metrics that can be used together to find interesting association rules. The major aim of association rule mining is to find the set of all subsets of items or attributes that frequently occur in many database records or transactions. Association rule mining algorithms discover high-level prediction rules of the form

IF the conditions on the values of the predicting attributes are TRUE
THEN predict values for some goal attributes.

An association rule is a conditional implication among itemsets, X→Y, where X and Y are itemsets and X∩Y=∅. An itemset is said to be frequent or large if its support is more than a user-specified minimum support value. The confidence of an association rule, given as Support(X∪Y)/Support(X), is the conditional probability that a transaction contains Y given that it also contains X.
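The two phases described in this section (frequent itemset generation, then rule generation) can be sketched as follows. This is a simplified level-wise enumeration written for illustration under stated assumptions, not Apriori as published and not the authors' code: it omits Apriori's candidate pruning, uses relative support, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]

def frequent_itemsets(transactions: List[Itemset], min_sup: float) -> Dict[Itemset, float]:
    """Phase 1: collect every itemset whose relative support is >= min_sup."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent: Dict[Itemset, float] = {}
    while candidates:
        level: Dict[Itemset, float] = {}
        for cand in candidates:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_sup:
                level[cand] = sup
        frequent.update(level)
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        size = len(candidates[0]) + 1
        candidates = list({a | b for a, b in combinations(level, 2)
                           if len(a | b) == size})
    return frequent

def generate_rules(frequent: Dict[Itemset, float],
                   min_conf: float) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Phase 2: split each frequent itemset into X -> Y and keep confident rules."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                x = frozenset(ante)
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[x] is always available here.
                conf = sup / frequent[x]
                if conf >= min_conf:
                    rules.append((x, itemset - x, sup, conf))
    return rules

# Hypothetical toy data for illustration only.
BASKETS = [frozenset(t) for t in (
    {"tea", "coffee"}, {"tea", "coffee"}, {"tea", "coffee", "milk"},
    {"tea", "milk"}, {"coffee"}, {"coffee", "milk"}, {"milk"}, {"bread"},
)]
RULES = generate_rules(frequent_itemsets(BASKETS, min_sup=0.3), min_conf=0.7)
```

Phase 1 dominates the cost because each candidate is counted against the whole database, which is consistent with the remark above that the listed algorithms concentrate on the first phase.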
Example 1 (adapted from Brin [1997]): Suppose we have a market basket database from a grocery store, consisting of n baskets. Let us focus on the purchase of tea (denoted by t) and coffee (denoted by c). When supp(t)=0.25 and supp(t∪c)=0.2, we can apply the support-confidence framework to a potential association rule t→c. The support for this rule is 0.2, which is fairly high. The confidence is the conditional probability that a customer who buys tea also buys coffee, i.e., conf(t→c)=supp(t∪c)/supp(t)=0.2/0.25=0.8, which is very high. In this case we would conclude that the rule t→c is a valid one.

The rules thus generated will be influenced by the choice of ARM parameters employed by the algorithm (typically the support and confidence threshold values). This choice affects predictive accuracy, which can almost always be improved by a suitable choice of parameters. Rules whose support and confidence values are below the user-specified thresholds are considered uninteresting, and hence are not presented to the user as knowledge. Rules that satisfy both a minimum support threshold (min sup) and a minimum confidence threshold (min conf) are called strong association rules, and are considered interesting.

One of the main challenges in ARM is to identify frequent itemsets in very large transaction databases that comprise millions of transactions and items. Another main challenge is that the performance of frequent itemset mining algorithms is heavily dependent on the user-specified minimum support threshold. Very often the minimum support value is too big and nothing is found in the database, whereas a slightly too small minimum support leads to poor performance. Users have to give a suitable minimum support for a mining task.

4. GENETIC ALGORITHM FOR RULE MINING
Usually, genetic algorithms for rule mining are partitioned into two categories according to their encoding of rules in the population of chromosomes (Freitas, 2003). One encoding method is called the Michigan approach, where each rule is encoded into an individual. The other is referred to as the Pittsburgh approach, in which a set of rules is encoded into a chromosome. For example, Fidelis, Lopes, and Freitas (2000) gave a Michigan type of genetic algorithm to discover comprehensible classification rules, with an interesting chromosome encoding and a specific mutation operator, but the method is impractical when the number of attributes is large. Weiss and Hirsh (1998) also followed the Michigan method to predict rare events. Pei, Goodman, and Punch (1997), on the other hand, used the Pittsburgh approach for the discovery of classes and feature patterns.

Although it is known that the genetic algorithm is good at searching for nondeterministic solutions, it is still rare to see it used to mine association rules. We investigate the possibility of applying the genetic algorithm to association rule mining in the following sections. In the context of this research, the term Michigan approach denotes any approach where each GA individual encodes a single prediction rule. The choice between the two approaches strongly depends on which kind of rule is to be discovered, which in turn depends on the kind of data mining task being addressed. Suppose the task is classification. Then we usually evaluate the quality of the rule set as a whole, rather than the quality of a single rule. In other words, the interaction among the rules is important. In this case, the Pittsburgh approach seems more natural [Freitas 2002]. However, this approach leads to syntactically longer individuals, which tends to make fitness computation more expensive. In addition, it may require some modifications to standard genetic operators to cope with relatively complex individuals [18].

By contrast, in the Michigan approach the individuals are simpler and syntactically shorter. This tends to reduce the time taken to compute the fitness function and to simplify the design of genetic operators. However, this advantage comes at a cost. First of all, since the fitness function evaluates the quality of each rule separately, it is not easy to compute the quality of the rule set as a whole, i.e. taking rule interactions into account. Another problem is that, since we want to discover a set of rules rather than a single rule, we cannot allow the GA population to converge to a single individual, which is what usually happens in standard GAs. The Michigan approach therefore needs some kind of niching method, which avoids having to run the GA several times to discover a different rule each time [16].

5. GENETIC BASED LEARNING
An association rule can be represented as an IF A THEN C statement. The only restriction here is that the two parts should not have a common attribute. To solve this kind of mining problem with a multiobjective genetic algorithm, the first task is to represent the possible rules as individuals, known as individual representation.
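A Michigan-style individual representation, in which one chromosome encodes one IF A THEN C rule, might look like the sketch below. The paper states that a binary encoding is used but does not spell it out, so everything here is an assumption made for readability: each gene is a (role, value index) pair instead of raw bits, and the attribute domains are hypothetical, with names borrowed from the Primary-tumor rules reported later.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute domains; the order of DOMAINS fixes the gene order.
DOMAINS: Dict[str, List[str]] = {
    "liver": ["no", "yes"],
    "pleura": ["no", "yes"],
    "age": ["<30", "30-59", ">=60"],
}
ATTRS = list(DOMAINS)

ABSENT, ANTECEDENT, CONSEQUENT = 0, 1, 2
Gene = Tuple[int, int]  # (role of the attribute, index of its value)

def encode(antecedent: Dict[str, str], consequent: Dict[str, str]) -> List[Gene]:
    """Build one chromosome (one rule) from attribute -> value mappings."""
    if set(antecedent) & set(consequent):
        raise ValueError("antecedent and consequent must not share an attribute")
    chromosome: List[Gene] = []
    for attr in ATTRS:
        if attr in antecedent:
            chromosome.append((ANTECEDENT, DOMAINS[attr].index(antecedent[attr])))
        elif attr in consequent:
            chromosome.append((CONSEQUENT, DOMAINS[attr].index(consequent[attr])))
        else:
            chromosome.append((ABSENT, 0))
    return chromosome

def decode(chromosome: List[Gene]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Recover the (antecedent, consequent) mappings from a chromosome."""
    antecedent: Dict[str, str] = {}
    consequent: Dict[str, str] = {}
    for attr, (role, value_idx) in zip(ATTRS, chromosome):
        if role == ANTECEDENT:
            antecedent[attr] = DOMAINS[attr][value_idx]
        elif role == CONSEQUENT:
            consequent[attr] = DOMAINS[attr][value_idx]
    return antecedent, consequent

def to_text(chromosome: List[Gene]) -> str:
    """Render the rule in IF ... THEN ... form."""
    antecedent, consequent = decode(chromosome)
    fmt = lambda part: " AND ".join(f"{a} in [{v}]" for a, v in part.items())
    return f"IF {fmt(antecedent)} THEN {fmt(consequent)}"
```

Because every attribute carries exactly one role gene, a decoded chromosome can never place the same attribute in both parts of the rule, so the restriction stated above holds by construction.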
The second task is to define the fitness function and then the genetic materials. We propose to solve the ARM problem with a Pareto-based MOGA. There have been many applications of genetic algorithms in the field of DM and knowledge discovery [17].

The Genetic Algorithm (GA) was developed by Holland in the 1970s. It incorporates Darwinian evolutionary theory with sexual reproduction. The Genetic Algorithm is a stochastic search algorithm modeled on the process of natural selection, which underlies biological evolution. It has been successfully applied to many search, optimization, and machine learning problems [19]. The Genetic Algorithm works iteratively by generating new populations of strings from old ones. Every string is the encoded (binary, real etc.) version of a candidate solution. A standard Genetic Algorithm applies genetic operators such as selection, crossover and mutation to an initially random population in order to compute a whole generation of new strings.
• Selection deals with the probabilistic survival of the fittest, in that fit chromosomes are chosen to survive, where fitness is a comparable measure of how well a chromosome solves the problem at hand.
• Crossover takes individual chromosomes from parents and combines them to form new ones.
• Mutation alters the new solutions so as to add stochasticity to the search for better solutions.
In general, the main motivation for using Genetic Algorithms in the discovery of high-level prediction rules is that they perform a global search and cope better with attribute interaction than the greedy rule induction algorithms often used in data mining. This section discusses the aspects of Genetic Algorithms for rule discovery.

5.1 Selection
We used tournament selection, somewhat tailored for our data mining-oriented fitness function, as follows. In a conventional version of tournament selection, first k individuals are randomly picked, with replacement, from the population. Then the individual with the best fitness value out of the k individuals is selected as the winner of the tournament. This process is repeated P times, where P is the population size. We have modified our tournament selection procedure so that the winner of the tournament is the individual with the largest fitness among the set of individuals whose Information Gain is greater than or equal to the average Information Gain of all the k individuals playing the tournament. (We have used a tournament size k of 5.) Once the tournament selection process has selected P individuals, these individuals undergo crossover and mutation.

5.2 Mutation
This part of the genetic algorithm requires great care. There are two probabilities. The first, usually called pm, is used to judge whether mutation is to be done or not. When a candidate fulfills this criterion, a second probability, the locus probability, determines at which point of the candidate the mutation is to be done. Since binary encoding is used for the database provided, a simple toggling operator is sufficient for mutation.

5.3 Crossover
As with mutation, there are two probabilities here: one for whether crossover is to be performed or not, i.e. pc, and the other for finding the location, the point where crossover must be done.

Because there may be a large number of attributes in the database, we propose to use a multipoint crossover operator. The multipoint crossover method is widely used to shuffle the different characteristics of an individual and to create children that inherit their parents' characteristics directly. A chromosome of Parent A and the corresponding chromosome of Parent B are aligned. Each border between genes can be used as a crossing point (1).

Depending on the number of crossing points, the crossing method chooses the crossing points randomly (2).

Once these points are chosen, the children can be created. Let us start with child A. Child A gets the first genes of parent A until the first crossing point is reached. Then all genes of parent B are copied to the chromosome of child A until the next crossing point is reached. This procedure is continued until the whole chromosome of child A is generated (3).

The chromosome of child B is created in a similar way, but starting by copying the genes of parent B first (4).
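The three operators of Sections 5.1 to 5.3 can be sketched on binary chromosomes as below. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, `fitness` and `info_gain` are caller-supplied callables (the paper does not say how Information Gain is computed per individual), and the locus for mutation is drawn uniformly.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = List[int]  # binary string, one 0/1 value per gene

def tournament_select(population: Sequence[Chromosome],
                      fitness: Callable[[Chromosome], float],
                      k: int = 5, rng=random) -> Chromosome:
    """Conventional tournament: draw k individuals with replacement, keep the fittest."""
    contestants = [rng.choice(population) for _ in range(k)]
    return max(contestants, key=fitness)

def modified_tournament_select(population: Sequence[Chromosome],
                               fitness: Callable[[Chromosome], float],
                               info_gain: Callable[[Chromosome], float],
                               k: int = 5, rng=random) -> Chromosome:
    """Winner is the fittest contestant whose information gain is >= the tournament mean."""
    contestants = [rng.choice(population) for _ in range(k)]
    mean_gain = sum(info_gain(c) for c in contestants) / k
    # Fall back to all contestants if rounding leaves nobody at or above the mean.
    eligible = [c for c in contestants if info_gain(c) >= mean_gain] or contestants
    return max(eligible, key=fitness)

def mutate(chromosome: Chromosome, pm: float, rng=random) -> Chromosome:
    """With probability pm, toggle the bit at one randomly chosen locus."""
    child = list(chromosome)
    if rng.random() < pm:
        locus = rng.randrange(len(child))
        child[locus] ^= 1
    return child

def multipoint_crossover(parent_a: Chromosome, parent_b: Chromosome,
                         n_points: int, pc: float,
                         rng=random) -> Tuple[Chromosome, Chromosome]:
    """With probability pc, swap alternating segments between randomly chosen gene borders."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2 or rng.random() >= pc:
        return list(parent_a), list(parent_b)
    n_points = min(n_points, len(parent_a) - 1)
    points = sorted(rng.sample(range(1, len(parent_a)), n_points))
    child_a: Chromosome = []
    child_b: Chromosome = []
    src_a, src_b = parent_a, parent_b
    start = 0
    for point in points + [len(parent_a)]:
        child_a.extend(src_a[start:point])   # child A starts with parent A's genes
        child_b.extend(src_b[start:point])   # child B starts with parent B's genes
        src_a, src_b = src_b, src_a          # switch source at each crossing point
        start = point
    return child_a, child_b
```

Passing a seeded `random.Random` instance as `rng` makes runs repeatable, which is convenient when comparing parameter settings such as pm and pc.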
6. EVALUATING THE QUALITY OF A RULE
The fitness function used in this paper consists of four metrics, which we combine into a single objective fitness function. The complementary set of measures includes confidence, defined in equation (1), completeness in equation (2), interestingness in equation (3) and comprehensibility in equation (4).

Let a rule be of the form IF A THEN C, where A is the antecedent (a conjunction of conditions) and C is the consequent (predicted class), as discussed earlier. A very simple way to measure the predictive accuracy of a rule is to compute the so-called confidence factor (CF) of the rule, defined as:

CF = |A & C| / |A| (1)

where |A| is the number of examples satisfying all the conditions in the antecedent A and |A & C| is the number of examples that both satisfy the antecedent A and have the class predicted by the consequent C.

We can now measure the predictive accuracy of a rule by taking into account not only its CF but also a measure of how "complete" the rule is, i.e. the proportion of examples having the predicted class C that is actually covered by the rule antecedent. The rule completeness measure is computed by the formula:

Completeness = |A & C| / |C| (2)

where |C| is the number of examples satisfying all the conditions in the consequent C and |A & C| is the number of examples that both satisfy the antecedent A and have the class predicted by the consequent C.

The rule interestingness measure is computed by the formula:

Interestingness = |A & C| - (|A| * |C|) / N (3)

where |A & C| is the number of examples that both satisfy the antecedent A and have the class predicted by the consequent C, |A| is the number of examples satisfying all the conditions in the antecedent A, |C| is the number of examples satisfying all the conditions in the consequent, and N is the total number of examples.

The following expression can be used to quantify the comprehensibility of an association rule:

Comprehensibility = log(1 + |C|) / log(1 + |A & C|) (4)

where |A & C| is the number of examples that both satisfy the antecedent A and have the class predicted by the consequent C and |C| is the number of examples satisfying all the conditions in the consequent.

The fitness function is calculated as the weighted arithmetic average of confidence, completeness, interestingness and comprehensibility. The fitness function f(x) is given by:

f(x) = (w1 * CF + w2 * Completeness + w3 * Interestingness + w4 * Comprehensibility) / (w1 + w2 + w3 + w4) (5)

where w1, w2, w3, w4 are user-defined weights.

7. THE PROPOSED ALGORITHM
Our approach works as follows:
1. Start.
2. Load a sample of records from the database that fits into memory.
4. Apply the Apriori algorithm to find the frequent itemsets with the minimum support threshold. Suppose S is the set of frequent itemsets generated by the Apriori algorithm and th is the threshold.
5. Apply the Genetic Algorithm for generation of all rules.
6. Set Q=Ø, where Q is the output set, which contains all the association rules.
7. Set the input termination condition of the genetic algorithm.
8. Represent each frequent itemset of S as a binary-encoded string using the representation specified above.
9. Select two members (strings) from the frequent itemsets using the tournament method.
10. Apply the Genetic Algorithm operators, using multipoint crossover, on the selected members (strings) to generate the association rules.
11. Find the confidence factor, completeness, comprehensibility and interestingness for each rule x=>y.
12. If the generated rule is better than the previous rule then
13. Set Q = Q U {x=>y}.
14. If the desired number of generations is not completed, then go to Step 3.
15. Decode the chromosomes in the final stored generations and get the generated rules.
16. Select rules based on comprehensibility and interestingness.
17. Stop.
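The four measures of Section 6 and their weighted average can be written out as a short sketch. It follows the paper's stated definitions of equations (1) to (4) and assumes the weighted arithmetic average is normalised by the sum of the weights. The function names, the zero-denominator handling (returning 0.0) and the example counts are assumptions added for illustration.

```python
import math
from typing import Dict, Sequence

METRIC_ORDER = ("confidence", "completeness", "interestingness", "comprehensibility")

def _ratio(numerator: float, denominator: float) -> float:
    """Safe division; returning 0.0 on a zero denominator is an assumption of this sketch."""
    return numerator / denominator if denominator else 0.0

def rule_metrics(n_a: int, n_c: int, n_ac: int, n_total: int) -> Dict[str, float]:
    """Compute equations (1)-(4) from example counts.

    n_a     -- |A|: examples satisfying the antecedent
    n_c     -- |C|: examples satisfying the consequent
    n_ac    -- |A & C|: examples satisfying both
    n_total -- N: total number of examples
    """
    return {
        "confidence": _ratio(n_ac, n_a),                                    # (1)
        "completeness": _ratio(n_ac, n_c),                                  # (2)
        "interestingness": n_ac - _ratio(n_a * n_c, n_total),               # (3)
        "comprehensibility": _ratio(math.log(1 + n_c), math.log(1 + n_ac)), # (4)
    }

def fitness(metrics: Dict[str, float],
            weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted arithmetic average of the four metrics, in METRIC_ORDER."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    weighted = sum(w * metrics[name] for w, name in zip(weights, METRIC_ORDER))
    return weighted / total

# Hypothetical counts: 100 examples, 25 match A, 40 match C, 20 match both.
EXAMPLE = rule_metrics(n_a=25, n_c=40, n_ac=20, n_total=100)
EXAMPLE_FITNESS = fitness(EXAMPLE)
```

As defined, interestingness is a difference of counts while confidence and completeness are ratios, so in practice the weights (or a rescaling step, which the paper does not describe) determine how much each measure contributes to the combined fitness.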
8. RESULTS AND DISCUSSION
Experiments were conducted using the real-world Primary-tumor dataset, which contains 18 attributes and 339 instances. The default values of the parameters are: population size = 339, mutation rate = 0.5, crossover rate = 0.8, threshold = 5. Rule induction was carried out using the Tanagra data mining software.

Fig 1 Tanagra produces association rules

The following table gives the results of the experiment conducted. In these tests, different predictions were made by combining different attributes to determine a result.

Table 1. Rules generated from Primary-tumor dataset.

Rule no. | Antecedent | Consequent | Acc/CF | Comp | Inter | Compre | F
1 | IF liver in [yes] -- peritoneum in [no] -- axillar in [no] -- abdominal in [yes] -- pleura in [no] | age in [>=60] | 0.8 | 11 | 3 | 13 | 1.4
2 | IF bone in [no] -- axillar in [no] -- supraclavicular in [no] -- skin in [no] -- pleura in [no] -- peritoneum in [no] -- brain in [no] -- liver in [no] -- histologic-type in [adeno] | age in [>=60] | 0.8 | 8 | 3 | 11 | 1.4
3 | IF liver in [yes] -- abdominal in [no] -- axillar in [no] -- mediastinum in [no] -- bone in [no] -- pleura in [no] | age in [>=60] | 0.8 | 14 | 3 | 14 | 1.4
4 | IF histologic-type in [epidermoid] -- degree-of-diffe in [fairly] -- skin in [no] | age in [>=60] | 0.7 | 12 | 1 | 1 | 1.9

(Note: Acc/CF = predictive accuracy/confidence factor, Comp = completeness, Inter = interestingness, Compre = comprehensibility, F = fitness function.)

9. CONCLUSION
We have used a multi-objective evolutionary framework for ARM, which offers tremendous flexibility. Each rule is influenced by a threshold controlled by the user. Association rules that satisfy both a user-specified minimum confidence threshold and a user-specified minimum support threshold are referred to as strong association rules, and are considered interesting.
We have used tournament selection and multipoint crossover, which remain flexible for a large number of attributes in the database. We have adopted the Michigan approach to represent the rules as chromosomes, where each chromosome represents a separate rule. Other techniques could be used to reduce the complexity of the genetic algorithm. Moreover, we tested the approach only with numerical and categorical valued attributes. We also intend to extend the algorithm proposed in this paper to cope with continuous data.

REFERENCES
[1] K.J. Cios, W. Pedrycz, R.W. Swiniarski, "Data Mining Methods for Knowledge Discovery", Kluwer Academic Publishers, Boston, MA (2000).
[2] L. Jain, A. Abraham, R. Goldberg, "Evolutionary Multiobjective Optimization", Second Edition, Springer (2005).
[3] Jochen Hipp, Ulrich Güntzer and Udo Grimmer, "Data Quality Mining - Making a Virtue of Necessity", in Proceedings of the 6th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), 2001.
[4] R. Agrawal, R. Srikant, "Fast algorithms for mining association rules", in Proceedings of the 20th Int'l Conference on Very Large Databases, Chile, 1994.
[5] Ali Hadian, Mahdi Nasiri, Behrouz Minaei-Bidgoli, "Clustering Based Multi-objective Rule Mining using Genetic Algorithm", International Journal of Digital Content Technology and its Applications, Volume 4, Number 1 (2010).
[6] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proceedings of the 20th International Conference on Very Large Databases, Santiago, Chile, 1994.
[7] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of ACM SIGMOD Conference on Management of Data, 1993, pp. 207-216.
[8] M. Houtsma and A. Swami, Set-oriented mining of association rules, Research Report, 1993.
[9] D.I. Lin and Z.M. Kedem, Pincer-search: an efficient algorithm for discovering the maximal frequent set, Proceedings of 6th European Conference on Extending Database Technology, 1998.
[10] S. Brin, et al., Dynamic itemset counting and implication rules for market basket data, in: Proceedings of ACM SIGMOD International Conference on Management of Data, Tucson, AZ, 1997.
[11] M.V. Fidelis, Discovering comprehensible classification rules with a genetic algorithm, in: Proceedings of Congress on Evolutionary Computation, 2000.
[12] A.A. Freitas, Data Mining and Knowledge Discovery with Evolutionary Algorithms, Springer-Verlag, New York, 2002.
[13] E. Noda, Discovering interesting prediction rules with a genetic algorithm, Proceedings of the Congress on Evolutionary Computation, 1999, pp. 1322-1329.
[14] Peter P. Wakabi-Waiswa, Venansius Baryamureeba, and Karunkaran Sarukesi, Optimized Association Rule Mining With Genetic Algorithm, Proceedings of Seventh International Conference in Natural Computation, IEEE, 2011, pp. 1116-1120.
[15] X. Yan, C. Zhang, and S. Zhang, "Genetic algorithm-based strategy for identifying association rules without specifying actual minimum support", Elsevier Expert Systems with Applications, vol. 36, no. 2, 2008, pp. 3066-3076.
[16] Z. Michalewicz, "Genetic Algorithms + Data Structures = Evolution Programs", Springer-Verlag, Berlin, 1994.
[17] A. Ghosh, B. Nath, 2004. Multi-objective rule mining using genetic algorithms. Information Sciences 163, pp. 123-133.
[18] A.A. Freitas, A survey of evolutionary algorithms for data mining and knowledge discovery, in: A. Ghosh, S. Tsutsui (Eds.), Advances in Evolutionary Computing, Springer-Verlag, New York, 2003, pp. 819-845.
[19] S. Dehuri, A.K. Jagadev, A. Ghosh and R. Mall, 2006. Multi-objective Genetic Algorithm for Association Rule Mining Using a Homogeneous Dedicated Cluster of Workstations. American Journal of Applied Sciences 3 (11): 2086-2095, ISSN 1546-9239.