Prioritization of association rules in data mining: Multiple ...



Expert Systems with Applications 29 (2005) 867–878

Prioritization of association rules in data mining: Multiple criteria decision approach

Duke Hyun Choi a,*, Byeong Seok Ahn b, Soung Hie Kim a

a Graduate School of Management, Korea Advanced Institute of Science and Technology (KAIST), 207-43 Cheongryangri-Dong, Dongdaemun, Seoul 130-012, South Korea
b Department of Business Administration, Hansung University, 389 Samsun-Dong 2, Sungbuk, Seoul 136-792, South Korea

* Corresponding author. Tel.: +82 2 958 3684; fax: +82 2 958 3604. E-mail addresses: (D.H. Choi), bsahn@ (B.S. Ahn), (S.H. Kim).
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2005.06.006

Abstract

Data mining techniques, which extract patterns from large databases, are processes that focus on the automatic exploration and analysis of large quantities of raw data in order to discover meaningful patterns and rules. When applying these methods, most business managers encounter a multitude of rules produced by the data mining technique. Because such rules are multi-faceted, they are generally characterized by multiple conflicting criteria that are directly related to business values, e.g. expected monetary value or incremental monetary value. In this paper, we present a method for rule prioritization that takes into account business values composed of objective metrics or managers' subjective judgments. The proposed methodology attempts to create synergy by applying decision analysis techniques to problems in the data mining domain. We believe that this approach will be particularly useful for business managers who suffer from rule quality or quantity problems, conflicts between extracted rules, and difficulties in building a consensus when several managers are involved in rule selection.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Rule prioritization; Rule conflict; Association rule mining; ELECTRE

1. Introduction

Under intense competition that forces companies to develop and maintain competitive marketing activity, techniques from the disciplines of both data mining and decision analysis have been used extensively in developing computerized decision aids. Data mining techniques, which extract patterns from large databases, are processes that focus on the automatic exploration and analysis of large quantities of raw data in order to discover meaningful patterns and rules (Agrawal, Imielinski, & Swami, 1993; Agrawal & Srikant, 1994; Song, Kim, & Kim, 2001). Decision analysis, on the other hand, is concerned with applying decision theory to real-world problems to help companies through the decision making process. In this process, decision makers' preferences, which are important for judging the desirability of alternative outcomes, are associated with criteria in a problem solving or decision making situation (White, 1990). Although developed separately, data mining and decision analysis share basic conceptual denominators. Both provide systematic methods for problem solving and decision making, with the same objective (decision aiding) and the same delivery vehicle (the computer).

From the perspective of many types of practical decision aiding applications, however, both data mining and decision analysis techniques have limitations. In decision support system development in particular, little effort has been made to generate synergies by letting each complement the other's limitations. More specifically, user preferences play a key role in decision aids based on decision analysis, yet they are not explicitly considered in the current generation of data mining systems. Even where they are (indirectly) addressed, they are restricted to the preferences of data mining engineers, expressed through threshold values. They do not reflect the preferences of decision makers, which should be continuously adjusted to the current dynamic business environment. Normative decision analysis, on the other hand, is usually built around a prescriptive and rigid problem
structure called a decision analysis model. Decision analysis is not compatible with extracting knowledge from today's large corporate databases, since it does not focus on the automatic generation of meaningful knowledge from raw data. It can therefore be meaningful to complement techniques from data mining with those from decision analysis. Data mining generates the alternatives (i.e. rules) from large databases, and decision analysis prioritizes those alternatives by reflecting decision makers' preferences.

In existing data mining techniques, some situations make it necessary to prioritize rules in order to select and concentrate on the more valuable ones. This need arises from the number of qualified rules (Tan & Kumar, 2000) and from limited business resources. The purpose of data mining is to extract rules (patterns) that are valuable for decision making. Even so, patterns are deemed 'interesting' merely because they pass certain statistical tests, such as support/confidence. To the enterprise, however, it remains unclear how such patterns can be used to maximize business objectives.

The major obstacle lies in the gap between two things. On one side are the statistic-based summaries (statistic-based pattern extraction) produced by traditional rule mining. On the other is the profit-driven action (value-based decision making) required by business decision making (Wang, Zhou, & Han, 2002). Business decision making is characterized by explicit consideration of conflicting business objectives (Bar & Feigenbaum, 1981; Hayes-Roth, Waterman, & Lenat, 1983) and by the involvement of multiple decision makers in corporate decisions.

In summary, the research objective is to prioritize association rules produced by data mining while taking their business values into account. We do this by explicitly incorporating the conflicting criteria of business value and the managers' preference statements about their trade-offs. For this purpose, a decision analysis method, e.g. the Analytic Hierarchy Process (AHP), is applied to aggregate the opinions of the group of decision makers. The group decides which criteria are relevant for evaluating the business values of rules and how important each criterion is relative to the others. Association rule mining, one of the data mining techniques, is then performed to capture a set of competing rules with their various business values. These rules are, in turn, used as input for rule prioritization. Thus, the final rule selected by an appropriate decision method, e.g. ELECTRE, reflects a meaningful result obtained from the combined use of machine learning and human intelligence.

2. Research background

2.1. Business values of rules

In supporting decision making for real business activities, decision makers are confounded when they encounter too many extracted rules and/or rules of little interest for decision making. The quality and quantity problems of these rules in traditional data mining applications have a common cause. Rules are extracted and selected only with statistical criteria, without regard to the business values that are closely related to business strategies. One of the most important considerations when firms extract rules from large databases is the business value they can expect to obtain from decisions based on those rules. Some way of reflecting business requirements in the rule selection process is therefore necessary for effective decision aids.

The importance of each business value, on the other hand, varies according to the environment a firm faces and its business strategy. For example, after an Internet shopping mall starts business, the frequency of rules might be considered relatively more important than other values. More frequent transactions with customers may imply wide acknowledgement in the market. Later, the monetary value of rules might become increasingly important for maximizing profit from loyal customers. Firms dealing in fashionable items such as clothes, music CDs, or computer games might consider it important to capture trends in customers' purchasing patterns, since such items have relatively short life cycles.

Recognizing these various requirements, researchers in the data mining domain began to pay attention to business values in pattern extraction. Some researchers focused on discovering time trends and differences between datasets (Dong & Li, 1999; Song et al., 2001). Others captured trends of patterns by highlighting the recent dataset (Zhang et al., 2003). Another group of studies deals with pruning or ranking association rules according to their frequency, statistical interdependence, and interestingness measures (Hilderman & Hamilton, 2001; Tan & Kumar, 2000). A third kind of research includes three lines of work:
- recommending products to customers by considering expected profit (Kitts et al., 2000);
- selecting products based on association rules and product-specific profitability (Brijs, Goethals, Swinnen, Vanhoof, & Wets, 2000);
- recommending target items and promotion strategies with the goal of maximizing net profit (Wang et al., 2002).

The business values reflected by these groups of studies fall into three types: time trends (recency), statistical significance (frequency), and profit (monetary value), respectively. The related research efforts and the types of business values they consider are shown in Table 1.

2.2. Association rule mining

Association rule mining has been widely used in traditional business applications such as cross-marketing, attached mailing, catalog design, loss-leader analysis, store layout, and customer segmentation (Agrawal et al., 1993; Srikant & Agrawal, 1995). It has also been applied to e-business applications such as the renewal of web pages (Cooley, Mobasher, & Srivastava,
1999) and web personalization (Mobasher, Cooley, & Srivastava, 2000; Mulvenna, Anand, & Büchner, 2000).

Table 1
Related studies considering business values of rules

| Type of business value | Definition in this study | Related studies | Description |
|---|---|---|---|
| Recency | Time trend of a rule | Dong and Li (1999); Song et al. (2001) | Discovering time trends and differences between datasets |
| | | Zhang et al. (2003) | Capturing trends of patterns by highlighting recent dataset |
| Frequency | Statistical significance of a rule in a time period | Hilderman and Hamilton (2001); Tan and Kumar (2000) | Pruning or ranking association rules according to their frequency, statistical interdependence, and interestingness measures |
| | | Brin et al. (1997) | Assessing the dependence between products |
| Monetary value | Profitability of a rule | Kitts et al. (2000) | Recommending products to customers considering expected profit |
| | | Brijs et al. (2000) | Selecting products based on association rules and product-specific profitability |
| | | Wang et al. (2002) | Recommending target items and promotion strategy with the goal of maximizing the net profit |

Consider a set of transactions, each of which is a set of items (an itemset). An association rule has the form X ⇒ Y, where X and Y are itemsets; X and Y are called the body and the head, respectively. A rule can be evaluated by two measures, support and confidence. The support of the rule X ⇒ Y is the percentage of all transactions that contain both X and Y. The confidence of the rule X ⇒ Y is the percentage of the transactions containing X that also contain Y. Support represents the usefulness of a discovered rule, and confidence represents its certainty.

Many algorithms can be used to discover association rules from data. The Apriori algorithm is one of the most widely used techniques for finding association rules (Agrawal et al., 1993; Agrawal & Srikant, 1994). Apriori operates in two phases. In the first phase, all itemsets with at least the minimum support (frequent itemsets) are generated. This phase exploits the downward closure property of support: if an itemset of size k is frequent, then all of its subsets, in particular those of size (k−1), must also be frequent. Using this property, candidate itemsets of size k are generated from the set of frequent itemsets of size (k−1). The constraint is that every subset of size (k−1) of a candidate itemset must be present in the set of frequent itemsets of size (k−1). The second phase of the algorithm generates rules from the set of all frequent itemsets.

Association rule mining is a powerful solution for extracting alternative rules. Because it aims to discover all rules in the data, it can provide a complete picture of the associations in a large dataset. There are, however, two major problems with association rule generation.

The first problem concerns rule quantity and quality. If the minimum support is set too high, rules involving rare items that could be of interest to decision makers will not be found. Setting the minimum support low, however, may cause a combinatorial explosion. In other words, too many rules are generated regardless of their interestingness (Tan & Kumar, 2000). Two kinds of techniques have been proposed to deal with this problem. Some allow the user to specify multiple minimum supports reflecting the frequencies of the individual items in the database (Liu, Hsu, & Ma, 1999). Others exploit support constraints that specify the minimum support required for particular items so that the necessary itemsets are generated (Wang, He, & Han, 2000). These approaches, however, do not consider the heterogeneous business values of association rules. What is needed is an approach to the rule quantity and quality problems that still works when the compensations between business value criteria are not clear.

The second problem, not entirely independent of the first, is rule conflict, which has already been discussed in the domain of rule-based expert systems. Once a set of rules matching a context is identified, several approaches can be applied to resolve the conflicts among them (Bar & Feigenbaum, 1981; Hayes-Roth et al., 1983). One way to address this problem is to rely on human intelligence. Human preference judgments can then give a satisfactory resolution of conflicting rules that are characterized by multi-faceted, conflicting criteria.

In summary, both problems can be handled by prioritizing the rules produced by data mining with explicit consideration of their business values. In what follows, we introduce a decision method for resolving conflicting rules. It is a novel approach that aggregates multiple evaluations while minimizing the information required from managers, which lightens the burden of articulating that information.

2.3. ELECTRE

Roy and Bouyssou (1993) identified three situations in which the outranking approach can be justified:
(1) when at least one criterion is not quantitative, so that preference intervals or ratios have no meaning;
(2) when the units of the different criteria are so heterogeneous that coding them into one common scale seems very difficult or artificial; and
(3) when the compensations between gains on some criteria
and losses on other criteria are not clear.

These properties make an outranking approach such as ELECTRE suitable for prioritizing a set of association rules. Some of the criteria reflecting business values can be heterogeneous or not quantitative, so it is difficult to assume any specific compensation ratio between losses and gains on those criteria.

ELECTRE uses the concept of an outranking relationship. Suppose neither of two alternatives A and B dominates the other mathematically. The statement that A outranks B then means that the decision maker accepts the risk of regarding A as almost surely better than B. The method compares alternatives pairwise, based on the degree to which the evaluations of the alternatives and the preference weights confirm or contradict the pairwise dominance relationships between them. It examines two things: how far the preference weights agree with the pairwise dominance relationships, and how far the weighted evaluations differ from each other. These stages are based on concordance and discordance sets, hence the method is also called concordance analysis.

The idea of the method is to favor alternatives that are preferred on most of the criteria and yet do not cause an unacceptable level of discontent on any one criterion. This is accomplished by using the concordance and discordance indexes and the outranking relations. The concordance index measures the intensity of agreement between the averaged opinions of the group of decision makers. The discordance index measures the intensity of their disagreement. The concordance index, C(A, B), is based on the weights of the judgment criteria and is defined as:

$$C(A,B) = \frac{1}{W} \sum_{j:\, g_j(A) \ge g_j(B)} w_j, \qquad W = \sum_{j=1}^{n} w_j,$$

where n is the number of criteria. This index varies from 0 to 1 and can be considered a measure of the arguments in favor of the assertion that A outranks B. The concordance index reflects the relative importance of A with respect to B. A higher value of C(A, B) indicates that A is preferred to B as far as the concordant criteria are concerned.

The discordance index, D(A, B), takes into account the total range of the scale of each criterion i on which there is discordance. It is defined as:

$$D(A,B) = \begin{cases} 0, & \text{if } g_i(A) \ge g_i(B)\ \ \forall i, \\[4pt] \displaystyle\max_{i:\, g_i(A) < g_i(B)} \frac{g_i(B) - g_i(A)}{d_i}, & \text{otherwise,} \end{cases}$$

where g_i(A) is the value of criterion i for alternative A, and

$$d_i = \max_{(C,D)} \lvert g_i(C) - g_i(D) \rvert$$

is the range of criterion i over all pairs of alternatives (C, D).

The information contained in the concordance matrix differs significantly from that contained in the discordance matrix. Differences among weights are represented by the concordance matrix, whereas differences among attribute values are represented by the discordance matrix.

The ELECTRE II method supports the choice by assigning a rank to each alternative. Multiple levels of concordance (0 < p⁻ < p⁰ < p⁺ < 1) and discordance (0 < q⁰ < q⁺ < 1) are specified to construct two outranking relationships: the strong and the weak outranking relations. The strong outranking relation, A S^F B, holds when either of the two conditions (A S^F B)_1 or (A S^F B)_2 is true:

$$(A\,S^F B)_1 \iff C(A,B) \ge p^+,\ \ D(A,B) \le q^+,\ \ W^+(A,B) \ge W^-(A,B)$$

$$(A\,S^F B)_2 \iff C(A,B) \ge p^0,\ \ D(A,B) \le q^0,\ \ W^+(A,B) \ge W^-(A,B)$$

The weak outranking relation, A S^f B, holds when the following conditions are true:

$$A\,S^f B \iff C(A,B) \ge p^-,\ \ D(A,B) \le q^+,\ \ W^+(A,B) \ge W^-(A,B)$$

Here W⁺(A, B) and W⁻(A, B) denote the total weight of the criteria on which A is strictly better, and strictly worse, than B, respectively.

These two relations lead to the construction of two outranking graphs, the strong graph and the weak graph. The strong graph is always a subgraph of the weak graph. Even so, the distinction between strong and weak outranking must be made to ensure a complete ranking of the alternatives. The graphs are then used in an iterative procedure to obtain the rankings. The ELECTRE II approach uses two separate rankings, called forward ranking and reverse ranking, to arrive at the final ranking of the alternatives.

There are five steps in the forward ranking procedure.
Step 1: Identify all nodes having no precedent (i.e. nodes with no arcs directed towards them) in the strong graph, and denote this set as set A.
Step 2: Select all nodes in set A having no precedent in the weak graph, and denote this set as set B. The nodes in set B are assigned rank one.
Step 3: Reduce the strong and weak graphs by eliminating all nodes in set B and all the arcs emanating from these nodes.
Step 4: Perform steps 1–3 on the reduced graphs; the resulting set of new nodes is given rank two.
Step 5: Continue this iterative procedure until all nodes in both the strong
and weak graphs are eliminated and all alternatives are ranked.

In reverse ranking, the first step is to reverse the direction of the arcs in the strong and weak graphs. If alternative i is preferred to j in forward ranking, then alternative j is preferred to i in reverse ranking. A high concordance relationship becomes a low concordance, and a low discordance relationship becomes a high discordance. The remaining steps are identical to those of forward ranking, with one difference: the alternative ranked last is given rank one, and the remaining alternatives are ranked in reverse order. This re-establishes the correct direction of the ranking. The final ranking (r) is obtained, as suggested by Roy and Bertier (1971), by averaging the forward (r′) and reverse (r″) rankings, i.e. (r′ + r″)/2. The alternative with the smallest average value is ranked first, the alternative with the next value is ranked second, and so on until all alternatives are ranked.

The method can consider both objective and subjective criteria. It requires the least amount of information from the decision maker while making full use of the information contained in the decision matrix (Roy & Vincke, 1981). It has two weak points, however: weights are assigned to the criteria arbitrarily, and the method says nothing about how decision criteria are generated. In this study, we use Saaty's (1980, 1990) AHP as a complementary method to address these weak points. The AHP allows us to collect all relevant elements of the problem, whether objective or subjective, into one model. We can then work out interactively their interdependencies and perceived consequences.

One of the main strengths of the AHP is its inconsistency measure, which gauges the consistency of the decision maker's judgments. Users need to know when they have made inconsistent judgments, especially if they are working as a group. In fact, the AHP is the only approach for solving multicriteria decision problems that has such a capability (Carlsson & Walden, 1995). Furthermore, multiple decision makers' differing opinions about the importance of each attribute can easily be collected and aggregated with geometric means. These strengths make the AHP useful for determining, from group opinions, the relative importance of each attribute used in the ELECTRE method.

3. Methodology

In this section, we propose a methodology for prioritizing association rules that takes their business values into account. The methodology consists of four phases, as shown in Fig. 1.
- In the first phase, the criteria for prioritizing association rules are defined to clarify the marketing objectives.
- In phase II, the relative business importance of those criteria is calculated. Group decision making that reflects various stakeholders' opinions is performed with the AHP technique.
- In phase III, the target association ruleset is generated from the transaction dataset. We use the Apriori algorithm for this purpose.
- In phase IV, the criterion values for each target rule are calculated to prepare a decision table.
- In the final phase, a prioritized list of the target association rules is produced by performing ELECTRE II on the decision table prepared in the previous phases.

Each phase is described in more detail in the following subsections.
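Phase III relies on the two-phase Apriori procedure described in Section 2.2. The Python sketch below illustrates it on a handful of toy transactions. The function names, item labels and thresholds are hypothetical choices made for illustration only. For simplicity, support is counted by rescanning every transaction for each candidate, which is adequate for a toy example but is not how a production miner would count it.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Phase 1: return {frozenset(itemset): support} for every frequent itemset."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`.
        return sum(1 for t in tx if itemset <= t) / n

    # Level 1: frequent single items.
    current = {frozenset([i]) for t in tx for i in t}
    current = {c for c in current if support(c) >= min_support}
    frequent = {c: support(c) for c in current}
    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Phase 2: rules body => head, with their support and confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for body in map(frozenset, combinations(itemset, size)):
                conf = sup / frequent[body]  # body is frequent by downward closure
                if conf >= min_confidence:
                    rules.append((body, itemset - body, sup, conf))
    return rules


# Toy transactions (hypothetical items, not the department-store data).
transactions = [{"tv", "dvd"}, {"tv", "dvd", "cable"}, {"tv"},
                {"dvd", "cable"}, {"tv", "dvd"}]
frequent = apriori(transactions, min_support=0.4)
rules = generate_rules(frequent, min_confidence=0.6)
for body, head, sup, conf in rules:
    print(sorted(body), "=>", sorted(head), f"sup={sup:.2f} conf={conf:.2f}")
```

With these thresholds the sketch keeps three rules: tv ⇒ dvd and dvd ⇒ tv (support 0.60, confidence 0.75), and cable ⇒ dvd (support 0.40, confidence 1.00). The three-item candidate {tv, dvd, cable} is pruned without a support count because its subset {tv, cable} is infrequent.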
[Fig. 1. The overall flow of association rule prioritization. A decision group uses AHP to evaluate the relative weights of the rule selection criteria. Association rules are mined from the corporate DB, and criteria values are extracted for the association rule set. The relative weights and criteria values form a decision table, from which ELECTRE calculates a dominance matrix and a ranking of rules. In the example, Rule2 and Rule4 are ranked 1, Rule3 is ranked 2, and Rule1 is ranked 3.]
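The final phase applies the ELECTRE II procedure of Section 2.3 to the decision table. The Python sketch below builds the strong and weak outranking graphs from the concordance and discordance indexes, then combines forward and reverse ranking. The threshold defaults, the toy scores and the helper names are hypothetical.

Two points the text leaves open are handled by our own assumptions. First, in Step 2 we only count weak predecessors that lie inside set A, which is the usual ELECTRE II reading. Second, when a cycle leaves no unpreceded node, we take the nodes with the fewest predecessors so that the loop always terminates.

```python
def outranking_graphs(scores, weights, p=(0.55, 0.65, 0.75), q=(0.25, 0.5)):
    """Strong and weak outranking arcs (a, b), read as 'a outranks b'.

    scores: {alternative: [g_1, ..., g_n]}, larger values being better.
    p = (p_minus, p_zero, p_plus) and q = (q_zero, q_plus) are hypothetical
    concordance and discordance thresholds.
    """
    p_minus, p_zero, p_plus = p
    q_zero, q_plus = q
    names = list(scores)
    total_w = sum(weights)
    # d_i: range of criterion i over all alternatives.
    d = [max(scores[a][i] for a in names) - min(scores[a][i] for a in names)
         for i in range(len(weights))]
    strong, weak = set(), set()
    for a in names:
        for b in names:
            if a == b:
                continue
            triples = list(zip(weights, scores[a], scores[b]))
            # Concordance index C(a, b).
            c = sum(w for w, x, y in triples if x >= y) / total_w
            # Discordance index D(a, b); 0 when a is at least as good everywhere.
            disc = max(((y - x) / di for (_, x, y), di in zip(triples, d) if y > x),
                       default=0.0)
            w_plus = sum(w for w, x, y in triples if x > y)
            w_minus = sum(w for w, x, y in triples if x < y)
            if w_plus < w_minus:
                continue  # every relation requires W+(a, b) >= W-(a, b)
            if (c >= p_plus and disc <= q_plus) or (c >= p_zero and disc <= q_zero):
                strong.add((a, b))
            if c >= p_minus and disc <= q_plus:
                weak.add((a, b))
    return strong, weak


def _least_preceded(candidates, arcs):
    """Nodes of `candidates` with the fewest predecessors inside `candidates`.

    Normally these are the nodes with no precedent at all; taking the minimum
    count is our fallback for cycles, which the text does not discuss.
    """
    indeg = {n: sum(1 for x, y in arcs if y == n and x in candidates)
             for n in candidates}
    low = min(indeg.values())
    return {n for n, k in indeg.items() if k == low}


def rank_once(nodes, strong, weak):
    """Steps 1-5 of the forward ranking procedure."""
    remaining, ranks, level = set(nodes), {}, 1
    while remaining:
        set_a = _least_preceded(remaining, strong)  # Step 1
        set_b = _least_preceded(set_a, weak)        # Step 2
        for n in set_b:
            ranks[n] = level
        remaining -= set_b   # Step 3: arcs from removed nodes no longer count
        level += 1           # Steps 4-5: repeat on the reduced graphs
    return ranks


def electre2(scores, weights, **thresholds):
    strong, weak = outranking_graphs(scores, weights, **thresholds)
    forward = rank_once(scores, strong, weak)
    # Reverse ranking: flip every arc, rank again, then invert the ranks.
    rev = rank_once(scores, {(b, a) for a, b in strong}, {(b, a) for a, b in weak})
    worst = max(rev.values())
    reverse = {n: worst + 1 - r for n, r in rev.items()}
    average = {n: (forward[n] + reverse[n]) / 2 for n in scores}
    # Final rank: smallest average first; equal averages share a rank.
    levels = sorted(set(average.values()))
    return {n: levels.index(average[n]) + 1 for n in scores}


# Hypothetical decision table: three criteria scored 0-10, weights summing to 10.
weights = [5, 3, 2]
scores = {"r1": [9, 2, 5], "r2": [4, 9, 6], "r3": [3, 1, 2], "r4": [5, 5, 1]}
strong, weak = outranking_graphs(scores, weights)
final = electre2(scores, weights)
print(sorted(final.items(), key=lambda kv: kv[1]))
```

With these inputs, r1 and r2 tie in the forward ranking because their criteria conflict and neither outranks the other. The reverse ranking separates them, which gives the final order r1, r2, r4, r3.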
For the explanation of each phase of our methodology, some notation is briefly defined as follows.

D_t: dataset at time t
|D_t|: number of transactions in dataset D_t
R_t: ruleset discovered at time t
|R_t|: number of rules in ruleset R_t
r_i: each rule in ruleset R_t, where i = 1, 2, ..., |R_t|
f_{A,B}: number of transactions containing both A and B in dataset D_t
P(A, B): percentage of transactions containing both A and B in dataset D_t
Sup_t(r_i): support of r_i in D_t
Conf_t(r_i): confidence of r_i in D_t
Price(Y): per-unit price of item Y

We used real-world data to evaluate the proposed methodology. The data used in the experiment were transaction records of goods sold by the H department store, the third largest in Korea. The dataset contains customers' purchasing histories, which were identified and recorded through the smart card that the department store issues to each customer. It also contains product information such as price and brand. We constructed a data warehouse that aggregates historical data by individual customer.

We prepared two datasets because the measure DoC must be calculated for each target association rule. The first dataset contains the transaction records of customers who bought more than one electronic product from May to December 2000. The second dataset contains the purchasing data of the first dataset plus information on purchases of additional electronic products from January to April 2001. The Apriori technique was applied to the second dataset to discover the target association rules that serve as the alternatives for rule prioritization in our study.

3.1. Phase I: decision hierarchy with criteria for prioritizing association rules

Since many factors can affect the business values of association rules, decision groups should discuss which factors best explain business values. In this study, the recency, frequency, and monetary (RFM) method can help decision makers avoid focusing on less promising rules, so that resources can be devoted to more valuable rules. Our approach to assessing the business values of rules parallels RFM analysis for assessing customer lifetime value. In both cases, the attributes related to recency, frequency, and monetary value may be worthy of special attention.

Measuring RFM is an important method for assessing customer lifetime value. It is typically used to identify profitable customers and to develop strategies for targeting them. Bult and Wansbeek (1995) gave the following definitions:
- recency: the period since the last purchase;
- frequency: the number of purchases made within a certain period;
- monetary: the money spent during a certain period.

We consider these RFM concepts useful for determining the business value of each rule and for ranking a set of competing association rules. A hierarchical structure of criteria for evaluating association rules is shown in Fig. 2, in which the RFM terms are redefined with slight modifications (see Table 1).

[Fig. 2. An example for structuring the rule selection problem. Goal: business values of a rule. First-level criteria: Recency, Frequency, Monetary value. Second-level criteria: Degree of change (DoC) under Recency; Support (Sup), Confidence (Conf), and Interest factor (IF) under Frequency; Expected monetary value (EMV) and Incremental monetary value (IMV) under Monetary value.]

3.1.1. Recency

Recency is defined in this study as the time trend of a rule between time intervals; a higher value implies that a rule deserves more attention. This factor can be measured with a single attribute, the degree of change in support.

Degree of change (DoC). Most data mining techniques focus on rules with a large frequency of occurrence and ignore time trends. However, rules with a large growth or decline rate in
occurrence may give significant implications to managers in a changing business environment in spite of their relatively small occurrence. There are some studies on discovering trends and differences in patterns (see Dong & Li, 1999; Song et al., 2001). As we are interested in the attributes of each rule in ruleset $R_t$, we adopt the measure of the degree of change for the emerging-pattern case from the study of Song et al. (2001), with modifications for our study. Emerging patterns are defined as itemsets whose supports increase significantly from one dataset to another; more specifically, they are itemsets whose growth rates are larger than a given threshold value. When applied to time-stamped databases, emerging patterns can capture emerging trends in business or demographic data (Song et al., 2001).

The modified measure for the degree of change in support, $s_i$, is defined in this study as follows:

$$
s_i =
\begin{cases}
\dfrac{\mathrm{Sup}_t(r_i) - \mathrm{Sup}_{t-k}(r_i)}{\mathrm{Sup}_{t-k}(r_i)}, & \text{if } \mathrm{Sup}_{t-k}(r_i) \neq 0,\\
\max(s_1, s_2, \ldots, s_{|R|}), & \text{otherwise.}
\end{cases}
$$

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, which was generated from $D_t$ in our study (see Table 3), the degree of change is computed as follows, with $\mathrm{Sup}_{t-1}(r_1) = 0.0012$ and $\mathrm{Sup}_t(r_1) = 0.0032$:

$$ s_1 = \frac{0.0032 - 0.0012}{0.0012} = 1.734 $$

3.1.2. Frequency

The term frequency is defined in this study as the statistical significance of a rule in a time interval; a higher frequency indicates greater statistical significance of a rule. This factor consists of three attributes: support, confidence, and interest factor.

Support (Sup). The support criterion is necessary in the model because it represents the statistical significance of a pattern. From the marketing perspective, the support of an itemset in retail sales data justifies the feasibility of promoting the items together. However, support alone may not serve as a reliable interestingness measure. For example, rules with high support quite often correspond to obvious knowledge about the domain that may not be informative (Tan & Kumar, 2000).

Confidence (Conf). Confidence was initially proposed to measure the strength of an association rule as the conditional probability of the events associated with a particular rule. For example, if a rule X ⇒ Y has confidence c, then c percent of all transactions that contain X also contain Y. However, confidence may produce counter-intuitive results, especially when strong negative correlations are present (Brin, Motwani, & Silverstein, 1997).

Interest factor (IF). Interest factor is another widely used measure for association patterns (Brin et al., 1997). It is defined as the ratio between the joint probability of two variables and their expected joint probability under the independence assumption. The interest factor is a non-negative real number, with a value of 1 corresponding to statistical independence. Since we are interested in rules rather than itemsets, we adopt the interest factor for association rules from the study of Tan and Kumar (2000). The interest factor for the rule $A_1 A_2 \ldots A_j \Rightarrow A_{j+1} A_{j+2} \ldots A_k$ is defined as follows:

$$ I(A_1 A_2 \ldots A_j \Rightarrow A_{j+1} A_{j+2} \ldots A_k) = \frac{P(A_1, A_2, \ldots, A_k)}{P(A_1, A_2, \ldots, A_j)\, P(A_{j+1}, A_{j+2}, \ldots, A_k)} $$

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, the interest factor is computed from

P(goodcd_4553001931000, goodcd_4503620935000) = 0.0033
P(goodcd_4553001931000) = 0.0337
P(goodcd_4503620935000) = 0.0469

as

$$ I(r_1) = \frac{0.0033}{0.0337 \times 0.0469} \approx 2.090 $$

3.1.3. Monetary value

The term monetary value is defined in this study as the profitability of a rule; a higher value indicates that the company should focus more on that rule. This factor is measured with two attributes: expected monetary value and incremental monetary value.

Expected monetary value (EMV). If we assume mutual independence between products, then the expected profit after buying a product X is equal to the probability of buying Y given X, multiplied by the profit of Y (Kitts et al., 2000). With modifications to suit our study, we define the expected monetary value of a rule as follows:

$$ \mathrm{EMV}(A_1 A_2 \ldots A_j \Rightarrow A_{j+1} A_{j+2} \ldots A_k) = \mathrm{Conf}_t(A_1 A_2 \ldots A_j \Rightarrow A_{j+1} A_{j+2} \ldots A_k) \times \sum_{i=j+1}^{k} \mathrm{Price}(A_i) $$

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, the expected monetary value is computed from Conf(r_1) = 0.098 and Price(goodcd_4503620935000) = 100,000 as

$$ \mathrm{EMV}(r_1) = 0.098 \times 100{,}000 = 9800 $$

EMV can be seen as the per-recommendation profit of a rule: it factors in both the hit rate (i.e., confidence) and the profit of the recommended item(s).
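The three rule-level measures defined above (degree of change, interest factor and EMV) reduce to a few lines of arithmetic. The Python sketch below is illustrative only: the function names are ours, and the handling of the "otherwise" branch of s_i (a rule with no support in the earlier dataset receives the largest finite s in the ruleset) is our reading of the definition. Fed the rounded rule-1 figures printed above, it gives I(r_1) ≈ 2.088 rather than 2.090 and s_1 ≈ 1.667 rather than 1.734; the reported values presumably come from unrounded supports.

```python
from typing import List, Sequence


def degree_of_change(sup_now: Sequence[float], sup_before: Sequence[float]) -> List[float]:
    """s_i = (Sup_t(r_i) - Sup_{t-k}(r_i)) / Sup_{t-k}(r_i) for every rule r_i.

    Assumption: a rule with zero earlier support gets the largest finite s_i
    of the ruleset (or 0.0 if no rule had earlier support).
    """
    finite = [(now - before) / before for now, before in zip(sup_now, sup_before) if before != 0]
    cap = max(finite) if finite else 0.0
    return [(now - before) / before if before != 0 else cap
            for now, before in zip(sup_now, sup_before)]


def interest_factor(p_joint: float, p_body: float, p_head: float) -> float:
    """I = P(body, head) / (P(body) * P(head)); a value of 1 means independence."""
    return p_joint / (p_body * p_head)


def expected_monetary_value(conf: float, head_prices: Sequence[float]) -> float:
    """EMV = Conf_t(rule) * sum of the prices of the head items."""
    return conf * sum(head_prices)


# Rule 1: goodcd_4553001931000 => goodcd_4503620935000 (figures from the text)
print(round(interest_factor(0.0033, 0.0337, 0.0469), 3))   # prints 2.088
print(round(expected_monetary_value(0.098, [100_000]), 3))  # prints 9800.0
print([round(s, 3) for s in degree_of_change([0.0032], [0.0012])])  # prints [1.667]
```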
Incremental monetary value (IMV). The idea behind the incremental profit of Kitts et al. (2000) is the expected profit minus the profit one would expect to receive anyway in the natural course of a customer's purchasing. Incremental profit is thus the profit of the item minus the baseline profit associated with the item (Kitts et al., 2000). Incremental monetary value is defined in this study as follows:

$$ \mathrm{IMV}(A_1 A_2 \ldots A_j \Rightarrow A_{j+1} A_{j+2} \ldots A_k) = \left( \mathrm{Conf}_t(A_1 A_2 \ldots A_j \Rightarrow A_{j+1} A_{j+2} \ldots A_k) - \frac{f_{A_{j+1}, A_{j+2}, \ldots, A_k}}{|D_t|} \right) \times \sum_{i=j+1}^{k} \mathrm{Price}(A_i) $$

where $f_{A_{j+1}, A_{j+2}, \ldots, A_k}$ is the number of transactions in $D_t$ that contain the head itemset, so that the subtracted term is the baseline purchase probability of the head.

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, the incremental monetary value is computed from Conf(r_1) = 0.098, f(goodcd_4503620935000) = 154, |D_t| = 3285 and Price(goodcd_4503620935000) = 100,000 as

$$ \mathrm{IMV}(r_1) = \left( 0.098 - \frac{154}{3285} \right) \times 100{,}000 = 5112.024 $$

3.2. Phase II: relative weighting

The AHP was used to determine the relative importance of the rule attributes. The main steps of the AHP are as follows.

3.2.1. Step 1: perform pairwise comparisons
Decision makers are asked to make pairwise comparisons of the relative importance of the attributes. In our example, two decision makers, an administrative manager (AM) and a business manager (BM) in sales, judge the relative importance at each level. Their differing answers are aggregated into geometric means, which are expressed as pairwise comparison matrices for each level (Fig. 3).

3.2.2. Step 2: assess the consistency of pairwise judgments
Decision makers may make inconsistent judgments when making pairwise comparisons. Before the weights are computed, the degree of inconsistency is measured by an inconsistency index. Perfect consistency implies a zero inconsistency index. However, perfect consistency is seldom achieved, since humans are often biased and inconsistent when making subjective judgments. Therefore, an inconsistency index of less than 0.1 is considered acceptable (Harker, 1989). If the inconsistency index exceeds this value, the pairwise judgments may be revised before the weights of the attributes are computed.

3.2.3. Step 3: compute the relative weights
This step determines the weight of each decision element. This work employs eigenvalue computations to derive the weights of the attributes (Table 2). The eigenvector associated with the maximum eigenvalue of the first-level matrix gives 0.330 for recency, 0.274 for frequency, and 0.396 for monetary value. In this example, the decision makers therefore consider monetary value the most important factor for prioritizing association rules. Computing the eigenvectors of the next level in the same way gives 0.602 for Sup, 0.237 for Conf, and 0.161 for IF as the sub-criteria of frequency, and 0.414 for EMV and 0.586 for IMV as the sub-criteria of monetary value. Multiplying each sub-criterion weight by the weight of its parent factor, the relative weights of DoC, Sup, Conf, IF, EMV and IMV are 0.330, 0.165, 0.065, 0.044, 0.164, and 0.232, respectively. In this example, DoC is thus the most important criterion, followed by IMV, and so on.

Pairwise comparison matrix of the first-level criteria (business values of a rule)
                  Recency   Frequency        Monetary value
  Recency         1         1 (1/2; 2)       1 (1/3; 3)
  Frequency                 1                0.577 (1/3; 1)
  Monetary value                             1

Pairwise comparison matrices of the second-level criteria
  Frequency       Sup       Conf             IF
  Sup             1         2.449 (3; 2)     3.873 (3; 5)
  Conf                      1                1.414 (2; 1)
  IF                                         1

  Monetary value  EMV       IMV
  EMV             1         0.707 (2; 1/4)
  IMV                       1

Entries give the aggregated opinion (administrative manager's opinion; business manager's opinion).
Fig. 3. Pairwise comparison matrices of the first- and second-level criteria.
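Steps 1 to 3 above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two managers' judgements from Fig. 3 are aggregated by geometric mean, the principal eigenvector is found by power iteration (our choice of numerical method), and Saaty's consistency ratio CR = CI/RI with RI = 0.58 for a 3 × 3 matrix stands in for the inconsistency index, whose exact formula is not given here. The helper names are hypothetical.

```python
import math

# Saaty's random consistency index RI for matrices of order n (standard values).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def geometric_mean(opinions):
    """Aggregate several decision makers' judgements of the same pair."""
    return math.prod(opinions) ** (1.0 / len(opinions))


def reciprocal_matrix(n, judgements):
    """Build an n x n reciprocal matrix from {(i, j): [opinions]} with i < j."""
    a = [[1.0] * n for _ in range(n)]
    for (i, j), opinions in judgements.items():
        a[i][j] = geometric_mean(opinions)
        a[j][i] = 1.0 / a[i][j]
    return a


def principal_eigenvector(a, iterations=200):
    """Power iteration; returns (weights summing to 1, estimate of lambda_max)."""
    n = len(a)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(aw)  # because sum(w) == 1, this converges to lambda_max
        w = [x / lam for x in aw]
    return w, lam


def consistency_ratio(lam, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); trivially 0 for n <= 2."""
    if n <= 2:
        return 0.0
    return (lam - n) / (n - 1) / RANDOM_INDEX[n]


# Fig. 3 judgements as [administrative manager, business manager].
level1 = reciprocal_matrix(3, {(0, 1): [1/2, 2], (0, 2): [1/3, 3], (1, 2): [1/3, 1]})
frequency = reciprocal_matrix(3, {(0, 1): [3, 2], (0, 2): [3, 5], (1, 2): [2, 1]})
monetary = reciprocal_matrix(2, {(0, 1): [2, 1/4]})

w1, lam1 = principal_eigenvector(level1)      # recency, frequency, monetary value
wf, lamf = principal_eigenvector(frequency)   # Sup, Conf, IF
wm, _ = principal_eigenvector(monetary)       # EMV, IMV

# Global weight = parent weight x local weight (recency has the single child DoC).
global_weights = [w1[0]] + [w1[1] * x for x in wf] + [w1[2] * x for x in wm]
print([round(x, 3) for x in global_weights])
# prints [0.33, 0.165, 0.065, 0.044, 0.164, 0.232]
```

Both 3 × 3 matrices come out well under the 0.1 threshold with this CR, and the global weights match Table 2 to three decimals.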
Table 2
Relative weights of criteria
Criteria            Level 1  Level 2  Relative weights
Recency_DoC         0.330    1        0.330
Frequency_Sup       0.274    0.602    0.165
Frequency_Conf      0.274    0.237    0.065
Frequency_IF        0.274    0.161    0.044
Monetary value_EMV  0.396    0.414    0.164
Monetary value_IMV  0.396    0.586    0.232
DoC, degree of change; Sup, support; Conf, confidence; IF, interest factor; EMV, expected monetary value; IMV, incremental monetary value.

3.3. Phase III: target association rule set
With a minimum support of 0.25%, a minimum confidence of 8% and a maximum itemset size of 2, the system found seven association rules in the second dataset. The target association rules are listed in Table 3 together with the price of each head item. Rule 1 and rule 4 share the same body, goodcd_4553001931000, and rules 2–4 share the same head item, goodcd_4504940939002. Decision makers therefore need to rank these rules and establish priorities among them in the decision-making process.

3.4. Phase IV: preparing decision table
The decision matrix of the rule prioritization problem is then prepared as shown in Table 4. For the DoC criterion, rules 2, 3 and 5 have negative values, which means that the support (in other words, the usefulness) of these association rules decreased in 2001. Rule 5 has the largest EMV, owing to the highest price of its head item, but the lowest IMV. The reason is that its IF (1.002, i.e. almost exactly statistical independence) is low relative to its Conf: the accuracy of rule 5 stems from the high frequency of its head item, not from its relationship with the body item.

3.5. Phase V: calculating ranking
For each pair of rules, Rule i and Rule j with i, j ∈ [1, 7], the values of the concordance and discordance indexes are given in Tables 5 and 6, respectively.

Table 3
Target association ruleset
Alternatives  Body                  Head                  Conf   Supp      Price of head item
Rule 1        goodcd_4553001931000  goodcd_4503620935000  0.098  0.003332  100,000
Rule 2        goodcd_4550410939000  goodcd_4504940939002  0.081  0.002430  30,000
Rule 3        goodcd_4550410939001  goodcd_4504940939002  0.089  0.002759  30,000
Rule 4        goodcd_4553001931000  goodcd_4504940939002  0.080  0.002720  30,000
Rule 5        goodcd_4502160939001  goodcd_4550390935070  0.086  0.002752  179,000
Rule 6        goodcd_4550900939001  goodcd_4550900939000  0.141  0.004230  90,000
Rule 7        goodcd_4550900939000  goodcd_4550900939001  0.086  0.004300  90,000

Table 4
Decision table
Alternatives  DoC     Sup       Conf   IF      EMV     IMV
Rule 1        1.734   0.003332  0.098  2.090   9800    5112.024
Rule 2        −0.016  0.002430  0.081  20.468  2430    2311.279
Rule 3        −0.032  0.002759  0.089  22.490  2670    2551.279
Rule 4        0.741   0.002720  0.080  20.215  2400    2281.279
Rule 5        −0.258  0.002752  0.086  1.002   15,394  27.790
Rule 6        0.741   0.004230  0.141  2.842   12,690  8224.247
Rule 7        0.741   0.004300  0.086  2.854   7740    5027.671

Table 5
Matrix 7 × 7 of concordance
Alternatives  Rule 1  Rule 2  Rule 3  Rule 4  Rule 5  Rule 6  Rule 7
Rule 1        –       0.956   0.956   0.956   0.836   0.330   0.791
Rule 2        0.044   –       0.330   0.505   0.606   0.044   0.044
Rule 3        0.044   0.670   –       0.670   0.836   0.044   0.109
Rule 4        0.044   0.495   0.330   –       0.606   0.374   0.374
Rule 5        0.164   0.394   0.164   0.394   –       0.164   0.229
Rule 6        0.670   0.956   0.956   0.626   0.836   –       0.461
Rule 7        0.209   0.956   0.891   0.956   0.836   0.539   –
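The concordance and discordance indexes of Tables 5 and 6 can be approximated from Table 4 and the Table 2 weights. The sketch below assumes the textbook ELECTRE definitions, which this section does not restate: all six criteria are treated as benefit criteria (higher is better), C(a, b) sums the weights of the criteria on which a is at least as good as b, and D(a, b) is the largest margin by which b beats a on a single criterion, divided by that criterion's range across the seven rules. Under these assumptions it reproduces entries such as C(1, 2) = 0.956, C(7, 1) = 0.209, D(1, 2) = 0.855 and D(7, 6) = 0.902. Rules 4, 6 and 7 share DoC = 0.741, and Table 5 does not credit that tie uniformly (for example C(7, 6) = 0.539 but C(6, 7) = 0.461), so the sketch is not expected to match every published entry.

```python
# Table 4 decision matrix, one row per rule; columns: DoC, Sup, Conf, IF, EMV, IMV.
DECISION = [
    [1.734, 0.003332, 0.098, 2.090, 9800, 5112.024],
    [-0.016, 0.002430, 0.081, 20.468, 2430, 2311.279],
    [-0.032, 0.002759, 0.089, 22.490, 2670, 2551.279],
    [0.741, 0.002720, 0.080, 20.215, 2400, 2281.279],
    [-0.258, 0.002752, 0.086, 1.002, 15394, 27.790],
    [0.741, 0.004230, 0.141, 2.842, 12690, 8224.247],
    [0.741, 0.004300, 0.086, 2.854, 7740, 5027.671],
]
# Table 2 relative weights, in the same column order.
WEIGHTS = [0.330, 0.165, 0.065, 0.044, 0.164, 0.232]


def concordance(m, w, a, b):
    """Total weight of the criteria on which rule a scores at least as high as rule b."""
    return sum(wj for wj, ga, gb in zip(w, m[a], m[b]) if ga >= gb)


def discordance(m, a, b):
    """Largest range-normalised margin by which rule b beats rule a on one criterion."""
    worst = 0.0
    for j, column in enumerate(zip(*m)):
        spread = max(column) - min(column)
        if spread > 0 and m[b][j] > m[a][j]:
            worst = max(worst, (m[b][j] - m[a][j]) / spread)
    return worst


n = len(DECISION)
C = [[None if a == b else concordance(DECISION, WEIGHTS, a, b) for b in range(n)] for a in range(n)]
D = [[None if a == b else discordance(DECISION, a, b) for b in range(n)] for a in range(n)]

# Indices are 0-based, so C[0][1] is C(Rule 1, Rule 2).
print(round(C[0][1], 3), round(D[0][1], 3))  # prints 0.956 0.855
```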
Table 6
Matrix 7 × 7 of discordance
Alternatives  Rule 1  Rule 2  Rule 3  Rule 4  Rule 5  Rule 6  Rule 7
Rule 1        –       0.855   0.949   0.843   0.431   0.705   0.518
Rule 2        0.879   –       0.176   0.380   0.998   0.984   1.000
Rule 3        0.886   0.008   –       0.388   0.388   0.852   0.824
Rule 4        0.569   0.016   0.148   –       1.000   1.000   0.845
Rule 5        1.000   0.906   1.000   0.894   –       1.000   0.828
Rule 6        0.499   0.820   0.914   0.809   0.208   –       0.037
Rule 7        0.499   0.820   0.914   0.808   0.589   0.902   –

For the concordance and discordance thresholds of ELECTRE II, we set p+ = 0.609 (average of the concordance indexes plus 0.1), p0 = 0.509 (average of the concordance indexes) and p− = 0.409 (average of the concordance indexes minus 0.1); q0 = 0.693 (average of the discordance indexes) and q+ = 0.893 (average of the discordance indexes plus 0.2). Fig. 4 shows the results of ELECTRE II in the form of strong and weak relation graphs, with an illustration of the forward ranking steps.

Fig. 4. Strong and weak preference graphs and forward ranking steps.

The final ranking (r) of the target association rules is obtained by taking the average of the forward (r′) and reverse (r″) rankings (see Table 7).

Table 7
ELECTRE II method: direct, reverse and medium classification with the outranking relations
Alternatives  Forward ranking, r′  Reverse ranking, r″  Average, (r′ + r″)/2  Final ranking, r
Rule 1        2                    2                    2                     2
Rule 2        4                    4                    4                     4
Rule 3        1                    3                    2                     2
Rule 4        5                    5                    5                     6
Rule 5        4                    5                    4.5                   5
Rule 6        1                    1                    1                     1
Rule 7        3                    3                    3                     3

The sensitivity of the selection of alternatives to changes in the threshold values (p−, p0, p+, q0 and q+) was also studied, and the results of ELECTRE II for these changes are shown in Table 8. Table 8 shows that the ranks change only slightly for different threshold values. In all cases, rule 6 and rule 3 keep the first and the second rank, respectively, and rule 5 stays at or near the bottom of the ranking.
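The threshold settings and the averaging of forward and reverse rankings described above can be checked against the published tables. This sketch assumes that "average of concordance indexes" means the mean of the 42 off-diagonal entries of Table 5 (likewise for Table 6), and that tied average ranks share a rank with the next rank following consecutively (dense ranking); both assumptions are ours, and under them the sketch reproduces the reported thresholds and the final ranking in Table 7. The function names are hypothetical.

```python
# Published matrices (Tables 5 and 6); None marks the diagonal.
CONCORDANCE = [
    [None, 0.956, 0.956, 0.956, 0.836, 0.330, 0.791],
    [0.044, None, 0.330, 0.505, 0.606, 0.044, 0.044],
    [0.044, 0.670, None, 0.670, 0.836, 0.044, 0.109],
    [0.044, 0.495, 0.330, None, 0.606, 0.374, 0.374],
    [0.164, 0.394, 0.164, 0.394, None, 0.164, 0.229],
    [0.670, 0.956, 0.956, 0.626, 0.836, None, 0.461],
    [0.209, 0.956, 0.891, 0.956, 0.836, 0.539, None],
]
DISCORDANCE = [
    [None, 0.855, 0.949, 0.843, 0.431, 0.705, 0.518],
    [0.879, None, 0.176, 0.380, 0.998, 0.984, 1.000],
    [0.886, 0.008, None, 0.388, 0.388, 0.852, 0.824],
    [0.569, 0.016, 0.148, None, 1.000, 1.000, 0.845],
    [1.000, 0.906, 1.000, 0.894, None, 1.000, 0.828],
    [0.499, 0.820, 0.914, 0.809, 0.208, None, 0.037],
    [0.499, 0.820, 0.914, 0.808, 0.589, 0.902, None],
]


def off_diagonal_mean(matrix):
    """Mean over all ordered pairs (i, j) with i != j."""
    values = [x for i, row in enumerate(matrix) for j, x in enumerate(row) if i != j]
    return sum(values) / len(values)


p0 = off_diagonal_mean(CONCORDANCE)
p_plus, p_minus = p0 + 0.1, p0 - 0.1
q0 = off_diagonal_mean(DISCORDANCE)
q_plus = q0 + 0.2


def final_ranking(forward, reverse):
    """Average r' and r'', then re-rank so that ties share a rank (dense ranking)."""
    averages = [(f + r) / 2 for f, r in zip(forward, reverse)]
    levels = sorted(set(averages))
    return [levels.index(a) + 1 for a in averages]


print([round(x, 3) for x in (p_minus, p0, p_plus, q0, q_plus)])
# prints [0.409, 0.509, 0.609, 0.693, 0.893]
print(final_ranking([2, 4, 1, 5, 4, 1, 3], [2, 4, 3, 5, 5, 1, 3]))
# prints [2, 4, 2, 6, 5, 1, 3]
```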
Table 8
Sensitivity analysis
                Case 1a        Case 1b        Case 1c        Case 2a        Case 2b        Case 2c
p−; p0; p+      0.5; 0.6; 0.7  0.5; 0.6; 0.7  0.5; 0.6; 0.7  0.6; 0.7; 0.8  0.6; 0.7; 0.8  0.6; 0.7; 0.8
q0; q+          0.5; 0.7       0.6; 0.8       0.7; 0.9       0.5; 0.7       0.6; 0.8       0.7; 0.9
Rule 1          3              3              2              2              2              2
Rule 2          4              4              4              3              4              4
Rule 3          2              2              2              2              2              2
Rule 4          6              6              6              3              3              4
Rule 5          7              7              5              4              5              4
Rule 6          1              1              1              1              1              1
Rule 7          5              5              3              3              4              3

4. Conclusion

The business value that firms can obtain is the most important consideration in the process of extracting rules from the large databases on which decision makers rely. Each rule has various attributes related to business value, and those attributes differ in importance according to the business strategy of the firm. Our work presents a novel rule prioritization methodology that incorporates decision makers' preferences in evaluating the promising association rules obtained from data mining. The proposed methodology is an attempt to create synergy between decision analysis techniques and problem solving in the domain of data mining. We believe that this approach would be particularly useful for business managers who suffer from rule quality or quantity problems, conflicts between extracted rules, and difficulties in building a group consensus for rule selection.

As a further research area, we plan to extend our methodology to the adaptation process. With a feedback loop that allows decision makers to express agreement or disagreement with parts of the ranking, we expect to obtain a more refined rule prioritization.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association between sets of items in massive database. Proceedings of the ACM-SIGMOD international conference on management of data (pp. 207–216).
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the international conference on very large data bases (pp. 407–419).
Barr, A., & Feigenbaum, E. A. (Eds.). (1981). The handbook of artificial intelligence. Los Altos, CA: Morgan Kaufmann.
Brijs, T., Goethals, B., Swinnen, G., Vanhoof, K., & Wets, G. (2000). A data mining framework for optimal product selection in retail supermarket data: The generalized PROFSET model. Proceedings of the ACM-SIGKDD international conference on knowledge discovery and data mining (pp. 20–23).
Brin, S., Motwani, R., & Silverstein, C. (1997). Beyond market baskets: Generalizing association rules to correlations. Proceedings of the ACM-SIGMOD international conference on management of data (pp. 265–276).
Bult, J. R., & Wansbeek, T. J. (1995). Optimal selection for direct mail. Marketing Science, 14(4), 378–394.
Carlsson, C., & Walden, P. (1995). AHP in political group decisions: A study in the art of possibilities. Interfaces, 25, 14–29.
Cooley, R., Mobasher, B., & Srivastava, J. (1999). Data preparation for mining world wide web browsing patterns. Journal of Knowledge and Information Systems, 1(1), 5–32.
Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: Discovering trends and differences. Proceedings of the international conference on knowledge discovery and data mining (KDD'99) (pp. 43–52).
Harker, P. T. (1989). The art and science of decision making: The analytic hierarchy process. The analytic hierarchy process: Applications and studies. Berlin: Springer.
Hayes-Roth, F., Waterman, D. A., & Lenat, D. B. (1983). Building expert systems. Reading, MA: Addison-Wesley.
Hilderman, R. J., & Hamilton, H. J. (2001). Evaluation of interestingness measures for ranking discovered knowledge. Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD'01) (pp. 247–259).
Kitts, B., Freed, D., & Vrieze, M. (2000). Cross-sell: A fast promotion-tunable customer-item recommendation method based on conditionally independent probabilities. Proceedings of the ACM-SIGMOD conference on knowledge discovery and data mining (KDD'00) (pp. 437–446).
Liu, B., Hsu, W., & Ma, Y. (1999). Mining association rules with multiple minimum supports. Proceedings of the ACM-SIGMOD international conference on knowledge discovery and data mining (KDD'99) (pp. 15–18).
Mobasher, B., Cooley, R., & Srivastava, J. (2000). Automatic personalization based on web usage mining. Communications of the ACM, 43(8), 142–151.
Mulvenna, M. D., Anand, S. S., & Büchner, A. G. (2000). Personalization on the net using Web mining. Communications of the ACM, 43(8), 123–125.
Roy, B., & Bertier, B. (1971). La méthode ELECTRE II: Une méthode de classement en présence de critères multiples. Note de Travail No 142, Direction Scientifique, Groupe Metra.
Roy, B., & Bouyssou, D. (1993). Aide multicritère à la décision: Méthodes et cas. Economica.
Roy, B., & Vincke, P. (1981). Multicriteria analysis: Survey and new directions. European Journal of Operational Research, 8, 207–218.
Saaty, T. L. (1980). The analytic hierarchy process: Planning, priority setting, resource allocation. New York: McGraw-Hill.
Saaty, T. L. (1990). How to make a decision: The analytic hierarchy process. European Journal of Operational Research, 48, 9–26.
Song, H. S., Kim, J. K., & Kim, S. H. (2001). Mining the change of customer behavior in an internet shopping mall. Expert Systems with Applications, 21, 158–168.
Srikant, R., & Agrawal, R. (1995). Mining generalized association rules. Proceedings of the international conference on very large data bases (VLDB'95).
Tan, P. N., & Kumar, V. (2000). Interestingness measures for association patterns: A perspective. KDD 2000 Workshop on Postprocessing in Machine Learning and Data Mining, Boston, MA, August.
Wang, K., He, Y., & Han, J. (2000). Mining frequent itemsets using support constraints. Proceedings of the international conference on very large data bases (VLDB'00).
Wang, K., Zhou, S., & Han, J. (2002). Profit mining: From patterns to actions. Proceedings of the international conference on extending database technology (EDBT'02) (pp. 70–87).
White, C. C. (1990). A survey on the integration of decision analysis and expert systems for decision support. IEEE Transactions on Systems, Man, and Cybernetics, 20(2), 358–364.
Zhang, S., Zhang, C., & Yan, X. (2003). Post-mining: Maintenance of association rules by weighting. Information Systems, 28, 691–707.