Expert Systems with Applications 29 (2005) 867–878
www.elsevier.com/locate/eswa
Prioritization of association rules in data mining: Multiple
criteria decision approach
Duke Hyun Choi a,*, Byeong Seok Ahn b, Soung Hie Kim a
a Graduate School of Management, Korea Advanced Institute of Science and Technology (KAIST), 207-43 Cheongryangri-Dong, Dongdaemun, Seoul 130-012, South Korea
b Department of Business Administration, Hansung University, 389 Samsun-Dong 2, Sungbuk, Seoul 136-792, South Korea
Abstract
Data mining techniques, which extract patterns from large databases, are processes that focus on the automatic exploration and analysis of large quantities of raw data in order to discover meaningful patterns and rules. In applying these methods, most managers encounter a multitude of rules resulting from the data mining technique. Such rules are multi-faceted: in general, they are characterized by multiple conflicting criteria that are directly related to business values, such as expected monetary value or incremental monetary value.
In this paper, we present a method for rule prioritization that takes into account business values comprising objective metrics or managers' subjective judgments. The proposed methodology is an attempt to create synergy between decision analysis techniques and data mining. We believe that this approach would be particularly useful for business managers who suffer from rule quality or quantity problems, conflicts between extracted rules, and difficulties in building a consensus when several managers are involved in rule selection.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: Rule prioritization; Rule conflict; Association rule mining; ELECTRE
* Corresponding author. Tel.: +82 2 958 3684; fax: +82 2 958 3604.
E-mail addresses: dhchoi@kgsm.kaist.ac.kr (D.H. Choi), bsahn@hansung.ac.kr (B.S. Ahn), seekim@kgsm.kaist.ac.kr (S.H. Kim).
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2005.06.006

1. Introduction

Under intense competition forcing companies to develop and maintain competitive marketing activity, techniques from the disciplines of both data mining and decision analysis have been extensively used in the development of computerized decision aids. Data mining techniques, extracting patterns from large databases, are the processes that focus on the automatic exploration and analysis of large quantities of raw data in order to discover meaningful patterns and rules (Agrawal, Imielinski, & Swami, 1993; Agrawal & Srikant, 1994; Song, Kim, & Kim, 2001). On the other hand, decision analysis is concerned with applying decision theory to real-world problems to help companies through the decision making process in which decision makers' preferences that are important for judging the desirability of alternative outcomes, are associated with criteria in a problem solving or decision making situation (White, 1990). Whilst separately developed, both data mining and decision analysis have conceptual basic denominators, providing systematic methods for problem solving and decision making in view of objectives (decision aiding) and delivery vehicle (the computer).

From the perspective of many types of practical decision aiding applications, however, both data mining and decision analysis techniques have some limitations. Particularly, in decision support system development, there is little effort for generating synergies with complementing each other's limitations. More specifically, user preferences, which play a key role in decision aids with decision analysis, are not explicitly considered in the current generation of data mining systems. Even if they are (indirectly) addressed, they are constrained to the preferences of the data mining engineers by the use of threshold values rather than the decision makers' preferences that should be continuously adjusted to the current dynamic business environment.

Normative decision analysis, on the other hand, is usually built around a prescriptive and rigid problem
structure called a decision analysis model. Decision analysis is not compatible with extracting knowledge from today's large corporate databases, since it does not focus on the automatic generation of meaningful knowledge from raw data. It can thus be meaningful to complement the techniques from data mining with those from decision analysis: data mining for generating alternatives (i.e. rules) from large databases, and decision analysis for prioritizing those alternatives by reflecting decision makers' preferences.

In existing data mining techniques, there are situations that make it necessary to prioritize rules, so as to select and concentrate on the more valuable ones, because of the number of qualified rules (Tan & Kumar, 2000) and limited business resources. Even though the purpose of data mining is to extract rules (patterns) that are valuable for decision making, patterns are deemed 'interesting' just on the basis of passing certain statistical tests such as support/confidence. To the enterprise, however, it remains unclear how such patterns can be used to maximize business objectives. The major obstacle lies in the gap between statistic-based summaries (the statistic-based pattern extraction) extracted by traditional rule mining and a profit-driven action (the value-based decision making) required by business decision making (Wang, Zhou, & Han, 2002), which is characterized by explicit consideration of conflicts of business objectives (Bar & Feigenbaum, 1981; Hayes-Roth, Waterman, & Lenat, 1983) and by the involvement of multiple decision makers in corporate decision making.

In summary, the research objective is to prioritize association rules resulting from data mining, taking into account their business values by explicitly incorporating the conflicting criteria of business values and the managers' preference statements about their trade-off conditions. For this purpose, a decision analysis method, e.g. the Analytic Hierarchy Process (AHP), is applied to aggregate the opinions of the group of decision makers on what the relevant criteria for evaluating business values of rules are and on the relative importance of those criteria. Association rule mining, one of the data mining techniques, is then performed to capture a set of competing rules with their various business values, and those are, in turn, used as input for the rule prioritization. Thus, the final rule selected by the appropriate decision method, e.g. ELECTRE, reveals a meaningful result obtained from the use of machine learning and human intelligence.

2. Research background

2.1. Business values of rules

In supporting decision making for real business activities, decision makers are embarrassed when they encounter too many extracted rules and (or) those rules are of little interest for the decision making. The quality and quantity problems of these rules in traditional data mining applications stem from the fact that rules are extracted and selected only with statistical criteria, without regard to the business values that are closely related to business strategies. Since one of the most important considerations when firms extract rules from a large database is the business value that the firms could expect to obtain from decisions relying on those rules, some way of reflecting business requirements in the process of rule selection is necessary for effective decision aids.

The importance of each of the business values, on the other hand, varies according to the environments that firms face and their business strategies. For example, after an Internet shopping mall starts its business, the frequency of rules might be considered relatively more important than the others, since more frequent transactions with the customers may imply wide acknowledgement in the market. Later, the monetary value of rules might become increasingly important with the purpose of maximizing profit from the loyal customers. For firms dealing with fashionable items such as clothes, music CDs, or computer games, it might be considered important to capture trends in customers' purchasing patterns, since such items fall into the category having a relatively short life cycle.

Recognizing those various requirements, researchers in the data mining domain began to give attention to business values in pattern extraction. Some researchers focused on discovering time trends and differences between datasets (Dong & Li, 1999; Song et al., 2001) or capturing trends of patterns by highlighting the recent dataset (Zhang et al., 2003). Another group of studies deals with pruning or ranking association rules according to their frequency, statistical interdependence, and interestingness measures (Hilderman & Hamilton, 2001; Tan & Kumar, 2000). A third kind of research encompasses recommending products to customers by considering expected profit (Kitts et al., 2000), selecting products based on association rules and product-specific profitability (Brijs, Goethals, Swinnen, Vanhoof, & Wets, 2000), and recommending target items and promotion strategy with the goal of maximizing the net profit (Wang et al., 2002). The business values reflected in these groups of studies can be largely categorized into three types: time trends (recency), statistical significance (frequency), and profit (monetary value), respectively. The related research efforts with the types of business values considered are shown in Table 1.

2.2. Association rule mining

Association rule mining has been widely used, from traditional business applications such as cross-marketing, attached mailing, catalog design, loss-leader analysis, store layout, and customer segmentation (Agrawal et al., 1993; Srikant & Agrawal, 1995) to e-business applications such as the renewal of web pages (Cooley, Mobasher, & Srivastava,
1999) and web personalization (Mobasher, Cooley, & Srivastava, 2000; Mulvenna, Anand, & Büchner, 2000).

Table 1
Related studies considering business values of rules

Type of business value | Definition in this study | Related studies | Description
Recency | Time trend of a rule | Dong and Li (1999); Song et al. (2001) | Discovering time trends and differences between datasets
Recency | Time trend of a rule | Zhang et al. (2003) | Capturing trends of patterns by highlighting recent dataset
Frequency | Statistical significance of a rule in a time period | Hilderman and Hamilton (2001); Tan and Kumar (2000) | Pruning or ranking association rules according to their frequency, statistical interdependence, and interestingness measures
Frequency | Statistical significance of a rule in a time period | Brin et al. (1997) | Assessing the dependence between products
Monetary value | Profitability of a rule | Kitts et al. (2000) | Recommending products to customers considering expected profit
Monetary value | Profitability of a rule | Brijs et al. (2000) | Selecting products based on association rules and product-specific profitability
Monetary value | Profitability of a rule | Wang et al. (2002) | Recommending target items and promotion strategy with the goal of maximizing the net profit

Given a set of transactions where each transaction is a set of items (itemset), an association rule has the form $X \Rightarrow Y$, where $X$ and $Y$ are itemsets; $X$ and $Y$ are called the body and the head, respectively. A rule can be evaluated by two measures, called confidence and support. The support for the association rule $X \Rightarrow Y$ is the percentage of transactions that contain both itemsets $X$ and $Y$ among all transactions. The confidence for the rule $X \Rightarrow Y$ is the percentage of transactions that contain itemset $Y$ among the transactions that contain itemset $X$. The support represents the usefulness of the discovered rules and the confidence represents the certainty of the rules.

Many algorithms can be used to discover association rules from data to extract useful patterns. The Apriori algorithm is one of the most widely used and famous techniques for finding association rules (Agrawal et al., 1993; Agrawal & Srikant, 1994). Apriori operates in two phases. In the first phase, all itemsets with minimum support (frequent itemsets) are generated. This phase utilizes the downward closure property of support. In other words, if an itemset of size $k$ is a frequent itemset, then all of its subsets of size $(k-1)$ and below must also be frequent itemsets. Using this property, candidate itemsets of size $k$ are generated from the set of frequent itemsets of size $(k-1)$ by imposing the constraint that all subsets of size $(k-1)$ of any candidate itemset must be present in the set of frequent itemsets of size $(k-1)$. The second phase of the algorithm generates rules from the set of all frequent itemsets.

Association rule mining is a powerful solution for alternative rule extraction, because it aims to discover all rules in data and thus is able to provide a complete picture of associations in a large dataset. There are, however, two major problems with regard to association rule generation. The first problem stems from rule quantity and quality. If minimum support is set too high, the rules involving rare items that could be of interest to decision makers will not be found. Setting minimum support low, however, may cause combinatorial explosion. In other words, too many rules are generated regardless of their interestingness (Tan & Kumar, 2000). To deal with this problem, techniques have been proposed that allow the user to specify multiple minimum supports to reflect the frequencies of each of the items in the databases (Liu, Hsu, & Ma, 1999) or that exploit support constraints specifying what minimum support is required for what items, so that only the necessary itemsets are generated (Wang, He, & Han, 2000). Those approaches, however, do not consider heterogeneous business values of association rules. An approach is thus required for rule quantity and quality problems in which compensations between criteria of business values are not clear. The second problem, not entirely independent of the first, is the rule conflict problem that was already mentioned in the domain of rule based expert systems. If a set of rules that matches a context is identified, some approaches can be applied to resolve those conflicts (Bar & Feigenbaum, 1981; Hayes-Roth et al., 1983). One way to address such a problem is to depend upon the intervention of human intelligence. Thus, one can obtain a satisfactory result by the use of human preference judgments in resolving conflicting rules that are characterized by multi-faceted conflicting criteria. In summary, both problems can be handled by prioritizing the rules resulting from data mining in accordance with the explicit consideration of business values. In what follows, we shall introduce a decision method for resolving the conflicting rules, a novel approach for aggregating multiple evaluations in which information requirements from the managers are minimized in order to alleviate the burden of information articulation.

2.3. ELECTRE

Roy and Bouyssou (1993) argued that there are some situations where the outranking approach can be justified: (1) when at least one criterion is not quantitative, so that preference interval ratios make no sense; (2) when the units of the different criteria are so heterogeneous that coding them into one common scale seems to be very difficult or artificial, and (3) when the compensations between gains on some criteria
and losses on other criteria are not clear. These properties make an outranking approach such as ELECTRE suitable for prioritizing a set of association rules, since some of the multiple criteria reflecting business values can be heterogeneous or sometimes not quantitative, and thus it is difficult to assure any specific compensation ratio between losses and gains on those criteria.

ELECTRE uses the concept of an outranking relationship. The outranking relationship $A \rightarrow B$ says that even though two alternatives $A$ and $B$ do not dominate each other mathematically, the decision maker accepts the risk of regarding $A$ as almost surely better than $B$. This method consists of a pairwise comparison of alternatives based on the degree to which evaluations of the alternatives and the preference weights confirm or contradict the pairwise dominance relationships between alternatives. It examines both the degree to which the preference weights are in agreement with pairwise dominance relationships and the degree to which weighted evaluations differ from each other. These stages are based on a concordance and discordance set, hence this method is also called concordance analysis.

The idea in the method is to rank alternatives which are preferred for most of the criteria and yet do not cause an unacceptable level of discontent for any one criterion. The construction of the above subset is accomplished by using the concordance and discordance indexes and the outranking relations. The concordance index measures the intensity of the agreement between the averaged opinions of the group of decision makers. The discordance index measures the intensity of the non-agreement between the averaged opinions of the group of decision makers. The concordance index, $C(A, B)$, is based on the weights of the judgment criteria and is labeled:

$$C(A, B) = \frac{1}{W} \sum_{j:\, g_j(A) \ge g_j(B)} w_j, \qquad \text{where } W = \sum_{j=1}^{n} w_j, \quad n: \text{number of criteria}$$

This index varies from 0 to 1 and can be considered as a measure of the arguments in favor of the assertion $A$ outranks $B$. The concordance index reflects the relative importance of $A$ with respect to $B$. A higher value of $C(A, B)$ indicates that $A$ is preferred to $B$ as far as the concordance criteria are concerned.

The discordance index, $D(A, B)$, takes into account the total range of scale for a criterion $i$ where there is a discordance and is labeled

$$D(A, B) = \begin{cases} 0 & \text{if } g_i(A) \ge g_i(B), \ \forall i \\[4pt] \max_i \dfrac{g_i(B) - g_i(A)}{d_i} & \text{if } g_i(A) < g_i(B) \text{ for some } i \end{cases}$$

where

$g_i(A)$ = value of criterion $i$ of alternative $A$;

$d_i = \max |g_i(C) - g_i(D)|$, $\forall i$, over any pair of alternatives $(C, D)$.

The information contained in the concordance matrix differs significantly from that contained in the discordance matrix. Differences among weights are represented by means of the concordance matrix, whereas differences among attribute values are represented by means of the discordance matrix.

The ELECTRE II method allows the choice by giving the rank of each alternative. There are multiple levels of concordance ($0 < p^- < p^0 < p^+ < 1$) and discordance ($0 < q^0 < q^+ < 1$) that are specified to construct two outranking relationships (strong and weak outranking relations). The strong outranking relation labeled $(A\,F\,B)$ is defined when one of the two conditions $(A\,F\,B)_1$ and $(A\,F\,B)_2$ is true:

$$(A\,F\,B)_1 \ \text{iff} \ C(A, B) \ge p^+, \ D(A, B) \le q^+, \ \text{and} \ W^+(A, B) \ge W^-(A, B)$$

$$(A\,F\,B)_2 \ \text{iff} \ C(A, B) \ge p^0, \ D(A, B) \le q^0, \ \text{and} \ W^+(A, B) \ge W^-(A, B)$$

The weak outranking relation labeled $(A\,f\,B)$ is defined when the following conditions are true:

$$(A\,f\,B) \ \text{iff} \ C(A, B) \ge p^-, \ D(A, B) \le q^+, \ \text{and} \ W^+(A, B) \ge W^-(A, B)$$

These two relations can lead to the construction of two outranking graphs (strong graph and weak graph). The strong graph is always a subgraph of the weak graph but the distinction between strong performance and weak performance must be made to assure a complete ranking of the alternatives. These graphs are then used in an iterative procedure to obtain the rankings. The ELECTRE II approach uses two separate rankings called forward ranking and reverse ranking to arrive at the final rankings of the alternatives.

There are five steps in the forward ranking procedure.
Step 1: Identify all nodes having no precedent (i.e. those nodes that have no arcs directed towards them) in the strong graph and denote this set as set A.
Step 2: Select all nodes in set A having no precedent in the weak graph and denote this set as set B. The nodes in set B are assigned rank one.
Step 3: Reduce the strong and weak graphs by eliminating all nodes in set B and all the arcs emanating from these nodes.
Step 4: With the reduced graphs perform steps 1–3; the reduced set of new nodes is given rank two.
Step 5: This iterative procedure is continued till all nodes in both strong
and weak graphs are eliminated and all alternatives are ranked.

In reverse ranking, the first step is to reverse the direction of the arcs in the strong and weak graphs. If an alternative $i$ is preferred to $j$ in forward ranking, the alternative $j$ is preferred to $i$ in reverse ranking, and a high concordance relationship becomes a low concordance and a low discordance relationship becomes a high discordance. The remaining steps are identical to the steps outlined in forward ranking with one difference: the alternative which is ranked last is ranked one and the remaining alternatives are ranked in reverse order. This re-establishes the correct direction of the ranking process. The final ranking ($r$) is obtained, as suggested by Roy and Bertier (1971), by taking the average of the forward ($r'$) and reverse ($r''$) rankings (i.e. $(r' + r'')/2$). The alternative which gets the least average value is ranked first, the alternative having the next value is ranked second, and so on till all alternatives are ranked.

The method has the ability to consider both objective and subjective criteria, and it requires the least amount of information from the decision maker with full utilization of the information contained in the decision matrix (Roy & Vincke, 1981). It has, however, weak points as well: the arbitrariness of assigning weights to the criteria and the absence of any process for generating decision criteria. In this study, we suggest an approach using Saaty's (1980, 1990) AHP as a complementary method for these weak points. It permits us to collect all the relevant elements of the problem, whether objective or subjective, into one model and then to interactively work out their interdependencies and their perceived consequences. One of the main strengths of the AHP is the inconsistency measure as a means for measuring the consistency of the decision maker's judgments. Users need to know when they have made inconsistent judgments, especially if they are working as a group. In fact, it is the only approach for solving multicriteria decision problems that has such a capability (Carlsson & Walden, 1995). Furthermore, multiple decision makers' different opinions about the importance of each attribute can be easily collected and aggregated with geometric means. These strengths of the AHP are useful for determining the relative importance of each attribute in the ELECTRE method while reflecting group opinions.

3. Methodology

In this section, we suggest a methodology for prioritizing association rules, taking into account their business values. The methodology consists of four phases as shown in Fig. 1.

The criteria for prioritizing association rules should be defined to clarify marketing objectives in the first phase. In phase II, the relative business importance of those criteria is calculated. Group decision making reflecting various stakeholders' opinions is performed with the AHP technique. In phase III, the target association ruleset is generated from the transaction dataset. We use the Apriori algorithm for this purpose. Calculation of the criterion data for each of the target rules is performed to prepare a decision table in phase IV. In the final phase, a rule prioritization list for the given target association ruleset is produced by performing ELECTRE II with the decision table prepared in previous phases. A more detailed description of each phase is provided in the following subsections.
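The support and confidence measures of Section 2.2, which phase III relies on, can be illustrated with a minimal Python sketch. The function names and the toy transactions below are hypothetical and are not taken from the study's data; values are returned as fractions rather than percentages.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of all transactions that contain every item of `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]],
               body: Iterable[str], head: Iterable[str]) -> float:
    """Fraction of the transactions containing `body` that also contain `head`."""
    body_support = support(transactions, body)
    if body_support == 0:
        raise ValueError("body never occurs, so confidence is undefined")
    return support(transactions, set(body) | set(head)) / body_support


# Hypothetical toy data (assumption): four baskets of product labels.
toy_transactions = [{"tv", "dvd"}, {"tv", "audio"}, {"tv", "dvd", "audio"}, {"dvd"}]

sup_rule = support(toy_transactions, {"tv", "dvd"})        # 2 of 4 baskets
conf_rule = confidence(toy_transactions, {"tv"}, {"dvd"})  # 2 of the 3 baskets with "tv"
```

Here the rule tv ⇒ dvd has support 0.5 and confidence 2/3. A full Apriori implementation would additionally prune candidate itemsets using the downward closure property, which this sketch leaves out.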
[Figure: the decision group feeds an AHP step that evaluates the relative weights among the rule selection criteria; the corporate DB feeds association rule mining, which yields an association rule set from which the rule selection criteria values are extracted. Relative weights and criteria values form the decision table, from which a dominance matrix is calculated with ELECTRE and a ranking of rules is produced (example ranking: 1: Rule2, Rule4; 2: Rule3; 3: Rule1).]
Fig. 1. The overall flow of association rule prioritization.
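The dominance calculation in Fig. 1 builds on the concordance and discordance indices of Section 2.3. The following is a minimal sketch of those two indices in Python, not the authors' implementation. It assumes that higher values are better on every criterion, and the decision table, weights, and function names are hypothetical.

```python
from typing import List, Sequence


def criterion_ranges(table: Sequence[Sequence[float]]) -> List[float]:
    """d_i: largest absolute difference between any two alternatives on criterion i."""
    return [max(column) - min(column) for column in zip(*table)]


def concordance(g_a: Sequence[float], g_b: Sequence[float],
                weights: Sequence[float]) -> float:
    """C(A, B): share of total weight on criteria where A is at least as good as B."""
    agreeing = sum(w for a, b, w in zip(g_a, g_b, weights) if a >= b)
    return agreeing / sum(weights)


def discordance(g_a: Sequence[float], g_b: Sequence[float],
                ranges: Sequence[float]) -> float:
    """D(A, B): largest range-normalised shortfall of A against B, or 0 if none."""
    gaps = [(b - a) / d for a, b, d in zip(g_a, g_b, ranges) if a < b and d > 0]
    return max(gaps, default=0.0)


# Hypothetical decision table (assumption): rows are rules, columns are criteria.
decision_table = [
    [0.8, 0.2, 100.0],
    [0.5, 0.4, 300.0],
    [0.6, 0.1, 200.0],
]
criteria_weights = [0.5, 0.3, 0.2]  # e.g. weights that an AHP step might produce

d = criterion_ranges(decision_table)
c_01 = concordance(decision_table[0], decision_table[1], criteria_weights)
d_01 = discordance(decision_table[0], decision_table[1], d)
d_20 = discordance(decision_table[2], decision_table[0], d)
```

For this toy table, rule 0 agrees with the weights against rule 1 only on the first criterion, so C = 0.5, while its worst normalised shortfall spans the full range of the third criterion, so D = 1. Comparing such values against the thresholds p and q would then yield the strong and weak outranking graphs.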
For the explanation of each phase in our methodology, some notations are briefly defined as follows.

$D_t$: dataset at time $t$
$|D_t|$: number of transactions in dataset $D_t$
$R_t$: discovered association ruleset at time $t$
$|R_t|$: number of rules in ruleset $R_t$
$r_i$: each rule from ruleset $R_t$, where $i = 1, 2, \ldots, |R_t|$
$f_{A,B}$: number of transactions containing both $A$ and $B$ in dataset $D_t$
$P(A, B)$: percentage of transactions containing both $A$ and $B$ in dataset $D_t$
$Sup_t(r_i)$: support of $r_i$ in $D_t$
$Conf_t(r_i)$: confidence of $r_i$ in $D_t$
$Price(Y)$: price per unit of item $Y$

We used real-world data for the evaluation of the proposed methodology. The data used in the experiment were transaction records for goods sold by the H department store, the third largest one in Korea. The dataset contains customers' purchasing histories that are identified and recorded by the use of each customer's smart card issued by the department store, and it also contains product information such as price and brand. We constructed a data warehouse that aggregates historical data by individual customer. We prepared two datasets since we need to calculate the measure DoC for each of the target association rules. The first dataset contains transaction records of certain customers who had bought more than one electronic product from May to December, 2000. In addition to the purchasing data in the first dataset, the second dataset contains information related to the purchase of additional electronic products from January to April, 2001. With the second dataset, the Apriori technique was applied to discover the target association rules that serve as the alternatives for rule prioritization in our study.

3.1. Phase I: decision hierarchy with criteria for prioritizing association rules

Since many factors would affect the business values of association rules, decision groups should discuss which factors are most appropriate to explain business values. In this study, the recency, frequency, and monetary (RFM) method can help decision makers to avoid focusing on less promising rules, thus allowing resources to be devoted to more valuable rules. Our approach for assessing the business values of rules parallels that of RFM analysis for assessing customer lifetime value, in that the attributes related to recency, frequency, and monetary value of each rule might be worthy of special attention. Measuring RFM is an important method for assessing customer lifetime value, which is typically used to identify profitable customers and to develop strategies to target those customers. Bult and Wansbeek (1995) defined the term recency as the period since the last purchase, frequency as the number of purchases made within a certain period, and monetary as the money spent during a certain period, respectively. It is conceived that these concepts of RFM are useful for determining the business values of each rule and ranking a set of competing association rules. A hierarchical structure of criteria for evaluating the association rules is shown in Fig. 2, in which the term RFM is redefined with a little modification (see Table 1).

[Figure: Business values of a rule. 1st level criteria: Recency, Frequency, Monetary value. 2nd level criteria: Degree of change (DoC) under Recency; Support (Sup), Confidence (Conf), and Interest factor (IF) under Frequency; Expected monetary value (EMV) and Incremental monetary value (IMV) under Monetary value.]
Fig. 2. An example for structuring rule selection problem.

3.1.1. Recency

The term recency is defined as the time trend of a rule between time intervals in this study; a higher value implies that a rule is more worthy of attention. This factor can be measured with the attribute of the degree of change in support.

Degree of change (DoC). Even though most data mining techniques usually give attention to the rules which have a large frequency of occurrence and ignore time trend, the rules with a large growth rate or decreasing rate in
occurrence may give signiﬁcant implications to managers in This metric is deﬁned to be the ratio between the joint
changing business environment in spite of their relatively probability of two variables with respect to their expected
small occurrence. There are some studies on discovering probabilities under the independence assumption. The
trends and differences in patterns (see Dong Li, 1999; interest factor is a non-negative real number with a value
Song et al., 2001). As we are interested in the attributes of of 1 corresponding to statistical independence. For we are
each of rules from ruleset Rt, we bring the measure of the interested in rules rather than itemset, we bring the measure
degree of change for emerging pattern case from the study of interest factor from the study of Tan and Kumar (2000)
of Song et al. (2001) with modiﬁcation for our study. The for association rules.
emerging patterns are deﬁned as itemsets whose supports The interest factor for the rule A1 A2 /Aj 0 AjC1 AjC2 /
increase signiﬁcantly from one dataset to another. More Ak can be deﬁned as follows:
speciﬁcally, emerging patterns are itemsets whose growth
IðA1 A2 /Aj 0 AjC1 AjC2 /Ak Þ
rates are larger than a given threshold value. When applied
to time stamped databases, emerging patterns can capture PðA1 ; A2 ; .; Ak Þ
emerging trends in business or demographic data (Song Z
PðA1 ; A2 ; .; Aj ÞPðAjC1 ; AjC2 ; .; Ak Þ
et al., 2001).
The modiﬁed measure for the degree of change in For rule 1, goodcd_45530019310000goodcd_
support, si, is deﬁned in this study as follows: 4503620935000, that was generated from Dt in our study,
8 we can compute the interest factor measure as follows.
Supt ðri Þ K SuptKk ðri Þ
si Z ; if SuptKk ðri Þ s0
: $\mathrm{Sup}_{t-k}(r_i)$
$\max(s_1, s_2, \ldots, s_{|R|})$, otherwise

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, which was generated from $D_t$ in our study (see Table 3), we can compute the degree of change measure as follows:

$\mathrm{Sup}_{t-1}(r_1) = 0.0012$, $\mathrm{Sup}_t(r_1) = 0.0032$

$s_1 = \frac{0.0032 - 0.0012}{0.0012} = 1.734$

3.1.2. Frequency
The term frequency is defined in this study as the statistical significance of a rule in a time interval; higher frequency indicates greater statistical significance of a rule. This factor consists of three attributes: support, confidence, and interest factor.
Support (Sup). The support criterion is necessary in the model because it represents the statistical significance of a pattern. From the marketing perspective, the support of an itemset in retail sales data justifies the feasibility of promoting the items together. However, support alone may not serve as a reliable interestingness measure. For example, rules with high support quite often correspond to obvious knowledge about the domain that may not be informative (Tan & Kumar, 2000).
Confidence (Conf). Confidence was initially proposed to measure the strength of an association rule by measuring the conditional probability of events associated with a particular rule. For example, if a rule X ⇒ Y has confidence c, this means that c percent of all transactions that contain X also contain Y. However, it may produce counter-intuitive results, especially when strong negative correlations are present (Brin, Motwani, & Silverstein, 1997).
Interest factor (IF). Interest factor is another widely used measure for association patterns (Brin et al., 1997).

P(goodcd_4553001931000, goodcd_4503620935000) = 0.0033
P(goodcd_4553001931000) = 0.0337, P(goodcd_4503620935000) = 0.0469

$I(r_1) = \frac{0.0033}{0.0337 \times 0.0469} = 2.090$

3.1.3. Monetary value
The term monetary value is defined in this study as the profitability of a rule; a higher value indicates that the company should focus more on that rule. This factor is measured with two attributes: expected monetary value and incremental monetary value.
Expected monetary value (EMV). If we assume mutual independence between products, then the expected profit after buying a product X is equal to the probability of buying Y given X, multiplied by the profit of Y (Kitts et al., 2000). With a modification to suit our study, we use the following measure for the expected monetary value of a rule:

$\mathrm{EMV}(A_1 A_2 \cdots A_j \Rightarrow A_{j+1} A_{j+2} \cdots A_k) = \mathrm{Conf}_t(A_1 A_2 \cdots A_j \Rightarrow A_{j+1} A_{j+2} \cdots A_k) \times \sum_{i=j+1}^{k} \mathrm{Price}(A_i)$

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, which was generated from $D_t$ in our study, we can compute the expected monetary value as follows:

$\mathrm{Conf}(r_1) = 0.098$, Price(goodcd_4503620935000) = 100,000

$\mathrm{EMV}(r_1) = 0.098 \times 100{,}000 = 9800$

EMV can be seen as the recommendation profit of a rule on a per-recommendation basis; it factors in both the hit rate (i.e. confidence) and the profit of the recommended item(s).
Incremental monetary value (IMV). The idea behind the incremental profit of Kitts et al. (2000) is the expected profit minus the profit one would expect to receive in the natural course of a customer’s purchasing. Incremental profit maximizes the profit of the item, minus the baseline profit associated with the item (Kitts et al., 2000). Incremental monetary value is defined as follows in this study:

$\mathrm{IMV}(A_1 A_2 \cdots A_j \Rightarrow A_{j+1} A_{j+2} \cdots A_k) = \left( \mathrm{Conf}_t(A_1 A_2 \cdots A_j \Rightarrow A_{j+1} A_{j+2} \cdots A_k) - \frac{f_{A_{j+1}, A_{j+2}, \ldots, A_k}}{|D_t|} \right) \times \sum_{i=j+1}^{k} \mathrm{Price}(A_i)$

For rule 1, goodcd_4553001931000 ⇒ goodcd_4503620935000, which was generated from $D_t$ in our study, we can compute the incremental monetary value as follows:

$\mathrm{Conf}(r_1) = 0.098$, f(goodcd_4503620935000) = 154, $|D_t| = 3285$, Price(goodcd_4503620935000) = 100,000

$\mathrm{IMV}(r_1) = \left( 0.098 - \frac{154}{3285} \right) \times 100{,}000 = 5112.024$

3.2. Phase II: relative weighting

The AHP was used to determine the relative importance of the rule attributes. The main steps of the AHP are as follows.

3.2.1. Step 1: perform pairwise comparisons
This step asks decision makers to make pairwise comparisons of the relative importance of each of the attributes. Two decision makers in our example judge the relative importance at each level: an administrative manager (AM) and a business manager (BM) in sales. Their differing answers are aggregated by geometric means, which are expressed in the form of pairwise comparison matrices for each level (Fig. 3).

3.2.2. Step 2: assess the consistency of pairwise judgments
Decision makers may make inconsistent judgments when making pairwise comparisons. Before the weights are computed, the degree of inconsistency is measured by an inconsistency index. Perfect consistency implies a zero inconsistency index. However, perfect consistency is seldom achieved since humans are often biased and inconsistent when making subjective judgments. Therefore, an inconsistency index of less than 0.1 is considered acceptable (Harker, 1989). If the inconsistency index exceeds this value, the pairwise judgments may be revised before the weights of the attributes are computed.

3.2.3. Step 3: computing the relative weights
This step determines the weight of each decision element. This work employs eigenvalue computations to derive the weights of the attributes (Table 2). The eigenvector of the maximum eigenvalue of the factors in the first level is 0.330 for recency, 0.274 for frequency, and 0.396 for monetary value. In this example, the decision makers consider monetary value the most important factor for prioritizing association rules. Continuing to the next level and computing its eigenvectors, we get 0.602 for Sup, 0.237 for Conf, and 0.161 for IF as the sub-criteria of frequency, and 0.414 for EMV and 0.586 for IMV as the sub-criteria of monetary value. According to the assessments, the relative weights of DoC, Sup, Conf, IF, EMV, and IMV are 0.330, 0.165, 0.065, 0.044, 0.164, and 0.232, respectively. In this example, thus, DoC is found to be the most important criterion, IMV the next, and so on.

Pairwise comparisons matrix of the first level criteria
Business values of a rule   Recency   Frequency        Monetary value
Recency                     1         1 (1/2; 2) a     1 (1/3; 3)
Frequency                             1                0.577 (1/3; 1)
Monetary value                                         1

Pairwise comparisons matrix of the second level criteria
Frequency                   Sup       Conf             IF
Sup                         1         2.449 (3; 2)     3.873 (3; 5)
Conf                                  1                1.414 (2; 1)
IF                                                     1

Monetary value              EMV       IMV
EMV                         1         0.707 (2; 1/4)
IMV                                   1

a Aggregated opinion (administrative manager’s opinion; business manager’s opinion)
Fig. 3. Pairwise comparisons matrix of the first level criteria.

Table 2
Relative weights of criteria
Criteria              Level 1   Level 2   Relative weights
Recency_DoC           0.330     1         0.330
Frequency_Sup         0.274     0.602     0.165
Frequency_Conf        0.274     0.237     0.065
Frequency_IF          0.274     0.161     0.044
Monetary value_EMV    0.396     0.414     0.164
Monetary value_IMV    0.396     0.586     0.232
DoC, degree of change; Sup, support; Conf, confidence; IF, interest factor; EMV, expected monetary value; IMV, incremental monetary value.

3.3. Phase III: target association rule set
Under the conditions of 0.25% minimum support, 8% minimum confidence, and a maximum itemset size of 2, the system found seven association rules in the second dataset. The target association rules are provided in Table 3 with the price of the head item. In this case, rule 1 and rule 4 have the same body, goodcd_4553001931000, and rules 2–4 have the same head item, goodcd_4504940939002. It is thus necessary for decision makers to rank those rules and obtain the priorities among them in decision-making processes.

Table 3
Target association ruleset
Alternatives   Body                    Head                   Conf    Supp       Price of head item
Rule 1         goodcd_4553001931000 ⇒  goodcd_4503620935000   0.098   0.003332   100,000
Rule 2         goodcd_4550410939000 ⇒  goodcd_4504940939002   0.081   0.002430   30,000
Rule 3         goodcd_4550410939001 ⇒  goodcd_4504940939002   0.089   0.002759   30,000
Rule 4         goodcd_4553001931000 ⇒  goodcd_4504940939002   0.080   0.002720   30,000
Rule 5         goodcd_4502160939001 ⇒  goodcd_4550390935070   0.086   0.002752   179,000
Rule 6         goodcd_4550900939001 ⇒  goodcd_4550900939000   0.141   0.004230   90,000
Rule 7         goodcd_4550900939000 ⇒  goodcd_4550900939001   0.086   0.004300   90,000

3.4. Phase IV: preparing decision table
Then the decision matrix of the rule prioritization problem is prepared as shown in Table 4. For the criterion DoC, rule 2, rule 3, and rule 5 have negative values. This means that the support, in other words the usefulness, of those association rules decreased in 2001. Rule 5 has the largest value for EMV, owing to the highest price of its head item, but the lowest value for IMV. The reason is that the value of IF is relatively low compared to the value of Conf for rule 5. In other words, the accuracy of rule 5 stems from the high frequency of the head item, not from its relationship with the body item.

Table 4
Decision table
Alternatives   DoC      Sup        Conf    IF       EMV      IMV
Rule 1         1.734    0.003332   0.098   2.090    9800     5112.024
Rule 2         −0.016   0.002430   0.081   20.468   2430     2311.279
Rule 3         −0.032   0.002759   0.089   22.490   2670     2551.279
Rule 4         0.741    0.002720   0.080   20.215   2400     2281.279
Rule 5         −0.258   0.002752   0.086   1.002    15,394   27.790
Rule 6         0.741    0.004230   0.141   2.842    12,690   8224.247
Rule 7         0.741    0.004300   0.086   2.854    7740     5027.671

3.5. Phase V: calculating ranking
For each relation between Rule i and Rule j, with i and j integers in [1, 7], the values of the concordance index and
discordance indexes are given in Tables 5 and 6, respectively.

Table 5
Matrix 7 × 7 of concordance
Alternatives   Rule 1   Rule 2   Rule 3   Rule 4   Rule 5   Rule 6   Rule 7
Rule 1         –        0.956    0.956    0.956    0.836    0.330    0.791
Rule 2         0.044    –        0.330    0.505    0.606    0.044    0.044
Rule 3         0.044    0.670    –        0.670    0.836    0.044    0.109
Rule 4         0.044    0.495    0.330    –        0.606    0.374    0.374
Rule 5         0.164    0.394    0.164    0.394    –        0.164    0.229
Rule 6         0.670    0.956    0.956    0.626    0.836    –        0.461
Rule 7         0.209    0.956    0.891    0.956    0.836    0.539    –

Table 6
Matrix 7 × 7 of discordance
Alternatives   Rule 1   Rule 2   Rule 3   Rule 4   Rule 5   Rule 6   Rule 7
Rule 1         –        0.855    0.949    0.843    0.431    0.705    0.518
Rule 2         0.879    –        0.176    0.380    0.998    0.984    1.000
Rule 3         0.886    0.008    –        0.388    0.388    0.852    0.824
Rule 4         0.569    0.016    0.148    –        1.000    1.000    0.845
Rule 5         1.000    0.906    1.000    0.894    –        1.000    0.828
Rule 6         0.499    0.820    0.914    0.809    0.208    –        0.037
Rule 7         0.499    0.820    0.914    0.808    0.589    0.902    –

For the concordance and discordance thresholds of ELECTRE II, we set $p^{+} = 0.609$ (average of concordance indexes plus 0.1), $p^{0} = 0.509$ (average of concordance indexes), and $p^{-} = 0.409$ (average of concordance indexes minus 0.1); $q^{0} = 0.693$ (average of discordance indexes) and $q^{+} = 0.893$ (average of discordance indexes plus 0.2). Fig. 4 shows the results of ELECTRE II in the form of strong and weak relation graphs, with an illustration of the forward ranking steps. The final ranking ($r$) of the target association rules is obtained by taking the average of the forward ($r'$) and reverse ($r''$) rankings (see Table 7).

Fig. 4. Strong and weak preference graphs and forward ranking steps.

Table 7
ELECTRE 2 method, direct, reverse and medium classification with the outranking relations
Alternatives   Forward ranking, r'   Reverse ranking, r''   Average, (r' + r'')/2   Final ranking, r
Rule 1         2                     2                      2                       2
Rule 2         4                     4                      4                       4
Rule 3         1                     3                      2                       2
Rule 4         5                     5                      5                       6
Rule 5         4                     5                      4.5                     5
Rule 6         1                     1                      1                       1
Rule 7         3                     3                      3                       3

The sensitivity of the selection of alternatives to changes in the threshold values ($p^{-}$, $p^{0}$, $p^{+}$, $q^{0}$, and $q^{+}$) was also studied, and the results of ELECTRE II for these changes are shown in Table 8. Table 8 shows that the ranks change slightly for different threshold values. However, in all the cases, rule 6, rule 3, and rule 5 get the first, the second, and the last rank, respectively.

Table 8
Sensitivity analysis
               Case1a          Case1b          Case1c          Case2a          Case2b          Case2c
p−; p0; p+     0.5; 0.6; 0.7   0.5; 0.6; 0.7   0.5; 0.6; 0.7   0.6; 0.7; 0.8   0.6; 0.7; 0.8   0.6; 0.7; 0.8
q0; q+         0.5; 0.7        0.6; 0.8        0.7; 0.9        0.5; 0.7        0.6; 0.8        0.7; 0.9
Rule 1         3               3               2               2               2               2
Rule 2         4               4               4               3               4               4
Rule 3         2               2               2               2               2               2
Rule 4         6               6               6               3               3               4
Rule 5         7               7               5               4               5               4
Rule 6         1               1               1               1               1               1
Rule 7         5               5               3               3               4               3

4. Conclusion

The business value that firms can obtain is the most important consideration in the process of extracting rules from the large databases on which decision makers rely. Each rule has various attributes related to business value, and those attributes have different
importance according to the business strategy of the firm. Our work presents a novel rule prioritization methodology that incorporates decision makers’ preferences in evaluating the promising association rules obtained from data mining. The proposed methodology is an attempt to create synergy with decision analysis techniques for solving problems in the domain of data mining. We believe that this approach would be particularly useful for business managers who suffer from rule quality or quantity problems, conflicts between extracted rules, and difficulties in building a group consensus for rule selection.
As a further research area, we plan to extend our methodology to the adaptation process. With a feedback loop that allows decision makers to agree or disagree with some parts of the ranks, we expect to obtain a more refined rule prioritization.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association between sets of items in massive database. International proceedings of the ACM-SIGMOD international conference on management of data (pp. 207–216).
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the international conference on very large data bases (pp. 407–419).
Barr, A., & Feigenbaum, E. A. (Eds.). (1981). The handbook of artificial intelligence. Los Altos, CA: Morgan Kaufmann.
Brijs, T., Goethals, B., Swinnen, G., Vanhoof, K., & Wets, G. (2000). A data mining framework for optimal product selection in retail supermarket data: The generalized PROFSET model. Proceedings of the ACM-SIGKDD international conference on knowledge discovery and data mining (pp. 20–23).
Brin, S., Motwani, R., & Silverstein, C. (1997). Beyond market baskets: Generalizing association rules to correlations. International proceedings of the ACM-SIGMOD international conference on management of data (pp. 265–276).
Bult, J. R., & Wansbeek, T. J. (1995). Optimal selection for direct mail. Marketing Science, 14(4), 378–394.
Carlsson, C., & Walden, P. (1995). AHP in political group decisions: A study in the art of possibilities. Interfaces, 25, 14–29.
Cooley, R., Mobasher, B., & Srivastava, J. (1999). Data preparation for mining world wide web browsing patterns. Journal of Knowledge and Information Systems, 1(1), 5–32.
Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: Discovering trends and differences. Proceedings of the international conference on knowledge discovery and data mining (KDD’99) (pp. 43–52).
Harker, P. T. (1989). The art and science of decision making: The analytic hierarchy process. The analytic hierarchy process application and studies. Berlin: Springer.
Hayes-Roth, F., Waterman, D. A., & Lenat, D. B. (1983). Building expert systems. Reading, MA: Addison-Wesley.
Hilderman, R. J., & Hamilton, H. J. (2001). Evaluation of interestingness measures for ranking discovered knowledge. Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD’01) (pp. 247–259).
Kitts, B., Freed, D., & Vrieze, M. (2000). Cross-sell: A fast promotion-tunable customer-item recommendation method based on conditionally independent probabilities. Proceedings of the ACM-SIGMOD conference on knowledge discovery and data mining (KDD’00) (pp. 437–446).
Liu, B., Hsu, W., & Ma, Y. (1999). Mining association rules with multiple minimum supports. Proceedings of the ACM-SIGMOD international conference on knowledge discovery and data mining (KDD’99) (pp. 15–18).
Mobasher, B., Cooley, R., & Srivastava, J. (2000). Automatic personalization based on web usage mining. Communications of the ACM, 43(8), 142–151.
Mulvenna, M. D., Anand, S. S., & Büchner, A. G. (2000). Personalization on the net using Web mining. Communications of the ACM, 43(8), 123–125.
Roy, B., & Bertier, B. (1971). La methode ELECTRE II: Une Methode de Classement en Presence de Criteres Multiples. Note de Travail No 142, Direction Scientifique, Groupe Metra.
Roy, B., & Bouyssou, D. (1993). Aide Multicritère à la Décision: Méthodes et cas. Economica.
Roy, B., & Vincke, P. (1981). Multicriteria analysis: Survey and new directions. European Journal of Operational Research, 8, 207–218.
Saaty, T. L. (1980). The analytic hierarchy process: Planning, priority setting, resource allocation. New York: McGraw-Hill.
Saaty, T. L. (1990). How to make a decision: The analytic hierarchy process. European Journal of Operational Research, 48, 9–26.
Song, H. S., Kim, J. K., & Kim, S. H. (2001). Mining the change of customer behavior in an internet shopping mall. Expert Systems with Applications, 21, 158–168.
Srikant, R., & Agrawal, R. (1995). Mining generalized association rules. Proceedings of the international conference on very large data bases (VLDB’95).
Tan, P. N., & Kumar, V. (2000). Interestingness measures for association patterns: A perspective. KDD 2000 Workshop on Postprocessing in Machine Learning and Data Mining, Boston, MA, August.
Wang, K., He, Y., & Han, J. (2000). Mining frequent itemsets using support constraints. Proceedings of the international conference on very large data bases (VLDB’00).
Wang, K., Zhou, S., & Han, J. (2002). Profit mining: From patterns to actions. Proceedings of international conference on extending data base technology (EDBT’02) (pp. 70–87).
White, C. C. (1990). A survey on the integration of decision analysis and expert systems for decision support. IEEE Transactions on System, Man, and Cybernetics, 20(2), 358–364.
Zhang, S., Zhang, C., & Yan, X. (2003). Post-mining: Maintenance of association rules by weighting. Information Systems, 28, 691–707.