2. Association Rules: Introduction
- Association rules are created by analyzing data for frequent if/then patterns and using the criteria support and confidence to identify the most important relationships.
- The purchase of one product when another product is purchased represents an association rule.
- Association rules are used in retail stores for marketing, advertising, floor placement, and inventory control.
3. Support is an indication of how frequently the items appear in the database. Confidence indicates how often the if/then statement has been found to be true, that is, the fraction of transactions containing the "if" items that also contain the "then" items.
In data mining, association rules are useful for analyzing and predicting customer
behavior.
They play an important part in shopping basket data analysis, product clustering, catalog
design and store layout.
Example: "If a customer buys a dozen eggs, he is 80% likely to also purchase
milk."
4. Frequent Item Set
The most common approach to finding association rules is to break up the
problem into two parts:
1) Find large (frequent) itemsets.
2) Generate rules from the frequent itemsets.
An itemset is called frequent if its support is no less than a given absolute
minimum support threshold.
An itemset is any subset of the set of all items.
A frequent itemset is an itemset whose number of occurrences meets or exceeds
a threshold. We use the notation L to indicate the complete set of large
itemsets and l to indicate a specific large itemset.
The original motivation for searching for frequent sets came from the need
to analyse so-called supermarket transaction data, that is, to examine
customer behaviour in terms of the purchased products.
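The definitions of frequent (large) itemsets can be made concrete with a small brute-force search. This is a minimal Python sketch, assuming transactions are Python sets and that the threshold is an absolute support count (called min_count here); the transaction data and the function name frequent_itemsets are hypothetical. Enumerating every candidate is exponential in the number of items, so practical miners such as Apriori prune candidates instead.

```python
from itertools import combinations

# Hypothetical transactions, for illustration only (not Table 1.1).
transactions = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def frequent_itemsets(transactions, min_count):
    """Return {itemset: support count} for every itemset with count >= min_count.

    Brute-force enumeration of candidate itemsets, smallest first. Because every
    subset of a frequent itemset is also frequent, the search stops at the first
    size k for which no k-itemset is frequent.
    """
    items = sorted(set().union(*transactions))
    large = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                large[candidate] = count
                found = True
        if not found:
            break  # no frequent k-itemset, so no frequent (k+1)-itemset either
    return large


# L is the complete set of large itemsets; each key is one large itemset l.
L = frequent_itemsets(transactions, min_count=2)
for l, count in sorted(L.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(l), count)
```

With min_count=2 this prints the four frequent single items (Beer, Bread, Milk, PeanutButter) followed by the only frequent pair, ['Bread', 'PeanutButter'].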
10. Algorithm to Generate Association Rules:
In this algorithm we use a function support, which
returns the support for the input itemset.
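A minimal Python sketch of one common form of this procedure follows. It assumes that, for each large itemset l, a rule x => (l - x) is formed for every non-empty proper subset x of l and kept when its confidence support(l) / support(x) reaches a minimum confidence (called min_conf here). The transaction data, min_conf, and generate_rules are hypothetical; this is not necessarily the exact algorithm from the original slide.

```python
from itertools import combinations

# Hypothetical transactions, for illustration only (not Table 1.1).
transactions = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def generate_rules(large_itemsets, transactions, min_conf):
    """Return rules (x, l - x, confidence) whose confidence is >= min_conf.

    For each large itemset l and each non-empty proper subset x of l, the
    confidence of x => (l - x) is support(l) / support(x). Assuming l has
    non-zero support, support(x) >= support(l) > 0, so the division is safe.
    """
    rules = []
    for l in map(frozenset, large_itemsets):
        for k in range(1, len(l)):
            for combo in combinations(sorted(l), k):
                x = frozenset(combo)
                conf = support(l, transactions) / support(x, transactions)
                if conf >= min_conf:
                    rules.append((x, l - x, conf))
    return rules


rules = generate_rules([{"Bread", "PeanutButter"}], transactions, min_conf=0.6)
for x, y, conf in rules:
    print(sorted(x), "=>", sorted(y), f"confidence={conf:.2f}")
```

Here PeanutButter => Bread (confidence 1.00) is kept, while Bread => PeanutButter (confidence 0.50) falls below min_conf and is dropped.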
11. Example
Table 1.1: Sample Data to Illustrate Association Rules
A database in which association rules are to be found is viewed as a set of tuples,
where each tuple contains a set of items. For example, a tuple could be {Bread, Jelly,
PeanutButter}, which consists of three items.
Table 1.1 is used throughout this topic to illustrate different algorithms. Here,
there are five transactions and five items.
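The view of a database as a set of tuples can be sketched in Python by storing each tuple as a frozenset of items. Only the first tuple below comes from the text; the other four are hypothetical, since the rows of Table 1.1 are not reproduced here. The helper name contains is also hypothetical.

```python
# Hypothetical five-transaction database; only the first tuple comes from the
# text, the rest are made up because Table 1.1's rows are not reproduced here.
database = [
    frozenset({"Bread", "Jelly", "PeanutButter"}),
    frozenset({"Bread", "PeanutButter"}),
    frozenset({"Bread", "Milk"}),
    frozenset({"Beer", "Bread"}),
    frozenset({"Beer", "Milk"}),
]

# The set of all items is the union of the items in every tuple.
items = frozenset().union(*database)


def contains(tuple_items, itemset):
    """A tuple supports an itemset when the itemset is a subset of the tuple."""
    return frozenset(itemset) <= tuple_items


print(len(database), "transactions,", len(items), "items")
print(sum(contains(t, {"Bread", "PeanutButter"}) for t in database))
```

Using frozensets makes the "does this tuple contain the itemset?" test a single subset comparison, which is the operation every support count relies on.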
13. Support of All Sets of Items
Support: This says how popular an itemset is, as measured by the proportion of transactions
in which the itemset appears. In Table 1, the support of {apple} is 4 out of 8, or 50%.
Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2
out of 8, or 25%.
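Support as a proportion of transactions can be computed directly. This minimal Python sketch assumes a hypothetical eight-transaction dataset chosen so that it reproduces the two support figures quoted above; it is not necessarily the original Table 1, and the function name support is only illustrative.

```python
# Hypothetical eight-transaction dataset, chosen so that it reproduces the
# support figures quoted above; it is not necessarily the original Table 1.
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "mango"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "mango"},
]


def support(itemset, transactions):
    """Proportion of transactions in which every item of `itemset` appears."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"apple"}, transactions))                  # 0.5  (4 out of 8)
print(support({"apple", "beer", "rice"}, transactions))  # 0.25 (2 out of 8)
```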
14. Confidence
Every association rule has a support and a confidence.
An association rule is of the form: X => Y
X => Y: if someone buys X, he also buys Y
The confidence is the conditional probability that, given that X is
present in a transaction, Y will also be present.
By definition, the confidence measure is:
Confidence(X => Y) = support(X ∪ Y) / support(X)
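The confidence formula can be checked with a short Python sketch. It assumes a hypothetical eight-transaction dataset consistent with the Table 1 support figures quoted in this section (support of {apple} = 4/8); the function names support and confidence are illustrative, and the sketch assumes support(X) > 0.

```python
# Hypothetical eight-transaction dataset, consistent with the Table 1 figures
# quoted in this section; it is not necessarily the original Table 1.
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "mango"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "mango"},
]


def support(itemset, transactions):
    """Proportion of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """Confidence(X => Y) = support(X ∪ Y) / support(X); assumes support(X) > 0."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


print(confidence({"apple"}, {"beer"}, transactions))  # 0.75 = (3/8) / (4/8)
```

In this data, 3 of the 4 transactions that contain apple also contain beer, so the rule {apple} => {beer} holds with confidence 0.75.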
15. Support and Confidence for Some Association Rules
Confidence: This says how likely item Y is purchased when item X is purchased,
expressed as {X -> Y}.
This is measured by the proportion of transactions with item X in
which item Y also appears.
17. Lift
This says how likely item Y is purchased when item X is purchased,
while controlling for how popular item Y is. In Table 1, the lift of
{apple -> beer} is 1, which implies no association between the items.
A lift value greater than 1 means that item Y is likely to be bought
if item X is bought, while a value less than 1 means that item Y is
unlikely to be bought if item X is bought.
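Lift is not given as a formula in the text. This minimal Python sketch assumes the usual definition, lift(X -> Y) = confidence(X -> Y) / support(Y) = support(X ∪ Y) / (support(X) × support(Y)), and a hypothetical eight-transaction dataset consistent with the Table 1 figures quoted in this section (it reproduces the lift of 1 for {apple -> beer}). The function names are illustrative.

```python
# Hypothetical eight-transaction dataset, consistent with the Table 1 figures
# quoted in this section; it is not necessarily the original Table 1.
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "mango"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "mango"},
]


def support(itemset, transactions):
    """Proportion of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(x, y, transactions):
    """Lift(X -> Y) = support(X ∪ Y) / (support(X) * support(Y)).

    Equivalent to confidence(X -> Y) / support(Y), i.e. confidence corrected
    for how popular Y already is.
    """
    return support(set(x) | set(y), transactions) / (
        support(x, transactions) * support(y, transactions)
    )


print(lift({"apple"}, {"beer"}, transactions))           # 1.0: no association
print(round(lift({"rice"}, {"beer"}, transactions), 3))  # 1.333: beer more likely
```

Beer appears in 6 of 8 transactions (0.75). Given apple it appears with the same probability (0.75), hence a lift of 1. Given rice it always appears (1.0), hence a lift above 1.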
21. Confidence or Strength:
The confidence or strength (α) for an association rule X => Y is the ratio
of the number of transactions that contain X ∪ Y to the number of transactions
that contain X.
The selection of association rules is based on these two values (support and
confidence), as described in the definition of the association rule problem.
Confidence measures the strength of the rule, whereas support
measures how often it should occur in the database.
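Selecting rules on both values can be sketched in Python using transaction counts, as in the definition of α. This is a minimal illustration, assuming the association rule problem means keeping rules whose support is at least a minimum (min_support) and whose confidence is at least a minimum (min_conf). The data, thresholds, and the names count and select_rules are hypothetical, and only rules with one item on each side are considered for brevity.

```python
from itertools import permutations

# Hypothetical transactions, for illustration only.
transactions = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def select_rules(min_support, min_conf):
    """Keep single-item rules X => Y that meet both thresholds.

    support(X => Y) = count(X ∪ Y) / number of transactions
    alpha(X => Y)   = count(X ∪ Y) / count(X)
    """
    items = sorted(set().union(*transactions))
    selected = []
    for x, y in permutations(items, 2):
        both = count({x, y})
        if both == 0:
            continue  # X and Y never co-occur, so the rule has zero support
        support = both / len(transactions)
        alpha = both / count({x})
        if support >= min_support and alpha >= min_conf:
            selected.append((x, y, support, alpha))
    return selected


print(select_rules(min_support=0.4, min_conf=0.6))
```

With these thresholds only PeanutButter => Bread survives (support 0.4, α 1.0). Bread => PeanutButter has the same support but α = 2/4 = 0.5, which shows that the two measures filter rules independently.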