4. Association Rules
Association rules are if-then statements that help show the probability of relationships between data items within large data sets in various types of databases. Association rule mining has many applications and is widely used to discover sales correlations in transactional data and patterns in medical data sets.
6. Basic Concepts
Support count (σ): the frequency of occurrence of an itemset. Here σ({Milk, Bread, Diaper}) = 2.
Frequent itemset: an itemset whose support is greater than or equal to the minsup threshold.
Association rule: an implication X → Y, where X and Y are disjoint itemsets.
Example: {Milk, Diaper} → {Beer}
7. Basic Concepts
Support (s): the number of transactions that include all items in both the X and Y parts of the rule, as a fraction of the total number of transactions. It measures how frequently the collection of items occurs together across all transactions.
It is interpreted as the fraction of transactions that contain both X and Y.
From the table below, for {Milk, Diaper} → {Beer}:
s = σ(X ∪ Y) / |T|
  = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
TID ITEMS
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
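The support computation above can be sketched in Python over the transactions in this table (a minimal illustration; the helper names are made up, not from any particular library):

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    # sigma(itemset): number of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    # s(itemset): fraction of all transactions containing the itemset
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Diaper", "Beer"}, transactions))  # 2
print(support({"Milk", "Diaper", "Beer"}, transactions))        # 0.4
```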
8. Confidence (c): the ratio of the number of transactions that include all items in both X and Y to the number of transactions that include all items in X.
It measures how often the items in Y appear in transactions that also contain the items in X.
From the above table, for {Milk, Diaper} → {Beer}:
confidence(X → Y) = σ(X ∪ Y) / σ(X)
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper})
  = 2/3 ≈ 0.67
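As a sketch, the confidence formula can be computed directly from the transaction table (illustrative names, assuming the same five transactions as before):

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    # sigma(itemset): transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t)

def confidence(X, Y, transactions):
    # confidence(X -> Y) = sigma(X ∪ Y) / sigma(X)
    return support_count(X | Y, transactions) / support_count(X, transactions)

c = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)
print(round(c, 2))  # 0.67
```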
9. Lift (l): the lift of the rule X → Y is the confidence of the rule divided by the expected confidence, assuming the itemsets X and Y are independent of each other. The expected confidence is simply the frequency (support) of Y.
A lift near 1 indicates that X and Y appear together about as often as expected under independence; greater than 1 means they appear together more often than expected, and less than 1 means less often. Larger lift values indicate stronger association.
lift(X → Y) = confidence(X → Y) / support(Y)
l = s({Milk, Diaper, Beer}) / (s({Milk, Diaper}) × s({Beer}))
  = 0.4 / (0.6 × 0.6) ≈ 1.11
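The lift calculation can be sketched the same way (illustrative names, assuming the same five transactions):

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    # s(itemset): fraction of transactions containing the itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(X, Y, transactions):
    # lift(X -> Y) = s(X ∪ Y) / (s(X) * s(Y))
    return support(X | Y, transactions) / (
        support(X, transactions) * support(Y, transactions)
    )

print(round(lift({"Milk", "Diaper"}, {"Beer"}, transactions), 2))  # 1.11
```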
12. Apriori Algorithm
The key concept of the Apriori algorithm is the anti-monotonicity of the support measure. Apriori assumes that:
“All non-empty subsets of a frequent itemset must also be frequent (the Apriori property).
Conversely, if an itemset is infrequent, all of its supersets must be infrequent.”
TID ITEMS
1 I1,I2,I5
2 I2,I4
3 I2,I3
4 I1,I2,I4
5 I1,I3
6 I2,I3
7 I1,I3
8 I1,I2,I3,I5
9 I1,I2,I3
- minimum support count is 2
- minimum confidence is 50%
13. STEP 1
1) Create a table containing the support count of each item present in the dataset, called C1 (the candidate set).

C1:
ITEMS  SUPPORT COUNT
I1     6
I2     7
I3     6
I4     2
I5     2
L1:
ITEMS  SUPPORT COUNT
I1     6
I2     7
I3     6
I4     2
I5     2
2) Compare each candidate itemset's support count with the minimum support count (here min_support = 2); if a candidate's support count is less than min_support, remove it. This gives us the frequent itemset L1.
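Steps 1 and 2 above (building C1 and pruning it to L1) can be sketched as follows; since every item here meets the minimum support count of 2, L1 ends up equal to C1:

```python
from collections import Counter

# The nine transactions from the table above.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support = 2

# C1: support count of every individual item (the candidate 1-itemsets)
C1 = Counter(item for t in transactions for item in t)

# L1: prune candidates whose support count falls below min_support
L1 = {item: count for item, count in C1.items() if count >= min_support}
print(sorted(L1.items()))  # [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```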