4. Association Rules
Association rules are if-then statements that help show the probability of relationships between data items within large data sets in various types of databases. Association rule mining has many applications and is widely used to discover sales correlations in transactional data and patterns in medical data sets.
6. Basic Concepts
Support count (σ): the frequency of occurrence of an itemset. Here σ({Milk, Bread, Diaper}) = 2.
Frequent itemset: an itemset whose support is greater than or equal to the minsup threshold.
Association rule: an implication X → Y, where X and Y are disjoint itemsets.
Example: {Milk, Diaper} → {Beer}
7. Basic Concepts
Support (s): the number of transactions that include all items in both the X and Y parts of the rule, as a fraction of the total number of transactions. It measures how frequently the collection of items occurs together across all transactions.
It is interpreted as the fraction of transactions that contain both X and Y.
From the table below, for {Milk, Diaper} → {Beer}:
s = σ(X ∪ Y) / |T|
  = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
TID ITEMS
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
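The support computation above can be sketched in Python over the transactions in this table (a minimal illustration; the helper names are made up, not from any particular library):

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    # sigma(itemset): number of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    # s(itemset): fraction of all transactions containing the itemset
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Diaper", "Beer"}, transactions))  # 2
print(support({"Milk", "Diaper", "Beer"}, transactions))        # 0.4
```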
8. Confidence (c): the ratio of the number of transactions that include all items in both X and Y to the number of transactions that include all items in X.
It measures how often the items in Y appear in transactions that also contain the items in X.
From the above table, for {Milk, Diaper} → {Beer}:
confidence(X → Y) = σ(X ∪ Y) / σ(X)
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper})
  = 2/3 ≈ 0.67
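As a sketch, the confidence formula can be computed directly from the transaction table (illustrative names, assuming the same five transactions as before):

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    # sigma(itemset): transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t)

def confidence(X, Y, transactions):
    # confidence(X -> Y) = sigma(X ∪ Y) / sigma(X)
    return support_count(X | Y, transactions) / support_count(X, transactions)

c = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)
print(round(c, 2))  # 0.67
```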
9. Lift (l): the lift of the rule X → Y is the confidence of the rule divided by the expected confidence, assuming the itemsets X and Y are independent of each other. The expected confidence is simply the frequency (support) of Y.
A lift near 1 indicates that X and Y appear together about as often as expected under independence; greater than 1 means they appear together more often than expected, and less than 1 means less often. Larger lift values indicate stronger association.
lift(X → Y) = confidence(X → Y) / support(Y)
l = s({Milk, Diaper, Beer}) / (s({Milk, Diaper}) × s({Beer}))
  = 0.4 / (0.6 × 0.6) ≈ 1.11
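The lift calculation can be sketched the same way (illustrative names, assuming the same five transactions):

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    # s(itemset): fraction of transactions containing the itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(X, Y, transactions):
    # lift(X -> Y) = s(X ∪ Y) / (s(X) * s(Y))
    return support(X | Y, transactions) / (
        support(X, transactions) * support(Y, transactions)
    )

print(round(lift({"Milk", "Diaper"}, {"Beer"}, transactions), 2))  # 1.11
```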
12. Apriori Algorithm
The key concept of the Apriori algorithm is the anti-monotonicity of the support measure. Apriori assumes that:
“All non-empty subsets of a frequent itemset must also be frequent (the Apriori property).
Conversely, if an itemset is infrequent, all of its supersets must be infrequent.”
TID ITEMS
1 I1,I2,I5
2 I2,I4
3 I2,I3
4 I1,I2,I4
5 I1,I3
6 I2,I3
7 I1,I3
8 I1,I2,I3,I5
9 I1,I2,I3
- minimum support count is 2
- minimum confidence is 50%
13. STEP 1
1) Create a table containing the support count of each item present in the dataset, called C1 (the candidate set).

C1:
ITEMS  SUPPORT COUNT
I1     6
I2     7
I3     6
I4     2
I5     2
L1:
ITEMS  SUPPORT COUNT
I1     6
I2     7
I3     6
I4     2
I5     2
2) Compare each candidate itemset's support count with the minimum support count (here min_support = 2); if a candidate's support count is less than min_support, remove it. This gives us the frequent itemset L1.
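Steps 1 and 2 above (building C1 and pruning it to L1) can be sketched as follows; since every item here meets the minimum support count of 2, L1 ends up equal to C1:

```python
from collections import Counter

# The nine transactions from the table above.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support = 2

# C1: support count of every individual item (the candidate 1-itemsets)
C1 = Counter(item for t in transactions for item in t)

# L1: prune candidates whose support count falls below min_support
L1 = {item: count for item, count in C1.items() if count >= min_support}
print(sorted(L1.items()))  # [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```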