Adane B. Datamining and warehousing 8/17/22
Association Rule Mining
• Association rule analysis is a technique to uncover (mine) how items are
associated with each other.
• Such uncovered associations between items are called association rules.
• When to mine association rules?
– Scenario:
• You are a sales manager.
• A customer recently bought a PC and a digital camera.
• What should you recommend to her next?
• Association rules are helpful in making your recommendation.
• Frequent patterns (itemsets):
– Frequent patterns are patterns that appear frequently in a data set.
• In a transaction data set, {milk, bread} is a frequent pattern.
• In a shopping-history database, "first buy a PC, then a digital camera,
and then a memory card" is another example of a frequent pattern.
– Frequent pattern mining searches for recurring relationships in a given data
set.
– Frequent pattern mining enables the discovery of interesting associations
between itemsets.
• A typical example of frequent pattern (itemset) mining for association
rules:
Example of an association rule: milk → bread
Applications: designing marketing strategies
Given a set of transactions, find rules that predict the occurrence of an
item based on the occurrences of other items in the transaction.
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Examples of association rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}
Definition: Frequent Itemset
Itemset
– An itemset is a collection of one or more items that occurs in a
shopping basket
– Example: {Milk, Bread, Diaper}
– k-itemset: an itemset that contains k items
Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2
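The support count can be computed by a single scan over the transactions. A minimal sketch in Python, using the five market-basket transactions listed above:

```python
# The five market-basket transactions from the example table.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

print(support_count({"Milk", "Bread", "Diaper"}, transactions))  # 2
```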
Support
– The support of an itemset is the frequency of the itemset relative to
the total number of transactions
– i.e., the fraction of transactions that contain the itemset
Definition: Association Rule
Example:
{Milk, Diaper} → {Beer}

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
Association Rule
– An implication expression of the form X → Y,
where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}
Rule Evaluation Metrics
– Support (s): fraction of transactions that contain both X and Y
– Confidence (c): measures how often items in Y appear in transactions
that contain X
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
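The two metrics can be computed directly from the table above; a minimal sketch reproducing s = 2/5 = 0.4 and c = 2/3 ≈ 0.67 for {Milk, Diaper} → {Beer}:

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset, transactions):
    """Support count: transactions containing the whole itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(X, Y, transactions):
    """Support and confidence of the rule X -> Y."""
    s = sigma(X | Y, transactions) / len(transactions)
    c = sigma(X | Y, transactions) / sigma(X, transactions)
    return s, c

s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"}, transactions)
print(f"s = {s:.2f}, c = {c:.2f}")  # s = 0.40, c = 0.67
```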
Ex1. What is the support and confidence of the rule {B, D} → {A}
in the following transactions?
Ex2. What is the support and confidence of the following rules over the
same transactions?
– A → C
– C → A
Transaction ID Items Bought
2000 A,B,C
1000 A,C
4000 A,D
5000 B,E,F
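For Ex2, the same counting idea gives a quick check (a sketch over the four transactions above; the arrow X → Y reads "transactions containing X also contain Y"):

```python
# The four transactions from the exercise table (TIDs omitted).
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]

def metrics(X, Y):
    """(support, confidence) of the rule X -> Y."""
    both = sum(1 for t in transactions if X | Y <= t)
    antecedent = sum(1 for t in transactions if X <= t)
    return both / len(transactions), both / antecedent

print(metrics({"A"}, {"C"}))  # A -> C: support 0.5, confidence 2/3
print(metrics({"C"}, {"A"}))  # C -> A: support 0.5, confidence 1.0
```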
Frequent pattern
– Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set
– Itemset: a set of one or more items
– k-itemset: X = {x1, …, xk}
Mining algorithms
– Apriori
– FP-growth
Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
Apriori Algorithm
1. Initially, scan the DB once to get the candidate 1-itemsets C1, find the
frequent 1-itemsets from C1, and put them into L1 (set k = 1).
2. Use Lk to generate a collection of candidate itemsets Ck+1 of size
(k+1).
3. Scan the database to find which itemsets in Ck+1 are frequent and put
them into Lk+1.
4. If Lk+1 is not empty, set k = k+1 and go to step 2; otherwise terminate
(no further frequent or candidate sets can be generated).
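The four steps above can be sketched as a minimal level-wise implementation (candidate generation here simply joins frequent k-itemsets and prunes any candidate with an infrequent k-subset, per the Apriori property):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return all frequent itemsets (as frozensets) with support count >= min_sup."""
    transactions = [frozenset(t) for t in transactions]
    def count(c):
        return sum(1 for t in transactions if c <= t)
    # Step 1: frequent 1-itemsets L1 from the candidates C1.
    items = {i for t in transactions for i in t}
    L = [frozenset([i]) for i in items if count(frozenset([i])) >= min_sup]
    frequent = list(L)
    k = 1
    while L:
        # Step 2: candidates C(k+1) by joining Lk with itself...
        C = {a | b for a in L for b in L if len(a | b) == k + 1}
        # ...pruning candidates that have an infrequent k-subset (Apriori property).
        Lset = set(L)
        C = {c for c in C if all(frozenset(s) in Lset for s in combinations(c, k))}
        # Step 3: keep the frequent candidates as L(k+1).
        L = [c for c in C if count(c) >= min_sup]
        frequent.extend(L)
        k += 1  # Step 4: repeat until no candidate survives.
    return frequent

freq = apriori([{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}], min_sup=2)
```

On the four-transaction exercise data this yields {A}, {B}, {C}, and {A, C}.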
The Apriori Algorithm — Example 3
• Consider the following transactions for association rule analysis:
• Use minimum support (min_sup) = 2 (2/9 ≈ 22%) and
• minimum confidence (min_conf) = 70%.
• Ex 1. A database has five transactions. Let min_sup = 60% and
min_conf = 80%.
• Find all frequent itemsets using the Apriori algorithm.
• List all the strong association rules.
FP-Tree / FP-Growth Algorithm
– Uses a compressed representation of the database, the FP-tree.
– Once an FP-tree has been constructed, a recursive divide-and-conquer
approach mines the frequent itemsets.
Building the FP-Tree
1. Scan the data to determine the support count of each item.
Infrequent items are discarded, while the frequent items are sorted in
decreasing order of support count.
2. Make a second pass over the data to construct the FP-tree.
As each transaction is read, its items are sorted according to the above
order before being processed.
First scan – determine the frequent 1-itemsets, then build the header table.

TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}

Header table (items in decreasing support count):
B 8
A 7
C 7
D 5
E 3
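The first scan (counting items and fixing the insertion order) can be sketched as:

```python
from collections import Counter

transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]

# Count each item's support, then sort items by decreasing count
# to fix the order used when inserting transactions into the FP-tree.
counts = Counter(i for t in transactions for i in t)
order = sorted(counts, key=lambda i: -counts[i])
print([(i, counts[i]) for i in order])
```

This reproduces the header table above (B:8, A:7, C:7, D:5, E:3); note that ties such as A and C may be broken either way.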
FP-tree construction (items in each transaction sorted in the order B, A, C, D, E):
– After reading TID=1 ({A,B}): null → B:1 → A:1
– After reading TID=2 ({B,C,D}): B's count becomes B:2, and a new branch
B → C:1 → D:1 is added beside B → A:1
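A minimal FP-tree insertion sketch (the node class and `insert` helper are illustrative names, not a fixed API); after the first two transactions it reproduces the counts shown above:

```python
class FPNode:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}  # item -> FPNode

def insert(root, sorted_items):
    """Insert one transaction whose items are pre-sorted by global frequency."""
    node = root
    for item in sorted_items:
        node = node.children.setdefault(item, FPNode(item))
        node.count += 1

# Global order from the first scan: B, A, C, D, E.
order = {"B": 0, "A": 1, "C": 2, "D": 3, "E": 4}
root = FPNode(None)
for t in [{"A", "B"}, {"B", "C", "D"}]:  # TID=1, TID=2
    insert(root, sorted(t, key=order.get))

b = root.children["B"]
print(b.count, b.children["A"].count, b.children["C"].children["D"].count)  # 2 1 1
```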
FP-Growth (I)
– FP-growth generates frequent itemsets from an FP-tree by exploring the
tree in a bottom-up fashion.
– Given the example tree, the algorithm looks for frequent itemsets
ending in E first, followed by D, C, A, and finally B.
– Since every transaction is mapped onto a path in the FP-tree, we can
derive the frequent itemsets ending with a particular item, say E, by
examining only the paths containing node E.
Conditional FP-Tree for E
– We now need to build a conditional FP-tree for E, i.e., the tree of the
itemsets ending in E.
– It is not the tree obtained in the previous slide by simply deleting
nodes from the original tree.
– Why? Because the order of the items changes: in this example, C has a
higher count than B.
Conditional FP-Tree for E
– Take the set of paths containing E and insert each path (after
truncating E) into a new tree.
– The new header table for the conditional tree: C 4, B 3, A 2, D 2.
– Adding up the counts for D we get 2, so {E, D} is a frequent itemset.
– We continue recursively. Base of the recursion: when the tree has a
single path only.
[Figure: the prefix paths containing E and the resulting conditional FP-tree for E]
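The "examine only the paths containing E" idea can be sketched with a conditional pattern base computed straight from sorted transactions. The sketch below uses the ten-transaction example from the earlier FP-tree slides (so its counts differ from the tree drawn here, which comes from another example); min_sup = 2 is an assumption:

```python
from collections import Counter

# Transactions sorted in the global frequency order B, A, C, D, E.
sorted_txns = [
    ["B", "A"], ["B", "C", "D"], ["A", "C", "D", "E"], ["A", "D", "E"],
    ["B", "A", "C"], ["B", "A", "C", "D"], ["B", "C"], ["B", "A", "C"],
    ["B", "A", "D"], ["B", "C", "E"],
]

def conditional_pattern_base(item, txns):
    """Prefix paths ending at `item` -- the only paths FP-growth must examine."""
    return [t[:t.index(item)] for t in txns if item in t]

base = conditional_pattern_base("E", sorted_txns)
# Items frequent within the base form frequent itemsets together with E.
counts = Counter(i for path in base for i in path)
min_sup = 2
print(sorted(i for i in counts if counts[i] >= min_sup))  # ['A', 'C', 'D']
```

Here D reaches count 2 within the base, so {E, D} is frequent, mirroring the reasoning above; {E, A} and {E, C} are frequent for the same reason.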
FP-Tree: Another Example

Transactions:
T1 = A B C E F O
T2 = A C G
T3 = E I
T4 = A C D E G
T5 = A C E G L
T6 = E J
T7 = A B C E F P
T8 = A C D
T9 = A C E G M
T10 = A C E G N

Frequent 1-itemsets (support count ≥ 2): A:8, C:8, E:8, G:5, B:2, D:2, F:2

Transactions with items sorted by decreasing frequency, ignoring the
infrequent items:
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
FP-tree construction, step by step (each sorted transaction inserted in turn):
– After the 1st transaction (A C E B F): null → A:1 → C:1 → E:1 → B:1 → F:1
– After the 2nd (A C G): A:2 → C:2, with a new branch C → G:1
– After the 3rd (E): a new branch null → E:1 beside the A subtree
– After the 4th (A C E G D): A:3 → C:3 → E:2, with a new path E → G:1 → D:1
– After the 5th (A C E G): A:4 → C:4 → E:3 → G:2
– After the 6th (E): the root branch's count becomes E:2
– After the 7th (A C E B F): A:5 → C:5 → E:4 → B:2 → F:2
– After the 8th (A C D): A:6 → C:6, with a new node C → D:1
– After the 9th (A C E G): A:7 → C:7 → E:5 → G:3
– After the 10th (A C E G): A:8 → C:8 → E:6 → G:4 (final tree)

Header table: A:8, C:8, E:8, G:5, B:2, D:2, F:2