Chapter 4
Association Analysis
Adane B. Data mining and warehousing 8/17/22
Association Rule Mining
• Association rule analysis is a technique to uncover (mine) how items are
associated with each other.
• Such uncovered associations between items are called association rules.
• When to mine association rules?
– Scenario:
• You are a sales manager.
• A customer bought a PC and a digital camera recently.
• What should you recommend to her next?
• Association rules are helpful in making your recommendation.
• Frequent patterns (itemsets):
– Frequent patterns are patterns that appear frequently in a data set.
• In a transaction data set, {milk, bread} is a frequent itemset.
• In a shopping-history database, buying first a PC, then a digital camera,
and then a memory card is another example of a frequent (sequential) pattern.
– Frequent pattern mining searches for recurring relationships in a given data
set.
– Frequent pattern mining enables the discovery of interesting associations
between itemsets.
Association Rule Mining
• A typical example of frequent pattern (itemset) mining for association
rules:
Association Rule Mining
Example of an association rule:
milk → bread
Applications: designing marketing strategies.
Association Rule Mining
• Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other items
in the transaction.
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Examples of Association Rules:
{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk}
Definition: Frequent Itemset
• Itemset
– An itemset is a set of items that occurs in a
shopping basket
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset
• An itemset that contains k items
• Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2
Definition: Frequent Itemset
• Support
– The support of an itemset is its frequency of occurrence
relative to the total number of transactions
– i.e., the fraction of transactions that contain
the itemset
• An itemset whose support is at least a minimum support (minsup)
threshold is called a frequent itemset.
Definition: Association Rule
Example: {Milk, Diaper} → {Beer}

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4

c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
• Association Rule
– An implication expression of the form X → Y,
where X and Y are itemsets
– Example:
{Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
– Support (s):
fraction of transactions that contain both X and Y
– Confidence (c):
measures how often items in Y appear in transactions
that contain X
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
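These two metrics can be computed directly from the table above. A minimal Python sketch (the function and variable names are illustrative, not from any particular library):

```python
# The five market-basket transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(lhs, rhs, transactions):
    """s(X -> Y) = sigma(X u Y) / |T|"""
    return support_count(lhs | rhs, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """c(X -> Y) = sigma(X u Y) / sigma(X)"""
    return support_count(lhs | rhs, transactions) / support_count(lhs, transactions)

s = support({"Milk", "Diaper"}, {"Beer"}, transactions)
c = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)
print(s, round(c, 2))  # 0.4 0.67
```

This reproduces the numbers in the example above: {Milk, Diaper, Beer} appears in 2 of 5 transactions (s = 0.4), and in 2 of the 3 transactions containing {Milk, Diaper} (c ≈ 0.67).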
Ex1. What are the support and confidence of the rule {B, D} → {A}
over the following transactions?
Ex2. What are the support and confidence of the rules below over the
same transactions?
– A → C
– C → A
Transaction ID Items Bought
2000 A,B,C
1000 A,C
4000 A,D
5000 B,E,F
Frequent pattern
• Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set
• Itemset: a set of one or more items
• k-itemset: X = {x1, …, xk}
• Mining algorithms:
– Apriori
– FP-growth
Given a set of transactions T, the goal of association rule mining is to find all rules having:
– support ≥ minsup threshold
– confidence ≥ minconf threshold
Apriori Algorithm
1. Initially, scan the DB once to get the candidate 1-itemsets C1; find the
frequent 1-itemsets from C1 and put them into L1 (set k = 1)
2. Use Lk to generate the collection Ck+1 of candidate itemsets of size
(k+1)
3. Scan the database to find which itemsets in Ck+1 are frequent and put
them into Lk+1
4. If Lk+1 is not empty, set k = k + 1 and go to step 2; otherwise terminate
(no further frequent or candidate itemsets can be generated)
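The four steps above can be sketched as a short Python function (an unoptimized teaching sketch, not a reference implementation; names are illustrative), demonstrated on the market-basket transactions from the earlier slides:

```python
from itertools import combinations

def apriori(transactions, min_sup_count):
    """Minimal Apriori sketch following the four steps above; returns
    {frozenset: support count} for every frequent itemset."""
    # Step 1: one DB scan to count 1-itemsets; keep the frequent ones as L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_sup_count}
    frequent, k = dict(L), 1
    while L:
        # Step 2: generate C(k+1), pruning any candidate that has an
        # infrequent k-subset (the Apriori property).
        items = sorted(set().union(*L))
        C = [frozenset(c) for c in combinations(items, k + 1)
             if all(frozenset(s) in L for s in combinations(c, k))]
        # Step 3: scan the DB to count candidates; keep the frequent ones.
        L = {c: n for c in C
             if (n := sum(1 for t in transactions if c <= t)) >= min_sup_count}
        frequent.update(L)
        k += 1  # Step 4: repeat until L(k+1) is empty.
    return frequent

# Demo: the market-basket transactions, with min support count = 3.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
freq = apriori(transactions, 3)
```

On this data the frequent itemsets are the four singletons {Bread}, {Milk}, {Diaper}, {Beer} and the four pairs {Bread, Milk}, {Bread, Diaper}, {Milk, Diaper}, {Diaper, Beer}; no 3-itemset reaches a count of 3.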
The Apriori Algorithm — Example 3
• Consider the following transactions for association rule analysis:
• Use minimum support (min_sup) = 2 (2/9 ≈ 22%) and
• minimum confidence = 70%.
• Ex 1. A database has five transactions. Let min_sup =
60% and min_conf = 80%.
• Find all frequent itemsets using the Apriori algorithm.
• List all the strong association rules.
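Once the frequent itemsets are known, the strong rules are those meeting both thresholds. A brute-force Python sketch of this rule-generation step (all names are illustrative), demonstrated on the earlier market-basket table rather than this exercise's five transactions, which are not reproduced here:

```python
from itertools import combinations

def strong_rules(transactions, min_sup, min_conf):
    """Brute-force enumeration of strong rules X -> Y (support >= min_sup
    and confidence >= min_conf). Exponential in the number of items, so
    only suitable for small teaching examples like this one."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def sup(s):
        return sum(1 for t in transactions if s <= t) / n

    rules = []
    for k in range(2, len(items) + 1):
        for itemset in map(frozenset, combinations(items, k)):
            s = sup(itemset)
            if s < min_sup:
                continue  # not frequent, so no rule from it can be strong
            for r in range(1, k):
                for lhs in map(frozenset, combinations(sorted(itemset), r)):
                    c = s / sup(lhs)
                    if c >= min_conf:
                        rules.append((set(lhs), set(itemset - lhs), s, c))
    return rules

# Demo on the earlier market-basket table with min_sup = 60%, min_conf = 80%.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
rules = strong_rules(transactions, 0.6, 0.8)
print(rules)  # only {Beer} -> {Diaper} qualifies (s = 0.6, c = 1.0)
```

On this data every frequent pair has support 3/5, but only {Beer} → {Diaper} reaches 80% confidence, since all three Beer transactions also contain Diaper.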
FP-Tree/FP-Growth Algorithm
• Uses a compressed representation of the database, the FP-tree.
• Once an FP-tree has been constructed, FP-growth uses a recursive
divide-and-conquer approach to mine the frequent itemsets.
Building the FP-Tree
1. Scan the data once to determine the support count of each item.
Infrequent items are discarded; the frequent items are sorted in
decreasing order of support count.
2. Make a second pass over the data to construct the FP-tree.
As each transaction is read, its items are sorted according to the
order above before being processed.
First scan: determine the frequent 1-itemsets, then build the header table.
TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}
Header (items in decreasing support count):
B: 8
A: 7
C: 7
D: 5
E: 3
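The first scan can be reproduced in a few lines of Python (an illustrative sketch; note the A/C tie at count 7 may be broken either way):

```python
from collections import Counter

# The ten transactions from the table above.
transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]

# First scan: count each item's support, then sort by decreasing count
# to obtain the header order (B > A > C > D > E on this data).
counts = Counter(item for t in transactions for item in t)
header = counts.most_common()
print(header)
```

These counts match the header table above: B:8, A:7, C:7, D:5, E:3.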
FP-tree construction
TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}
After reading TID=1 ({B, A} in sorted order):
null → B:1 → A:1
After reading TID=2 ({B, C, D}):
null → B:2, with children A:1 and C:1 → D:1
FP-Growth (I)
• FP-growth generates frequent itemsets from an FP-tree
by exploring the tree in a bottom-up fashion.
• Given the example tree, the algorithm looks for
frequent itemsets ending in E first, followed by D, C,
A, and finally, B.
• Since every transaction is mapped onto a path in the
FP-tree, we can derive the frequent itemsets ending
with a particular item, say E, by examining only the
paths containing node E.
The full FP-tree:
null
├─ B:8
│  ├─ A:5
│  │  ├─ C:3 → D:1
│  │  └─ D:1
│  └─ C:3
│     ├─ D:1
│     └─ E:1
└─ A:2
   ├─ C:1 → D:1 → E:1
   └─ D:1 → E:1

Paths containing node E:
null → B:3 → C:3 → E:1
null → A:2 → C:1 → D:1 → E:1
null → A:2 → D:1 → E:1
Conditional FP-Tree for E
• We now need to build a conditional FP-tree for E,
which is the tree of itemsets ending in E.
• It is not the tree obtained in the previous slide by
deleting nodes from the original tree.
• Why? Because the order of the items changes.
– In this example, C now has a higher count than B.
Conditional FP-Tree for E
Adding up the counts for D we get 2, so
{E, D} is a frequent itemset.
We continue recursively.
Base of recursion: when the tree has a
single path only.
[Figure: the set of paths containing E — null → B:3 → C:3 → E:1,
null → A:2 → C:1 → D:1 → E:1, and null → A:2 → D:1 → E:1.
Each path, after truncating E, is inserted into a new tree.
New header table (Item: count): C: 4, B: 3, A: 2, D: 2.
The resulting conditional FP-tree for E has branches
C:3 → B:3, C:1 → A:1 → D:1, and A:1 → D:1.]
FP-Tree: Another Example
Transactions:
T1 = A B C E F O
T2 = A C G
T3 = E I
T4 = A C D E G
T5 = A C E G L
T6 = E J
T7 = A B C E F P
T8 = A C D
T9 = A C E G M
T10 = A C E G N
Frequent 1-itemsets (minimum support count 2), in decreasing frequency:
A: 8, C: 8, E: 8, G: 5, B: 2, D: 2, F: 2
Transactions with items sorted by frequency, infrequent items removed:
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
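The construction described above can be sketched with a small node class in Python (an illustrative sketch; class and method names are not from any particular library). Inserting the ten sorted transactions yields the same counts as the step-by-step trees that follow:

```python
class FPNode:
    """One FP-tree node: an item label, a count, and children keyed by item."""
    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, {}

    def insert(self, items):
        """Insert one already-sorted transaction below this node,
        sharing prefixes with existing paths."""
        if not items:
            return
        child = self.children.setdefault(items[0], FPNode(items[0]))
        child.count += 1
        child.insert(items[1:])

# The ten transactions above, already frequency-sorted (A, C, E, G, B, D, F)
# with infrequent items removed; each string is one transaction.
sorted_transactions = ["ACEBF", "ACG", "E", "ACEGD", "ACEG",
                       "E", "ACEBF", "ACD", "ACEG", "ACEG"]

root = FPNode()
for t in sorted_transactions:
    root.insert(list(t))

# The final tree: root has children A:8 and E:2, and the A:8 -> C:8 path
# fans out into E:6 (with children B:2 -> F:2 and G:4 -> D:1), G:1, and D:1.
```

Because all ten transactions share long prefixes after sorting, the tree compresses them into just a handful of paths; that compression is what makes FP-growth efficient.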
The FP-tree is built by inserting the sorted transactions one at a time
(header: A:8, C:8, E:8, G:5, B:2, D:2, F:2):

After reading the 1st transaction (A C E B F): null → A:1 → C:1 → E:1 → B:1 → F:1
After the 2nd (A C G): A:2 → C:2, which gains a child G:1
After the 3rd (E): a new branch null → E:1
After the 4th (A C E G D): A:3, C:3, E:2; E gains the path G:1 → D:1
After the 5th (A C E G): A:4, C:4, E:3, G:2
After the 6th (E): the null → E branch becomes E:2
After the 7th (A C E B F): A:5, C:5, E:4, B:2, F:2
After the 8th (A C D): A:6, C:6; C gains a child D:1
After the 9th (A C E G): A:7, C:7, E:5, G:3
After the 10th (A C E G): A:8, C:8, E:6, G:4

Final FP-tree after reading all ten transactions:
null
├─ A:8 → C:8
│  ├─ E:6
│  │  ├─ B:2 → F:2
│  │  └─ G:4 → D:1
│  ├─ G:1
│  └─ D:1
└─ E:2