Chapter 4
Association Analysis
Adane B. Data mining and warehousing 8/17/22
Association Rule Mining
• Association rule analysis is a technique to uncover (mine) how items are
associated with each other.
• Such uncovered associations between items are called association rules.
• When to mine association rules?
– Scenario:
• You are a sales manager.
• A customer bought a PC and a digital camera recently.
• What should you recommend to her next?
• Association rules are helpful in making your recommendation.
• Frequent patterns (itemsets):
– Frequent patterns are patterns that appear frequently in a data set.
• In a transaction data set, {milk, bread} is a frequent itemset.
• In a shopping-history database, buying first a PC, then a digital camera,
and then a memory card is another example of a frequent (sequential) pattern.
– Frequent pattern mining searches for recurring relationships in a given data
set.
– Frequent pattern mining enables the discovery of interesting associations
between itemsets.
Association Rule Mining
• A typical example of frequent pattern (itemset) mining for association
rules:
Association Rule Mining
Example of an association rule:
milk → bread
Applications: designing marketing strategies.
Association Rule Mining
• Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other items
in the transaction.
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Examples of Association Rules:
{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk}
Definition: Frequent Itemset
• Itemset
– An itemset is a set of items that occurs in a
shopping basket
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset
• An itemset that contains k items
• Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2
Definition: Frequent Itemset
• Support
– The support of an itemset is its frequency of occurrence
relative to the total number of transactions
– i.e., the fraction of transactions that contain
the itemset
• An itemset whose support is at least a minimum support (minsup)
threshold is called a frequent itemset.
Definition: Association Rule
Example: {Milk, Diaper} → {Beer}

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4

c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
• Association Rule
– An implication expression of the form X → Y,
where X and Y are itemsets
– Example:
{Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
– Support (s):
fraction of transactions that contain both X and Y
– Confidence (c):
measures how often items in Y appear in transactions
that contain X
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
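These two metrics can be computed directly from the table above. A minimal Python sketch (the function and variable names are illustrative, not from any particular library):

```python
# The five market-basket transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(lhs, rhs, transactions):
    """s(X -> Y) = sigma(X u Y) / |T|"""
    return support_count(lhs | rhs, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """c(X -> Y) = sigma(X u Y) / sigma(X)"""
    return support_count(lhs | rhs, transactions) / support_count(lhs, transactions)

s = support({"Milk", "Diaper"}, {"Beer"}, transactions)
c = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)
print(s, round(c, 2))  # 0.4 0.67
```

This reproduces the numbers in the example above: {Milk, Diaper, Beer} appears in 2 of 5 transactions (s = 0.4), and in 2 of the 3 transactions containing {Milk, Diaper} (c ≈ 0.67).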
Ex1. What are the support and confidence of the rule {B, D} → {A}
over the following transactions?
Ex2. What are the support and confidence of the rules below over the
same transactions?
– A → C
– C → A
Transaction ID Items Bought
2000 A,B,C
1000 A,C
4000 A,D
5000 B,E,F
Frequent pattern
• Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set
• Itemset: a set of one or more items
• k-itemset: X = {x1, …, xk}
• Mining algorithms:
– Apriori
– FP-growth
Given a set of transactions T, the goal of association rule mining is to find all rules having:
– support ≥ minsup threshold
– confidence ≥ minconf threshold
Apriori Algorithm
1. Initially, scan the DB once to get the candidate 1-itemsets C1; find the
frequent 1-itemsets from C1 and put them into L1 (set k = 1)
2. Use Lk to generate the collection Ck+1 of candidate itemsets of size
(k+1)
3. Scan the database to find which itemsets in Ck+1 are frequent and put
them into Lk+1
4. If Lk+1 is not empty, set k = k + 1 and go to step 2; otherwise terminate
(no further frequent or candidate itemsets can be generated)
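The four steps above can be sketched as a short Python function (an unoptimized teaching sketch, not a reference implementation; names are illustrative), demonstrated on the market-basket transactions from the earlier slides:

```python
from itertools import combinations

def apriori(transactions, min_sup_count):
    """Minimal Apriori sketch following the four steps above; returns
    {frozenset: support count} for every frequent itemset."""
    # Step 1: one DB scan to count 1-itemsets; keep the frequent ones as L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_sup_count}
    frequent, k = dict(L), 1
    while L:
        # Step 2: generate C(k+1), pruning any candidate that has an
        # infrequent k-subset (the Apriori property).
        items = sorted(set().union(*L))
        C = [frozenset(c) for c in combinations(items, k + 1)
             if all(frozenset(s) in L for s in combinations(c, k))]
        # Step 3: scan the DB to count candidates; keep the frequent ones.
        L = {c: n for c in C
             if (n := sum(1 for t in transactions if c <= t)) >= min_sup_count}
        frequent.update(L)
        k += 1  # Step 4: repeat until L(k+1) is empty.
    return frequent

# Demo: the market-basket transactions, with min support count = 3.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
freq = apriori(transactions, 3)
```

On this data the frequent itemsets are the four singletons {Bread}, {Milk}, {Diaper}, {Beer} and the four pairs {Bread, Milk}, {Bread, Diaper}, {Milk, Diaper}, {Diaper, Beer}; no 3-itemset reaches a count of 3.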
The Apriori Algorithm — Example 3
• Consider the following transactions for association rule analysis:
• Use minimum support (min_sup) = 2 (2/9 ≈ 22%) and
• minimum confidence = 70%.
• Ex 1. A database has five transactions. Let min_sup =
60% and min_conf = 80%.
• Find all frequent itemsets using the Apriori algorithm.
• List all the strong association rules.
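Once the frequent itemsets are known, the strong rules are those meeting both thresholds. A brute-force Python sketch of this rule-generation step (all names are illustrative), demonstrated on the earlier market-basket table rather than this exercise's five transactions, which are not reproduced here:

```python
from itertools import combinations

def strong_rules(transactions, min_sup, min_conf):
    """Brute-force enumeration of strong rules X -> Y (support >= min_sup
    and confidence >= min_conf). Exponential in the number of items, so
    only suitable for small teaching examples like this one."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def sup(s):
        return sum(1 for t in transactions if s <= t) / n

    rules = []
    for k in range(2, len(items) + 1):
        for itemset in map(frozenset, combinations(items, k)):
            s = sup(itemset)
            if s < min_sup:
                continue  # not frequent, so no rule from it can be strong
            for r in range(1, k):
                for lhs in map(frozenset, combinations(sorted(itemset), r)):
                    c = s / sup(lhs)
                    if c >= min_conf:
                        rules.append((set(lhs), set(itemset - lhs), s, c))
    return rules

# Demo on the earlier market-basket table with min_sup = 60%, min_conf = 80%.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
rules = strong_rules(transactions, 0.6, 0.8)
print(rules)  # only {Beer} -> {Diaper} qualifies (s = 0.6, c = 1.0)
```

On this data every frequent pair has support 3/5, but only {Beer} → {Diaper} reaches 80% confidence, since all three Beer transactions also contain Diaper.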
FP-Tree/FP-Growth Algorithm
• Uses a compressed representation of the database, the FP-tree.
• Once an FP-tree has been constructed, FP-growth uses a recursive
divide-and-conquer approach to mine the frequent itemsets.
Building the FP-Tree
1. Scan the data once to determine the support count of each item.
Infrequent items are discarded; the frequent items are sorted in
decreasing order of support count.
2. Make a second pass over the data to construct the FP-tree.
As each transaction is read, its items are sorted according to the
order above before being processed.
First scan: determine the frequent 1-itemsets, then build the header table.
TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}
Header (items in decreasing support count):
B: 8
A: 7
C: 7
D: 5
E: 3
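The first scan can be reproduced in a few lines of Python (an illustrative sketch; note the A/C tie at count 7 may be broken either way):

```python
from collections import Counter

# The ten transactions from the table above.
transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]

# First scan: count each item's support, then sort by decreasing count
# to obtain the header order (B > A > C > D > E on this data).
counts = Counter(item for t in transactions for item in t)
header = counts.most_common()
print(header)
```

These counts match the header table above: B:8, A:7, C:7, D:5, E:3.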
FP-tree construction
TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}
After reading TID=1 ({B, A} in sorted order):
null → B:1 → A:1
After reading TID=2 ({B, C, D}):
null → B:2, with children A:1 and C:1 → D:1
FP-Growth (I)
• FP-growth generates frequent itemsets from an FP-tree
by exploring the tree in a bottom-up fashion.
• Given the example tree, the algorithm looks for
frequent itemsets ending in E first, followed by D, C,
A, and finally, B.
• Since every transaction is mapped onto a path in the
FP-tree, we can derive the frequent itemsets ending
with a particular item, say E, by examining only the
paths containing node E.
The full FP-tree:
null
├─ B:8
│  ├─ A:5
│  │  ├─ C:3 → D:1
│  │  └─ D:1
│  └─ C:3
│     ├─ D:1
│     └─ E:1
└─ A:2
   ├─ C:1 → D:1 → E:1
   └─ D:1 → E:1

Paths containing node E:
null → B:3 → C:3 → E:1
null → A:2 → C:1 → D:1 → E:1
null → A:2 → D:1 → E:1
Conditional FP-Tree for E
• We now need to build a conditional FP-tree for E,
which is the tree of itemsets ending in E.
• It is not the tree obtained in the previous slide by
deleting nodes from the original tree.
• Why? Because the order of the items changes.
– In this example, C now has a higher count than B.
Conditional FP-Tree for E
Adding up the counts for D we get 2, so
{E, D} is a frequent itemset.
We continue recursively.
Base of recursion: when the tree has a
single path only.
[Figure: the set of paths containing E — null → B:3 → C:3 → E:1,
null → A:2 → C:1 → D:1 → E:1, and null → A:2 → D:1 → E:1.
Each path, after truncating E, is inserted into a new tree.
New header table (Item: count): C: 4, B: 3, A: 2, D: 2.
The resulting conditional FP-tree for E has branches
C:3 → B:3, C:1 → A:1 → D:1, and A:1 → D:1.]
FP-Tree: Another Example
Transactions:
T1 = A B C E F O
T2 = A C G
T3 = E I
T4 = A C D E G
T5 = A C E G L
T6 = E J
T7 = A B C E F P
T8 = A C D
T9 = A C E G M
T10 = A C E G N
Frequent 1-itemsets (minimum support count 2), in decreasing frequency:
A: 8, C: 8, E: 8, G: 5, B: 2, D: 2, F: 2
Transactions with items sorted by frequency, infrequent items removed:
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G
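The construction described above can be sketched with a small node class in Python (an illustrative sketch; class and method names are not from any particular library). Inserting the ten sorted transactions yields the same counts as the step-by-step trees that follow:

```python
class FPNode:
    """One FP-tree node: an item label, a count, and children keyed by item."""
    def __init__(self, item=None):
        self.item, self.count, self.children = item, 0, {}

    def insert(self, items):
        """Insert one already-sorted transaction below this node,
        sharing prefixes with existing paths."""
        if not items:
            return
        child = self.children.setdefault(items[0], FPNode(items[0]))
        child.count += 1
        child.insert(items[1:])

# The ten transactions above, already frequency-sorted (A, C, E, G, B, D, F)
# with infrequent items removed; each string is one transaction.
sorted_transactions = ["ACEBF", "ACG", "E", "ACEGD", "ACEG",
                       "E", "ACEBF", "ACD", "ACEG", "ACEG"]

root = FPNode()
for t in sorted_transactions:
    root.insert(list(t))

# The final tree: root has children A:8 and E:2, and the A:8 -> C:8 path
# fans out into E:6 (with children B:2 -> F:2 and G:4 -> D:1), G:1, and D:1.
```

Because all ten transactions share long prefixes after sorting, the tree compresses them into just a handful of paths; that compression is what makes FP-growth efficient.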
The FP-tree is built by inserting the sorted transactions one at a time
(header: A:8, C:8, E:8, G:5, B:2, D:2, F:2):

After reading the 1st transaction (A C E B F): null → A:1 → C:1 → E:1 → B:1 → F:1
After the 2nd (A C G): A:2 → C:2, which gains a child G:1
After the 3rd (E): a new branch null → E:1
After the 4th (A C E G D): A:3, C:3, E:2; E gains the path G:1 → D:1
After the 5th (A C E G): A:4, C:4, E:3, G:2
After the 6th (E): the null → E branch becomes E:2
After the 7th (A C E B F): A:5, C:5, E:4, B:2, F:2
After the 8th (A C D): A:6, C:6; C gains a child D:1
After the 9th (A C E G): A:7, C:7, E:5, G:3
After the 10th (A C E G): A:8, C:8, E:6, G:4

Final FP-tree after reading all ten transactions:
null
├─ A:8 → C:8
│  ├─ E:6
│  │  ├─ B:2 → F:2
│  │  └─ G:4 → D:1
│  ├─ G:1
│  └─ D:1
└─ E:2