Mining Frequent Patterns and Association Rules
Ms. Rashmi Bhat
Does This Look
Familiar???
What is a Frequent Pattern?
• A frequent pattern is a pattern that appears in a data set frequently.
• What are these frequent patterns?
• Frequent Itemsets
• Frequent Sequential Pattern
• Frequent Structured Patterns, etc.
What is Frequent Pattern Mining?
• Searching for recurring relationships in a given data set.
• Discovering interesting associations and correlations between itemsets in transactional databases.
Market Basket Analysis
• Analyzes customer buying habits by finding associations between the different
items that customers place in their “shopping baskets”.
• How does this help retailers?
• Helps to develop marketing strategies by gaining insight into which items are frequently
purchased together by customers.
Market Basket Analysis
• Buying patterns that reflect items frequently purchased or associated together can be represented in the form of rules, known as association rules.
• e.g.
{Mobile} ⇒ {Screen Guard, Back Cover} [support = 5%, confidence = 65%]
• Interestingness measures: 𝑺𝒖𝒑𝒑𝒐𝒓𝒕 and 𝑪𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆
• Reflect the usefulness and certainty of discovered rules.
• Association rules are considered interesting if they satisfy both a minimum support
threshold and a minimum confidence threshold.
• Thresholds can be set by users or domain experts.
Association Rule
Let
𝑰 = {Pen, Pencil, Eraser, Notebook, Ruler, Marker, Scissors, Glue, …} … the set of items in the shop
𝑫 is the task-relevant data set
𝑻 = {Pencil, Pen, Notebook, Eraser}, 𝑻 ⊂ 𝑰 … a transaction
𝑨 = {Pen, Pencil, Notebook, Scissors}, 𝑩 = {Eraser, Glue} … sets of items
𝑨 ⇒ 𝑩 is an association rule, where 𝑨 ⊂ 𝑰, 𝑩 ⊂ 𝑰 and 𝑨 ∩ 𝑩 = ∅.
An association rule 𝑨 ⇒ 𝑩 holds in the transaction set 𝑫 with support 𝒔 and confidence 𝒄.
Association Rule
Support s, where s is the percentage of transactions in D that contain 𝑨 ∪ 𝑩.
Confidence c, where c is the percentage of transactions in D containing 𝑨 that also contain 𝑩.
𝑺𝒖𝒑𝒑𝒐𝒓𝒕 𝑨 ⇒ 𝑩 = 𝑷(𝑨 ∪ 𝑩)
𝑪𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 𝑨 ⇒ 𝑩 = 𝑷(𝑩|𝑨)
Rules that satisfy both a minimum support threshold (min_sup) and a minimum
confidence threshold (min_conf) are called strong rules.
Some Important Terminologies
• Itemset is a set of items.
• An itemset that contains k items is a k-itemset.
• The occurrence frequency of an itemset is the number of transactions that contain the
itemset. This is also known as the frequency, support count, or count of the itemset.
• The occurrence frequency is called the absolute support.
• If an itemset 𝐼 satisfies a prespecified minimum support threshold, then 𝐼 is a frequent
itemset
• The set of frequent 𝑘-itemsets is commonly denoted by 𝐿𝑘
𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆(𝑨 ⇒ 𝑩) = 𝒔𝒖𝒑𝒑𝒐𝒓𝒕(𝑨 ∪ 𝑩) / 𝒔𝒖𝒑𝒑𝒐𝒓𝒕(𝑨) = 𝒔𝒖𝒑𝒑𝒐𝒓𝒕_𝒄𝒐𝒖𝒏𝒕(𝑨 ∪ 𝑩) / 𝒔𝒖𝒑𝒑𝒐𝒓𝒕_𝒄𝒐𝒖𝒏𝒕(𝑨)
Frequent Itemset
ID Transactions
1 A, B, C, D
2 B, D, E
3 A, D, E
4 A, B, E
5 C, D, E
Item Frequency Support
A 3 3/5 → 0.6
B 3 3/5 → 0.6
C 2 2/5 → 0.4
D 4 4/5 → 0.8
E 4 4/5 → 0.8
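As a quick illustration (not part of the original slides), the support column above and the two rule measures can be reproduced with a short Python sketch; the rule {D} ⇒ {E} is chosen arbitrarily from the same five transactions.

```python
# Illustrative sketch: compute item supports and the support/confidence of {D} => {E}.
transactions = [
    {"A", "B", "C", "D"},
    {"B", "D", "E"},
    {"A", "D", "E"},
    {"A", "B", "E"},
    {"C", "D", "E"},
]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

n = len(transactions)
for item in sorted({i for t in transactions for i in t}):
    c = support_count({item}, transactions)
    print(item, c, c / n)                      # e.g. D 4 0.8

A, B = {"D"}, {"E"}
support = support_count(A | B, transactions) / n                                   # P(A ∪ B) = 0.6
confidence = support_count(A | B, transactions) / support_count(A, transactions)   # P(B|A) = 0.75
print(support, confidence)
```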
Association Rule Mining
• Association rule mining is a two-step process:
1. Find all frequent itemsets
2. Generate strong association rules from the frequent itemsets
• The overall performance of mining association rules is determined by the first step.
Itemsets
• Closed Itemset
• An itemset 𝑋 is closed in a data set 𝑆 if there exists no proper super-itemset 𝑌 such that 𝑌
has the same support count as 𝑋 in 𝑆.
• If 𝑋 is both closed and frequent in 𝑆, then 𝑋 is a closed frequent itemset in 𝑆.
• Maximal Itemset
• An itemset 𝑋 is a maximal frequent itemset (or max-itemset) in set 𝑆, if 𝑋 is frequent,
and there exists no super-itemset 𝑌 such that 𝑋 ⊂ 𝑌 and 𝑌 is frequent in 𝑆.
Closed and Maximal Itemsets
• Frequent itemset 𝑋 ∈ 𝐹 is maximal if it does not have any frequent
supersets
• That is, for all 𝑋 ⊂ 𝑌, 𝑌 ∉ 𝐹
• Frequent itemset 𝑋 ∈ 𝐹 is closed if it has no immediate superset with
the same frequency
• That is, for all 𝑋 ⊂ 𝑌, 𝑠𝑢𝑝𝑝𝑜𝑟𝑡 𝑌, 𝐷 < 𝑠𝑢𝑝𝑝𝑜𝑟𝑡(𝑋, 𝐷)
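A hedged sketch of how the two definitions can be checked in code, assuming the frequent itemsets and their support counts have already been mined; the values below correspond to the worked example that follows (min_sup = 3).

```python
# Sketch: flag closed and maximal frequent itemsets from a support-count dictionary.
frequent = {
    frozenset("A"): 3, frozenset("B"): 4, frozenset("C"): 4, frozenset("D"): 5,
    frozenset("AC"): 3, frozenset("AD"): 3, frozenset("BC"): 3,
    frozenset("BD"): 4, frozenset("CD"): 4,
    frozenset("ACD"): 3, frozenset("BCD"): 3,
}

def is_maximal(x):
    # no frequent proper superset
    return not any(x < y for y in frequent)

def is_closed(x):
    # no proper superset with the same support (an infrequent superset can never
    # match the support of a frequent itemset, so checking frequent ones suffices)
    return not any(x < y and frequent[y] == frequent[x] for y in frequent)

for x in sorted(frequent, key=len):
    labels = [name for name, ok in (("closed", is_closed(x)), ("maximal", is_maximal(x))) if ok]
    print("".join(sorted(x)), frequent[x], labels)
# {D}, {B,D}, {C,D}, {A,C,D}, {B,C,D} come out closed; {A,C,D} and {B,C,D} are also maximal.
```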
TID Itemset
1 {A, C, D}
2 {B, C, D}
3 {A, B, C, D}
4 {B, D}
5 {A, B, C, D}

min_sup = 3, i.e. min_sup = 60%

Itemset lattice with support counts (from null up to the 4-itemset):
1-itemsets: A: 3, B: 4, C: 4, D: 5
2-itemsets: {A,B}: 2, {A,C}: 3, {A,D}: 3, {B,C}: 3, {B,D}: 4, {C,D}: 4
3-itemsets: {A,B,C}: 2, {A,B,D}: 2, {A,C,D}: 3, {B,C,D}: 3
4-itemset: {A,B,C,D}: 2

Frequent itemsets (support ≥ 3): A, B, C, D, {A,C}, {A,D}, {B,C}, {B,D}, {C,D}, {A,C,D}, {B,C,D}; all other itemsets are infrequent.
Closed frequent itemsets: {D}, {B,D}, {C,D}, {A,C,D}, {B,C,D} — of these, {D}, {B,D} and {C,D} are closed but not maximal.
Maximal frequent itemsets: {A,C,D}, {B,C,D}.
Frequent Pattern Mining
• Frequent pattern mining can be classified in various ways as follows:
Based on the completeness of patterns to be mined
Based on the levels of abstraction involved in the rule set
Based on the number of data dimensions involved in the rule
Based on the types of values handled in the rule
Based on the kinds of rules to be mined
Based on the kinds of patterns to be mined
Frequent Pattern Mining
• Based on the completeness of the patterns to be mined
• The complete set of frequent itemsets,
• The closed frequent itemsets, and the maximal frequent itemsets
• The constrained frequent itemsets
• The approximate frequent itemsets
• The near-match frequent itemsets
• The top-k frequent itemsets
Frequent Pattern Mining
• Based on the levels of abstraction involved in the rule set
• We can find rules at differing levels of abstraction
• multilevel association rules
• Based on the number of data dimensions involved in the rule
• Single-dimensional association rule
• Multidimensional association rule
• Based on the types of values handled in the rule
• Boolean association rule
• Quantitative association rule
Frequent Pattern Mining
• Based on the kinds of rules to be mined
• Association rules
• Correlation rules
• Based on the kinds of patterns to be mined
• Frequent itemset mining
• Sequential pattern mining
• Structured pattern mining
Efficient and Scalable Frequent Itemset
Mining Methods
• Methods for mining the simplest form of frequent patterns
• Single-dimensional,
• Single-level,
• Boolean frequent itemsets
• Apriori Algorithm
• basic algorithm for finding frequent itemsets
• How to generate strong association rules from frequent itemsets?
• Variations to the Apriori algorithm
Apriori Algorithm
• Finds Frequent Itemsets Using Candidate Generation
• The algorithm uses prior knowledge of frequent itemset properties
• Employs an iterative approach known as a level-wise search
• k-itemsets are used to explore (k+1)-itemsets
• The Apriori property is used to reduce the search space.
• If a set cannot pass a test, all of its supersets will fail the same test as well.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
Apriori Algorithm
• Apriori algorithm follows a two step process
• Join Step:
• To find 𝐿𝑘, a set of candidate 𝑘-itemsets is generated by joining 𝐿𝑘−1 with itself.
• This set of candidates is denoted 𝐶𝑘
• Prune Step:
• 𝐶𝑘 is a superset of 𝐿𝑘, that is, its members may or may not be frequent, but all of the
frequent 𝑘-itemsets are included in 𝐶𝑘.
• A scan of the database to determine the count of each candidate in 𝐶𝑘 would result in the
determination of 𝐿𝑘.
• To reduce the size of 𝐶𝑘, the Apriori property is used
• Any (𝑘 − 1)-itemset that is not frequent cannot be a subset of a frequent 𝑘-itemset.
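To make the join and prune steps concrete, here is a minimal Python sketch (the function name apriori_gen and the sample L2 are illustrative, not from the slides). Note that with the prune step applied, only {Milk, Bread, Butter} survives as a 3-itemset candidate for the L2 found later in this example.

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Join L_{k-1} with itself, then prune candidates having an infrequent (k-1)-subset."""
    L_prev = set(L_prev)
    # Join step: unions of two (k-1)-itemsets that form a k-itemset
    candidates = {a | b for a in L_prev for b in L_prev if len(a | b) == k}
    # Prune step (Apriori property): every (k-1)-subset must itself be frequent
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

L2 = {frozenset(x) for x in ({"Milk", "Bread"}, {"Milk", "Butter"},
                             {"Eggs", "Bread"}, {"Bread", "Butter"}, {"Bread", "Cheese"})}
print(apriori_gen(L2, 3))   # only {Milk, Bread, Butter} survives the prune for this L2
```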
Apriori Algorithm
Transactional data for a retail shop
• Find the frequent itemsets using Apriori algorithm
• There are eight transactions in this database, that is, |𝐷| = 8.
T_id List of Item
1 { Milk, Eggs, Bread, Butter }
2 {Milk, Bread }
3 { Eggs, Bread, Butter }
4 {Milk, Bread, Butter }
5 { Milk, Bread, Cheese }
6 { Eggs, Bread, Cheese }
7 { Milk, Bread, Butter, Cheese}
8 { Milk, Eggs }
Apriori Algorithm
Iteration 1:
• Generate candidate itemset 𝐶1 (1-itemset)
• Suppose that the minimum support count required is 3, i.e., 𝒎𝒊𝒏_𝒔𝒖𝒑 = 𝟑 (relative support 3/8 = 37.5%).
𝑪𝟏
Item Support Count
{ Milk } 6
{ Eggs } 4
{ Bread } 7
{ Butter } 4
{ Cheese } 3
• The set of frequent 1-itemsets, 𝑳𝟏,
can then be determined.
• Prune Step: Remove all the itemsets
not satisfying minimum support.
• 𝐿1 consists of the candidate 1-itemsets satisfying minimum support.
Frequent 1-Itemset 𝑳𝟏
Item Support
{ Milk } 6
{ Eggs } 4
{ Bread } 7
{ Butter } 4
{ Cheese } 3
Apriori Algorithm
Iteration 2:
• Join Step: Join 𝐿1 × 𝐿1 to generate candidate itemset 𝐶2
𝑪𝟐
Item Support Count
{ Milk, Eggs } 2
{ Milk, Bread } 5
{ Milk, Butter } 3
{ Milk, Cheese } 2
{ Eggs, Bread } 3
{ Eggs, Butter } 2
{ Eggs, Cheese } 1
{ Bread, Butter } 4
{ Bread, Cheese } 3
{ Butter, Cheese } 1
• 𝒎𝒊𝒏_𝒔𝒖𝒑=3
• The set of frequent 2-itemsets, 𝐿2 ,
can then be determined.
• Prune Step: Remove all the itemsets
not satisfying minimum support.
• 𝐿2 consists of the candidate 2-itemsets satisfying minimum support.
Frequent 2-Itemset 𝑳𝟐
Item Support Count
{ Milk, Bread } 5
{ Milk, Butter } 3
{ Eggs, Bread } 3
{ Bread, Butter } 4
{ Bread, Cheese } 3
Apriori Algorithm
Iteration 3:
• Join Step: Join 𝐿2 × 𝐿2 to
generate candidate itemset
𝐶3.
𝑪𝟑
Item
{ Milk, Bread, Butter }
{ Milk, Eggs, Bread }
{ Milk, Bread, Cheese }
{ Eggs, Bread, Butter }
• 𝒎𝒊𝒏_𝒔𝒖𝒑=3
• The set of frequent 3-itemsets,
𝐿3 , can then be determined.
• Prune Step: Remove all the
itemsets not satisfying
minimum support.
• 𝐿3 consists of the candidate 3-itemsets satisfying minimum support.
Item Support Count
{ Milk, Bread, Butter } 3
{ Milk, Eggs, Bread } 1
{ Milk, Bread, Cheese } 2
{ Eggs, Bread, Butter } 2
Frequent 3-Itemset 𝑳𝟑
Item Support Count
{ Milk, Bread, Butter } 3
Algorithm: [Apriori pseudocode figure]
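The pseudocode figure itself is not reproduced here; as a stand-in, the following self-contained Python sketch of the level-wise loop illustrates the same idea (a small in-memory database and an absolute minimum support count are assumed; this is not the slides' exact listing).

```python
from itertools import combinations
from collections import defaultdict

def apriori(transactions, min_sup_count):
    """Level-wise search: L1 from one scan, then repeat join/prune/count until Lk is empty."""
    transactions = [frozenset(t) for t in transactions]
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    L = {s: c for s, c in counts.items() if c >= min_sup_count}   # L1
    all_frequent = dict(L)
    k = 2
    while L:
        prev = set(L)
        # Join step, then prune candidates with an infrequent (k-1)-subset
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # One database scan to count the surviving candidates
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        L = {s: c for s, c in counts.items() if c >= min_sup_count}
        all_frequent.update(L)
        k += 1
    return all_frequent

db = [{"Milk", "Eggs", "Bread", "Butter"}, {"Milk", "Bread"},
      {"Eggs", "Bread", "Butter"}, {"Milk", "Bread", "Butter"},
      {"Milk", "Bread", "Cheese"}, {"Eggs", "Bread", "Cheese"},
      {"Milk", "Bread", "Butter", "Cheese"}, {"Milk", "Eggs"}]
print(apriori(db, 3)[frozenset({"Milk", "Bread", "Butter"})])   # 3, matching L3 above
```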
Generating Association Rules from Frequent
Itemsets
• To generate strong association rules from frequent itemsets, calculate confidence of
a rule using
𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒(𝐴 ⟹ 𝐵) = 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑐𝑜𝑢𝑛𝑡(𝐴 ∪ 𝐵) / 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑐𝑜𝑢𝑛𝑡(𝐴)
• Based on this, association rules can be generated as follows:
• For each frequent itemset 𝑙, generate all nonempty subsets of 𝑙.
• For every nonempty subset 𝑠 of 𝑙, output the rule 𝑠 ⟹ (𝑙 − 𝑠) if 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑐𝑜𝑢𝑛𝑡(𝑙) / 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑐𝑜𝑢𝑛𝑡(𝑠) ≥ 𝑚𝑖𝑛_𝑐𝑜𝑛𝑓, where 𝑚𝑖𝑛_𝑐𝑜𝑛𝑓 is the minimum confidence threshold.
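A small illustrative sketch of this rule-generation step (the support counts are taken from the worked example; the function name is mine):

```python
from itertools import combinations

# Support counts as they would come out of the frequent-itemset mining step above.
support_counts = {
    frozenset({"Milk", "Bread", "Butter"}): 3,
    frozenset({"Milk", "Bread"}): 5, frozenset({"Milk", "Butter"}): 3,
    frozenset({"Bread", "Butter"}): 4,
    frozenset({"Milk"}): 6, frozenset({"Bread"}): 7, frozenset({"Butter"}): 4,
}

def rules_from_itemset(l, min_conf):
    """Output s => (l - s) for every nonempty proper subset s of l whose
    confidence support_count(l) / support_count(s) reaches min_conf."""
    rules = []
    for r in range(1, len(l)):
        for subset in combinations(l, r):
            s = frozenset(subset)
            conf = support_counts[l] / support_counts[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

for lhs, rhs, conf in rules_from_itemset(frozenset({"Milk", "Bread", "Butter"}), 0.60):
    print(lhs, "=>", rhs, round(conf, 4))
# With min_conf = 60%, four of the six possible rules qualify as strong.
```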
Generating Association Rules from Frequent
Itemsets
• e.g., the example above contains the frequent itemset 𝑙 = {𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟}. What association rules can be generated from 𝑙?
• Nonempty proper subsets of 𝑙: {Milk}, {Bread}, {Butter}, {Milk, Bread}, {Milk, Butter}, {Bread, Butter}
• The resulting association rules and their confidences are:
{𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑} ⟹ {𝐵𝑢𝑡𝑡𝑒𝑟}  confidence = 3/5 = 60%
{𝑀𝑖𝑙𝑘, 𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝐵𝑟𝑒𝑎𝑑}  confidence = 3/3 = 100%
{𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝑀𝑖𝑙𝑘}  confidence = 3/4 = 75%
{𝑀𝑖𝑙𝑘} ⟹ {𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟}  confidence = 3/6 = 50%
{𝐵𝑟𝑒𝑎𝑑} ⟹ {𝑀𝑖𝑙𝑘, 𝐵𝑢𝑡𝑡𝑒𝑟}  confidence = 3/7 = 42.85%
{𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑}  confidence = 3/4 = 75%
Generating Association Rules from Frequent
Itemsets
Minimum confidence = 60%
Sr. No. Rule Confidence (%) Is Strong Rule?
1 {𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑} ⟹ {𝐵𝑢𝑡𝑡𝑒𝑟} 60.00 Yes
2 {𝑀𝑖𝑙𝑘, 𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝐵𝑟𝑒𝑎𝑑} 100.00 Yes
3 {𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝑀𝑖𝑙𝑘} 75.00 Yes
4 {𝑀𝑖𝑙𝑘} ⟹ {𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟} 50.00 No
5 {𝐵𝑟𝑒𝑎𝑑} ⟹ {𝑀𝑖𝑙𝑘, 𝐵𝑢𝑡𝑡𝑒𝑟} 42.85 No
6 {𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑} 75.00 Yes
Minimum Support = 30% and Minimum Confidence = 65%
Methods to Improve Efficiency of Apriori
Hash Based Technique
Transaction reduction
Partitioning
Dynamic Itemset Counting
Sampling
• Hash Based Technique
• Can be used to reduce the size of the candidate 𝑘-itemsets, 𝐶𝑘, for 𝑘 > 1.
• Such a hash-based technique may substantially reduce the number of the candidate
𝑘 −itemsets examined (especially when 𝑘 = 2).
• In the 2nd iteration, i.e. generation of 2-itemsets, every combination of two items in a transaction is mapped to a bucket of a hash table structure, and the corresponding bucket count is incremented.
• If the count of a bucket is less than the min_sup count, the itemsets hashed to that bucket cannot be frequent and are removed from the candidate set (see the sketch below).
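A rough sketch of the idea, using the same item order and hash function as the example that follows (this is an illustration, not the full DHP algorithm):

```python
from itertools import combinations
from collections import defaultdict

order = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
def bucket(x, y):
    return (order[x] * 10 + order[y]) % 7

transactions = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
                {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]
min_sup = 3

# While scanning for L1, hash every 2-item combination of each transaction into a bucket
bucket_count = defaultdict(int)
for t in transactions:
    for x, y in combinations(sorted(t, key=order.get), 2):
        bucket_count[bucket(x, y)] += 1

def may_be_frequent(pair):
    """A candidate 2-itemset whose bucket count is below min_sup can be dropped immediately."""
    x, y = sorted(pair, key=order.get)
    return bucket_count[bucket(x, y)] >= min_sup

print(may_be_frequent({"B", "D"}))   # False: bucket 3 holds only {B,D}, count 2
print(may_be_frequent({"A", "C"}))   # True:  bucket 6 has count 4
```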
Methods to Improve Efficiency of Apriori
• Hash Based Technique
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
𝐦𝐢𝐧 _𝐬𝐮𝐩 𝐂𝐨𝐮𝐧𝐭 = 𝟑
Itemset Support Count
A 6
B 7
C 6
D 2
E 2
𝑪𝟏
Order of items: A = 1, B = 2, C = 3, D = 4 and E = 5
Hash function: 𝑯(𝒙, 𝒚) = ((order of 𝒙) × 𝟏𝟎 + (order of 𝒚)) mod 𝟕

Hash Table
Itemset Count Hash Function
A,B 4 (1 × 10 + 2) mod 7 = 5
A,C 4 (1 × 10 + 3) mod 7 = 6
A,D 1 (1 × 10 + 4) mod 7 = 0
A,E 2 (1 × 10 + 5) mod 7 = 1
B,C 4 (2 × 10 + 3) mod 7 = 2
B,D 2 (2 × 10 + 4) mod 7 = 3
B,E 2 (2 × 10 + 5) mod 7 = 4
C,D 0 −
C,E 1 (3 × 10 + 5) mod 7 = 0
D,E 0 −
Methods to Improve Efficiency of Apriori
• Hash Based Technique
Hash Table Structure to Generate 𝑳𝟐 (min_sup count = 3)
Bucket Address | Bucket Count | Bucket Content | In 𝑳𝟐?
0 | 2 | {A,D} {C,E} | No
1 | 2 | {A,E} {A,E} | No
2 | 4 | {B,C} {B,C} {B,C} {B,C} | Yes
3 | 2 | {B,D} {B,D} | No
4 | 2 | {B,E} {B,E} | No
5 | 4 | {A,B} {A,B} {A,B} {A,B} | Yes
6 | 4 | {A,C} {A,C} {A,C} {A,C} | Yes
Methods to Improve Efficiency of Apriori
• Transaction Reduction
• A transaction that does not contain any frequent 𝑘-itemsets cannot contain any frequent
(𝑘 + 1)-itemsets.
• Such a transaction can be marked or removed from further consideration (see the sketch below).
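A minimal sketch of this reduction, assuming the frequent k-itemsets of the current pass are already known; the tiny database here is made up purely for illustration.

```python
from itertools import combinations

def reduce_transactions(transactions, frequent_k, k):
    """Keep only transactions containing at least one frequent k-itemset;
    the others cannot contain any frequent (k+1)-itemset."""
    return [t for t in transactions
            if any(frozenset(c) in frequent_k for c in combinations(sorted(t), k))]

# Made-up data: {C, E} contains no frequent 2-itemset, so it is dropped
db = [{"A", "B", "E"}, {"B", "C", "D"}, {"C", "E"}, {"A", "B", "C", "D"}]
frequent_2 = {frozenset(p) for p in (("A", "B"), ("B", "C"), ("B", "D"), ("C", "D"))}
print(reduce_transactions(db, frequent_2, 2))
```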
min_sup count = 2

TID List of Items
T1 A, B, E
T2 B, C, D
T3 C, D
T4 A, B, C, D

TID A B C D E
T1 1 1 0 0 1
T2 0 1 1 1 0
T3 0 0 1 1 0
T4 1 1 1 1 0
Count 2 3 3 3 1

Item E is infrequent (count 1 < 2), so the E column is dropped:

TID A B C D
T1 1 1 0 0
T2 0 1 1 1
T3 0 0 1 1
T4 1 1 1 1
Methods to Improve Efficiency of Apriori
• Transaction Reduction
Candidate 2-itemsets (min_sup count = 2):

TID A,B A,C A,D B,C B,D C,D
T1 1 0 0 0 0 0
T2 0 0 0 1 1 1
T3 0 0 0 0 0 1
T4 1 1 1 1 1 1
Count 2 1 1 2 2 3

The infrequent columns {A,C} and {A,D} are removed, along with transactions T1 and T3, which do not contain enough frequent 2-itemsets to produce a frequent 3-itemset:

TID A,B B,C B,D C,D
T2 0 1 1 1
T4 1 1 1 1
Count 1 2 2 2

TID B,C B,D C,D
T2 1 1 1
T4 1 1 1

TID B,C,D
T2 1
T4 1
Methods to Improve Efficiency of Apriori
• Partitioning
• Requires just two database scans to mine frequent itemsets
Phase I: Divide the transactions in D into n partitions and find the frequent itemsets local to each partition (1 scan).
Phase II: Combine all local frequent itemsets to form the candidate itemsets, then find the global frequent itemsets in D among these candidates (1 scan).
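The two phases can be sketched roughly as follows; mine_local stands for any in-memory frequent-itemset miner (e.g. the Apriori sketch above) and is an assumption of this sketch, not something defined on the slides.

```python
from collections import defaultdict

def partition_mine(transactions, n_partitions, min_sup_ratio, mine_local):
    """Phase I: mine each partition locally (1 scan). Phase II: count the combined
    candidates against the whole database (1 more scan)."""
    size = -(-len(transactions) // n_partitions)               # ceiling division
    partitions = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    candidates = set()                                          # Phase I
    for part in partitions:
        local_min = max(1, int(min_sup_ratio * len(part)))
        candidates |= set(mine_local(part, local_min))

    counts = defaultdict(int)                                   # Phase II
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_sup_ratio * len(transactions)}

# e.g. partition_mine(db, 3, 0.20, mine_local=some_apriori_function)
```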
Methods to Improve Efficiency of Apriori
• Partitioning
TID A B C D E
T1 1 0 0 0 1
T2 0 1 0 1 0
T3 0 0 0 1 1
T4 0 1 1 0 0
T5 0 0 0 0 1
T6 0 1 1 1 0

The database is divided into three partitions, each having two transactions, with support = 20% (local min_sup = 1).

First scan (frequent itemsets local to each partition):
Partition 1 (T1, T2): A = 1, B = 1, D = 1, E = 1, {A,E} = 1, {B,D} = 1
Partition 2 (T3, T4): B = 1, C = 1, D = 1, E = 1, {D,E} = 1, {B,C} = 1
Partition 3 (T5, T6): B = 1, C = 1, D = 1, E = 1, {B,C} = 1, {B,D} = 1, {C,D} = 1, {B,C,D} = 1

Second scan (support = 20%, global min_sup = 2) over the combined candidates:
A = 1, B = 3, C = 2, D = 3, E = 3, {A,E} = 1, {B,D} = 2, {D,E} = 1, {B,C} = 2, {C,D} = 1, {B,C,D} = 1

Shortlisted (global) frequent itemsets: B = 3, C = 2, D = 3, E = 3, {B,D} = 2, {B,C} = 2
Methods to Improve Efficiency of Apriori
• Dynamic Itemset Counting
• Database is partitioned into blocks marked by start points.
• New candidate can be added at any start point.
• This technique uses the count-so-far as the lower bound of the actual count
• If the count-so-far passes the min_sup, the itemset is added into frequent itemset collection
and can be used to generate longer candidates
• Leads to fewer database scans
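A deliberately simplified sketch of the idea: candidates are added at block boundaries ("start points") as soon as their subsets look frequent so far. A full DIC implementation additionally lets late-added itemsets wrap around so that each finishes a complete pass over the data, and distinguishes suspected from confirmed itemsets; both are omitted here.

```python
from collections import defaultdict

def dic_sketch(transactions, block_size, min_sup_count):
    """Count itemsets block by block; at each start point, begin counting supersets
    of itemsets whose count-so-far already reaches min_sup."""
    counts = defaultdict(int)
    counting = {frozenset([i]) for t in transactions for i in t}   # 1-itemsets first
    for start in range(0, len(transactions), block_size):
        for t in transactions[start:start + block_size]:
            for itemset in counting:
                if itemset <= t:
                    counts[itemset] += 1
        # Start point: extend promising itemsets by one frequent-looking single item
        promising = {s for s in counting if counts[s] >= min_sup_count}
        singles = {s for s in promising if len(s) == 1}
        for s in promising:
            for single in singles:
                if not single <= s:
                    counting.add(s | single)
    return counts
```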
Methods to Improve Efficiency of Apriori
• Dynamic Itemset Counting (example)

TID List of Items
T1 A, B
T2 B, C
T3 A
T4 −

Minimum support = 25%, number of blocks (M) = 2; block 1 = {T1, T2}, block 2 = {T3, T4}.

The itemset lattice over {A, B, C} is maintained, with each itemset marked as confirmed frequent, suspected frequent, confirmed infrequent, or suspected infrequent:
• Before scanning: A = 0, B = 0, C = 0
• After scanning the 1st block: A = 1, B = 2, C = 1; counting starts for {A,B} = 1 and {B,C} = 1
• After scanning the 1st and 2nd block: A = 2, B = 2, C = 1; {A,B} = 1, {B,C} = 1
Methods to Improve Efficiency of Apriori
• Sampling
• Pick up a random sample S of a given dataset D,
• Search for frequent itemsets in S instead of D
• We trade off some degree of accuracy against efficiency.
• We may lose a global frequent itemset, so we use a lower support threshold than the minimum support to find the frequent itemsets local to S, denoted 𝐿𝑆.
• The rest of the database is used to compute the actual frequencies of each itemset in 𝐿𝑆.
• If 𝐿𝑆 contains all the frequent itemsets in D, then only one scan of D is required.
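A rough sketch of the sampling approach; the miner passed in as mine and the 0.8 reduction factor applied to the sample threshold are illustrative choices, not values from the slides.

```python
import random

def sample_and_verify(D, sample_frac, min_sup_ratio, mine):
    """Mine a random sample S with a lowered threshold, then verify on the full database D."""
    S = random.sample(D, max(1, int(len(D) * sample_frac)))
    lowered = max(1, int(0.8 * min_sup_ratio * len(S)))      # 0.8 is an arbitrary reduction
    L_S = mine(S, lowered)                                    # frequent itemsets local to S
    verified = {}
    for itemset in L_S:                                       # one scan of D
        count = sum(1 for t in D if itemset <= t)
        if count >= min_sup_ratio * len(D):
            verified[itemset] = count
    return verified
```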
Methods to Improve Efficiency of Apriori
• Although reducing the size of candidate sets may lead to good performance, Apriori can still suffer from two nontrivial costs:
• It may need to generate a huge number of candidate sets.
• It may need to repeatedly scan the whole database and check a large set of candidates by pattern matching.
• A method is required that mines the complete set of frequent itemsets without a costly candidate generation process.
• This method is called Frequent Pattern Growth, or FP-Growth.
FP-Growth
• Adopts a divide-and-conquer strategy as:
• First it compresses the database representing frequent items into a frequent pattern tree or
FP-tree which retains itemset association information
• Then it divides the compressed database into a set of conditional databases, each associated with one frequent item or pattern fragment,
• and mines each such conditional database separately.
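A compact sketch of the first step, FP-tree construction (class and variable names are mine, not the slides'); the database and min_sup = 2 match the example worked through on the next slides.

```python
class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}                      # item -> FPNode

def build_fp_tree(transactions, min_sup_count):
    # First scan: frequent 1-itemsets and their support counts
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    frequent = {i: c for i, c in counts.items() if c >= min_sup_count}
    order = sorted(frequent, key=lambda i: (-frequent[i], i))     # L order

    root = FPNode(None, None)
    header = {item: [] for item in order}       # item -> node-link chain
    # Second scan: insert each transaction along a shared-prefix path
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in frequent), key=order.index):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

db = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
      {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]
root, header = build_fp_tree(db, 2)
print({n.item: n.count for n in root.children.values()})   # {'B': 7, 'A': 2}
```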
FP-Growth
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Scan the database and derive the set of frequent 1-itemsets and their support counts (min_sup = 2):
Itemset Support
A 6
B 7
C 6
D 2
E 2

Sort in order of descending support count:
Itemset Support
B 7
A 6
C 6
D 2
E 2

𝐿 = { {B: 7}, {A: 6}, {C: 6}, {D: 2}, {E: 2} }
FP-Growth
1. Create the root of the tree, labeled with "null".
2. Scan the database D again. The items in each transaction are processed in L order, and a branch is created (or an existing prefix path is extended, incrementing the counts) for each transaction:

T1 (B, A, E): null → B:1 → A:1 → E:1
T2 (B, D): B:2, new child D:1 under B
T3 (B, C): B:3, new child C:1 under B
T4 (B, A, D): B:4, A:2, new child D:1 under A
T5 (A, C): new branch from the root, A:1 → C:1
T6 (B, C): B:5, C:2 under B
T7 (A, C): the root branch grows to A:2 → C:2
T8 (B, A, C, E): B:6, A:3, new child C:1 under A with child E:1
T9 (B, A, C): B:7, A:4, C:2 under A

Resulting FP-tree (node: count, shown by indentation):
null { }
  B: 7
    A: 4
      E: 1
      D: 1
      C: 2
        E: 1
    D: 1
    C: 2
  A: 2
    C: 2
FP-Growth
To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links:

Itemset Support Node Links
B 7 B:7
A 6 A:4, A:2
C 6 C:2, C:2, C:2
D 2 D:1, D:1
E 2 E:1, E:1
Home Work
• Draw the FP-tree for the given database:
TID List of Items
T1 {A,B}
T2 {B,C}
T3 {B,C,D}
T4 {A,C,D,E}
T5 {A,D,E}
T6 {A,B,C}
T7 {A,B,C,D}
T8 {A,C}
T9 {A,B,C}
T10 {A,D,E}
T11 {A,E}
FP-Growth
• The FP-tree is mined as follows
• Start from each frequent length-1 pattern (as an initial suffix pattern) and construct its conditional pattern base (a "sub-database," which consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern),
• then construct its (conditional) FP-tree,
• and perform mining recursively on such a tree.
• The pattern growth is achieved by the concatenation of the suffix pattern with
the frequent patterns generated from a conditional FP-tree.
FP-Growth
Item | Conditional Pattern Base | Conditional FP-Tree | Frequent Patterns Generated
E | {B, A: 1}, {B, A, C: 1} | ⟨B: 2, A: 2⟩ | {B, E: 2}, {A, E: 2}, {B, A, E: 2}
D | {B: 1}, {B, A: 1} | ⟨B: 2⟩ | {B, D: 2}
C | {B, A: 2}, {B: 2}, {A: 2} | ⟨B: 4, A: 2⟩, ⟨A: 2⟩ | {B, C: 4}, {A, C: 4}, {B, A, C: 2}
A | {B: 4} | ⟨B: 4⟩ | {B, A: 4}
B | − | − | −
1. Start with Item having least support count
2. Generate conditional pattern base by
identifying the path to the item
3. Form conditional FP-Tree
4. Generate frequent patterns
Mining Frequent Itemsets Using Vertical
Data Format
TID List of Items
T1 A, B, E
T2 B, D
T3 B, C
T4 A, B, D
T5 A, C
T6 B, C
T7 A, C
T8 A, B, C, E
T9 A, B, C
Item TID_set
A {T1, T4, T5, T7, T8, T9}
B {T1, T2, T3, T4, T6, T8, T9}
C {T3, T5, T6, T7, T8, T9}
D {T2, T4}
E {T1, T8}
Horizontal Data Format
{𝑇𝐼𝐷: 𝐼𝑡𝑒𝑚𝑠𝑒𝑡}
Vertical Data Format
{𝐼𝑡𝑒𝑚: 𝑇𝐼𝐷_𝑠𝑒𝑡}
Mining can be performed on this data set
by intersecting the TID sets of every pair of
frequent single items.
Item TID_set
A∩B {T1, T4, T8, T9}
A∩C {T5, T7, T8, T9}
A∩D {T4}
A∩E {T1, T8}
B∩C {T3, T6, T8, T9}
B∩D {T2, T4}
B∩E {T1, T8}
C∩D { }
C∩E {T8}
D∩E {}
Mining Frequent Itemsets Using Vertical
Data Format
2-itemsets in vertical data format:
Itemset TID_set
{A, B} {T1, T4, T8, T9}
{A, C} {T5, T7, T8, T9}
{A, D} {T4}
{A, E} {T1, T8}
{B, C} {T3, T6, T8, T9}
{B, D} {T2, T4}
{B, E} {T1, T8}
{C, E} {T8}

3-itemsets in vertical data format:
Itemset TID_set
{A, B, C} {T8, T9}
{A, B, E} {T1, T8}

There are only two frequent 3-itemsets: {A, B, C}: 2 and {A, B, E}: 2.

4-itemset in vertical data format:
Itemset TID_set
{A, B, C, E} {T8}

The support count of an itemset is simply the length of the TID_set of the itemset.
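A short sketch of this intersection-based (vertical-format) mining step, using the TID sets listed above:

```python
from itertools import combinations

vertical = {
    "A": {"T1", "T4", "T5", "T7", "T8", "T9"},
    "B": {"T1", "T2", "T3", "T4", "T6", "T8", "T9"},
    "C": {"T3", "T5", "T6", "T7", "T8", "T9"},
    "D": {"T2", "T4"},
    "E": {"T1", "T8"},
}
min_sup = 2

# 2-itemsets: intersect the TID sets of every pair of frequent single items;
# the support count is simply the size of the resulting TID set.
pairs = {}
for x, y in combinations(sorted(vertical), 2):
    tids = vertical[x] & vertical[y]
    if len(tids) >= min_sup:
        pairs[frozenset({x, y})] = tids

for itemset, tids in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), sorted(tids), len(tids))   # e.g. ['A', 'B'] with support 4
```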
TID Itemset
1 D, B
2 C, A, B
3 D, A, B, C
4 A, C
5 D, C
6 C, A, E
7 D, C, A
8 D
9 A, B, D
10 B, C, E
11 B, A
Find the frequent itemsets using the FP-Growth algorithm with minimum support = 50%.
Mining Multilevel Association Rules
• Strong associations discovered at high levels of abstraction may represent
commonsense knowledge.
• So, data mining systems should provide capabilities for mining association rules at
multiple levels of abstraction, with sufficient flexibility for easy traversal among
different abstraction spaces.
Mining Multilevel Association Rules
[Figure: concept hierarchy for computer items at a shop, levels 0 through 4]
Mining Multilevel Association Rules
• It is difficult to find interesting purchase patterns at such raw or primitive-level
data.
• It is easier to find strong associations among generalized abstractions of these items than among the items at the primitive level.
• Association rules generated from mining data at multiple levels of abstraction are
called multiple-level or multilevel association rules.
• Multilevel association rules can be mined efficiently using concept hierarchies
under a support-confidence framework.
• A top-down strategy is employed.
Mining Multilevel Association Rules
• Using uniform minimum support for all levels:
• The same minimum support threshold is used when mining at each level of abstraction.
• The search procedure is simplified.
• If min_sup is set too high, it could miss some meaningful associations at low abstraction levels
• If min_sup is set too low, it may generate many uninteresting associations at high abstraction
levels
Mining Multilevel Association Rules
• Using reduced minimum support at lower levels:
• Each level of abstraction has its own minimum support threshold.
• The deeper the level of abstraction, the smaller the corresponding threshold is
Mining Multilevel Association Rules
• Using item or group-based minimum support:
• It is sometimes more desirable to set up user-specific, item, or group based minimal support
thresholds when mining multilevel rules
• e.g. a user could set up the minimum support thresholds based on product price, or on items of
interest, such as by setting particularly low support thresholds for laptop computers and flash
drives in order to pay particular attention to the association patterns containing items in these
categories.
Mining Multilevel Association Rules
• A serious side effect of mining multilevel association rules is its generation of many
redundant rules across multiple levels of abstraction due to the “ancestor”
relationships among items.
𝑏𝑢𝑦𝑠(𝑋, "Laptop computer") ⇒ 𝑏𝑢𝑦𝑠(𝑋, "HP Printer")
[𝑠𝑢𝑝𝑝𝑜𝑟𝑡 = 8%, 𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒 = 70%]
𝑏𝑢𝑦𝑠(𝑋, "IBM Laptop computer") ⇒ 𝑏𝑢𝑦𝑠(𝑋, "HP Printer")
[𝑠𝑢𝑝𝑝𝑜𝑟𝑡 = 2%, 𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒 = 72%]
• Does the latter rule really provide any novel information?
• A rule 𝑅1 is an ancestor of a rule 𝑅2, if 𝑅1 can be obtained by replacing the items in
𝑅2 by their ancestors in a concept hierarchy.
Mining Multidimensional Association Rules
• Single Dimensional Rule
𝑏𝑢𝑦𝑠(𝑋, "Milk") ⇒ buys(X,"𝐵𝑢𝑡𝑡𝑒𝑟")
• Instead of considering transaction data only, sales and related information are also
linked with relational data in data warehouse.
• Such data stores are multidimensional in nature
• Additional information of customers who purchased the items may also be stored.
• We can mine association rules containing multiple predicates/dimensions:
𝒂𝒈𝒆(𝑋, "20−29") ∧ 𝒐𝒄𝒄𝒖𝒑𝒂𝒕𝒊𝒐𝒏(𝑋, "Housewife") ⇒ 𝒃𝒖𝒚𝒔(𝑋, "Milk")
• Association rules which involve two or more dimensions or predicates are referred to as multidimensional association rules.
Mining Multidimensional Association Rules
𝒂𝒈𝒆(𝑋, "20−29" ) ∧ 𝒐𝒄𝒄𝒖𝒑𝒂𝒕𝒊𝒐𝒏(𝑋, "Housewife") ⇒ 𝒃𝒖𝒚𝒔(𝑋, "Milk" )
• This rule has no repeated predicates.
• Association rules with no repeated predicates are referred to as interdimensional association rules.
• Association rules with repeated predicates are called hybrid-dimensional association rules, e.g.
𝒂𝒈𝒆(𝑋, "20−29" ) ∧ 𝒃𝒖𝒚𝒔(𝑋, "Milk") ⇒ 𝒃𝒖𝒚𝒔(𝑋, "Bread" )
Mining Multidimensional Association Rules
• Data attributes can be nominal or quantitative.
• Mining multidimensional association rules (with quantitative attributes) can be
categorized in to two approaches
1. Mining multidimensional association rules using Static discretization of
quantitative attributes
• Quantitative attributes are discretized using predefined concept hierarchies.
• Discretization is done before mining.
2. Mining multidimensional association rules using Dynamic discretization of
quantitative attributes
• Quantitative attributes are discretized or clustered into bins based on the data distribution.
• Bins may be combined during the mining process, which is why this is a dynamic process.
More Related Content

What's hot

Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olapSalah Amean
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clusteringMegha Sharma
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithmhina firdaus
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Mustafa Sherazi
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.pptSowmyaJyothi3
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule miningDeepa Jeya
 
Chapter 9. Classification Advanced Methods.ppt
Chapter 9. Classification Advanced Methods.pptChapter 9. Classification Advanced Methods.ppt
Chapter 9. Classification Advanced Methods.pptSubrata Kumer Paul
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning Mohammad Junaid Khan
 

What's hot (20)

Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
 
Clustering, k-means clustering
Clustering, k-means clusteringClustering, k-means clustering
Clustering, k-means clustering
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
 
data mining
data miningdata mining
data mining
 
Clustering
ClusteringClustering
Clustering
 
Fp growth
Fp growthFp growth
Fp growth
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)Clustering in data Mining (Data Mining)
Clustering in data Mining (Data Mining)
 
Association Rule.ppt
Association Rule.pptAssociation Rule.ppt
Association Rule.ppt
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Data clustring
Data clustring Data clustring
Data clustring
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
 
Decision tree
Decision treeDecision tree
Decision tree
 
Chapter 9. Classification Advanced Methods.ppt
Chapter 9. Classification Advanced Methods.pptChapter 9. Classification Advanced Methods.ppt
Chapter 9. Classification Advanced Methods.ppt
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
08 clustering
08 clustering08 clustering
08 clustering
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Hierachical clustering
Hierachical clusteringHierachical clustering
Hierachical clustering
 

Similar to Mining Frequent Patterns And Association Rules

Similar to Mining Frequent Patterns And Association Rules (20)

Data mining techniques unit III
Data mining techniques unit IIIData mining techniques unit III
Data mining techniques unit III
 
Apriori Algorithm.pptx
Apriori Algorithm.pptxApriori Algorithm.pptx
Apriori Algorithm.pptx
 
Association rules apriori algorithm
Association rules   apriori algorithmAssociation rules   apriori algorithm
Association rules apriori algorithm
 
Chapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxChapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptx
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 
APRIORI ALGORITHM -PPT.pptx
APRIORI ALGORITHM -PPT.pptxAPRIORI ALGORITHM -PPT.pptx
APRIORI ALGORITHM -PPT.pptx
 
apriori.pptx
apriori.pptxapriori.pptx
apriori.pptx
 
6 module 4
6 module 46 module 4
6 module 4
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
Dma unit 2
Dma unit  2Dma unit  2
Dma unit 2
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptx
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5CS 402 DATAMINING AND WAREHOUSING -MODULE 5
CS 402 DATAMINING AND WAREHOUSING -MODULE 5
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdf
 
Lec6_Association.ppt
Lec6_Association.pptLec6_Association.ppt
Lec6_Association.ppt
 

More from Rashmi Bhat

Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Process Scheduling in OS
Process Scheduling in OSProcess Scheduling in OS
Process Scheduling in OSRashmi Bhat
 
Introduction to Operating System
Introduction to Operating SystemIntroduction to Operating System
Introduction to Operating SystemRashmi Bhat
 
The Geometry of Virtual Worlds.pdf
The Geometry of Virtual Worlds.pdfThe Geometry of Virtual Worlds.pdf
The Geometry of Virtual Worlds.pdfRashmi Bhat
 
Spatial Data Mining
Spatial Data MiningSpatial Data Mining
Spatial Data MiningRashmi Bhat
 
Classification in Data Mining
Classification in Data MiningClassification in Data Mining
Classification in Data MiningRashmi Bhat
 
Data Warehouse Fundamentals
Data Warehouse FundamentalsData Warehouse Fundamentals
Data Warehouse FundamentalsRashmi Bhat
 
Virtual Reality
Virtual Reality Virtual Reality
Virtual Reality Rashmi Bhat
 
Introduction To Virtual Reality
Introduction To Virtual RealityIntroduction To Virtual Reality
Introduction To Virtual RealityRashmi Bhat
 

More from Rashmi Bhat (17)

Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
Process Scheduling in OS
Process Scheduling in OSProcess Scheduling in OS
Process Scheduling in OS
 
Introduction to Operating System
Introduction to Operating SystemIntroduction to Operating System
Introduction to Operating System
 
The Geometry of Virtual Worlds.pdf
The Geometry of Virtual Worlds.pdfThe Geometry of Virtual Worlds.pdf
The Geometry of Virtual Worlds.pdf
 
Module 1 VR.pdf
Module 1 VR.pdfModule 1 VR.pdf
Module 1 VR.pdf
 
OLAP
OLAPOLAP
OLAP
 
Spatial Data Mining
Spatial Data MiningSpatial Data Mining
Spatial Data Mining
 
Web mining
Web miningWeb mining
Web mining
 
Clustering
ClusteringClustering
Clustering
 
Classification in Data Mining
Classification in Data MiningClassification in Data Mining
Classification in Data Mining
 
ETL Process
ETL ProcessETL Process
ETL Process
 
Data Warehouse Fundamentals
Data Warehouse FundamentalsData Warehouse Fundamentals
Data Warehouse Fundamentals
 
Virtual Reality
Virtual Reality Virtual Reality
Virtual Reality
 
Introduction To Virtual Reality
Introduction To Virtual RealityIntroduction To Virtual Reality
Introduction To Virtual Reality
 
Graph Theory
Graph TheoryGraph Theory
Graph Theory
 

Recently uploaded

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Recently uploaded (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Mining Frequent Patterns And Association Rules

  • 1. Ms. Rashmi Bhat Mining Frequent Patterns And Association Rules
  • 3. What is a Frequent Pattern? • A frequent pattern is a pattern that appears in a data set frequently. • What are these frequent patterns? Frequent Itemsets
  • 4. What is a Frequent Pattern? • A frequent pattern is a pattern that appears in a data set frequently. • What are these frequent patterns? • Frequent Itemsets • Frequent Sequential Pattern • Frequent Structured Patterns etc • Searching for recurring relationships in a given data set. • Discovering interesting associations and correlations between itemsets in transactional databases. What is Frequent Pattern Mining?
  • 6. Market Basket Analysis • Analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. • How does this help retailers? • Helps to develop marketing strategies by gaining insight into which items are frequently purchased together by customers.
  • 7. Market Basket Analysis • Buying patterns which reflect items frequently purchased or associated together can be represented in rules form, known as association rules. • e.g. {𝑴𝒐𝒃𝒊𝒍𝒆} ⇒ 𝑺𝒄𝒓𝒆𝒆𝒏𝑮𝒖𝒂𝒓𝒅, 𝑩𝒂𝒄𝒌𝒄𝒐𝒗𝒆𝒓 [𝑠𝑢𝑝𝑝𝑜𝑟𝑡 = 5%, 𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒 = 65%] • Interestingness measures: 𝑺𝒖𝒑𝒑𝒐𝒓𝒕 and 𝑪𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 • Reflect the usefulness and certainty of discovered rules. • Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold. • Thresholds can be set by users or domain experts.
  • 8. Association Rule Let 𝑰 = 𝑷𝒆𝒏, 𝑷𝒆𝒏𝒄𝒊𝒍, 𝑬𝒓𝒂𝒔𝒆𝒓, 𝑵𝒐𝒕𝒆𝒃𝒐𝒐𝒌, 𝑹𝒖𝒍𝒆𝒓, 𝑴𝒂𝒓𝒌𝒆𝒓, 𝑺𝒄𝒊𝒔𝒔𝒐𝒓𝒔, 𝑮𝒍𝒖𝒆 … …Set of items in shop 𝑫 is task relevant dataset 𝑻 = 𝑷𝒆𝒏𝒄𝒊𝒍, 𝑷𝒆𝒏, 𝑵𝒐𝒕𝒆𝒃𝒐𝒐𝒌, 𝑬𝒓𝒂𝒔𝒆𝒓 , 𝑻 ⊂ 𝑰 …Transaction 𝑨 = 𝑷𝒆𝒏, 𝑷𝒆𝒏𝒄𝒊𝒍, 𝑵𝒐𝒕𝒆𝒃𝒐𝒐𝒌, 𝑺𝒄𝒊𝒔𝒔𝒐𝒓𝒔 , 𝑩 = 𝑬𝒓𝒂𝒔𝒆𝒓, 𝑮𝒍𝒖𝒆 … Set of items 𝑨 ⇒ 𝑩 An Association Rule … 𝑤ℎ𝑒𝑟𝑒 𝐴 ⊂ 𝐼, 𝐵 ⊂ 𝐼 𝑎𝑛𝑑 𝐴 ∩ 𝐵 = ∅ An Association Rule 𝑨 ⇒ 𝑩 holds in transaction set with support 𝒔 and confidence 𝒄
  • 9. Association Rule Support s, where s is the percentage of transactions in D that contain 𝑨 ∪ 𝑩 Confidence c, where c is the the percentage of transactions in D containing 𝑨 that also contain 𝐵. 𝑺𝒖𝒑𝒑𝒐𝒓𝒕 𝑨 ⇒ 𝑩 = 𝑷(𝑨 ∪ 𝑩) 𝑪𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 𝑨 ⇒ 𝑩 = 𝑷(𝑩|𝑨) Rules that satisfy both a minimum support threshold (min_sup) and a minimum confidence threshold (min_conf) are called strong rules.
  • 10. Some Important Terminologies • Itemset is a set of items. • An itemset that contains k items is a k-itemset. • The occurrence frequency of an itemset is the number of transactions that contain the itemset. This is also known as the frequency, support count, or count of the itemset. • The occurrence frequency is called the absolute support. • If an itemset 𝐼 satisfies a prespecified minimum support threshold, then 𝐼 is a frequent itemset • The set of frequent 𝑘-itemsets is commonly denoted by 𝐿𝑘 𝒄𝒐𝒏𝒇𝒊𝒅𝒆𝒏𝒄𝒆 𝑨 ⇒ 𝑩 = 𝒔𝒖𝒑𝒑𝒐𝒓𝒕(𝑨 ∪ 𝑩) 𝑺𝒖𝒑𝒑𝒐𝒓𝒕(𝑨) = 𝒔𝒖𝒑𝒑𝒐𝒓𝒕_𝒄𝒐𝒖𝒏𝒕(𝑨 ∪ 𝑩) 𝒔𝒖𝒑𝒑𝒐𝒓𝒕_𝒄𝒐𝒖𝒏𝒕(𝑨)
  • 11. Frequent Itemset ID Transactions 1 A, B, C, D 2 B, D, E 3 A, D, E 4 A, B, E 5 C, D, E Item Frequency A 3 B 3 C 2 D 4 E 4 Item Frequency Support A 3 3/5→0.6 B 3 3/5→0.6 C 2 2/5→0.4 D 4 4/5→0.8 E 4 4/5→0.8
  • 12. Association Rule Mining • Association Rule Mining • The overall performance of mining association rules is determined by the first step. 1. Find all frequent Itemsets 2. Generate strong association rules from the frequent itemsets
  • 13. Itemsets • Closed Itemset • An itemset 𝑋 is closed in a data set 𝑆 if there exists no proper super-itemset 𝑌 such that 𝑌 has the same support count as 𝑋 in 𝑆. • If 𝑋 is both closed and frequent in 𝑆, then 𝑋 is a closed frequent itemset in 𝑆. • Maximal Itemset • An itemset 𝑋 is a maximal frequent itemset (or max-itemset) in set 𝑆, if 𝑋 is frequent, and there exists no super-itemset 𝑌 such that 𝑋 ⊂ 𝑌 and 𝑌 is frequent in 𝑆.
  • 14. Closed and Maximal Itemsets • Frequent itemset 𝑋 ∈ 𝐹 is maximal if it does not have any frequent supersets • That is, for all 𝑋 ⊂ 𝑌, 𝑌 ∉ 𝐹 • Frequent itemset 𝑋 ∈ 𝐹 is closed if it has no immediate superset with the same frequency • That is, for all 𝑋 ⊂ 𝑌, 𝑠𝑢𝑝𝑝𝑜𝑟𝑡 𝑌, 𝐷 < 𝑠𝑢𝑝𝑝𝑜𝑟𝑡(𝑋, 𝐷)
  • 15. TID Itemset 1 {A, C, D} 2 {B, C, D} 3 {A, B, C, D} 4 {B, D} 5 {A, B, C, D} min_sup = 3 i.e. min_sup = 60% null A B C D A,B A,C A,D B,C B,D C,D A,B,C A,B,D A,C,D B,C,D A,B,C,D 3 4 4 5 2 3 3 3 4 4 2 2 3 3 2 1- Item set 2-Item set 3-Item set 4-Item set A B C D A,B A,C A,D B,C
  • 16. TID Itemset 1 {A, C, D} 2 {B, C, D} 3 {A, B, C, D} 4 {B, D} 5 {A, B, C, D} min_sup = 3 i.e. min_sup = 60% null A B C D A,B A,C A,D B,C B,D C,D A,B,C A,B,D A,C,D B,C,D A,B,C,D 3 4 4 5 2 3 3 3 4 4 2 2 3 3 2 Frequent Itemset Infrequent Itemset 1- Item set 2-Item set 3-Item set 4-frequent Item set
  • 17. TID Itemset 1 {A, C, D} 2 {B, C, D} 3 {A, B, C, D} 4 {B, D} 5 {A, B, C, D} min_sup = 3 i.e. min_sup = 60% null A B C D A,B A,C A,D B,C B,D C,D A,B,C A,B,D A,C,D B,C,D A,B,C,D 3 4 4 5 2 3 3 3 4 4 2 2 3 3 2 Frequent Itemset Infrequent Itemset D 1- Item set 2-Item set 3- Item set 4-Item set Closed Frequent Itemset B,D C,D (but not maximal)
  • 18. TID Itemset 1 {A, C, D} 2 {B, C, D} 3 {A, B, C, D} 4 {B, D} 5 {A, B, C, D} min_sup = 3 i.e. min_sup = 60% null A B C D A,B A,C A,D B,C B,D C,D A,B,C A,B,D A,C,D B,C,D A,B,C,D 3 4 4 5 2 3 3 3 4 4 2 2 3 3 2 Frequent Itemset Infrequent Itemset D 1- Item set 2-Item set 3- Item set 4- Item set Closed Frequent Itemset B,D C,D Maximal Frequent Itemset A,C,D B,C,D (but not maximal)
  • 19. Frequent Pattern Mining • Frequent pattern mining can be classified in various ways as follows: Based on the completeness of patterns to be mined Based on the levels of abstraction involved in the rule set Based on the number of data dimensions involved in the rule Based on the types of values handled in the rule Based on the kinds of rules to be mined Based on the kinds of patterns to be mined
  • 20. Frequent Pattern Mining • Based on the completeness of the patterns to be mined • The complete set of frequent itemsets, • The closed frequent itemsets, and the maximal frequent itemsets • The constrained frequent itemsets • The approximate frequent itemsets • The near-match frequent itemsets • The top-k frequent itemsets
  • 21. Frequent Pattern Mining • Based on the levels of abstraction involved in the rule set • We can find rules at differing levels of abstraction • multilevel association rules • Based on the number of data dimensions involved in the rule • Single-dimensional association rule • Multidimensional association rule • Based on the types of values handled in the rule • Boolean association rule • Quantitative association rule
  • 22. Frequent Pattern Mining • Based on the kinds of rules to be mined • Association rules • Correlation rules • Based on the kinds of patterns to be mined • Frequent itemset mining • Sequential pattern mining • Structured pattern mining
  • 23. Efficient and Scalable Frequent Itemset Mining Methods • Methods for mining the simplest form of frequent patterns • Single-dimensional, • Single-level, • Boolean frequent itemsets • Apriori Algorithm • basic algorithm for finding frequent itemsets • How to generate strong association rules from frequent itemsets? • Variations to the Apriori algorithm
  • 24. Apriori Algorithm • Finds Frequent Itemsets Using Candidate Generation • The algorithm uses prior knowledge of frequent itemset properties • Employs an iterative approach known as a level-wise search • k-itemsets are used to explore (k+1)-itemsets • The Apriori property is used to reduce the search space. • If a set cannot pass a test, all of its supersets will fail the same test as well. Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
  • 25. Apriori Algorithm • Apriori algorithm follows a two-step process • Join Step: • To find 𝐿𝑘, a set of candidate 𝑘-itemsets is generated by joining 𝐿𝑘−1 with itself. • This set of candidates is denoted 𝐶𝑘. • Prune Step: • 𝐶𝑘 is a superset of 𝐿𝑘, that is, its members may or may not be frequent, but all of the frequent 𝑘-itemsets are included in 𝐶𝑘. • A scan of the database to determine the count of each candidate in 𝐶𝑘 results in the determination of 𝐿𝑘. • To reduce the size of 𝐶𝑘, the Apriori property is used: • Any (𝑘 − 1)-itemset that is not frequent cannot be a subset of a frequent 𝑘-itemset.
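A minimal Python sketch of the join and prune steps just described; the function name apriori_gen and its argument names are illustrative, not taken from the slides:

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Generate candidate k-itemsets C_k from the frequent (k-1)-itemsets.
    prev_frequent is a set of frozensets, each of size k-1."""
    # Join step: two (k-1)-itemsets merge if their union has exactly k items.
    candidates = set()
    for a, b in combinations(prev_frequent, 2):
        union = a | b
        if len(union) == k:
            candidates.add(union)
    # Prune step (Apriori property): drop any candidate that has an
    # infrequent (k-1)-subset.
    return {c for c in candidates
            if all(frozenset(s) in prev_frequent for s in combinations(c, k - 1))}

# Example: joining three frequent 1-itemsets yields three candidate 2-itemsets.
L1 = {frozenset({"Milk"}), frozenset({"Bread"}), frozenset({"Butter"})}
print(apriori_gen(L1, 2))
```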
  • 26. Apriori Algorithm Transactional data for a retail shop • Find the frequent itemsets using the Apriori algorithm • There are eight transactions in this database, that is, |𝐷| = 8. T_id List of Items: 1 { Milk, Eggs, Bread, Butter } 2 { Milk, Bread } 3 { Eggs, Bread, Butter } 4 { Milk, Bread, Butter } 5 { Milk, Bread, Cheese } 6 { Eggs, Bread, Cheese } 7 { Milk, Bread, Butter, Cheese } 8 { Milk, Eggs }
  • 27. Apriori Algorithm Iteration 1: • Generate candidate itemset 𝐶1 (1-itemsets) • Suppose that the minimum support count required is 3, i.e. 𝒎𝒊𝒏_𝒔𝒖𝒑 = 𝟑 (relative support is 3/8 = 37.5%) • 𝑪𝟏 with support counts: { Milk }: 6, { Eggs }: 4, { Bread }: 7, { Butter }: 4, { Cheese }: 3
  • 28. Apriori Algorithm Iteration 1: • Generate candidate itemset 𝐶1 • Suppose that the minimum support count required is 3, i.e. 𝒎𝒊𝒏_𝒔𝒖𝒑 = 𝟑 (relative support is 3/8 = 37.5%) • The set of frequent 1-itemsets, 𝑳𝟏, can then be determined. • Prune Step: Remove all the itemsets not satisfying minimum support. • 𝐿1 consists of the candidate 1-itemsets satisfying minimum support. Frequent 1-itemsets 𝑳𝟏: { Milk }: 6, { Eggs }: 4, { Bread }: 7, { Butter }: 4, { Cheese }: 3
  • 29. Apriori Algorithm Iteration 2: • Join Step: Join 𝐿1 × 𝐿1 to generate candidate itemset 𝐶2 • 𝑪𝟐 with support counts: { Milk, Eggs }: 2, { Milk, Bread }: 5, { Milk, Butter }: 3, { Milk, Cheese }: 2, { Eggs, Bread }: 3, { Eggs, Butter }: 2, { Eggs, Cheese }: 1, { Bread, Butter }: 4, { Bread, Cheese }: 3, { Butter, Cheese }: 1
  • 30. Apriori Algorithm Iteration 2: • Join Step: Join 𝐿1 × 𝐿1 to generate candidate itemset 𝐶2 • 𝒎𝒊𝒏_𝒔𝒖𝒑 = 3 • The set of frequent 2-itemsets, 𝐿2, can then be determined. • Prune Step: Remove all the itemsets not satisfying minimum support. • 𝐿2 consists of the candidate 2-itemsets satisfying minimum support. 𝑪𝟐 support counts: { Milk, Eggs }: 2, { Milk, Bread }: 5, { Milk, Butter }: 3, { Milk, Cheese }: 2, { Eggs, Bread }: 3, { Eggs, Butter }: 2, { Eggs, Cheese }: 1, { Bread, Butter }: 4, { Bread, Cheese }: 3, { Butter, Cheese }: 1
  • 31. Apriori Algorithm Iteration 2: • Join Step: Join 𝐿1 × 𝐿1 to generate candidate itemset 𝐶2 • 𝒎𝒊𝒏_𝒔𝒖𝒑 = 3 • The set of frequent 2-itemsets, 𝐿2, can then be determined. • Prune Step: Remove all the itemsets not satisfying minimum support. • 𝐿2 consists of the candidate 2-itemsets satisfying minimum support. Frequent 2-itemsets 𝑳𝟐: { Milk, Bread }: 5, { Milk, Butter }: 3, { Eggs, Bread }: 3, { Bread, Butter }: 4, { Bread, Cheese }: 3
  • 32. Apriori Algorithm Iteration 3: • Join Step: Join 𝐿2 × 𝐿2 to generate candidate itemset 𝐶3. • 𝑪𝟑: { Milk, Bread, Butter }, { Milk, Eggs, Bread }, { Milk, Bread, Cheese }, { Eggs, Bread, Butter }
  • 33. Apriori Algorithm Iteration 3: • Join Step: Join 𝐿2 × 𝐿2 to generate candidate itemset 𝑪𝟑 • 𝒎𝒊𝒏_𝒔𝒖𝒑 = 3 • The set of frequent 3-itemsets, 𝐿3, can then be determined. • Prune Step: Remove all the itemsets not satisfying minimum support. • 𝐿3 consists of the candidate 3-itemsets satisfying minimum support. 𝑪𝟑 support counts: { Milk, Bread, Butter }: 3, { Milk, Eggs, Bread }: 1, { Milk, Bread, Cheese }: 2, { Eggs, Bread, Butter }: 2
  • 34. Apriori Algorithm Iteration 3: • Join Step: Join 𝐿2 × 𝐿2 to generate candidate itemset 𝑪𝟑 • 𝒎𝒊𝒏_𝒔𝒖𝒑 = 3 • The set of frequent 3-itemsets, 𝐿3, can then be determined. • Prune Step: Remove all the itemsets not satisfying minimum support. • 𝐿3 consists of the candidate 3-itemsets satisfying minimum support. Frequent 3-itemsets 𝑳𝟑: { Milk, Bread, Butter }: 3
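Putting the three iterations together, the following self-contained Python sketch (not the slides' code) runs the level-wise loop on the retail-shop transactions with min_sup = 3 and reproduces 𝐿1, 𝐿2 and the single frequent 3-itemset {Milk, Bread, Butter}:

```python
from itertools import combinations

transactions = [
    {"Milk", "Eggs", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Eggs", "Bread", "Butter"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Cheese"},
    {"Eggs", "Bread", "Cheese"},
    {"Milk", "Bread", "Butter", "Cheese"},
    {"Milk", "Eggs"},
]
MIN_SUP = 3

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

items = {i for t in transactions for i in t}
Lk = {frozenset([i]) for i in items if support_count(frozenset([i])) >= MIN_SUP}
all_frequent = {}
k = 1
while Lk:
    all_frequent.update({s: support_count(s) for s in Lk})
    k += 1
    # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
    candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
    # Prune step (Apriori property): every (k-1)-subset must be frequent.
    candidates = {c for c in candidates
                  if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
    # Database scan: keep candidates that meet the minimum support count.
    Lk = {c for c in candidates if support_count(c) >= MIN_SUP}

for itemset, cnt in sorted(all_frequent.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), cnt)
# The last line printed is ['Bread', 'Butter', 'Milk'] 3, the only frequent 3-itemset.
```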
  • 37. Generating Association Rules from Frequent Itemsets • To generate strong association rules from frequent itemsets, calculate the confidence of a rule using 𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒(𝐴 ⟹ 𝐵) = 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑐𝑜𝑢𝑛𝑡(𝐴 ∪ 𝐵) / 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑐𝑜𝑢𝑛𝑡(𝐴) • Based on this, association rules can be generated as follows: • For each frequent itemset 𝑙, generate all nonempty subsets of 𝑙. • For every nonempty subset 𝑠 of 𝑙, output the rule 𝑠 ⟹ (𝑙 − 𝑠) if 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑐𝑜𝑢𝑛𝑡(𝑙) / 𝑠𝑢𝑝𝑝𝑜𝑟𝑡_𝑐𝑜𝑢𝑛𝑡(𝑠) ≥ 𝑚𝑖𝑛_𝑐𝑜𝑛𝑓, where 𝑚𝑖𝑛_𝑐𝑜𝑛𝑓 is the minimum confidence threshold.
  • 38. Generating Association Rules from Frequent Itemsets • E.g. the example contains the frequent itemset 𝑙 = {𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟}. Which association rules can be generated from 𝑙? • Nonempty proper subsets of 𝑙: {Milk}, {Bread}, {Butter}, {Milk, Bread}, {Milk, Butter}, {Bread, Butter} • Resulting association rules with their confidences: {𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑} ⟹ {𝐵𝑢𝑡𝑡𝑒𝑟}, confidence = 3/5 = 60%; {𝑀𝑖𝑙𝑘, 𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝐵𝑟𝑒𝑎𝑑}, confidence = 3/3 = 100%; {𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝑀𝑖𝑙𝑘}, confidence = 3/4 = 75%; {𝑀𝑖𝑙𝑘} ⟹ {𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟}, confidence = 3/6 = 50%; {𝐵𝑟𝑒𝑎𝑑} ⟹ {𝑀𝑖𝑙𝑘, 𝐵𝑢𝑡𝑡𝑒𝑟}, confidence = 3/7 = 42.85%; {𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑}, confidence = 3/4 = 75%
  • 39. Generating Association Rules from Frequent Itemsets. With minimum confidence = 60%: 1. {𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑} ⟹ {𝐵𝑢𝑡𝑡𝑒𝑟}, confidence 60.00% (strong); 2. {𝑀𝑖𝑙𝑘, 𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝐵𝑟𝑒𝑎𝑑}, confidence 100.00% (strong); 3. {𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝑀𝑖𝑙𝑘}, confidence 75.00% (strong); 4. {𝑀𝑖𝑙𝑘} ⟹ {𝐵𝑟𝑒𝑎𝑑, 𝐵𝑢𝑡𝑡𝑒𝑟}, confidence 50.00% (not strong); 5. {𝐵𝑟𝑒𝑎𝑑} ⟹ {𝑀𝑖𝑙𝑘, 𝐵𝑢𝑡𝑡𝑒𝑟}, confidence 42.85% (not strong); 6. {𝐵𝑢𝑡𝑡𝑒𝑟} ⟹ {𝑀𝑖𝑙𝑘, 𝐵𝑟𝑒𝑎𝑑}, confidence 75.00% (strong)
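The rule-generation step can be sketched in a few lines of Python (helper names are illustrative, not the slides' code); run on 𝑙 = {Milk, Bread, Butter} with min_conf = 60%, it prints exactly the four strong rules from the list above:

```python
from itertools import combinations

transactions = [
    {"Milk", "Eggs", "Bread", "Butter"}, {"Milk", "Bread"},
    {"Eggs", "Bread", "Butter"}, {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Cheese"}, {"Eggs", "Bread", "Cheese"},
    {"Milk", "Bread", "Butter", "Cheese"}, {"Milk", "Eggs"},
]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

def rules_from_itemset(l, min_conf):
    """Yield (s, l - s, confidence) for every nonempty proper subset s of l
    whose rule s => (l - s) meets the minimum confidence threshold."""
    l = frozenset(l)
    for r in range(1, len(l)):
        for subset in combinations(l, r):
            s = frozenset(subset)
            conf = support_count(l) / support_count(s)
            if conf >= min_conf:
                yield set(s), set(l - s), conf

for ant, cons, conf in rules_from_itemset({"Milk", "Bread", "Butter"}, 0.60):
    print(ant, "=>", cons, f"confidence = {conf:.0%}")
```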
  • 40. Minimum Support = 30% and Minimum Confidence = 65%
  • 41. Methods to Improve Efficiency of Apriori Hash Based Technique Transaction reduction Partitioning Dynamic Itemset Counting Sampling
  • 42. • Hash Based Technique • Can be used to reduce the size of the candidate 𝑘-itemsets, 𝐶𝑘, for 𝑘 > 1. • Such a hash-based technique may substantially reduce the number of candidate 𝑘-itemsets examined (especially when 𝑘 = 2). • In the 2nd iteration, i.e. when generating candidate 2-itemsets, every combination of two items in a transaction is hashed (mapped) into a bucket of a hash table structure and the bucket count is incremented. • If a bucket count is less than the min_sup count, the 2-itemsets hashed to that bucket cannot be frequent and are removed from the candidate set. Methods to Improve Efficiency of Apriori
  • 43. • Hash Based Technique TID List of Items T1 A, B, E T2 B, D T3 B, C T4 A, B, D T5 A, C T6 B, C T7 A, C T8 A, B, C, E T9 A, B, C 𝐦𝐢𝐧 _𝐬𝐮𝐩 𝐂𝐨𝐮𝐧𝐭 = 𝟑 Itemset Support Count A 6 B 7 C 6 D 2 E 2 𝑪𝟏 Order of items A = 1, B = 2, C = 3, D = 4 and E = 5 Itemset Count Hash Function A,B 4 1 × 10 + 2 𝑚𝑜𝑑 7 = 𝟓 A,C 4 1 × 10 + 3 𝑚𝑜𝑑 7 = 𝟔 A,D 1 1 × 10 + 4 𝑚𝑜𝑑 7 = 𝟎 A,E 2 1 × 10 + 5 𝑚𝑜𝑑 7 = 𝟏 B,C 4 2 × 10 + 3 𝑚𝑜𝑑 7 = 𝟐 B,D 2 2 × 10 + 4 𝑚𝑜𝑑 7 = 𝟑 B,E 2 2 × 10 + 5 𝑚𝑜𝑑 7 = 𝟒 C,D 0 − C,E 1 3 × 10 + 5 𝑚𝑜𝑑 7 = 𝟎 D,E 0 − 𝑯 𝒙, 𝒚 = (𝒐𝒓𝒅𝒆𝒓 𝒐𝒇 𝒙 × 𝟏𝟎 + (𝒐𝒓𝒅𝒆𝒓 𝒐𝒇 𝒚)) 𝒎𝒐𝒅 𝟕 Hash Table Methods to Improve Efficiency of Apriori
  • 44. • Hash Based Technique Bucket Address Bucket Count Bucket Content 𝑳𝟐 0 2 {A,D} {C,E} No 1 2 {A,E} {A,E} No 2 4 {B,C} {B,C} {B,C} {B,C} Yes 3 2 {B, D} {B,D} No 4 2 {B,E} {B,E} No 5 4 {A,B} {A,B} {A,B} {A,B} Yes 6 4 {A,C} {A,C} {A,C} {A,C} yes Hash Table Structure to Generate 𝑳𝟐 𝐦𝐢𝐧 _𝐬𝐮𝐩 𝐂𝐨𝐮𝐧𝐭 = 𝟑 TID List of Items T1 {A, B, E} T2 {B, D} T3 {B, C} T4 {A, B, D} T5 {A, C} T6 {B, C} T7 {A, C} T8 {A, B, C, E} T9 {A, B, C} Methods to Improve Efficiency of Apriori Itemset Hash Value {A,B} 𝟓 {A,C} 𝟔 {A,D} 𝟎 {A,E} 𝟏 {B,C} 𝟐 {B,D} 𝟑 {B,E} 𝟒 {C,E} 𝟎
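A Python sketch of the bucket counting shown above; the item ordering and the hash function H(x, y) = (10 * order(x) + order(y)) mod 7 are taken from the slide, while the variable names are illustrative. Buckets whose count stays below min_sup = 3 let every 2-itemset hashed to them be discarded from C2 without individual counting:

```python
from itertools import combinations

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]
order = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
MIN_SUP = 3
NUM_BUCKETS = 7

buckets = [[] for _ in range(NUM_BUCKETS)]
for t in transactions:
    for x, y in combinations(sorted(t, key=order.get), 2):
        h = (10 * order[x] + order[y]) % NUM_BUCKETS   # H(x, y) from the slide
        buckets[h].append((x, y))

for address, content in enumerate(buckets):
    kept = len(content) >= MIN_SUP
    print(address, len(content), content, "kept" if kept else "pruned from C2")
# Buckets 2, 5 and 6 (count 4 each) survive; their pairs {B,C}, {A,B}, {A,C} stay in C2.
```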
  • 45. • Transaction Reduction • A transaction that does not contain any frequent 𝑘-itemsets cannot contain any frequent (𝑘 + 1)-itemsets. • Such a transaction can be marked or removed from further consideration. Example (min_sup count = 2): transactions T1 = {A, B, E}, T2 = {B, C, D}, T3 = {C, D}, T4 = {A, B, C, D}. Item counts: A = 2, B = 3, C = 3, D = 3, E = 1; E is infrequent, so it is dropped from the transactions. Methods to Improve Efficiency of Apriori
  • 46. • Transaction Reduction (contd.) Example continued: counts of the candidate 2-itemsets over T1 to T4 are {A,B} = 2, {A,C} = 1, {A,D} = 1, {B,C} = 2, {B,D} = 2, {C,D} = 3; with min_sup count = 2 the frequent 2-itemsets are {A,B}, {B,C}, {B,D}, {C,D}. Only T2 and T4 contain enough frequent 2-itemsets to contribute a frequent 3-itemset, so T1 and T3 are removed from further consideration; the 3-itemset {B,C,D} then occurs in both T2 and T4 (count = 2). Methods to Improve Efficiency of Apriori
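A minimal sketch of the reduction step itself, assuming the frequent k-itemsets of the current pass are already known (the function name is illustrative, not from the slides):

```python
def reduce_transactions(transactions, frequent_k_itemsets):
    """Keep only transactions that contain at least one frequent k-itemset;
    the others cannot contribute to any frequent (k+1)-itemset."""
    return [t for t in transactions
            if any(itemset <= t for itemset in frequent_k_itemsets)]
```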
  • 47. • Partitioning • Requires just two database scans to mine frequent itemsets. Phase I: divide the transactions in D into n partitions; find the frequent itemsets local to each partition (1 scan); combine all local frequent itemsets to form the global candidate itemsets. Phase II: find the global frequent itemsets among the candidates (1 scan). Methods to Improve Efficiency of Apriori
  • 48. • Partitioning example. Transactions in D: T1 = {A, E}, T2 = {B, D}, T3 = {D, E}, T4 = {B, C}, T5 = {E}, T6 = {B, C, D}. The database is divided into three partitions of two transactions each, with a support threshold of 20%. First scan (local min_sup count = 1): partition 1 yields A = 1, B = 1, D = 1, E = 1, {A,E} = 1, {B,D} = 1; partition 2 yields B = 1, C = 1, D = 1, E = 1, {D,E} = 1, {B,C} = 1; partition 3 yields B = 1, C = 1, D = 1, E = 1, {B,C} = 1, {B,D} = 1, {C,D} = 1, {B,C,D} = 1. Second scan (global min_sup count = 2) over the combined candidates gives A = 1, B = 3, C = 2, D = 3, E = 3, {A,E} = 1, {B,D} = 2, {D,E} = 1, {B,C} = 2, {C,D} = 1, {B,C,D} = 1, so the shortlisted (global) frequent itemsets are B = 3, C = 2, D = 3, E = 3, {B,D} = 2, {B,C} = 2. Methods to Improve Efficiency of Apriori
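A Python sketch of the two-phase partitioning idea, using a deliberately simple brute-force local miner for illustration (none of these names come from the slides). The key property is that any globally frequent itemset must be locally frequent in at least one partition, so the union of local results is a complete candidate set:

```python
from itertools import combinations

def local_frequent(partition, min_sup_ratio):
    """Brute force: all itemsets whose local support meets the threshold."""
    threshold = min_sup_ratio * len(partition)
    seen = {frozenset(c) for t in partition
            for r in range(1, len(t) + 1)
            for c in combinations(sorted(t), r)}
    return {s for s in seen
            if sum(1 for t in partition if s <= t) >= threshold}

def partitioned_mining(transactions, n_parts, min_sup_ratio):
    size = -(-len(transactions) // n_parts)        # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Phase I (one scan): local frequent itemsets become global candidates.
    candidates = set().union(*(local_frequent(p, min_sup_ratio) for p in parts))
    # Phase II (one scan): count every candidate over the whole database.
    threshold = min_sup_ratio * len(transactions)
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) >= threshold}

# Example with the six transactions from the slide and a 20% support threshold:
D = [{"A", "E"}, {"B", "D"}, {"D", "E"}, {"B", "C"}, {"E"}, {"B", "C", "D"}]
print(partitioned_mining(D, 3, 0.20))
# {B}, {C}, {D}, {E}, {B, D}, {B, C} are the global frequent itemsets.
```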
  • 49. • Dynamic Itemset Counting • Database is partitioned into blocks marked by start points. • New candidate can be added at any start point. • This technique uses the count-so-far as the lower bound of the actual count • If the count-so-far passes the min_sup, the itemset is added into frequent itemset collection and can be used to generate longer candidates • Leads to fewer database scans Transactions in D Methods to Improve Efficiency of Apriori
  • 50. • Dynamic Itemset Counting example. Transactions in D: T1 = {A, B}, T2 = {B, C}, T3 = {A}, T4 = { }; minimum support = 25%, number of blocks (M) = 2. The itemset lattice over {A, B, C} is annotated as counting proceeds: before scanning, all counts are zero; after scanning the 1st block (T1, T2), A = 1, B = 2, C = 1, {A,B} = 1, {B,C} = 1; after scanning the 1st and 2nd blocks, A = 2, B = 2, C = 1, {A,B} = 1, {B,C} = 1. At each start point itemsets are classified as confirmed frequent, suspected frequent, confirmed infrequent, or suspected infrequent, and new candidates (e.g. {A,C}, {A,B,C}) can be added as soon as their subsets are being counted. Methods to Improve Efficiency of Apriori
  • 51. • Sampling • Pick a random sample S of a given dataset D, • Search for frequent itemsets in S instead of D • We trade off some degree of accuracy against efficiency. • We may lose a global frequent itemset, so we use a lower support threshold than the minimum support to find the frequent itemsets local to S, denoted 𝐿𝑆. • The rest of the database is used to compute the actual frequencies of each itemset in 𝐿𝑆. • If 𝐿𝑆 contains all frequent itemsets in D, then only one scan of D is required. Methods to Improve Efficiency of Apriori
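A Python sketch of the sampling idea (all names and the 0.8 threshold-lowering factor are illustrative assumptions, not from the slides); `miner` can be any frequent-itemset routine with a (transactions, min_sup_ratio) interface, for example the local_frequent sketch from the partitioning example:

```python
import random

def sample_and_verify(transactions, sample_frac, min_sup_ratio, miner):
    """Mine a random sample S with a lowered threshold, then keep only those
    locally frequent itemsets that are also frequent in the full database D."""
    sample_size = max(1, int(sample_frac * len(transactions)))
    S = random.sample(transactions, sample_size)
    L_S = miner(S, min_sup_ratio * 0.8)          # lowered threshold reduces misses
    threshold = min_sup_ratio * len(transactions)
    return {c for c in L_S
            if sum(1 for t in transactions if c <= t) >= threshold}
```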
  • 52. • Although reducing the size of candidate sets can lead to good performance, candidate-generation approaches can still suffer from two nontrivial costs: • They may still need to generate a huge number of candidate sets. • They may need to repeatedly scan the whole database and check a large set of candidates by pattern matching. • A method is required that mines the complete set of frequent itemsets without a costly candidate generation process. • This method is called Frequent Pattern Growth, or FP-Growth. FP-Growth
  • 53. • Adopts a divide-and-conquer strategy as: • First it compresses the database representing frequent items into a frequent pattern tree or FP-tree which retains itemset association information • Then it divides the compressed database into a set of conditional databases, each associated with one frequent item or pattern fragment • And then mines each database separately. FP-Growth
  • 54. FP-Growth. Transactions: T1 = {A, B, E}, T2 = {B, D}, T3 = {B, C}, T4 = {A, B, D}, T5 = {A, C}, T6 = {B, C}, T7 = {A, C}, T8 = {A, B, C, E}, T9 = {A, B, C}. Scan the database and derive the set of frequent 1-itemsets and their support counts (min_sup = 2): A = 6, B = 7, C = 6, D = 2, E = 2. Sort in order of descending support count: 𝐿 = { (𝐵: 7), (𝐴: 6), (𝐶: 6), (𝐷: 2), (𝐸: 2) }
  • 55. FP-Growth. 1. Create the root of the tree, labeled with "null". 2. Scan the database D again; the items in each transaction are processed in L order, 𝐿 = { (𝐵: 7), (𝐴: 6), (𝐶: 6), (𝐷: 2), (𝐸: 2) }. The tree initially contains only the null root.
  • 56. FP-Growth (contd.). Insert T1 = {A, B, E}, processed in L order as B, A, E: null → B:1 → A:1 → E:1.
  • 57. FP-Growth (contd.). Insert T2 = {B, D}: the shared prefix B is incremented to B:2 and a new child D:1 is added, giving null → B:2 with branches A:1 → E:1 and D:1.
  • 58. FP-Growth (contd.). Insert T3 = {B, C}: B becomes B:3 and a new child C:1 is added under B, giving branches A:1 → E:1, D:1 and C:1.
  • 59. FP-Growth (contd.). Insert T4 = {A, B, D}, processed as B, A, D: B becomes B:4, A becomes A:2, and a new child D:1 is added under A, giving null → B:4 with A:2 (children E:1, D:1), D:1 and C:1.
  • 60. FP-Growth (contd.). Insert T5 = {A, C}: since it shares no prefix with the B branch, a new branch null → A:1 → C:1 is created.
  • 61. FP-Growth (contd.). Insert T6 = {B, C}: B becomes B:5 and the child C under B becomes C:2.
  • 62. FP-Growth (contd.). Insert T7 = {A, C}: the second branch becomes null → A:2 → C:2.
  • 63. FP-Growth (contd.). Insert T8 = {A, B, C, E}, processed as B, A, C, E: B becomes B:6, A (under B) becomes A:3, and a new path C:1 → E:1 is added under that A.
  • 64. FP-Growth (contd.). Insert T9 = {A, B, C}, processed as B, A, C: B becomes B:7, A (under B) becomes A:4, and C under that A becomes C:2.
  • 65. FP-Growth (contd.). The completed FP-tree: null → B:7, whose child A:4 has children E:1, D:1 and C:2 (that C:2 has child E:1); B:7 also has children D:1 and C:2; a second branch null → A:2 → C:2 holds the transactions that do not start with B.
  • 66. FP-Growth. To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links: B (7), A (6), C (6), D (2), E (2), each entry linking to the corresponding nodes of the completed FP-tree.
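The construction walked through above can be condensed into a short Python sketch (class and variable names are illustrative, not from the slides); for the nine transactions with min_sup = 2 it builds the same tree and a header table of node-links:

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 1, {}

def build_fp_tree(transactions, min_sup):
    counts = Counter(i for t in transactions for i in t)
    # L: frequent items in descending support-count order (ties broken alphabetically).
    L = sorted((i for i in counts if counts[i] >= min_sup),
               key=lambda i: (-counts[i], i))
    root = FPNode(None, None)                      # the "null" root
    header = {i: [] for i in L}                    # item -> chain of node links
    for t in transactions:
        node = root
        for item in (i for i in L if i in t):      # process items in L order
            if item in node.children:
                node.children[item].count += 1     # shared prefix: increment
            else:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]
root, header = build_fp_tree(transactions, min_sup=2)
print({item: len(links) for item, links in header.items()})
# {'B': 1, 'A': 2, 'C': 3, 'D': 2, 'E': 2}: one link per node of that item in the tree.
```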
  • 67. Home Work • Draw FP-tree for given database TID List of Items T1 {A,B} T2 {B,C} T3 {B,C,D} T4 {A,C,D,E} T5 {A,D,E} T6 {A,B,C} T7 {A,B,C,D} T8 {A,C} T9 {A,B,C} T10 {A,D,E} T11 {A,E}
  • 69. FP-Growth • The FP-tree is mined as follows • Start from each frequent length-1 pattern (as an initial suffix pattern), construct its conditional pattern base (a "subdatabase," which consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern), • Then construct its (conditional) FP-tree, • Perform mining recursively on such a tree. • The pattern growth is achieved by the concatenation of the suffix pattern with the frequent patterns generated from a conditional FP-tree.
  • 70. FP-Growth Item Conditional Pattern Base Conditional FP-Tree Frequent Patterns Generated E 𝐵, 𝐴: 1 , 𝐵, 𝐴, 𝐶: 1 𝐵: 2, 𝐴: 2 𝐵, 𝐸: 2 , 𝐴, 𝐸: 2 , {𝐵, 𝐴, 𝐸: 2} D 𝐵: 1 , 𝐵, 𝐴: 1 𝐵: 2 𝐵, 𝐷: 2 C 𝐵, 𝐴: 2 , 𝐵: 2 , {𝐴: 2} 𝐵: 4, 𝐴: 2 , 𝐴: 2 𝐵, 𝐶: 4 , 𝐴, 𝐶: 4 , 𝐵, 𝐴, 𝐶: 2 A 𝐵: 4 𝐵: 4 𝐵, 𝐴: 4 B - - - 1. Start with Item having least support count 2. Generate conditional pattern base by identifying the path to the item 3. Form conditional FP-Tree 4. Generate frequent patterns
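A sketch of how the conditional pattern bases in this table can be read off the node-links, reusing the FPNode and header names from the FP-tree sketch shown earlier (illustrative names, not the slides' code):

```python
def conditional_pattern_base(item, header):
    """Collect (prefix path, count) pairs for every node of `item`,
    following parent pointers up to (but not including) the null root."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

# With the tree built by the earlier sketch:
# conditional_pattern_base("E", header) -> [(['B', 'A'], 1), (['B', 'A', 'C'], 1)]
# which matches the conditional pattern base {(B, A: 1), (B, A, C: 1)} for E above.
```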
  • 71. Mining Frequent Itemsets Using Vertical Data Format TID List of Items T1 A, B, E T2 B, D T3 B, C T4 A, B, D T5 A, C T6 B, C T7 A, C T8 A, B, C, E T9 A, B, C Item TID_set A {T1, T4, T5, T7, T8, T9} B {T1, T2, T3, T4, T6, T8, T9} C {T3, T5, T6, T7, T8, T9} D {T2, T4} E {T1, T8} Horizontal Data Format {𝑇𝐼𝐷: 𝐼𝑡𝑒𝑚𝑠𝑒𝑡} Vertical Data Format {𝐼𝑡𝑒𝑚: 𝑇𝐼𝐷_𝑠𝑒𝑡} Mining can be performed on this data set by intersecting the TID sets of every pair of frequent single items. Item TID_set A∩B {T1, T4, T8, T9} A∩C {T5, T7, T8, T9} A∩D {T4} A∩E {T1, T8} B∩C {T3, T6, T8, T9} B∩D {T2, T4} B∩E {T1, T8} C∩D { } C∩E {T8} D∩E {}
  • 72. Mining Frequent Itemsets Using Vertical Data Format. 2-itemsets in vertical format: {A, B}: {T1, T4, T8, T9}, {A, C}: {T5, T7, T8, T9}, {A, D}: {T4}, {A, E}: {T1, T8}, {B, C}: {T3, T6, T8, T9}, {B, D}: {T2, T4}, {B, E}: {T1, T8}, {C, E}: {T8}. 3-itemsets in vertical format: {A, B, C}: {T8, T9}, {A, B, E}: {T1, T8}. There are only two frequent 3-itemsets: {𝑨, 𝑩, 𝑪}: 𝟐 and {𝑨, 𝑩, 𝑬}: 𝟐. 4-itemset in vertical format: {A, B, C, E}: {T8}. The support count of an itemset is simply the length of the TID_set of the itemset.
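A Python sketch of Eclat-style mining in the vertical format, assuming a minimum support count of 2; each level is obtained by intersecting the TID sets of the previous level, and the run ends with the same two frequent 3-itemsets noted above:

```python
from itertools import combinations

# Vertical (item: TID_set) layout of the 9-transaction database above.
vertical = {
    frozenset({"A"}): {1, 4, 5, 7, 8, 9},
    frozenset({"B"}): {1, 2, 3, 4, 6, 8, 9},
    frozenset({"C"}): {3, 5, 6, 7, 8, 9},
    frozenset({"D"}): {2, 4},
    frozenset({"E"}): {1, 8},
}
MIN_SUP = 2

level = vertical
while level:
    next_level = {}
    for (a, ta), (b, tb) in combinations(level.items(), 2):
        union = a | b
        if len(union) == len(a) + 1:             # itemsets differing in one item
            tids = ta & tb                       # intersect the TID sets
            if len(tids) >= MIN_SUP:             # support = |TID_set|
                next_level[union] = tids
    for itemset, tids in sorted(next_level.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), "->", sorted(tids))
    level = next_level
# The last level printed contains only {A,B,C} -> [8, 9] and {A,B,E} -> [1, 8].
```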
  • 73. TID Itemset 1 D, B 2 C, A, B 3 D, A, B, C 4 A, C 5 D, C 6 C, A, E 7 D, C, A 8 D 9 A, B, D 10 B, C, E 11 B, A Find the frequent itemsets using FP-Growth algorithm with minimum support= 50%
  • 74. Mining Multilevel Association Rules • Strong associations discovered at high levels of abstraction may represent commonsense knowledge. • So, data mining systems should provide capabilities for mining association rules at multiple levels of abstraction, with sufficient flexibility for easy traversal among different abstraction spaces.
  • 76. Mining Multilevel Association Rules. Figure: concept hierarchy for computer items at a shop, with levels 0 through 4.
  • 77. Mining Multilevel Association Rules • It is difficult to find interesting purchase patterns in such raw or primitive-level data. • It is easier to find strong associations between generalized abstractions of these items than among the items at primitive levels. • Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules. • Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework. • A top-down strategy is employed.
  • 78. Mining Multilevel Association Rules • Using uniform minimum support for all levels: • The same minimum support threshold is used when mining at each level of abstraction. • The search procedure is simplified. • If min_sup is set too high, it could miss some meaningful associations at low abstraction levels • If min_sup is set too low, it may generate many uninteresting associations at high abstraction levels
  • 79. Mining Multilevel Association Rules • Using reduced minimum support at lower levels: • Each level of abstraction has its own minimum support threshold. • The deeper the level of abstraction, the smaller the corresponding threshold is
  • 80. Mining Multilevel Association Rules • Using item or group-based minimum support: • It is sometimes more desirable to set up user-specific, item-based, or group-based minimum support thresholds when mining multilevel rules • e.g. a user could set up the minimum support thresholds based on product price, or on items of interest, such as by setting particularly low support thresholds for laptop computers and flash drives in order to pay particular attention to the association patterns containing items in these categories. • A serious side effect of mining multilevel association rules is its generation of many redundant rules across multiple levels of abstraction due to the "ancestor" relationships among items.
  • 81. Mining Multilevel Association Rules • A serious side effect of mining multilevel association rules is its generation of many redundant rules across multiple levels of abstraction due to the "ancestor" relationships among items. 𝑏𝑢𝑦𝑠(𝑋, "Laptop computer") ⇒ 𝑏𝑢𝑦𝑠(𝑋, "HP Printer") [𝑠𝑢𝑝𝑝𝑜𝑟𝑡 = 8%, 𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒 = 70%] 𝑏𝑢𝑦𝑠(𝑋, "IBM Laptop computer") ⇒ 𝑏𝑢𝑦𝑠(𝑋, "HP Printer") [𝑠𝑢𝑝𝑝𝑜𝑟𝑡 = 2%, 𝑐𝑜𝑛𝑓𝑖𝑑𝑒𝑛𝑐𝑒 = 72%] • Does the latter rule really provide any novel information? • A rule 𝑅1 is an ancestor of a rule 𝑅2 if 𝑅1 can be obtained by replacing the items in 𝑅2 by their ancestors in a concept hierarchy.
  • 82. Mining Multidimensional Association Rules • Single-dimensional rule: 𝑏𝑢𝑦𝑠(𝑋, "Milk") ⇒ 𝑏𝑢𝑦𝑠(𝑋, "𝐵𝑢𝑡𝑡𝑒𝑟") • Instead of considering transaction data only, sales and related information are also linked with relational data in a data warehouse. • Such data stores are multidimensional in nature. • Additional information about the customers who purchased the items may also be stored. • We can mine association rules containing multiple predicates/dimensions: 𝒂𝒈𝒆(𝑋, "20−29") ∧ 𝒐𝒄𝒄𝒖𝒑𝒂𝒕𝒊𝒐𝒏(𝑋, "Housewife") ⇒ 𝒃𝒖𝒚𝒔(𝑋, "Milk") • Association rules which involve two or more dimensions or predicates are referred to as multidimensional association rules.
  • 83. Mining Multidimensional Association Rules 𝒂𝒈𝒆(𝑋, "20−29") ∧ 𝒐𝒄𝒄𝒖𝒑𝒂𝒕𝒊𝒐𝒏(𝑋, "Housewife") ⇒ 𝒃𝒖𝒚𝒔(𝑋, "Milk") • No repeated predicates. • Association rules with no repeated predicates are referred to as interdimensional association rules. • Association rules with repeated predicates are called hybrid-dimensional association rules: 𝒂𝒈𝒆(𝑋, "20−29") ∧ 𝒃𝒖𝒚𝒔(𝑋, "Milk") ⇒ 𝒃𝒖𝒚𝒔(𝑋, "Bread")
  • 84. Mining Multidimensional Association Rules • Data attributes can be nominal or quantitative. • Mining multidimensional association rules (with quantitative attributes) can be categorized into two approaches: 1. Mining multidimensional association rules using static discretization of quantitative attributes • Quantitative attributes are discretized using predefined concept hierarchies • Discretization is done before mining. 2. Mining multidimensional association rules using dynamic discretization of quantitative attributes • Quantitative attributes are discretized or clustered into bins based on the data distribution • Bins may be combined during the mining process, which is why the process is called dynamic.
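A minimal Python sketch of static discretization (the bin edges are illustrative assumptions, not from the slides): a quantitative attribute such as age is mapped to an interval label before mining, so that it can appear in a predicate such as age(X, "20-29"):

```python
def discretize_age(age):
    """Map a raw age to a predefined interval label (static discretization)."""
    bins = [(0, 19, "0-19"), (20, 29, "20-29"), (30, 39, "30-39"), (40, 200, "40+")]
    for lo, hi, label in bins:
        if lo <= age <= hi:
            return label

print(discretize_age(25))   # "20-29", which feeds the predicate age(X, "20-29")
```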