SlideShare a Scribd company logo
Association Rule Mining
• Given a set of transactions, find rules that will predict the occurrence
of an item based on the occurrences of other items in the transaction
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Example of Association Rules
{Diaper}  {Beer},
{Milk, Bread}  {Eggs,Coke},
{Beer, Bread}  {Milk},
Definition: Frequent Itemset
• Itemset
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset
• An itemset that contains k items
• Support count ()
– Frequency of occurrence of an itemset
– E.g. ({Milk, Bread,Diaper}) = 2
• Support
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
– An itemset whose support is greater than or
equal to a minsup threshold
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Definition: Association Rule
Example:
Beer}Diaper,Milk{ 
4.0
5
2
|T|
)BeerDiaper,,Milk(


s
67.0
3
2
)Diaper,Milk(
)BeerDiaper,Milk,(



c
 Association Rule
– An implication expression of the form X 
Y, where X and Y are itemsets
– Example:
{Milk, Diaper}  {Beer}
 Rule Evaluation Metrics
– Support (s)
 Fraction of transactions that contain both
X and Y
– Confidence (c)
 Measures how often items in Y
appear in transactions that
contain X
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Association Rule Mining Task
• Given a set of transactions T, the goal of association
rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
• Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds
 Computationally prohibitive!
Mining Association Rules
Example of Rules:
{Milk,Diaper}  {Beer} (s=0.4, c=0.67)
{Milk,Beer}  {Diaper} (s=0.4, c=1.0)
{Diaper,Beer}  {Milk} (s=0.4, c=0.67)
{Beer}  {Milk,Diaper} (s=0.4, c=0.67)
{Diaper}  {Milk,Beer} (s=0.4, c=0.5)
{Milk}  {Diaper,Beer} (s=0.4, c=0.5)
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
Mining Association Rules
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support  minsup
2. Rule Generation
– Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset
• Frequent itemset generation is still computationally
expensive
Frequent Itemset Generation
null
AB AC AD AE BC BD BE CD CE DE
A B C D E
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
Given d items, there are
2d possible candidate
itemsets
Frequent Itemset Generation
• Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the database
– Match each transaction against every candidate
– Complexity ~ O(NMw) => Expensive since M = 2d !!!
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
N
Transactions List of
Candidates
M
w
Computational Complexity
• Given d unique items:
– Total number of itemsets = 2d
– Total number of possible association rules:
123 1
1
1 1












 












 
dd
d
k
kd
j
j
kd
k
d
R
If d=6, R = 602 rules
Frequent Itemset Generation Strategies
• Reduce the number of candidates (M)
– Complete search: M=2d
– Use pruning techniques to reduce M
• Reduce the number of transactions (N)
– Reduce size of N as the size of itemset increases
– Used by DHP and vertical-based mining algorithms
• Reduce the number of comparisons (NM)
– Use efficient data structures to store the candidates or
transactions
– No need to match every candidate against every
transaction
Monotonicity Property
• A measure s (support) is monotone (or upward closed) if
• which means that if X is a subset of Y, then s(X) must not exceed s(Y).
On the other hand, s is anti-monotone (or downward closed) if
• which means that if X is a subset of Y, then s(Y) must not exceed s(X).
• Any measure that posses an anti-monotone property can be
incorporated directly into the mining algorithm to effectively prune the
exponential search space of candidate itemsets.
11
)()()(:, YsXsYXYX 
)()()(:, YsXsYXYX 
Reducing Number of Candidates
• Apriori principle:
– If an itemset is frequent, then all of its subsets must also be
frequent
– Support of an itemset never exceeds the support of its
subsets
– This is known as the anti-monotone property of support
Illustrating Apriori Principle
13
Illustrating Apriori Principle
14
Illustrating Apriori Principle
Item Count
Bread 4
Coke 2
Milk 4
Beer 3
Diaper 4
Eggs 1
Itemset Count
{Bread,Milk} 3
{Bread,Beer} 2
{Bread,Diaper} 3
{Milk,Beer} 2
{Milk,Diaper} 3
{Beer,Diaper} 3
Itemset Count
{Bread,Milk,Diaper} 3
Items (1-itemsets)
Pairs (2-itemsets)
(No need to generate
candidates involving Coke
or Eggs)
Triplets (3-itemsets)
Minimum Support = 3
If every subset is considered,
6C1 + 6C2 + 6C3 = 41
With support-based pruning,
6 + 6 + 1 = 13
Apriori Algorithm
• Method:
– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified
• Generate length (k+1) candidate itemsets from length k
frequent itemsets
• Prune candidate itemsets containing subsets of length k that
are infrequent
• Count the support of each candidate by scanning the DB
• Eliminate candidates that are infrequent, leaving only those
that are frequent
Apriori Algorithm – Example 2
• Consider the following transaction database:
• min_sup=2
Apriori Algorithm – Example 2 – Iteration 1
• In the first iteration of the algorithm, each item is a
member of the set of candidate 1-itemsets, C1. The
algorithm simply scans all of the transactions to count
the number of occurrences of each item.
Apriori Algorithm – Example 2 – Iteration 2
• To discover the set of frequent 2-itemsets, L2, the
algorithm uses the join L1 L1 to generate a candidate
set of 2-itemsets, C2.
• C2 consists of
|𝐿1|
2
2-itemsets.
Apriori Algorithm – Example 2 – Iteration 3
• The generation of the set of the candidate 3-itemsets,
C3.
Apriori Algorithm – Example 2 – Iteration 4
• The algorithm uses L3 L3 to generate a candidate set
of 4-itemsets, C4.
• Although the join results in {{I1, I2, I3, I5}}, itemset {I1,
I2, I3, I5} is pruned because its subset {I2, I3, I5} is not
frequent.
• Thus, C4 = φ, and the algorithm terminates, having
found all of the frequent itemsets.
Reducing Number of Comparisons
• Candidate counting:
– Scan the database of transactions to determine the support
of each candidate itemset
– To reduce the number of comparisons, store the candidates
in a hash structure
• Instead of matching each transaction against every candidate,
match it against candidates contained in the hashed buckets
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
N
Transactions Hash Structure
k
Buckets
Generate Hash Tree
2 3 4
5 6 7
1 4 5
1 3 6
1 2 4
4 5 7 1 2 5
4 5 8
1 5 9
3 4 5 3 5 6
3 5 7
6 8 9
3 6 7
3 6 8
1,4,7
2,5,8
3,6,9
Hash function
Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5
7}, {6 8 9}, {3 6 7}, {3 6 8}
You need:
• Hash function
• Max leaf size: max number of itemsets stored in a leaf node (if number of candidate
itemsets exceeds max leaf size, split the node)
Association Rule Discovery: Hash tree
1 5 9
1 4 5 1 3 6
3 4 5 3 6 7
3 6 8
3 5 6
3 5 7
6 8 9
2 3 4
5 6 7
1 2 4
4 5 7
1 2 5
4 5 8
1,4,7
2,5,8
3,6,9
Hash Function Candidate Hash Tree
Hash on
1, 4 or 7
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4
{1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3
{6 8 9}, {3 6 7}, {3 6 8}
Association Rule Discovery: Hash tree
1 5 9
1 4 5 1 3 6
3 4 5 3 6 7
3 6 8
3 5 6
3 5 7
6 8 9
2 3 4
5 6 7
1 2 4
4 5 7
1 2 5
4 5 8
1,4,7
2,5,8
3,6,9
Hash Function Candidate Hash Tree
Hash on
2, 5 or 8
Association Rule Discovery: Hash tree
1 5 9
1 4 5 1 3 6
3 4 5 3 6 7
3 6 8
3 5 6
3 5 7
6 8 9
2 3 4
5 6 7
1 2 4
4 5 7
1 2 5
4 5 8
1,4,7
2,5,8
3,6,9
Hash Function Candidate Hash Tree
Hash on
3, 6 or 9
Subset Operation
1 2 3 5 6
Transaction, t
2 3 5 61 3 5 62
5 61 33 5 61 2 61 5 5 62 3 62 5
5 63
1 2 3
1 2 5
1 2 6
1 3 5
1 3 6
1 5 6
2 3 5
2 3 6
2 5 6 3 5 6
Subsets of 3 items
Level 1
Level 2
Level 3
63 5
Given a transaction t, what are
the possible subsets of size 3?
Subset Operation Using Hash Tree
1 5 9
1 4 5 1 3 6
3 4 5 3 6 7
3 6 8
3 5 6
3 5 7
6 8 9
2 3 4
5 6 7
1 2 4
4 5 7
1 2 5
4 5 8
1 2 3 5 6
1 + 2 3 5 6
3 5 62 +
5 63 +
1,4,7
2,5,8
3,6,9
Hash Functiontransaction
Subset Operation Using Hash Tree
1 5 9
1 4 5 1 3 6
3 4 5 3 6 7
3 6 8
3 5 6
3 5 7
6 8 9
2 3 4
5 6 7
1 2 4
4 5 7
1 2 5
4 5 8
1,4,7
2,5,8
3,6,9
Hash Function
1 2 3 5 6
3 5 61 2 +
5 61 3 +
61 5 +
3 5 62 +
5 63 +
1 + 2 3 5 6
transaction
Subset Operation Using Hash Tree
1 5 9
1 4 5 1 3 6
3 4 5 3 6 7
3 6 8
3 5 6
3 5 7
6 8 9
2 3 4
5 6 7
1 2 4
4 5 7
1 2 5
4 5 8
1,4,7
2,5,8
3,6,9
Hash Function
1 2 3 5 6
3 5 61 2 +
5 61 3 +
61 5 +
3 5 62 +
5 63 +
1 + 2 3 5 6
transaction
Match transaction against 11 out of 15 candidates

More Related Content

What's hot

Market basket analysis
Market basket analysisMarket basket analysis
Market basket analysis
tsering choezom
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
Junghoon Kim
 
Understanding Association Rule Mining
Understanding Association Rule MiningUnderstanding Association Rule Mining
Understanding Association Rule Mining
Mohit Rajput
 
Market baasket analysis
Market baasket analysisMarket baasket analysis
Market baasket analysis
SiddharthaPanapakam
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
Kamal Acharya
 
Text MIning
Text MIningText MIning
Text MIning
Prakhyath Rai
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
Sandeep Prasad
 
Association rules
Association rulesAssociation rules
Association rules
Dr. C.V. Suresh Babu
 
1.9.association mining 1
1.9.association mining 11.9.association mining 1
1.9.association mining 1
Krish_ver2
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
Acad
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
neelamoberoi1030
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
Deepa Jeya
 
Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptx
maha797959
 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
Knoldus Inc.
 
Apriori
AprioriApriori
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
Datamining Tools
 
Data mining in market basket analysis
Data mining in market basket analysisData mining in market basket analysis
Data mining in market basket analysis
TanmayeeMandala
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree induction
thamizh arasi
 

What's hot (20)

Market basket analysis
Market basket analysisMarket basket analysis
Market basket analysis
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Understanding Association Rule Mining
Understanding Association Rule MiningUnderstanding Association Rule Mining
Understanding Association Rule Mining
 
Market baasket analysis
Market baasket analysisMarket baasket analysis
Market baasket analysis
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Text MIning
Text MIningText MIning
Text MIning
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
Association rules
Association rulesAssociation rules
Association rules
 
1.9.association mining 1
1.9.association mining 11.9.association mining 1
1.9.association mining 1
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
 
Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptx
 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
 
Apriori
AprioriApriori
Apriori
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data mining in market basket analysis
Data mining in market basket analysisData mining in market basket analysis
Data mining in market basket analysis
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree induction
 

Similar to Rules of data mining

Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
Sulman Ahmed
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.ppt
raju980973
 
Apriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningApriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule Mining
Wan Aezwani Wab
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithm
deepti92pawar
 
Mining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association RulesMining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association Rules
Rashmi Bhat
 
chap6_basic_association_analysis.ppt
chap6_basic_association_analysis.pptchap6_basic_association_analysis.ppt
chap6_basic_association_analysis.ppt
PinkuBorah4
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdf
WailaBaba
 
Data Mining Lecture_3.pptx
Data Mining Lecture_3.pptxData Mining Lecture_3.pptx
Data Mining Lecture_3.pptx
Subrata Kumer Paul
 
Data Mining Lecture_4.pptx
Data Mining Lecture_4.pptxData Mining Lecture_4.pptx
Data Mining Lecture_4.pptx
Subrata Kumer Paul
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Subrata Kumer Paul
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
KomalBanik
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
KomalBanik
 
Apriori Algorithm.pptx
Apriori Algorithm.pptxApriori Algorithm.pptx
Apriori Algorithm.pptx
Rashi Agarwal
 
06FPBasic02.pdf
06FPBasic02.pdf06FPBasic02.pdf
06FPBasic02.pdf
Alireza418370
 
Association Rule Mining
Association Rule MiningAssociation Rule Mining
Association Rule Mining
PALLAB DAS
 
06 fp basic
06 fp basic06 fp basic
06 fp basic
JoonyoungJayGwak
 
Chapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxChapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptx
ssuser957b41
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
Kamal Acharya
 
Association Rule Mining in Data Mining.pptx
Association Rule Mining in Data Mining.pptxAssociation Rule Mining in Data Mining.pptx
Association Rule Mining in Data Mining.pptx
lahiruherath654
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Salah Amean
 

Similar to Rules of data mining (20)

Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.ppt
 
Apriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningApriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule Mining
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithm
 
Mining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association RulesMining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association Rules
 
chap6_basic_association_analysis.ppt
chap6_basic_association_analysis.pptchap6_basic_association_analysis.ppt
chap6_basic_association_analysis.ppt
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdf
 
Data Mining Lecture_3.pptx
Data Mining Lecture_3.pptxData Mining Lecture_3.pptx
Data Mining Lecture_3.pptx
 
Data Mining Lecture_4.pptx
Data Mining Lecture_4.pptxData Mining Lecture_4.pptx
Data Mining Lecture_4.pptx
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
Apriori Algorithm.pptx
Apriori Algorithm.pptxApriori Algorithm.pptx
Apriori Algorithm.pptx
 
06FPBasic02.pdf
06FPBasic02.pdf06FPBasic02.pdf
06FPBasic02.pdf
 
Association Rule Mining
Association Rule MiningAssociation Rule Mining
Association Rule Mining
 
06 fp basic
06 fp basic06 fp basic
06 fp basic
 
Chapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxChapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptx
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
Association Rule Mining in Data Mining.pptx
Association Rule Mining in Data Mining.pptxAssociation Rule Mining in Data Mining.pptx
Association Rule Mining in Data Mining.pptx
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 

More from Sulman Ahmed

Entrepreneurial Strategy Generating and Exploiting new entries
Entrepreneurial Strategy Generating and Exploiting new entriesEntrepreneurial Strategy Generating and Exploiting new entries
Entrepreneurial Strategy Generating and Exploiting new entries
Sulman Ahmed
 
Entrepreneurial Intentions and corporate entrepreneurship
Entrepreneurial Intentions and corporate entrepreneurshipEntrepreneurial Intentions and corporate entrepreneurship
Entrepreneurial Intentions and corporate entrepreneurship
Sulman Ahmed
 
Entrepreneurship main concepts and description
Entrepreneurship main concepts and descriptionEntrepreneurship main concepts and description
Entrepreneurship main concepts and description
Sulman Ahmed
 
Run time Verification using formal methods
Run time Verification using formal methodsRun time Verification using formal methods
Run time Verification using formal methods
Sulman Ahmed
 
Use of Formal Methods at Amazon Web Services
Use of Formal Methods at Amazon Web ServicesUse of Formal Methods at Amazon Web Services
Use of Formal Methods at Amazon Web Services
Sulman Ahmed
 
student learning App
student learning Appstudent learning App
student learning App
Sulman Ahmed
 
Software Engineering Economics Life Cycle.
Software Engineering Economics  Life Cycle.Software Engineering Economics  Life Cycle.
Software Engineering Economics Life Cycle.
Sulman Ahmed
 
Data mining Techniques
Data mining TechniquesData mining Techniques
Data mining Techniques
Sulman Ahmed
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
Sulman Ahmed
 
Data mining Basics and complete description
Data mining Basics and complete description Data mining Basics and complete description
Data mining Basics and complete description
Sulman Ahmed
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
Sulman Ahmed
 
Dwh lecture-07-denormalization
Dwh lecture-07-denormalizationDwh lecture-07-denormalization
Dwh lecture-07-denormalization
Sulman Ahmed
 
Dwh lecture-06-normalization
Dwh lecture-06-normalizationDwh lecture-06-normalization
Dwh lecture-06-normalization
Sulman Ahmed
 
Dwh lecture 12-dm
Dwh lecture 12-dmDwh lecture 12-dm
Dwh lecture 12-dm
Sulman Ahmed
 
Dwh lecture 13-process dm
Dwh  lecture 13-process dmDwh  lecture 13-process dm
Dwh lecture 13-process dm
Sulman Ahmed
 
Dwh lecture 11-molap
Dwh  lecture 11-molapDwh  lecture 11-molap
Dwh lecture 11-molap
Sulman Ahmed
 
Dwh lecture 10-olap
Dwh   lecture 10-olapDwh   lecture 10-olap
Dwh lecture 10-olap
Sulman Ahmed
 
Dwh lecture 08-denormalization tech
Dwh   lecture 08-denormalization techDwh   lecture 08-denormalization tech
Dwh lecture 08-denormalization tech
Sulman Ahmed
 
Dwh lecture 07-denormalization
Dwh   lecture 07-denormalizationDwh   lecture 07-denormalization
Dwh lecture 07-denormalization
Sulman Ahmed
 
Wbs
WbsWbs

More from Sulman Ahmed (20)

Entrepreneurial Strategy Generating and Exploiting new entries
Entrepreneurial Strategy Generating and Exploiting new entriesEntrepreneurial Strategy Generating and Exploiting new entries
Entrepreneurial Strategy Generating and Exploiting new entries
 
Entrepreneurial Intentions and corporate entrepreneurship
Entrepreneurial Intentions and corporate entrepreneurshipEntrepreneurial Intentions and corporate entrepreneurship
Entrepreneurial Intentions and corporate entrepreneurship
 
Entrepreneurship main concepts and description
Entrepreneurship main concepts and descriptionEntrepreneurship main concepts and description
Entrepreneurship main concepts and description
 
Run time Verification using formal methods
Run time Verification using formal methodsRun time Verification using formal methods
Run time Verification using formal methods
 
Use of Formal Methods at Amazon Web Services
Use of Formal Methods at Amazon Web ServicesUse of Formal Methods at Amazon Web Services
Use of Formal Methods at Amazon Web Services
 
student learning App
student learning Appstudent learning App
student learning App
 
Software Engineering Economics Life Cycle.
Software Engineering Economics  Life Cycle.Software Engineering Economics  Life Cycle.
Software Engineering Economics Life Cycle.
 
Data mining Techniques
Data mining TechniquesData mining Techniques
Data mining Techniques
 
Classification in data mining
Classification in data mining Classification in data mining
Classification in data mining
 
Data mining Basics and complete description
Data mining Basics and complete description Data mining Basics and complete description
Data mining Basics and complete description
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
 
Dwh lecture-07-denormalization
Dwh lecture-07-denormalizationDwh lecture-07-denormalization
Dwh lecture-07-denormalization
 
Dwh lecture-06-normalization
Dwh lecture-06-normalizationDwh lecture-06-normalization
Dwh lecture-06-normalization
 
Dwh lecture 12-dm
Dwh lecture 12-dmDwh lecture 12-dm
Dwh lecture 12-dm
 
Dwh lecture 13-process dm
Dwh  lecture 13-process dmDwh  lecture 13-process dm
Dwh lecture 13-process dm
 
Dwh lecture 11-molap
Dwh  lecture 11-molapDwh  lecture 11-molap
Dwh lecture 11-molap
 
Dwh lecture 10-olap
Dwh   lecture 10-olapDwh   lecture 10-olap
Dwh lecture 10-olap
 
Dwh lecture 08-denormalization tech
Dwh   lecture 08-denormalization techDwh   lecture 08-denormalization tech
Dwh lecture 08-denormalization tech
 
Dwh lecture 07-denormalization
Dwh   lecture 07-denormalizationDwh   lecture 07-denormalization
Dwh lecture 07-denormalization
 
Wbs
WbsWbs
Wbs
 

Recently uploaded

Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
Kavitha Krishnan
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
Bisnar Chase Personal Injury Attorneys
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
ak6969907
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
Life upper-Intermediate B2 Workbook for student
Life upper-Intermediate B2 Workbook for studentLife upper-Intermediate B2 Workbook for student
Life upper-Intermediate B2 Workbook for student
NgcHiNguyn25
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 

Recently uploaded (20)

Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
Life upper-Intermediate B2 Workbook for student
Life upper-Intermediate B2 Workbook for studentLife upper-Intermediate B2 Workbook for student
Life upper-Intermediate B2 Workbook for student
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 

Rules of data mining

  • 1. Association Rule Mining • Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction Market-Basket transactions TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Example of Association Rules {Diaper}  {Beer}, {Milk, Bread}  {Eggs,Coke}, {Beer, Bread}  {Milk},
  • 2. Definition: Frequent Itemset • Itemset – A collection of one or more items • Example: {Milk, Bread, Diaper} – k-itemset • An itemset that contains k items • Support count () – Frequency of occurrence of an itemset – E.g. ({Milk, Bread,Diaper}) = 2 • Support – Fraction of transactions that contain an itemset – E.g. s({Milk, Bread, Diaper}) = 2/5 • Frequent Itemset – An itemset whose support is greater than or equal to a minsup threshold TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke
  • 3. Definition: Association Rule Example: Beer}Diaper,Milk{  4.0 5 2 |T| )BeerDiaper,,Milk(   s 67.0 3 2 )Diaper,Milk( )BeerDiaper,Milk,(    c  Association Rule – An implication expression of the form X  Y, where X and Y are itemsets – Example: {Milk, Diaper}  {Beer}  Rule Evaluation Metrics – Support (s)  Fraction of transactions that contain both X and Y – Confidence (c)  Measures how often items in Y appear in transactions that contain X TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke
  • 4. Association Rule Mining Task • Given a set of transactions T, the goal of association rule mining is to find all rules having – support ≥ minsup threshold – confidence ≥ minconf threshold • Brute-force approach: – List all possible association rules – Compute the support and confidence for each rule – Prune rules that fail the minsup and minconf thresholds  Computationally prohibitive!
  • 5. Mining Association Rules Example of Rules: {Milk,Diaper}  {Beer} (s=0.4, c=0.67) {Milk,Beer}  {Diaper} (s=0.4, c=1.0) {Diaper,Beer}  {Milk} (s=0.4, c=0.67) {Beer}  {Milk,Diaper} (s=0.4, c=0.67) {Diaper}  {Milk,Beer} (s=0.4, c=0.5) {Milk}  {Diaper,Beer} (s=0.4, c=0.5) TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Observations: • All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer} • Rules originating from the same itemset have identical support but can have different confidence • Thus, we may decouple the support and confidence requirements
  • 6. Mining Association Rules • Two-step approach: 1. Frequent Itemset Generation – Generate all itemsets whose support  minsup 2. Rule Generation – Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset • Frequent itemset generation is still computationally expensive
  • 7. Frequent Itemset Generation null AB AC AD AE BC BD BE CD CE DE A B C D E ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE Given d items, there are 2d possible candidate itemsets
  • 8. Frequent Itemset Generation • Brute-force approach: – Each itemset in the lattice is a candidate frequent itemset – Count the support of each candidate by scanning the database – Match each transaction against every candidate – Complexity ~ O(NMw) => Expensive since M = 2d !!! TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke N Transactions List of Candidates M w
  • 9. Computational Complexity • Given d unique items: – Total number of itemsets = 2d – Total number of possible association rules: 123 1 1 1 1                             dd d k kd j j kd k d R If d=6, R = 602 rules
  • 10. Frequent Itemset Generation Strategies • Reduce the number of candidates (M) – Complete search: M=2d – Use pruning techniques to reduce M • Reduce the number of transactions (N) – Reduce size of N as the size of itemset increases – Used by DHP and vertical-based mining algorithms • Reduce the number of comparisons (NM) – Use efficient data structures to store the candidates or transactions – No need to match every candidate against every transaction
  • 11. Monotonicity Property • A measure s (support) is monotone (or upward closed) if • which means that if X is a subset of Y, then s(X) must not exceed s(Y). On the other hand, s is anti-monotone (or downward closed) if • which means that if X is a subset of Y, then s(Y) must not exceed s(X). • Any measure that posses an anti-monotone property can be incorporated directly into the mining algorithm to effectively prune the exponential search space of candidate itemsets. 11 )()()(:, YsXsYXYX  )()()(:, YsXsYXYX 
  • 12. Reducing Number of Candidates • Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent – Support of an itemset never exceeds the support of its subsets – This is known as the anti-monotone property of support
  • 15. Illustrating Apriori Principle Item Count Bread 4 Coke 2 Milk 4 Beer 3 Diaper 4 Eggs 1 Itemset Count {Bread,Milk} 3 {Bread,Beer} 2 {Bread,Diaper} 3 {Milk,Beer} 2 {Milk,Diaper} 3 {Beer,Diaper} 3 Itemset Count {Bread,Milk,Diaper} 3 Items (1-itemsets) Pairs (2-itemsets) (No need to generate candidates involving Coke or Eggs) Triplets (3-itemsets) Minimum Support = 3 If every subset is considered, 6C1 + 6C2 + 6C3 = 41 With support-based pruning, 6 + 6 + 1 = 13
  • 16. Apriori Algorithm • Method: – Let k=1 – Generate frequent itemsets of length 1 – Repeat until no new frequent itemsets are identified • Generate length (k+1) candidate itemsets from length k frequent itemsets • Prune candidate itemsets containing subsets of length k that are infrequent • Count the support of each candidate by scanning the DB • Eliminate candidates that are infrequent, leaving only those that are frequent
  • 17. Apriori Algorithm – Example 2 • Consider the following transaction database: • min_sup=2
  • 18. Apriori Algorithm – Example 2 – Iteration 1 • In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The algorithm simply scans all of the transactions to count the number of occurrences of each item.
  • 19. Apriori Algorithm – Example 2 – Iteration 2 • To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 L1 to generate a candidate set of 2-itemsets, C2. • C2 consists of |𝐿1| 2 2-itemsets.
  • 20. Apriori Algorithm – Example 2 – Iteration 3 • The generation of the set of the candidate 3-itemsets, C3.
  • 21. Apriori Algorithm – Example 2 – Iteration 4 • The algorithm uses L3 L3 to generate a candidate set of 4-itemsets, C4. • Although the join results in {{I1, I2, I3, I5}}, itemset {I1, I2, I3, I5} is pruned because its subset {I2, I3, I5} is not frequent. • Thus, C4 = φ, and the algorithm terminates, having found all of the frequent itemsets.
  • 22. Reducing Number of Comparisons • Candidate counting: – Scan the database of transactions to determine the support of each candidate itemset – To reduce the number of comparisons, store the candidates in a hash structure • Instead of matching each transaction against every candidate, match it against candidates contained in the hashed buckets TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke N Transactions Hash Structure k Buckets
  • 23. Generate Hash Tree 2 3 4 5 6 7 1 4 5 1 3 6 1 2 4 4 5 7 1 2 5 4 5 8 1 5 9 3 4 5 3 5 6 3 5 7 6 8 9 3 6 7 3 6 8 1,4,7 2,5,8 3,6,9 Hash function Suppose you have 15 candidate itemsets of length 3: {1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8} You need: • Hash function • Max leaf size: max number of itemsets stored in a leaf node (if number of candidate itemsets exceeds max leaf size, split the node)
  • 24. Association Rule Discovery: Hash tree 1 5 9 1 4 5 1 3 6 3 4 5 3 6 7 3 6 8 3 5 6 3 5 7 6 8 9 2 3 4 5 6 7 1 2 4 4 5 7 1 2 5 4 5 8 1,4,7 2,5,8 3,6,9 Hash Function Candidate Hash Tree Hash on 1, 4 or 7 {1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 {6 8 9}, {3 6 7}, {3 6 8}
  • 25. Association Rule Discovery: Hash tree 1 5 9 1 4 5 1 3 6 3 4 5 3 6 7 3 6 8 3 5 6 3 5 7 6 8 9 2 3 4 5 6 7 1 2 4 4 5 7 1 2 5 4 5 8 1,4,7 2,5,8 3,6,9 Hash Function Candidate Hash Tree Hash on 2, 5 or 8
  • 26. Association Rule Discovery: Hash tree 1 5 9 1 4 5 1 3 6 3 4 5 3 6 7 3 6 8 3 5 6 3 5 7 6 8 9 2 3 4 5 6 7 1 2 4 4 5 7 1 2 5 4 5 8 1,4,7 2,5,8 3,6,9 Hash Function Candidate Hash Tree Hash on 3, 6 or 9
  • 27. Subset Operation 1 2 3 5 6 Transaction, t 2 3 5 61 3 5 62 5 61 33 5 61 2 61 5 5 62 3 62 5 5 63 1 2 3 1 2 5 1 2 6 1 3 5 1 3 6 1 5 6 2 3 5 2 3 6 2 5 6 3 5 6 Subsets of 3 items Level 1 Level 2 Level 3 63 5 Given a transaction t, what are the possible subsets of size 3?
  • 28. Subset Operation Using Hash Tree 1 5 9 1 4 5 1 3 6 3 4 5 3 6 7 3 6 8 3 5 6 3 5 7 6 8 9 2 3 4 5 6 7 1 2 4 4 5 7 1 2 5 4 5 8 1 2 3 5 6 1 + 2 3 5 6 3 5 62 + 5 63 + 1,4,7 2,5,8 3,6,9 Hash Functiontransaction
  • 29. Subset Operation Using Hash Tree 1 5 9 1 4 5 1 3 6 3 4 5 3 6 7 3 6 8 3 5 6 3 5 7 6 8 9 2 3 4 5 6 7 1 2 4 4 5 7 1 2 5 4 5 8 1,4,7 2,5,8 3,6,9 Hash Function 1 2 3 5 6 3 5 61 2 + 5 61 3 + 61 5 + 3 5 62 + 5 63 + 1 + 2 3 5 6 transaction
  • 30. Subset Operation Using Hash Tree 1 5 9 1 4 5 1 3 6 3 4 5 3 6 7 3 6 8 3 5 6 3 5 7 6 8 9 2 3 4 5 6 7 1 2 4 4 5 7 1 2 5 4 5 8 1,4,7 2,5,8 3,6,9 Hash Function 1 2 3 5 6 3 5 61 2 + 5 61 3 + 61 5 + 3 5 62 + 5 63 + 1 + 2 3 5 6 transaction Match transaction against 11 out of 15 candidates