SlideShare a Scribd company logo
1 of 17
SEG4630 2009-2010
Tutorial 2 – Frequent Pattern
Mining
2
Frequent Patterns
 Frequent pattern: a pattern (a set of items,
subsequences, substructures, etc.) that occurs
frequently in a data set
 itemset: A set of one or more items
 k-itemset: X = {x1, …, xk}
 Mining algorithms
 Apriori
 FP-growth
Tid Items bought
10 Beer, Nuts, Diaper
20 Beer, Coffee, Diaper
30 Beer, Diaper, Eggs
40 Nuts, Eggs, Milk
50 Nuts, Coffee, Diaper, Eggs, Beer
3
Support & Confidence
 Support
 (absolute) support, or, support count of X: Frequency or
occurrence of an itemset X
 (relative) support, s, is the fraction of transactions that
contains X (i.e., the probability that a transaction contains X)
 An itemset X is frequent if X’s support is no less than a minsup
threshold
 Confidence (association rule: XY )
 sup(XY)/sup(x) (conditional prob.: Pr(Y|X) = Pr(X^Y)/Pr(X) )
 confidence, c, conditional probability that a transaction
having X also contains Y
 Find all the rules XY with minimum support and confidence
 sup(XY) ≥ minsup
 sup(XY)/sup(X) ≥ minconf
4
Apriori Principle
 If an itemset is frequent, then all of its subsets must also be
frequent
 If an itemset is infrequent, then all of its supersets must be
infrequent too
null
AB AC AD AE BC BD BE CD CE DE
A B C D E
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
frequent
frequent infrequent
infrequent
(X  Y)
(¬Y  ¬X)
5
Apriori: A Candidate Generation & Test
Approach
 Initially, scan DB once to get frequent 1-
itemset
 Loop
 Generate length (k+1) candidate
itemsets from length k frequent
itemsets
 Test the candidates against DB
 Terminate when no frequent or candidate set
can be generated
6
Generate candidate itemsets
Example
Frequent 3-itemsets:
{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4},
{1, 3, 5}, {2, 3, 4}, {2, 3, 5} and {3, 4, 5}
 Candidate 4-itemset:
{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 3,
4, 5}, {2, 3, 4, 5}
 Which need not to be counted?
{1, 2, 4, 5} & {1, 3, 4, 5} & {2, 3, 4, 5}
7
Maximal vs Closed Frequent Itemsets
 An itemset X is a max-pattern if X is frequent and
there exists no frequent super-pattern Y ‫כ‬ X
 An itemset X is closed if X is frequent and there
exists no super-pattern Y ‫כ‬ X, with the same
support as X
Frequent
Itemsets
Closed
Frequent
Itemsets
Maximal
Frequent
Itemsets
Closed Frequent Itemsets are Lossless:
the support for any frequent itemset
can be deduced from the closed
frequent itemsets
8
Maximal vs Closed Frequent Itemsets
# Closed = 9
# Maximal = 4
null
AB AC AD AE BC BD BE CD CE DE
A B C D E
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
124 123 1234 245 345
12 124 24 4 123 2 3 24 34 45
12 2 24 4 4 2 3 4
2 4
Closed and
maximal
frequent
Closed but
not maximal
minsup=2
9
Algorithms to find frequent pattern
 Apriori: uses a generate-and-test approach –
generates candidate itemsets and tests if they
are frequent
 Generation of candidate itemsets is expensive (in both
space and time)
 Support counting is expensive
 Subset checking (computationally expensive)
 Multiple Database scans (I/O)
 FP-Growth: allows frequent itemset discovery
without candidate generation. Two step:
 1.Build a compact data structure called the FP-tree
 2 passes over the database
 2.extracts frequent itemsets directly from the FP-tree
 Traverse through FP-tree
10
Pattern-Growth Approach: Mining Frequent
Patterns Without Candidate Generation
 The FP-Growth Approach
 Depth-first search (Apriori: Breadth-first search)
 Avoid explicit candidate generation
Fp-tree construatioin:
• Scan DB once, find frequent
1-itemset (single item
pattern)
• Sort frequent items in
frequency descending order,
f-list
• Scan DB again, construct FP-
tree
FP-Growth approach:
• For each frequent item, construct its
conditional pattern-base, and then
its conditional FP-tree
• Repeat the process on each newly
created conditional FP-tree
• Until the resulting FP-tree is empty,
or it contains only one path—single
path will generate all the
combinations of its sub-paths, each
of which is a frequent pattern
11
FP-tree Size
 The size of an FPtree is typically smaller than the
size of the uncompressed data because many
transactions often share a few items in common
 Bestcase scenario: All transactions have the same
set of items, and the FPtree contains only a single
branch of nodes.
 Worstcase scenario: Every transaction has a unique
set of items. As none of the transactions have any
items in common, the size of the FPtree is
effectively the same as the size of the original
data.
 The size of an FPtree also depends on how the
items are ordered
12
Example
 FP-tree with item
descending ordering
 FP-tree with item ascending
ordering
13
Find Patterns Having p From P-conditional
Database
 Starting at the frequent item header table in the FP-tree
 Traverse the FP-tree by following the link of each
frequent item p
 Accumulate all of transformed prefix paths of item p to
form p’s conditional pattern base
Conditional pattern bases
item cond. pattern base
c f:3
a fc:3
b fca:1, f:1, c:1
m fca:2, fcab:1
p fcam:2, cb:1
{}
f:4 c:1
b:1
p:1
b:1
c:3
a:3
b:1
m:2
p:2 m:1
Header Table
Item frequency head
f 4
c 4
a 3
b 3
m 3
p 3
14
f, c, a, m, p
5
c, b, p
4
f, b
3
f, c, a, b, m
2
f, c, a, m, p
1
f, c, a, m, p
5
c, b, p
4
f, b
3
f, c, a, b, m
2
f, c, a, m, p
1
f, c, a
5
c, b
4
f, b
3
f, c, a, b
2
f, c, a
1
f, c, a
5
c, b
4
f, b
3
f, c, a, b
2
f, c, a
1
f, c, a, m
5
c, b
4
f, c, a, m
1
f, c, a, m
5
c, b
4
f, c, a, m
1
f, c, a
5
f, c, a, b
2
f, c, a
1
f, c, a
5
f, c, a, b
2
f, c, a
1
f, c, a, m
5
c, b
4
f, b
3
f, c, a, b, m
2
f, c, a, m
1
f, c, a, m
5
c, b
4
f, b
3
f, c, a, b, m
2
f, c, a, m
1
c
4
f
3
f, c, a
2
c
4
f
3
f, c, a
2
f, c, a
5
c
4
f
3
f, c, a
2
f, c, a
1
f, c, a
5
c
4
f
3
f, c, a
2
f, c, a
1 f, c
5
f, c
2
f, c
1
f, c
5
f, c
2
f, c
1
f, c
5
c
4
f
3
f, c
2
f, c
1
f, c
5
c
4
f
3
f, c
2
f, c
1
+ p
+ m
+ b
+ a
FP-Growth
15
f, c, a, m, p
5
c, b, p
4
f, b
3
f, c, a, b, m
2
f, c, a, m, p
1
f, c, a, m, p
5
c, b, p
4
f, b
3
f, c, a, b, m
2
f, c, a, m, p
1
f, c, a, m
5
c, b
4
f, c, a, m
1
f, c, a, m
5
c, b
4
f, c, a, m
1
+ p
f, c, a
5
f, c, a, b
2
f, c, a
1
f, c, a
5
f, c, a, b
2
f, c, a
1
+ m
c
4
f
3
f, c, a
2
c
4
f
3
f, c, a
2
+ b
f, c
5
f, c
2
f, c
1
f, c
5
f, c
2
f, c
1
+ a
f: 1,2,3,5
(1) (2)
(3) (4)
(5)
(6)
+ c
f
5
4
f
2
f
1
f
5
4
f
2
f
1
FP-Growth
16
{}
f:4 c:1
b:1
p:1
b:1
c:3
a:3
b:1
m:2
p:2 m:1
{}
f:2 c:1
b:1
p:1
c:2
a:2
m:2
{}
f:3
c:3
a:3
b:1
{}
f:2 c:1
c:1
a:1
{}
f:3
c:3
{}
f:3
+
p
+
m
+
b
+
a
+
c
f:4
(1) (2)
(3) (4) (5) (6)
17
f, c, a, m, p
5
c, b, p
4
f, b
3
f, c, a, b, m
2
f, c, a, m, p
1
f, c, a, m, p
5
c, b, p
4
f, b
3
f, c, a, b, m
2
f, c, a, m, p
1
f, c, a, m
5
c, b
4
f, c, a, m
1
f, c, a, m
5
c, b
4
f, c, a, m
1
+ p
f, c, a
5
f, c, a, b
2
f, c, a
1
f, c, a
5
f, c, a, b
2
f, c, a
1
+ m
c
4
f
3
f, c, a
2
c
4
f
3
f, c, a
2
+ b
f, c
5
f, c
2
f, c
1
f, c
5
f, c
2
f, c
1
+ a
f: 1,2,3,5
+ p
c
5
c
4
c
1
c
5
c
4
c
1
p: 3
cp: 3
f, c, a
5
f, c, a
2
f, c, a
1
f, c, a
5
f, c, a
2
f, c, a
1
+ m
m: 3
fm: 3
cm: 3
am: 3
fcm: 3
fam: 3
cam: 3
fcam: 3
b: 3
f: 4
a: 3
fa: 3
ca: 3
fca: 3
c: 4
fc: 3
+ c
f
5
4
f
2
f
1
f
5
4
f
2
f
1
min_sup = 3

More Related Content

Similar to FP growth algorithm, data mining, data analystics

UNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).pptUNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).pptRaviKiranVarma4
 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationKnoldus Inc.
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatternsKamal Singh Lodhi
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptraju980973
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Subrata Kumer Paul
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmdeepti92pawar
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data miningSulman Ahmed
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data miningSulman Ahmed
 
Mining Frequent Itemsets.ppt
Mining Frequent Itemsets.pptMining Frequent Itemsets.ppt
Mining Frequent Itemsets.pptNBACriteria2SICET
 
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmFrequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmShivarkarSandip
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithmPradip Kumar
 

Similar to FP growth algorithm, data mining, data analystics (20)

UNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).pptUNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).ppt
 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
 
06FPBasic02.pdf
06FPBasic02.pdf06FPBasic02.pdf
06FPBasic02.pdf
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatterns
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.ppt
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
06 fp basic
06 fp basic06 fp basic
06 fp basic
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
 
7 algorithm
7 algorithm7 algorithm
7 algorithm
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithm
 
Lecture20
Lecture20Lecture20
Lecture20
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
Mining Frequent Itemsets.ppt
Mining Frequent Itemsets.pptMining Frequent Itemsets.ppt
Mining Frequent Itemsets.ppt
 
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth AlgorithmFrequent Pattern Analysis, Apriori and FP Growth Algorithm
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
DMML4_apriori (1).ppt
DMML4_apriori (1).pptDMML4_apriori (1).ppt
DMML4_apriori (1).ppt
 
Assosiate rule mining
Assosiate rule miningAssosiate rule mining
Assosiate rule mining
 

Recently uploaded

Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Recently uploaded (20)

Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 

FP growth algorithm, data mining, data analystics

  • 1. SEG4630 2009-2010 Tutorial 2 – Frequent Pattern Mining
  • 2. 2 Frequent Patterns  Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set  itemset: A set of one or more items  k-itemset: X = {x1, …, xk}  Mining algorithms  Apriori  FP-growth Tid Items bought 10 Beer, Nuts, Diaper 20 Beer, Coffee, Diaper 30 Beer, Diaper, Eggs 40 Nuts, Eggs, Milk 50 Nuts, Coffee, Diaper, Eggs, Beer
  • 3. 3 Support & Confidence  Support  (absolute) support, or, support count of X: Frequency or occurrence of an itemset X  (relative) support, s, is the fraction of transactions that contains X (i.e., the probability that a transaction contains X)  An itemset X is frequent if X’s support is no less than a minsup threshold  Confidence (association rule: XY )  sup(XY)/sup(x) (conditional prob.: Pr(Y|X) = Pr(X^Y)/Pr(X) )  confidence, c, conditional probability that a transaction having X also contains Y  Find all the rules XY with minimum support and confidence  sup(XY) ≥ minsup  sup(XY)/sup(X) ≥ minconf
  • 4. 4 Apriori Principle  If an itemset is frequent, then all of its subsets must also be frequent  If an itemset is infrequent, then all of its supersets must be infrequent too null AB AC AD AE BC BD BE CD CE DE A B C D E ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE frequent frequent infrequent infrequent (X  Y) (¬Y  ¬X)
  • 5. 5 Apriori: A Candidate Generation & Test Approach  Initially, scan DB once to get frequent 1- itemset  Loop  Generate length (k+1) candidate itemsets from length k frequent itemsets  Test the candidates against DB  Terminate when no frequent or candidate set can be generated
  • 6. 6 Generate candidate itemsets Example Frequent 3-itemsets: {1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5} and {3, 4, 5}  Candidate 4-itemset: {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}  Which need not to be counted? {1, 2, 4, 5} & {1, 3, 4, 5} & {2, 3, 4, 5}
  • 7. 7 Maximal vs Closed Frequent Itemsets  An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ‫כ‬ X  An itemset X is closed if X is frequent and there exists no super-pattern Y ‫כ‬ X, with the same support as X Frequent Itemsets Closed Frequent Itemsets Maximal Frequent Itemsets Closed Frequent Itemsets are Lossless: the support for any frequent itemset can be deduced from the closed frequent itemsets
  • 8. 8 Maximal vs Closed Frequent Itemsets # Closed = 9 # Maximal = 4 null AB AC AD AE BC BD BE CD CE DE A B C D E ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE 124 123 1234 245 345 12 124 24 4 123 2 3 24 34 45 12 2 24 4 4 2 3 4 2 4 Closed and maximal frequent Closed but not maximal minsup=2
  • 9. 9 Algorithms to find frequent pattern  Apriori: uses a generate-and-test approach – generates candidate itemsets and tests if they are frequent  Generation of candidate itemsets is expensive (in both space and time)  Support counting is expensive  Subset checking (computationally expensive)  Multiple Database scans (I/O)  FP-Growth: allows frequent itemset discovery without candidate generation. Two step:  1.Build a compact data structure called the FP-tree  2 passes over the database  2.extracts frequent itemsets directly from the FP-tree  Traverse through FP-tree
  • 10. 10 Pattern-Growth Approach: Mining Frequent Patterns Without Candidate Generation  The FP-Growth Approach  Depth-first search (Apriori: Breadth-first search)  Avoid explicit candidate generation Fp-tree construatioin: • Scan DB once, find frequent 1-itemset (single item pattern) • Sort frequent items in frequency descending order, f-list • Scan DB again, construct FP- tree FP-Growth approach: • For each frequent item, construct its conditional pattern-base, and then its conditional FP-tree • Repeat the process on each newly created conditional FP-tree • Until the resulting FP-tree is empty, or it contains only one path—single path will generate all the combinations of its sub-paths, each of which is a frequent pattern
  • 11. 11 FP-tree Size  The size of an FPtree is typically smaller than the size of the uncompressed data because many transactions often share a few items in common  Bestcase scenario: All transactions have the same set of items, and the FPtree contains only a single branch of nodes.  Worstcase scenario: Every transaction has a unique set of items. As none of the transactions have any items in common, the size of the FPtree is effectively the same as the size of the original data.  The size of an FPtree also depends on how the items are ordered
  • 12. 12 Example  FP-tree with item descending ordering  FP-tree with item ascending ordering
  • 13. 13 Find Patterns Having p From P-conditional Database  Starting at the frequent item header table in the FP-tree  Traverse the FP-tree by following the link of each frequent item p  Accumulate all of transformed prefix paths of item p to form p’s conditional pattern base Conditional pattern bases item cond. pattern base c f:3 a fc:3 b fca:1, f:1, c:1 m fca:2, fcab:1 p fcam:2, cb:1 {} f:4 c:1 b:1 p:1 b:1 c:3 a:3 b:1 m:2 p:2 m:1 Header Table Item frequency head f 4 c 4 a 3 b 3 m 3 p 3
  • 14. 14 f, c, a, m, p 5 c, b, p 4 f, b 3 f, c, a, b, m 2 f, c, a, m, p 1 f, c, a, m, p 5 c, b, p 4 f, b 3 f, c, a, b, m 2 f, c, a, m, p 1 f, c, a 5 c, b 4 f, b 3 f, c, a, b 2 f, c, a 1 f, c, a 5 c, b 4 f, b 3 f, c, a, b 2 f, c, a 1 f, c, a, m 5 c, b 4 f, c, a, m 1 f, c, a, m 5 c, b 4 f, c, a, m 1 f, c, a 5 f, c, a, b 2 f, c, a 1 f, c, a 5 f, c, a, b 2 f, c, a 1 f, c, a, m 5 c, b 4 f, b 3 f, c, a, b, m 2 f, c, a, m 1 f, c, a, m 5 c, b 4 f, b 3 f, c, a, b, m 2 f, c, a, m 1 c 4 f 3 f, c, a 2 c 4 f 3 f, c, a 2 f, c, a 5 c 4 f 3 f, c, a 2 f, c, a 1 f, c, a 5 c 4 f 3 f, c, a 2 f, c, a 1 f, c 5 f, c 2 f, c 1 f, c 5 f, c 2 f, c 1 f, c 5 c 4 f 3 f, c 2 f, c 1 f, c 5 c 4 f 3 f, c 2 f, c 1 + p + m + b + a FP-Growth
  • 15. 15 f, c, a, m, p 5 c, b, p 4 f, b 3 f, c, a, b, m 2 f, c, a, m, p 1 f, c, a, m, p 5 c, b, p 4 f, b 3 f, c, a, b, m 2 f, c, a, m, p 1 f, c, a, m 5 c, b 4 f, c, a, m 1 f, c, a, m 5 c, b 4 f, c, a, m 1 + p f, c, a 5 f, c, a, b 2 f, c, a 1 f, c, a 5 f, c, a, b 2 f, c, a 1 + m c 4 f 3 f, c, a 2 c 4 f 3 f, c, a 2 + b f, c 5 f, c 2 f, c 1 f, c 5 f, c 2 f, c 1 + a f: 1,2,3,5 (1) (2) (3) (4) (5) (6) + c f 5 4 f 2 f 1 f 5 4 f 2 f 1 FP-Growth
  • 16. 16 {} f:4 c:1 b:1 p:1 b:1 c:3 a:3 b:1 m:2 p:2 m:1 {} f:2 c:1 b:1 p:1 c:2 a:2 m:2 {} f:3 c:3 a:3 b:1 {} f:2 c:1 c:1 a:1 {} f:3 c:3 {} f:3 + p + m + b + a + c f:4 (1) (2) (3) (4) (5) (6)
  • 17. 17 f, c, a, m, p 5 c, b, p 4 f, b 3 f, c, a, b, m 2 f, c, a, m, p 1 f, c, a, m, p 5 c, b, p 4 f, b 3 f, c, a, b, m 2 f, c, a, m, p 1 f, c, a, m 5 c, b 4 f, c, a, m 1 f, c, a, m 5 c, b 4 f, c, a, m 1 + p f, c, a 5 f, c, a, b 2 f, c, a 1 f, c, a 5 f, c, a, b 2 f, c, a 1 + m c 4 f 3 f, c, a 2 c 4 f 3 f, c, a 2 + b f, c 5 f, c 2 f, c 1 f, c 5 f, c 2 f, c 1 + a f: 1,2,3,5 + p c 5 c 4 c 1 c 5 c 4 c 1 p: 3 cp: 3 f, c, a 5 f, c, a 2 f, c, a 1 f, c, a 5 f, c, a 2 f, c, a 1 + m m: 3 fm: 3 cm: 3 am: 3 fcm: 3 fam: 3 cam: 3 fcam: 3 b: 3 f: 4 a: 3 fa: 3 ca: 3 fca: 3 c: 4 fc: 3 + c f 5 4 f 2 f 1 f 5 4 f 2 f 1 min_sup = 3