FP-Growth algorithm
Vasiljevic Vladica,
vv113314m@student.etf.rs
Introduction
Apriori uses a generate-and-test approach: it generates candidate itemsets and tests whether they are frequent
– Generation of candidate itemsets is expensive (in both space and time)
– Support counting is expensive
• Subset checking (computationally expensive)
• Multiple database scans (I/O)
FP-Growth allows frequent itemset discovery without candidate itemset generation. Two-step approach:
– Step 1: Build a compact data structure called the FP-tree
• Built using 2 passes over the data-set.
– Step 2: Extract frequent itemsets directly from the FP-tree
Step 1: FP-Tree Construction
The FP-Tree is constructed using 2 passes over the data-set:
Pass 1:
– Scan the data and find the support of each item.
– Discard infrequent items.
– Sort the frequent items in decreasing order of support.
Use this order when building the FP-Tree, so that common prefixes can be shared.
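Pass 1 can be sketched in Python. This is a minimal illustration under our own assumptions, not code from the original slides. `TRANSACTIONS` is a hypothetical ten-transaction dataset in which each string is a transaction and each character an item. `first_pass` and `reorder` are illustrative names. Ties in support are broken alphabetically, which the slides leave unspecified.

```python
from collections import Counter

# Hypothetical dataset (assumption): each string is one transaction, each character one item.
TRANSACTIONS = ["ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce"]


def first_pass(transactions, min_sup):
    """Pass 1: count item supports, discard infrequent items, fix a global item order."""
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item: s for item, s in support.items() if s >= min_sup}
    # Decreasing support; ties broken alphabetically (our assumption) for determinism.
    order = sorted(frequent, key=lambda item: (-frequent[item], item))
    return frequent, order


def reorder(transaction, order):
    """Drop infrequent items and list the rest in the global order (used in pass 2)."""
    rank = {item: r for r, item in enumerate(order)}
    return sorted((item for item in set(transaction) if item in rank), key=rank.__getitem__)


frequent, order = first_pass(TRANSACTIONS, min_sup=2)
print(order)                  # ['a', 'b', 'c', 'd', 'e']
print(reorder("eca", order))  # ['a', 'c', 'e']
```

With `min_sup=4`, this pass would discard item e (support 3), so e would never reach the tree.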
Step 1: FP-Tree Construction
Pass 2:
Nodes correspond to items and have a counter.
1. FP-Growth reads 1 transaction at a time and maps it to a path.
2. A fixed item order is used, so paths can overlap when transactions share items (i.e. when they have the same prefix).
– In this case, the counters along the shared prefix are incremented.
3. Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines).
– The more paths overlap, the higher the compression, and the more likely the FP-tree is to fit in memory.
4. Frequent itemsets are extracted from the FP-Tree.
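Pass 2 can be sketched as follows. The sketch assumes a hypothetical ten-transaction dataset and alphabetical tie-breaking, restated here so the block runs on its own. `FPNode`, `build_fp_tree`, `count_nodes` and `linked_counts` are illustrative names. New nodes are prepended to their item's linked list; that is one possible choice, not one the slides prescribe.

```python
from collections import Counter

# Hypothetical dataset (assumption): each string is one transaction, each character one item.
TRANSACTIONS = ["ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce"]


class FPNode:
    """An FP-tree node: an item, a counter, its children and a node-link."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}  # item -> FPNode
        self.link = None    # next node holding the same item (a "dotted line")


def build_fp_tree(transactions, min_sup):
    """Pass 1 fixes the item order; pass 2 maps each transaction to a path."""
    support = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i in support if support[i] >= min_sup),
                   key=lambda i: (-support[i], i))  # ties broken alphabetically (assumption)
    rank = {item: r for r, item in enumerate(order)}
    root = FPNode(None, None)
    header = {item: None for item in order}  # head of each item's linked list
    for t in transactions:                   # pass 2: one transaction at a time
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:                # prefix not shared: create a new node
                child = FPNode(item, node)
                node.children[item] = child
                child.link, header[item] = header[item], child  # prepend to the list
            child.count += 1                 # count this transaction on the node
            node = child
    return root, header


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def linked_counts(header, item):
    """Counters of all nodes holding `item`, found by following its node-links."""
    counts, node = [], header[item]
    while node is not None:
        counts.append(node.count)
        node = node.link
    return counts


root, header = build_fp_tree(TRANSACTIONS, min_sup=2)
print(count_nodes(root))           # 14
print(root.children["a"].count)    # 8
print(linked_counts(header, "e"))  # [1, 1, 1]
```

On this data the 29 item occurrences collapse into 14 nodes, because many transactions share the prefix a or a-b.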
Step 1: FP-Tree Construction (Example)
FP-Tree size
The FP-Tree is usually smaller than the uncompressed data, because many transactions typically share items (and hence prefixes).
– Best case scenario: all transactions contain the same set of items.
• 1 path in the FP-tree
– Worst case scenario: every transaction has a unique set of items (no items in common).
• The size of the FP-tree is at least as large as the original data.
• Storage requirements for the FP-tree are higher: the pointers between the nodes and the counters must also be stored.
The size of the FP-tree depends on how the items are ordered.
Ordering by decreasing support is typically used, but it does not always lead to the smallest tree (it's a heuristic).
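To see how the item order affects size, the sketch below counts only the nodes of the prefix tree, using nested dictionaries and leaving out counters and node-links. The dataset and the `tree_size` helper are hypothetical and exist only for illustration.

```python
# Hypothetical dataset (assumption): each string is one transaction, each character one item.
TRANSACTIONS = ["ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce"]


def tree_size(transactions, order):
    """Number of nodes in a prefix tree whose paths follow the given item order."""
    rank = {item: r for r, item in enumerate(order)}
    root, nodes = {}, 0
    for t in transactions:
        node = root
        for item in sorted(set(t), key=rank.__getitem__):
            if item not in node:  # a new node is needed only where prefixes diverge
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes


print(sum(len(t) for t in TRANSACTIONS))        # 29 item occurrences
print(tree_size(TRANSACTIONS, "abcde"))         # 14: decreasing support
print(tree_size(TRANSACTIONS, "edcba"))         # 19: increasing support
print(tree_size(["abc"] * 5, "abc"))            # 3: best case, a single path
print(tree_size(["ab", "cd", "ef"], "abcdef"))  # 6: worst case, no shared items
```

Here decreasing support happens to give the smaller tree. As noted above, that ordering is a heuristic, not a guarantee.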
Step 2: Frequent Itemset Generation
FP-Growth extracts frequent itemsets from the FP-tree.
Bottom-up algorithm: from the leaves towards the root.
Divide and conquer: first look for frequent itemsets ending in e, then de, etc., then d, then cd, etc.
First, extract the prefix path sub-trees ending in an item(set) (hint: use the linked lists).
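The prefix paths ending in an item can be collected by following that item's linked list and walking each node's parent pointers up to the root. This is a sketch under stated assumptions: the dataset is hypothetical, ties are broken alphabetically, and `build_fp_tree` and `prefix_paths` are illustrative names. Each path is returned together with the count of the node it ends at.

```python
from collections import Counter

# Hypothetical dataset (assumption): each string is one transaction, each character one item.
TRANSACTIONS = ["ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce"]


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children, self.link = 0, {}, None


def build_fp_tree(transactions, min_sup):
    """Two-pass FP-tree construction; returns the header table of node-links."""
    support = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i in support if support[i] >= min_sup),
                   key=lambda i: (-support[i], i))  # ties broken alphabetically (assumption)
    rank = {item: r for r, item in enumerate(order)}
    root, header = FPNode(None, None), {item: None for item in order}
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            if item not in node.children:
                child = node.children[item] = FPNode(item, node)
                child.link, header[item] = header[item], child  # prepend to linked list
            node = node.children[item]
            node.count += 1
    return header


def prefix_paths(header, item):
    """Follow item's linked list; for each node, walk parent pointers up to the root."""
    paths, node = [], header[item]
    while node is not None:
        path, up = [], node.parent
        while up.item is not None:  # the root is the only node whose item is None
            path.append(up.item)
            up = up.parent
        paths.append(("".join(reversed(path)), node.count))
        node = node.link
    return paths


header = build_fp_tree(TRANSACTIONS, min_sup=2)
print(sorted(prefix_paths(header, "e")))  # [('acd', 1), ('ad', 1), ('bc', 1)]
```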
Prefix path sub-trees (Example)
Step 2: Frequent Itemset Generation
Each prefix path sub-tree is processed recursively to extract the frequent itemsets. The solutions are then merged.
– E.g. the prefix path sub-tree for e is used to extract frequent itemsets ending in e, then in de, ce, be and ae, then in cde, bde, ade, etc.
– Divide and conquer approach
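The way the search space is split can be illustrated without any support counting. The sketch below assumes the item order a, b, c, d, e. The hypothetical helper `extensions` lists the sub-problems of a suffix. The helper `itemsets_ending_in` enumerates, without pruning, every itemset a sub-problem covers, which shows that the sub-problems do not overlap. This is only the enumeration skeleton; FP-Growth itself abandons any suffix whose support falls below minSup.

```python
from itertools import combinations

ORDER = "abcde"  # assumed global item order (decreasing support)


def extensions(suffix, order=ORDER):
    """Sub-problems of `suffix`: prepend each earlier item in the order, nearest first."""
    start = order.index(suffix[0])
    return [order[i] + suffix for i in range(start - 1, -1, -1)]


def itemsets_ending_in(suffix, order=ORDER):
    """Every itemset (no support check) whose last items, in the order, are `suffix`."""
    head = order[:order.index(suffix[0])]  # only earlier items may extend the suffix
    return ["".join(c) + suffix for k in range(len(head) + 1) for c in combinations(head, k)]


print(extensions("e"))   # ['de', 'ce', 'be', 'ae']
print(extensions("de"))  # ['cde', 'bde', 'ade']
# {e} plus the itemsets ending in de, ce, be and ae cover everything ending in e exactly once.
print(len(itemsets_ending_in("e")), 1 + sum(len(itemsets_ending_in(s)) for s in extensions("e")))  # 16 16
```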
Conditional FP-Tree
The FP-Tree that would be built if we only considered the transactions containing a particular itemset (and then removed that itemset from all of them).
Example: FP-Tree conditional on e.
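A conditional FP-tree can be built from the prefix paths of the suffix. Each path is treated as a transaction repeated `count` times, items below minSup are dropped, and the remaining items are inserted in decreasing-support order. In this sketch `E_PREFIX_PATHS` holds assumed prefix paths of e, chosen so that e has support 3 and b occurs only once. Ties are broken alphabetically, and the tree is stored as nested dictionaries for brevity.

```python
from collections import Counter

# Prefix paths of e as (path, count) pairs: assumed values for illustration only.
E_PREFIX_PATHS = [("acd", 1), ("ad", 1), ("bc", 1)]


def conditional_fp_tree(prefix_paths, min_sup):
    """Build the FP-tree of the prefix paths, i.e. the tree conditional on the suffix."""
    support = Counter()
    for path, count in prefix_paths:  # each path counts as `count` transactions
        for item in path:
            support[item] += count
    kept = {item: s for item, s in support.items() if s >= min_sup}
    order = sorted(kept, key=lambda item: (-kept[item], item))  # alphabetical ties (assumption)
    rank = {item: r for r, item in enumerate(order)}
    tree = {}  # nested dicts: item -> [count, children]
    for path, count in prefix_paths:
        children = tree
        for item in sorted((i for i in path if i in rank), key=rank.__getitem__):
            entry = children.setdefault(item, [0, {}])
            entry[0] += count
            children = entry[1]
    return kept, tree


kept, tree = conditional_fp_tree(E_PREFIX_PATHS, min_sup=2)
print(sorted(kept.items()))  # [('a', 2), ('c', 2), ('d', 2)]
```

With minSup = 2, b (count 1) is dropped. The remaining paths are a-c-d, a-d and c, each with count 1, and the shared node a gets a counter of 2.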
Example
Let minSup = 2 and extract all frequent itemsets containing e.
1. Obtain the prefix path sub-tree for e:
Example
2. Check whether e is a frequent item by adding the counts along its linked list (dotted line). If so, extract it.
– Yes: count = 3, so {e} is extracted as a frequent itemset.
3. As e is frequent, find the frequent itemsets ending in e, i.e. de, ce, be and ae.
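Steps 2 and 3 can be sketched as two operations. First, sum the counters along e's linked list. Then, if the total reaches minSup, list the one-item extensions of e to explore next. The dataset, `MIN_SUP`, the alphabetical tie-breaking and the helper names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical dataset (assumption): each string is one transaction, each character one item.
TRANSACTIONS = ["ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce"]
MIN_SUP = 2


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children, self.link = 0, {}, None


def build_fp_tree(transactions, min_sup):
    """Two-pass construction; returns the header table and the global item order."""
    support = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i in support if support[i] >= min_sup),
                   key=lambda i: (-support[i], i))  # ties broken alphabetically (assumption)
    rank = {item: r for r, item in enumerate(order)}
    root, header = FPNode(None, None), {item: None for item in order}
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            if item not in node.children:
                child = node.children[item] = FPNode(item, node)
                child.link, header[item] = header[item], child  # prepend to linked list
            node = node.children[item]
            node.count += 1
    return header, order


def support_via_links(header, item):
    """Add up the counters along item's linked list (the dotted line)."""
    total, node = 0, header[item]
    while node is not None:
        total += node.count
        node = node.link
    return total


header, order = build_fp_tree(TRANSACTIONS, MIN_SUP)
sup_e = support_via_links(header, "e")
candidates = []
if sup_e >= MIN_SUP:  # step 2: e is frequent, so {e} is extracted
    # Step 3: suffixes to explore next, one per earlier item in the order, nearest first.
    candidates = [item + "e" for item in reversed(order[:order.index("e")])]
print(sup_e, candidates)  # 3 ['de', 'ce', 'be', 'ae']
```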
Example
4. Use the conditional FP-tree for e to find frequent itemsets ending in de, ce and ae.
– Note that be is not considered, as b is not in the conditional FP-tree for e.
• For each of them (e.g. de), find the prefix paths from the conditional tree for e, extract frequent itemsets, generate the conditional FP-tree, etc. (recursive)
Example
• Example: e -> de -> ade ({d,e} and {a,d,e} are found to be frequent)
• Example: e -> ce ({c,e} is found to be frequent)
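The steps above can be combined into a compact recursive FP-Growth sketch. It is our own illustration, not the author's implementation. One builder handles both the initial tree, where every transaction has weight 1, and each conditional tree, where each prefix path is weighted by its count. The dataset is hypothetical, ties are broken alphabetically, and only itemsets ending in e are printed.

```python
from collections import Counter

# Hypothetical dataset (assumption): each string is one transaction, each character one item.
TRANSACTIONS = ["ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce"]


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children, self.link = 0, {}, None


def build_fp_tree(weighted, min_sup):
    """Build an FP-tree from (items, count) pairs; return header table, supports, order."""
    support = Counter()
    for items, count in weighted:
        for item in set(items):
            support[item] += count
    order = sorted((i for i in support if support[i] >= min_sup),
                   key=lambda i: (-support[i], i))  # ties broken alphabetically (assumption)
    rank = {item: r for r, item in enumerate(order)}
    root, header = FPNode(None, None), {item: None for item in order}
    for items, count in weighted:
        node = root
        for item in sorted((i for i in set(items) if i in rank), key=rank.__getitem__):
            if item not in node.children:
                child = node.children[item] = FPNode(item, node)
                child.link, header[item] = header[item], child  # prepend to linked list
            node = node.children[item]
            node.count += count
    return header, support, order


def prefix_paths(header, item):
    """Conditional pattern base of `item`: (prefix path, count) for each node holding it."""
    paths, node = [], header[item]
    while node is not None:
        path, up = [], node.parent
        while up.item is not None:
            path.append(up.item)
            up = up.parent
        paths.append((path[::-1], node.count))
        node = node.link
    return paths


def fp_growth(weighted, min_sup, suffix=()):
    """Yield (itemset, support), growing each suffix bottom-up via conditional trees."""
    header, support, order = build_fp_tree(weighted, min_sup)
    for item in reversed(order):      # least frequent first: e, then d, ...
        itemset = (item,) + suffix
        yield itemset, support[item]  # equals the sum along item's linked list
        yield from fp_growth(prefix_paths(header, item), min_sup, itemset)


for itemset, sup in fp_growth([(t, 1) for t in TRANSACTIONS], min_sup=2):
    if itemset[-1] == "e":
        print("".join(itemset), sup)
# e 3
# de 2
# ade 2
# ce 2
# ae 2
```

On this data the recursion mirrors the route described above: e -> de -> ade, then ce, and finally ae. The branch be never appears, because b drops out of the tree conditional on e.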
Result
Frequent itemsets found (ordered by suffix and by the order in which they are found):
Discussion
Advantages of FP-Growth
– Only 2 passes over the data-set
– “Compresses” the data-set
– No candidate generation
– Typically much faster than Apriori
Disadvantages of FP-Growth
– The FP-Tree may not fit in memory!
– The FP-Tree is expensive to build
