Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
(An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Computer Engineering
(NBA Accredited)
Prof. S. A. Shivarkar
Assistant Professor
Contact No.8275032712
Email- shivarkarsandipcomp@sanjivani.org.in
Subject- Data Mining and Warehousing (CO314)
Unit –V: Frequent Pattern Analysis
Content
 Market Basket Analysis, Frequent item set, closed item set & Association
Rules, mining multilevel association rules, constraint based association rule
mining
 Generating Association Rules from Frequent Item sets, Apriori Algorithm,
Improving the Efficiency of Apriori, FP Growth Algorithm
 Mining Various Kinds of Association Rules: Mining multilevel association
rules, constraint based association rule mining, Meta rule-Guided Mining of
Association Rules.
What Is Frequent Pattern Analysis?
 Frequent pattern: a pattern (a set of items, subsequences, substructures,
etc.) that occurs frequently in a data set
 First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context
of frequent itemsets and association rule mining
 Motivation: Finding inherent regularities in data
 What products were often purchased together?— Beer and diapers?!
 What are the subsequent purchases after buying a PC?
 What kinds of DNA are sensitive to this new drug?
 Can we automatically classify web documents?
 Applications
 Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
Why is Frequent Pattern Analysis Important?
 Freq. pattern: An intrinsic and important property of
datasets
 Foundation for many essential data mining tasks
 Association, correlation, and causality analysis
 Sequential, structural (e.g., sub-graph) patterns
 Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
 Classification: discriminative, frequent pattern analysis
 Cluster analysis: frequent pattern-based clustering
 Data warehousing: iceberg cube and cube-gradient
 Semantic data compression: fascicles
 Broad applications
Basic Concepts: Frequent Patterns
[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both]
Tid Items bought
10 Beer, Nuts, Diaper
20 Beer, Coffee, Diaper
30 Beer, Diaper, Eggs
40 Nuts, Eggs, Milk
50 Nuts, Coffee, Diaper, Eggs, Milk
 itemset: A set of one or more
items
 k-itemset X = {x1, …, xk}
 (absolute) support, or, support
count of X: Frequency or
occurrence of an itemset X
 (relative) support, s, is the
fraction of transactions that
contains X (i.e., the probability
that a transaction contains X)
 An itemset X is frequent if X’s
support is no less than a minsup
threshold
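The definitions above can be checked against the five-transaction table with a short Python sketch. The function names (`support_count`, `relative_support`, `is_frequent`) are illustrative choices, not part of the original slides.

```python
# Transactions from the table above, keyed by Tid.
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}


def support_count(itemset, db):
    """Absolute support: number of transactions that contain every item."""
    items = set(itemset)
    return sum(1 for t in db.values() if items <= t)


def relative_support(itemset, db):
    """Relative support: fraction of transactions that contain the itemset."""
    return support_count(itemset, db) / len(db)


def is_frequent(itemset, db, minsup):
    """An itemset is frequent if its relative support is no less than minsup."""
    return relative_support(itemset, db) >= minsup


print(support_count({"Beer", "Diaper"}, transactions))     # 3
print(relative_support({"Beer", "Diaper"}, transactions))  # 0.6
print(is_frequent({"Milk"}, transactions, 0.5))            # False
```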
Basic Concepts: Association Rules
Tid Items bought
10 Beer, Nuts, Diaper
20 Beer, Coffee, Diaper
30 Beer, Diaper, Eggs
40 Nuts, Eggs, Milk
50 Nuts, Coffee, Diaper, Eggs, Milk
 Find all the rules X → Y with
minimum support and confidence
 support, s, probability that a
transaction contains X ∪ Y
 confidence, c, conditional
probability that a transaction
having X also contains Y
Let minsup = 50%, minconf = 50%
Freq. Pat.: Beer:3, Nuts:3, Diaper:4, Eggs:3,
{Beer, Diaper}:3
 Association rules: (many more!)
 Beer → Diaper (60%, 100%)
 Diaper → Beer (60%, 75%)
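The support and confidence figures for the two rules can be reproduced with a small Python sketch. It works on support counts rather than fractions so that the ratios stay exact; the helper names are illustrative and not taken from the slides.

```python
from itertools import combinations

# The five transactions from the table above.
db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]


def count(itemset, db):
    """Number of transactions containing every item of itemset."""
    items = set(itemset)
    return sum(1 for t in db if items <= t)


def support(itemset, db):
    return count(itemset, db) / len(db)


def confidence(lhs, rhs, db):
    """Conditional probability that a transaction with lhs also has rhs."""
    lhs_count = count(lhs, db)
    if lhs_count == 0:  # assumption: treat an unseen antecedent as confidence 0
        return 0.0
    return count(set(lhs) | set(rhs), db) / lhs_count


def rules_from_itemset(itemset, db, minconf):
    """All rules X -> Y with X and Y non-empty, disjoint, and X ∪ Y = itemset."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            rhs = itemset - frozenset(lhs)
            c = confidence(lhs, rhs, db)
            if c >= minconf:
                rules.append((set(lhs), set(rhs), support(itemset, db), c))
    return rules


for lhs, rhs, s, c in rules_from_itemset({"Beer", "Diaper"}, db, minconf=0.5):
    print(lhs, "->", rhs, f"support={s:.0%}", f"confidence={c:.0%}")
```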
The Downward Closure Property and Scalable Mining Methods
 The downward closure property of frequent patterns
 Any subset of a frequent itemset must be frequent
 If {beer, diaper, nuts} is frequent, so is {beer, diaper}
i.e., every transaction having {beer, diaper, nuts} also contains
{beer, diaper}
 Scalable mining methods: Three major approaches
 Apriori (Agrawal & Srikant@VLDB’94)
 Freq. pattern growth (FPgrowth—Han, Pei & Yin @SIGMOD’00)
 Vertical data format approach (Charm—Zaki & Hsiao
@SDM’02)
Apriori: A Candidate Generation & Test Approach
 Apriori pruning principle: If there is any itemset which is infrequent, its
superset should not be generated/tested! (Agrawal & Srikant
@VLDB’94, Mannila, et al. @ KDD’ 94)
 Method:
 Initially, scan DB once to get frequent 1-itemset
 Generate length (k+1) candidate itemsets from length k frequent
itemsets
 Test the candidates against DB
 Terminate when no frequent or candidate set can be generated
The Apriori Algorithm—An Example 1
The Apriori Algorithm—An Example 2
The Apriori Algorithm—An Example 3
A database has five transactions. Let min_sup = 60% and min_conf = 80%.
(a) Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the
efficiency of the two mining processes.
(b) List all the strong association rules with support s and confidence c
The Apriori Algorithm Pseudo-Code
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 1; Lk ≠ ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
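The pseudo-code can be turned into a compact Python sketch. This is one possible reading, not a reference implementation: candidates are formed as unions of two frequent k-itemsets, which is simpler but less efficient than a prefix-based self-join, and `min_count` is assumed to be an absolute support count.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # L1: scan the database once and count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 1
    while current:
        # C(k+1): unions of two frequent k-itemsets of size k+1 whose
        # k-subsets are all frequent (downward closure pruning).
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)

        # Test the candidates against the database.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # L(k+1): candidates that reach the minimum support count.
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
for itemset, c in sorted(apriori(db, 3).items(), key=lambda x: sorted(x[0])):
    print(sorted(itemset), c)
```

With the five-transaction example and a minimum count of 3 (minsup = 50% rounded up), this yields the same frequent patterns listed earlier: Beer, Nuts, Diaper, Eggs, and {Beer, Diaper}.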
Implementation of Apriori
 How to generate candidates?
 Step 1: self-joining Lk
 Step 2: pruning
 Example of Candidate-generation
 L3={abc, abd, acd, ace, bcd}
 Self-joining: L3*L3
 abcd from abc and abd
 acde from acd and ace
 Pruning:
 acde is removed because ade is not in L3
 C4 = {abcd}
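The join and prune steps of this example can be sketched in Python as follows. The function name `apriori_gen` and the representation of itemsets as sorted tuples are assumptions made for illustration.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Generate (k+1)-candidates from frequent k-itemsets.

    Each itemset is given with its items in sorted order (a string such as
    "abc" or a tuple). Returns (joined, kept): candidates after the self-join
    and candidates that survive pruning.
    """
    prev = sorted(tuple(s) for s in frequent_k)
    prev_set = set(prev)
    k = len(prev[0]) if prev else 0

    # Step 1: self-join. Two k-itemsets join if their first k-1 items agree.
    joined = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                joined.append(a + (b[-1],))

    # Step 2: prune. Drop a candidate if any of its k-subsets is not frequent.
    kept = [
        cand for cand in joined
        if all(sub in prev_set for sub in combinations(cand, k))
    ]
    return joined, kept


L3 = ["abc", "abd", "acd", "ace", "bcd"]
joined, C4 = apriori_gen(L3)
print(["".join(c) for c in joined])  # ['abcd', 'acde']
print(["".join(c) for c in C4])      # ['abcd']
```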
Further Challenges and Improvement in Apriori
 Major computational challenges
 Multiple scans of transaction database
 Huge number of candidates
 Tedious workload of support counting for candidates
 Improving Apriori: general ideas
 Reduce passes of transaction database scans
 Shrink number of candidates
 Facilitate support counting of candidates
Improving the Efficiency of Apriori
 Hash-based technique
 Hashing itemsets into corresponding buckets
 A hash-based technique can be used to reduce the size of the candidate k-itemsets, Ck,
for k > 1
 Transaction reduction
 Reducing the number of transactions scanned in future iterations
 A transaction that does not contain any frequent k-itemsets cannot contain any frequent
(k + 1)-itemsets.
 Therefore, such a transaction can be marked or removed from further consideration
because subsequent database scans for j-itemsets, where j > k, will not need to
consider such a transaction.
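Transaction reduction can be illustrated with a few lines of Python. The sketch below assumes the frequent k-itemsets have already been found; the function name `reduce_transactions` is hypothetical.

```python
def reduce_transactions(transactions, frequent_k_itemsets):
    """Keep only transactions that contain at least one frequent k-itemset.

    The dropped transactions cannot contain any frequent (k+1)-itemset, so
    later scans do not need to visit them.
    """
    frequent = [frozenset(s) for s in frequent_k_itemsets]
    return [t for t in transactions if any(s <= frozenset(t) for s in frequent)]


db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
L2 = [{"Beer", "Diaper"}]  # the only frequent 2-itemset at minsup = 50%
reduced = reduce_transactions(db, L2)
print(len(db), "->", len(reduced))  # 5 -> 3
```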
Improving the Efficiency of Apriori
 Partitioning
 Partitioning the data to find candidate itemsets
 Sampling
 Mining on a subset of the given data
 The basic idea of the sampling approach is to pick a random sample S of the given data
D, and then search for frequent itemsets in S instead of D. In this way, we trade off
some degree of accuracy against efficiency
 Dynamic item set counting
 Adding candidate itemsets at different points during a scan
Pattern-Growth Approach: Mining Frequent Patterns
Without Candidate Generation
 Bottlenecks of the Apriori approach
 Breadth-first (i.e., level-wise) search
 Candidate generation and test
 Often generates a huge number of candidates
 The FP Growth Approach (J. Han, J. Pei, and Y. Yin, SIGMOD’ 00)
 Depth-first search
 Avoid explicit candidate generation
 Major philosophy: Grow long patterns from short ones using local
frequent items only
 “abc” is a frequent pattern
 Get all transactions having “abc”, i.e., project DB on abc: DB|abc
 “d” is a local frequent item in DB|abc → abcd is a frequent pattern
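The pattern-growth idea (project the database on a pattern, then extend the pattern with locally frequent items) can be sketched in Python. Note the assumption: this sketch uses plain projected databases and does not build an FP-tree, so it shows the search strategy rather than the FP-growth data structure. Items are extended in alphabetical order so that each pattern is produced once.

```python
from collections import Counter


def pattern_growth(db, min_count, prefix=()):
    """Depth-first mining; returns {pattern tuple: support count}.

    db is a list of sets; at each level it plays the role of DB|prefix.
    """
    results = {}
    # Local frequent items in the (projected) database.
    counts = Counter(item for t in db for item in t)
    for item in sorted(i for i, c in counts.items() if c >= min_count):
        pattern = prefix + (item,)
        results[pattern] = counts[item]
        # Project on the item: keep transactions that contain it, and only
        # the items ordered after it, so no pattern is generated twice.
        projected = [
            frozenset(x for x in t if x > item) for t in db if item in t
        ]
        results.update(pattern_growth(projected, min_count, pattern))
    return results


db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
for pattern, c in pattern_growth(db, 3).items():
    print(pattern, c)
```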
Construct FP-tree from a Transaction Database Example 1
Construct FP-tree from a Transaction Database Example 2
Mining Various Kinds of Association Rules: Mining
multilevel association rules
 Pattern Mining in Multilevel, Multidimensional Space
 Multilevel associations involve concepts at different abstraction levels.
 Multidimensional associations involve more than one dimension or
predicate (e.g., rules that relate what a customer buys to his or her
age).
 Quantitative association rules involve numeric attributes that have an
implicit ordering among values (e.g., age).
Mining Multilevel Associations
 For many applications, strong associations discovered at high abstraction
levels, though with high support, may be commonsense knowledge.
 May want to drill down to find novel patterns at more detailed levels.
 A concept hierarchy defines a sequence of mappings from a set of low-level concepts
to a higher-level, more general concept set
 Data can be generalized by replacing low-level concepts within the data by their
corresponding higher-level concepts, or ancestors, from a concept hierarchy.
 Association rules generated from mining data at multiple abstraction levels are called
multiple-level or multilevel association rules.
 Multilevel association rules can be mined efficiently using concept hierarchies under a
support-confidence framework.
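Generalizing data through a concept hierarchy can be sketched in Python. The hierarchy and the four transactions below are invented for illustration and do not come from the slides; the point is only that an itemset can be frequent at the higher level while none of its low-level specializations is.

```python
# Hypothetical concept hierarchy: low-level item -> higher-level concept.
hierarchy = {
    "skim milk": "milk",
    "2% milk": "milk",
    "wheat bread": "bread",
    "white bread": "bread",
}

# Hypothetical transactions at the lowest abstraction level.
db = [
    {"skim milk", "wheat bread"},
    {"2% milk", "white bread"},
    {"skim milk", "white bread"},
    {"2% milk"},
]


def generalize(transaction, hierarchy):
    """Replace each item by its ancestor; items without one stay unchanged."""
    return {hierarchy.get(item, item) for item in transaction}


def support_count(itemset, db):
    items = set(itemset)
    return sum(1 for t in db if items <= t)


high_level_db = [generalize(t, hierarchy) for t in db]

print(support_count({"milk", "bread"}, high_level_db))  # 3
print(support_count({"skim milk", "wheat bread"}, db))  # 1
```

Under a uniform minimum support count of 3, {milk, bread} is frequent at the higher level, while each low-level pair falls below the threshold.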
Mining Multidimensional Associations: Multilevel
mining with uniform support
Mining Multidimensional Associations: Multilevel
mining with reduced support.
Constraint-Based Frequent Pattern Mining
 Constraint based association rule mining aims to develop a systematic method
by which the user can find important associations among items in a database of
transactions.
 The users specify intuition or expectations as constraints to confine the search
space.
 This strategy is known as constraint-based mining. The constraints can include
the following:
 Knowledge type constraints
 Data constraints
 Interestingness constraints
 Rule constraints
Constraint-Based Frequent Pattern Mining
 Knowledge type constraints: These specify the type of knowledge to be
mined, such as association, correlation, classification, or clustering.
 Data constraints: These specify the set of task-relevant data.
 Dimension/level constraints: These specify the desired dimensions (or
attributes) of the data, the abstraction levels, or the level of the concept
hierarchies to be used in mining.
 Interestingness constraints: These specify thresholds on statistical measures of
rule interestingness such as support, confidence, and correlation.
Constraint-Based Frequent Pattern Mining
 Rule constraints: These specify the form of, or conditions on, the rules to be
mined. Such constraints may be expressed as meta rules (rule templates), as
the maximum or minimum number of predicates that can occur in the rule
antecedent or consequent, or as relationships among attributes, attribute
values, and/or aggregates.
Reference
 Han, Jiawei; Kamber, Micheline; and Pei, Jian, “Data Mining: Concepts and
Techniques”, Elsevier Publishers, ISBN: 9780123814791, 9780123814807.
 https://onlinecourses.nptel.ac.in/noc24_cs22

Frequent Pattern Analysis, Apriori and FP Growth Algorithm

  • 1. Sanjivani Rural Education Society’s Sanjivani College of Engineering, Kopargaon-423 603 (An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune) NACC ‘A’ Grade Accredited, ISO 9001:2015 Certified Department of Computer Engineering (NBA Accredited) Prof. S. A. Shivarkar Assistant Professor Contact No.8275032712 Email- shivarkarsandipcomp@sanjivani.org.in Subject- Data Mining and Warehousing (CO314) Unit –V: Frequent Pattern Analysis
  • 2. Content
      • Market Basket Analysis, Frequent item set, closed item set & Association Rules, mining multilevel association rules, constraint based association rule mining
      • Generating Association Rules from Frequent Item sets, Apriori Algorithm, Improving the Efficiency of Apriori, FP Growth Algorithm
      • Mining Various Kinds of Association Rules: Mining multilevel association rules, constraint based association rule mining, Meta rule-Guided Mining of Association Rules.
  • 3. What Is Frequent Pattern Analysis?
      • Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
      • First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
      • Motivation: Finding inherent regularities in data
          • What products were often purchased together? Beer and diapers?!
          • What are the subsequent purchases after buying a PC?
          • What kinds of DNA are sensitive to this new drug?
          • Can we automatically classify web documents?
      • Applications
          • Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis.
  • 4. Why is Frequent Pattern Analysis Important?
      • Frequent pattern: an intrinsic and important property of datasets
      • Foundation for many essential data mining tasks
          • Association, correlation, and causality analysis
          • Sequential, structural (e.g., sub-graph) patterns
          • Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
          • Classification: discriminative, frequent pattern analysis
          • Cluster analysis: frequent pattern-based clustering
          • Data warehousing: iceberg cube and cube-gradient
          • Semantic data compression: fascicles
          • Broad applications
  • 5. Basic Concepts: Frequent Patterns
      [Figure: Venn diagram of customers who buy diaper, customers who buy beer, and customers who buy both]
      Tid | Items bought
      10  | Beer, Nuts, Diaper
      20  | Beer, Coffee, Diaper
      30  | Beer, Diaper, Eggs
      40  | Nuts, Eggs, Milk
      50  | Nuts, Coffee, Diaper, Eggs, Milk
      • itemset: A set of one or more items
      • k-itemset: X = {x1, …, xk}
      • (absolute) support, or support count, of X: the frequency or number of occurrences of the itemset X
      • (relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
      • An itemset X is frequent if X’s support is no less than a minsup threshold
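The definitions on this slide can be checked directly against its transaction table. The following is a minimal Python sketch, not part of the original slides; the function names `support_count`, `support`, and `is_frequent` are illustrative choices.

```python
# Transactions from the slide (Tid -> items bought).
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}

def support_count(itemset, db):
    """Absolute support: number of transactions containing every item."""
    items = set(itemset)
    return sum(1 for t in db.values() if items <= t)

def support(itemset, db):
    """Relative support: fraction of transactions containing the itemset."""
    return support_count(itemset, db) / len(db)

def is_frequent(itemset, db, minsup):
    """An itemset is frequent if its relative support is at least minsup."""
    return support(itemset, db) >= minsup

print(support_count({"Beer", "Diaper"}, transactions))   # 3
print(support({"Beer", "Diaper"}, transactions))         # 0.6
print(is_frequent({"Nuts", "Milk"}, transactions, 0.5))  # False
```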
  • 6. Basic Concepts: Association Rules
      [Same Venn diagram and transaction table as slide 5]
      • Find all the rules X → Y with minimum support and confidence
          • support, s: probability that a transaction contains X ∪ Y
          • confidence, c: conditional probability that a transaction having X also contains Y
      • Let minsup = 50%, minconf = 50%
      • Freq. Pat.: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
      • Association rules (many more!):
          • Beer → Diaper (60%, 100%)
          • Diaper → Beer (60%, 75%)
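The two rule figures on this slide follow from the support counts: confidence is the count of X ∪ Y divided by the count of X. A short Python sketch, assuming the five-transaction table from the slide; `rule_metrics` is a hypothetical helper name.

```python
# The five transactions from the slide's table.
db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    x, y = set(antecedent), set(consequent)
    n_x = sum(1 for t in transactions if x <= t)          # transactions with X
    n_xy = sum(1 for t in transactions if (x | y) <= t)   # transactions with X and Y
    support = n_xy / len(transactions)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence

print(rule_metrics({"Beer"}, {"Diaper"}, db))  # (0.6, 1.0)
print(rule_metrics({"Diaper"}, {"Beer"}, db))  # (0.6, 0.75)
```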
  • 7. The Downward Closure Property and Scalable Mining Methods
      • The downward closure property of frequent patterns
          • Any subset of a frequent itemset must be frequent
          • If {beer, diaper, nuts} is frequent, so is {beer, diaper}, because every transaction having {beer, diaper, nuts} also contains {beer, diaper}
      • Scalable mining methods: Three major approaches
          • Apriori (Agrawal & Srikant @VLDB’94)
          • Frequent pattern growth (FPgrowth: Han, Pei & Yin @SIGMOD’00)
          • Vertical data format approach (Charm: Zaki & Hsiao @SDM’02)
  • 8. Apriori: A Candidate Generation & Test Approach
      • Apriori pruning principle: If any itemset is infrequent, its supersets should not be generated or tested! (Agrawal & Srikant @VLDB’94; Mannila, et al. @KDD’94)
      • Method:
          • Initially, scan the DB once to get the frequent 1-itemsets
          • Generate length (k+1) candidate itemsets from length k frequent itemsets
          • Test the candidates against the DB
          • Terminate when no frequent or candidate set can be generated
  • 11. The Apriori Algorithm: An Example 3
      A database has five transactions. Let min_sup = 60% and min_conf = 80%.
      (a) Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the efficiency of the two mining processes.
      (b) List all the strong association rules with support s and confidence c
  • 12. The Apriori Algorithm Pseudo-Code
      Ck: candidate itemsets of size k
      Lk: frequent itemsets of size k

      L1 = {frequent items};
      for (k = 1; Lk != ∅; k++) do begin
          Ck+1 = candidates generated from Lk;
          for each transaction t in database do
              increment the count of all candidates in Ck+1 that are contained in t
          Lk+1 = candidates in Ck+1 with min_support
      end
      return ∪k Lk;
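The pseudo-code above can be turned into a small runnable Python sketch. This is one possible rendering, not the lecturer's implementation: itemsets are represented as `frozenset`s, `min_count` is an absolute support count, and the sample database is assumed to be the five-transaction table from slide 5 (minsup = 50% of 5 transactions, i.e. a count of at least 3).

```python
from itertools import combinations

def apriori(db, min_count):
    """Return {itemset: support count} for every frequent itemset in db."""
    # L1: one scan to count single items.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 1
    while level:  # loop while Lk is not empty
        prev = set(level)
        # C(k+1): unions of two frequent k-itemsets that have size k+1
        # and whose k-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(prev, 2):
            c = a | b
            if len(c) == k + 1 and all(
                frozenset(s) in prev for s in combinations(c, k)
            ):
                candidates.add(c)
        # Scan the database and count each candidate contained in t.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # L(k+1): candidates that reach the minimum support count.
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent

db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
frequent = apriori(db, min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

On this data the result matches the frequent patterns listed on slide 6: Beer:3, Nuts:3, Diaper:4, Eggs:3, and {Beer, Diaper}:3.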
  • 13. Implementation of Apriori
      • How to generate candidates?
          • Step 1: self-joining Lk
          • Step 2: pruning
      • Example of candidate generation
          • L3 = {abc, abd, acd, ace, bcd}
          • Self-joining: L3 * L3
              • abcd from abc and abd
              • acde from acd and ace
          • Pruning:
              • acde is removed because ade is not in L3
          • C4 = {abcd}
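The join and prune steps of this slide can be reproduced with a few lines of Python. This is an illustrative sketch: itemsets are assumed to be sorted tuples of single-character items, and `apriori_gen` is a name chosen here, returning both the joined set and the pruned candidate set so each step is visible.

```python
from itertools import combinations

def apriori_gen(prev_level):
    """Self-join and prune a set of sorted k-tuples; return (joined, pruned)."""
    if not prev_level:
        return set(), set()
    prev = sorted(prev_level)
    k = len(prev[0])
    # Step 1: self-join. Two k-itemsets join when their first k-1 items agree.
    joined = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                joined.add(a + (b[-1],))
    # Step 2: prune. Drop a candidate if any of its k-subsets is not frequent.
    pruned = {
        c for c in joined
        if all(s in prev_level for s in combinations(c, k))
    }
    return joined, pruned

L3 = {tuple(s) for s in ["abc", "abd", "acd", "ace", "bcd"]}
joined, C4 = apriori_gen(L3)
print(sorted("".join(c) for c in joined))  # ['abcd', 'acde']
print(sorted("".join(c) for c in C4))      # ['abcd']
```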
  • 14. Further Challenges and Improvement in Apriori
      • Major computational challenges
          • Multiple scans of the transaction database
          • Huge number of candidates
          • Tedious workload of support counting for candidates
      • Improving Apriori: general ideas
          • Reduce the number of passes over the transaction database
          • Shrink the number of candidates
          • Facilitate support counting of candidates
  • 15. Improving the Efficiency of Apriori
      • Hash-based technique
          • Hashing itemsets into corresponding buckets
          • A hash-based technique can be used to reduce the size of the candidate k-itemsets, Ck, for k > 1
      • Transaction reduction
          • Reducing the number of transactions scanned in future iterations
          • A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k + 1)-itemsets.
          • Therefore, such a transaction can be marked or removed from further consideration, because subsequent database scans for j-itemsets, where j > k, will not need to consider it.
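Both ideas on this slide can be sketched in Python on the slide 5 transactions. Everything here is an assumption made for illustration: the bucket count `N_BUCKETS`, the CRC32-based `bucket_of` hash, and the helper names are not from the slides. The key property is that a pair can only be frequent if its bucket total reaches the minimum count, so low buckets rule candidates out without counting them individually.

```python
from itertools import combinations
from zlib import crc32

db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
MIN_COUNT = 3
N_BUCKETS = 7  # hypothetical hash table size

def bucket_of(pair):
    """Deterministic bucket index for a sorted pair of items (illustrative hash)."""
    return crc32("|".join(pair).encode()) % N_BUCKETS

# Hash-based technique: during the scan for 1-itemsets, hash every
# 2-itemset of each transaction into a bucket and count per bucket.
buckets = [0] * N_BUCKETS
for t in db:
    for pair in combinations(sorted(t), 2):
        buckets[bucket_of(pair)] += 1

def may_be_frequent(pair):
    """A pair whose bucket count is below MIN_COUNT cannot be frequent."""
    return buckets[bucket_of(tuple(sorted(pair)))] >= MIN_COUNT

def reduce_transactions(transactions, frequent_k):
    """Transaction reduction: keep transactions with at least one frequent k-itemset."""
    return [t for t in transactions if any(f <= t for f in frequent_k)]

print(may_be_frequent(("Beer", "Diaper")))  # True: its bucket holds at least 3 counts

frequent_2 = [frozenset({"Beer", "Diaper"})]
reduced = reduce_transactions(db, frequent_2)
print(len(reduced))  # 3: the last two transactions are dropped before the scan for 3-itemsets
```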
  • 16. Improving the Efficiency of Apriori
      • Partitioning
          • Partitioning the data to find candidate itemsets
      • Sampling
          • Mining on a subset of the given data
          • The basic idea of the sampling approach is to pick a random sample S of the given data D, and then search for frequent itemsets in S instead of D. In this way, we trade off some degree of accuracy against efficiency.
      • Dynamic itemset counting
          • Adding candidate itemsets at different points during a scan
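Of the three ideas on this slide, partitioning is sketched below in Python. It relies on the fact that an itemset frequent in the whole database must be frequent (at the same relative minsup) in at least one partition, so the local frequent itemsets form a complete candidate set and one further full scan yields exact counts. The brute-force local miner and the function names are simplifications assumed here for a tiny example; a real implementation would mine each partition with Apriori or similar.

```python
from itertools import combinations
from math import ceil

def frequent_bruteforce(db, min_count):
    """Enumerate frequent itemsets level by level (only viable for tiny data)."""
    items = sorted(set().union(*db)) if db else []
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            c = frozenset(combo)
            n = sum(1 for t in db if c <= t)
            if n >= min_count:
                result[c] = n
                found = True
        if not found:  # downward closure: no larger itemset can be frequent
            break
    return result

def partition_mine(db, minsup, n_parts):
    """Two-phase partitioning: local mining, then one global counting scan."""
    size = ceil(len(db) / n_parts)
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    # Phase 1: itemsets frequent in any partition become global candidates.
    candidates = set()
    for part in parts:
        local_min = ceil(minsup * len(part))
        candidates |= set(frequent_bruteforce(part, local_min))
    # Phase 2: a single full scan gives the exact global counts.
    global_min = ceil(minsup * len(db))
    counts = {c: sum(1 for t in db if c <= t) for c in candidates}
    return {c: n for c, n in counts.items() if n >= global_min}

db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
result = partition_mine(db, minsup=0.5, n_parts=2)
for itemset, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```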
  • 17. Pattern-Growth Approach: Mining Frequent Patterns Without Candidate Generation
      • Bottlenecks of the Apriori approach
          • Breadth-first (i.e., level-wise) search
          • Candidate generation and test
          • Often generates a huge number of candidates
      • The FP Growth Approach (J. Han, J. Pei, and Y. Yin, SIGMOD’00)
          • Depth-first search
          • Avoid explicit candidate generation
      • Major philosophy: Grow long patterns from short ones using local frequent items only
          • “abc” is a frequent pattern
          • Get all transactions having “abc”, i.e., project DB on abc: DB|abc
          • “d” is a local frequent item in DB|abc → abcd is a frequent pattern
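The "grow long patterns from short ones" idea can be illustrated with projected databases alone. The Python sketch below is deliberately not FP-growth proper: it uses plain lists of projected transactions instead of an FP-tree, and the name `pattern_growth` and the alphabetical item order are assumptions made for the example. It shows the depth-first recursion and the use of local frequent items without generating candidates.

```python
from collections import Counter

def pattern_growth(db, min_count, prefix=()):
    """Depth-first mining: return {pattern tuple: support count}."""
    # Count items locally, i.e. within the database projected on `prefix`.
    counts = Counter(item for t in db for item in set(t))
    results = {}
    for item in sorted(counts):
        n = counts[item]
        if n < min_count:
            continue  # not a local frequent item
        pattern = prefix + (item,)
        results[pattern] = n
        # Project on `pattern`: keep transactions containing `item`, and only
        # the items ordered after it so each itemset is produced once.
        projected = [[i for i in t if i > item] for t in db if item in t]
        results.update(pattern_growth(projected, min_count, pattern))
    return results

db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
patterns = pattern_growth(db, min_count=3)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```

Here Diaper is a local frequent item in the database projected on Beer, so (Beer, Diaper) is reported as a frequent pattern, mirroring the abc → abcd step on the slide.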
  • 18. Construct FP-tree from a Transaction Database Example 1
  • 19. Construct FP-tree from a Transaction Database Example 1 cont…
  • 20. Construct FP-tree from a Transaction Database Example 1 cont…
  • 21. Construct FP-tree from a Transaction Database Example 1 cont…
  • 22. Construct FP-tree from a Transaction Database Example 2
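The worked examples on these slides were figures and did not survive in the transcript. As a stand-in, the following Python sketch builds an FP-tree for the five transactions of slide 5 with a minimum support count of 3. The class and function names, and the alphabetical tie-break between equally frequent items, are assumptions made here rather than details taken from the slides.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, its count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(db, min_count):
    """Return (root, header), where header maps each frequent item to its nodes."""
    # Scan 1: count items and keep the frequent ones.
    counts = Counter(item for t in db for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Rank items by descending support (ties broken alphabetically).
    ordered_items = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(ordered_items)}
    root = FPNode(None, None)
    header = {item: [] for item in ordered_items}
    # Scan 2: insert each transaction's frequent items in rank order,
    # sharing common prefixes and incrementing counts along the path.
    for t in db:
        path = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
root, header = build_fp_tree(db, min_count=3)

def show(node, depth=0):
    """Print the tree with indentation, one 'item:count' per line."""
    for child in sorted(node.children.values(), key=lambda n: n.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

show(root)
```

Coffee and Milk are dropped as infrequent, and the three transactions that contain both Diaper and Beer share the single path Diaper:4 → Beer:3.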
  • 23. Mining Various Kinds of Association Rules: Mining multilevel association rules
      • Pattern Mining in Multilevel, Multidimensional Space
          • Multilevel associations involve concepts at different abstraction levels.
          • Multidimensional associations involve more than one dimension or predicate (e.g., rules that relate what a customer buys to his or her age).
          • Quantitative association rules involve numeric attributes that have an implicit ordering among values (e.g., age).
  • 24. Mining Multilevel Associations
      • For many applications, strong associations discovered at high abstraction levels, though with high support, may represent only commonsense knowledge.
      • We may want to drill down to find novel patterns at more detailed levels.
      • A concept hierarchy defines a sequence of mappings from a set of low-level concepts to a higher-level, more general concept set.
      • Data can be generalized by replacing low-level concepts within the data by their corresponding higher-level concepts, or ancestors, from a concept hierarchy.
      • Association rules generated from mining data at multiple abstraction levels are called multiple-level or multilevel association rules.
      • Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
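Generalizing data along a concept hierarchy can be sketched in a few lines of Python. The hierarchy, the item names, and the four transactions below are invented purely for illustration; they are not from the slides. The example shows how an itemset with low support at the detailed level gains support once items are replaced by their ancestors.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
hierarchy = {
    "skim milk": "milk", "2% milk": "milk",
    "wheat bread": "bread", "white bread": "bread",
    "milk": "food", "bread": "food",
}

# Hypothetical transactions at the lowest abstraction level.
db = [
    {"skim milk", "wheat bread"},
    {"2% milk", "wheat bread"},
    {"skim milk", "white bread"},
    {"2% milk"},
]

def ancestor_at(item, steps, parent):
    """Climb `steps` levels up the hierarchy (stopping at the top concept)."""
    for _ in range(steps):
        item = parent.get(item, item)
    return item

def generalize(transactions, steps, parent):
    """Replace every item by its ancestor `steps` levels up."""
    return [{ancestor_at(i, steps, parent) for i in t} for t in transactions]

def support(itemset, transactions):
    """Relative support of an itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

level1 = generalize(db, 1, hierarchy)
print(support({"skim milk", "wheat bread"}, db))  # 0.25 at the detailed level
print(support({"milk", "bread"}, level1))         # 0.75 one level up
```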
  • 26. Mining Multidimensional Associations: Multilevel mining with uniform support
  • 27. Mining Multidimensional Associations: Multilevel mining with reduced support.
  • 28. Constraint-Based Frequent Pattern Mining
      • Constraint-based association rule mining aims to develop a systematic method by which the user can find important associations among items in a database of transactions.
      • The users specify their intuition or expectations as constraints to confine the search space.
      • This strategy is known as constraint-based mining. The constraints can include the following:
          • Knowledge type constraints
          • Data constraints
          • Dimension/level constraints
          • Interestingness constraints
          • Rule constraints
  • 29. Constraint-Based Frequent Pattern Mining
      • Knowledge type constraints: These specify the type of knowledge to be mined, such as association, correlation, classification, or clustering.
      • Data constraints: These specify the set of task-relevant data.
      • Dimension/level constraints: These specify the desired dimensions (or attributes) of the data, the abstraction levels, or the level of the concept hierarchies to be used in mining.
      • Interestingness constraints: These specify thresholds on statistical measures of rule interestingness such as support, confidence, and correlation.
  • 30. Constraint-Based Frequent Pattern Mining
      • Rule constraints: These specify the form of, or conditions on, the rules to be mined. Such constraints may be expressed as meta rules (rule templates), as the maximum or minimum number of predicates that can occur in the rule antecedent or consequent, or as relationships among attributes, attribute values, and/or aggregates.
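As a rough illustration of interestingness and rule constraints, the Python sketch below filters already-mined rules. The `Rule` record, the `satisfies` function, and its parameters are hypothetical; the two rules and their figures are the ones given on slide 6. In practice such constraints are usually pushed into the mining process rather than applied afterwards, which this sketch does not attempt.

```python
from typing import FrozenSet, NamedTuple

class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float
    confidence: float

# The two rules listed on slide 6.
rules = [
    Rule(frozenset({"Beer"}), frozenset({"Diaper"}), 0.60, 1.00),
    Rule(frozenset({"Diaper"}), frozenset({"Beer"}), 0.60, 0.75),
]

def satisfies(rule, min_support, min_confidence, max_antecedent, required_consequent):
    """Check interestingness thresholds and a simple rule-form constraint."""
    return (
        rule.support >= min_support                   # interestingness: support
        and rule.confidence >= min_confidence         # interestingness: confidence
        and len(rule.antecedent) <= max_antecedent    # rule form: antecedent size
        and required_consequent in rule.consequent    # rule form: template on consequent
    )

selected = [
    r for r in rules
    if satisfies(r, min_support=0.5, min_confidence=0.8,
                 max_antecedent=2, required_consequent="Diaper")
]
for r in selected:
    print(sorted(r.antecedent), "->", sorted(r.consequent), r.support, r.confidence)
```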
  • 31. Reference (Department of Computer Engineering, Sanjivani COE, Kopargaon)
      • Jiawei Han, Micheline Kamber, and Jian Pei, “Data Mining: Concepts and Techniques”, Elsevier, ISBN: 9780123814791, 9780123814807.
      • https://onlinecourses.nptel.ac.in/noc24_cs22