Presentation On
Association Rule Mining
Under the Guidance of :
Prof: R.N.Yadwad
Presented By :
Poornima Raidurg
2SD10CS057
Data Mining
• Data mining, also called knowledge discovery, is the
computer-assisted process of digging through and
analyzing enormous sets of data and then extracting
the meaning of the data.
Example :
• Market Basket Analysis - Understanding which products or
services are commonly purchased together.
Association Analysis
• It is one of the most important models invented and extensively
studied by the database and data mining communities.
• Introduced by Rakesh Agrawal, Tomasz Imieliński and Arun Swami (1993);
the Apriori algorithm was proposed by Rakesh Agrawal and Ramakrishnan Srikant (1994).
• Association rules are used to discover patterns that describe
strongly associated features in the data.
• Application - Business settings, where discovering purchase
patterns or associations between products is very useful for
decision-making and effective marketing.
Association Rules
• Association rules are of the form X->Y where X and Y are
disjoint item sets.
• The strength of an association rule can be determined
in terms of Support and Confidence.
Notations
• An itemset is a collection of zero or more items.
• An itemset that contains k items is called a k-itemset.
Procedure
Two subtasks:
Step 1 - Frequent Itemset Generation : It finds all the
itemsets that satisfy the user-defined minimum support (minsup) threshold.
Step 2 - Rule Generation : It extracts all the high-confidence
rules from the frequent itemsets found in
Step 1. These rules are called strong rules.
Support and Confidence
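• Support : It determines how frequently the rule is
applicable in the transaction set T.
• Let n be the number of transactions in T, and let σ(X ∪ Y) be the
support count, i.e. the number of transactions that contain every item of X ∪ Y.
• Support(X -> Y) = σ(X ∪ Y)/n
Ex : Consider the rule {Milk, Diapers} -> {Beer} over 5 transactions,
2 of which contain Milk, Diapers and Beer together.
Support = 2/5 = 0.4
• Confidence : The confidence of a rule is the fraction of the
transactions in T containing X that also contain Y.
Confidence(X -> Y) = σ(X ∪ Y)/σ(X)
Here 3 transactions contain {Milk, Diapers}, so
Confidence = 2/3 ≈ 0.667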
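The two measures can be checked with a few lines of Python. This is a minimal sketch: the five transactions below are an assumption (the slides do not list them) chosen so that the rule {Milk, Diapers} -> {Beer} gives the 2/5 and 2/3 figures above, and the function names are illustrative.

```python
# Assumed transaction set T (not taken from the slides).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Support of the rule X -> Y: sigma(X u Y) / n."""
    return support_count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule X -> Y: sigma(X u Y) / sigma(X)."""
    return support_count(x | y, transactions) / support_count(x, transactions)


X, Y = {"Milk", "Diapers"}, {"Beer"}
print("support   :", support(X, Y, TRANSACTIONS))
print("confidence:", round(confidence(X, Y, TRANSACTIONS), 3))
```

Representing each transaction as a Python set keeps the containment test (`itemset <= t`) a direct translation of the definition.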
The Apriori Principle
• If an itemset is frequent, then all of its subsets must also
be frequent.
• Conversely, if an itemset is infrequent, then all of its supersets
must be infrequent, so they never need to be counted.
• Apriori was the first association rule mining algorithm to
use support-based pruning to
systematically control the exponential growth of candidate
itemsets.
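The pruning test that follows from this principle can be sketched as below. The frequent 2-itemsets are hypothetical illustration data, and `has_infrequent_subset` is an illustrative name, not part of any library.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-subset of a k-item candidate is not frequent.

    By the Apriori principle such a candidate cannot be frequent,
    so it can be discarded without scanning the transactions.
    """
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )


# Hypothetical frequent 2-itemsets.
FREQUENT_2 = {
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diapers"}),
    frozenset({"Milk", "Diapers"}),
    frozenset({"Diapers", "Beer"}),
}

keep = frozenset({"Bread", "Milk", "Diapers"})
drop = frozenset({"Milk", "Diapers", "Beer"})  # {Milk, Beer} is not frequent

print(has_infrequent_subset(keep, FREQUENT_2))
print(has_infrequent_subset(drop, FREQUENT_2))
```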
Apriori Algorithm
Pseudo-code:
Ck : candidate itemsets of size k
Lk : frequent itemsets of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1
        that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
Frequent Itemset Generation
Rule Generation
The Apriori algorithm uses a level-wise approach for
generating association rules, where each level
corresponds to the number of items that belong to the
rule consequent.
Suppose {Bread, Diapers, Milk} is frequent, with sup = 50%.
Its proper non-empty subsets are {Bread}, {Diapers}, {Milk},
{Bread, Diapers}, {Bread, Milk}, {Diapers, Milk},
with sup = 50%, 50%, 75%, 75%, 75%, 75% respectively.
Using Confidence(X -> Y) = sup(X ∪ Y)/sup(X), the association rules generated are :
{Bread, Diapers} -> {Milk} Confidence = 50/75 = 66.67%
{Diapers, Milk} -> {Bread} Confidence = 50/75 = 66.67%
{Bread, Milk} -> {Diapers} Confidence = 50/75 = 66.67%
{Bread} -> {Diapers, Milk} Confidence = 50/50 = 100%
{Milk} -> {Diapers, Bread} Confidence = 50/75 = 66.67%
{Diapers} -> {Bread, Milk} Confidence = 50/50 = 100%
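Rule generation from a single frequent itemset can be sketched as follows. For brevity this sketch enumerates every antecedent directly instead of building consequents level by level, and the support counts are hypothetical values (out of 5 transactions), not the percentages on the slide.

```python
from itertools import combinations


def generate_rules(itemset, support_counts, min_conf):
    """Return (antecedent, consequent, confidence) for each strong rule.

    Every proper non-empty subset X of `itemset` is tried as an antecedent,
    with confidence = count(itemset) / count(X).
    """
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for subset in combinations(sorted(itemset), size):
            antecedent = frozenset(subset)
            conf = support_counts[itemset] / support_counts[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical support counts.
COUNTS = {
    frozenset({"Bread"}): 4,
    frozenset({"Diapers"}): 4,
    frozenset({"Milk"}): 4,
    frozenset({"Bread", "Diapers"}): 3,
    frozenset({"Bread", "Milk"}): 3,
    frozenset({"Diapers", "Milk"}): 3,
    frozenset({"Bread", "Diapers", "Milk"}): 2,
}

rules = generate_rules({"Bread", "Diapers", "Milk"}, COUNTS, min_conf=0.6)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"{conf:.3f}")
```

With these counts only the three rules with a two-item antecedent reach the 0.6 threshold (confidence 2/3); the rules with a single-item antecedent have confidence 2/4 and are dropped.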
Approaches for frequent itemset generation
Brute-force method :
Advantages :
• This method considers every k-itemset as a potential
candidate and then applies the candidate pruning step to
remove any unnecessary candidates.
Disadvantages :
• Candidate Pruning becomes extremely expensive because
a large number of itemsets must be examined.
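The cost of the brute-force approach can be made concrete by counting candidates. A minimal sketch, assuming d distinct items; the function names are illustrative.

```python
from math import comb


def candidates_of_size(d, k):
    """Number of k-itemsets that can be formed from d distinct items."""
    return comb(d, k)


def total_candidates(d):
    """Number of non-empty itemsets over d items, which equals 2**d - 1."""
    return sum(comb(d, k) for k in range(1, d + 1))


for d in (6, 20, 40):
    print(d, "items ->", total_candidates(d), "candidate itemsets")
```

The count doubles with every additional item, which is why examining every candidate quickly becomes impractical.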
Fk-1 X Fk-1 Itemset Generation
Advantages :
• This method merges a pair of frequent (k-1)-itemsets
only if their first (k-2) items are identical, so each
candidate is generated at most once.
Disadvantages :
• This method requires an extra pruning step to ensure
that the remaining k-2 subsets of size (k-1) are also frequent itemsets.
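The merge and the extra pruning step can be sketched as below. The sketch assumes each itemset is stored as a tuple of items in sorted order; the frequent 2-itemsets are hypothetical illustration data.

```python
from itertools import combinations


def merge_candidates(frequent_prev):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets.

    Each itemset is a sorted tuple. Two itemsets are merged only when
    their first k-2 items are identical.
    """
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: no later itemset shares this prefix
            candidate = a + (b[-1],)
            # Extra pruning step: every (k-1)-subset must be frequent.
            if all(s in prev_set for s in combinations(candidate, len(candidate) - 1)):
                candidates.append(candidate)
    return candidates


# Hypothetical frequent 2-itemsets.
FREQUENT_2 = [
    ("Beer", "Diapers"),
    ("Bread", "Diapers"),
    ("Bread", "Milk"),
    ("Diapers", "Milk"),
]

print(merge_candidates(FREQUENT_2))
```

Only ("Bread", "Diapers") and ("Bread", "Milk") share a first item, so a single candidate is produced, and it survives pruning because ("Diapers", "Milk") is also frequent.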
Fk-1 X F1 Itemset Generation
Advantages :
• This method takes frequent (k-1)-itemsets and extends
them with other frequent items. For instance, it takes frequent
2-itemsets and combines them with frequent 1-itemsets
to form candidate 3-itemsets.
Disadvantages :
• This method does not prevent the same candidate itemset
from being generated more than once.
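The duplication problem is easy to reproduce. In the sketch below, `extend_naive` follows the description above, while `extend_ordered` shows one common remedy, which is an addition not described on the slide: keep items in sorted order and extend an itemset only with items larger than its last item. The data is hypothetical.

```python
def extend_naive(frequent_prev, frequent_items):
    """Extend every frequent (k-1)-itemset with every other frequent item."""
    candidates = []
    for itemset in frequent_prev:
        for item in frequent_items:
            if item not in itemset:
                candidates.append(frozenset(itemset) | {item})
    return candidates


def extend_ordered(frequent_prev, frequent_items):
    """Extend a sorted tuple only with items greater than its last item."""
    candidates = []
    for itemset in frequent_prev:
        for item in sorted(frequent_items):
            if item > itemset[-1]:
                candidates.append(itemset + (item,))
    return candidates


# Hypothetical frequent itemsets.
FREQUENT_1 = ["Bread", "Diapers", "Milk"]
FREQUENT_2 = [("Bread", "Diapers"), ("Bread", "Milk"), ("Diapers", "Milk")]

naive = extend_naive(FREQUENT_2, FREQUENT_1)
ordered = extend_ordered(FREQUENT_2, FREQUENT_1)
print(len(naive), "candidates,", len(set(naive)), "distinct")
print(ordered)
```

The naive version produces {Bread, Diapers, Milk} three times, once from each frequent 2-itemset; the ordered version produces it once.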
Thank You
