SlideShare a Scribd company logo
1 of 45
ASSOCIATION RULE MINING
AND APRIORI ALGORITHM
By,
AMBOOJ
ANAM IQBAL
HINA FIRDAUS
M.Tech CSE 3rd Semester
Guided By,
Dr. Harleen Kaur
 First proposed by Agrawal, Imielinski, and Swami frequent itemsets and
association rule mining
 Motivation: Finding inherent regularities in data.
 Pattern mining algorithms can be applied on various types of data such as :
 transaction databases
 sequence databases
 stream, graph etc.
 Pattern mining algorithms can be designed to discover various types of patterns:
 subgraphs,
 associations,
 indirect associations,
 trends,
 periodic patterns,
 sequential rules, lattices, sequential patterns, high-utility patterns, etc.
WHAT IS FREQUENT PATTERN ANALYSIS?
WHY IS FREQUENT PATTERN MINING IMPORTANT?
 Frequent pattern:
An intrinsic and important property of datasets.
 Foundation for many essential data mining tasks
 Association, correlation, and causality analysis
 Sequential, structural (e.g., sub-graph) patterns
 Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
 Classification: discriminative, frequent pattern analysis
 Cluster analysis: frequent pattern-based clustering
 Data warehousing: iceberg cube and cube-gradient
 Semantic data compression: fascicles
 Broad applications
WHAT IS ASSOCIATION MINING?
 Proposed by Agrawal et al in 1993.
 Association rule mining
it is a procedure which is meant to find frequent patterns, correlations, associations, or causal
structures from data sets found in various kinds of databases such as relational databases,
transactional databases, and other forms of data repositories.
 It is an important data mining model studied extensively by the database and data
mining community.
APPLICATION OF ASSOCIATION
 Market Basket Analysis:
given a database of customer transactions, where each transaction is a set of items the goal is
to find groups of items which are frequently purchased together.
 Telecommunication
each customer is a transaction containing the set of phone calls
 Credit Cards/ Banking Services
each card/account is a transaction containing the set of customer’s payments
 Medical Treatments
each patient is represented as a transaction containing the ordered set of diseases
 Basketball-Game Analysis
each game is represented as a transaction containing the ordered set of ball passes
MARKET BASKET ANALYSIS
 INPUT: list of purchases by purchaser
 do not have names
 identify purchase patterns
 what items tend to be purchased together
 obvious: steak-potatoes; beer-pretzels
 what items are purchased sequentially
 obvious: house-furniture; car-tires
 what items tend to be purchased by season
CONTINUE…
 Categorize customer purchase behavior
 identify actionable information
 purchase profiles
 profitability of each purchase profile
 use for marketing
 layout or catalogs
 select products for promotion
 space allocation, product placement
 Market Basket Benefits
 selection of promotions, merchandising strategy
 sensitive to price: Italian entrees, pizza, pies, Oriental entrees, orange juice
 uncover consumer spending patterns
 correlations: orange juice & waffles
 joint promotional opportunities
POSSIBLE MARKET BASKETS
Customer 1: diapers, baby lotion, grapefruit juice, baby food, milk
Customer 2: soda, potato chips, milk
Customer 3: soup, beer, milk, ice cream
Customer 4: soda, coffee, milk, bread
Customer 5: beer, potato chips
CO-OCCURRENCE TABLE
LIMITATIONS
 takes over 18 months to implement
 market basket analysis only identifies hypotheses, which need to be tested
 neural network, regression, decision tree analyses
 measurement of impact needed
 difficult to identify product groupings
 complexity grows exponentially
BENEFITS
 simple computations
 can be undirected (don’t have to have hypotheses before
analysis)
 different data forms can be analyzed
ASSOCIATION RULES
 Wal-Mart customers who purchase Barbie dolls have a 60%
likelihood of also purchasing one of three types of candy bars
[Forbes, Sept 8, 1997]
 Customers who purchase maintenance agreements are very likely
to purchase large appliances (author experience)
 When a new hardware store opens, one of the most commonly sold
items is toilet bowl cleaners (author experience)
 So what…
WHAT IS ASSOCIATION RULE MINING?
 Association Analysis is used for discovering interesting relationships
hidden in large data sets.
 Proposed by Agrawal et al in 1993.
 It is an important data mining model studied extensively by the
database and data mining community.
 Assume all data are categorical.
 Initially used for Market Basket Analysis to find how items purchased
by customers are related.
 Finding frequent patterns, associations, correlations, or causal
structures among sets of items in transaction databases, relational
databases, and other information repositories.
 Association rules are if/then statements that help uncover
relationships between seemingly unrelated data in a relational
database or other information repository.
 Association rules are widely used in various areas such as
telecommunication networks, market and risk management,
inventory control etc.
 Programmers use association rules to build programs capable of
machine learning.
CONTINUED…
 The following rule can be extracted
from the data set shown in table 1:
 {Diapers}  {Beer}
 The rule suggests that a strong
relationship exists between the sale
of diapers and beer because many
customers who buy diapers also buy
beer.
 Retailers can use this type of rules to help them identify new
opportunities for cross-selling their products to the customers.
 Applications Basket data analysis, cross-marketing, catalog design,
loss-leader analysis, web log analysis, fraud detection
 An association rule has two parts, an antecedent (if) and a consequent (then).
 An antecedent is an item found in the data.
 A consequent is an item that is found in combination with the antecedent.
 Rule form: Antecedent → Consequence
 „Given:
 (1) database of transactions,
 (2) each transaction is a list of items purchased by a customer in a visit.
 Find: all rules that correlate the presence of one set of items ( itemset ) with that
of another set of items.
 „E.g., 98% of people who purchase tires and auto accessories also get
automotive services done
THE MODEL: RULES
 A transaction t contains X, a set of items (itemset) in I, if X  t.
 An association rule is an implication of the form:
X  Y, where X, Y  I, and X Y = 
 An itemset is a set of items.
 E.g., X = {milk, bread, cereal} is an itemset.
 A k-itemset is an itemset with k items.
 E.g., {milk, bread, cereal} is a 3-itemset
ASSOCIATION RULES
 Association rule types:
 Actionable Rules – contain high-quality, actionable information
 Trivial Rules – information already well-known by those familiar with
the business
 Inexplicable Rules – no explanation and do not suggest action
 Trivial and Inexplicable Rules occur most often
HOW GOOD IS AN ASSOCIATION RULE?
Customer Items Purchased
1 OJ, soda
2 Milk, OJ, window cleaner
3 OJ, detergent
4 OJ, detergent, soda
5 Window cleaner, soda
OJ Window
cleaner
Milk Soda Detergent
OJ 4 1 1 2 2
Window cleaner 1 2 1 1 0
Milk 1 1 1 0 0
Soda 2 1 0 3 1
Detergent 2 0 0 1 2
POS Transactions
Co-occurrence of
Products
HOW GOOD IS AN ASSOCIATION RULE?
OJ Window
cleaner
Milk Soda Detergent
OJ 4 1 1 2 2
Window cleaner 1 2 1 1 0
Milk 1 1 1 0 0
Soda 2 1 0 3 1
Detergent 2 0 0 1 2
Simple patterns:
1. OJ and soda are more likely purchased together than
any other two items
2. Detergent is never purchased with milk or window cleaner
3. Milk is never purchased with soda or detergent
HOW GOOD IS AN ASSOCIATION RULE?
 What is the confidence for this rule:
 If a customer purchases soda, then customer also purchases OJ
 2 out of 3 soda purchases also include OJ, so 67%
 What about the confidence of this rule reversed?
 2 out of 4 OJ purchases also include soda, so 50%
 Confidence = Ratio of the number of transactions with all the items to
the number of transactions with just the “if” items
Customer Items Purchased
1 OJ, soda
2 Milk, OJ, window cleaner
3 OJ, detergent
4 OJ, detergent, soda
5 Window cleaner, soda
POS Transactions
CREATING ASSOCIATION RULES
1. Choosing the right set of
items
2. Generating rules by
deciphering the counts in
the co-occurrence matrix
3. Overcoming the practical
limits imposed by
thousands or tens of
thousands of unique items
ASSOCIATION RULES
Support
 “The support is the percentage of transactions that demonstrate the rule.”
 Example: Database with transactions (customer_# : item_a1, item_a2,.. )
1: 1, 3, 5.
2: 1, 8, 14, 17, 12.
3: 4, 6, 8, 12, 9, 104.
4: 2, 1, 8.
 support {8,12} = 2 (,or 50% ~ 2 of 4 customers)
 support {1, 5} = 1 (,or 25% ~ 1 of 4 customers )
 support {1} = 3 (,or 75% ~ 3 of 4 customers)
 An itemset is called frequent if its support is equal or greater than an
agreed upon minimal value – the support threshold
ASSOCIATION RULES
Confidence
 The confidence is the conditional probability that, given X present in a
transition , Y will also be present.
 An association rule is of the form: X => Y
 X => Y: if someone buys X, he also buys Y
 Confidence measure, by definition:
 Confidence(X=>Y) equals support(X,Y) / support(X)
EXAMPLE
Example: Database with transactions ( customer_# : item_a1, item_a2, …
)
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.
Conf ( {5} => {8} ) ?
supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4,
then conf( {5} => {8} ) = 4/5 = 0.8 or 80%
EXAMPLE
Example: Database with transactions ( customer_# : item_a1, item_a2, … )
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.
Conf ( {5} => {8} ) ? 80% Done. Conf ( {8} => {5} ) ?
supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4,
then conf( {8} => {5} ) = 4/7 = 0.57 or 57%
EXAMPLE
Example: Database with transactions ( customer_# : item_a1,
item_a2, … )
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.
Conf ( {9} => {3} ) ?
supp({9}) = 1 , supp({3}) = 1 , supp({3,9}) = 1,
then conf( {9} => {3} ) = 1/1 = 1.0 or 100%. OK?
EXAMPLE
Example: Database with transactions ( customer_# : item_a1,
item_a2, … )
Conf( {9} => {3} ) = 100%. Done.
Notice: High Confidence, Low Support.
-> Rule ( {9} => {3} ) not meaningful
WHAT IS ASSOCIATION RULE MINING ALGORITHM
 There are a large number of them!!
 They use different strategies and data structures.
 Their resulting sets of rules are all the same.
 Given a transaction data set T, and a minimum support and a minimum
confident, the set of association rules existing in T is uniquely determined.
 Some of the proposed algorithms are:
 AIS Algorithm
 SETM Algorithm
 Apriori Algorithm *
 AprioriHybrid Algorithm.
 AprioriTid Algorithm
 FP growth Algorithm
CONTINUE…
APRIORI ALGORITHM
 In computer science and data mining, Apriori is a classic algorithm for
learning association rules.
 Apriori is designed to operate on databases containing transactions (for
example, collections of items bought by customers, or details of a website
frequentation).
 The algorithm attempts to find subsets which are common to at least a
minimum number C (the cutoff, or confidence threshold) of the itemsets.
 Apriori uses a "bottom up" approach, where frequent subsets are extended
one item at a time (a step known as candidate generation, and groups of
candidates are tested against the data.
 The algorithm terminates when no further successful extensions are found.
 Apriori uses breadth-first search and a hash tree structure to count
candidate item sets efficiently.
APRIORI ALGORITHM PSEUDOCODE
Ck: Candidate itemsets of size k
Lk : frequent itemsets of size k
L1 = {frequent items};
for (k = 1; Lk !=; k++)
Ck+1 = GenerateCandidates(Lk)
for each transaction t in database do
increment count of candidates in Ck+1 that are contained in t
endfor
Lk+1 = candidates in Ck+1 with support ≥min_sup
endfor
return k Lk;
FREQUENT ITEMSET PROPERTY
GENERATE CANDIDATES
• Assume the items in Lk are listed in an order (e.g., alphabetical)
• Step 1: self-joining Lk (IN SQL)
insert into Ck+1
select p.item1, p.item2, …, p.itemk, q.itemk
from Lk p, Lk q
where p.item1=q.item1, …, p.itemk-1=q.itemk-1, p.itemk < q.itemk
• Step 2: pruning
forall itemsets c in Ck+1 do
forall k-subsets s of c do
if (s is not in Lk) then delete c from Ck+1
STEPS TO PERFORM APRIORI ALGORITHM
FORMULAS TO NOTE
 Min_sup count = Minimum support percentage * total number of transaction in
database
 Minimum confidence percentage < Confidence percentage after association rule
APRIORI ALGORITHM EXAMPLES
If the minimum support is 50%, minimum confidence is 50% in database D.
Illustrate the Apriori algorithm for finding frequent itemsets in D
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Database D
CONTINUE…
Scan D
itemset sup.
{1} 2
{2} 3
{3} 3
{4} 1
{5} 3
C1
itemset sup.
{1} 2
{2} 3
{3} 3
{5} 3
L1
itemset sup
{1 3} 2
{2 3} 2
{2 5} 3
{3 5} 2
L2
itemset sup
{1 2} 1
{1 3} 2
{1 5} 1
{2 3} 2
{2 5} 3
{3 5} 2
C2 itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}
C2
Scan D
C3
itemset
{2 3 5} Scan D L3 itemset sup
{2 3 5} 2
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Database D
CONTINUE…
Association
Rule {2,3,5}
Support Confidence Confidence (%)
2->3^5 2 2/3=0.66 66.6%
3->2^5 2 2/3=0.66 66.6%
5->2^3 2 2/3=0.66 66.6%
2^3->5 2 2/2=1.0 100%
2^5->3 2 2/3=0.66 66.6%
3^5->2 2 2/2=1.0 100%
With the confidence threshold set to 50%, the Strong Association Rules are
(sorted by confidence):
1. 2^3->5 (1.0)
2. 3^5->2 (1.0)
3. 2->3^5 (0.66)
4. 3->2^5 (0.66)
5. 5->2^3 (0.66)
6. 2^5->3 (0.66)
CONTINUE…
PRACTICE PROBLEM
Trace the results of using the Apriori algorithm on the grocery store example with support threshold
s=33.34% and confidence threshold c=60%. Show the candidate and frequent itemsets for each
database scan. Enumerate all the final frequent itemsets. Also indicate the association rules that
are generated and highlight the strong ones, sort them by confidence.
SOLUTION
 Support threshold =33.34% => threshold is at least 2 transactions.
 Applying Apriori
 Note that {HotDogs, Buns, Coke} and {HotDogs, Buns, Chips} are not candidates when k=3
because their subsets {Buns, Coke} and {Buns, Chips} are not frequent.
 Note also that normally, there is no need to go to k=4 since the longest transaction has only 3 items.
 All Frequent Itemsets: {HotDogs}, {Buns}, {Ketchup}, {Coke}, {Chips}, {HotDogs, Buns},
{HotDogs, Coke}, {HotDogs, Chips}, {Coke, Chips}, {HotDogs, Coke, Chips}.
Association rules:
SOLUTION
With the confidence threshold set to 60%, the Strong Association Rules are (sorted by confidence):
APRIORI ADVANTAGES/DISADVANTAGES
 Advantages
 Uses large itemset property
 Easily parallelized
 Easy to implement
 Disadvantages
 Assumes transaction database is memory resident.
 Requires many database scans.
APRIORI ALGORITHM ASSOCIATION RULE MINING

More Related Content

What's hot

Association rule mining
Association rule miningAssociation rule mining
Association rule miningAcad
 
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...Simplilearn
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data MiningKamal Acharya
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classificationKrish_ver2
 
Understanding Association Rule Mining
Understanding Association Rule MiningUnderstanding Association Rule Mining
Understanding Association Rule MiningMohit Rajput
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and BoostingMohit Rajput
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CARTXueping Peng
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithmPradip Kumar
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithmRashid Ansari
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsSalah Amean
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.pptneelamoberoi1030
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysishktripathy
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmdeepti92pawar
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDatamining Tools
 

What's hot (20)

Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
 
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
 
Understanding Association Rule Mining
Understanding Association Rule MiningUnderstanding Association Rule Mining
Understanding Association Rule Mining
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
ID3 ALGORITHM
ID3 ALGORITHMID3 ALGORITHM
ID3 ALGORITHM
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Fp growth algorithm
Fp growth algorithmFp growth algorithm
Fp growth algorithm
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis
 
The comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithmThe comparative study of apriori and FP-growth algorithm
The comparative study of apriori and FP-growth algorithm
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Tree pruning
 Tree pruning Tree pruning
Tree pruning
 

Similar to APRIORI ALGORITHM ASSOCIATION RULE MINING

What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)Kumar P
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesDerek Kane
 
Market Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopMarket Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopVarunSahdev2
 
big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptxAmenahAbbood
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligenceGeo Marian
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket AnalysisNarayan Vyas
 
5Association AnalysisBasic Concepts an.docx
 5Association AnalysisBasic Concepts an.docx 5Association AnalysisBasic Concepts an.docx
5Association AnalysisBasic Concepts an.docxShiraPrater50
 
Paper id 212014126
Paper id 212014126Paper id 212014126
Paper id 212014126IJRAT
 
Cluster2
Cluster2Cluster2
Cluster2work
 
IRJET- Minning Frequent Patterns,Associations and Correlations
IRJET-  	  Minning Frequent Patterns,Associations and CorrelationsIRJET-  	  Minning Frequent Patterns,Associations and Correlations
IRJET- Minning Frequent Patterns,Associations and CorrelationsIRJET Journal
 
Association and Classification Algorithm
Association and Classification AlgorithmAssociation and Classification Algorithm
Association and Classification AlgorithmMedicaps University
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxnikshaikh786
 
Products Frequently Bought Together in Stores Using classificat...
Products Frequently Bought Together in Stores               Using classificat...Products Frequently Bought Together in Stores               Using classificat...
Products Frequently Bought Together in Stores Using classificat...hibaziyad99
 
Association Mining
Association Mining Association Mining
Association Mining Edureka!
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdfJyoti Yadav
 
Mining Negative Association Rules
Mining Negative Association RulesMining Negative Association Rules
Mining Negative Association RulesIOSR Journals
 

Similar to APRIORI ALGORITHM ASSOCIATION RULE MINING (20)

What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation Engines
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
 
Market Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopMarket Basket Analysis of bakery Shop
Market Basket Analysis of bakery Shop
 
Unit 4_ML.pptx
Unit 4_ML.pptxUnit 4_ML.pptx
Unit 4_ML.pptx
 
big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptx
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
5Association AnalysisBasic Concepts an.docx
 5Association AnalysisBasic Concepts an.docx 5Association AnalysisBasic Concepts an.docx
5Association AnalysisBasic Concepts an.docx
 
Data Mining
Data Mining Data Mining
Data Mining
 
Paper id 212014126
Paper id 212014126Paper id 212014126
Paper id 212014126
 
Cluster2
Cluster2Cluster2
Cluster2
 
Chapter 1.pdf
Chapter 1.pdfChapter 1.pdf
Chapter 1.pdf
 
IRJET- Minning Frequent Patterns,Associations and Correlations
IRJET-  	  Minning Frequent Patterns,Associations and CorrelationsIRJET-  	  Minning Frequent Patterns,Associations and Correlations
IRJET- Minning Frequent Patterns,Associations and Correlations
 
Association and Classification Algorithm
Association and Classification AlgorithmAssociation and Classification Algorithm
Association and Classification Algorithm
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
 
Products Frequently Bought Together in Stores Using classificat...
Products Frequently Bought Together in Stores               Using classificat...Products Frequently Bought Together in Stores               Using classificat...
Products Frequently Bought Together in Stores Using classificat...
 
Association Mining
Association Mining Association Mining
Association Mining
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
 
Mining Negative Association Rules
Mining Negative Association RulesMining Negative Association Rules
Mining Negative Association Rules
 

Recently uploaded

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 

Recently uploaded (20)

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

APRIORI ALGORITHM ASSOCIATION RULE MINING

  • 1. ASSOCIATION RULE MINING AND APRIORI ALGORITHM By, AMBOOJ ANAM IQBAL HINA FIRDAUS M.Tech CSE 3rd Semester Guided By, Dr. Harleen Kaur
  • 2.  First proposed by Agrawal, Imielinski, and Swami frequent itemsets and association rule mining  Motivation: Finding inherent regularities in data.  Pattern mining algorithms can be applied on various types of data such as :  transaction databases  sequence databases  stream, graph etc.  Pattern mining algorithms can be designed to discover various types of patterns:  subgraphs,  associations,  indirect associations,  trends,  periodic patterns,  sequential rules, lattices, sequential patterns, high-utility patterns, etc. WHAT IS FREQUENT PATTERN ANALYSIS?
  • 3. WHY IS FREQUENT PATTERN MINING IMPORTANT?  Frequent pattern: An intrinsic and important property of datasets.  Foundation for many essential data mining tasks  Association, correlation, and causality analysis  Sequential, structural (e.g., sub-graph) patterns  Pattern analysis in spatiotemporal, multimedia, time-series, and stream data  Classification: discriminative, frequent pattern analysis  Cluster analysis: frequent pattern-based clustering  Data warehousing: iceberg cube and cube-gradient  Semantic data compression: fascicles  Broad applications
  • 4. WHAT IS ASSOCIATION MINING?  Proposed by Agrawal et al in 1993.  Association rule mining it is a procedure which is meant to find frequent patterns, correlations, associations, or causal structures from data sets found in various kinds of databases such as relational databases, transactional databases, and other forms of data repositories.  It is an important data mining model studied extensively by the database and data mining community.
  • 5. APPLICATION OF ASSOCIATION  Market Basket Analysis: given a database of customer transactions, where each transaction is a set of items the goal is to find groups of items which are frequently purchased together.  Telecommunication each customer is a transaction containing the set of phone calls  Credit Cards/ Banking Services each card/account is a transaction containing the set of customer’s payments  Medical Treatments each patient is represented as a transaction containing the ordered set of diseases  Basketball-Game Analysis each game is represented as a transaction containing the ordered set of ball passes
  • 6. MARKET BASKET ANALYSIS  INPUT: list of purchases by purchaser  do not have names  identify purchase patterns  what items tend to be purchased together  obvious: steak-potatoes; beer-pretzels  what items are purchased sequentially  obvious: house-furniture; car-tires  what items tend to be purchased by season
  • 7. CONTINUE…  Categorize customer purchase behavior  identify actionable information  purchase profiles  profitability of each purchase profile  use for marketing  layout or catalogs  select products for promotion  space allocation, product placement  Market Basket Benefits  selection of promotions, merchandising strategy  sensitive to price: Italian entrees, pizza, pies, Oriental entrees, orange juice  uncover consumer spending patterns  correlations: orange juice & waffles  joint promotional opportunities
  • 8. POSSIBLE MARKET BASKETS Customer 1: diapers, baby lotion, grapefruit juice, baby food, milk Customer 2: soda, potato chips, milk Customer 3: soup, beer, milk, ice cream Customer 4: soda, coffee, milk, bread Customer 5: beer, potato chips
  • 10. LIMITATIONS  takes over 18 months to implement  market basket analysis only identifies hypotheses, which need to be tested  neural network, regression, decision tree analyses  measurement of impact needed  difficult to identify product groupings  complexity grows exponentially
  • 11. BENEFITS  simple computations  can be undirected (don’t have to have hypotheses before analysis)  different data forms can be analyzed
  • 12. ASSOCIATION RULES  Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars [Forbes, Sept 8, 1997]  Customers who purchase maintenance agreements are very likely to purchase large appliances (author experience)  When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaners (author experience)  So what…
  • 13. WHAT IS ASSOCIATION RULE MINING?  Association Analysis is used for discovering interesting relationships hidden in large data sets.  Proposed by Agrawal et al in 1993.  It is an important data mining model studied extensively by the database and data mining community.  Assume all data are categorical.  Initially used for Market Basket Analysis to find how items purchased by customers are related.
  • 14.  Finding frequent patterns, associations, correlations, or causal structures among sets of items in transaction databases, relational databases, and other information repositories.  Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository.  Association rules are widely used in various areas such as telecommunication networks, market and risk management, inventory control etc.  Programmers use association rules to build programs capable of machine learning. CONTINUED…
  • 15.  The following rule can be extracted from the data set shown in table 1:  {Diapers}  {Beer}  The rule suggests that a strong relationship exists between the sale of diapers and beer because many customers who buy diapers also buy beer.  Retailers can use this type of rules to help them identify new opportunities for cross-selling their products to the customers.  Applications Basket data analysis, cross-marketing, catalog design, loss-leader analysis, web log analysis, fraud detection
  • 16.  An association rule has two parts, an antecedent (if) and a consequent (then).  An antecedent is an item found in the data.  A consequent is an item that is found in combination with the antecedent.  Rule form: Antecedent → Consequence  „Given:  (1) database of transactions,  (2) each transaction is a list of items purchased by a customer in a visit.  Find: all rules that correlate the presence of one set of items ( itemset ) with that of another set of items.  „E.g., 98% of people who purchase tires and auto accessories also get automotive services done
  • 17. THE MODEL: RULES  A transaction t contains X, a set of items (itemset) in I, if X  t.  An association rule is an implication of the form: X  Y, where X, Y  I, and X Y =   An itemset is a set of items.  E.g., X = {milk, bread, cereal} is an itemset.  A k-itemset is an itemset with k items.  E.g., {milk, bread, cereal} is a 3-itemset
  • 18. ASSOCIATION RULES  Association rule types:  Actionable Rules – contain high-quality, actionable information  Trivial Rules – information already well-known by those familiar with the business  Inexplicable Rules – no explanation and do not suggest action  Trivial and Inexplicable Rules occur most often
  • 19. HOW GOOD IS AN ASSOCIATION RULE? Customer Items Purchased 1 OJ, soda 2 Milk, OJ, window cleaner 3 OJ, detergent 4 OJ, detergent, soda 5 Window cleaner, soda OJ Window cleaner Milk Soda Detergent OJ 4 1 1 2 2 Window cleaner 1 2 1 1 0 Milk 1 1 1 0 0 Soda 2 1 0 3 1 Detergent 2 0 0 1 2 POS Transactions Co-occurrence of Products
  • 20. HOW GOOD IS AN ASSOCIATION RULE? OJ Window cleaner Milk Soda Detergent OJ 4 1 1 2 2 Window cleaner 1 2 1 1 0 Milk 1 1 1 0 0 Soda 2 1 0 3 1 Detergent 2 0 0 1 2 Simple patterns: 1. OJ and soda are more likely purchased together than any other two items 2. Detergent is never purchased with milk or window cleaner 3. Milk is never purchased with soda or detergent
  • 21. HOW GOOD IS AN ASSOCIATION RULE?  What is the confidence for this rule:  If a customer purchases soda, then customer also purchases OJ  2 out of 3 soda purchases also include OJ, so 67%  What about the confidence of this rule reversed?  2 out of 4 OJ purchases also include soda, so 50%  Confidence = Ratio of the number of transactions with all the items to the number of transactions with just the “if” items Customer Items Purchased 1 OJ, soda 2 Milk, OJ, window cleaner 3 OJ, detergent 4 OJ, detergent, soda 5 Window cleaner, soda POS Transactions
  • 22. CREATING ASSOCIATION RULES 1. Choosing the right set of items 2. Generating rules by deciphering the counts in the co-occurrence matrix 3. Overcoming the practical limits imposed by thousands or tens of thousands of unique items
  • 23. ASSOCIATION RULES Support  “The support is the percentage of transactions that demonstrate the rule.”  Example: Database with transactions (customer_# : item_a1, item_a2,.. ) 1: 1, 3, 5. 2: 1, 8, 14, 17, 12. 3: 4, 6, 8, 12, 9, 104. 4: 2, 1, 8.  support {8,12} = 2 (,or 50% ~ 2 of 4 customers)  support {1, 5} = 1 (,or 25% ~ 1 of 4 customers )  support {1} = 3 (,or 75% ~ 3 of 4 customers)  An itemset is called frequent if its support is equal or greater than an agreed upon minimal value – the support threshold
  • 24. ASSOCIATION RULES Confidence  The confidence is the conditional probability that, given X present in a transition , Y will also be present.  An association rule is of the form: X => Y  X => Y: if someone buys X, he also buys Y  Confidence measure, by definition:  Confidence(X=>Y) equals support(X,Y) / support(X)
  • 25. EXAMPLE Example: Database with transactions ( customer_# : item_a1, item_a2, … ) 1: 3, 5, 8. 2: 2, 6, 8. 3: 1, 4, 7, 10. 4: 3, 8, 10. 5: 2, 5, 8. 6: 1, 5, 6. 7: 4, 5, 6, 8. 8: 2, 3, 4. 9: 1, 5, 7, 8. 10: 3, 8, 9, 10. Conf ( {5} => {8} ) ? supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4, then conf( {5} => {8} ) = 4/5 = 0.8 or 80%
  • 26. EXAMPLE Example: Database with transactions ( customer_# : item_a1, item_a2, … ) 1: 3, 5, 8. 2: 2, 6, 8. 3: 1, 4, 7, 10. 4: 3, 8, 10. 5: 2, 5, 8. 6: 1, 5, 6. 7: 4, 5, 6, 8. 8: 2, 3, 4. 9: 1, 5, 7, 8. 10: 3, 8, 9, 10. Conf ( {5} => {8} ) ? 80% Done. Conf ( {8} => {5} ) ? supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4, then conf( {8} => {5} ) = 4/7 = 0.57 or 57%
  • 27. EXAMPLE Example: Database with transactions ( customer_# : item_a1, item_a2, … ) 1: 3, 5, 8. 2: 2, 6, 8. 3: 1, 4, 7, 10. 4: 3, 8, 10. 5: 2, 5, 8. 6: 1, 5, 6. 7: 4, 5, 6, 8. 8: 2, 3, 4. 9: 1, 5, 7, 8. 10: 3, 8, 9, 10. Conf ( {9} => {3} ) ? supp({9}) = 1 , supp({3}) = 1 , supp({3,9}) = 1, then conf( {9} => {3} ) = 1/1 = 1.0 or 100%. OK?
  • 28. EXAMPLE Example: Database with transactions ( customer_# : item_a1, item_a2, … ) Conf( {9} => {3} ) = 100%. Done. Notice: High Confidence, Low Support. -> Rule ( {9} => {3} ) not meaningful
  • 29. WHAT IS ASSOCIATION RULE MINING ALGORITHM  There are a large number of them!!  They use different strategies and data structures.  Their resulting sets of rules are all the same.  Given a transaction data set T, and a minimum support and a minimum confident, the set of association rules existing in T is uniquely determined.  Some of the proposed algorithms are:  AIS Algorithm  SETM Algorithm  Apriori Algorithm *  AprioriHybrid Algorithm.  AprioriTid Algorithm  FP growth Algorithm
  • 31. APRIORI ALGORITHM  In computer science and data mining, Apriori is a classic algorithm for learning association rules.  Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation).  The algorithm attempts to find subsets which are common to at least a minimum number C (the cutoff, or confidence threshold) of the itemsets.  Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation, and groups of candidates are tested against the data.  The algorithm terminates when no further successful extensions are found.  Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently.
  • 32. APRIORI ALGORITHM PSEUDOCODE Ck: Candidate itemsets of size k Lk : frequent itemsets of size k L1 = {frequent items}; for (k = 1; Lk !=; k++) Ck+1 = GenerateCandidates(Lk) for each transaction t in database do increment count of candidates in Ck+1 that are contained in t endfor Lk+1 = candidates in Ck+1 with support ≥min_sup endfor return k Lk;
  • 34. GENERATE CANDIDATES • Assume the items in Lk are listed in an order (e.g., alphabetical) • Step 1: self-joining Lk (IN SQL) insert into Ck+1 select p.item1, p.item2, …, p.itemk, q.itemk from Lk p, Lk q where p.item1=q.item1, …, p.itemk-1=q.itemk-1, p.itemk < q.itemk • Step 2: pruning forall itemsets c in Ck+1 do forall k-subsets s of c do if (s is not in Lk) then delete c from Ck+1
  • 35. STEPS TO PERFORM APRIORI ALGORITHM
  • 36. FORMULAS TO NOTE  Min_sup count = Minimum support percentage * total number of transaction in database  Minimum confidence percentage < Confidence percentage after association rule
  • 37. APRIORI ALGORITHM EXAMPLES If the minimum support is 50%, minimum confidence is 50% in database D. Illustrate the Apriori algorithm for finding frequent itemsets in D TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 Database D
  • 38. CONTINUE… Scan D itemset sup. {1} 2 {2} 3 {3} 3 {4} 1 {5} 3 C1 itemset sup. {1} 2 {2} 3 {3} 3 {5} 3 L1 itemset sup {1 3} 2 {2 3} 2 {2 5} 3 {3 5} 2 L2 itemset sup {1 2} 1 {1 3} 2 {1 5} 1 {2 3} 2 {2 5} 3 {3 5} 2 C2 itemset {1 2} {1 3} {1 5} {2 3} {2 5} {3 5} C2 Scan D C3 itemset {2 3 5} Scan D L3 itemset sup {2 3 5} 2 TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 Database D
  • 39. CONTINUE… Association Rule {2,3,5} Support Confidence Confidence (%) 2->3^5 2 2/3=0.66 66.6% 3->2^5 2 2/3=0.66 66.6% 5->2^3 2 2/3=0.66 66.6% 2^3->5 2 2/2=1.0 100% 2^5->3 2 2/3=0.66 66.6% 3^5->2 2 2/2=1.0 100%
  • 40. With the confidence threshold set to 50%, the Strong Association Rules are (sorted by confidence): 1. 2^3->5 (1.0) 2. 3^5->2 (1.0) 3. 2->3^5 (0.66) 4. 3->2^5 (0.66) 5. 5->2^3 (0.66) 6. 2^5->3 (0.66) CONTINUE…
  • 41. PRACTICE PROBLEM Trace the results of using the Apriori algorithm on the grocery store example with support threshold s=33.34% and confidence threshold c=60%. Show the candidate and frequent itemsets for each database scan. Enumerate all the final frequent itemsets. Also indicate the association rules that are generated and highlight the strong ones, sort them by confidence.
  • 42. SOLUTION  Support threshold =33.34% => threshold is at least 2 transactions.  Applying Apriori  Note that {HotDogs, Buns, Coke} and {HotDogs, Buns, Chips} are not candidates when k=3 because their subsets {Buns, Coke} and {Buns, Chips} are not frequent.  Note also that normally, there is no need to go to k=4 since the longest transaction has only 3 items.  All Frequent Itemsets: {HotDogs}, {Buns}, {Ketchup}, {Coke}, {Chips}, {HotDogs, Buns}, {HotDogs, Coke}, {HotDogs, Chips}, {Coke, Chips}, {HotDogs, Coke, Chips}.
  • 43. Association rules: SOLUTION With the confidence threshold set to 60%, the Strong Association Rules are (sorted by confidence):
  • 44. APRIORI ADVANTAGES/DISADVANTAGES  Advantages  Uses large itemset property  Easily parallelized  Easy to implement  Disadvantages  Assumes transaction database is memory resident.  Requires many database scans.