SlideShare a Scribd company logo
ASSOCIATION RULE MINING
AND APRIORI ALGORITHM
By,
AMBOOJ
ANAM IQBAL
HINA FIRDAUS
M.Tech CSE 3rd Semester
Guided By,
Dr. Harleen Kaur
 First proposed by Agrawal, Imielinski, and Swami frequent itemsets and
association rule mining
 Motivation: Finding inherent regularities in data.
 Pattern mining algorithms can be applied on various types of data such as :
 transaction databases
 sequence databases
 stream, graph etc.
 Pattern mining algorithms can be designed to discover various types of patterns:
 subgraphs,
 associations,
 indirect associations,
 trends,
 periodic patterns,
 sequential rules, lattices, sequential patterns, high-utility patterns, etc.
WHAT IS FREQUENT PATTERN ANALYSIS?
WHY IS FREQUENT PATTERN MINING IMPORTANT?
 Frequent pattern:
An intrinsic and important property of datasets.
 Foundation for many essential data mining tasks
 Association, correlation, and causality analysis
 Sequential, structural (e.g., sub-graph) patterns
 Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
 Classification: discriminative, frequent pattern analysis
 Cluster analysis: frequent pattern-based clustering
 Data warehousing: iceberg cube and cube-gradient
 Semantic data compression: fascicles
 Broad applications
WHAT IS ASSOCIATION MINING?
 Proposed by Agrawal et al in 1993.
 Association rule mining
it is a procedure which is meant to find frequent patterns, correlations, associations, or causal
structures from data sets found in various kinds of databases such as relational databases,
transactional databases, and other forms of data repositories.
 It is an important data mining model studied extensively by the database and data
mining community.
APPLICATION OF ASSOCIATION
 Market Basket Analysis:
given a database of customer transactions, where each transaction is a set of items the goal is
to find groups of items which are frequently purchased together.
 Telecommunication
each customer is a transaction containing the set of phone calls
 Credit Cards/ Banking Services
each card/account is a transaction containing the set of customer’s payments
 Medical Treatments
each patient is represented as a transaction containing the ordered set of diseases
 Basketball-Game Analysis
each game is represented as a transaction containing the ordered set of ball passes
MARKET BASKET ANALYSIS
 INPUT: list of purchases by purchaser
 do not have names
 identify purchase patterns
 what items tend to be purchased together
 obvious: steak-potatoes; beer-pretzels
 what items are purchased sequentially
 obvious: house-furniture; car-tires
 what items tend to be purchased by season
CONTINUE…
 Categorize customer purchase behavior
 identify actionable information
 purchase profiles
 profitability of each purchase profile
 use for marketing
 layout or catalogs
 select products for promotion
 space allocation, product placement
 Market Basket Benefits
 selection of promotions, merchandising strategy
 sensitive to price: Italian entrees, pizza, pies, Oriental entrees, orange juice
 uncover consumer spending patterns
 correlations: orange juice & waffles
 joint promotional opportunities
POSSIBLE MARKET BASKETS
Customer 1: diapers, baby lotion, grapefruit juice, baby food, milk
Customer 2: soda, potato chips, milk
Customer 3: soup, beer, milk, ice cream
Customer 4: soda, coffee, milk, bread
Customer 5: beer, potato chips
CO-OCCURRENCE TABLE
LIMITATIONS
 takes over 18 months to implement
 market basket analysis only identifies hypotheses, which need to be tested
 neural network, regression, decision tree analyses
 measurement of impact needed
 difficult to identify product groupings
 complexity grows exponentially
BENEFITS
 simple computations
 can be undirected (don’t have to have hypotheses before
analysis)
 different data forms can be analyzed
ASSOCIATION RULES
 Wal-Mart customers who purchase Barbie dolls have a 60%
likelihood of also purchasing one of three types of candy bars
[Forbes, Sept 8, 1997]
 Customers who purchase maintenance agreements are very likely
to purchase large appliances (author experience)
 When a new hardware store opens, one of the most commonly sold
items is toilet bowl cleaners (author experience)
 So what…
WHAT IS ASSOCIATION RULE MINING?
 Association Analysis is used for discovering interesting relationships
hidden in large data sets.
 Proposed by Agrawal et al in 1993.
 It is an important data mining model studied extensively by the
database and data mining community.
 Assume all data are categorical.
 Initially used for Market Basket Analysis to find how items purchased
by customers are related.
 Finding frequent patterns, associations, correlations, or causal
structures among sets of items in transaction databases, relational
databases, and other information repositories.
 Association rules are if/then statements that help uncover
relationships between seemingly unrelated data in a relational
database or other information repository.
 Association rules are widely used in various areas such as
telecommunication networks, market and risk management,
inventory control etc.
 Programmers use association rules to build programs capable of
machine learning.
CONTINUED…
 The following rule can be extracted
from the data set shown in table 1:
 {Diapers}  {Beer}
 The rule suggests that a strong
relationship exists between the sale
of diapers and beer because many
customers who buy diapers also buy
beer.
 Retailers can use this type of rules to help them identify new
opportunities for cross-selling their products to the customers.
 Applications Basket data analysis, cross-marketing, catalog design,
loss-leader analysis, web log analysis, fraud detection
 An association rule has two parts, an antecedent (if) and a consequent (then).
 An antecedent is an item found in the data.
 A consequent is an item that is found in combination with the antecedent.
 Rule form: Antecedent → Consequence
 „Given:
 (1) database of transactions,
 (2) each transaction is a list of items purchased by a customer in a visit.
 Find: all rules that correlate the presence of one set of items ( itemset ) with that
of another set of items.
 „E.g., 98% of people who purchase tires and auto accessories also get
automotive services done
THE MODEL: RULES
 A transaction t contains X, a set of items (itemset) in I, if X  t.
 An association rule is an implication of the form:
X  Y, where X, Y  I, and X Y = 
 An itemset is a set of items.
 E.g., X = {milk, bread, cereal} is an itemset.
 A k-itemset is an itemset with k items.
 E.g., {milk, bread, cereal} is a 3-itemset
ASSOCIATION RULES
 Association rule types:
 Actionable Rules – contain high-quality, actionable information
 Trivial Rules – information already well-known by those familiar with
the business
 Inexplicable Rules – no explanation and do not suggest action
 Trivial and Inexplicable Rules occur most often
HOW GOOD IS AN ASSOCIATION RULE?
Customer Items Purchased
1 OJ, soda
2 Milk, OJ, window cleaner
3 OJ, detergent
4 OJ, detergent, soda
5 Window cleaner, soda
OJ Window
cleaner
Milk Soda Detergent
OJ 4 1 1 2 2
Window cleaner 1 2 1 1 0
Milk 1 1 1 0 0
Soda 2 1 0 3 1
Detergent 2 0 0 1 2
POS Transactions
Co-occurrence of
Products
HOW GOOD IS AN ASSOCIATION RULE?
OJ Window
cleaner
Milk Soda Detergent
OJ 4 1 1 2 2
Window cleaner 1 2 1 1 0
Milk 1 1 1 0 0
Soda 2 1 0 3 1
Detergent 2 0 0 1 2
Simple patterns:
1. OJ and soda are more likely purchased together than
any other two items
2. Detergent is never purchased with milk or window cleaner
3. Milk is never purchased with soda or detergent
HOW GOOD IS AN ASSOCIATION RULE?
 What is the confidence for this rule:
 If a customer purchases soda, then customer also purchases OJ
 2 out of 3 soda purchases also include OJ, so 67%
 What about the confidence of this rule reversed?
 2 out of 4 OJ purchases also include soda, so 50%
 Confidence = Ratio of the number of transactions with all the items to
the number of transactions with just the “if” items
Customer Items Purchased
1 OJ, soda
2 Milk, OJ, window cleaner
3 OJ, detergent
4 OJ, detergent, soda
5 Window cleaner, soda
POS Transactions
CREATING ASSOCIATION RULES
1. Choosing the right set of
items
2. Generating rules by
deciphering the counts in
the co-occurrence matrix
3. Overcoming the practical
limits imposed by
thousands or tens of
thousands of unique items
ASSOCIATION RULES
Support
 “The support is the percentage of transactions that demonstrate the rule.”
 Example: Database with transactions (customer_# : item_a1, item_a2,.. )
1: 1, 3, 5.
2: 1, 8, 14, 17, 12.
3: 4, 6, 8, 12, 9, 104.
4: 2, 1, 8.
 support {8,12} = 2 (,or 50% ~ 2 of 4 customers)
 support {1, 5} = 1 (,or 25% ~ 1 of 4 customers )
 support {1} = 3 (,or 75% ~ 3 of 4 customers)
 An itemset is called frequent if its support is equal or greater than an
agreed upon minimal value – the support threshold
ASSOCIATION RULES
Confidence
 The confidence is the conditional probability that, given X present in a
transition , Y will also be present.
 An association rule is of the form: X => Y
 X => Y: if someone buys X, he also buys Y
 Confidence measure, by definition:
 Confidence(X=>Y) equals support(X,Y) / support(X)
EXAMPLE
Example: Database with transactions ( customer_# : item_a1, item_a2, …
)
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.
Conf ( {5} => {8} ) ?
supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4,
then conf( {5} => {8} ) = 4/5 = 0.8 or 80%
EXAMPLE
Example: Database with transactions ( customer_# : item_a1, item_a2, … )
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.
Conf ( {5} => {8} ) ? 80% Done. Conf ( {8} => {5} ) ?
supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4,
then conf( {8} => {5} ) = 4/7 = 0.57 or 57%
EXAMPLE
Example: Database with transactions ( customer_# : item_a1,
item_a2, … )
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.
Conf ( {9} => {3} ) ?
supp({9}) = 1 , supp({3}) = 1 , supp({3,9}) = 1,
then conf( {9} => {3} ) = 1/1 = 1.0 or 100%. OK?
EXAMPLE
Example: Database with transactions ( customer_# : item_a1,
item_a2, … )
Conf( {9} => {3} ) = 100%. Done.
Notice: High Confidence, Low Support.
-> Rule ( {9} => {3} ) not meaningful
WHAT IS ASSOCIATION RULE MINING ALGORITHM
 There are a large number of them!!
 They use different strategies and data structures.
 Their resulting sets of rules are all the same.
 Given a transaction data set T, and a minimum support and a minimum
confident, the set of association rules existing in T is uniquely determined.
 Some of the proposed algorithms are:
 AIS Algorithm
 SETM Algorithm
 Apriori Algorithm *
 AprioriHybrid Algorithm.
 AprioriTid Algorithm
 FP growth Algorithm
CONTINUE…
APRIORI ALGORITHM
 In computer science and data mining, Apriori is a classic algorithm for
learning association rules.
 Apriori is designed to operate on databases containing transactions (for
example, collections of items bought by customers, or details of a website
frequentation).
 The algorithm attempts to find subsets which are common to at least a
minimum number C (the cutoff, or confidence threshold) of the itemsets.
 Apriori uses a "bottom up" approach, where frequent subsets are extended
one item at a time (a step known as candidate generation, and groups of
candidates are tested against the data.
 The algorithm terminates when no further successful extensions are found.
 Apriori uses breadth-first search and a hash tree structure to count
candidate item sets efficiently.
APRIORI ALGORITHM PSEUDOCODE
Ck: Candidate itemsets of size k
Lk : frequent itemsets of size k
L1 = {frequent items};
for (k = 1; Lk !=; k++)
Ck+1 = GenerateCandidates(Lk)
for each transaction t in database do
increment count of candidates in Ck+1 that are contained in t
endfor
Lk+1 = candidates in Ck+1 with support ≥min_sup
endfor
return k Lk;
FREQUENT ITEMSET PROPERTY
GENERATE CANDIDATES
• Assume the items in Lk are listed in an order (e.g., alphabetical)
• Step 1: self-joining Lk (IN SQL)
insert into Ck+1
select p.item1, p.item2, …, p.itemk, q.itemk
from Lk p, Lk q
where p.item1=q.item1, …, p.itemk-1=q.itemk-1, p.itemk < q.itemk
• Step 2: pruning
forall itemsets c in Ck+1 do
forall k-subsets s of c do
if (s is not in Lk) then delete c from Ck+1
STEPS TO PERFORM APRIORI ALGORITHM
FORMULAS TO NOTE
 Min_sup count = Minimum support percentage * total number of transaction in
database
 Minimum confidence percentage < Confidence percentage after association rule
APRIORI ALGORITHM EXAMPLES
If the minimum support is 50%, minimum confidence is 50% in database D.
Illustrate the Apriori algorithm for finding frequent itemsets in D
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Database D
CONTINUE…
Scan D
itemset sup.
{1} 2
{2} 3
{3} 3
{4} 1
{5} 3
C1
itemset sup.
{1} 2
{2} 3
{3} 3
{5} 3
L1
itemset sup
{1 3} 2
{2 3} 2
{2 5} 3
{3 5} 2
L2
itemset sup
{1 2} 1
{1 3} 2
{1 5} 1
{2 3} 2
{2 5} 3
{3 5} 2
C2 itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}
C2
Scan D
C3
itemset
{2 3 5} Scan D L3 itemset sup
{2 3 5} 2
TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5
Database D
CONTINUE…
Association
Rule {2,3,5}
Support Confidence Confidence (%)
2->3^5 2 2/3=0.66 66.6%
3->2^5 2 2/3=0.66 66.6%
5->2^3 2 2/3=0.66 66.6%
2^3->5 2 2/2=1.0 100%
2^5->3 2 2/3=0.66 66.6%
3^5->2 2 2/2=1.0 100%
With the confidence threshold set to 50%, the Strong Association Rules are
(sorted by confidence):
1. 2^3->5 (1.0)
2. 3^5->2 (1.0)
3. 2->3^5 (0.66)
4. 3->2^5 (0.66)
5. 5->2^3 (0.66)
6. 2^5->3 (0.66)
CONTINUE…
PRACTICE PROBLEM
Trace the results of using the Apriori algorithm on the grocery store example with support threshold
s=33.34% and confidence threshold c=60%. Show the candidate and frequent itemsets for each
database scan. Enumerate all the final frequent itemsets. Also indicate the association rules that
are generated and highlight the strong ones, sort them by confidence.
SOLUTION
 Support threshold =33.34% => threshold is at least 2 transactions.
 Applying Apriori
 Note that {HotDogs, Buns, Coke} and {HotDogs, Buns, Chips} are not candidates when k=3
because their subsets {Buns, Coke} and {Buns, Chips} are not frequent.
 Note also that normally, there is no need to go to k=4 since the longest transaction has only 3 items.
 All Frequent Itemsets: {HotDogs}, {Buns}, {Ketchup}, {Coke}, {Chips}, {HotDogs, Buns},
{HotDogs, Coke}, {HotDogs, Chips}, {Coke, Chips}, {HotDogs, Coke, Chips}.
Association rules:
SOLUTION
With the confidence threshold set to 60%, the Strong Association Rules are (sorted by confidence):
APRIORI ADVANTAGES/DISADVANTAGES
 Advantages
 Uses large itemset property
 Easily parallelized
 Easy to implement
 Disadvantages
 Assumes transaction database is memory resident.
 Requires many database scans.
Association rule mining and Apriori algorithm

More Related Content

What's hot

Decision Trees
Decision TreesDecision Trees
Decision Trees
Student
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
Utkarsh Sharma
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
Acad
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
Salah Amean
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
MaryamRehman6
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
Paras Kohli
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis
hktripathy
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
Mainul Hassan
 
Market baasket analysis
Market baasket analysisMarket baasket analysis
Market baasket analysis
SiddharthaPanapakam
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
error007
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and Clustering
Eng Teong Cheah
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
nouraalkhatib
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
Krish_ver2
 
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Data Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessingData Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessing
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Salah Amean
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
Mohammad Junaid Khan
 
Association rules
Association rulesAssociation rules
Association rules
Dr. C.V. Suresh Babu
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
Mohammad Junaid Khan
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
Narayan Vyas
 

What's hot (20)

Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
Lect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysisLect7 Association analysis to correlation analysis
Lect7 Association analysis to correlation analysis
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Market baasket analysis
Market baasket analysisMarket baasket analysis
Market baasket analysis
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and Clustering
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
Data Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessingData Mining:  Concepts and Techniques (3rd ed.)- Chapter 3 preprocessing
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3 preprocessing
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Association rules
Association rulesAssociation rules
Association rules
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 

Similar to Association rule mining and Apriori algorithm

What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)
Kumar P
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation Engines
Derek Kane
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
Wake Tech BAS
 
Market Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopMarket Basket Analysis of bakery Shop
Market Basket Analysis of bakery Shop
VarunSahdev2
 
Unit 4_ML.pptx
Unit 4_ML.pptxUnit 4_ML.pptx
Unit 4_ML.pptx
SmithaRaj16
 
big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptx
AmenahAbbood
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
Geo Marian
 
5Association AnalysisBasic Concepts an.docx
 5Association AnalysisBasic Concepts an.docx 5Association AnalysisBasic Concepts an.docx
5Association AnalysisBasic Concepts an.docx
ShiraPrater50
 
Data Mining
Data Mining Data Mining
Paper id 212014126
Paper id 212014126Paper id 212014126
Paper id 212014126IJRAT
 
Cluster2
Cluster2Cluster2
Cluster2work
 
Chapter 1.pdf
Chapter 1.pdfChapter 1.pdf
Chapter 1.pdf
DrGnaneswariG
 
IRJET- Minning Frequent Patterns,Associations and Correlations
IRJET-  	  Minning Frequent Patterns,Associations and CorrelationsIRJET-  	  Minning Frequent Patterns,Associations and Correlations
IRJET- Minning Frequent Patterns,Associations and Correlations
IRJET Journal
 
Association and Classification Algorithm
Association and Classification AlgorithmAssociation and Classification Algorithm
Association and Classification Algorithm
Medicaps University
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
nikshaikh786
 
Products Frequently Bought Together in Stores Using classificat...
Products Frequently Bought Together in Stores               Using classificat...Products Frequently Bought Together in Stores               Using classificat...
Products Frequently Bought Together in Stores Using classificat...
hibaziyad99
 
Association Mining
Association Mining Association Mining
Association Mining Edureka!
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
Jyoti Yadav
 
Mining Negative Association Rules
Mining Negative Association RulesMining Negative Association Rules
Mining Negative Association Rules
IOSR Journals
 
Data Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association RuleData Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association Rule
ijtsrd
 

Similar to Association rule mining and Apriori algorithm (20)

What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)What goes with what (Market Basket Analysis)
What goes with what (Market Basket Analysis)
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation Engines
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
 
Market Basket Analysis of bakery Shop
Market Basket Analysis of bakery ShopMarket Basket Analysis of bakery Shop
Market Basket Analysis of bakery Shop
 
Unit 4_ML.pptx
Unit 4_ML.pptxUnit 4_ML.pptx
Unit 4_ML.pptx
 
big data seminar.pptx
big data seminar.pptxbig data seminar.pptx
big data seminar.pptx
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
5Association AnalysisBasic Concepts an.docx
 5Association AnalysisBasic Concepts an.docx 5Association AnalysisBasic Concepts an.docx
5Association AnalysisBasic Concepts an.docx
 
Data Mining
Data Mining Data Mining
Data Mining
 
Paper id 212014126
Paper id 212014126Paper id 212014126
Paper id 212014126
 
Cluster2
Cluster2Cluster2
Cluster2
 
Chapter 1.pdf
Chapter 1.pdfChapter 1.pdf
Chapter 1.pdf
 
IRJET- Minning Frequent Patterns,Associations and Correlations
IRJET-  	  Minning Frequent Patterns,Associations and CorrelationsIRJET-  	  Minning Frequent Patterns,Associations and Correlations
IRJET- Minning Frequent Patterns,Associations and Correlations
 
Association and Classification Algorithm
Association and Classification AlgorithmAssociation and Classification Algorithm
Association and Classification Algorithm
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
 
Products Frequently Bought Together in Stores Using classificat...
Products Frequently Bought Together in Stores               Using classificat...Products Frequently Bought Together in Stores               Using classificat...
Products Frequently Bought Together in Stores Using classificat...
 
Association Mining
Association Mining Association Mining
Association Mining
 
6. Association Rule.pdf
6. Association Rule.pdf6. Association Rule.pdf
6. Association Rule.pdf
 
Mining Negative Association Rules
Mining Negative Association RulesMining Negative Association Rules
Mining Negative Association Rules
 
Data Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association RuleData Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association Rule
 

Recently uploaded

Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Po-Chuan Chen
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
CarlosHernanMontoyab2
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 

Recently uploaded (20)

Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 

Association rule mining and Apriori algorithm

  • 1. ASSOCIATION RULE MINING AND APRIORI ALGORITHM By, AMBOOJ ANAM IQBAL HINA FIRDAUS M.Tech CSE 3rd Semester Guided By, Dr. Harleen Kaur
  • 2.  First proposed by Agrawal, Imielinski, and Swami frequent itemsets and association rule mining  Motivation: Finding inherent regularities in data.  Pattern mining algorithms can be applied on various types of data such as :  transaction databases  sequence databases  stream, graph etc.  Pattern mining algorithms can be designed to discover various types of patterns:  subgraphs,  associations,  indirect associations,  trends,  periodic patterns,  sequential rules, lattices, sequential patterns, high-utility patterns, etc. WHAT IS FREQUENT PATTERN ANALYSIS?
  • 3. WHY IS FREQUENT PATTERN MINING IMPORTANT?  Frequent pattern: An intrinsic and important property of datasets.  Foundation for many essential data mining tasks  Association, correlation, and causality analysis  Sequential, structural (e.g., sub-graph) patterns  Pattern analysis in spatiotemporal, multimedia, time-series, and stream data  Classification: discriminative, frequent pattern analysis  Cluster analysis: frequent pattern-based clustering  Data warehousing: iceberg cube and cube-gradient  Semantic data compression: fascicles  Broad applications
  • 4. WHAT IS ASSOCIATION MINING?  Proposed by Agrawal et al in 1993.  Association rule mining it is a procedure which is meant to find frequent patterns, correlations, associations, or causal structures from data sets found in various kinds of databases such as relational databases, transactional databases, and other forms of data repositories.  It is an important data mining model studied extensively by the database and data mining community.
  • 5. APPLICATION OF ASSOCIATION  Market Basket Analysis: given a database of customer transactions, where each transaction is a set of items the goal is to find groups of items which are frequently purchased together.  Telecommunication each customer is a transaction containing the set of phone calls  Credit Cards/ Banking Services each card/account is a transaction containing the set of customer’s payments  Medical Treatments each patient is represented as a transaction containing the ordered set of diseases  Basketball-Game Analysis each game is represented as a transaction containing the ordered set of ball passes
  • 6. MARKET BASKET ANALYSIS  INPUT: list of purchases by purchaser  do not have names  identify purchase patterns  what items tend to be purchased together  obvious: steak-potatoes; beer-pretzels  what items are purchased sequentially  obvious: house-furniture; car-tires  what items tend to be purchased by season
  • 7. CONTINUE…  Categorize customer purchase behavior  identify actionable information  purchase profiles  profitability of each purchase profile  use for marketing  layout or catalogs  select products for promotion  space allocation, product placement  Market Basket Benefits  selection of promotions, merchandising strategy  sensitive to price: Italian entrees, pizza, pies, Oriental entrees, orange juice  uncover consumer spending patterns  correlations: orange juice & waffles  joint promotional opportunities
  • 8. POSSIBLE MARKET BASKETS Customer 1: diapers, baby lotion, grapefruit juice, baby food, milk Customer 2: soda, potato chips, milk Customer 3: soup, beer, milk, ice cream Customer 4: soda, coffee, milk, bread Customer 5: beer, potato chips
  • 10. LIMITATIONS  takes over 18 months to implement  market basket analysis only identifies hypotheses, which need to be tested  neural network, regression, decision tree analyses  measurement of impact needed  difficult to identify product groupings  complexity grows exponentially
  • 11. BENEFITS  simple computations  can be undirected (don’t have to have hypotheses before analysis)  different data forms can be analyzed
  • 12. ASSOCIATION RULES  Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars [Forbes, Sept 8, 1997]  Customers who purchase maintenance agreements are very likely to purchase large appliances (author experience)  When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaners (author experience)  So what…
  • 13. WHAT IS ASSOCIATION RULE MINING?  Association Analysis is used for discovering interesting relationships hidden in large data sets.  Proposed by Agrawal et al in 1993.  It is an important data mining model studied extensively by the database and data mining community.  Assume all data are categorical.  Initially used for Market Basket Analysis to find how items purchased by customers are related.
  • 14.  Finding frequent patterns, associations, correlations, or causal structures among sets of items in transaction databases, relational databases, and other information repositories.  Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository.  Association rules are widely used in various areas such as telecommunication networks, market and risk management, inventory control etc.  Programmers use association rules to build programs capable of machine learning. CONTINUED…
  • 15.  The following rule can be extracted from the data set shown in table 1:  {Diapers}  {Beer}  The rule suggests that a strong relationship exists between the sale of diapers and beer because many customers who buy diapers also buy beer.  Retailers can use this type of rules to help them identify new opportunities for cross-selling their products to the customers.  Applications Basket data analysis, cross-marketing, catalog design, loss-leader analysis, web log analysis, fraud detection
  • 16.  An association rule has two parts, an antecedent (if) and a consequent (then).  An antecedent is an item found in the data.  A consequent is an item that is found in combination with the antecedent.  Rule form: Antecedent → Consequence  „Given:  (1) database of transactions,  (2) each transaction is a list of items purchased by a customer in a visit.  Find: all rules that correlate the presence of one set of items ( itemset ) with that of another set of items.  „E.g., 98% of people who purchase tires and auto accessories also get automotive services done
  • 17. THE MODEL: RULES  A transaction t contains X, a set of items (itemset) in I, if X  t.  An association rule is an implication of the form: X  Y, where X, Y  I, and X Y =   An itemset is a set of items.  E.g., X = {milk, bread, cereal} is an itemset.  A k-itemset is an itemset with k items.  E.g., {milk, bread, cereal} is a 3-itemset
  • 18. ASSOCIATION RULES  Association rule types:  Actionable Rules – contain high-quality, actionable information  Trivial Rules – information already well-known by those familiar with the business  Inexplicable Rules – no explanation and do not suggest action  Trivial and Inexplicable Rules occur most often
  • 19. HOW GOOD IS AN ASSOCIATION RULE? Customer Items Purchased 1 OJ, soda 2 Milk, OJ, window cleaner 3 OJ, detergent 4 OJ, detergent, soda 5 Window cleaner, soda OJ Window cleaner Milk Soda Detergent OJ 4 1 1 2 2 Window cleaner 1 2 1 1 0 Milk 1 1 1 0 0 Soda 2 1 0 3 1 Detergent 2 0 0 1 2 POS Transactions Co-occurrence of Products
  • 20. HOW GOOD IS AN ASSOCIATION RULE? OJ Window cleaner Milk Soda Detergent OJ 4 1 1 2 2 Window cleaner 1 2 1 1 0 Milk 1 1 1 0 0 Soda 2 1 0 3 1 Detergent 2 0 0 1 2 Simple patterns: 1. OJ and soda are more likely purchased together than any other two items 2. Detergent is never purchased with milk or window cleaner 3. Milk is never purchased with soda or detergent
  • 21. HOW GOOD IS AN ASSOCIATION RULE?  What is the confidence for this rule:  If a customer purchases soda, then customer also purchases OJ  2 out of 3 soda purchases also include OJ, so 67%  What about the confidence of this rule reversed?  2 out of 4 OJ purchases also include soda, so 50%  Confidence = Ratio of the number of transactions with all the items to the number of transactions with just the “if” items Customer Items Purchased 1 OJ, soda 2 Milk, OJ, window cleaner 3 OJ, detergent 4 OJ, detergent, soda 5 Window cleaner, soda POS Transactions
  • 22. CREATING ASSOCIATION RULES 1. Choosing the right set of items 2. Generating rules by deciphering the counts in the co-occurrence matrix 3. Overcoming the practical limits imposed by thousands or tens of thousands of unique items
  • 23. ASSOCIATION RULES Support  “The support is the percentage of transactions that demonstrate the rule.”  Example: Database with transactions (customer_# : item_a1, item_a2,.. ) 1: 1, 3, 5. 2: 1, 8, 14, 17, 12. 3: 4, 6, 8, 12, 9, 104. 4: 2, 1, 8.  support {8,12} = 2 (,or 50% ~ 2 of 4 customers)  support {1, 5} = 1 (,or 25% ~ 1 of 4 customers )  support {1} = 3 (,or 75% ~ 3 of 4 customers)  An itemset is called frequent if its support is equal or greater than an agreed upon minimal value – the support threshold
  • 24. ASSOCIATION RULES Confidence  The confidence is the conditional probability that, given X present in a transition , Y will also be present.  An association rule is of the form: X => Y  X => Y: if someone buys X, he also buys Y  Confidence measure, by definition:  Confidence(X=>Y) equals support(X,Y) / support(X)
  • 25. EXAMPLE Example: Database with transactions ( customer_# : item_a1, item_a2, … ) 1: 3, 5, 8. 2: 2, 6, 8. 3: 1, 4, 7, 10. 4: 3, 8, 10. 5: 2, 5, 8. 6: 1, 5, 6. 7: 4, 5, 6, 8. 8: 2, 3, 4. 9: 1, 5, 7, 8. 10: 3, 8, 9, 10. Conf ( {5} => {8} ) ? supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4, then conf( {5} => {8} ) = 4/5 = 0.8 or 80%
  • 26. EXAMPLE Example: Database with transactions ( customer_# : item_a1, item_a2, … ) 1: 3, 5, 8. 2: 2, 6, 8. 3: 1, 4, 7, 10. 4: 3, 8, 10. 5: 2, 5, 8. 6: 1, 5, 6. 7: 4, 5, 6, 8. 8: 2, 3, 4. 9: 1, 5, 7, 8. 10: 3, 8, 9, 10. Conf ( {5} => {8} ) ? 80% Done. Conf ( {8} => {5} ) ? supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4, then conf( {8} => {5} ) = 4/7 = 0.57 or 57%
  • 27. EXAMPLE Example: Database with transactions ( customer_# : item_a1, item_a2, … ) 1: 3, 5, 8. 2: 2, 6, 8. 3: 1, 4, 7, 10. 4: 3, 8, 10. 5: 2, 5, 8. 6: 1, 5, 6. 7: 4, 5, 6, 8. 8: 2, 3, 4. 9: 1, 5, 7, 8. 10: 3, 8, 9, 10. Conf ( {9} => {3} ) ? supp({9}) = 1 , supp({3}) = 1 , supp({3,9}) = 1, then conf( {9} => {3} ) = 1/1 = 1.0 or 100%. OK?
  • 28. EXAMPLE Example: Database with transactions ( customer_# : item_a1, item_a2, … ) Conf( {9} => {3} ) = 100%. Done. Notice: High Confidence, Low Support. -> Rule ( {9} => {3} ) not meaningful
  • 29. WHAT IS ASSOCIATION RULE MINING ALGORITHM  There are a large number of them!!  They use different strategies and data structures.  Their resulting sets of rules are all the same.  Given a transaction data set T, and a minimum support and a minimum confident, the set of association rules existing in T is uniquely determined.  Some of the proposed algorithms are:  AIS Algorithm  SETM Algorithm  Apriori Algorithm *  AprioriHybrid Algorithm.  AprioriTid Algorithm  FP growth Algorithm
  • 31. APRIORI ALGORITHM  In computer science and data mining, Apriori is a classic algorithm for learning association rules.  Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation).  The algorithm attempts to find subsets which are common to at least a minimum number C (the cutoff, or confidence threshold) of the itemsets.  Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation, and groups of candidates are tested against the data.  The algorithm terminates when no further successful extensions are found.  Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently.
  • 32. APRIORI ALGORITHM PSEUDOCODE Ck: Candidate itemsets of size k Lk : frequent itemsets of size k L1 = {frequent items}; for (k = 1; Lk !=; k++) Ck+1 = GenerateCandidates(Lk) for each transaction t in database do increment count of candidates in Ck+1 that are contained in t endfor Lk+1 = candidates in Ck+1 with support ≥min_sup endfor return k Lk;
  • 34. GENERATE CANDIDATES • Assume the items in Lk are listed in an order (e.g., alphabetical) • Step 1: self-joining Lk (IN SQL) insert into Ck+1 select p.item1, p.item2, …, p.itemk, q.itemk from Lk p, Lk q where p.item1=q.item1, …, p.itemk-1=q.itemk-1, p.itemk < q.itemk • Step 2: pruning forall itemsets c in Ck+1 do forall k-subsets s of c do if (s is not in Lk) then delete c from Ck+1
  • 35. STEPS TO PERFORM APRIORI ALGORITHM
  • 36. FORMULAS TO NOTE  Min_sup count = Minimum support percentage * total number of transaction in database  Minimum confidence percentage < Confidence percentage after association rule
  • 37. APRIORI ALGORITHM EXAMPLES If the minimum support is 50%, minimum confidence is 50% in database D. Illustrate the Apriori algorithm for finding frequent itemsets in D TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 Database D
  • 38. CONTINUE… Scan D itemset sup. {1} 2 {2} 3 {3} 3 {4} 1 {5} 3 C1 itemset sup. {1} 2 {2} 3 {3} 3 {5} 3 L1 itemset sup {1 3} 2 {2 3} 2 {2 5} 3 {3 5} 2 L2 itemset sup {1 2} 1 {1 3} 2 {1 5} 1 {2 3} 2 {2 5} 3 {3 5} 2 C2 itemset {1 2} {1 3} {1 5} {2 3} {2 5} {3 5} C2 Scan D C3 itemset {2 3 5} Scan D L3 itemset sup {2 3 5} 2 TID Items 100 1 3 4 200 2 3 5 300 1 2 3 5 400 2 5 Database D
  • 39. CONTINUE… Association Rule {2,3,5} Support Confidence Confidence (%) 2->3^5 2 2/3=0.66 66.6% 3->2^5 2 2/3=0.66 66.6% 5->2^3 2 2/3=0.66 66.6% 2^3->5 2 2/2=1.0 100% 2^5->3 2 2/3=0.66 66.6% 3^5->2 2 2/2=1.0 100%
  • 40. With the confidence threshold set to 50%, the Strong Association Rules are (sorted by confidence): 1. 2^3->5 (1.0) 2. 3^5->2 (1.0) 3. 2->3^5 (0.66) 4. 3->2^5 (0.66) 5. 5->2^3 (0.66) 6. 2^5->3 (0.66) CONTINUE…
  • 41. PRACTICE PROBLEM Trace the results of using the Apriori algorithm on the grocery store example with support threshold s=33.34% and confidence threshold c=60%. Show the candidate and frequent itemsets for each database scan. Enumerate all the final frequent itemsets. Also indicate the association rules that are generated and highlight the strong ones, sort them by confidence.
  • 42. SOLUTION  Support threshold =33.34% => threshold is at least 2 transactions.  Applying Apriori  Note that {HotDogs, Buns, Coke} and {HotDogs, Buns, Chips} are not candidates when k=3 because their subsets {Buns, Coke} and {Buns, Chips} are not frequent.  Note also that normally, there is no need to go to k=4 since the longest transaction has only 3 items.  All Frequent Itemsets: {HotDogs}, {Buns}, {Ketchup}, {Coke}, {Chips}, {HotDogs, Buns}, {HotDogs, Coke}, {HotDogs, Chips}, {Coke, Chips}, {HotDogs, Coke, Chips}.
  • 43. Association rules: SOLUTION With the confidence threshold set to 60%, the Strong Association Rules are (sorted by confidence):
  • 44. APRIORI ADVANTAGES/DISADVANTAGES  Advantages  Uses large itemset property  Easily parallelized  Easy to implement  Disadvantages  Assumes transaction database is memory resident.  Requires many database scans.