2. Data Mining
• Data mining, or knowledge discovery, is the
computer-assisted process of digging through and
analyzing enormous data sets to extract
meaningful patterns from them.
Example :
• Market Basket Analysis - Understand what products or
services are commonly purchased together
3. Association Analysis
• It is one of the most important models invented and extensively
studied by the database and data mining communities.
• Proposed by Rakesh Agrawal and Ramakrishnan Srikant.
• Association rules are used to discover patterns that describe
strongly associated features in the data.
• Application - Business field where discovering of purchase
patterns or association between products is very useful for
decision-making and effective marketing.
4. Association Rules
• Association rules are of the form X -> Y, where X and Y are
disjoint itemsets.
• The strength of an association rule can be determined
in terms of Support and Confidence.
5. Notations
An itemset is a collection of zero or more
items.
An itemset that contains 'k' items is called a
k-itemset.
6. Procedure
Two subtasks:
Step 1 - Frequent Itemset Generation : It finds all the
itemsets whose support satisfies a user-defined minsup threshold.
Step 2 - Rule Generation : It extracts all the high-
confidence rules from the frequent itemsets found in
Step 1. These rules are called strong rules.
7. Support and Confidence
• Support : It determines how frequently the rule
applies in the transaction set T.
Let n be the number of transactions in T, and let σ(Z) be the
support count of an itemset Z (the number of transactions in T that contain Z).
Support(X -> Y) = σ(X U Y)/n
Ex : Consider the rule {Milk, Diapers} -> {Beer}
Support = 2/5 = 0.4
• Confidence : The confidence of a rule is the fraction of
transactions in T containing X that also contain Y.
Confidence(X -> Y) = σ(X U Y)/σ(X)
Confidence = 2/3 ≈ 0.667
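The numbers above can be reproduced with a short sketch. The five-transaction dataset below is hypothetical, chosen so that the rule {Milk, Diapers} -> {Beer} comes out with support 2/5 and confidence 2/3 as in the slide:

```python
# Hypothetical 5-transaction dataset consistent with the example values.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diapers"}, {"Beer"}
n = len(transactions)

support = support_count(X | Y, transactions) / n                              # 2/5 = 0.4
confidence = support_count(X | Y, transactions) / support_count(X, transactions)  # 2/3

print(support, round(confidence, 3))
```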
8. The Apriori Principle
• If an itemset is frequent, then all of its subsets must also
be frequent. Conversely, if an itemset is infrequent, all of its
supersets must be infrequent - this is what enables support-based pruning.
• Apriori is the first association rule mining algorithm that
pioneered the use of support-based pruning to
systematically control the exponential growth of candidate
itemsets.
9. Apriori Algorithm
Pseudo-code:
Ck : candidate itemsets of size k
Lk : frequent itemsets of size k
L1 = {frequent 1-itemsets};
for (k = 1; Lk != ∅; k++) do begin
  Ck+1 = candidates generated from Lk;
  for each transaction t in the database do
    increment the count of all candidates in Ck+1
    that are contained in t;
  Lk+1 = candidates in Ck+1 with support >= min_support;
end
return ∪k Lk;
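The pseudo-code can be sketched as a runnable function. This is a minimal illustration, assuming transactions are given as Python sets and minsup is a minimum support count; the candidate-generation details are one possible realization:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise frequent-itemset mining sketch of the pseudo-code above.

    transactions: list of sets of items; minsup: minimum support COUNT.
    Returns a dict mapping each frequent itemset (frozenset) to its count.
    """
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= minsup}
    frequent = dict(Lk)

    k = 1
    while Lk:
        # Ck+1: merge pairs of frequent k-itemsets that differ in a single
        # item, then prune candidates with an infrequent k-subset
        # (the Apriori principle).
        prev = list(Lk)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k + 1 and all(
                    frozenset(sub) in Lk for sub in combinations(union, k)
                ):
                    candidates.add(union)
        # Count candidate occurrences in one pass over the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        Lk = {s: c for s, c in counts.items() if c >= minsup}
        frequent.update(Lk)
        k += 1
    return frequent

# Hypothetical dataset matching the earlier support/confidence example.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
freq = apriori(transactions, minsup=3)
```

With minsup = 3, this dataset yields four frequent 1-itemsets and four frequent 2-itemsets; the only candidate 3-itemset, {Bread, Milk, Diapers}, falls below the threshold.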
11. Rule Generation
The Apriori algorithm uses a level-wise approach for
generating association rules, where each level
corresponds to the number of items that belong to the
rule consequent.
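For illustration, the sketch below enumerates every non-empty proper subset of each frequent itemset as a consequent, rather than growing consequents level by level as the slide describes; the function name and the toy support counts are assumptions:

```python
from itertools import combinations

def generate_rules(frequent, minconf):
    """Enumerate high-confidence rules X -> Y from frequent itemsets.

    frequent: dict mapping frozenset -> support count.
    Confidence(X -> Y) = count(X U Y) / count(X), as defined earlier.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent AND consequent
        for r in range(1, len(itemset)):
            for consequent in combinations(sorted(itemset), r):
                Y = frozenset(consequent)
                X = itemset - Y
                conf = count / frequent[X]
                if conf >= minconf:
                    rules.append((X, Y, conf))
    return rules

# Hypothetical support counts for a tiny frequent-itemset collection.
frequent = {
    frozenset({"Diapers"}): 4,
    frozenset({"Beer"}): 3,
    frozenset({"Diapers", "Beer"}): 3,
}
rules = generate_rules(frequent, minconf=0.7)
```

Here {Diapers} -> {Beer} has confidence 3/4 = 0.75 and {Beer} -> {Diapers} has confidence 3/3 = 1.0, so both rules pass the 0.7 threshold.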
13. Approaches for frequent itemset generation
Brute-force method :
Advantages :
• This method considers every possible k-itemset as a
candidate, so it is simple and cannot miss any frequent itemset;
the candidate pruning step then removes the unnecessary candidates.
Disadvantages :
• Candidate pruning becomes extremely expensive because
a very large number of itemsets must be examined.
14. Fk-1 X Fk-1 Itemset Generation
Advantages :
• This method merges a pair of frequent (k-1)-itemsets
only if their first (k-2) items are identical, so each candidate
is generated only once.
Disadvantages :
• This method requires an extra pruning step to ensure
that the remaining k-2 subsets of size (k-1) are also frequent.
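The merge step above can be sketched as follows, assuming each frequent (k-1)-itemset is stored as a sorted tuple (the function name and example data are illustrative):

```python
def merge_fk1_fk1(frequent_k1):
    """Fk-1 x Fk-1 candidate generation.

    frequent_k1: list of frequent (k-1)-itemsets as sorted tuples.
    Two itemsets are merged only when all items except the last are
    identical, so each candidate k-itemset is produced exactly once.
    """
    candidates = []
    itemsets = sorted(frequent_k1)
    for i in range(len(itemsets)):
        for j in range(i + 1, len(itemsets)):
            a, b = itemsets[i], itemsets[j]
            if a[:-1] == b[:-1]:  # first k-2 items identical
                candidates.append(a + (b[-1],))
    return candidates

# Hypothetical frequent 2-itemsets; only the pair sharing the
# prefix ("Bread",) merges, giving one candidate 3-itemset.
f2 = [("Bread", "Diapers"), ("Bread", "Milk"), ("Diapers", "Milk")]
c3 = merge_fk1_fk1(f2)
```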
15. Fk-1 X F1 Itemset Generation
Advantages :
• This method extends each frequent (k-1)-itemset
with other frequent items. For instance, it combines frequent
2-itemsets with frequent 1-itemsets to form candidate
3-itemsets.
Disadvantages :
• This method does not prevent the same candidate itemset
from being generated more than once.
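The duplicate-generation problem shows up directly in a small sketch (names and data are illustrative): two different frequent 2-itemsets extended by different frequent items can yield the same candidate 3-itemset.

```python
def merge_fk1_f1(frequent_k1, frequent_1):
    """Fk-1 x F1 candidate generation: extend each frequent (k-1)-itemset
    with each frequent item not already in it. Without extra bookkeeping
    the same candidate can be generated more than once."""
    candidates = []
    for itemset in frequent_k1:
        for (item,) in frequent_1:
            if item not in itemset:
                candidates.append(tuple(sorted(itemset + (item,))))
    return candidates

# ("Bread", "Diapers") + Milk and ("Bread", "Milk") + Diapers both
# produce the candidate ("Bread", "Diapers", "Milk") - a duplicate.
f2 = [("Bread", "Diapers"), ("Bread", "Milk")]
f1 = [("Bread",), ("Diapers",), ("Milk",)]
c3 = merge_fk1_f1(f2, f1)
```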