Association Rule Mining
Association Rules
Finds interesting association / correlation relationships among large sets of data
Business Decision Making
Example – Market Basket Analysis
Items likely to be purchased
Advertising strategy, Catalog Design, Store layout
Association Rules
Forming Association rules
Universe – all items
Boolean Vector
Example:
Computer ⇒ Accounting_Software
[support = 5%, confidence = 60%]
Minimum support and Confidence threshold
Basic Concepts
I = {i1, i2, …, im} – set of items
D – set of database transactions
T – a transaction, a set of items with T ⊆ I
Association rule – A ⇒ B, where A ⊆ I, B ⊆ I and A ∩ B = ∅
Support – percentage of transactions in D containing both A and B – P(A ∪ B)
Confidence – percentage of transactions in D containing A that also contain B – P(B|A)
Confidence(A ⇒ B) = Support(A ∪ B) / Support(A)
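These definitions can be checked with a minimal Python sketch (the transaction data below is hypothetical, chosen to echo the Computer ⇒ Accounting_Software example):

```python
# Toy transaction set (hypothetical data, not from the slides).
transactions = [
    {"computer", "accounting_software"},
    {"computer"},
    {"computer", "accounting_software", "printer"},
    {"printer"},
    {"computer", "printer"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset: P(A u B)."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """support(A u B) / support(A) = P(B | A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

print(support({"computer", "accounting_software"}, transactions))  # 0.4
print(confidence({"computer"}, {"accounting_software"}, transactions))  # 0.5
```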
Basic Concepts
Itemset
K-Itemset
Occurrence frequency of an itemset
Frequency, support_count (absolute support) or count
Itemset satisfies minimum support when count >=
min_sup * number of transactions in D
Minimum Support Count
Frequent Itemset
Association Rule Mining Process
Find all frequent itemsets
Generate strong association rules from
frequent itemsets
Satisfy Minimum Support and Minimum
Confidence
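The second step can be sketched in Python: given support counts from step 1, every split of a frequent itemset into antecedent and consequent is tested against the confidence threshold. The counts below mirror the I1/I2/I5 itemsets of the worked example later in these slides; the implementation itself is an illustrative sketch, not the slides' own code:

```python
from itertools import combinations

# Support counts for a frequent itemset and all its subsets
# (taken from the example database mined later in the slides).
support_count = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}

def strong_rules(freq_itemset, min_conf):
    """All rules A => (F - A) from frequent itemset F with conf >= min_conf."""
    rules = []
    items = frozenset(freq_itemset)
    for r in range(1, len(items)):
        for antecedent in map(frozenset, combinations(items, r)):
            conf = support_count[items] / support_count[antecedent]
            if conf >= min_conf:
                rules.append((set(antecedent), set(items - antecedent), conf))
    return rules

for a, b, conf in strong_rules({"I1", "I2", "I5"}, min_conf=0.5):
    print(sorted(a), "=>", sorted(b), f"conf={conf:.2f}")
```

With min_conf = 50%, four of the six possible rules survive, e.g. I5 ⇒ {I1, I2} with confidence 2/2 = 100%.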
Itemsets
Complete Itemsets
Closed Frequent Itemset
X is closed in a data set S if there exists no proper super-itemset
Y such that Y has the same support count as X in S
X is frequent
Maximal Frequent Itemset
X is Frequent and there exists no super-itemset Y such that X ⊂
Y and Y is frequent in S
Example:
T = { {a1,a2,…a100}, {a1,a2,…a50}}, min_sup = 1
Closed frequent itemsets : Both {{a1,a2,…a100}:1, {a1,a2,…a50}: 2}
Maximal frequent itemset: {a1,a2,…a100}
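The two definitions can be verified directly on this example with a small Python sketch (the helper names is_closed / is_maximal are my own; items a1..a100 are modeled as the integers 1..100):

```python
# T = {{a1..a100}, {a1..a50}}, min support count = 1, as on the slide.
transactions = [frozenset(range(1, 101)), frozenset(range(1, 51))]
A100 = frozenset(range(1, 101))
A50 = frozenset(range(1, 51))

def count(itemset):
    return sum(itemset <= t for t in transactions)

def is_closed(x):
    # No proper super-itemset with the same support count; by
    # anti-monotonicity it suffices to test one-item extensions.
    return all(count(x | {e}) < count(x) for e in A100 - x)

def is_maximal(x, min_sup=1):
    # Frequent, and no frequent proper super-itemset.
    return count(x) >= min_sup and all(
        count(x | {e}) < min_sup for e in A100 - x)

print(is_closed(A50), is_closed(A100))    # True True
print(is_maximal(A50), is_maximal(A100))  # False True
```

{a1..a50} is closed (count 2, every extension drops to 1) but not maximal, since its superset {a1..a100} is still frequent.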
Types of Association Rules
Types of Values
Boolean, Quantitative Association Rule
Dimensions of data
Single Dimensional, Multi-dimensional
Level of abstraction
Multilevel association rules
Based on kinds of rules
Association rules, Correlation rules, Strong gradient relationships
Based on completeness of patterns
Complete, Closed, Maximal, top-k, constrained, approximate…
Mining Single Dimensional Boolean
Association Rules
Apriori Algorithm – Finding Frequent Itemsets
using Candidate Generation
Uses prior knowledge of frequent itemset properties
Level-wise search
k-itemsets are used to explore (k+1)-itemsets
Frequent 1-itemsets – L1
L1 is used to find L2
Apriori Property
Reduces Search space
All non empty subsets of a frequent itemset must also be
frequent
If P(I) < min_sup then P(I U A) < min_sup
Anti-monotone property – If a set cannot pass a test all
of its supersets will fail the test as well.
Any subset of a frequent itemset must be frequent
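A quick numeric check of the anti-monotone property on a toy transaction set (data hypothetical): support counts can only fall, never rise, as an itemset grows, so once a set misses min_sup every superset misses it too.

```python
# Hypothetical transactions used only to illustrate anti-monotonicity.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]

def count(itemset):
    return sum(set(itemset) <= t for t in transactions)

# Support is anti-monotone: count(X) >= count(X union Y).
assert count({"a"}) >= count({"a", "b"}) >= count({"a", "b", "c"})
print(count({"a"}), count({"a", "b"}), count({"a", "b", "c"}))  # 4 2 1
```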
Apriori property application
Join Step
To find Lk, join Lk-1 with itself to produce the candidate set Ck
li[j] – the jth item in li
Members of Lk-1 are joinable if their first (k-2) items are
common
Members l1 and l2 of Lk-1 are joinable if (l1[1]=l2[1]) ∧
(l1[2]=l2[2]) ∧ …(l1[k-2]=l2[k-2]) ∧ (l1[k-1]< l2[k-1])
Resulting itemset is l1[1], l1[2], … l1[k-1], l2[k-1]
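A minimal Python sketch of the join step, assuming each itemset is kept as a lexicographically sorted tuple (the L2 below is the one produced by the worked example in these slides):

```python
def apriori_join(L_prev):
    """Join L(k-1) with itself to form Ck; itemsets are sorted tuples."""
    k_minus_1 = len(next(iter(L_prev)))
    Ck = set()
    for l1 in L_prev:
        for l2 in L_prev:
            # Joinable: first k-2 items equal and l1[k-1] < l2[k-1].
            if l1[:k_minus_1 - 1] == l2[:k_minus_1 - 1] and l1[-1] < l2[-1]:
                Ck.add(l1 + (l2[-1],))
    return Ck

L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
print(sorted(apriori_join(L2)))  # 6 candidate 3-itemsets
```

The ordering condition l1[k-1] < l2[k-1] ensures each candidate is generated exactly once and stays sorted.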
Apriori property application
Prune Step
Ck is a superset of Lk
Determine the count of each candidate of Ck
To reduce the size of Ck – if any (k-1)-subset of a candidate is not in Lk-1,
the candidate can be removed from Ck
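The prune step can be sketched the same way; applied to the six candidates produced by joining the example L2, it removes the four whose (k-1)-subsets are not all frequent:

```python
from itertools import combinations

def apriori_prune(Ck, L_prev):
    """Keep only candidates whose every (k-1)-subset is in L(k-1)."""
    return {c for c in Ck
            if all(s in L_prev for s in combinations(c, len(c) - 1))}

L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
C3 = {("I1", "I2", "I3"), ("I1", "I2", "I5"), ("I1", "I3", "I5"),
      ("I2", "I3", "I4"), ("I2", "I3", "I5"), ("I2", "I4", "I5")}
print(sorted(apriori_prune(C3, L2)))
# e.g. ("I1","I3","I5") is dropped because ("I3","I5") is not in L2
```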
The Apriori Algorithm
Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 2; Lk-1 ≠ ∅; k++) do begin
    Ck = candidates generated from Lk-1;
    for each transaction t in database do
        increment the count of all candidates in Ck that are contained in t;
    Lk = candidates in Ck with min_support;
end
return ∪k Lk;
The Apriori Algorithm—An Example
Database TDB
Tid Items
T100 I1,I2,I5
T200 I2,I4
T300 I2,I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 I1, I2, I3
Minimum support count = 2 (i.e., 2/9 ≈ 22%)
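A runnable sketch of the whole level-wise loop on this database (min support count 2; the implementation is an illustration, not the slides' pseudocode):

```python
from itertools import combinations

# The slide's example database TDB.
db = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
min_count = 2

def frequent(candidates):
    """Keep candidates (sorted tuples) meeting the minimum support count."""
    return {c for c in candidates
            if sum(set(c) <= t for t in db) >= min_count}

L = frequent((i,) for i in {i for t in db for i in t})  # L1
all_frequent = set(L)
k = 2
while L:
    Ck = set()
    for l1 in L:
        for l2 in L:
            # Join step: shared (k-2)-prefix, ordered last items.
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                # Prune step: every (k-1)-subset must be frequent.
                if all(s in L for s in combinations(c, k - 1)):
                    Ck.add(c)
    L = frequent(Ck)
    all_frequent |= L
    k += 1

for itemset in sorted(all_frequent, key=lambda s: (len(s), s)):
    print(itemset)
```

On this database the loop yields the classic result: five frequent 1-itemsets, six 2-itemsets, and two 3-itemsets ({I1, I2, I3} and {I1, I2, I5}), 13 in total.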
Apriori Algorithm
Input: Database of transactions – D, min_sup
Output: L, frequent itemsets
L1 = find_frequent_1-itemsets(D);
for (k = 2; Lk-1 ≠ ∅; k++)
{
    Ck = apriori_gen(Lk-1, min_sup);
    for each transaction t ∈ D
    {
        Ct = subset(Ck, t);
        for each candidate c ∈ Ct
            c.count++;
    }
    Lk = {c ∈ Ck | c.count >= min_sup};
}
return L = ∪k Lk;
Apriori Algorithm
procedure apriori_gen(Lk-1, min_sup)
    for each itemset l1 ∈ Lk-1
        for each itemset l2 ∈ Lk-1
            if (l1[1]=l2[1]) ∧ (l1[2]=l2[2]) ∧ … ∧ (l1[k-2]=l2[k-2]) ∧ (l1[k-1] < l2[k-1])
            {
                c = l1 join l2; // Join step
                if has_infrequent_subset(c, Lk-1) then
                    delete c; // Prune step
                else add c to Ck;
            }
    return Ck;

procedure has_infrequent_subset(c, Lk-1)
    for each (k-1)-subset s of c
        if s is not an element of Lk-1 then return TRUE;
    return FALSE;