Association Rule Mining
Association Rules
Finds interesting association / correlation relationships among large sets of data
Business Decision Making
Example – Market Basket Analysis
Items likely to be purchased
Advertising strategy, Catalog Design, Store layout
Association Rules
Forming Association rules
Universe – all items
Boolean Vector
Example:
Computer ⇒ Accounting_Software
[support = 5%, confidence = 60%]
Minimum support and Confidence threshold
Basic Concepts
I = {i1, i2, …, im} – set of items
D – set of database transactions
T – a transaction, a set of items with T ⊆ I
Association rule – A ⇒ B, where A ⊆ I, B ⊆ I and A ∩ B = ∅
Support – percentage of transactions in D containing both A and B – P(A ∪ B)
Confidence – percentage of transactions in D containing A that also contain B – P(B|A)
Confidence(A ⇒ B) = Support(A ∪ B) / Support(A)
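These definitions can be checked with a minimal Python sketch (the transaction data below is hypothetical, chosen to echo the Computer ⇒ Accounting_Software example):

```python
# Toy transaction set (hypothetical data, not from the slides).
transactions = [
    {"computer", "accounting_software"},
    {"computer"},
    {"computer", "accounting_software", "printer"},
    {"printer"},
    {"computer", "printer"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset: P(A u B)."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """support(A u B) / support(A) = P(B | A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

print(support({"computer", "accounting_software"}, transactions))  # 0.4
print(confidence({"computer"}, {"accounting_software"}, transactions))  # 0.5
```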
Basic Concepts
Itemset
K-Itemset
Occurrence frequency of an itemset
Frequency, support_count (absolute support) or count
Itemset satisfies minimum support when count >=
min_sup * number of transactions in D
Minimum Support Count
Frequent Itemset
Association Rule Mining Process
Find all frequent itemsets
Generate strong association rules from
frequent itemsets
Satisfy Minimum Support and Minimum
Confidence
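The second step can be sketched in Python: given support counts from step 1, every split of a frequent itemset into antecedent and consequent is tested against the confidence threshold. The counts below mirror the I1/I2/I5 itemsets of the worked example later in these slides; the implementation itself is an illustrative sketch, not the slides' own code:

```python
from itertools import combinations

# Support counts for a frequent itemset and all its subsets
# (taken from the example database mined later in the slides).
support_count = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}

def strong_rules(freq_itemset, min_conf):
    """All rules A => (F - A) from frequent itemset F with conf >= min_conf."""
    rules = []
    items = frozenset(freq_itemset)
    for r in range(1, len(items)):
        for antecedent in map(frozenset, combinations(items, r)):
            conf = support_count[items] / support_count[antecedent]
            if conf >= min_conf:
                rules.append((set(antecedent), set(items - antecedent), conf))
    return rules

for a, b, conf in strong_rules({"I1", "I2", "I5"}, min_conf=0.5):
    print(sorted(a), "=>", sorted(b), f"conf={conf:.2f}")
```

With min_conf = 50%, four of the six possible rules survive, e.g. I5 ⇒ {I1, I2} with confidence 2/2 = 100%.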
Itemsets
Complete Itemsets
Closed Frequent Itemset
X is closed in a data set S if there exists no proper super-itemset
Y such that Y has the same support count as X in S
X is frequent
Maximal Frequent Itemset
X is Frequent and there exists no super-itemset Y such that X ⊂
Y and Y is frequent in S
Example:
T = { {a1,a2,…a100}, {a1,a2,…a50}}, min_sup = 1
Closed frequent itemsets : Both {{a1,a2,…a100}:1, {a1,a2,…a50}: 2}
Maximal frequent itemset: {a1,a2,…a100}
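The two definitions can be verified directly on this example with a small Python sketch (the helper names is_closed / is_maximal are my own; items a1..a100 are modeled as the integers 1..100):

```python
# T = {{a1..a100}, {a1..a50}}, min support count = 1, as on the slide.
transactions = [frozenset(range(1, 101)), frozenset(range(1, 51))]
A100 = frozenset(range(1, 101))
A50 = frozenset(range(1, 51))

def count(itemset):
    return sum(itemset <= t for t in transactions)

def is_closed(x):
    # No proper super-itemset with the same support count; by
    # anti-monotonicity it suffices to test one-item extensions.
    return all(count(x | {e}) < count(x) for e in A100 - x)

def is_maximal(x, min_sup=1):
    # Frequent, and no frequent proper super-itemset.
    return count(x) >= min_sup and all(
        count(x | {e}) < min_sup for e in A100 - x)

print(is_closed(A50), is_closed(A100))    # True True
print(is_maximal(A50), is_maximal(A100))  # False True
```

{a1..a50} is closed (count 2, every extension drops to 1) but not maximal, since its superset {a1..a100} is still frequent.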
Types of Association Rules
Types of Values
Boolean, Quantitative Association Rule
Dimensions of data
Single Dimensional, Multi-dimensional
Level of abstraction
Multilevel association rules
Based on kinds of rules
Association rules, Correlation rules, Strong gradient relationships
Based on completeness of patterns
Complete, Closed, Maximal, top-k, constrained, approximate…
Mining Single Dimensional Boolean
Association Rules
Apriori Algorithm – Finding Frequent Itemsets
using Candidate Generation
Uses prior knowledge of frequent itemset properties
Level-wise search
k-itemsets are used to explore (k+1)-itemsets
Frequent 1-itemsets – L1
L1 is used to find L2
Apriori Property
Reduces Search space
All non empty subsets of a frequent itemset must also be
frequent
If P(I) < min_sup then P(I U A) < min_sup
Anti-monotone property – If a set cannot pass a test all
of its supersets will fail the test as well.
Any subset of a frequent itemset must be frequent
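A quick numeric check of the anti-monotone property on a toy transaction set (data hypothetical): support counts can only fall, never rise, as an itemset grows, so once a set misses min_sup every superset misses it too.

```python
# Hypothetical transactions used only to illustrate anti-monotonicity.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]

def count(itemset):
    return sum(set(itemset) <= t for t in transactions)

# Support is anti-monotone: count(X) >= count(X union Y).
assert count({"a"}) >= count({"a", "b"}) >= count({"a", "b", "c"})
print(count({"a"}), count({"a", "b"}), count({"a", "b", "c"}))  # 4 2 1
```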
Apriori property application
Join Step
To find Lk, join Lk-1 with itself to produce the candidate set Ck
li[j] – the jth item in li
Members of Lk-1 are joinable if their first (k-2) items are
common
Members l1 and l2 of Lk-1 are joinable if (l1[1]=l2[1]) ∧
(l1[2]=l2[2]) ∧ …(l1[k-2]=l2[k-2]) ∧ (l1[k-1]< l2[k-1])
Resulting itemset is l1[1], l1[2], … l1[k-1], l2[k-1]
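A minimal Python sketch of the join step, assuming each itemset is kept as a lexicographically sorted tuple (the L2 below is the one produced by the worked example in these slides):

```python
def apriori_join(L_prev):
    """Join L(k-1) with itself to form Ck; itemsets are sorted tuples."""
    k_minus_1 = len(next(iter(L_prev)))
    Ck = set()
    for l1 in L_prev:
        for l2 in L_prev:
            # Joinable: first k-2 items equal and l1[k-1] < l2[k-1].
            if l1[:k_minus_1 - 1] == l2[:k_minus_1 - 1] and l1[-1] < l2[-1]:
                Ck.add(l1 + (l2[-1],))
    return Ck

L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
print(sorted(apriori_join(L2)))  # 6 candidate 3-itemsets
```

The ordering condition l1[k-1] < l2[k-1] ensures each candidate is generated exactly once and stays sorted.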
Apriori property application
Prune Step
Ck is a superset of Lk
Determine the count of each candidate of Ck
To reduce the size of Ck – if any (k-1)-subset of a candidate is not in Lk-1,
the candidate can be removed from Ck
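The prune step can be sketched the same way; applied to the six candidates produced by joining the example L2, it removes the four whose (k-1)-subsets are not all frequent:

```python
from itertools import combinations

def apriori_prune(Ck, L_prev):
    """Keep only candidates whose every (k-1)-subset is in L(k-1)."""
    return {c for c in Ck
            if all(s in L_prev for s in combinations(c, len(c) - 1))}

L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
C3 = {("I1", "I2", "I3"), ("I1", "I2", "I5"), ("I1", "I3", "I5"),
      ("I2", "I3", "I4"), ("I2", "I3", "I5"), ("I2", "I4", "I5")}
print(sorted(apriori_prune(C3, L2)))
# e.g. ("I1","I3","I5") is dropped because ("I3","I5") is not in L2
```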
The Apriori Algorithm
Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 2; Lk-1 ≠ ∅; k++) do begin
    Ck = candidates generated from Lk-1;
    for each transaction t in database do
        increment the count of all candidates in Ck that are contained in t;
    Lk = candidates in Ck with min_support;
end
return ∪k Lk;
The Apriori Algorithm—An Example
Database TDB
Tid Items
T100 I1,I2,I5
T200 I2,I4
T300 I2,I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 I1, I2, I3
Minimum support count = 2 (i.e., 2/9 ≈ 22%)
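A runnable sketch of the whole level-wise loop on this database (min support count 2; the implementation is an illustration, not the slides' pseudocode):

```python
from itertools import combinations

# The slide's example database TDB.
db = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
min_count = 2

def frequent(candidates):
    """Keep candidates (sorted tuples) meeting the minimum support count."""
    return {c for c in candidates
            if sum(set(c) <= t for t in db) >= min_count}

L = frequent((i,) for i in {i for t in db for i in t})  # L1
all_frequent = set(L)
k = 2
while L:
    Ck = set()
    for l1 in L:
        for l2 in L:
            # Join step: shared (k-2)-prefix, ordered last items.
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                # Prune step: every (k-1)-subset must be frequent.
                if all(s in L for s in combinations(c, k - 1)):
                    Ck.add(c)
    L = frequent(Ck)
    all_frequent |= L
    k += 1

for itemset in sorted(all_frequent, key=lambda s: (len(s), s)):
    print(itemset)
```

On this database the loop yields the classic result: five frequent 1-itemsets, six 2-itemsets, and two 3-itemsets ({I1, I2, I3} and {I1, I2, I5}), 13 in total.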
Apriori Algorithm
Input: Database of transactions – D, min_sup
Output: L, frequent itemsets
L1 = find_frequent_1-itemsets(D);
for (k = 2; Lk-1 ≠ ∅; k++)
{
    Ck = apriori_gen(Lk-1, min_sup);
    for each transaction t ∈ D
    {
        Ct = subset(Ck, t);
        for each candidate c ∈ Ct
            c.count++;
    }
    Lk = {c ∈ Ck | c.count >= min_sup};
}
return L = ∪k Lk;
Apriori Algorithm
procedure apriori_gen(Lk-1, min_sup)
    for each itemset l1 ∈ Lk-1
        for each itemset l2 ∈ Lk-1
            if (l1[1]=l2[1]) ∧ (l1[2]=l2[2]) ∧ … ∧ (l1[k-2]=l2[k-2]) ∧ (l1[k-1] < l2[k-1])
            {
                c = l1 join l2; // Join step
                if has_infrequent_subset(c, Lk-1) then
                    delete c; // Prune step
                else add c to Ck;
            }
    return Ck;

procedure has_infrequent_subset(c, Lk-1)
    for each (k-1)-subset s of c
        if s is not an element of Lk-1 then return TRUE;
    return FALSE;