Association Rule Mining in Data Mining

Association Rule Mining
Ayesha Ali

Association Analysis
• Discovery of Association Rules
– showing attribute-value conditions that occur
frequently together in a set of data, e.g. market
basket
– Given a set of data, find rules that will predict the
occurrence of a data item based on the
occurrences of other items in the data
• A rule has the form body ⇒ head
– buys(Omar, “milk”) ⇒ buys(Omar, “sugar”)

Location Business Type
1 Barber, Bakery, Convenience Store, Meat Shop, Fast Food
2 Bakery, Bookstore, Petrol Pump, Convenience Store, Library, Fast Food
3 Carpenter, Electrician, Barber, Hardware Store
4 Bakery, Vegetable Market, Flower Shop, Sweets Shop, Meat Shop
5 Convenience Store, Hospital, Pharmacy, Sports Shop, Gym, Fast Food
6 Internet Café, Gym, Games Shop, Sports Shop, Fast Food, Bakery
Association Rule: X ⇒ Y; e.g. (Fast Food, Bakery) ⇒ (Convenience Store)
Support S: fraction of locations that contain both X and Y = P(X ∪ Y)
S(Fast Food, Bakery, Convenience Store) = 2/6 = .33
Confidence C: how often the items in Y appear in locations that contain X = P(Y | X)
C[(Fast Food, Bakery) ⇒ (Convenience Store)] = P(X ∪ Y) / P(X)
= (2/6) / (3/6) = .67
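The two measures above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names `support` and `confidence` are illustrative, and the location table is transcribed by hand into sets.

```python
# Location table from the slide, one set of business types per location.
LOCATIONS = {
    1: {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    2: {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    3: {"Carpenter", "Electrician", "Barber", "Hardware Store"},
    4: {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    5: {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    6: {"Internet Café", "Gym", "Games Shop", "Sports Shop", "Fast Food", "Bakery"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions if itemset <= items)
    return hits / len(transactions)


def confidence(body, head, transactions):
    """Support of body and head together, divided by the support of body."""
    body, head = set(body), set(head)
    return support(body | head, transactions) / support(body, transactions)


transactions = list(LOCATIONS.values())
x = {"Fast Food", "Bakery"}
y = {"Convenience Store"}
print(round(support(x | y, transactions), 2))    # 0.33
print(round(confidence(x, y, transactions), 2))  # 0.67
```

Representing each location as a set keeps the "contains both X and Y" test to a single subset comparison.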

• Given a set of transactions T, the goal of
association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
• Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf
thresholds
⇒ Computationally prohibitive!
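To get a feel for why listing every rule is prohibitive, the sketch below enumerates all rules over a small item list and compares the total with the standard closed-form count 3^d − 2^(d+1) + 1 for d distinct items. The closed form is a general counting result and does not appear on the slides; `all_rules` and `rule_count` are illustrative names.

```python
from itertools import combinations


def all_rules(items):
    """Yield every rule body => head with non-empty, disjoint body and head."""
    items = sorted(items)
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            # Every non-empty proper subset of the itemset can be a body.
            for k in range(1, size):
                for body in combinations(itemset, k):
                    head = tuple(i for i in itemset if i not in body)
                    yield body, head


def rule_count(d):
    """Closed form for the number of possible rules over d distinct items."""
    return 3**d - 2 ** (d + 1) + 1


three_items = ["Bakery", "Convenience Store", "Fast Food"]
print(len(list(all_rules(three_items))))  # 12
print(rule_count(3))                      # 12
print(rule_count(20))                     # 3484687250
```

With only three items there are already 12 rules; with 20 distinct items the count is in the billions, before any support or confidence has been computed.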

Location Business Type
1 Barber, Bakery, Convenience Store, Meat Shop, Fast Food
2 Bakery, Bookstore, Petrol Pump, Convenience Store, Library, Fast Food
3 Carpenter, Electrician, Barber, Hardware Store, Meat Shop
4 Bakery, Vegetable Market, Flower Shop, Sweets Shop, Meat Shop
5 Convenience Store, Hospital, Pharmacy, Sports Shop, Gym, Fast Food
6 Internet Café, Gym, Sweets Shop, Sports Shop, Fast Food, Bakery
Association Rules:
(Fast Food, Bakery) ⇒ (Convenience Store) Support S: .33 Confidence C: .67
(Convenience Store, Bakery) ⇒ (Fast Food) Support S: .33 Confidence C: 1
(Fast Food, Convenience Store) ⇒ (Bakery) Support S: .33 Confidence C: .67
(Convenience Store) ⇒ (Fast Food, Bakery) Support S: .33 Confidence C: .67
(Fast Food) ⇒ (Convenience Store, Bakery) Support S: .33 Confidence C: .50
(Bakery) ⇒ (Fast Food, Convenience Store) Support S: .33 Confidence C: .50

Association Rules:
(Fast Food, Bakery) ⇒ (Convenience Store) Support S: .33 Confidence C: .67
(Convenience Store, Bakery) ⇒ (Fast Food) Support S: .33 Confidence C: 1
(Fast Food, Convenience Store) ⇒ (Bakery) Support S: .33 Confidence C: .67
(Convenience Store) ⇒ (Fast Food, Bakery) Support S: .33 Confidence C: .67
(Fast Food) ⇒ (Convenience Store, Bakery) Support S: .33 Confidence C: .50
(Bakery) ⇒ (Fast Food, Convenience Store) Support S: .33 Confidence C: .50
Observations
• The rules above are binary partitions of the same itemset (Fast Food, Bakery, Convenience Store)
• They have identical support but can have different confidence
• The support and confidence thresholds may therefore be set to different values
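The observations can be reproduced by generating every binary partition of one itemset and computing both measures for each. The sketch below assumes the location table from this slide, transcribed into sets; `count` and `rules_from_itemset` are illustrative helper names, not something the slides define.

```python
from itertools import combinations

# Location table from the slide, one set of business types per location.
TRANSACTIONS = [
    {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    {"Carpenter", "Electrician", "Barber", "Hardware Store", "Meat Shop"},
    {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    {"Internet Café", "Gym", "Sweets Shop", "Sports Shop", "Fast Food", "Bakery"},
]


def count(itemset):
    """Number of locations that contain every item in `itemset`."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)


def rules_from_itemset(itemset):
    """Return (body, head, support, confidence) for every binary partition."""
    itemset = sorted(itemset)
    n = len(TRANSACTIONS)
    both = count(itemset)  # the same numerator is shared by all partitions
    rules = []
    for k in range(1, len(itemset)):
        for body in combinations(itemset, k):
            head = tuple(i for i in itemset if i not in body)
            rules.append((body, head, both / n, both / count(body)))
    return rules


for body, head, s, c in rules_from_itemset({"Fast Food", "Bakery", "Convenience Store"}):
    print(f"{body} => {head}  S: {s:.2f}  C: {c:.2f}")
```

The support is the same for all six rules because each one is a partition of the same itemset; only the denominator `count(body)` changes from rule to rule.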

Mining Association Rules
• Two-step approach:
Step 1. Frequent Itemset Generation
Generate all itemsets whose support ≥ minsup
Step 2. Rule Generation
Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset
Note: Frequent itemset generation is still computationally expensive

• Frequent Itemset Generation
[Figure: lattice graph of all possible itemsets]

• Brute-force approach:
– Each node in the lattice graph is a candidate frequent itemset
– Count the support of each candidate by scanning the database
– N = 6 (number of transactions/locations)
– w = |(Barber, Bakery, Convenience Store, Meat Shop, Fast Food, Bookstore, Petrol Pump, Library,
Carpenter, Electrician, Hardware Store, Vegetable Market, Flower Shop, Sweets Shop, Games Shop,
Hospital, Pharmacy, Sports Shop, Gym, Internet Café)| = 20
– M = 2^20 = 1,048,576 (number of candidate itemsets in the lattice)
– Complexity ~ O(NMw)

w = number of unique items across all item sets
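The size of the lattice is easy to confirm on a small example. The sketch below builds every node of the lattice for three items and then plugs the slide's values of N and w into the brute-force cost; `lattice` is an illustrative helper name, and the node count includes the empty itemset at the top of the lattice.

```python
from itertools import chain, combinations


def lattice(items):
    """Return every subset of `items` (including the empty set) as a tuple."""
    items = sorted(items)
    return list(
        chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    )


nodes = lattice(["Bakery", "Convenience Store", "Fast Food"])
print(len(nodes))  # 8, i.e. 2**3

N, w = 6, 20       # values from the slide
M = 2**w
print(M)           # 1048576
print(N * M * w)   # 125829120 item comparisons in the worst case
```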

• Frequent Itemset Generation
– Reduce the number of candidates (M)
– Reduce the number of transactions/locations (N)
– Reduce the number of comparisons (NM)
• Use efficient data structures to store the candidates
• No need to match every candidate against every
transaction/location
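The slides do not say which data structure is meant. One common option, assumed here purely for illustration, is an inverted index that maps each item to the set of location ids containing it; the support count of a candidate is then the size of an intersection, so no candidate is matched against locations that lack one of its items. The names `build_index` and `support_count` are hypothetical.

```python
from collections import defaultdict

# Location table from the slides, keyed by location id.
LOCATIONS = {
    1: {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    2: {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    3: {"Carpenter", "Electrician", "Barber", "Hardware Store"},
    4: {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    5: {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    6: {"Internet Café", "Gym", "Games Shop", "Sports Shop", "Fast Food", "Bakery"},
}


def build_index(transactions):
    """Map each item to the set of transaction ids that contain it."""
    index = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            index[item].add(tid)
    return index


def support_count(itemset, index):
    """Count transactions containing all items by intersecting their id sets."""
    tid_sets = [index.get(item, set()) for item in itemset]
    return len(set.intersection(*tid_sets)) if tid_sets else 0


index = build_index(LOCATIONS)
print(sorted(index["Convenience Store"]))                                    # [1, 2, 5]
print(support_count({"Bakery", "Fast Food"}, index))                         # 3
print(support_count({"Bakery", "Fast Food", "Convenience Store"}, index))    # 2
```

The index is built in a single pass over the data; after that, counting a candidate touches only the id sets of its own items.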

Reducing the number of candidates
• Apriori principle:
– If an itemset is frequent, then all of its subsets must
also be frequent
• Important Support property:
– Support of an itemset never exceeds the support of its
subsets
– This is known as the anti-monotone property of
support
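Stated as a formula, with S(X) denoting the support of an itemset X as elsewhere in these slides, the anti-monotone property reads:

```latex
\forall X, Y:\quad X \subseteq Y \;\Longrightarrow\; S(X) \ge S(Y)
```

Read in the other direction, this is what allows pruning: if S(X) is below minsup, every superset Y of X is below minsup as well and never needs to be counted.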

Applying Apriori principle

• N = 20 (here N is the number of unique items)
• All possible candidate sets:
– C(N,1) + C(N,2) + C(N,3) + … + C(N,N) = 2^N − 1
• Minimum Occurrence Based Filtering
Set m = 2 and L = 1
Candidates = all item sets of Length 1
While (L ≤ N) {
Scan DB:
List = Create Occurrence Frequency Table of the candidate sets of Length L
Filter out of List all candidate sets with Occurrence Frequency < m
If no candidate is left in List then Break;
L = L + 1
Candidates = new candidate sets of Length L built from the sets left in List
}
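The loop above can be sketched in Python as follows. This is one possible reading of the pseudocode, not the author's implementation: `min_count` plays the role of m, the function name `apriori` is illustrative, and the `baskets` data is a hypothetical toy example rather than the location table.

```python
from itertools import combinations


def apriori(transactions, min_count=2):
    """Return {itemset: count} for all itemsets seen at least min_count times."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item as a one-element set.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    length = 1
    while candidates:
        # Scan the database to count each candidate of the current length.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Filter out candidates below the minimum occurrence count.
        level = {c: n for c, n in counts.items() if n >= min_count}
        if not level:
            break
        frequent.update(level)
        # Build candidates one item longer by joining surviving sets, keeping
        # only those whose subsets of the current length all survived.
        length += 1
        joined = {a | b for a in level for b in level if len(a | b) == length}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, length - 1))
        }
    return frequent


# Hypothetical toy data, for illustration only.
baskets = [
    {"milk", "sugar", "bread"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "eggs"},
]
for itemset, n in sorted(apriori(baskets).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On the toy data the triple (bread, milk, sugar) is never counted, because its subset (bread, sugar) was already filtered out at length 2. That skipped scan is the saving the Apriori principle provides.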

Scan 1: occurrence count of every business type

Business Type        Count
Barber               2
Bakery               2
Bookstore            1
Carpenter            1
Convenience Store    3
Electrician          1
Fast Food            3
Flower Shop          1
Gym                  1
Games Shop           1
Hardware Store       1
Hospital             1
Internet Café        1
Library              1
Meat Shop            1
Petrol Pump          1
Pharmacy             1
Sports Shop          1
Sweets Shop          1
Vegetable Market     1

Filter Minimum Occurrences: drop every entry with count < m (m = 2)

L1: frequent single items

Business Type        Count
Barber               2
Bakery               2
Convenience Store    3
Fast Food            3

Pairs of Two Items built from L1: 4C2 = 6

Business Type                      Count
(Barber, Bakery)                   1
(Barber, Convenience Store)        1
(Barber, Fast Food)                1
(Bakery, Convenience Store)        2
(Bakery, Fast Food)                3
(Convenience Store, Fast Food)     3

Filter Minimum Occurrences: drop every entry with count < m (m = 2)

L2: frequent pairs

Business Type                      Count
(Bakery, Convenience Store)        2
(Bakery, Fast Food)                3
(Convenience Store, Fast Food)     3
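The next pass would build candidate sets of three items from the frequent pairs. The sketch below shows that single step, using only the L2 pairs and counts listed above; `next_candidates` is an illustrative name, and the subset check is the Apriori pruning rule rather than anything specific to these slides.

```python
from itertools import combinations

# Frequent pairs (L2) and their counts, copied from the table above.
L2 = {
    frozenset({"Bakery", "Convenience Store"}): 2,
    frozenset({"Bakery", "Fast Food"}): 3,
    frozenset({"Convenience Store", "Fast Food"}): 3,
}


def next_candidates(frequent, length):
    """Join frequent itemsets into candidates of `length`, dropping any
    candidate that has a subset of size length - 1 that is not frequent."""
    joined = {a | b for a in frequent for b in frequent if len(a | b) == length}
    return {
        c for c in joined
        if all(frozenset(s) in frequent for s in combinations(c, length - 1))
    }


for candidate in next_candidates(L2, 3):
    print(sorted(candidate))  # ['Bakery', 'Convenience Store', 'Fast Food']
```

Only one candidate of length 3 survives the join. The earlier support example counts (Fast Food, Bakery, Convenience Store) in 2 of the 6 locations, so it would also pass the m = 2 filter.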
