Lecture14 - Advanced topics in association rules

1. Introduction to Machine Learning, Lecture 14: Advanced Topics in Association Rules Mining
Albert Orriols i Puig (aorriols@salle.url.edu)
Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
2. Recap of Lecture 13
Ideas come from market basket analysis (MBA). Let's go shopping!
Customer 1: milk, eggs, sugar, bread
Customer 2: milk, eggs, cereal, bread
Customer 3: eggs, sugar
What do my customers buy? Which products are bought together?
Aim: find associations and correlations between the different items that customers place in their shopping baskets
Slide 2 Artificial Intelligence Machine Learning
3. Recap of Lecture 13
Database TDB:
Tid 10: A, C, D
Tid 20: B, C, E
Tid 30: A, B, C, E
Tid 40: B, E
1st scan, C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3 → L1: {A}:2, {B}:3, {C}:3, {E}:3
C2: {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2 → L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2
C3: {B,C,E}
3rd scan → L3: {B,C,E}:2
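The Lecture 13 trace above can be reproduced with a minimal level-wise Apriori sketch. This is my own illustrative code, not the lecture's; min_sup = 2 is assumed from the trace, since {D}:1 and {A,B}:1 are pruned there.

```python
from itertools import combinations
from collections import Counter

def apriori(transactions, min_sup):
    """Level-wise Apriori: candidate generation + one DB scan per level."""
    # L1: frequent 1-itemsets
    counts = Counter(item for t in transactions for item in t)
    L = {frozenset([i]): c for i, c in counts.items() if c >= min_sup}
    frequent = dict(L)
    k = 2
    while L:
        # Join step, then prune candidates with an infrequent (k-1)-subset
        items = list(L)
        candidates = {a | b for a in items for b in items if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Support counting: one scan of the database per level
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        L = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(L)
        k += 1
    return frequent

tdb = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
freq = apriori(tdb, min_sup=2)
print(freq[frozenset('BCE')])  # 2, matching L3 in the trace
```

Running it on the four-transaction TDB yields exactly the L1, L2, and L3 shown on the slide.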
4. Recap of Lecture 13
Challenges:
Apriori scans the database multiple times
Most often, there is a high number of candidates
Support counting for candidates can be time-expensive
Several methods try to improve these points by:
Reducing the number of scans of the database
Shrinking the number of candidates
Counting the support of candidates more efficiently
5. Today’s Agenda
Starting a journey through some advanced topics in ARM:
Mining frequent patterns without candidate generation
Multiple-level AR
Sequential pattern mining
Quantitative association rules
Mining class association rules
Beyond support & confidence
Applications
6. Revisiting Candidate Generation
Remember Apriori?
Use the previous frequent (k-1)-itemsets to generate the k-itemsets
Count itemset support by scanning the database
Bottleneck in the process: candidate generation
Suppose 100 items:
First level of the tree: 100 nodes
Second level of the tree: (100 choose 2) = 4,950 nodes
In general, the number of k-itemsets is (100 choose k)
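The combinatorial blow-up the slide points at can be made concrete with Python's `math.comb` (a quick illustration, not part of the original slides):

```python
import math

# Number of possible k-itemsets over 100 items grows combinatorially
print(math.comb(100, 1))   # 100  (first level of the candidate tree)
print(math.comb(100, 2))   # 4950 (second level)
print(math.comb(100, 50))  # ~1e29: why candidate generation is the bottleneck
```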
7. Can We Avoid Generation?
Build an auxiliary structure to get statistics about the itemsets in order to avoid candidate generation: use an FP-tree
Avoid multiple scans of the data
Divide-and-conquer methodology
Avoid candidate generation
Outline of the process:
Generate an FP-tree
Mine the FP-tree
8. Building the FP-Tree
TID 1: {F,A,C,D,G,I,M,P} → sorted frequent items: {F,C,A,M,P}
TID 2: {A,B,C,F,L,M,O} → {F,C,A,B,M}
TID 3: {B,F,H,J,O} → {F,B}
TID 4: {B,C,K,S,P} → {C,B,P}
TID 5: {A,F,C,E,L,P,M,N} → {F,C,A,M,P}
Scan the DB for the first time and identify the frequent items: <(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)>
We sort the items in each transaction according to their frequency (last column)
9. Building the FP-Tree
After reading TID 1 ({F,C,A,M,P}), the tree is a single path: root → F:1 → C:1 → A:1 → M:1 → P:1
Scan the DB again to build the tree
10. Building the FP-Tree
After reading TID 2 ({F,C,A,B,M}), the shared prefix is reused: root → F:2 → C:2 → A:2, which now branches into M:1 → P:1 and B:1 → M:1
11. Building the FP-Tree
After reading TID 3 ({F,B}), F becomes F:3 and a new branch B:1 hangs directly from F
12. Building the FP-Tree
After reading TID 4 ({C,B,P}), a new branch starts at the root: root → C:1 → B:1 → P:1
13. Building the FP-Tree
After reading TID 5 ({F,C,A,M,P}), the counts along the first path increase: root → F:4 → C:3 → A:3 → M:2 → P:2
14. Building the FP-Tree
The final tree has two branches from the root: F:4 (with subpaths C:3 → A:3 → M:2 → P:2, C:3 → A:3 → B:1 → M:1, and B:1) and C:1 → B:1 → P:1
A header table over the items F, C, A, B, M, P holds node-links into the tree
Build an index to access the nodes quickly and traverse the tree
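The two-scan construction above can be sketched in a few lines. This is an illustrative implementation of my own (class and function names are not from the slides); the item order 'FCABMP' is the descending-frequency order from the slide, with the F/C tie at count 4 broken as the slide does.

```python
from collections import Counter, defaultdict

class Node:
    """One FP-tree node: item label, count, parent link, children by item."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

# Descending-frequency item order taken from the slide (F/C tie broken as shown)
ORDER = {item: k for k, item in enumerate('FCABMP')}

def build_fp_tree(transactions, min_sup=3):
    # 1st scan: count items globally, keep only the frequent ones
    freq = Counter(i for t in transactions for i in t)
    keep = {i for i, c in freq.items() if c >= min_sup}
    root = Node(None, None)
    header = defaultdict(list)  # header table: item -> node-links
    # 2nd scan: insert each transaction's frequent items in sorted order
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in keep), key=ORDER.__getitem__):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += 1
    return root, header

tdb = [set('FACDGIMP'), set('ABCFLMO'), set('BFHJO'), set('BCKSP'), set('AFCELPMN')]
root, header = build_fp_tree(tdb)
print(root.children['F'].count)           # 4
print(sum(n.count for n in header['P']))  # 3 (P sits on two branches)
```

The resulting tree matches the drawing on the slide: F:4 and C:1 hang from the root, and the node-links for P reach its two occurrences (P:2 and P:1).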
15. Mining the FP-Tree
Properties to mine the FP-tree
Node-link property: all possible itemsets in which the frequent item a is included can be found by following a's node-links
Example: item P has support 3; there are two paths in the FP-tree for node P:
1. {F,C,A,M}
2. {C,B,P}
16. Mining the FP-Tree
Prefix path property: to calculate the frequent patterns for a node a in path P, only the prefix subpath of node a in P needs to be accumulated, and the frequency count of every node in the prefix path should carry the same count as node a
Node M is involved in (F:4, C:3, A:3, M:2, P:2)
Take the prefix of the path until M: (F:4, C:3, A:3)
Adjust counts to 2: (F:2, C:2, A:2)
So F, C, and A co-occur with M
17. Mining the FP-Tree
Fragment growth: let α be an itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then the support of α ∪ β is equivalent to the support of β in B
For M, we had (F:2, C:2, A:2) and (F:1, C:1, A:1, B:1)
Therefore {(F,C,A,M):2}, {(F,C,M):2}, …
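The fragment-growth step can be sketched as follows. This is a minimal illustration of my own, using M's conditional pattern base read off the tree on the previous slides and min_sup = 3 (the threshold implied by the slide's frequent-item list):

```python
from collections import Counter

# M's conditional pattern base: two prefix paths, each weighted by node M's count
cond_base_M = [(['F', 'C', 'A'], 2), (['F', 'C', 'A', 'B'], 1)]

def frequent_in_base(cond_base, min_sup):
    """Support of each item inside the conditional pattern base."""
    support = Counter()
    for path, count in cond_base:
        for item in path:
            support[item] += count
    return {i: s for i, s in support.items() if s >= min_sup}

# By fragment growth, support({i} U {M}) equals the support of i in M's base
grown = {frozenset([i, 'M']): s
         for i, s in frequent_in_base(cond_base_M, min_sup=3).items()}
print(grown[frozenset(['F', 'M'])])  # 3
```

Here B drops out (support 1 in the base), while F, C, and A each survive with support 3, so {F,M}, {C,M}, and {A,M} are frequent; recursing on the base in the same way yields the larger patterns such as {F,C,A,M}.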
18. Is FP-growth Faster than Apriori?
As the support threshold goes down, the number of itemsets increases dramatically
FP-growth does not need to generate candidates and test them
19. Is FP-growth Faster than Apriori?
Both FP-growth and Apriori scale linearly with the number of transactions, but FP-growth is more efficient
20. Next Class
Advanced topics in association rule mining
21. Introduction to Machine Learning, Lecture 14: Advanced Topics in Association Rules Mining
Albert Orriols i Puig (aorriols@salle.url.edu)
Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull