Introduction to Machine Learning
Lecture 16
Advanced Topics in Association Rules Mining

Lecture 16 - Advanced topics on association rules, Part III

  1. Introduction to Machine Learning. Lecture 16: Advanced Topics in Association Rules Mining. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.
  2. Recap of Lectures 13-15. Ideas come from market basket analysis (MBA). Let's go shopping! Customer1: milk, eggs, sugar, bread. Customer2: milk, eggs, cereal, bread. Customer3: eggs, sugar. What do my customers buy? Which products are bought together? Aim: find associations and correlations between the different items that customers place in their shopping baskets.
  3. Recap of Lecture 15. Aim: find associations between items. But wait! There are many different diapers (Dodot, Huggies, …) and many different beers (Heineken, Desperados, Kingfisher, …; in bottle or can). [Figure: an item taxonomy with the nodes Clothes, Outerwear, Shirts, Jackets, Ski Pants.] Which rule do you prefer: diapers ⇒ beer, or Dodot diapers M ⇒ Dam beer in can? Which will have greater support?
  4. Today’s Agenda. Continuing our journey through some advanced topics in ARM: mining frequent patterns without candidate generation; multiple-level AR; sequential pattern mining; quantitative association rules; mining class association rules; beyond support & confidence; applications.
  5. Introduction to Seq. AR. So far, we have seen Apriori, FP-growth, and mining multiple-level AR. But none of them considers the order of transactions. However, is the sequence important? Which came first, the hen or the egg? Sometimes it is really important: analyzing the sequence of items bought by a customer, or web usage mining, which searches for navigational patterns of users.
  6. An Example in Web Usage Mining. Web sequence: <{Homepage} {Electronics} {Computers} {Laptops} {Sony Vaio} {Order Confirmation} {Return to Shopping}>
  7. Definition. Defining the problem: Let I = {i1, i2, …, im} be a set of items. Sequence: an ordered list of itemsets. Itemset/element: a non-empty set of items X ⊆ I. We denote a sequence s by 〈a1 a2 … ar〉, where ai is an itemset, which is also called an element of s. An element (or itemset) of a sequence is denoted by {x1, x2, …, xk}, where xj ∈ I is an item. We assume without loss of generality that items in an element of a sequence are in lexicographic order.
  8. Definition. Defining the problem (continued): Size: the size of a sequence is the number of elements (or itemsets) in the sequence. Length: the length of a sequence is the number of items in the sequence. A sequence of length k is called a k-sequence. A sequence s1 = 〈a1 a2 … ar〉 is a subsequence of another sequence s2 = 〈b1 b2 … bv〉, or s2 is a supersequence of s1, if there exist integers 1 ≤ j1 < j2 < … < jr−1 < jr ≤ v such that a1 ⊆ bj1, a2 ⊆ bj2, …, ar ⊆ bjr. We also say that s2 contains s1.
  9. Example. Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9}. Sequence 〈{3}{4, 5}{8}〉 is contained in (or is a subsequence of) 〈{6}{3, 7}{9}{4, 5, 8}{3, 8}〉 because {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8}, and {8} ⊆ {3, 8}. However, 〈{3}{8}〉 is not contained in 〈{3, 8}〉 or vice versa. The size of the sequence 〈{3}{4, 5}{8}〉 is 3, and the length of the sequence is 4.
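The containment test from the definition can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `is_subsequence`, `size` and `length` are chosen here, and sequences are assumed to be lists of Python sets.

```python
def is_subsequence(s1, s2):
    """Return True if sequence s1 is contained in sequence s2.

    Both sequences are lists of itemsets (Python sets). s1 is contained in
    s2 when its elements can be matched, in order, to distinct elements of
    s2 that are supersets of them.
    """
    j = 0  # next position in s2 that is still available for matching
    for element in s1:
        # Greedily take the earliest remaining element of s2 containing it.
        while j < len(s2) and not element <= s2[j]:
            j += 1
        if j == len(s2):
            return False
        j += 1  # indices must be strictly increasing: j1 < j2 < ... < jr
    return True


def size(seq):
    """Number of elements (itemsets) in the sequence."""
    return len(seq)


def length(seq):
    """Total number of items in the sequence."""
    return sum(len(element) for element in seq)


s = [{3}, {4, 5}, {8}]
t = [{6}, {3, 7}, {9}, {4, 5, 8}, {3, 8}]
print(is_subsequence(s, t))                  # True
print(is_subsequence([{3}, {8}], [{3, 8}]))  # False
print(is_subsequence([{3, 8}], [{3}, {8}]))  # False
print(size(s), length(s))                    # 3 4
```

Matching each element to the earliest possible position is enough here, because taking an earlier match never rules out a later one.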
  10. Objective. Objective of sequential pattern mining (SPM). Input: a set S of input data sequences (or sequence database). Goal: the problem of mining sequential patterns is to find all the sequences that have a user-specified minimum support. Each such sequence is called a frequent sequence, or a sequential pattern. The support for a sequence is the fraction of total data sequences in S that contain this sequence.
  11. Example (borrowed from Bing Liu).

      Transaction database:

      Customer ID   Transaction time   Transaction (items bought)
      1             July 20, 2005      30
      1             July 25, 2005      90
      2             July 9, 2005       10, 20
      2             July 14, 2005      30
      2             July 20, 2005      40, 60, 70
      3             July 25, 2005      30, 50, 70
      4             July 25, 2005      30
      4             July 29, 2005      40, 70
      4             August 2, 2005     90
      5             July 12, 2005      90

      Customer sequences:

      Customer ID   Customer sequence
      1             <(30) (90)>
      2             <(10 20) (30) (40 60 70)>
      3             <(30 50 70)>
      4             <(30) (40 70) (90)>
      5             <(90)>

      Sequential patterns with support > 25%:

      1-sequences   <(30)> <(40)> <(70)> <(90)>
      2-sequences   <(30)(40)> <(30)(70)> <(30)(90)> <(40 70)>
      3-sequences   <(30) (40 70)>
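The support values behind this example can be checked with a short Python sketch. The helper names (`contains`, `support_count`) are chosen here for illustration. With five customers, a support above 25% means a pattern must occur in at least two customer sequences.

```python
def contains(data_seq, pattern):
    """True if `pattern` is a subsequence of `data_seq` (lists of sets)."""
    j = 0
    for element in pattern:
        while j < len(data_seq) and not element <= data_seq[j]:
            j += 1
        if j == len(data_seq):
            return False
        j += 1
    return True


# Customer sequences from the example, one per customer.
database = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]


def support_count(pattern):
    """Number of customer sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)


patterns = [
    [{30}], [{40}], [{70}], [{90}],
    [{30}, {40}], [{30}, {70}], [{30}, {90}], [{40, 70}],
    [{30}, {40, 70}],
]
for p in patterns:
    count = support_count(p)
    print(p, count, f"{count / len(database):.0%}")
```

Note that customer 3 bought 30 and 70 in the same transaction, so that sequence supports <(30)> and <(70)> but not <(30)(70)>.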
  12. GSP. GSP closely follows Apriori, but for sequential patterns. If a sequence S is not frequent, then none of the supersequences of S is frequent. For instance, if <ab> is infrequent, so are <acb> and <(ca)b>. GSP follows these steps: (1) Initially, every item in the DB is a candidate of length 1. (2) For each level (i.e., sequences of length k): scan the database to collect the support count for each candidate sequence, then generate candidate length-(k+1) sequences from the length-k frequent sequences using Apriori. (3) Repeat until no frequent sequence or no candidate can be found. Strength: candidate pruning by Apriori.
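The level-wise loop described above might be sketched as follows. This is a simplified illustration under stated assumptions, not the GSP algorithm as published. Candidates are generated by extending each frequent sequence with one frequent item, which stands in for GSP's join of two frequent sequences. The Apriori pruning step is kept. All names (`mine`, `contains`, `drop_one_item`) are hypothetical, and patterns are stored as tuples of sorted item tuples so that they can be used as dictionary keys.

```python
from itertools import chain


def contains(data_seq, pattern):
    """True if `pattern` (tuple of item tuples) is a subsequence of `data_seq`."""
    j = 0
    for element in pattern:
        while j < len(data_seq) and not set(element) <= data_seq[j]:
            j += 1
        if j == len(data_seq):
            return False
        j += 1
    return True


def drop_one_item(pattern):
    """Yield every subsequence obtained by deleting exactly one item."""
    for i, element in enumerate(pattern):
        for item in element:
            rest = tuple(x for x in element if x != item)
            yield pattern[:i] + ((rest,) if rest else ()) + pattern[i + 1:]


def mine(database, min_count):
    """Level-wise search; level k holds the frequent sequences with k items."""
    def count(pattern):
        return sum(contains(seq, pattern) for seq in database)

    # Level 1: every item in the database is a candidate.
    items = sorted(set(chain.from_iterable(chain.from_iterable(database))))
    level = {((i,),): count(((i,),)) for i in items}
    level = {p: c for p, c in level.items() if c >= min_count}
    frequent_items = [p[0][0] for p in level]
    result = dict(level)
    while level:
        candidates = set()
        for p in level:
            for item in frequent_items:
                candidates.add(p + ((item,),))        # item as a new element
                if item > p[-1][-1]:                  # item joins last element
                    candidates.add(p[:-1] + (p[-1] + (item,),))
        # Apriori pruning: every one-item-shorter subsequence must be frequent.
        candidates = {c for c in candidates
                      if all(sub in level for sub in drop_one_item(c))}
        # One database scan per level to count the surviving candidates.
        level = {c: count(c) for c in candidates}
        level = {p: c for p, c in level.items() if c >= min_count}
        result.update(level)
    return result


database = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]
frequent = mine(database, min_count=2)
for pattern, c in sorted(frequent.items(), key=lambda kv: (sum(map(len, kv[0])), kv[0])):
    print(pattern, c)
```

Run on the customer sequences of the previous example with a minimum count of 2 (support above 25%), the sketch returns the nine patterns listed there.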
  13. The Algorithm. [Algorithm figure not reproduced in the transcript.] Does this remind you of Apriori?
  14. Quantitative AR.

      Transaction ID   Age   Married   NumCars
      1                23    No        1
      2                25    Yes       1
      3                29    No        0
      4                34    Yes       2
      5                38    Yes       2

      <Age: 30..39> and <Married: Yes> ⇒ <NumCars: 2>, Support = 40%, Conf = 100%. How can we deal with these data?
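The support and confidence quoted for this rule can be reproduced from the five records. A minimal sketch follows; the field names and the `rule_stats` helper are chosen here for illustration.

```python
records = [
    {"age": 23, "married": "No", "num_cars": 1},
    {"age": 25, "married": "Yes", "num_cars": 1},
    {"age": 29, "married": "No", "num_cars": 0},
    {"age": 34, "married": "Yes", "num_cars": 2},
    {"age": 38, "married": "Yes", "num_cars": 2},
]


def rule_stats(records, antecedent, consequent):
    """Support and confidence of antecedent => consequent.

    Both arguments are predicates that take one record.
    """
    n_antecedent = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(
    records,
    antecedent=lambda r: 30 <= r["age"] <= 39 and r["married"] == "Yes",
    consequent=lambda r: r["num_cars"] == 2,
)
print(support, confidence)  # 0.4 1.0
```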
  15. Map to Boolean Values.

      Record ID   Age [20..29]   Age [30..39]   Married: Yes   Married: No   NumCars: 0   NumCars: 1
      100         1              0              0              1             0            1
      200         1              0              1              0             0            1
      300         1              0              0              1             1            0
      400         0              1              1              0             0            0
      500         0              1              1              0             0            0

      Now, use any system for mining Boolean AR: Apriori, FP-growth.
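The mapping itself is mechanical, as the following sketch suggests. It assumes the records are those of the Quantitative AR table, and it emits a `NumCars:2` item even though the table above only shows columns for 0 and 1. The item labels and the `to_items` helper are hypothetical.

```python
def to_items(age, married, num_cars):
    """Return the set of Boolean items that are true for one record."""
    items = set()
    if 20 <= age <= 29:
        items.add("Age:20..29")
    elif 30 <= age <= 39:
        items.add("Age:30..39")
    items.add(f"Married:{married}")
    items.add(f"NumCars:{num_cars}")
    return items


# (Age, Married, NumCars) rows, assumed to match the quantitative table.
records = [(23, "No", 1), (25, "Yes", 1), (29, "No", 0), (34, "Yes", 2), (38, "Yes", 2)]
transactions = [to_items(*r) for r in records]
for t in transactions:
    print(sorted(t))
```

Each resulting set is an ordinary transaction, so it can be passed to any Boolean association rule miner.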
  16. Problems with this Approach. MinSup: if the number of intervals is large, the support of a single interval can be low. MinConf: information is lost when partitioning values into intervals, so confidence can be lower when the number of intervals is smaller. Example: in the partition used, <NumCars: 0> ⇒ <Married: No> has c = 100%. But now assume that NumCars: 0 and NumCars: 1 go to the same interval. Then <NumCars: 0,1> ⇒ <Married: No> has c = 66.67%.
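The drop in confidence can be verified directly on the five records of the Quantitative AR table. A small sketch follows; the `confidence` helper is named here for illustration.

```python
# (Age, Married, NumCars) rows from the quantitative table.
records = [(23, "No", 1), (25, "Yes", 1), (29, "No", 0), (34, "Yes", 2), (38, "Yes", 2)]


def confidence(records, car_values, married):
    """Confidence of <NumCars in car_values> => <Married: married>."""
    matching = [r for r in records if r[2] in car_values]
    hits = [r for r in matching if r[1] == married]
    return len(hits) / len(matching)


separate = confidence(records, {0}, "No")    # NumCars: 0 as its own interval
merged = confidence(records, {0, 1}, "No")   # NumCars: 0 and 1 merged
print(f"{separate:.2%} {merged:.2%}")        # 100.00% 66.67%
```

Merging the two values adds record 2 (married, one car) to the antecedent, so only two of the three matching records satisfy the consequent.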
  17. Problems with this Approach. How can we solve this problem? Increase the number of intervals (to reduce information loss) while combining adjacent ones (to increase support). ExecTime: blows up as the number of items per record increases. ManyRules: the number of rules also blows up, and many of them will not be interesting.
  18. Second Approach. Other solutions? Well, the problem was that the intervals were not the best ones. Let’s try to create the best intervals for our data. How? Discretizing/clustering techniques: apply a discretizing/clustering technique to find the best partitions, then employ those partitions. We’ll see how clustering techniques work in the next class. So keep this in mind and put the pieces together next class!
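As a rough illustration of data-driven intervals, the sketch below splits a numeric attribute at its largest gaps. This is only one simple heuristic, chosen here as an assumption. The slide does not say which discretization or clustering technique the lecture has in mind, and `gap_partition` is a hypothetical name.

```python
def gap_partition(values, k):
    """Split the sorted values into k groups at the k-1 largest gaps."""
    xs = sorted(values)
    # Candidate cut positions, ordered by the gap to the previous value.
    by_gap = sorted(range(1, len(xs)), key=lambda i: xs[i] - xs[i - 1], reverse=True)
    cuts = sorted(by_gap[:k - 1])
    groups, start = [], 0
    for cut in cuts:
        groups.append(xs[start:cut])
        start = cut
    groups.append(xs[start:])
    return groups


ages = [23, 25, 29, 34, 38]
print(gap_partition(ages, 2))  # [[23, 25, 29], [34, 38]]
```

On the Age column of the Quantitative AR table, the largest gap lies between 29 and 34, so the two groups coincide with the intervals 20..29 and 30..39 used earlier.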
  19. Third Approach. And what if we do not map the input to a Boolean space? Create interval-based association rules directly: decide the best interval and then count the support. Usually, these approaches do not provide all the association rules, but only the ones with larger support and confidence. Fuzzy logic can also be applied here. But again, we’ll see GFS in two or three lectures.
  20. Mining Class Association Rules. So far, we have seen ARM without any specific target: it finds all possible rules that exist in the data, i.e., any item can appear as a consequent or a condition of a rule. However, what if we are interested in some specific targets? E.g., the user has a set of text documents from some known topics and wants to find out which words are associated or correlated with each topic. So now we want to find X ⇒ y, where X ⊆ I and y ∈ Y. The algorithms are very similar to those of ARM. We are not going to see them in class, but you have information on the estudy.
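To make the form X ⇒ y concrete, here is a brute-force sketch on a toy document collection. The documents, topics and thresholds are invented for illustration. The sketch enumerates small word sets exhaustively rather than using an Apriori-style search, so it shows what is being mined and is not the algorithm the lecture points to.

```python
from itertools import combinations

# Hypothetical toy data: (set of words in the document, topic label).
docs = [
    ({"goal", "match", "team"}, "sports"),
    ({"team", "coach", "match"}, "sports"),
    ({"vote", "party", "team"}, "politics"),
    ({"vote", "election", "party"}, "politics"),
]


def class_rules(docs, min_sup, min_conf, max_len=2):
    """Return rules (X, y, support, confidence) whose consequent is a class."""
    n = len(docs)
    words = sorted(set().union(*(w for w, _ in docs)))
    labels = sorted({y for _, y in docs})
    rules = []
    for size in range(1, max_len + 1):
        for X in combinations(words, size):
            # Labels of the documents that contain every word in X.
            covered = [y for w, y in docs if set(X) <= w]
            if not covered:
                continue
            for y in labels:
                both = covered.count(y)
                sup, conf = both / n, both / len(covered)
                if sup >= min_sup and conf >= min_conf:
                    rules.append((X, y, sup, conf))
    return rules


rules = class_rules(docs, min_sup=0.5, min_conf=1.0)
for X, y, sup, conf in rules:
    print(X, "=>", y, sup, conf)
```

The word "team" appears in both topics, so no rule with "team" alone as the condition reaches 100% confidence, while "match" and "vote" each point to a single topic.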
  21. Beyond Support and Confidence. Support and confidence are the basic measures of interestingness, but many more have been proposed during the last few years.
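The slide does not name any of these measures. As one widely used example, lift compares the confidence of X ⇒ Y with the baseline frequency of Y. The sketch below assumes the three shopping baskets from the recap slide, and the `measures` helper is a hypothetical name.

```python
baskets = [
    {"milk", "eggs", "sugar", "bread"},
    {"milk", "eggs", "cereal", "bread"},
    {"eggs", "sugar"},
]


def measures(baskets, x, y):
    """Confidence and lift of the rule x => y over a list of baskets."""
    n = len(baskets)
    n_x = sum(1 for b in baskets if x in b)
    n_y = sum(1 for b in baskets if y in b)
    n_xy = sum(1 for b in baskets if x in b and y in b)
    confidence = n_xy / n_x
    # lift = P(x, y) / (P(x) * P(y)), written with counts.
    lift = (n_xy * n) / (n_x * n_y)
    return confidence, lift


print(measures(baskets, "milk", "eggs"))   # (1.0, 1.0)
print(measures(baskets, "milk", "bread"))  # (1.0, 1.5)
```

Both rules have 100% confidence, but every basket contains eggs, so milk ⇒ eggs has a lift of 1 and carries no extra information, whereas milk ⇒ bread has a lift of 1.5.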
  22. Some Applications. Wal-Mart has used the technique for years to mine POS data and arrange their stores to maximize sales based on such analysis. Medical databases, to discover commonly occurring diseases amongst groups of people. Lottery results databases, to discover those lucky combinations of numbers.
  23. Some Applications. Power System Restoration (PSR): PSR is a multi-objective, multi-period, nonlinear, mixed-integer optimization problem with various constraints and unforeseeable factors. Discovering associations that help build heuristics for PSR. Actions in a PSR: start_black_start_unit(x), energize_line(x), pick_up_load(x), synchronize(x,y), connect_tie_line(x), crank_unit(x), energize_busbar(x).
  24. Some Applications. Correlations with color, spatial relationships, etc. From coarse- to fine-resolution mining.
  25. Next Class. Clustering.
