Association Rules: Advanced Topics
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Lecture outline                                 2

   Frequent patterns without candidate generation
   Multilevel association rules
   Correlation rules
   Sequential rules

                    Prof. Pier Luca Lanzi
Is Apriori Fast Enough?                              4
Performance Bottlenecks

 The core of the Apriori algorithm
     Use frequent (k–1)-itemsets to generate
     candidate frequent k-itemsets
     Use database scan and pattern matching
     to collect counts for the candidate itemsets

 The bottleneck of Apriori: candidate generation
     Huge candidate sets:
       • 10^4 frequent 1-itemsets will generate
         more than 10^7 candidate 2-itemsets
       • To discover a frequent pattern of size 100, e.g.,
         {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates.
     Multiple scans of database:
       • Needs (n + 1) scans, where n is the length of the longest pattern
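
The two candidate counts quoted above can be checked with a little arithmetic. The sketch below is only an illustration; the variable names are not taken from the slides.

```python
import math

n_items = 10**4                       # frequent 1-itemsets, as on the slide
pairs = math.comb(n_items, 2)         # candidate 2-itemsets built from them
print(f"candidate 2-itemsets: {pairs:,}")

pattern_size = 100
subsets = 2**pattern_size - 1         # non-empty subsets of {a1, ..., a100}
print(f"sub-patterns of a size-100 pattern: {subsets:.3e}")
```

Both numbers grow with the number of frequent items, not with the size of the database, which is why candidate generation is the bottleneck.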

Mining Frequent Patterns                      5
Without Candidate Generation

 Compress a large database into a compact,
  Frequent-Pattern tree (FP-tree) structure
     Highly condensed, but complete
     for frequent pattern mining
     Avoid costly database scans

 Develop an efficient, FP-tree-based
  frequent pattern mining method

 A divide-and-conquer methodology:
  decompose mining tasks into smaller ones

 Avoid candidate generation: sub-database test only

The FP-growth Algorithm                          6

 Leave the generate-and-test paradigm of Apriori
 Data sets are encoded using a compact structure, the FP-tree
 Frequent itemsets are extracted directly from the FP-tree

 Major Steps to mine FP-tree
    Construct conditional pattern base
    for each node in the FP-tree
    Construct conditional FP-tree
    from each conditional pattern-base
    Recursively mine conditional FP-trees and
    grow frequent patterns obtained so far
    If the conditional FP-tree contains a single path,
    simply enumerate all the patterns

Building the FP-tree                                                  7

TID    Items
1      {A,B}
2      {B,C,D}
3      {A,C,D,E}
4      {A,D,E}
5      {A,B,C}
6      {A,B,C,D}
7      {B,C}
8      {A,B,C}
9      {A,B,D}
10     {B,C,E}

After reading TID=1:
    null
      A:1
        B:1

After reading TID=2:
    null
      A:1
        B:1
      B:1
        C:1
          D:1

 Items are sorted in decreasing
  support counts

Building the FP-tree                                                   8

After reading TID=3:
    null
      A:2
        B:1
        C:1
          D:1
            E:1
      B:1
        C:1
          D:1

Building the FP-tree                                         9

 Pointers are used to assist
  frequent itemset generation
 The FP-tree is typically smaller
  than the data

FP-tree after reading all ten transactions:
    null
      A:7
        B:5
          C:3
            D:1
          D:1
        C:1
          D:1
            E:1
        D:1
          E:1
      B:3
        C:3
          D:1
          E:1

Header table (Item, Pointer): one entry per item, pointing to the chain
of nodes that carry that item in the tree
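
A minimal sketch of the construction shown on these slides, assuming transactions are inserted in the fixed item order A, B, C, D, E that the drawn trees use. `FPNode`, `build_fp_tree` and `show` are illustrative names, and the header table is kept as a plain list of nodes per item instead of chained pointers.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, order):
    """Insert each transaction as a path; shared prefixes share nodes."""
    root = FPNode(None, None)
    header = defaultdict(list)            # item -> all nodes holding that item
    rank = {item: i for i, item in enumerate(order)}
    for items in transactions:
        node = root
        for item in sorted(items, key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:             # first time this prefix is seen
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

def show(node, depth=0):
    """Print the tree with one node per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

transactions = ["AB", "BCD", "ACDE", "ADE", "ABC", "ABCD", "BC", "ABC", "ABD", "BCE"]
root, header = build_fp_tree(transactions, order="ABCDE")
show(root)
# Support of each item = sum of the counts along its header-table chain.
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})
```

Printing the tree should reproduce the counts drawn on the slide, with A:7 and B:3 directly below the root.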

Mining the FP-Tree                                        10

 Start from the bottom, from E                                 minsup=2
 Compute the support count, by adding the
  counts associated with E (1 + 1 + 1 = 3)
 Since E is frequent, we compute
  the frequent itemsets ending in E from the FP-tree

[Figure: the complete FP-tree and its header table, as built on the previous slide]

Mining the FP-tree                                       11

 Consider the paths leading to E,                             minsup=2
  which is frequent
 Extract the conditional pattern base for E

[Figure: the complete FP-tree and its header table, with the paths that end in E]
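
A sketch of what the conditional pattern base for E contains. To stay short it reads the prefix paths straight from the ordered transactions instead of following node-links in a tree; for this data the two give the same paths. The function name and the item order `ORDER` are assumptions made for illustration.

```python
from collections import Counter

transactions = ["AB", "BCD", "ACDE", "ADE", "ABC", "ABCD", "BC", "ABC", "ABD", "BCE"]
ORDER = "ABCDE"   # item order used by the trees drawn on these slides

def conditional_pattern_base(transactions, item):
    """Prefix paths that lead to `item`, each with the count it carries."""
    base = Counter()
    for t in transactions:
        if item in t:
            path = sorted(t, key=ORDER.index)
            prefix = tuple(path[:path.index(item)])
            if prefix:
                base[prefix] += 1
    return base

base_e = conditional_pattern_base(transactions, "E")
print(base_e)

# Item counts inside the pattern base decide what survives in the conditional FP-tree.
item_counts = Counter()
for prefix, count in base_e.items():
    for i in prefix:
        item_counts[i] += count
print(item_counts)
```

With minsup=2, B occurs only once in the pattern base, so it would be dropped when the conditional FP-tree for E is built.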

Mining the FP-tree                                       12

 From the conditional pattern base                            minsup=2
  build the conditional FP-tree
 Delete infrequent items

Prefix paths ending in E:            Conditional FP-tree for E:
    null                                 null
      A:2                                  A:2
        C:1                                  C:1
          D:1                                  D:1
        D:1                                  D:1
      B:1                                  C:1
        C:1

Mining the FP-tree                                              13

 Use the conditional FP-tree to extract                              minsup=2
  the frequent itemsets ending with DE,
  CE, and AE

Conditional FP-tree for E:           Prefix paths ending in DE:
    null                                 null
      A:2                                  A:2
        C:1                                  C:1
          D:1                                  D:1
        D:1                                  D:1
      C:1

Mining the FP-tree                                                     14

   Consider the suffix DE, it is frequent                                  minsup=2
   Build the conditional FP-tree for DE
   The last tree contains only A which is frequent
   So far we have obtained three frequent
    itemsets, {E, DE, ADE}

Prefix paths ending in DE:           Conditional FP-tree for DE:
    null                                 null
      A:2                                  A:2
        C:1
          D:1                        (C is infrequent, so it is deleted)
        D:1

Mining the FP-tree                                                   15

   After DE, the suffix CE is considered
   CE is frequent and thus added to the frequent itemsets
   Then, we search for the itemsets ending with AE
   At the end the frequent itemsets ending with E are {E,
    DE, ADE, CE, AE}

[Figure: prefix paths ending in CE (left) and prefix paths ending in AE (right); the AE paths reduce to the single node A:2]

Mining Frequent Patterns Using FP-tree         16

 General idea (divide-and-conquer)
    Recursively grow frequent pattern path using the FP-tree
 Method
    For each item, construct its conditional pattern-base,
    and then its conditional FP-tree
    Repeat the process on each newly created
    conditional FP-tree
    Until the resulting FP-tree is empty, or it contains only
    one path (single path will generate all the combinations of
    its sub-paths, each of which is a frequent pattern)

Mining the FP-tree                                  26

 Start at the last item in the table
 Find all paths containing item
      Follow the node-links
 Identify conditional patterns
      Patterns in paths with required frequency
 Build conditional FP-tree C
 Append item to all paths in C, generating frequent patterns
 Mine C recursively (appending item)
 Remove item from table and tree
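
The recursion described above can be sketched compactly. This is a simplified illustration, not the full algorithm: each conditional FP-tree is kept in flattened form, as a list of weighted prefix paths, so no explicit tree or node-links are built. `pattern_growth` is a hypothetical name.

```python
from collections import Counter

def pattern_growth(paths, minsup, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset that extends `suffix`.

    `paths` is a conditional pattern base: (ordered item tuple, count) pairs.
    """
    counts = Counter()
    for items, count in paths:
        for item in items:
            counts[item] += count
    for item, support in counts.items():
        if support < minsup:
            continue                      # infrequent items are deleted
        pattern = suffix | {item}
        yield pattern, support
        # Conditional pattern base of `item`: what precedes it on each path.
        conditional = [(items[:items.index(item)], count)
                       for items, count in paths if item in items]
        yield from pattern_growth(conditional, minsup, pattern)

transactions = ["AB", "BCD", "ACDE", "ADE", "ABC", "ABCD", "BC", "ABC", "ABD", "BCE"]
paths = [(tuple(sorted(t)), 1) for t in transactions]
frequent = dict(pattern_growth(paths, minsup=2))

ending_in_e = sorted("".join(sorted(p)) for p in frequent if "E" in p)
print(ending_in_e)      # ['ADE', 'AE', 'CE', 'DE', 'E']
```

Because each recursive call only looks at the items in front of the current one, every itemset is produced exactly once, and no candidate is ever generated and then tested against the database.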

FP-growth vs. Apriori:                                                     27
Scalability With the Support Threshold


[Figure: run time (sec.) versus support threshold (%), comparing FP-growth and Apriori on data set D1]

Benefits of the FP-tree Structure            28

 Completeness
    Preserve complete information
    for frequent pattern mining
    Never break a long pattern of any transaction
 Compactness
    Reduce irrelevant info—infrequent items are gone
    Items in frequency descending order: the more frequently
    occurring, the more likely to be shared
     Never larger than the original database (not counting
     node-links and the count fields)
    For Connect-4 DB, compression ratio could be over 100

Why Is FP-Growth the Winner?                   30

 Divide-and-conquer:
     decompose both the mining task and DB
     according to the frequent patterns obtained so far
     leads to focused search of smaller databases
 Other factors
     no candidate generation, no candidate test
     compressed database: FP-tree structure
     no repeated scan of entire database
     basic ops—counting local freq items and building
     sub FP-tree, no pattern search and matching

Multiple-Level Association Rules                                       32

 Items often form hierarchy
 Items at the lower level are expected to
  have lower support
 Rules regarding itemsets at appropriate
  levels could be quite useful
 Transaction database can be encoded
  based on dimensions and levels
 We can explore shared multi-level mining


[Figure: concept hierarchy. milk → skim, 2%; bread → wheat, white; the brands Fraser and Sunset appear at the lowest level]

Concept Hierarchy                                                           33



[Figure: concept hierarchy with Milk (Skim, 2%; brands Foremost, Kemps), Wheat and White, Computers (Desktop, Laptop, Accessory with Printer and Scanner), and Home (TV, DVD)]

Why incorporate concept hierarchy?                          34

 Rules at lower levels may not have enough support
  to appear in any frequent itemsets

 Rules at lower levels of the hierarchy are overly specific,

                 {2% milk} → {wheat bread}

  is indicative of association between milk and bread

Multi-level Association Rules                  35

 How do support and confidence vary as we traverse the
  concept hierarchy?

     If X is the parent item for both X1 and X2, then
       σ(X) ≤ σ(X1) + σ(X2)

     If       σ(X1 ∪ Y1) ≥ minsup,
     and      X is parent of X1, Y is parent of Y1
     then     σ(X ∪ Y1) ≥ minsup, σ(X1 ∪ Y) ≥ minsup,
              σ(X ∪ Y) ≥ minsup

     If       conf(X1 → Y1) ≥ minconf,
     then     conf(X1 → Y) ≥ minconf

Multi-level Association Rules                      36

 Approach 1:
     Extend current association rule formulation by augmenting each
     transaction with higher level items
     Original Transaction: {skim milk, wheat bread}
     Augmented Transaction:
     {skim milk, wheat bread, milk, bread, food}

 Issues:
     Items that reside at higher levels have
     much higher support counts
     If support threshold is low, too many frequent patterns
     involving items from the higher levels
     Increased dimensionality of the data
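
Approach 1 amounts to a lookup in the concept hierarchy. The sketch below assumes a hypothetical parent map `PARENT` covering only the items of the slide's example; any standard frequent-itemset miner could then be run on the augmented transactions.

```python
# Hypothetical parent map for the slide's example; "food" is the root.
PARENT = {"skim milk": "milk", "2% milk": "milk",
          "wheat bread": "bread", "white bread": "bread",
          "milk": "food", "bread": "food"}

def ancestors(item):
    """All higher-level items above `item`, nearest first."""
    found = []
    while item in PARENT:
        item = PARENT[item]
        found.append(item)
    return found

def augment(transaction):
    """Original items followed by every ancestor, without duplicates."""
    result = list(transaction)
    for item in transaction:
        for anc in ancestors(item):
            if anc not in result:
                result.append(anc)
    return result

augmented = augment(["skim milk", "wheat bread"])
print(augmented)
```

The augmented transaction has five items instead of two, which is the increase in dimensionality listed among the issues.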

Multi-level Association Rules                  37

 Approach 2:
    Generate frequent patterns at highest level first
    Then, generate frequent patterns
    at the next highest level, and so on

 Issues:
     I/O requirements will increase dramatically because
     we need to perform more passes over the data
     May miss some potentially interesting cross-level
     association patterns

Mining Multi-Level Associations                  38

 A top down, progressive deepening approach

 First find high-level strong rules:

                {milk} → {bread} [20%, 60%]

 Then find their lower-level “weaker” rules:

           {2% milk} → {wheat bread} [6%, 50%]

Multi-level Association:                         41
Redundancy Filtering

 Some rules may be redundant due to “ancestor”
  relationships between items

 Example
     milk → wheat bread
     [support = 8%, confidence = 70%]
     2% milk → wheat bread
     [support = 2%, confidence = 72%]

 We say the first rule is an ancestor of the second rule.

 A rule is redundant if its support is close to the “expected”
  value, based on the rule’s ancestor.

Multi-level mining with uniform support               42

Level 1                                       Milk
min_sup = 5%                             [support = 10%]

Level 2                    2% Milk                         Skim Milk
min_sup = 5%        [support = 6%]                   [support = 4%]

Multi-level mining with reduced support               43

Level 1                                       Milk
min_sup = 5%                             [support = 10%]

Level 2                    2% Milk                         Skim Milk
min_sup = 3%        [support = 6%]                   [support = 4%]
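
The difference between the two settings can be replayed with the supports shown on these two slides. The dictionaries and the function name below are illustrative only.

```python
# Supports from the two slides above (fractions of all transactions).
SUPPORT = {"milk": 0.10, "2% milk": 0.06, "skim milk": 0.04}
LEVEL = {"milk": 1, "2% milk": 2, "skim milk": 2}

def frequent_items(min_sup_by_level):
    """Items whose support reaches the threshold of their own level."""
    return sorted(item for item, s in SUPPORT.items()
                  if s >= min_sup_by_level[LEVEL[item]])

uniform = frequent_items({1: 0.05, 2: 0.05})
reduced = frequent_items({1: 0.05, 2: 0.03})
print(uniform)   # ['2% milk', 'milk']
print(reduced)   # ['2% milk', 'milk', 'skim milk']
```

With a uniform 5% threshold skim milk (4%) is lost; lowering the level-2 threshold to 3% keeps it.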

Multi-level Association:                                44
Uniform Support vs. Reduced Support

 Uniform Support: the same minimum support for all levels
     + One minimum support threshold. No need to examine
     itemsets containing any item whose ancestors do not
     have minimum support.
     – Lower level items do not occur as frequently. If the support threshold is
       • too high ⇒ miss low level associations
       • too low ⇒ generate too many high level associations
 Reduced Support: reduced minimum support at lower levels
    There are 4 search strategies:
       •   Level-by-level independent
       •   Level-cross filtering by k-itemset
       •   Level-cross filtering by single item
       •   Controlled level-cross filtering by single item

Multi-Level Mining:                            46
Progressive Deepening

 A top-down, progressive deepening approach

 First mine high-level frequent items:
  milk (15%), bread (10%)

 Then mine their lower-level “weaker” frequent itemsets:
  2% milk (5%), wheat bread (4%)

 Different min support threshold across
  multi-levels lead to different algorithms:
      If adopting the same min_support across multi-levels
      then toss t if any of t’s ancestors is infrequent.
      If adopting reduced min_support at lower levels
      then examine only those descendants whose ancestor’s
      support is frequent/non-negligible.

Correlation Rules                                  48

 Suppose we perform association rule mining on the following
  data (t = tea, c = coffee)

                     c          Not c        row
             t       20         5            25
             Not t   70         5            75
             col     90         10           100

  {tea} → {coffee} [20%, 80%]

 But 90% of the customers buy coffee anyway!
 A customer who buys tea is less likely (10% less)
  to buy coffee than a customer about whom we have
  no information

Correlation Rules                                  49

 One measure is to calculate correlation
 If two random variables A and B are independent,

              P(A ∧ B) / (P(A) P(B)) = 1

 For tea and coffee,

              P(t ∧ c) / (P(t) P(c)) = 0.20 / (0.25 × 0.90) = 0.89

  this means that the presence of tea is negatively correlated
  with the presence of coffee
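
The support, the confidence and the ratio above follow directly from the contingency table. A short sketch, with variable names chosen for illustration:

```python
# Contingency counts from the tea/coffee table (100 baskets in total).
n = 100
tea_and_coffee = 20
tea = 25
coffee = 90

support = tea_and_coffee / n                    # P(t and c)
confidence = tea_and_coffee / tea               # P(c | t)
ratio = support / ((tea / n) * (coffee / n))    # P(t and c) / (P(t) P(c))

print(f"support={support:.2f} confidence={confidence:.2f} ratio={ratio:.3f}")
```

A ratio below 1 means tea and coffee occur together less often than independence would predict, even though the rule's confidence is 80%.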

Correlation Rules                                   51

 Use the χ² statistical test for testing independence
 Let n be the number of baskets
 Let k be the number of items considered
 Let r be a rule consisting of item values (i_j or its negation ¬i_j)
 Let O(r) represent the number of baskets containing rule r
  (i.e. frequency)
 Let E[i_j] = O(i_j) for a single item (for its negation, E[¬i_j] = n − E[i_j])
 E[r] = n × E[r_1]/n × … × E[r_k]/n

Correlation Rules                                      52

 The χ² statistic is defined as

                 χ² = Σ_{r ∈ R} (O(r) − E[r])² / E[r]

 Look up the significance value in a statistical textbook
     There are k−1 degrees of freedom
     If the test fails, we cannot reject independence; otherwise the
     contingency table represents dependence.

Example                                                 53

 Back to tea and coffee

                            c          Not c      row
                  t         20         5          25
                  Not t     70         5          75
                  col       90         10         100

   E[t] = 25, E[Not t] = 75, E[c] = 90, E[Not c] = 10
   E[tc] = 100 × 25/100 × 90/100 = 22.5
   O(tc) = 20
   Contribution to χ² = (20 − 22.5)² / 22.5 = 0.278
   Calculate for the rest to get χ² = 0.278 + 2.500 + 0.093 + 0.833 = 3.704
   Not significant at the 95% level (critical value 3.84 for k = 2)
   Cannot reject the independence assumption
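
The whole test can be replayed from the observed counts. In this sketch the expected count of each cell is row total × column total / n, which matches E[r] as defined earlier; summing all four cells this way gives about 3.70, still below the 3.84 threshold. Variable names are illustrative.

```python
# Observed counts: rows are tea / no tea, columns are coffee / no coffee.
observed = [[20, 5],
            [70, 5]]

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n     # expected count under independence
        chi2 += (o - e) ** 2 / e

CRITICAL_95 = 3.84    # threshold quoted on the slide
print(f"chi2 = {chi2:.3f}, reject independence: {chi2 > CRITICAL_95}")
```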

Interest                                       55

 If the χ² test shows significance, then we want to find the most
  interesting cell(s) in the table
 I(r) = O(r)/E[r]
      Look for values far away from 1
      I(t c) = 20/22.5 = 0.89
      I(t not c) = 5/2.5 = 2
      I(not t c) = 70/67.5 = 1.04
      I(not t not c) = 5/7.5 = 0.67
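
The interest of each cell, and the cell farthest from 1, can be computed in a few lines. This is an illustrative sketch using the tea/coffee counts; the dictionary layout is an assumption.

```python
observed = {("t", "c"): 20, ("t", "not c"): 5,
            ("not t", "c"): 70, ("not t", "not c"): 5}
marginal = {"t": 25, "not t": 75, "c": 90, "not c": 10}
n = sum(observed.values())

interest = {}
for (a, b), o in observed.items():
    expected = marginal[a] * marginal[b] / n      # E[r] under independence
    interest[(a, b)] = o / expected

for cell, value in interest.items():
    print(cell, round(value, 2))

# The most interesting cell is the one whose interest is farthest from 1.
most_interesting = max(interest, key=lambda cell: abs(interest[cell] - 1))
print(most_interesting)
```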

Properties of the χ² statistic                   56

 Upward closed
    If a k-itemset is correlated, so are all its supersets
    Look for minimal correlated itemsets,
    no subset is correlated
 Can combine Apriori and the χ² statistic efficiently
    No generate and prune as in support-confidence

Sequential pattern mining                      58

 Association rule mining does not consider
  the order of transactions

 In many applications such orderings are significant

 In market basket analysis, it is interesting to know whether
  people buy some items in sequence

 Example: buying bed first and then bed sheets later

 In Web usage mining,
     it is useful to find navigational patterns of users
     in a Web site from sequences of page visits of users

Examples of Sequence                          59

 Web sequence
    < {Homepage} {Electronics} {Digital Cameras}
    {Canon Digital Camera} {Shopping Cart} {Order
    Confirmation} {Return to Shopping} >

 Sequence of initiating events causing
  the nuclear accident at Three Mile Island:
  <{clogged resin} {outlet valve closure} {loss of feedwater}
  {condenser polisher outlet valve shut} {booster pumps trip}
  {main waterpump trips} {main turbine trips} {reactor
  pressure increases}>

 Sequence of books checked out at a library:
    <{Fellowship of the Ring} {The Two Towers}
    {Return of the King}>

Basic concepts                                  60

 Let I = {i1, i2, …, im} be a set of items.

 A sequence is an ordered list of itemsets.

 We denote a sequence s by ⟨a1 a2 … ar⟩,
  where ai is an itemset, also called an element of s.

 An element (or an itemset) of a sequence is denoted by
  {x1, x2, …, xk}, where xj ∈ I is an item.

 We assume without loss of generality that items in an
  element of a sequence are in lexicographic order

Basic concepts                                        61

 Size
     The size of a sequence is the number of elements
     (or itemsets) in the sequence

 Length
     The length of a sequence is the
     number of items in the sequence
     A sequence of length k is called k-sequence

 Given two sequences, s1 = a1a2 … ar and s2 = b1b2 … bv

 s1 is a subsequence of s2 if there exist integers
  1 ≤ j1 < j2 < … < jr−1 < jr ≤ v such that
  a1 ⊆ bj1, a2 ⊆ bj2, …, ar ⊆ bjr.
  We also say that s2 is a supersequence of s1, or that s2 contains s1.

Example                                                  62

 Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9}.

 The sequence {3}{4, 5}{8} is contained in
  (or is a subsequence of) {6} {3, 7}{9}{4, 5, 8}{3, 8}

 In fact, {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8}, and {8} ⊆ {3, 8}

 However, {3}{8} is not contained in {3, 8} or vice versa

 The size of the sequence {3}{4, 5}{8} is 3,
  and the length of the sequence is 4.
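
The containment test, size and length can be written directly from the definitions. A minimal sketch, assuming a sequence is represented as a Python list of sets; the function names are illustrative.

```python
def is_subsequence(s1, s2):
    """True if each element of s1 is a subset of a distinct, later element of s2."""
    pos = 0
    for element in s1:
        # Take the earliest element of s2 that can host this element of s1.
        while pos < len(s2) and not element <= s2[pos]:
            pos += 1
        if pos == len(s2):
            return False
        pos += 1          # the next element of s1 must match strictly later
    return True

def size(seq):
    return len(seq)                       # number of elements (itemsets)

def length(seq):
    return sum(len(e) for e in seq)       # number of items

s = [{3}, {4, 5}, {8}]
t = [{6}, {3, 7}, {9}, {4, 5, 8}, {3, 8}]
print(is_subsequence(s, t))                       # True
print(is_subsequence([{3}, {8}], [{3, 8}]))       # False
print(is_subsequence([{3, 8}], [{3}, {8}]))       # False
print(size(s), length(s))                         # 3 4
```

Matching each element as early as possible is enough here because there are no timing constraints between elements.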

Objective of sequential pattern mining           63

 The input is a set S of input data sequences
  (or sequence database)

 The problem of mining sequential patterns is to find all the
  sequences that have a user-specified minimum support

 Each such sequence is called a frequent sequence,
  or a sequential pattern

 The support for a sequence is the fraction of total data
  sequences in S that contains this sequence

Terminology                                   64

 Itemset
     Non-empty set of items
     Each itemset is mapped to an integer

 Sequence
    Ordered list of itemsets

 Support for a Sequence
    Fraction of total customers that support a sequence.

 Maximal Sequence
    A sequence that is not contained in any other sequence

 Large Sequence
     Sequence that meets minsup
Example                                                                65

Customer ID   Transaction Time   Items Bought
   1          June 25 '93        30
   1          June 30 '93        90
   2          June 10 '93        10, 20
   2          June 15 '93        30
   2          June 20 '93        40, 60, 70
   3          June 25 '93        30, 50, 70
   4          June 25 '93        30
   4          June 30 '93        40, 70
   4          July 25 '93        90
   5          June 12 '93        90

Customer ID   Customer Sequence
   1          < (30) (90) >
   2          < (10 20) (30) (40 60 70) >
   3          < (30 50 70) >
   4          < (30) (40 70) (90) >
   5          < (90) >

Maximal sequences with support ≥ 40%:
   < (30) (90) >
   < (30) (40 70) >

  Note: with a minsup of 40%, at least two customers must support the sequence
  < (10 20) (30) > does not have enough support (only Customer #2)
  < (30) >, < (70) >, < (30) (40) > … are not maximal.
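
The step from the transaction table to customer sequences, and the support figures quoted in the note, can be reproduced as follows. This is an illustrative sketch; grouping by customer puts the three items bought by customer 3 into a single element, because they share one transaction.

```python
from collections import defaultdict
from datetime import date

# (customer id, transaction date, items bought), copied from the table above.
rows = [
    (1, date(1993, 6, 25), {30}), (1, date(1993, 6, 30), {90}),
    (2, date(1993, 6, 10), {10, 20}), (2, date(1993, 6, 15), {30}),
    (2, date(1993, 6, 20), {40, 60, 70}),
    (3, date(1993, 6, 25), {30, 50, 70}),
    (4, date(1993, 6, 25), {30}), (4, date(1993, 6, 30), {40, 70}),
    (4, date(1993, 7, 25), {90}),
    (5, date(1993, 6, 12), {90}),
]

# One sequence per customer: that customer's transactions in time order.
by_customer = defaultdict(list)
for customer, when, items in sorted(rows, key=lambda r: (r[0], r[1])):
    by_customer[customer].append(items)
database = list(by_customer.values())

def contains(data, pattern):
    """Greedy subsequence test: match each pattern element as early as possible."""
    pos = 0
    for element in pattern:
        while pos < len(data) and not element <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1
    return True

def support(pattern):
    """Fraction of customers whose sequence contains the pattern."""
    return sum(contains(seq, pattern) for seq in database) / len(database)

print(support([{30}, {90}]))          # 0.4
print(support([{30}, {40, 70}]))      # 0.4
print(support([{10, 20}, {30}]))      # 0.2
```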

Methods for sequential pattern mining    66

 Apriori-based Approaches
 Pattern-Growth-based Approaches

The Generalized Sequential Pattern (GSP)                  70
Mining Algorithm

 The Apriori property for sequential patterns
      If a sequence S is not frequent, then
      none of the super-sequences of S is frequent
      For instance, if <hb> is infrequent
      then so are <hab> and <(ah)b>

 GSP (Generalized Sequential Pattern) mining algorithm
    Initially, every item in DB is a candidate of length-1
    for each level (i.e., sequences of length-k) do
       • scan database to collect support count
         for each candidate sequence
       • generate candidate length-(k+1) sequences
         from length-k frequent sequences using Apriori
      repeat until no frequent sequence or no candidate can be found

 Major strength: Candidate pruning by Apriori

The Generalized Sequential Pattern (GSP)                        71
Mining Algorithm
 Step 1:
     Make the first pass over the sequence database D
     to yield all the 1-element frequent sequences
 Step 2:
   Repeat until no new frequent sequences are found
      Candidate Generation:
        • Merge pairs of frequent subsequences found in the (k-1)th pass to generate
          candidate sequences that contain k items
      Candidate Pruning:
        • Prune candidate k-sequences that contain infrequent (k-1)-subsequences
      Support Counting:
        • Make a new pass over the sequence database D to find the support for these
          candidate sequences
      Candidate Elimination:
        • Eliminate candidate k-sequences whose actual support is less than minsup
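
The level-wise loop above can be sketched as follows. This is a simplified illustration and not GSP's exact procedure: instead of merging pairs of frequent (k-1)-sequences, each frequent sequence is extended by one frequent item, and pruning is limited to that restriction. The function names and the sample database (the five customer sequences of the earlier example, with customer 3 as a single transaction) are assumptions.

```python
def contains(data, pattern):
    """Greedy check that `pattern` is a subsequence of `data`."""
    pos = 0
    for element in pattern:
        while pos < len(data) and not set(element) <= set(data[pos]):
            pos += 1
        if pos == len(data):
            return False
        pos += 1
    return True

def extensions(pattern, items):
    """Grow a sequence by one item: as a new last element, or inside the last one."""
    for item in items:
        yield pattern + ((item,),)
        if item > pattern[-1][-1]:                # keep each element sorted
            yield pattern[:-1] + (pattern[-1] + (item,),)

def mine_sequences(database, min_count):
    items = sorted({i for seq in database for element in seq for i in element})
    candidates = [((i,),) for i in items]         # pass 1: every item is a candidate
    frequent, frequent_items = {}, None
    while candidates:
        # Support counting: one pass over the database per level.
        counts = {c: sum(contains(seq, c) for seq in database) for c in candidates}
        # Candidate elimination: keep only those that reach the threshold.
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        if frequent_items is None:
            frequent_items = [c[0][0] for c in level]
        # Candidate generation for the next level.
        candidates = [e for c in level for e in extensions(c, frequent_items)]
    return frequent

def maximal(frequent):
    """Frequent sequences that are not contained in another frequent sequence."""
    return [p for p in frequent
            if not any(q != p and contains(q, p) for q in frequent)]

database = [[{30}, {90}], [{10, 20}, {30}, {40, 60, 70}], [{30, 50, 70}],
            [{30}, {40, 70}, {90}], [{90}]]
frequent = mine_sequences(database, min_count=2)
print(maximal(frequent))
```

On this database the sketch should report < (30) (90) > and < (30) (40 70) > as the only maximal frequent sequences.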

Timing Constraints (I)                                               72

Pattern < {A B} {C} {D E} > with timing constraints:
   xg: max-gap       (time between consecutive elements <= xg)
   ng: min-gap       (time between consecutive elements > ng)
   ms: maximum span  (time between first and last element <= ms)

xg = 2, ng = 0, ms = 4

Data sequence                           Subsequence         Contain?
< {2,4} {3,5,6} {4,7} {4,5} {8} >       < {6} {5} >         Yes
< {1} {2} {3} {4} {5} >                 < {1} {4} >         No
< {1} {2,3} {3,4} {4,5} >               < {2} {3} {5} >     Yes
< {1,2} {3} {2,3} {3,4} {2,4}           < {1,2} {5} >       No
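
A sketch of containment under the three constraints, assuming the timestamp of each element is simply its position in the data sequence, which is how the table above appears to be evaluated. With gaps in play a greedy match is no longer safe, so the search backtracks over all admissible positions. `contains_timed` is a hypothetical name.

```python
def contains_timed(data, pattern, max_gap, min_gap, max_span):
    """Subsequence test with max-gap, min-gap and maximum-span constraints."""
    def search(k, prev, first):
        if k == len(pattern):
            return True
        start = 0 if prev is None else prev + 1
        for pos in range(start, len(data)):
            if prev is not None:
                gap = pos - prev
                if gap > max_gap or pos - first > max_span:
                    break                 # later positions only get worse
                if gap <= min_gap:
                    continue              # too close to the previous element
            if pattern[k] <= data[pos]:
                if search(k + 1, pos, pos if first is None else first):
                    return True
        return False
    return search(0, None, None)

limits = dict(max_gap=2, min_gap=0, max_span=4)
print(contains_timed([{2, 4}, {3, 5, 6}, {4, 7}, {4, 5}, {8}], [{6}, {5}], **limits))   # True
print(contains_timed([{1}, {2}, {3}, {4}, {5}], [{1}, {4}], **limits))                  # False
print(contains_timed([{1}, {2, 3}, {3, 4}, {4, 5}], [{2}, {3}, {5}], **limits))         # True
```

The second case fails because the only match for {4} lies three positions after {1}, which exceeds the max-gap of 2.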

Mining Sequential Patterns with               73
Timing Constraints

 Approach 1:
    Mine sequential patterns without timing constraints
    Postprocess the discovered patterns

 Approach 2:
    Modify GSP to directly prune candidates
    that violate timing constraints


Viewers also liked (7)

Cs583 association-rules
Cs583 association-rulesCs583 association-rules
Cs583 association-rules
3.1 mining frequent patterns with association rules-mca4
3.1 mining frequent patterns with association rules-mca43.1 mining frequent patterns with association rules-mca4
3.1 mining frequent patterns with association rules-mca4
Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with R

Similar to Lecture 05 Association Rules Advanced Topics

Monfort Emath Paper2_printed
Monfort Emath Paper2_printedMonfort Emath Paper2_printed
Monfort Emath Paper2_printedFelicia Shirui
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsAlbert Bifet
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodShani729
Presentasi Eclat
Presentasi EclatPresentasi Eclat
Presentasi Eclatbryanartika
Day 3 metric and customary assessment
Day 3 metric and customary assessmentDay 3 metric and customary assessment
Day 3 metric and customary assessmentErik Tjersland
Day 7 multiply binomials
Day 7 multiply binomialsDay 7 multiply binomials
Day 7 multiply binomialsErik Tjersland

Similar to Lecture 05 Association Rules Advanced Topics (10)

Monfort Emath Paper2_printed
Monfort Emath Paper2_printedMonfort Emath Paper2_printed
Monfort Emath Paper2_printed
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data Streams
Frequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth methodFrequent itemset mining using pattern growth method
Frequent itemset mining using pattern growth method
Presentasi Eclat
Presentasi EclatPresentasi Eclat
Presentasi Eclat
Day 3 metric and customary assessment
Day 3 metric and customary assessmentDay 3 metric and customary assessment
Day 3 metric and customary assessment
EOCT Practice
EOCT PracticeEOCT Practice
EOCT Practice
Chapter 09-Trees
Chapter 09-TreesChapter 09-Trees
Chapter 09-Trees
Day 7 multiply binomials
Day 7 multiply binomialsDay 7 multiply binomials
Day 7 multiply binomials

More from Pier Luca Lanzi

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i VideogiochiPier Luca Lanzi
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiPier Luca Lanzi
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomePier Luca Lanzi
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Pier Luca Lanzi
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...Pier Luca Lanzi
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaPier Luca Lanzi
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Pier Luca Lanzi
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationPier Luca Lanzi
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationPier Luca Lanzi
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningPier Luca Lanzi
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningPier Luca Lanzi
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesPier Luca Lanzi
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationPier Luca Lanzi
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringPier Luca Lanzi
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringPier Luca Lanzi
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringPier Luca Lanzi
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesPier Luca Lanzi
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsPier Luca Lanzi
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesPier Luca Lanzi

More from Pier Luca Lanzi (20)

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei Videogiochi
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning Welcome
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparation
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph mining
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text mining
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rules
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 Clustering
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensembles
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rules

Recently uploaded

Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Alison B. Lowndes
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...Product School
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesThousandEyes
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupCatarinaPereira64715

Recently uploaded (20)

Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group

Lecture 05 Association Rules Advanced Topics

  • 1. Association Rules: Advanced Topics. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
  • 2. Lecture outline: frequent patterns without candidate generation; multilevel association rules; correlation rules; sequential rules. Prof. Pier Luca Lanzi
  • 3. Is Apriori Fast Enough? Performance Bottlenecks. The core of the Apriori algorithm: use frequent (k–1)-itemsets to generate candidate frequent k-itemsets; use database scans and pattern matching to collect counts for the candidate itemsets. The bottleneck of Apriori is candidate generation. Huge candidate sets: $10^4$ frequent 1-itemsets will generate about $10^7$ candidate 2-itemsets; to discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate $2^{100} \approx 10^{30}$ candidates. Multiple scans of the database: (n + 1) scans are needed, where n is the length of the longest pattern.
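The orders of magnitude quoted on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that candidate 2-itemsets are all unordered pairs of frequent items and that every non-empty subset of the 100-item pattern counts as a candidate; the variable names are illustrative only.

```python
from math import comb

# Assumption: every unordered pair of frequent 1-itemsets becomes a candidate 2-itemset.
n_frequent_items = 10**4
candidate_pairs = comb(n_frequent_items, 2)

# Assumption: all non-empty subsets of a 100-item pattern are generated as candidates.
subsets_of_100 = 2**100 - 1

print(f"candidate 2-itemsets: {candidate_pairs:.1e}")   # 5.0e+07, i.e. on the order of 10^7
print(f"subsets of a 100-item pattern: {subsets_of_100:.1e}")  # 1.3e+30
```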
  • 4. Mining Frequent Patterns Without Candidate Generation. Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure: highly condensed, but complete for frequent pattern mining; avoids costly database scans. Develop an efficient, FP-tree-based frequent pattern mining method. A divide-and-conquer methodology: decompose mining tasks into smaller ones. Avoid candidate generation: sub-database test only.
  • 5. The FP-growth Algorithm. Leave the generate-and-test paradigm of Apriori. Data sets are encoded using a compact structure, the FP-tree. Frequent itemsets are extracted directly from the FP-tree. Major steps to mine the FP-tree: (1) construct the conditional pattern base for each node in the FP-tree; (2) construct the conditional FP-tree from each conditional pattern base; (3) recursively mine the conditional FP-trees and grow the frequent patterns obtained so far; if a conditional FP-tree contains a single path, simply enumerate all the patterns.
  • 6. Building the FP-tree. Transactions (TID: items): 1: {A,B}; 2: {B,C,D}; 3: {A,C,D,E}; 4: {A,D,E}; 5: {A,B,C}; 6: {A,B,C,D}; 7: {B,C}; 8: {A,B,C}; 9: {A,B,D}; 10: {B,C,E}. Items are sorted in decreasing support counts. After reading TID=1 the tree is null → A:1 → B:1. After reading TID=2 it gains a second branch, null → B:1 → C:1 → D:1.
  • 7. Building the FP-tree (same transactions as the previous slide). After reading TID=3 the tree has three paths: null → A:2 → B:1; null → A:2 → C:1 → D:1 → E:1; null → B:1 → C:1 → D:1.
  • 8. Building the FP-tree (same transactions). Pointers are used to assist frequent itemset generation: a header table with one entry per item (A, B, C, D, E) points to the nodes carrying that item. The FP-tree is typically smaller than the data. Final tree, as root-to-leaf paths: null → A:7 → B:5 → C:3 → D:1; null → A:7 → B:5 → D:1; null → A:7 → C:1 → D:1 → E:1; null → A:7 → D:1 → E:1; null → B:3 → C:3 → D:1; null → B:3 → C:3 → E:1.
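The construction walked through on the slides above can be sketched in a few lines. This is a minimal illustration, not the lecturer's code: the names `FPNode` and `build_fp_tree` are hypothetical, and the item order A, B, C, D, E is passed in explicitly (taken from the slide's header table) rather than recomputed from the support counts.

```python
class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, item_order):
    """Insert each transaction as a path, sharing common prefixes.

    item_order is an assumption supplied by the caller (here: the order
    used in the slide's header table).
    """
    rank = {item: i for i, item in enumerate(item_order)}
    root = FPNode(None)
    header = {item: [] for item in item_order}  # item -> nodes carrying it (node-links)
    for transaction in transactions:
        node = root
        ordered = sorted((i for i in transaction if i in rank), key=rank.__getitem__)
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
root, header = build_fp_tree(transactions, "ABCDE")
print({item: node.count for item, node in root.children.items()})  # {'A': 7, 'B': 3}
```

The `header` dictionary plays the role of the pointer table: summing the counts of the nodes in `header["E"]` gives the support count of E.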
  • 9. Mining the FP-tree (minsup=2). Start from the bottom, from E. Compute the support count by adding the counts associated with E. Since E is frequent, we compute the frequent itemsets from the FP-tree. (Figure: the full FP-tree with its header table.)
  • 10. Mining the FP-tree (minsup=2). Consider the paths to E; E is frequent. Extract the conditional pattern base {(ACD:1),(AD:1),(BC:1)}. (Figure: the full FP-tree with its header table.)
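The conditional pattern base for E can be reproduced without the tree, since each prefix path is just the part of a transaction that precedes E in the item order. The sketch below takes that shortcut (it scans the transactions instead of following node-links); the function name and the fixed order A, B, C, D, E are assumptions for illustration.

```python
from collections import Counter

ORDER = "ABCDE"  # assumed item order, as in the slide's header table
transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]

def conditional_pattern_base(transactions, item, order=ORDER):
    """Count the prefix (items before `item` in `order`) of every transaction containing `item`."""
    base = Counter()
    for transaction in transactions:
        if item in transaction:
            prefix = "".join(i for i in order[:order.index(item)] if i in transaction)
            if prefix:
                base[prefix] += 1
    return base

print(dict(conditional_pattern_base(transactions, "E")))  # {'ACD': 1, 'AD': 1, 'BC': 1}
```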
  • 11. Mining the FP-tree (minsup=2). From the conditional pattern base {(ACD:1),(AD:1),(BC:1)} build the conditional FP-tree. Delete infrequent items. (Figure: prefix paths A:2 → C:1 → D:1, A:2 → D:1 and B:1 → C:1; after deleting the infrequent item B, the conditional FP-tree keeps A:2 → C:1 → D:1, A:2 → D:1 and C:1.)
  • 12. Mining the FP-tree (minsup=2). Use the conditional FP-tree to extract the frequent itemsets ending with DE, CE, and AE. (Figures: conditional FP-tree for E; prefix paths ending in DE.)
  • 13. Mining the FP-tree (minsup=2). Consider the suffix DE; it is frequent. Build the conditional FP-tree for DE: C is infrequent, so it is deleted, and the resulting tree contains only A, which is frequent. So far we have obtained three frequent itemsets: {E, DE, ADE}. (Figures: prefix paths ending in DE; conditional FP-tree for DE.)
  • 14. Mining the FP-tree (minsup=2). After DE, the suffix CE is considered. CE is frequent and thus added to the frequent itemsets. Then we search for the itemsets ending with AE. At the end, the frequent itemsets ending with E are {E, DE, ADE, CE, AE}. (Figures: prefix paths ending in CE; prefix paths ending in AE.)
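The result of this worked example can be cross-checked by brute force. Note that the sketch below is deliberately not FP-growth: it enumerates every itemset containing E and counts it against the ten transactions, which is feasible only because the example is tiny. Names such as `ending_in_e` are illustrative.

```python
from itertools import combinations

MINSUP = 2
transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]

def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Try E together with every subset of the remaining items.
ending_in_e = {}
for size in range(5):
    for prefix in combinations("ABCD", size):
        itemset = set(prefix) | {"E"}
        count = support(itemset)
        if count >= MINSUP:
            ending_in_e["".join(sorted(itemset))] = count

print(ending_in_e)  # {'E': 3, 'AE': 2, 'CE': 2, 'DE': 2, 'ADE': 2}
```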
  • 15. Mining Frequent Patterns Using FP-tree. General idea (divide-and-conquer): recursively grow frequent pattern paths using the FP-tree. Method: for each item, construct its conditional pattern base, and then its conditional FP-tree; repeat the process on each newly created conditional FP-tree until the resulting FP-tree is empty or contains only one path (a single path generates all the combinations of its sub-paths, each of which is a frequent pattern).
  • 16. Mining the FP-tree. Start at the last item in the table. Find all paths containing the item by following the node-links. Identify conditional patterns: patterns in paths with the required frequency. Build the conditional FP-tree C. Append the item to all paths in C, generating frequent patterns. Mine C recursively (appending the item). Remove the item from the table and tree.
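The recursion described on the last two slides can be sketched compactly. As a simplifying assumption, each conditional FP-tree is represented here by its conditional pattern base (a list of weighted prefix paths) instead of a linked tree, and the single-path shortcut is omitted; the recursion itself follows the same pattern-growth idea. The function name `fp_growth` and the alphabetical item order are illustrative choices.

```python
from collections import Counter

def fp_growth(pattern_base, minsup, suffix=()):
    """Pattern growth over a weighted list of ordered paths.

    pattern_base: list of (tuple of items in a fixed global order, count).
    Returns {pattern tuple: support count}.
    """
    counts = Counter()
    for path, count in pattern_base:
        for item in path:
            counts[item] += count

    frequent = {}
    for item, item_support in counts.items():
        if item_support < minsup:
            continue  # infrequent in this conditional base: never extended
        pattern = (item,) + suffix
        frequent[pattern] = item_support
        # Conditional pattern base of `item`: the prefix before it in each path.
        conditional = [(path[:path.index(item)], count)
                       for path, count in pattern_base if item in path]
        conditional = [(prefix, count) for prefix, count in conditional if prefix]
        frequent.update(fp_growth(conditional, minsup, pattern))
    return frequent

transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
base = [(tuple(sorted(t)), 1) for t in transactions]  # alphabetical order A..E (assumption)
result = fp_growth(base, minsup=2)
print(sorted(p for p in result if p[-1] == "E"))
```

Because every pattern is grown from its last item backwards, each frequent itemset is produced exactly once and no candidate set is ever materialized.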
  • 17. FP-growth vs. Apriori: Scalability With the Support Threshold. (Figure: run time in seconds versus support threshold in % for FP-growth and Apriori on data set D1.)
  • 18. Benefits of the FP-tree Structure. Completeness: preserves complete information for frequent pattern mining; never breaks a long pattern of any transaction. Compactness: reduces irrelevant information, since infrequent items are gone; items are in frequency-descending order, so the more frequently occurring items are more likely to be shared; never larger than the original database (not counting node-links and the count fields); for the Connect-4 DB, the compression ratio could be over 100.
  • 19. Why Is FP-Growth the Winner? Divide-and-conquer: decomposing both the mining task and the DB according to the frequent patterns obtained so far leads to a focused search of smaller databases. Other factors: no candidate generation and no candidate test; compressed database (the FP-tree structure); no repeated scan of the entire database; the basic operations are counting local frequent items and building sub-FP-trees, with no pattern search and matching.
  • 20. Multiple-Level Association Rules. Items often form a hierarchy. Items at the lower level are expected to have lower support. Rules regarding itemsets at appropriate levels could be quite useful. The transaction database can be encoded based on dimensions and levels. We can explore shared multi-level mining. (Figure: example hierarchy with food at the top; milk and bread below it; skim, 2%, wheat and white below those; Fraser and Sunset at the lowest level.)
  • 21. Concept Hierarchy. (Figure: two hierarchies. Food: Bread (Wheat, White) and Milk (Skim, 2%), with the brands Foremost and Kemps at the lowest level. Electronics: Computers (Desktop, Laptop, Accessory) and Home (TV, DVD), with Printer and Scanner at the lowest level.)
  • 22. Why incorporate concept hierarchy? Rules at lower levels may not have enough support to appear in any frequent itemsets. Rules at lower levels of the hierarchy are overly specific: e.g., {2% milk} → {wheat bread} is indicative of an association between milk and bread.
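  • 23. Multi-level Association Rules. How do support and confidence vary as we traverse the concept hierarchy? (Here $\sigma$ denotes support.) If $X$ is the parent item for both $X_1$ and $X_2$, then $\sigma(X) \le \sigma(X_1) + \sigma(X_2)$. If $\sigma(X_1 \cup Y_1) \ge \text{minsup}$, and $X$ is the parent of $X_1$ and $Y$ is the parent of $Y_1$, then $\sigma(X \cup Y_1) \ge \text{minsup}$, $\sigma(X_1 \cup Y) \ge \text{minsup}$, and $\sigma(X \cup Y) \ge \text{minsup}$. If $\text{conf}(X_1 \Rightarrow Y_1) \ge \text{minconf}$, then $\text{conf}(X_1 \Rightarrow Y) \ge \text{minconf}$.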
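The relations on this slide can be verified numerically on a toy data set. Everything in the sketch below is hypothetical: the five transactions, the two-level hierarchy, and the helper names. A parent item is treated as present in a transaction whenever one of its children is.

```python
# Hypothetical two-level hierarchy: child item -> parent item.
PARENT = {
    "skim milk": "milk", "2% milk": "milk",
    "wheat bread": "bread", "white bread": "bread",
}
# Hypothetical transactions over the leaf items.
transactions = [
    {"skim milk", "wheat bread"},
    {"2% milk", "wheat bread"},
    {"skim milk", "2% milk"},
    {"white bread"},
    {"2% milk", "white bread"},
]

def generalize(transaction):
    """Add the parent of every item so that parent-level itemsets can be matched."""
    return transaction | {PARENT[i] for i in transaction if i in PARENT}

def support(itemset):
    """Support count of an itemset that may mix leaf and parent items."""
    return sum(1 for t in transactions if itemset <= generalize(t))

def confidence(lhs, rhs):
    return support(lhs | rhs) / support(lhs)

# Parent support is bounded by the sum of its children's supports.
print(support({"milk"}), support({"skim milk"}) + support({"2% milk"}))  # 4 5

# Replacing an item by its parent never lowers support or confidence.
print(support({"2% milk", "wheat bread"}), support({"milk", "bread"}))   # 1 3
print(confidence({"2% milk"}, {"wheat bread"}) <= confidence({"2% milk"}, {"bread"}))  # True
```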
  • 24. Multi-level Association Rules. Approach 1: extend the current association rule formulation by augmenting each transaction with higher-level items. Original transaction: {skim milk, wheat bread}. Augmented transaction: {skim milk, wheat bread, milk, bread, food}. Issues: items that reside at higher levels have much higher support counts; if the support threshold is low, there are too many frequent patterns involving items from the higher levels; increased dimensionality of the data.
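Approach 1 amounts to a small preprocessing step before running any frequent itemset miner. A minimal sketch follows, assuming the hierarchy is stored as a child-to-parent dictionary; the names `PARENT`, `ancestors` and `augment` are illustrative, and only the part of the hierarchy needed for the slide's example is included.

```python
# Assumed encoding of the concept hierarchy: child item -> parent item.
PARENT = {
    "skim milk": "milk",
    "wheat bread": "bread",
    "milk": "food",
    "bread": "food",
}

def ancestors(item):
    """Yield every ancestor of `item`, walking up to the root of the hierarchy."""
    while item in PARENT:
        item = PARENT[item]
        yield item

def augment(transaction):
    """Return the transaction extended with all higher-level items."""
    return set(transaction) | {a for item in transaction for a in ancestors(item)}

print(sorted(augment({"skim milk", "wheat bread"})))
# ['bread', 'food', 'milk', 'skim milk', 'wheat bread']
```

The example also makes the dimensionality issue visible: a two-item transaction grows to five items once its ancestors are added.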
  • 25. Multi-level Association Rules. Approach 2: generate frequent patterns at the highest level first; then generate frequent patterns at the next highest level, and so on. Issues: I/O requirements will increase dramatically because we need to perform more passes over the data; may miss some potentially interesting cross-level association patterns.
  • 26. Mining Multi-Level Associations. A top-down, progressive deepening approach. First find high-level strong rules: {milk} → {bread} [20%, 60%]. Then find their lower-level "weaker" rules: {2% milk} → {wheat bread} [6%, 50%].
  • 27. Multi-level Association: Redundancy Filtering
      Some rules may be redundant due to "ancestor" relationships between items
      Example:
        milk → wheat bread [support = 8%, confidence = 70%]
        2% milk → wheat bread [support = 2%, confidence = 72%]
      We say the first rule is an ancestor of the second rule
      A rule is redundant if its support is close to the "expected" value, based on the rule's ancestor
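A rough sketch of the redundancy check follows. The slide does not say how the "expected" support is obtained, so the sketch assumes it is the ancestor rule's support scaled by the descendant item's share of its parent. The share of 0.25 for 2% milk, the relative tolerance, and the function names are all illustrative assumptions.

```python
import math


def expected_support(ancestor_support, child_share):
    """Support the descendant rule would have if it simply mirrored its ancestor."""
    return ancestor_support * child_share


def is_redundant(actual_support, ancestor_support, child_share, rel_tol=0.2):
    """Flag the descendant rule as redundant when its support is close to the expected value."""
    expected = expected_support(ancestor_support, child_share)
    return math.isclose(actual_support, expected, rel_tol=rel_tol)


# milk -> wheat bread has support 8%; assume 2% milk makes up a quarter of milk sales.
print(is_redundant(actual_support=0.02, ancestor_support=0.08, child_share=0.25))  # True
```

Under that assumed share the expected support is 2%, which matches the observed 2%, so the more specific rule adds no information beyond its ancestor.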
  • 28. Multi-level mining with uniform support
      [Figure: two-level example]
      Level 1 (min_sup = 5%): Milk [support = 10%]
      Level 2 (min_sup = 5%): 2% Milk [support = 6%], Skim Milk [support = 4%]
  • 29. Multi-level mining with reduced support
      [Figure: two-level example]
      Level 1 (min_sup = 5%): Milk [support = 10%]
      Level 2 (min_sup = 3%): 2% Milk [support = 6%], Skim Milk [support = 4%]
  • 30. Multi-level Association: Uniform Support vs. Reduced Support
      Uniform support: the same minimum support for all levels
        + One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support.
        – Lower-level items do not occur as frequently. If the support threshold is
          • too high, we miss low-level associations
          • too low, we generate too many high-level associations
      Reduced support: reduced minimum support at lower levels
        There are 4 search strategies:
          • Level-by-level independent
          • Level-cross filtering by k-itemset
          • Level-cross filtering by single item
          • Controlled level-cross filtering by single item
  • 31. Multi-Level Mining: Progressive Deepening
      A top-down, progressive deepening approach
      First mine high-level frequent items: milk (15%), bread (10%)
      Then mine their lower-level "weaker" frequent itemsets: 2% milk (5%), wheat bread (4%)
      Different minimum support thresholds across levels lead to different algorithms:
        If the same min_support is adopted across all levels, then toss an item t if any of t's ancestors is infrequent
        If a reduced min_support is adopted at lower levels, then examine only those descendants whose ancestor's support is frequent or non-negligible
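The effect of the two threshold policies can be illustrated with a small top-down filter over single items. This is a simplified sketch, not a full multi-level miner: it reuses the milk supports from the uniform/reduced support figures (10%, 6%, 4%), and the `SUPPORT` and `CHILDREN` tables and the function name are hypothetical.

```python
# Illustrative supports taken from the milk example; the tables are assumptions.
SUPPORT = {"milk": 0.10, "2% milk": 0.06, "skim milk": 0.04}
CHILDREN = {"milk": ["2% milk", "skim milk"]}


def progressive_deepening(roots, min_sup_by_level):
    """Descend level by level, expanding only items that are frequent at their level."""
    frequent = []
    frontier = list(roots)
    for min_sup in min_sup_by_level:
        survivors = [item for item in frontier if SUPPORT[item] >= min_sup]
        frequent.extend(survivors)
        # Only children of frequent items are examined at the next level.
        frontier = [child for item in survivors for child in CHILDREN.get(item, [])]
    return frequent


print(progressive_deepening(["milk"], [0.05, 0.05]))  # uniform: ['milk', '2% milk']
print(progressive_deepening(["milk"], [0.05, 0.03]))  # reduced: ['milk', '2% milk', 'skim milk']
```

With a uniform 5% threshold skim milk is lost, while lowering the level-2 threshold to 3% keeps it.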
  • 32. Correlation Rules
      Suppose we perform association rule mining on the following data (t = tea, c = coffee):

                     c    not c    row
          t         20        5     25
          not t     70        5     75
          col       90       10    100

      {tea} → {coffee} [support = 20%, confidence = 80%]
      But 90% of the customers buy coffee anyway!
      A customer who buys tea is less likely (10 percentage points less) to buy coffee than a customer about whom we have no information
  • 33. Correlation Rules
      One measure is to calculate correlation
      If two random variables A and B are independent,
        $\frac{P(A \wedge B)}{P(A)\,P(B)} = 1$
      For tea and coffee,
        $\frac{P(t \wedge c)}{P(t)\,P(c)} = 0.89$
      This means that the presence of tea is negatively correlated with the presence of coffee
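The ratio for tea and coffee can be checked directly from the counts in the contingency table. The function name `lift` is an assumption here, borrowed from common usage; the slides just call this measure correlation.

```python
def lift(n_ab, n_a, n_b, n):
    """P(A and B) / (P(A) * P(B)), computed from raw counts over n baskets."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))


# 20 baskets with tea and coffee, 25 with tea, 90 with coffee, 100 in total.
print(round(lift(20, 25, 90, 100), 2))  # 0.89
```

A value below 1 indicates negative correlation, even though the rule's confidence (20/25 = 80%) looks high in isolation.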
  • 34. Correlation Rules
      Use the $\chi^2$ statistical test for testing independence
      Let n be the number of baskets
      Let k be the number of items considered
      Let r be a rule consisting of item values, where each item is either present ($i_j$) or absent ($\bar{i}_j$)
      Let O(r) represent the number of baskets containing rule r (i.e., its frequency)
      Let $E[i_j] = O(i_j)$ for a single item (for its negation, $E[\bar{i}_j] = n - E[i_j]$)
      $E[r] = n \cdot \frac{E[r_1]}{n} \cdots \frac{E[r_k]}{n}$
  • 35. Correlation Rules
      The $\chi^2$ statistic is defined as
        $\chi^2 = \sum_{r \in R} \frac{(O(r) - E[r])^2}{E[r]}$
      Look up the significance value in a statistical textbook
        There are k-1 degrees of freedom
        If the statistic does not exceed the critical value, we cannot reject independence; otherwise the contingency table represents dependence
  • 36. Example
      Back to tea and coffee

                     c    not c    row
          t         20        5     25
          not t     70        5     75
          col       90       10    100

      E[t] = 25, E[not t] = 75, E[c] = 90, E[not c] = 10
      E[tc] = 100 * 25/100 * 90/100 = 22.5
      O(tc) = 20
      Contribution to $\chi^2$ = (20 - 22.5)^2 / 22.5 = 0.278
      Calculating the other three cells in the same way gives $\chi^2$ = 0.278 + 2.500 + 0.093 + 0.833 = 3.704
      Not significant at the 95% level (critical value 3.84 for k = 2)
      Cannot reject the independence assumption
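The whole calculation for the tea and coffee table can be reproduced with a short function. This is a minimal sketch of the plain statistic without any continuity correction; the function name and the row/column layout of `table` are assumptions.

```python
def chi_square(table):
    """Chi-square statistic of a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


tea_coffee = [[20, 5],   # tea:     coffee, no coffee
              [70, 5]]   # no tea:  coffee, no coffee
print(round(chi_square(tea_coffee), 3))  # 3.704
```

The four cell contributions are roughly 0.278, 2.500, 0.093 and 0.833, so the total stays below the 3.84 critical value for one degree of freedom at the 95% level.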
  • 37. Interest
      If the $\chi^2$ test shows significance, we then want to find the most interesting cell(s) in the table
      I(r) = O(r) / E[r]
        Look for values far away from 1
        I(t c) = 20/22.5 = 0.89
        I(t not c) = 5/2.5 = 2
        I(not t c) = 70/67.5 = 1.04
        I(not t not c) = 5/7.5 = 0.67
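The interest of every cell can be computed in one pass over the contingency table. A small sketch, with the function name and table layout chosen for illustration:

```python
def interest_table(table):
    """Return O(r) / E[r] for every cell of a contingency table (list of rows)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [observed * n / (row_totals[i] * col_totals[j]) for j, observed in enumerate(row)]
        for i, row in enumerate(table)
    ]


tea_coffee = [[20, 5], [70, 5]]
for row in interest_table(tea_coffee):
    print([round(value, 2) for value in row])
# [0.89, 2.0]
# [1.04, 0.67]
```

The cell farthest from 1 is (tea, no coffee), with an interest of 2.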
  • 38. Properties of the $\chi^2$ statistic
      Upward closed
        If a k-itemset is correlated, so are all its supersets
        Look for minimal correlated itemsets, i.e., itemsets of which no subset is correlated
      Can combine Apriori and the $\chi^2$ statistic efficiently
        No generate and prune as in support-confidence
  • 39. Sequential pattern mining
      Association rule mining does not consider the order of transactions
      In many applications such orderings are significant
      In market basket analysis, it is interesting to know whether people buy some items in sequence
        Example: buying a bed first and then bed sheets later
      In Web usage mining, it is useful to find navigational patterns of users in a Web site from sequences of page visits
  • 40. Examples of Sequence
      Web sequence:
        < {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation} {Return to Shopping} >
      Sequence of initiating events causing the nuclear accident at Three Mile Island:
        < {clogged resin} {outlet valve closure} {loss of feedwater} {condenser polisher outlet valve shut} {booster pumps trip} {main waterpump trips} {main turbine trips} {reactor pressure increases} >
      Sequence of books checked out at a library:
        < {Fellowship of the Ring} {The Two Towers} {Return of the King} >
  • 41. Basic concepts
      Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items
      A sequence is an ordered list of itemsets
      We denote a sequence s by $\langle a_1 a_2 \ldots a_r \rangle$, where $a_i$ is an itemset, also called an element of s
      An element (or an itemset) of a sequence is denoted by $\{x_1, x_2, \ldots, x_k\}$, where $x_j \in I$ is an item
      We assume without loss of generality that the items in an element of a sequence are in lexicographic order
  • 42. Basic concepts
      Size: the size of a sequence is the number of elements (or itemsets) in the sequence
      Length: the length of a sequence is the number of items in the sequence
        A sequence of length k is called a k-sequence
      A sequence $s_1 = \langle a_1 a_2 \ldots a_r \rangle$ is a subsequence of another sequence $s_2 = \langle b_1 b_2 \ldots b_v \rangle$, or $s_2$ is a supersequence of $s_1$, if there exist integers $1 \le j_1 < j_2 < \ldots < j_{r-1} < j_r \le v$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_r \subseteq b_{j_r}$
      We also say that $s_2$ contains $s_1$
  • 43. Example
      Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9}
      The sequence <{3} {4, 5} {8}> is contained in (or is a subsequence of) <{6} {3, 7} {9} {4, 5, 8} {3, 8}>
        In fact, {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8}, and {8} ⊆ {3, 8}
      However, <{3} {8}> is not contained in <{3, 8}>, or vice versa
      The size of the sequence <{3} {4, 5} {8}> is 3, and the length of the sequence is 4
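The containment test can be sketched as a greedy left-to-right scan, which is sufficient when there are no timing constraints. Sequences are modelled here as lists of Python sets; the function names are illustrative.

```python
def is_subsequence(s1, s2):
    """True if every element of s1 is a subset of a distinct element of s2, in order."""
    pos = 0
    for element in s1:
        # Advance to the first remaining element of s2 that contains this itemset.
        while pos < len(s2) and not element <= s2[pos]:
            pos += 1
        if pos == len(s2):
            return False
        pos += 1
    return True


def size(s):
    """Number of elements (itemsets) in the sequence."""
    return len(s)


def length(s):
    """Number of items in the sequence."""
    return sum(len(element) for element in s)


s1 = [{3}, {4, 5}, {8}]
s2 = [{6}, {3, 7}, {9}, {4, 5, 8}, {3, 8}]
print(is_subsequence(s1, s2))             # True
print(is_subsequence([{3}, {8}], [{3, 8}]))  # False
print(is_subsequence([{3, 8}], [{3}, {8}]))  # False
print(size(s1), length(s1))               # 3 4
```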
  • 44. Objective of sequential pattern mining
      The input is a set S of input data sequences (or sequence database)
      The problem of mining sequential patterns is to find all the sequences that have a user-specified minimum support
      Each such sequence is called a frequent sequence, or a sequential pattern
      The support for a sequence is the fraction of total data sequences in S that contain this sequence
  • 45. Terminology
      Itemset: a non-empty set of items; each itemset is mapped to an integer
      Sequence: an ordered list of itemsets
      Support for a sequence: the fraction of total customers that support the sequence
      Maximal sequence: a sequence that is not contained in any other sequence
      Large sequence: a sequence that meets minsup
  • 46. Example
      Transactions:

          Customer ID   Transaction time   Items bought
          1             June 25 '93        30
          1             June 30 '93        90
          2             June 10 '93        10, 20
          2             June 15 '93        30
          2             June 20 '93        40, 60, 70
          3             June 25 '93        30, 50, 70
          4             June 25 '93        30
          4             June 30 '93        40, 70
          4             July 25 '93        90
          5             June 12 '93        90

      Customer sequences:

          Customer ID   Customer sequence
          1             < (30) (90) >
          2             < (10 20) (30) (40 60 70) >
          3             < (30 50 70) >
          4             < (30) (40 70) (90) >
          5             < (90) >

      Maximal sequences with support ≥ 40%:
        < (30) (90) >
        < (30) (40 70) >
      Note: with a minsup of 40%, no fewer than two customers must support the sequence
        < (10 20) (30) > does not have enough support (it is supported only by customer 2)
        < (30) >, < (70) >, < (30) (40) >, … are not maximal
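The supports in this example can be recomputed from the transaction table. The sketch below groups transactions per customer in time order and counts the customers whose sequence contains a pattern. The ISO date strings and the function names are conveniences introduced for illustration.

```python
from collections import defaultdict

# (customer id, transaction date, items bought), copied from the transaction table.
TRANSACTIONS = [
    (1, "1993-06-25", {30}), (1, "1993-06-30", {90}),
    (2, "1993-06-10", {10, 20}), (2, "1993-06-15", {30}), (2, "1993-06-20", {40, 60, 70}),
    (3, "1993-06-25", {30, 50, 70}),
    (4, "1993-06-25", {30}), (4, "1993-06-30", {40, 70}), (4, "1993-07-25", {90}),
    (5, "1993-06-12", {90}),
]


def build_sequences(transactions):
    """Group transactions by customer, ordered by date, into customer sequences."""
    sequences = defaultdict(list)
    for customer, _date, items in sorted(transactions, key=lambda t: (t[0], t[1])):
        sequences[customer].append(set(items))
    return dict(sequences)


def contains(data, pattern):
    """True if pattern is a subsequence of data (both are lists of sets)."""
    pos = 0
    for element in pattern:
        while pos < len(data) and not element <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1
    return True


def support(pattern, sequences):
    """Fraction of customers whose sequence contains the pattern."""
    hits = sum(contains(seq, pattern) for seq in sequences.values())
    return hits / len(sequences)


SEQUENCES = build_sequences(TRANSACTIONS)
print(support([{30}, {90}], SEQUENCES))      # 0.4
print(support([{30}, {40, 70}], SEQUENCES))  # 0.4
print(support([{10, 20}, {30}], SEQUENCES))  # 0.2
```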
  • 47. Methods for sequential pattern mining
      Apriori-based approaches
        GSP
        SPADE
      Pattern-growth-based approaches
        FreeSpan
        PrefixSpan
  • 48. The Generalized Sequential Pattern (GSP) Mining Algorithm
      The Apriori property for sequential patterns
        If a sequence S is not frequent, then none of the super-sequences of S is frequent
        For instance, if <hb> is infrequent, so are <hab> and <(ah)b>
      GSP (Generalized Sequential Pattern) mining algorithm
        Initially, every item in the DB is a candidate of length 1
        for each level (i.e., sequences of length k) do
          • scan the database to collect the support count for each candidate sequence
          • generate candidate length-(k+1) sequences from length-k frequent sequences using Apriori
        repeat until no frequent sequence or no candidate can be found
      Major strength: candidate pruning by Apriori
  • 49. The Generalized Sequential Pattern (GSP) Mining Algorithm
      Step 1: make the first pass over the sequence database D to yield all the 1-element frequent sequences
      Step 2: repeat until no new frequent sequences are found
        Candidate generation:
          • Merge pairs of frequent subsequences found in the (k-1)th pass to generate candidate sequences that contain k items
        Candidate pruning:
          • Prune candidate k-sequences that contain infrequent (k-1)-subsequences
        Support counting:
          • Make a new pass over the sequence database D to find the support for these candidate sequences
        Candidate elimination:
          • Eliminate candidate k-sequences whose actual support is less than minsup
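The four steps can be sketched for a deliberately simplified setting in which every element of a sequence holds exactly one item, so a sequence is just a tuple of items. This is not the full GSP algorithm, which also merges items into multi-item elements and supports timing constraints. The function names, the toy database, and the use of an absolute `min_count` are all assumptions.

```python
def contains(data, pattern):
    """True if pattern occurs in data as a (not necessarily contiguous) subsequence."""
    remaining = iter(data)
    return all(item in remaining for item in pattern)


def gsp_single_items(database, min_count):
    """Level-wise mining of frequent sequences whose elements are single items."""
    items = sorted({item for seq in database for item in seq})
    candidates = [(item,) for item in items]
    frequent = {}
    while candidates:
        # Support counting and candidate elimination.
        counts = {c: sum(contains(seq, c) for seq in database) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Candidate generation: join a and b when a minus its first item equals b minus its last.
        joined = {a + (b[-1],) for a in level for b in level if a[1:] == b[:-1]}
        # Candidate pruning: every subsequence obtained by dropping one item must be frequent.
        candidates = sorted(
            c for c in joined
            if all(c[:i] + c[i + 1:] in level for i in range(len(c)))
        )
    return frequent


toy_db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
for pattern, count in gsp_single_items(toy_db, min_count=2).items():
    print(pattern, count)
```

On this toy database the frequent sequences are a, b, c, ab, ac, bc and abc; d is dropped after the first pass, so no candidate containing it is ever generated.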
  • 50. Timing Constraints (I)
      [Figure: pattern {A B} {C} {D E} annotated with the three constraints]
        xg: max-gap (gap between consecutive elements <= xg)
        ng: min-gap (gap between consecutive elements > ng)
        ms: maximum span (time between first and last element <= ms)
      Example with xg = 2, ng = 0, ms = 4:

          Data sequence                               Subsequence        Contain?
          < {2,4} {3,5,6} {4,7} {4,5} {8} >           < {6} {5} >        Yes
          < {1} {2} {3} {4} {5} >                     < {1} {4} >        No
          < {1} {2,3} {3,4} {4,5} >                   < {2} {3} {5} >    Yes
          < {1,2} {3} {2,3} {3,4} {2,4} {4,5} >       < {1,2} {5} >      No
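The four rows of the table can be checked with a small backtracking search. The sketch assumes that the timestamp of each element is simply its position in the data sequence, which is how the example appears to be read; the function name and argument order are illustrative.

```python
def contains_timed(data, pattern, max_gap, min_gap, max_span):
    """Containment under max-gap, min-gap and max-span, with positions used as timestamps."""

    def search(k, prev, first):
        if k == len(pattern):
            return True
        start = 0 if prev is None else prev + 1
        for t in range(start, len(data)):
            if prev is not None:
                gap = t - prev
                if gap > max_gap or t - first > max_span:
                    break  # later positions only make the gap and span larger
                if gap <= min_gap:
                    continue
            if pattern[k] <= data[t]:
                # Backtracking is needed: the earliest match may violate a later max-gap.
                if search(k + 1, t, t if first is None else first):
                    return True
        return False

    return search(0, None, None)


limits = dict(max_gap=2, min_gap=0, max_span=4)
print(contains_timed([{2, 4}, {3, 5, 6}, {4, 7}, {4, 5}, {8}], [{6}, {5}], **limits))             # True
print(contains_timed([{1}, {2}, {3}, {4}, {5}], [{1}, {4}], **limits))                            # False
print(contains_timed([{1}, {2, 3}, {3, 4}, {4, 5}], [{2}, {3}, {5}], **limits))                   # True
print(contains_timed([{1, 2}, {3}, {2, 3}, {3, 4}, {2, 4}, {4, 5}], [{1, 2}, {5}], **limits))     # False
```

Unlike the unconstrained case, a greedy leftmost match is not enough here, which is why the search tries every admissible position for each element.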
  • 51. Mining Sequential Patterns with Timing Constraints
      Approach 1: mine sequential patterns without timing constraints, then postprocess the discovered patterns
      Approach 2: modify GSP to directly prune candidates that violate timing constraints