SlideShare a Scribd company logo
1 of 29
Download to read offline
Association Rule Basics
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Lecture outline                                  2



   What is association rule mining?
   Frequent itemsets, support, and confidence
   Mining association rules
   The “Apriori” algorithm
   Rule generation




                    Prof. Pier Luca Lanzi
Example of market-basket transactions                3




 Bread       Bread                       Steak     Jam
 Peanuts     Jam                         Jam       Soda
 Milk        Soda                        Soda      Peanuts
 Fruit       Chips                       Chips     Milk
 Jam         Milk                        Bread     Fruit
             Fruit



 Jam          Fruit                      Fruit     Fruit
 Soda         Soda                       Soda      Peanuts
 Chips        Chips                      Peanuts   Cheese
 Milk         Milk                       Milk      Yogurt
 Bread




                 Prof. Pier Luca Lanzi
What is association mining?                      4



 Finding frequent patterns, associations, correlations, or
  causal structures among sets of items or objects in
  transaction databases, relational databases, and other
  information repositories

 Applications
    Basket data analysis
    Cross-marketing
    Catalog design
    …




                    Prof. Pier Luca Lanzi
What is association rule mining?                                      5




TID Items                                               Examples
 1   Bread, Peanuts, Milk, Fruit, Jam                      {bread}    {milk}
 2   Bread, Jam, Soda, Chips, Milk, Fruit                  {soda}    {chips}
 3   Steak, Jam, Soda, Chips, Bread                        {bread}    {jam}
 4   Jam, Soda, Peanuts, Milk, Fruit
 5   Jam, Soda, Chips, Milk, Bread
 6   Fruit, Soda, Chips, Milk
 7   Fruit, Soda, Peanuts, Milk
 8   Fruit, Peanuts, Cheese, Yogurt


 Given a set of transactions, find rules that will predict the
  occurrence of an item based on the occurrences of other
  items in the transaction


                                Prof. Pier Luca Lanzi
Definition: Frequent Itemset                                            6



   Itemset                                        TID Items
        A collection of one or more items,
        e.g., {milk, bread, jam}                   1   Bread, Peanuts, Milk, Fruit, Jam
        k-itemset, an itemset that                 2   Bread, Jam, Soda, Chips, Milk, Fruit
        contains k items
   Support count ( )                              3   Steak, Jam, Soda, Chips, Bread
        Frequency of occurrence                    4   Jam, Soda, Peanuts, Milk, Fruit
        of an itemset
          ({Milk, Bread}) = 3                      5   Jam, Soda, Chips, Milk, Bread
          ({Soda, Chips}) = 4                      6   Fruit, Soda, Chips, Milk
   Support
        Fraction of transactions                   7   Fruit, Soda, Peanuts, Milk
        that contain an itemset                    8   Fruit, Peanuts, Cheese, Yogurt
        s({Milk, Bread}) = 3/8
         s({Soda, Chips}) = 4/8
   Frequent Itemset
        An itemset whose support is greater
        than or equal to a minsup threshold




                           Prof. Pier Luca Lanzi
What is an association rule?                              7



 Implication of the form X         Y, where X and Y are itemsets

 Example, {bread}       {milk}

 Rule Evaluation Metrics, Suppor & Confidence

 Support (s)
     Fraction of transactions
     that contain both X and Y
 Confidence (c)
     Measures how often items in Y
     appear in transactions that
     contain X




                     Prof. Pier Luca Lanzi
Support and Confidence                       8



         Customer
         buys both          Customer
                            buys Milk




Customer
buys Bread




                     Prof. Pier Luca Lanzi
What is the goal?                                 9



 Given a set of transactions T, the goal of association rule
  mining is to find all rules having
     support ≥ minsup threshold
     confidence ≥ minconf threshold

 Brute-force approach:
     List all possible association rules
     Compute the support and confidence for each rule
     Prune rules that fail the minsup and minconf thresholds

 Brute-force approach is computationally prohibitive!




                    Prof. Pier Luca Lanzi
Mining Association Rules                        10



             {Bread, Jam}    {Milk} s=0.4 c=0.75
             {Milk, Jam}   {Bread} s=0.4 c=0.75
             {Bread}    {Milk, Jam} s=0.4 c=0.75
             {Jam}     {Bread, Milk} s=0.4 c=0.6
             {Milk}    {Bread, Jam} s=0.4 c=0.5

 All the above rules are binary partitions of the same itemset:

                        {Milk, Bread, Jam}

 Rules originating from the same itemset have identical
  support but can have different confidence
 We can decouple the support and confidence requirements!



                   Prof. Pier Luca Lanzi
Mining Association Rules:                       11
Two Step Approach

 Frequent Itemset Generation
     Generate all itemsets whose support     minsup

 Rule Generation
     Generate high confidence rules from frequent itemset
     Each rule is a binary partitioning of a frequent itemset

 Frequent itemset generation is computationally expensive




                   Prof. Pier Luca Lanzi
Frequent Itemset Generation                                        13



                                  null




             A       B             C              D       E




AB    AC      AD   AE        BC          BD       BE    CD     CE         DE




ABC   ABD    ABE   ACD      ACE          ADE      BCD   BCE    BDE    CDE




            ABCD   ABCE           ABDE         ACDE     BCDE
                                                               Given d items, there
                                                               are 2d possible
                                ABCDE                          candidate itemsets

                          Prof. Pier Luca Lanzi
Frequent Itemset Generation                                      14



 Brute-force approach:
     Each itemset in the lattice is a candidate frequent itemset
     Count the support of each candidate by scanning the
     database
           TID Items
            1   Bread, Peanuts, Milk, Fruit, Jam
                                                        candidates
            2   Bread, Jam, Soda, Chips, Milk, Fruit
            3   Steak, Jam, Soda, Chips, Bread
       N    4   Jam, Soda, Peanuts, Milk, Fruit
            5   Jam, Soda, Chips, Milk, Bread                         M
            6   Fruit, Soda, Chips, Milk
            7   Fruit, Soda, Peanuts, Milk
            8   Fruit, Peanuts, Cheese, Yogurt


                                w
     Match each transaction against every candidate
     Complexity ~ O(NMw) => Expensive since M = 2d


                                Prof. Pier Luca Lanzi
Computational Complexity                            15




 Given d unique items:
      Total number of itemsets = 2d
      Total number of possible association rules:




       For d=6, there are 602 rules

                        Prof. Pier Luca Lanzi
Frequent Itemset Generation Strategies           16



 Reduce the number of candidates (M)
    Complete search: M=2d
    Use pruning techniques to reduce M
 Reduce the number of transactions (N)
    Reduce size of N as the size of itemset increases
 Reduce the number of comparisons (NM)
    Use efficient data structures to store the candidates or
    transactions
    No need to match every candidate against every
    transaction




                     Prof. Pier Luca Lanzi
Reducing the Number of Candidates                18



 Apriori principle
     If an itemset is frequent, then all of
     its subsets must also be frequent

 Apriori principle holds due to the following property of the
  support measure:



 Support of an itemset never exceeds the support of its
  subsets
 This is known as the anti-monotone property of support




                    Prof. Pier Luca Lanzi
Illustrating Apriori Principle                                           19



                                                         null




                                A              B          C            D          E




                  AB   AC        AD        AE      BC           BD      BE      CD     CE    DE



Found to be
Infrequent
                 ABC   ABD      ABE       ACD      ACE          ADE    BCD      BCE    BDE   CDE




                              ABCD         ABCE          ABDE         ACDE      BCDE



                 Pruned
                 supersets                           ABCDE




                       Prof. Pier Luca Lanzi
How does the Apriori principle work?                    20



Items (1-itemsets)
Item      Count
Bread      4
Peanuts    4                        2-itemsets
Milk       6                 2-Itemset        Count
Fruit      6
                             Bread, Jam          4
Jam        5
                             Peanuts, Fruit      4
Soda       6
                             Milk, Fruit         5
Chips      4
                             Milk, Jam           4
Steak      1
                             Milk, Soda          5
Cheese     1
                             Fruit, Soda         4
Yogurt     1
                             Jam, Soda           4
                                                             3-itemsets
                             Soda, Chips         4
 Minimum Support = 4                                  3-Itemset           Count
                                                      Milk, Fruit, Soda    4




                     Prof. Pier Luca Lanzi
Apriori Algorithm                            21



 Let k=1
 Generate frequent itemsets of length 1
 Repeat until no new frequent itemsets are identified
     Generate length (k+1) candidate itemsets from length k
     frequent itemsets
     Prune candidate itemsets containing subsets of length k
     that are infrequent
     Count the support of each candidate by scanning the DB
     Eliminate candidates that are infrequent, leaving only
     those that are frequent




                    Prof. Pier Luca Lanzi
The Apriori Algorithm                            22




      Ck: Candidate itemset of size k
      Lk : frequent itemset of size k

      L1 = {frequent items};
      for (k = 1; Lk != ; k++) do begin
          Ck+1 = candidates generated from Lk;
         for each transaction t in database do
             increment the count of all candidates in Ck+1
             that are contained in t
         Lk+1 = candidates in Ck+1 with min_support
         end
      return k Lk;




                  Prof. Pier Luca Lanzi
The Apriori Algorithm                              23



 Join Step
     Ck is generated by joining Lk-1 with itself

 Prune Step
     Any (k-1)-itemset that is not frequent cannot be a subset
     of a frequent k-itemset




                    Prof. Pier Luca Lanzi
Efficient Implementation of Apriori in SQL    24



 Hard to get good performance out of pure SQL (SQL-92)
  based approaches alone
 Make use of object-relational extensions like UDFs, BLOBs,
  Table functions etc.
     Get orders of magnitude improvement
 S. Sarawagi, S. Thomas, and R. Agrawal. Integrating
  association rule mining with relational database systems:
  Alternatives and implications. In SIGMOD’98




                   Prof. Pier Luca Lanzi
How to Improve                                             25
Apriori’s Efficiency

 Hash-based itemset counting: A k-itemset whose corresponding hashing
  bucket count is below the threshold cannot be frequent
 Transaction reduction: A transaction that does not contain any frequent k-
  itemset is useless in subsequent scans
 Partitioning: Any itemset that is potentially frequent in DB must be frequent
  in at least one of the partitions of DB
 Sampling: mining on a subset of given data, lower support threshold + a
  method to determine the completeness
 Dynamic itemset counting: add new candidate itemsets only when all of
  their subsets are estimated to be frequent




                        Prof. Pier Luca Lanzi
Rule Generation                                 26



 Given a frequent itemset L, find all non-empty subsets f    L
  such that f   L – f satisfies the minimum confidence
  requirement
 If {A,B,C,D} is a frequent itemset, candidate rules:
         ABC D, ABD C, ACD B, BCD          A, A BCD, B ACD,
         C ABD, D ABC, AB CD, AC           BD, AD  BC, BC AD,
         BD AC, CD AB

 If |L| = k, then there are 2k – 2 candidate association rules
  (ignoring L       and      L)




                   Prof. Pier Luca Lanzi
How to efficiently generate rules from                28
frequent itemsets?

 Confidence does not have an anti-monotone property

 c(ABC   D) can be larger or smaller than c(AB            D)

 But confidence of rules generated from the same itemset has
  an anti-monotone property

     e.g., L = {A,B,C,D}:

             c(ABC       D)      c(AB     CD)   c(A    BCD)

     Confidence is anti-monotone with respect to the number
     of items on the right hand side of the rule




                  Prof. Pier Luca Lanzi
Rule Generation for Apriori Algorithm                            29




                                          ABCD=>{ }
Low
Confidence
Rule
               BCD=>A           ACD=>B                ABD=>C        ABC=>D




      CD=>AB     BD=>AC         BC=>AD          AD=>BC         AC=>BD        AB=>CD




               D=>ABC           C=>ABD                B=>ACD        A=>BCD
Pruned
Rules

                        Prof. Pier Luca Lanzi
Rule Generation for Apriori Algorithm               30



 Candidate rule is generated by merging two rules that share
  the same prefix
  in the rule consequent

 join(CD=>AB,BD=>AC)                      CD=>AB        BD=>AC
  would produce the candidate
  rule D => ABC

 Prune rule D=>ABC if its
  subset AD=>BC does not have
  high confidence
                                                D=>ABC




                   Prof. Pier Luca Lanzi
Effect of Support Distribution                31



   Many real data sets have skewed support distribution




Support
distribution of
a retail data set




                    Prof. Pier Luca Lanzi
How to set the appropriate minsup?              32



 If minsup is set too high, we could miss itemsets involving
  interesting rare items (e.g., expensive products)

 If minsup is set too low, it is computationally expensive
  and the number of itemsets is very large

 A single minimum support threshold may not be effective




                   Prof. Pier Luca Lanzi

More Related Content

What's hot

First order predicate logic (fopl)
First order predicate logic (fopl)First order predicate logic (fopl)
First order predicate logic (fopl)chauhankapil
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1Amrinder Arora
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 
Association rule mining
Association rule miningAssociation rule mining
Association rule miningAcad
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Types of machine learning
Types of machine learningTypes of machine learning
Types of machine learningHimaniAloona
 
Diffie hellman key exchange algorithm
Diffie hellman key exchange algorithmDiffie hellman key exchange algorithm
Diffie hellman key exchange algorithmSunita Kharayat
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision treeKrish_ver2
 

What's hot (20)

Text MIning
Text MIningText MIning
Text MIning
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
3. mining frequent patterns
3. mining frequent patterns3. mining frequent patterns
3. mining frequent patterns
 
First order predicate logic (fopl)
First order predicate logic (fopl)First order predicate logic (fopl)
First order predicate logic (fopl)
 
Optimal binary search tree dynamic programming
Optimal binary search tree   dynamic programmingOptimal binary search tree   dynamic programming
Optimal binary search tree dynamic programming
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1
 
Graph mining ppt
Graph mining pptGraph mining ppt
Graph mining ppt
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Introduction to competitive machine learning
Introduction to competitive machine learningIntroduction to competitive machine learning
Introduction to competitive machine learning
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Chap4
Chap4Chap4
Chap4
 
Types of machine learning
Types of machine learningTypes of machine learning
Types of machine learning
 
Learning in AI
Learning in AILearning in AI
Learning in AI
 
Diffie hellman key exchange algorithm
Diffie hellman key exchange algorithmDiffie hellman key exchange algorithm
Diffie hellman key exchange algorithm
 
Nearest neighbor search
Nearest neighbor searchNearest neighbor search
Nearest neighbor search
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
DBMS Unit - 5 - Query processing and optimization
DBMS Unit - 5 - Query processing and optimizationDBMS Unit - 5 - Query processing and optimization
DBMS Unit - 5 - Query processing and optimization
 

Viewers also liked

Lecture 02 Machine Learning For Data Mining
Lecture 02 Machine Learning For Data MiningLecture 02 Machine Learning For Data Mining
Lecture 02 Machine Learning For Data MiningPier Luca Lanzi
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14rahulmath80
 
Alanoud alqoufi inductive learning
Alanoud alqoufi inductive learningAlanoud alqoufi inductive learning
Alanoud alqoufi inductive learningAlanoud Alqoufi
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageGuy De Pauw
 
1. Introduction to Association Rule 2. Frequent Item Set Mining 3. Market Bas...
1. Introduction to Association Rule2. Frequent Item Set Mining3. Market Bas...1. Introduction to Association Rule2. Frequent Item Set Mining3. Market Bas...
1. Introduction to Association Rule 2. Frequent Item Set Mining 3. Market Bas...Surabhi Gosavi
 
1.9.association mining 1
1.9.association mining 11.9.association mining 1
1.9.association mining 1Krish_ver2
 
Videogame Design and Programming: Conferenza d'Ateneo 18 Maggio 2011
Videogame Design and Programming: Conferenza d'Ateneo 18 Maggio 2011Videogame Design and Programming: Conferenza d'Ateneo 18 Maggio 2011
Videogame Design and Programming: Conferenza d'Ateneo 18 Maggio 2011Pier Luca Lanzi
 
Fitness Inheritance in Evolutionary and
Fitness Inheritance in Evolutionary andFitness Inheritance in Evolutionary and
Fitness Inheritance in Evolutionary andPier Luca Lanzi
 
GECCO-2014 Learning Classifier Systems: A Gentle Introduction
GECCO-2014 Learning Classifier Systems: A Gentle IntroductionGECCO-2014 Learning Classifier Systems: A Gentle Introduction
GECCO-2014 Learning Classifier Systems: A Gentle IntroductionPier Luca Lanzi
 
Evolving Rules to Solve Problems: The Learning Classifier Systems Way
Evolving Rules to Solve Problems: The Learning Classifier Systems WayEvolving Rules to Solve Problems: The Learning Classifier Systems Way
Evolving Rules to Solve Problems: The Learning Classifier Systems WayPier Luca Lanzi
 
DMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to ClusteringDMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to ClusteringPier Luca Lanzi
 
DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionPier Luca Lanzi
 
DMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationDMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationPier Luca Lanzi
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningPier Luca Lanzi
 
Machine Learning and Data Mining: 02 Machine Learning
Machine Learning and Data Mining: 02 Machine LearningMachine Learning and Data Mining: 02 Machine Learning
Machine Learning and Data Mining: 02 Machine LearningPier Luca Lanzi
 
DMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationDMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationPier Luca Lanzi
 
Idea Generation and Conceptualization
Idea Generation and ConceptualizationIdea Generation and Conceptualization
Idea Generation and ConceptualizationPier Luca Lanzi
 
DMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationDMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationPier Luca Lanzi
 
DMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationDMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationPier Luca Lanzi
 

Viewers also liked (20)

Lecture 02 Machine Learning For Data Mining
Lecture 02 Machine Learning For Data MiningLecture 02 Machine Learning For Data Mining
Lecture 02 Machine Learning For Data Mining
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14
 
Alanoud alqoufi inductive learning
Alanoud alqoufi inductive learningAlanoud alqoufi inductive learning
Alanoud alqoufi inductive learning
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
 
1. Introduction to Association Rule 2. Frequent Item Set Mining 3. Market Bas...
1. Introduction to Association Rule2. Frequent Item Set Mining3. Market Bas...1. Introduction to Association Rule2. Frequent Item Set Mining3. Market Bas...
1. Introduction to Association Rule 2. Frequent Item Set Mining 3. Market Bas...
 
1.9.association mining 1
1.9.association mining 11.9.association mining 1
1.9.association mining 1
 
Videogame Design and Programming: Conferenza d'Ateneo 18 Maggio 2011
Videogame Design and Programming: Conferenza d'Ateneo 18 Maggio 2011Videogame Design and Programming: Conferenza d'Ateneo 18 Maggio 2011
Videogame Design and Programming: Conferenza d'Ateneo 18 Maggio 2011
 
Fitness Inheritance in Evolutionary and
Fitness Inheritance in Evolutionary andFitness Inheritance in Evolutionary and
Fitness Inheritance in Evolutionary and
 
GECCO-2014 Learning Classifier Systems: A Gentle Introduction
GECCO-2014 Learning Classifier Systems: A Gentle IntroductionGECCO-2014 Learning Classifier Systems: A Gentle Introduction
GECCO-2014 Learning Classifier Systems: A Gentle Introduction
 
Evolving Rules to Solve Problems: The Learning Classifier Systems Way
Evolving Rules to Solve Problems: The Learning Classifier Systems WayEvolving Rules to Solve Problems: The Learning Classifier Systems Way
Evolving Rules to Solve Problems: The Learning Classifier Systems Way
 
DMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to ClusteringDMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to Clustering
 
DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course Introduction
 
DMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationDMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data Representation
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data Mining
 
Machine Learning and Data Mining: 02 Machine Learning
Machine Learning and Data Mining: 02 Machine LearningMachine Learning and Data Mining: 02 Machine Learning
Machine Learning and Data Mining: 02 Machine Learning
 
Course Organization
Course OrganizationCourse Organization
Course Organization
 
DMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationDMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data Exploration
 
Idea Generation and Conceptualization
Idea Generation and ConceptualizationIdea Generation and Conceptualization
Idea Generation and Conceptualization
 
DMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationDMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to Classification
 
DMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationDMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data Preparation
 

Similar to Lecture 04 Association Rules Basics

DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesPier Luca Lanzi
 
DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesPier Luca Lanzi
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdfWailaBaba
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data miningSulman Ahmed
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmhktripathy
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptraju980973
 
ASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptxASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptxSherishJaved
 
BIM Data Mining Unit4 by Tekendra Nath Yogi
 BIM Data Mining Unit4 by Tekendra Nath Yogi BIM Data Mining Unit4 by Tekendra Nath Yogi
BIM Data Mining Unit4 by Tekendra Nath YogiTekendra Nath Yogi
 
Data Mining Concepts 15061
Data Mining Concepts 15061Data Mining Concepts 15061
Data Mining Concepts 15061badirh
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Conceptsdataminers.ir
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxnikshaikh786
 
WSU Master Goats Presentation 10 28 07
WSU Master Goats Presentation 10 28 07WSU Master Goats Presentation 10 28 07
WSU Master Goats Presentation 10 28 07Eric Bowman
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket AnalysisSandeep Prasad
 

Similar to Lecture 04 Association Rules Basics (16)

DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rules
 
DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association Rules
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdf
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.ppt
 
ASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptxASSOCIATION Rule plus MArket basket Analysis.pptx
ASSOCIATION Rule plus MArket basket Analysis.pptx
 
Data mining arm-2009-v0
Data mining arm-2009-v0Data mining arm-2009-v0
Data mining arm-2009-v0
 
Apriori data mining in the cloud
Apriori data mining in the cloudApriori data mining in the cloud
Apriori data mining in the cloud
 
BIM Data Mining Unit4 by Tekendra Nath Yogi
 BIM Data Mining Unit4 by Tekendra Nath Yogi BIM Data Mining Unit4 by Tekendra Nath Yogi
BIM Data Mining Unit4 by Tekendra Nath Yogi
 
Data Mining Concepts 15061
Data Mining Concepts 15061Data Mining Concepts 15061
Data Mining Concepts 15061
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
MODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptxMODULE 5 _ Mining frequent patterns and associations.pptx
MODULE 5 _ Mining frequent patterns and associations.pptx
 
WSU Master Goats Presentation 10 28 07
WSU Master Goats Presentation 10 28 07WSU Master Goats Presentation 10 28 07
WSU Master Goats Presentation 10 28 07
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 

More from Pier Luca Lanzi

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i VideogiochiPier Luca Lanzi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiPier Luca Lanzi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomePier Luca Lanzi
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaPier Luca Lanzi
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Pier Luca Lanzi
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationPier Luca Lanzi
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationPier Luca Lanzi
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningPier Luca Lanzi
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningPier Luca Lanzi
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationPier Luca Lanzi
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringPier Luca Lanzi
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringPier Luca Lanzi
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringPier Luca Lanzi
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesPier Luca Lanzi
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsPier Luca Lanzi
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesPier Luca Lanzi
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesPier Luca Lanzi
 

More from Pier Luca Lanzi (20)

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei Videogiochi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning Welcome
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparation
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph mining
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text mining
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluation
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clustering
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 Clustering
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensembles
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethods
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rules
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision trees
 

Recently uploaded

8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCRashishs7044
 
Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03DallasHaselhorst
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaoncallgirls2057
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCRashishs7044
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menzaictsugar
 
India Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportIndia Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportMintel Group
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxMarkAnthonyAurellano
 
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / NcrCall Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncrdollysharma2066
 
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadIslamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadAyesha Khan
 
IoT Insurance Observatory: summary 2024
IoT Insurance Observatory:  summary 2024IoT Insurance Observatory:  summary 2024
IoT Insurance Observatory: summary 2024Matteo Carbone
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfJos Voskuil
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMintel Group
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis UsageNeil Kimberley
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...ssuserf63bd7
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdfKhaled Al Awadi
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Anamaria Contreras
 
Organizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessOrganizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessSeta Wicaksana
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesKeppelCorporation
 

Recently uploaded (20)

8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
 
Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03Cybersecurity Awareness Training Presentation v2024.03
Cybersecurity Awareness Training Presentation v2024.03
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR
 
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu MenzaYouth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
Youth Involvement in an Innovative Coconut Value Chain by Mwalimu Menza
 
India Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportIndia Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample Report
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
 
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / NcrCall Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
Call Girls in DELHI Cantt, ( Call Me )-8377877756-Female Escort- In Delhi / Ncr
 
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadIslamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
 
IoT Insurance Observatory: summary 2024
IoT Insurance Observatory:  summary 2024IoT Insurance Observatory:  summary 2024
IoT Insurance Observatory: summary 2024
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdf
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 Edition
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...
 
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.
 
Organizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessOrganizational Structure Running A Successful Business
Organizational Structure Running A Successful Business
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation Slides
 

Lecture 04 Association Rules Basics

  • 1. Association Rule Basics Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
  • 2. Lecture outline 2  What is association rule mining?  Frequent itemsets, support, and confidence  Mining association rules  The “Apriori” algorithm  Rule generation Prof. Pier Luca Lanzi
  • 3. Example of market-basket transactions 3 Bread Bread Steak Jam Peanuts Jam Jam Soda Milk Soda Soda Peanuts Fruit Chips Chips Milk Jam Milk Bread Fruit Fruit Jam Fruit Fruit Fruit Soda Soda Soda Peanuts Chips Chips Peanuts Cheese Milk Milk Milk Yogurt Bread Prof. Pier Luca Lanzi
  • 4. What is association mining? 4  Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories  Applications Basket data analysis Cross-marketing Catalog design … Prof. Pier Luca Lanzi
  • 5. What is association rule mining? 5 TID Items Examples 1 Bread, Peanuts, Milk, Fruit, Jam {bread} {milk} 2 Bread, Jam, Soda, Chips, Milk, Fruit {soda} {chips} 3 Steak, Jam, Soda, Chips, Bread {bread} {jam} 4 Jam, Soda, Peanuts, Milk, Fruit 5 Jam, Soda, Chips, Milk, Bread 6 Fruit, Soda, Chips, Milk 7 Fruit, Soda, Peanuts, Milk 8 Fruit, Peanuts, Cheese, Yogurt  Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction Prof. Pier Luca Lanzi
  • 6. Definition: Frequent Itemset 6  Itemset TID Items A collection of one or more items, e.g., {milk, bread, jam} 1 Bread, Peanuts, Milk, Fruit, Jam k-itemset, an itemset that 2 Bread, Jam, Soda, Chips, Milk, Fruit contains k items  Support count ( ) 3 Steak, Jam, Soda, Chips, Bread Frequency of occurrence 4 Jam, Soda, Peanuts, Milk, Fruit of an itemset ({Milk, Bread}) = 3 5 Jam, Soda, Chips, Milk, Bread ({Soda, Chips}) = 4 6 Fruit, Soda, Chips, Milk  Support Fraction of transactions 7 Fruit, Soda, Peanuts, Milk that contain an itemset 8 Fruit, Peanuts, Cheese, Yogurt s({Milk, Bread}) = 3/8 s({Soda, Chips}) = 4/8  Frequent Itemset An itemset whose support is greater than or equal to a minsup threshold Prof. Pier Luca Lanzi
  • 7. What is an association rule? 7  Implication of the form X Y, where X and Y are itemsets  Example, {bread} {milk}  Rule Evaluation Metrics, Suppor & Confidence  Support (s) Fraction of transactions that contain both X and Y  Confidence (c) Measures how often items in Y appear in transactions that contain X Prof. Pier Luca Lanzi
  • 8. Support and Confidence 8 Customer buys both Customer buys Milk Customer buys Bread Prof. Pier Luca Lanzi
  • 9. What is the goal? 9  Given a set of transactions T, the goal of association rule mining is to find all rules having support ≥ minsup threshold confidence ≥ minconf threshold  Brute-force approach: List all possible association rules Compute the support and confidence for each rule Prune rules that fail the minsup and minconf thresholds  Brute-force approach is computationally prohibitive! Prof. Pier Luca Lanzi
  • 10. Mining Association Rules 10 {Bread, Jam} {Milk} s=0.4 c=0.75 {Milk, Jam} {Bread} s=0.4 c=0.75 {Bread} {Milk, Jam} s=0.4 c=0.75 {Jam} {Bread, Milk} s=0.4 c=0.6 {Milk} {Bread, Jam} s=0.4 c=0.5  All the above rules are binary partitions of the same itemset: {Milk, Bread, Jam}  Rules originating from the same itemset have identical support but can have different confidence  We can decouple the support and confidence requirements! Prof. Pier Luca Lanzi
  • 11. Mining Association Rules: 11 Two Step Approach  Frequent Itemset Generation Generate all itemsets whose support minsup  Rule Generation Generate high confidence rules from frequent itemset Each rule is a binary partitioning of a frequent itemset  Frequent itemset generation is computationally expensive Prof. Pier Luca Lanzi
  • 12. Frequent Itemset Generation 13 null A B C D E AB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE Given d items, there are 2d possible ABCDE candidate itemsets Prof. Pier Luca Lanzi
  • 13. Frequent Itemset Generation 14  Brute-force approach: Each itemset in the lattice is a candidate frequent itemset Count the support of each candidate by scanning the database TID Items 1 Bread, Peanuts, Milk, Fruit, Jam candidates 2 Bread, Jam, Soda, Chips, Milk, Fruit 3 Steak, Jam, Soda, Chips, Bread N 4 Jam, Soda, Peanuts, Milk, Fruit 5 Jam, Soda, Chips, Milk, Bread M 6 Fruit, Soda, Chips, Milk 7 Fruit, Soda, Peanuts, Milk 8 Fruit, Peanuts, Cheese, Yogurt w Match each transaction against every candidate Complexity ~ O(NMw) => Expensive since M = 2d Prof. Pier Luca Lanzi
  • 14. Computational Complexity 15  Given d unique items: Total number of itemsets = 2d Total number of possible association rules: For d=6, there are 602 rules Prof. Pier Luca Lanzi
  • 15. Frequent Itemset Generation Strategies 16  Reduce the number of candidates (M) Complete search: M=2d Use pruning techniques to reduce M  Reduce the number of transactions (N) Reduce size of N as the size of itemset increases  Reduce the number of comparisons (NM) Use efficient data structures to store the candidates or transactions No need to match every candidate against every transaction Prof. Pier Luca Lanzi
  • 16. Reducing the Number of Candidates 18  Apriori principle If an itemset is frequent, then all of its subsets must also be frequent  Apriori principle holds due to the following property of the support measure:  Support of an itemset never exceeds the support of its subsets  This is known as the anti-monotone property of support Prof. Pier Luca Lanzi
  • 17. Illustrating Apriori Principle 19 null A B C D E AB AC AD AE BC BD BE CD CE DE Found to be Infrequent ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE Pruned supersets ABCDE Prof. Pier Luca Lanzi
  • 18. How does the Apriori principle work? 20 Items (1-itemsets) Item Count Bread 4 Peanuts 4 2-itemsets Milk 6 2-Itemset Count Fruit 6 Bread, Jam 4 Jam 5 Peanuts, Fruit 4 Soda 6 Milk, Fruit 5 Chips 4 Milk, Jam 4 Steak 1 Milk, Soda 5 Cheese 1 Fruit, Soda 4 Yogurt 1 Jam, Soda 4 3-itemsets Soda, Chips 4 Minimum Support = 4 3-Itemset Count Milk, Fruit, Soda 4 Prof. Pier Luca Lanzi
  • 19. Apriori Algorithm 21  Let k=1  Generate frequent itemsets of length 1  Repeat until no new frequent itemsets are identified Generate length (k+1) candidate itemsets from length k frequent itemsets Prune candidate itemsets containing subsets of length k that are infrequent Count the support of each candidate by scanning the DB Eliminate candidates that are infrequent, leaving only those that are frequent Prof. Pier Luca Lanzi
  • 20. The Apriori Algorithm 22 Ck: Candidate itemset of size k Lk : frequent itemset of size k L1 = {frequent items}; for (k = 1; Lk != ; k++) do begin Ck+1 = candidates generated from Lk; for each transaction t in database do increment the count of all candidates in Ck+1 that are contained in t Lk+1 = candidates in Ck+1 with min_support end return k Lk; Prof. Pier Luca Lanzi
  • 21. The Apriori Algorithm 23  Join Step Ck is generated by joining Lk-1 with itself  Prune Step Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset Prof. Pier Luca Lanzi
  • 22. Efficient Implementation of Apriori in SQL 24  Hard to get good performance out of pure SQL (SQL-92) based approaches alone  Make use of object-relational extensions like UDFs, BLOBs, Table functions etc. Get orders of magnitude improvement  S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. In SIGMOD’98 Prof. Pier Luca Lanzi
  • 23. How to Improve 25 Apriori’s Efficiency  Hash-based itemset counting: A k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent  Transaction reduction: A transaction that does not contain any frequent k- itemset is useless in subsequent scans  Partitioning: Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB  Sampling: mining on a subset of given data, lower support threshold + a method to determine the completeness  Dynamic itemset counting: add new candidate itemsets only when all of their subsets are estimated to be frequent Prof. Pier Luca Lanzi
  • 24. Rule Generation 26  Given a frequent itemset L, find all non-empty subsets f L such that f L – f satisfies the minimum confidence requirement  If {A,B,C,D} is a frequent itemset, candidate rules: ABC D, ABD C, ACD B, BCD A, A BCD, B ACD, C ABD, D ABC, AB CD, AC BD, AD BC, BC AD, BD AC, CD AB  If |L| = k, then there are 2k – 2 candidate association rules (ignoring L and L) Prof. Pier Luca Lanzi
  • 25. How to efficiently generate rules from 28 frequent itemsets?  Confidence does not have an anti-monotone property  c(ABC D) can be larger or smaller than c(AB D)  But confidence of rules generated from the same itemset has an anti-monotone property e.g., L = {A,B,C,D}: c(ABC D) c(AB CD) c(A BCD) Confidence is anti-monotone with respect to the number of items on the right hand side of the rule Prof. Pier Luca Lanzi
  • 26. Rule Generation for Apriori Algorithm 29 ABCD=>{ } Low Confidence Rule BCD=>A ACD=>B ABD=>C ABC=>D CD=>AB BD=>AC BC=>AD AD=>BC AC=>BD AB=>CD D=>ABC C=>ABD B=>ACD A=>BCD Pruned Rules Prof. Pier Luca Lanzi
  • 27. Rule Generation for Apriori Algorithm 30  Candidate rule is generated by merging two rules that share the same prefix in the rule consequent  join(CD=>AB,BD=>AC) CD=>AB BD=>AC would produce the candidate rule D => ABC  Prune rule D=>ABC if its subset AD=>BC does not have high confidence D=>ABC Prof. Pier Luca Lanzi
  • 28. Effect of Support Distribution 31  Many real data sets have skewed support distribution Support distribution of a retail data set Prof. Pier Luca Lanzi
  • 29. How to set the appropriate minsup? 32  If minsup is set too high, we could miss itemsets involving interesting rare items (e.g., expensive products)  If minsup is set too low, it is computationally expensive and the number of itemsets is very large  A single minimum support threshold may not be effective Prof. Pier Luca Lanzi