Introduction to Machine
       Learning
                    Lecture 13
    Introduction to Association Rules

                  Albert Orriols i Puig
                 aorriols@salle.url.edu

        Artificial Intelligence – Machine Learning
            Enginyeria i Arquitectura La Salle
                   Universitat Ramon Llull
Recap of Lecture 5-12

                          LET’S START WITH DATA
                             CLASSIFICATION




                                                               Slide 2
Artificial Intelligence                     Machine Learning
Recap of Lecture 5-12
                  Data Set          Classification Model        How?




We have seen four different types of approaches to classification:
         • Decision trees (C4.5)
         • Instance-based algorithms (kNN & CBR)
         • Bayesian classifiers (Naïve Bayes)
         • Neural Networks (Perceptron, Adaline, Madaline, SVM)

Today’s Agenda


        Introduction to Association Rules
        A Taxonomy of Association Rules
        Measures of Interest
        Apriori




Introduction to AR
        Ideas come from market basket analysis (MBA)
                Let’s go shopping!

                Customer1: milk, eggs, sugar, bread
                Customer2: milk, eggs, cereal, bread
                Customer3: eggs, sugar

                What do my customers buy? Which products are bought together?
                Aim: Find associations and correlations between the different
                items that customers place in their shopping basket
Introduction to AR
        Formalizing the problem a little bit
                Transaction database T: a set of transactions T = {t1, t2, …, tn}
                Each transaction contains a set of items (an itemset)
                An itemset is a collection of items I = {i1, i2, …, im}


        General aim:
                Find frequent/interesting patterns, associations, correlations, or
                causal structures among sets of items or elements in
                databases or other information repositories.
                Express these relationships in terms of association rules
                          X ⇒ Y



Example of AR

       TID        Items
       T1         bread, jelly, peanut-butter
       T2         bread, peanut-butter
       T3         bread, milk, peanut-butter
       T4         beer, bread
       T5         beer, milk

        Examples:
                bread ⇒ peanut-butter
                beer ⇒ bread

        Frequent itemsets: Items that frequently appear together
                I = {bread, peanut-butter}
                I = {beer, bread}


What’s an Interesting Rule?
        TID   Items
        T1    bread, jelly, peanut-butter
        T2    bread, peanut-butter
        T3    bread, milk, peanut-butter
        T4    beer, bread
        T5    beer, milk

        Support count (σ)
                Frequency of occurrence of an itemset
                          σ({bread, peanut-butter}) = 3
                          σ({beer, bread}) = 1
        Support
                Fraction of transactions that contain an itemset
                          s({bread, peanut-butter}) = 3/5
                          s({beer, bread}) = 1/5

        Frequent itemset
                An itemset whose support is greater than or equal to a
                minimum support threshold (minsup)
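
The three definitions above can be checked with a few lines of Python. This is a minimal sketch, assuming transactions are stored as Python sets; the names `TRANSACTIONS`, `support_count`, `support` and `is_frequent` are illustrative and do not come from the slides.

```python
# Transactions T1..T5 from the example table, each stored as a set of items.
TRANSACTIONS = {
    "T1": {"bread", "jelly", "peanut-butter"},
    "T2": {"bread", "peanut-butter"},
    "T3": {"bread", "milk", "peanut-butter"},
    "T4": {"beer", "bread"},
    "T5": {"beer", "milk"},
}


def support_count(itemset, transactions):
    """sigma(itemset): number of transactions that contain every item."""
    items = set(itemset)
    return sum(1 for t in transactions.values() if items <= t)


def support(itemset, transactions):
    """s(itemset): fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, minsup):
    """True if the support reaches the (fractional) minsup threshold."""
    return support(itemset, transactions) >= minsup


sigma_bread_pb = support_count({"bread", "peanut-butter"}, TRANSACTIONS)
s_beer_bread = support({"beer", "bread"}, TRANSACTIONS)
```

With a hypothetical threshold of minsup = 0.5, `{bread, peanut-butter}` would count as frequent and `{beer, bread}` would not.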
What’s an Interesting Rule?
        An association rule is an implication between two itemsets
                X ⇒ Y

        TID   Items
        T1    bread, jelly, peanut-butter
        T2    bread, peanut-butter
        T3    bread, milk, peanut-butter
        T4    beer, bread
        T5    beer, milk

        Many measures of interest.
        The two most used are:
                Support (s)
                   The frequency of occurrence of the rule,
                   i.e., the fraction of transactions that
                   contain both X and Y
                          s = σ(X ∪ Y) / (# of transactions)
                Confidence (c)
                   The strength of the association,
                   i.e., a measure of how often items in Y
                   appear in transactions that contain X
                          c = σ(X ∪ Y) / σ(X)
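
As a sketch of how the two measures are computed, the snippet below evaluates s and c for a rule X ⇒ Y on the five example transactions. The helper names (`sigma`, `rule_metrics`) are assumptions made for illustration, and returning a confidence of 0 when σ(X) = 0 is a convention chosen here, not something the slides specify.

```python
# Example transactions T1..T5, one set of items per transaction.
TRANSACTIONS = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]


def sigma(itemset, transactions):
    """Support count: transactions containing all items of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def rule_metrics(x, y, transactions):
    """Return (support, confidence) of the rule x => y."""
    both = sigma(set(x) | set(y), transactions)
    s = both / len(transactions)
    sigma_x = sigma(x, transactions)
    # Assumed convention: confidence is 0 when the antecedent never occurs.
    c = both / sigma_x if sigma_x else 0.0
    return s, c


RULES = [
    ({"bread"}, {"peanut-butter"}),
    ({"peanut-butter"}, {"bread"}),
    ({"beer"}, {"bread"}),
    ({"jelly"}, {"milk"}),
]

for x, y in RULES:
    s, c = rule_metrics(x, y, TRANSACTIONS)
    print(f"{sorted(x)} => {sorted(y)}: s={s:.2f}, c={c:.2f}")
```

Note that support is symmetric in X and Y while confidence is not: swapping antecedent and consequent keeps s but generally changes c.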
Interestingness of Rules
        TID   Items
        T1    bread, jelly, peanut-butter
        T2    bread, peanut-butter
        T3    bread, milk, peanut-butter
        T4    beer, bread
        T5    beer, milk

        Rule                           s       c
        bread ⇒ peanut-butter         0.60    0.75
        peanut-butter ⇒ bread         0.60    1.00
        beer ⇒ bread                  0.20    0.50
        peanut-butter ⇒ jelly         0.20    0.33
        jelly ⇒ peanut-butter         0.20    1.00
        jelly ⇒ milk                  0.00    0.00


        Many other interestingness measures exist
                The methods presented here are based on these two
                measures



Types of AR
        Binary association rules:
                bread ⇒ peanut-butter


        Quantitative association rules:
                weight in [70kg – 90kg] ⇒ height in [170cm – 190cm]


        Fuzzy association rules:
                weight in HEAVY ⇒ height in TALL


        Let’s start from the beginning
                Binary association rules – Apriori

Apriori
        This is the most influential AR miner
        It consists of two steps
                1. Generate all frequent itemsets whose support ≥ minsup
                2. Use frequent itemsets to generate association rules

        So, let’s pay attention to the first step




Apriori
                                              null




               A                 B                C             D            E




AB              AC         AD        AE     BC          BD     BE     CD    CE        DE




ABC            ABD         ABE       ACD    ACE        ADE     BCD    BCE   BDE       CDE




            ABCD                 ABCE        ABDE              ACDE         BCDE




                                            ABCDE

                     Given d items, we have 2^d possible itemsets.
                           Do I have to generate them all?
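
To make the size of this lattice concrete, here is a small sketch that enumerates every itemset over the five items A to E, level by level. The function name `all_itemsets` is an illustrative assumption; brute-force enumeration like this is exactly what Apriori tries to avoid.

```python
from itertools import combinations


def all_itemsets(items):
    """Yield every subset of `items`, from the empty set up to the full set."""
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


lattice = list(all_itemsets("ABCDE"))

# Number of itemsets on each level of the lattice (sizes 0..5).
level_sizes = [sum(1 for s in lattice if len(s) == k) for k in range(6)]

print(len(lattice))   # total number of itemsets, 2**5
print(level_sizes)    # itemsets per level
```

The per-level counts match the rows of the diagram (1, 5, 10, 10, 5, 1), and the total doubles with every additional item.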
Apriori
        Let’s avoid expanding the whole graph
        Key idea:
                Downward closure property: Any subset of a frequent itemset
                is also a frequent itemset


        Therefore, the algorithm iteratively does:
                Create itemsets
                Only continue exploration of those whose support ≥ minsup
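
The downward closure property follows directly from the definition of the support count. A short derivation, using the σ and s notation of the earlier slides (n denotes the number of transactions):

```latex
% Every transaction that contains Y also contains each subset X of Y, so
\[
X \subseteq Y \;\Longrightarrow\; \sigma(X) \ge \sigma(Y)
\;\Longrightarrow\; s(X) = \frac{\sigma(X)}{n} \ge \frac{\sigma(Y)}{n} = s(Y).
\]
% Hence a frequent Y forces all of its subsets to be frequent:
\[
s(Y) \ge \mathit{minsup} \;\Longrightarrow\; s(X) \ge \mathit{minsup}
\quad \text{for all } X \subseteq Y .
\]
% Contrapositive, which is what justifies pruning:
\[
s(X) < \mathit{minsup} \;\Longrightarrow\; s(Y) < \mathit{minsup}
\quad \text{for all } Y \supseteq X .
\]
```

The contrapositive is the form used for pruning: once an itemset is found to be infrequent, none of its supersets needs to be explored.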




Example Itemset Generation
                                             null
        Infrequent
          itemset

               A                 B               C             D            E




AB              AC         AD        AE    BC          BD     BE     CD    CE        DE




ABC            ABD         ABE       ACD   ACE        ADE     BCD    BCE   BDE       CDE




            ABCD                 ABCE       ABDE              ACDE         BCDE




                                            ABCDE

                     Given d items, we have 2^d possible itemsets.
                           Do I have to generate them all?
Recovering the Example
        TID    Items
        T1     bread, jelly, peanut-butter
        T2     bread, peanut-butter
        T3     bread, milk, peanut-butter
        T4     beer, bread
        T5     beer, milk

        Minimum support count = 3

        1-itemsets
        Item              count
        bread               4
        peanut-b            3
        jelly               1
        milk                2
        beer                2

        2-itemsets
        Item                    count
        bread, peanut-b           3
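
The two levels of this example can be recomputed directly from the transaction table. The sketch below assumes a minimum support count of 3, as stated on the slide; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Example transactions T1..T5.
TRANSACTIONS = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]
MIN_COUNT = 3  # minimum support expressed as a count

# Level 1: count every single item, keep the frequent ones.
item_counts = Counter(item for t in TRANSACTIONS for item in t)
frequent_items = sorted(i for i, n in item_counts.items() if n >= MIN_COUNT)

# Level 2: only pairs built from frequent items are counted at all.
pair_counts = {
    pair: sum(1 for t in TRANSACTIONS if set(pair) <= t)
    for pair in combinations(frequent_items, 2)
}
frequent_pairs = {p: n for p, n in pair_counts.items() if n >= MIN_COUNT}

print(dict(item_counts))
print(frequent_pairs)
```

Counting from the table gives bread 4, peanut-butter 3, milk 2, beer 2 and jelly 1, so only bread and peanut-butter survive the first level and a single 2-itemset has to be counted in the second.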




Apriori Algorithm
        k = 1
        Generate frequent itemsets of length 1
        Repeat until no new frequent itemsets are found
                k := k+1
                Generate candidate itemsets of size k from the frequent (k-1)-itemsets
                Compute the support of each candidate by scanning the DB
                Keep only the candidates whose support ≥ minsup




Apriori Algorithm
Algorithm Apriori(T)
    C1 ← init-pass(T);
    F1 ← {f | f ∈ C1, f.count/n ≥ minsup}; // n: no. of transactions in T
    for (k = 2; Fk-1 ≠ ∅; k++) do
        Ck ← candidate-gen(Fk-1);
        for each transaction t ∈ T do
            for each candidate c ∈ Ck do
                if c is contained in t then
                    c.count++;
            end
        end
        Fk ← {c ∈ Ck | c.count/n ≥ minsup}
    end
return F ← ∪k Fk;

Apriori Algorithm
Function candidate-gen(Fk-1)
    Ck ← ∅;
    forall f1, f2 ∈ Fk-1
        with f1 = {i1, …, ik-2, ik-1}
        and f2 = {i1, …, ik-2, i’k-1}
        and ik-1 < i’k-1 do
        c ← {i1, …, ik-1, i’k-1};       // join f1 and f2
        Ck ← Ck ∪ {c};
        for each (k-1)-subset s of c do
            if (s ∉ Fk-1) then
                delete c from Ck;          // prune
        end
    end
    return Ck;
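
The join and prune steps of candidate-gen can be sketched in Python as follows. This is one possible rendering, assuming itemsets are given as collections of mutually comparable items (here single-letter strings); the function name `candidate_gen` mirrors the pseudocode, and everything else is an illustrative choice.

```python
from itertools import combinations


def candidate_gen(prev_frequent):
    """Build size-k candidates from the frequent (k-1)-itemsets.

    Join: merge two itemsets that share their first k-2 items (in sorted
    order) and differ in the last one. Prune: drop a candidate if any of
    its (k-1)-subsets is not frequent.
    """
    prev = {frozenset(f) for f in prev_frequent}
    ordered = sorted(tuple(sorted(f)) for f in prev)
    candidates = set()
    for f1, f2 in combinations(ordered, 2):
        # Sorted order guarantees f1[-1] < f2[-1] when the prefixes match.
        if f1[:-1] == f2[:-1]:
            c = frozenset(f1) | frozenset(f2)                      # join
            subsets = combinations(c, len(f1))
            if all(frozenset(s) in prev for s in subsets):         # prune
                candidates.add(c)
    return candidates


# Hypothetical input: four frequent 2-itemsets over the items A, B, C, E.
F2 = [{"A", "C"}, {"B", "C"}, {"B", "E"}, {"C", "E"}]
C3 = candidate_gen(F2)
print(C3)
```

For this input only {B, C} and {B, E} share a prefix, and their join {B, C, E} survives pruning because {C, E} is also frequent.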

Example of Apriori Run
Database TDB
        Tid      Items
        10       A, C, D
        20       B, C, E
        30       A, B, C, E
        40       B, E

1st scan → C1
        Itemset     sup
        {A}          2
        {B}          3
        {C}          3
        {D}          1
        {E}          3

L1
        Itemset     sup
        {A}          2
        {B}          3
        {C}          3
        {E}          3

C2 (generated from L1)
        Itemset
        {A, B}
        {A, C}
        {A, E}
        {B, C}
        {B, E}
        {C, E}

2nd scan → C2 with counts
        Itemset     sup
        {A, B}       1
        {A, C}       2
        {A, E}       1
        {B, C}       2
        {B, E}       3
        {C, E}       2

L2
        Itemset     sup
        {A, C}       2
        {B, C}       2
        {B, E}       3
        {C, E}       2

C3 (generated from L2)
        Itemset
        {B, C, E}

3rd scan → L3
        Itemset     sup
        {B, C, E}    2
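
The run above can be reproduced with a compact level-wise implementation. This is a sketch, not the lecture's reference code: it assumes a minimum support count of 2 (the slide does not state the threshold, but the trace implies it), and for brevity the join step takes the union of any two frequent (k-1)-itemsets that produces a k-itemset, which yields the same candidates as the prefix-based join once pruning is applied.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {k: {itemset: support count}} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # 1st scan: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_count}

    levels = {}
    k = 1
    while frequent:
        levels[k] = frequent
        k += 1
        prev = set(frequent)
        # Join: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Scan the database to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
    return levels


TDB = [
    {"A", "C", "D"},       # Tid 10
    {"B", "C", "E"},       # Tid 20
    {"A", "B", "C", "E"},  # Tid 30
    {"B", "E"},            # Tid 40
]

levels = apriori(TDB, min_count=2)
for k, itemsets in levels.items():
    print(k, {"".join(sorted(s)): n for s, n in itemsets.items()})
```

Unlike the pseudocode, this sketch compares raw counts against `min_count` rather than the ratio count/n against a fractional minsup; the two are equivalent up to scaling by the number of transactions.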
Apriori
        Remember that Apriori consists of two steps
                1. Generate all frequent itemsets whose support ≥ minsup
                2. Use frequent itemsets to generate association rules

        We accomplished step 1, so we have all frequent itemsets
        So, let’s pay attention to the second step




Rule Generation in Apriori
        Given a frequent itemset L
                Find all non-empty proper subsets F of L such that the association
                rule F ⇒ {L-F} satisfies the minimum confidence
                Create the rule F ⇒ {L-F}


        If L = {A,B,C}
                The candidate rules are: AB⇒C, AC⇒B, BC⇒A, A⇒BC,
                B⇒AC, C⇒AB
                In general, there are 2^k - 2 candidate rules, where k is the
                length of the itemset L
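
A brute-force version of this rule generation step can be sketched as below. It assumes the support counts of L and of all its subsets are already available from the frequent-itemset step; the counts used here are those of {B, C, E} and its subsets in the TDB example run, while the function name `generate_rules` and the threshold 0.7 are hypothetical choices for illustration.

```python
from itertools import combinations


def generate_rules(itemset, support_count, min_conf):
    """Return [(F, L - F, confidence)] for every rule meeting min_conf.

    `support_count` maps frozensets to their support counts and must
    contain `itemset` and all of its non-empty proper subsets.
    """
    L = frozenset(itemset)
    rules = []
    for size in range(1, len(L)):            # non-empty proper subsets only
        for combo in combinations(sorted(L), size):
            F = frozenset(combo)
            conf = support_count[L] / support_count[F]
            if conf >= min_conf:
                rules.append((F, L - F, conf))
    return rules


# Support counts of {B, C, E} and its subsets in the TDB example.
SIGMA = {
    frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}

all_rules = generate_rules("BCE", SIGMA, min_conf=0.0)
strong_rules = generate_rules("BCE", SIGMA, min_conf=0.7)

for F, rest, conf in strong_rules:
    print("".join(sorted(F)), "=>", "".join(sorted(rest)), round(conf, 2))
```

With a threshold of 0 the function returns all 2^3 - 2 = 6 candidate rules; with 0.7 only BC⇒E and CE⇒B remain, both with confidence 1.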




Can you Be More Efficient?
        Can we apply the same trick used with support?
                In general, confidence does not have the anti-monotone property
                That is, c(AB⇒D) > c(A⇒D)?
                          Don’t know!


        But the confidence of rules generated from the same itemset
        does have the anti-monotone property
                L = {A,B,C,D}
                          c(ABC⇒D) ≥ c(AB⇒CD) ≥ c(A⇒BCD)
                We can apply this property to prune the rule generation
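
The chain of inequalities for L = {A,B,C,D} can be derived from the definition of confidence, c = σ(X ∪ Y) / σ(X). Here σ(ABCD) is shorthand for σ({A,B,C,D}):

```latex
% All three rules come from the same itemset, so they share the numerator:
\[
c(ABC \Rightarrow D) = \frac{\sigma(ABCD)}{\sigma(ABC)}, \qquad
c(AB \Rightarrow CD) = \frac{\sigma(ABCD)}{\sigma(AB)}, \qquad
c(A \Rightarrow BCD) = \frac{\sigma(ABCD)}{\sigma(A)} .
\]
% Shrinking the antecedent can only increase its support count:
\[
\sigma(ABC) \le \sigma(AB) \le \sigma(A)
\;\Longrightarrow\;
c(ABC \Rightarrow D) \ge c(AB \Rightarrow CD) \ge c(A \Rightarrow BCD) .
\]
```

So if ABC⇒D already fails the minimum confidence, every rule obtained by moving more items from its antecedent to its consequent fails as well and can be skipped. The argument does not carry over to c(AB⇒D) versus c(A⇒D), because those rules come from different itemsets and their numerators differ.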




Example of Efficient Rule Generation

                                              ABCD
   Low
confidence


             ABC⇒D                ABD⇒C                 ACD⇒B             BCD⇒A




AB⇒CD                     AC⇒BD      BC⇒AD              AD⇒BC         BD⇒AC           CD⇒AB




               A⇒BCD               B⇒ACD                      C⇒ABD           D⇒ABC




Challenges in AR Mining
        Challenges
                Apriori scans the database multiple times
                Most often, there is a high number of candidates
                Support counting for candidates can be time-consuming


        Several methods try to improve on these points by
                Reducing the number of scans of the database
                Shrinking the number of candidates
                Counting the support of candidates more efficiently




Next Class



        Advanced topics in association rule mining




                                                     Slide 26
Artificial Intelligence      Machine Learning
Introduction to Machine
       Learning
                    Lecture 13
    Introduction to Association Rules

                  Albert Orriols i Puig
                 aorriols@salle.url.edu
                     i l @ ll       ld

        Artificial Intelligence – Machine Learning
            Enginyeria i Arquitectura La Salle
                gy           q
                   Universitat Ramon Llull

More Related Content

What's hot

Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
MaryamRehman6
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
Archana Swaminathan
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
DataminingTools Inc
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
Knoldus Inc.
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
Kamal Acharya
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
Krish_ver2
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
Gaurav Aggarwal
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
Mohammad Junaid Khan
 
Decision tree
Decision treeDecision tree
Decision tree
Ami_Surati
 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
Knoldus Inc.
 
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
Mahbubur Rahman Shimul
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
amalalhait
 
Mining Association Rules in Large Database
Mining Association Rules in Large DatabaseMining Association Rules in Large Database
Mining Association Rules in Large Database
Er. Nawaraj Bhandari
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
Md. Ariful Hoque
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
Kamal Acharya
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
Krish_ver2
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Salah Amean
 

What's hot (20)

Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Decision tree
Decision treeDecision tree
Decision tree
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Decision tree
Decision treeDecision tree
Decision tree
 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
 
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
Mining Association Rules in Large Database
Mining Association Rules in Large DatabaseMining Association Rules in Large Database
Mining Association Rules in Large Database
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...Data Mining:  Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
 

Viewers also liked

Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
Prof.Nilesh Magar
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
Ashis Kumar Chanda
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
Institute of Technology Telkom
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
Junghoon Kim
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
nouraalkhatib
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 
Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 

Viewers also liked (7)

Frequent itemset mining methods
Frequent itemset mining methodsFrequent itemset mining methods
Frequent itemset mining methods
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
 

More from Albert Orriols-Puig (20)

Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligence
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasets
 
Lecture24
Lecture24Lecture24
Lecture24
 
Lecture23
Lecture23Lecture23
Lecture23
 
Lecture22
Lecture22Lecture22
Lecture22
 
Lecture21
Lecture21Lecture21
Lecture21
 
Lecture20
Lecture20Lecture20
Lecture20
 
Lecture19
Lecture19Lecture19
Lecture19
 
Lecture18
Lecture18Lecture18
Lecture18
 
Lecture17
Lecture17Lecture17
Lecture17
 
Lecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART IIILecture16 - Advances topics on association rules PART III
Lecture16 - Advances topics on association rules PART III
 
Lecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART IILecture15 - Advances topics on association rules PART II
Lecture15 - Advances topics on association rules PART II
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rules
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
Lecture11 - neural networks
Lecture11 - neural networksLecture11 - neural networks
Lecture11 - neural networks
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-Theory
 
Lecture7 - IBk
Lecture7 - IBkLecture7 - IBk
Lecture7 - IBk
 
Lecture8 - From CBR to IBk
Lecture8 - From CBR to IBkLecture8 - From CBR to IBk
Lecture8 - From CBR to IBk
 
Lecture6 - C4.5
Lecture6 - C4.5Lecture6 - C4.5
Lecture6 - C4.5
 

Lecture13 - Association Rules

  • 1. Introduction to Machine Learning. Lecture 13: Introduction to Association Rules. Albert Orriols i Puig (aorriols@salle.url.edu). Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  • 2. Recap of Lecture 5-12. LET’S START WITH DATA CLASSIFICATION
  • 3. Recap of Lecture 5-12. Data Set, Classification Model, How? We have seen four different types of approaches to classification: decision trees (C4.5); instance-based algorithms (kNN & CBR); Bayesian classifiers (Naïve Bayes); neural networks (Perceptron, Adaline, Madaline, SVM)
  • 4. Today’s Agenda: Introduction to Association Rules; A Taxonomy of Association Rules; Measures of Interest; Apriori
  • 5. Introduction to AR. Ideas come from market basket analysis (MBA). Let’s go shopping! Customer 1: milk, eggs, sugar, bread. Customer 2: milk, eggs, cereal, bread. Customer 3: eggs, sugar. What do my customers buy? Which products are bought together? Aim: find associations and correlations between the different items that customers place in their shopping baskets
  • 6. Introduction to AR. Formalizing the problem a little bit. Transaction database T: a set of transactions T = {t1, t2, …, tn}. Each transaction contains a set of items (an itemset). An itemset is a collection of items I = {i1, i2, …, im}. General aim: find frequent/interesting patterns, associations, correlations, or causal structures among sets of items or elements in databases or other information repositories. Put these relationships in terms of association rules X ⇒ Y
  • 7. Example of AR. Transactions (TID: items): T1: bread, jelly, peanut-butter; T2: bread, peanut-butter; T3: bread, milk, peanut-butter; T4: beer, bread; T5: beer, milk. Example rules: bread ⇒ peanut-butter; beer ⇒ bread. Frequent itemsets: items that frequently appear together, e.g. I = {bread, peanut-butter} and I = {beer, bread}
  • 8. What’s an Interesting Rule? Transactions (TID: items): T1: bread, jelly, peanut-butter; T2: bread, peanut-butter; T3: bread, milk, peanut-butter; T4: beer, bread; T5: beer, milk. Support count (σ): frequency of occurrence of an itemset, e.g. σ({bread, peanut-butter}) = 3 and σ({beer, bread}) = 1. Support (s): fraction of transactions that contain an itemset, e.g. s({bread, peanut-butter}) = 3/5 and s({beer, bread}) = 1/5. Frequent itemset: an itemset whose support is greater than or equal to a minimum support threshold (minsup)
  • 9. What’s an Interesting Rule? An association rule is an implication between two itemsets, X ⇒ Y. Transactions (TID: items): T1: bread, jelly, peanut-butter; T2: bread, peanut-butter; T3: bread, milk, peanut-butter; T4: beer, bread; T5: beer, milk. There are many measures of interest; the two most used are support and confidence. Support (s): the frequency with which the rule occurs, i.e., the fraction of transactions that contain both X and Y: s = σ(X ∪ Y) / (# of transactions). Confidence (c): the strength of the association, i.e., how often the items in Y appear in transactions that contain X: c = σ(X ∪ Y) / σ(X)
  • 10. Interestingness of Rules. Transactions (TID: items): T1: bread, jelly, peanut-butter; T2: bread, peanut-butter; T3: bread, milk, peanut-butter; T4: beer, bread; T5: beer, milk. Rules (rule: s, c): bread ⇒ peanut-butter: 0.60, 0.75; peanut-butter ⇒ bread: 0.60, 1.00; beer ⇒ bread: 0.20, 0.50; peanut-butter ⇒ jelly: 0.20, 0.33; jelly ⇒ peanut-butter: 0.20, 1.00; jelly ⇒ milk: 0.00, 0.00. There are many other measures of interest. The methods presented here are based on these two measures
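
The support and confidence values in the table on slide 10 can be recomputed directly from the five transactions. The following is a minimal Python sketch, not part of the original slides; the function names `support_count`, `support`, and `confidence` are illustrative choices.

```python
# The five transactions from the slides (T1..T5).
transactions = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(x, y, transactions):
    """s(X => Y) = sigma(X u Y) / number of transactions."""
    return support_count(set(x) | set(y), transactions) / len(transactions)

def confidence(x, y, transactions):
    """c(X => Y) = sigma(X u Y) / sigma(X); 0.0 if X never occurs."""
    denom = support_count(x, transactions)
    if denom == 0:
        return 0.0
    return support_count(set(x) | set(y), transactions) / denom

rules = [
    ({"bread"}, {"peanut-butter"}),
    ({"peanut-butter"}, {"bread"}),
    ({"beer"}, {"bread"}),
    ({"peanut-butter"}, {"jelly"}),
    ({"jelly"}, {"peanut-butter"}),
    ({"jelly"}, {"milk"}),
]
for x, y in rules:
    s = support(x, y, transactions)
    c = confidence(x, y, transactions)
    print(f"{sorted(x)} => {sorted(y)}: s={s:.2f}, c={c:.2f}")
```

Note that the two directions of a rule share the same support but can differ in confidence, which is why bread ⇒ peanut-butter and peanut-butter ⇒ bread appear as separate rows.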
  • 11. Types of AR. Binary association rules: bread ⇒ peanut-butter. Quantitative association rules: weight in [70kg – 90kg] ⇒ height in [170cm – 190cm]. Fuzzy association rules: weight in TALL ⇒ height in TALL. Let’s start from the beginning: binary association rules with Apriori
  • 12. Apriori. This is the most influential AR miner. It consists of two steps: 1. Generate all frequent itemsets whose support ≥ minsup. 2. Use the frequent itemsets to generate association rules. So, let’s pay attention to the first step
  • 13. Apriori. Itemset lattice over the items A to E: null; A, B, C, D, E; AB, AC, AD, AE, BC, BD, BE, CD, CE, DE; ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE; ABCD, ABCE, ABDE, ACDE, BCDE; ABCDE. Given d items, we have 2^d possible itemsets. Do I have to generate them all?
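
To make the 2^d count concrete, the lattice on slide 13 can be enumerated level by level. This is an illustrative Python sketch (the helper name `all_itemsets` is an assumption, not from the slides); it is exactly the brute-force enumeration that Apriori is designed to avoid.

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]

def all_itemsets(items):
    """Return every subset of `items`, ordered by size (0, 1, ..., d)."""
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]

lattice = all_itemsets(items)
# The count includes the empty ("null") itemset at the top of the lattice.
print(len(lattice), 2 ** len(items))
```

The number of itemsets doubles with every additional item, so exhaustive enumeration quickly becomes impractical for realistic item catalogues.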
  • 14. Apriori. Let’s avoid expanding the whole graph. Key idea: the downward closure property, i.e., any subset of a frequent itemset is also a frequent itemset. Therefore, the algorithm iteratively creates itemsets and only continues exploring those whose support ≥ minsup
  • 15. Example Itemset Generation. [Figure: the itemset lattice from null to ABCDE, with an infrequent itemset marked.] Given d items, we have 2^d possible itemsets. Do I have to generate them all?
  • 16. Recovering the Example. Transactions (TID: items): T1: bread, jelly, peanut-butter; T2: bread, peanut-butter; T3: bread, milk, peanut-butter; T4: beer, bread; T5: beer, milk. Minimum support count = 3. 1-itemsets (item: count): bread: 4; peanut-butter: 3; jelly: 1; milk: 2; beer: 2. 2-itemsets (itemset: count): {bread, peanut-butter}: 3
  • 17. Apriori Algorithm. Set k = 1. Generate the frequent itemsets of length 1. Repeat until no frequent itemsets are found: set k := k+1; generate candidate itemsets of size k from the frequent (k-1)-itemsets; compute the support of each candidate by scanning the DB
  • 18. Apriori Algorithm
        Algorithm Apriori(T)
          C1 ← init-pass(T);
          F1 ← {f | f ∈ C1, f.count/n ≥ minsup};   // n: no. of transactions in T
          for (k = 2; Fk-1 ≠ ∅; k++) do
            Ck ← candidate-gen(Fk-1);
            for each transaction t ∈ T do
              for each candidate c ∈ Ck do
                if c is contained in t then
                  c.count++;
              end
            end
            Fk ← {c ∈ Ck | c.count/n ≥ minsup}
          end
          return F ← ∪k Fk;
  • 19. Apriori Algorithm
        Function candidate-gen(Fk-1)
          Ck ← ∅;
          forall f1, f2 ∈ Fk-1
              with f1 = {i1, … , ik-2, ik-1}
              and f2 = {i1, … , ik-2, i’k-1}
              and ik-1 < i’k-1 do
            c ← {i1, …, ik-1, i’k-1};   // join f1 and f2
            Ck ← Ck ∪ {c};
            for each (k-1)-subset s of c do
              if (s ∉ Fk-1) then
                delete c from Ck;   // prune
            end
          end
          return Ck;
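
The pseudocode on slides 18 and 19 can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than a reference implementation. Two assumptions differ from the pseudocode: itemsets are stored as sorted tuples so that the join condition reduces to a shared prefix, and `min_count` is an absolute support count (as in the "minimum support = 3" example on slide 16) instead of the fraction `count/n`.

```python
from itertools import combinations

def candidate_gen(frequent_prev, k):
    """Join frequent (k-1)-itemsets sharing their first k-2 items, then
    prune candidates that have an infrequent (k-1)-subset."""
    prev = sorted(tuple(sorted(f)) for f in frequent_prev)
    prev_set = set(prev)
    candidates = set()
    for i, f1 in enumerate(prev):
        for f2 in prev[i + 1:]:
            # prev is sorted, so a shared prefix implies f1[-1] < f2[-1].
            if f1[:-1] == f2[:-1]:
                c = f1 + (f2[-1],)  # join step
                # prune step: every (k-1)-subset must itself be frequent
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates

def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all itemsets
    whose support count is at least min_count."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # first pass: count single items
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = candidate_gen(frequent.keys(), k)
        counts = {c: 0 for c in candidates}
        for t in transactions:  # one database scan per level
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result

# The bread / peanut-butter database with a minimum support count of 3.
db = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]
for itemset, count in sorted(apriori(db, min_count=3).items()):
    print(itemset, count)
```

Each pass of the `while` loop corresponds to one iteration of the `for (k = 2; …)` loop in the pseudocode, including the full scan of the database to count the candidates.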
  • 20. Example of Apriori Run (the supports shown imply a minimum support count of 2)
        Database TDB (Tid: items): 10: A, C, D; 20: B, C, E; 30: A, B, C, E; 40: B, E
        1st scan, C1 (itemset: sup): {A}: 2; {B}: 3; {C}: 3; {D}: 1; {E}: 3
        L1 (itemset: sup): {A}: 2; {B}: 3; {C}: 3; {E}: 3
        C2 (candidates): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
        2nd scan, C2 (itemset: sup): {A, B}: 1; {A, C}: 2; {A, E}: 1; {B, C}: 2; {B, E}: 3; {C, E}: 2
        L2 (itemset: sup): {A, C}: 2; {B, C}: 2; {B, E}: 3; {C, E}: 2
        C3 (candidates): {B, C, E}
        3rd scan, L3 (itemset: sup): {B, C, E}: 2
  • 21. Apriori. Remember that Apriori consists of two steps: 1. Generate all frequent itemsets whose support ≥ minsup. 2. Use the frequent itemsets to generate association rules. We accomplished step 1, so we have all the frequent itemsets. Now let’s pay attention to the second step
  • 22. Rule Generation in Apriori. Given a frequent itemset L, find all non-empty proper subsets F of L such that the association rule F ⇒ {L-F} satisfies the minimum confidence, and create the rule F ⇒ {L-F}. If L = {A,B,C}, the candidate rules are: AB⇒C, AC⇒B, BC⇒A, A⇒BC, B⇒AC, C⇒AB. In general, there are 2^k - 2 candidate rules, where k is the length of the itemset L
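
The brute-force rule generation described on slide 22 can be sketched as below. This is an illustrative Python example, not the author's code: it uses the database from the Apriori run on slide 20 and the frequent itemset {B, C, E}, and the threshold `min_conf=0.7` is a hypothetical value chosen only for the demonstration.

```python
from itertools import combinations

# Database TDB from the Apriori example run (Tids 10, 20, 30, 40).
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]

def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def generate_rules(itemset, transactions, min_conf):
    """Try every non-empty proper subset F of `itemset` as an antecedent
    and keep the rules F => (itemset - F) with confidence >= min_conf."""
    items = sorted(itemset)
    full_count = support_count(items, transactions)
    if full_count == 0:  # itemset never occurs: no rule to evaluate
        return []
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            conf = full_count / support_count(antecedent, transactions)
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
    return rules

for antecedent, consequent, conf in generate_rules({"B", "C", "E"}, transactions, 0.7):
    print(antecedent, "=>", consequent, f"confidence={conf:.2f}")
```

Calling `generate_rules` with `min_conf=0` returns all 2^3 - 2 = 6 candidate rules for a 3-itemset, which matches the count given on the slide.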
  • 23. Can You Be More Efficient? Can we apply the same trick used with support? In general, confidence does not have the anti-monotone property. That is, is c(AB⇒D) > c(A⇒D)? We don’t know! But the confidence of rules generated from the same itemset does have the anti-monotone property: for L = {A,B,C,D}, c(ABC⇒D) ≥ c(AB⇒CD) ≥ c(A⇒BCD). We can apply this property to prune the rule generation
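
The chain of inequalities on slide 23 follows directly from the definition of confidence given on slide 9; the short derivation below is added for clarity and is not on the original slide. All three rules share the same numerator, so only the antecedent's support count changes.

```latex
\begin{align*}
c(ABC \Rightarrow D) &= \frac{\sigma(ABCD)}{\sigma(ABC)}, &
c(AB \Rightarrow CD) &= \frac{\sigma(ABCD)}{\sigma(AB)}, &
c(A \Rightarrow BCD) &= \frac{\sigma(ABCD)}{\sigma(A)}. \\[4pt]
\sigma(ABC) \le \sigma(AB) \le \sigma(A)
  &\;\Longrightarrow\;
  c(ABC \Rightarrow D) \ge c(AB \Rightarrow CD) \ge c(A \Rightarrow BCD).
\end{align*}
```

The inequality between support counts is the downward closure property again: every transaction that contains ABC also contains AB, and every transaction that contains AB also contains A. Consequently, once a rule falls below the minimum confidence, any rule from the same itemset whose antecedent is a subset of that rule's antecedent can be skipped.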
  • 24. Example of Efficient Rule Generation. [Figure: lattice of the rules generated from ABCD, with a low-confidence rule marked.] Level 1: ABC⇒D, ABD⇒C, ACD⇒B, BCD⇒A. Level 2: AB⇒CD, AC⇒BD, BC⇒AD, AD⇒BC, BD⇒AC, CD⇒AB. Level 3: A⇒BCD, B⇒ACD, C⇒ABD, D⇒ABC
  • 25. Challenges in AR Mining. Challenges: Apriori scans the database multiple times; most often, there is a high number of candidates; support counting for candidates can be time-consuming. Several methods try to improve on these points by: reducing the number of scans of the database; shrinking the number of candidates; counting the support of candidates more efficiently
  • 26. Next Class: Advanced topics in association rule mining