Datamining 9th Association Rule (sesejun)
Association Rules
• …
    • … PC …
• … HDD …
• … 30 … 500 …
    • … 30 … 500 … 2 …
• … ALDH …
• … A … B … A …
    A = C ∧ D
• … Market Basket Analysis
    • … Frequent Pattern Mining
• …
    [Figure: four panels numbered 1 to 4]
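• …
    • … [support = 2%, confidence = 60%]
• A ⇒ B
    • A: antecedent, B: consequent
• Support: the fraction of all transactions that contain both A and B
• Confidence: the fraction of the transactions containing A that also contain B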
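Definitions (1/2)
• I = {I1, I2, ..., Im}: the set of items
• D: the database of transactions
    • Each transaction T is a set of items with T ⊆ I, identified by its TID
• A transaction T contains an itemset A if A ⊆ T
• A set of items is called an itemset
• An itemset with k items is called a k-itemset

    TID     items
    T100    I1, I2, I5
    T200    I2, I4
    T300    I2, I3
    T400    I1, I2, I4
    T500    I1, I3
    T600    I2, I3
    T700    I1, I3
    T800    I1, I2, I3, I5
    T900    I1, I2, I3

    I = {I1, I2, I3, I4, I5}
    T100 : {I1, I2, I5}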
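These definitions map directly onto set operations. The following is a minimal Python sketch. It assumes each transaction is stored as a `frozenset` of item labels keyed by its TID. The names `contains` and `is_k_itemset` are illustrative and do not come from the slides. T300 is taken as {I2, I3}, the value that the later support counts assume.

```python
# Example database D from the slides, one frozenset per transaction.
# T300 is {I2, I3}, the value the slides' support counts assume.
D = {
    "T100": frozenset({"I1", "I2", "I5"}),
    "T200": frozenset({"I2", "I4"}),
    "T300": frozenset({"I2", "I3"}),
    "T400": frozenset({"I1", "I2", "I4"}),
    "T500": frozenset({"I1", "I3"}),
    "T600": frozenset({"I2", "I3"}),
    "T700": frozenset({"I1", "I3"}),
    "T800": frozenset({"I1", "I2", "I3", "I5"}),
    "T900": frozenset({"I1", "I2", "I3"}),
}

# Item universe I: here simply the union of all transactions.
I = frozenset().union(*D.values())


def contains(T: frozenset, A: frozenset) -> bool:
    """Transaction T contains itemset A iff A ⊆ T."""
    return A <= T


def is_k_itemset(A: frozenset, k: int) -> bool:
    """An itemset with exactly k items is a k-itemset."""
    return len(A) == k


# Every transaction satisfies T ⊆ I.
assert all(T <= I for T in D.values())

print(sorted(I))                                     # ['I1', 'I2', 'I3', 'I4', 'I5']
print(contains(D["T100"], frozenset({"I1", "I5"})))  # True
print(is_k_itemset(D["T100"], 3))                    # True
```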
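Definitions (2/2)
• An association rule is an implication A ⇒ B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅
• For a rule A ⇒ B:
    support(A ⇒ B) = P(A ∪ B)
    confidence(A ⇒ B) = P(B | A)
  Here P(A ∪ B) is the probability that a transaction contains every item of A ∪ B.
  Example: A = {I1}, B = {I2}, A ∪ B = {I1, I2}
    P(A ∪ B) = 4/9 (4 of the 9 transactions contain both I1 and I2)
    P(B | A) = 4/6 (4 of the 6 transactions that contain I1 also contain I2)
• Confidence can therefore be computed from support counts alone:

$$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)} = \frac{\mathrm{support\_count}(A \cup B)}{\mathrm{support\_count}(A)}$$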
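Here is a minimal Python sketch of both measures on the example database, assuming transactions are Python sets. The helper names `support_count`, `support`, and `confidence` are hypothetical. `fractions.Fraction` keeps the results exact, so they can be compared directly with 4/9 and 4/6.

```python
from fractions import Fraction

# Example database from the slides (T300 = {I2, I3}, as the counts assume).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if set(itemset) <= t)


def support(A, B, db):
    """support(A => B) = P(A ∪ B): fraction of transactions containing A ∪ B."""
    return Fraction(support_count(set(A) | set(B), db), len(db))


def confidence(A, B, db):
    """confidence(A => B) = support_count(A ∪ B) / support_count(A)."""
    return Fraction(support_count(set(A) | set(B), db), support_count(A, db))


A, B = {"I1"}, {"I2"}
print(support(A, B, D))     # 4/9
print(confidence(A, B, D))  # 2/3, i.e. 4/6
```

Swapping A and B changes only the confidence. confidence({I2} ⇒ {I1}) = 4/7, because I2 appears in 7 transactions, while the support stays 4/9.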
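• …
• … A ∪ B … A ⇒ B, B ⇒ A …
• Minimum support (min_sup)
    • An itemset that satisfies min_sup is called a frequent itemset
• The number of itemsets to consider can be enormous:
    1. With 100 items there are 2^100 - 1 non-empty itemsets
    2. If a 9-itemset {a1, a2, ..., a9} satisfies min_sup, then {a1}, {a2}, {a1, a2}, {a1, a9}, {a1, a2, a3}, ... all satisfy min_sup as well, since every transaction containing {a1, ..., a9} also contains each of its subsets (2^9 - 1 = 511 itemsets in total)
        • … itemset …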
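The counts in items 1 and 2 can be checked with a short Python sketch. The helper names are illustrative.

```python
from itertools import combinations


def num_nonempty_itemsets(n_items: int) -> int:
    """Number of non-empty itemsets that can be formed from n items: 2**n - 1."""
    return 2 ** n_items - 1


def nonempty_subsets(itemset):
    """Yield every non-empty subset of `itemset`. If `itemset` satisfies
    min_sup, each of these does too: any transaction containing the whole
    itemset also contains every subset of it."""
    items = sorted(itemset)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


print(num_nonempty_itemsets(100))  # 1267650600228229401496703205375

nine = {f"a{i}" for i in range(1, 10)}         # {a1, ..., a9}
print(sum(1 for _ in nonempty_subsets(nine)))  # 511
```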
Apriori: Overview-1
• Finds the itemsets that satisfy min_sup
    • Agrawal & Srikant 1994
• Example with min_sup = 2
    1. Scan D once and count every 1-itemset, giving the candidate set C1

    TID     items
    T100    I1, I2, I5
    T200    I2, I4
    T300    I2, I3
    T400    I1, I2, I4
    T500    I1, I3
    T600    I2, I3
    T700    I1, I3
    T800    I1, I2, I3, I5
    T900    I1, I2, I3

    C1
    Itemset   Sup. count
    {I1}      6
    {I2}      7
    {I3}      6
    {I4}      2
    {I5}      2
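Step 1 amounts to a single pass with a counter. The Python sketch below assumes transactions are lists of item labels. `count_1_itemsets` is a hypothetical name.

```python
from collections import Counter

# Example database (T300 = {I2, I3}, consistent with the counts in C1).
D = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]


def count_1_itemsets(db):
    """Step 1: one scan of D, counting every 1-itemset (candidate set C1)."""
    c1 = Counter()
    for t in db:
        # set() guards against an item listed twice in one transaction
        for item in set(t):
            c1[frozenset([item])] += 1
    return c1


C1 = count_1_itemsets(D)
for itemset in sorted(C1, key=sorted):
    print("{" + ", ".join(sorted(itemset)) + "}", C1[itemset])
```

It prints {I1} 6, {I2} 7, {I3} 6, {I4} 2, {I5} 2, matching C1.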
Apriori: Overview-2
    2. Keep the itemsets that satisfy min_sup (C1 → L1)
    3. Generate candidate (k+1)-itemsets from the frequent k-itemsets (L1 → C2)

C1                       L1                       C2
Itemset   Sup. count     Itemset   Sup. count     Itemset
{I1}      6              {I1}      6              {I1, I2}
{I2}      7              {I2}      7              {I1, I3}
{I3}      6              {I3}      6              {I1, I4}
{I4}      2              {I4}      2              {I1, I5}
{I5}      2              {I5}      2              {I2, I3}
                                                  {I2, I4}
                                                  {I2, I5}
                                                  {I3, I4}
                                                  {I3, I5}
                                                  {I4, I5}
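Steps 2 and 3 for the first level can be sketched in Python as follows. The C1 counts are copied from the table, and `filter_frequent` and `pairs_from_l1` are illustrative names. For k = 1, every pair of frequent items is a candidate, which is why C2 has C(5, 2) = 10 entries.

```python
from itertools import combinations

min_sup = 2
# Support counts of C1, copied from the table above.
C1 = {frozenset([item]): count for item, count in
      [("I1", 6), ("I2", 7), ("I3", 6), ("I4", 2), ("I5", 2)]}


def filter_frequent(candidates, min_sup):
    """Step 2: keep the candidate itemsets whose support count >= min_sup."""
    return {s: c for s, c in candidates.items() if c >= min_sup}


def pairs_from_l1(l1):
    """Step 3 for k = 1: every pair of frequent items is a candidate 2-itemset."""
    items = sorted(item for s in l1 for item in s)
    return [frozenset(p) for p in combinations(items, 2)]


L1 = filter_frequent(C1, min_sup)
C2 = pairs_from_l1(L1)
print(len(L1), len(C2))  # 5 10
```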
Apriori: Overview-3
    4. Count the support of each candidate itemset
        • Counting requires a scan of the DB on the HDD
    5. Keep the itemsets that satisfy min_sup (C2 → L2)

C2           C2                         L2
Itemset      Itemset    Sup. count      Itemset    Sup. count
{I1, I2}     {I1, I2}   4               {I1, I2}   4
{I1, I3}     {I1, I3}   4               {I1, I3}   4
{I1, I4}     {I1, I4}   1               {I1, I5}   2
{I1, I5}     {I1, I5}   2               {I2, I3}   4
{I2, I3}     {I2, I3}   4               {I2, I4}   2
{I2, I4}     {I2, I4}   2               {I2, I5}   2
{I2, I5}     {I2, I5}   2
{I3, I4}     {I3, I4}   0
{I3, I5}     {I3, I5}   1
{I4, I5}     {I4, I5}   0
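Here are steps 4 and 5 as a Python sketch. One scan of D counts all ten candidates of C2, and the min_sup filter then yields L2. The names are hypothetical, and T300 is taken as {I2, I3}, as the counts require.

```python
from itertools import combinations

# Example database (T300 = {I2, I3}, as the slides' counts assume).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_sup = 2
C2 = [frozenset(p) for p in combinations(["I1", "I2", "I3", "I4", "I5"], 2)]


def count_candidates(candidates, db):
    """Step 4: one scan of D, incrementing every candidate contained in t."""
    counts = {c: 0 for c in candidates}
    for t in db:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


counts = count_candidates(C2, D)
L2 = {s: n for s, n in counts.items() if n >= min_sup}  # step 5
for s in sorted(L2, key=sorted):
    print("{" + ", ".join(sorted(s)) + "}", L2[s])
```

The loop prints the six rows of L2 in the same order as the table.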
Apriori: Overview-4
     • L1, L2, L3: the itemsets that satisfy min_sup
     • C2 is generated from L1, C3 from L2, and C4 from L3

L2                        C3                 L3
Itemset    Sup. count     Itemset            Itemset        Sup. count
{I1, I2}   4              {I1, I2, I3}       {I1, I2, I3}   2
{I1, I3}   4              {I1, I2, I5}       {I1, I2, I5}   2
{I1, I5}   2
{I2, I3}   4                                 C4
{I2, I4}   2                                 Itemset
{I2, I5}   2                                 (empty)
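Putting the steps together gives the level-wise loop. The Python sketch below makes one simplifying assumption. It forms C_{k+1} by taking the union of any two frequent k-itemsets, keeping only those unions that have k + 1 items and whose k-subsets are all frequent. This differs from the prefix join described under "Apriori: Pruning Phase", but after the subset check both methods produce the same candidate set.

```python
from itertools import combinations

# Example database (T300 = {I2, I3}, consistent with the slides' counts).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def apriori(db, min_sup):
    """Level-wise search L1 -> C2 -> L2 -> C3 -> L3 -> ... until no
    candidates survive. Returns {frequent itemset: support count}."""
    db = [frozenset(t) for t in db]

    # L1: count 1-itemsets in one scan and keep those with count >= min_sup.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(level)

    k = 1
    while level:
        # C_{k+1}: unions of two frequent k-itemsets that have exactly k+1
        # items and whose every k-subset is frequent.
        prev = set(level)
        candidates = set()
        for a, b in combinations(prev, 2):
            u = a | b
            if len(u) == k + 1 and all(frozenset(s) in prev
                                       for s in combinations(u, k)):
                candidates.add(u)
        # One more scan of D counts all candidates of this level.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(level)
        k += 1
    return frequent


result = apriori(D, min_sup=2)
for size in (1, 2, 3):
    print(size, sorted("{" + ", ".join(sorted(s)) + "}"
                       for s in result if len(s) == size))
```

On the example, the function returns the 5 frequent 1-itemsets, the 6 itemsets of L2, and L3 = {I1, I2, I3}, {I1, I2, I5} (support 2 each). No 4-itemset survives, so the loop stops, as in the C4 table.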
Apriori
[Figure: itemset lattice for the example, from {} up to {I1, I2, I3, I5}, annotated with the support counts of C1 and C2; 3-itemsets shown: {I1, I2, I3} (2), {I1, I2, I5} (2), {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}]
           • ×: itemsets eliminated because their count in the DB is below min_sup
           • ×: itemsets eliminated without counting because one of their k-itemset subsets is not a frequent itemset
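Apriori: Pruning Phase
  • Generating (k+1)-itemsets from k-itemsets: two frequent itemsets that share their first k-1 items (in sorted order) are joined into a (k+1)-itemset
      • {I1, I2} and {I1, I3} share I1 → {I1, I2, I3}
      • {I1, I2, I3} and {I1, I2, I5} share I1, I2 → {I1, I2, I3, I5}
  • A (k+1)-itemset can satisfy min_sup only if every one of its k-itemset subsets does (for {I1, I2, I3}: {I1, I2}, {I1, I3}, {I2, I3}), because every transaction that contains the (k+1)-itemset also contains each of those subsets
      • …
      • {I1, I3, I5} contains {I3, I5}, which does not satisfy min_sup, so {I1, I3, I5} cannot satisfy min_sup either and is pruned

[Figure: 3-itemset candidates {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5} drawn above the 2-itemsets with their support counts: {I1, I2} 4, {I1, I3} 4, {I1, I4} 1, {I1, I5} 2, {I2, I3} 4, {I2, I4} 2, {I2, I5} 2, {I3, I4} 0, {I3, I5} 1, {I4, I5} 0]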
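The join and prune steps can be written as a Python sketch that keeps itemsets as sorted tuples, so the shared-prefix test becomes a slice comparison. The name `apriori_gen` follows the usual name for this step in descriptions of Apriori and is used here illustratively.

```python
from itertools import combinations


def apriori_gen(Lk):
    """Candidate (k+1)-itemsets from the frequent k-itemsets Lk.

    Join:  two sorted k-itemsets that agree on their first k-1 items are
           merged into one (k+1)-itemset.
    Prune: a candidate is dropped if any of its k-subsets is not in Lk.
    """
    Lk = sorted(tuple(sorted(s)) for s in Lk)
    frequent = set(Lk)
    candidates = []
    for i, a in enumerate(Lk):
        for b in Lk[i + 1:]:
            if a[:-1] != b[:-1]:
                continue                    # join needs a shared (k-1)-prefix
            c = a + (b[-1],)                # a < b, so c stays sorted
            k = len(a)
            if all(s in frequent for s in combinations(c, k)):
                candidates.append(c)        # survived the prune step
    return candidates


L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
C3 = apriori_gen(L2)
print(C3)               # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]

# Both candidates have support 2 >= min_sup, so L3 = C3.
L3 = C3
print(apriori_gen(L3))  # []  ({I1, I2, I3, I5} is pruned: {I1, I3, I5} is not frequent)
```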
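• DIC (Dynamic Itemset Counting): starts counting 2-itemsets before the count of 1-itemsets has finished
    • S. Brin, R. Motwani, J. Ullman, and S. Tsur. 1997
• As soon as {I2} and {I4} both reach min_sup, counting of {I2, I4} starts to check whether {I2, I4} also satisfies min_sup
• Fewer passes over the DB

[Figure: Apriori vs. DIC over the example transaction database (T100 to T900); Apriori counts 1-itemsets, 2-itemsets, and 3-itemsets in successive passes, while DIC starts each level partway through a pass]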
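The DIC idea can be illustrated with a simplified Python sketch. It is not Brin et al.'s implementation. The checkpoint interval `M`, the cyclic reading of D, and all names are assumptions made for illustration. An itemset starts being counted as soon as all of its immediate subsets are known to reach min_sup. Each itemset is then counted over exactly one full pass, so its final count is its true support.

```python
def dic(transactions, min_sup, M):
    """Simplified Dynamic Itemset Counting (a sketch, not the paper's code).

    Transactions are read cyclically, M at a time. At each checkpoint an
    itemset whose count already reaches min_sup is marked frequent (counts
    only grow), and any superset whose immediate subsets are all marked is
    started immediately instead of waiting for the next full pass.
    """
    db = [frozenset(t) for t in transactions]
    n = len(db)
    items = frozenset().union(*db)
    active = {frozenset([i]): [0, 0] for i in items}  # itemset -> [count, seen]
    marked = set()     # itemsets already known to reach min_sup
    done = set()       # itemsets that have finished their full pass
    frequent = {}
    pos = 0
    while active:
        # Read the next M transactions, wrapping around the end of D.
        for _ in range(M):
            t = db[pos]
            pos = (pos + 1) % n
            for s, state in active.items():
                if state[1] < n:            # still inside its one full pass
                    if s <= t:
                        state[0] += 1
                    state[1] += 1
        # Checkpoint, part 1: mark itemsets reaching min_sup; retire finished ones.
        newly_marked = []
        for s in list(active):
            count, seen = active[s]
            if count >= min_sup and s not in marked:
                marked.add(s)
                newly_marked.append(s)
            if seen == n:
                done.add(s)
                del active[s]
                if count >= min_sup:
                    frequent[s] = count
        # Checkpoint, part 2: start supersets whose immediate subsets are all marked.
        for s in newly_marked:
            for i in items - s:
                c = s | {i}
                if c not in active and c not in done and all(
                        c - {j} in marked for j in c):
                    active[c] = [0, 0]
    return frequent


# Example database (T300 = {I2, I3}, consistent with the slides' counts).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
result = dic(D, min_sup=2, M=3)
print(len(result))  # 13
```

For M = 3, the function finds the same 13 frequent itemsets as Apriori. A smaller M lets longer itemsets start earlier, at the cost of more frequent checkpoints.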
• …
    • Hash
        • … (k+1)-itemset … k-itemset …
    • …
        • … PC …
    • Heap … itemset …
        • FP-tree (J. Han, J. Pei and Y. Yin. 2000)
• …
    • …
        • …
            • S. Brin, R. Motwani and C. Silverstein. 1997
            • S. Morishita and J. Sese. 2000
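One way to read the Hash bullet is as a hash-based candidate filter in the style of DHP (direct hashing and pruning). The sketch below is an illustration under stated assumptions. The bucket function (10·order(x) + order(y)) mod 13, the table size, and all names are hypothetical. The key property is that a bucket's total is an upper bound on the support of every pair hashed into it.

```python
from itertools import combinations

# Example database (T300 = {I2, I3}, consistent with the slides' counts).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
ORDER = {"I1": 1, "I2": 2, "I3": 3, "I4": 4, "I5": 5}
N_BUCKETS = 13   # hypothetical table size


def bucket(pair):
    """Hypothetical hash of a 2-itemset {x, y} with order(x) < order(y)."""
    x, y = sorted(pair, key=ORDER.__getitem__)
    return (10 * ORDER[x] + ORDER[y]) % N_BUCKETS


def hash_filtered_c2(db, min_sup):
    """During the 1-itemset scan, also hash every 2-itemset of each
    transaction into a bucket. Pairs whose bucket total is below min_sup
    cannot be frequent and are left out of C2."""
    item_counts = {}
    buckets = [0] * N_BUCKETS
    for t in db:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(sorted(t), 2):
            buckets[bucket(pair)] += 1
    l1 = sorted(i for i, c in item_counts.items() if c >= min_sup)
    return [frozenset(p) for p in combinations(l1, 2)
            if buckets[bucket(p)] >= min_sup]


C2 = hash_filtered_c2(D, min_sup=2)
print(sorted(sorted(s) for s in C2))
```

With this particular hash, C2 shrinks from 10 candidates to 6, which here are exactly the itemsets of L2. On other data the filter can still let infrequent pairs through, because only the bucket total is checked. The exact counting scan is therefore still needed.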
