Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms
Frequent 1-itemsets vs minsup
[Figure: Number of Large 1-itemsets vs. minsup ...]
MFs w/o lateral displacement

             l1' = (l1,0.4) l2' = (l2,0.4) l3' = (l3,0.5)        l1' = (l1,0.0) l2' = (l2,-0...
Hong’s MFs
[Figure: membership functions l1', l2', l3' in two panels ...]
#rules vs minsup
[Figure: number of rules vs. minsup, minconf = 0.8 ...]
#rules vs minconf
[Figure: number of rules vs. minconf, minsup = 0.2 ...]
#rules vs minsup vs
[Figure: Number of Rules (y-axis up to 200000), label: minsup ...]
Time vs #Transaction
[Figure: Runtime (minutes, y-axis up to 30.00) vs. number of transactions ...]
Time vs #Attribute
[Figure: Runtime (minutes, y-axis up to 30.00) vs. number of attributes ...]
Time vs #Linguistic terms
[Figure: Runtime (minutes, y-axis up to 70.00) vs. number of linguistic terms ...]
Example of Rules
                     If number of children is Low and
 Classic Fuzzy       hours head worked last week is...
Author’s conclusion

   2-tuples linguistic representation works!!
Discussions
T. Hong, C. Chen,Y. Wu,Y. Lee, Using divide-and-conquer GA strategy in fuzzy data mining, IEEE Symp. on Fuzzy Systems,
Bud...
Pitfalls
• domain knowledge & symmetric assumption
• flowchart
• Hong’s method
• inadequate fitness function
• gray code and...
Reference
• L. Eshelman, The CHC adaptive search algorithm: How to have safe search when
    engaging in nontraditional ge...
Thank you!
Questions?

There are some errors in this paper, especially in the flowchart and the fitness function. I'm sorry to say that the experiments are misleading!

  1. Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms. Jesús Alcalá-Fdez, Rafael Alcalá, María José Gacto, Francisco Herrera. Fuzzy Sets and Systems (2008), article in press. Presenter: Chia-Ming Wang
  2–3. Before we go
       • T. Hong, C. Chen, Y. Wu, Y. Lee, Using divide-and-conquer GA strategy in fuzzy data mining, in: IEEE Symp. on Fuzzy Systems, Budapest, Hungary, 2004, pp. 116–121.
       • T. Hong, C. Kuo, S. Chi, Trade-off between time complexity and number of rules for fuzzy mining from quantitative data, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 9 (5) (2001) 587–604.
       Thanks to Prof. Hong, who provided me with the second paper today.
  4. Problem Description (diagram): GA, 2-tuples model, Quantitative Association Rule
  5. A Transaction Database
       TID   Items
       1     Bread, Milk
       2     Bread, Diaper, Beer, Eggs
       3     Milk, Diaper, Beer, Coke
       4     Bread, Milk, Diaper, Beer
       5     Bread, Milk, Diaper, Coke
  6–7. Association Rule Mining (same database)
       Examples: {Diaper}→{Beer}, {Milk, Bread}→{Eggs, Coke}, {Beer, Bread}→{Milk}
       Rule form: X→Y, where X∩Y=∅
       Implication means co-occurrence, not causality!
  8–11. Terminology (same database), for the rule {Milk, Diaper}→{Beer}
       support     s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
       confidence  c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67
       (σ = support count, |T| = number of transactions)
       Key terms: itemset, minsup, minconf
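The support and confidence arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the helper names (`support_count`, `support`, `confidence`) are illustrative and do not come from the paper.

```python
from typing import FrozenSet, List

# The five market-basket transactions from the slides.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diaper", "Beer", "Eggs"}),
    frozenset({"Milk", "Diaper", "Beer", "Coke"}),
    frozenset({"Bread", "Milk", "Diaper", "Beer"}),
    frozenset({"Bread", "Milk", "Diaper", "Coke"}),
]


def support_count(itemset: FrozenSet[str]) -> int:
    """sigma(itemset): number of transactions that contain every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def support(antecedent: FrozenSet[str], consequent: FrozenSet[str]) -> float:
    """s(X -> Y) = sigma(X u Y) / |T|."""
    return support_count(antecedent | consequent) / len(TRANSACTIONS)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str]) -> float:
    """c(X -> Y) = sigma(X u Y) / sigma(X)."""
    return support_count(antecedent | consequent) / support_count(antecedent)


if __name__ == "__main__":
    x, y = frozenset({"Milk", "Diaper"}), frozenset({"Beer"})
    print(round(support(x, y), 2), round(confidence(x, y), 2))  # 0.4 0.67
```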
  12–13. Real-world Transaction Database
       TID   (item, quantity)
       1     (Bread, 3), (Milk, 1)
       2     (Bread, 1), (Diaper, 2), (Beer, 3), (Eggs, 12)
       3     (Milk, 2), (Diaper, 4), (Beer, 5), (Coke, 2)
       4     (Bread, 3), (Milk, 1), (Diaper, 2), (Beer, 12)
       5     (Bread, 2), (Milk, 4), (Diaper, 5), (Coke, 3)
       → Quantitative Association Rule Mining
  14–15. (diagram) Quantitative Association Rule; 2-tuples model
  16. Linguistic terms
       age: Low, Middle, High        weight: Low, Middle, High
       Example rule: if age is Middle then weight is High
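To make the linguistic terms concrete, here is a minimal sketch that fuzzifies a crisp age into Low, Middle and High. The slides do not give the MF shapes or the age domain at this point, so the triangular MFs on a hypothetical domain [0, 100] are assumptions chosen for illustration.

```python
from typing import Dict, Tuple

# Hypothetical triangular MFs (left foot, peak, right foot) for "age" on an
# assumed domain [0, 100]: a uniform three-term partition.
AGE_TERMS: Dict[str, Tuple[float, float, float]] = {
    "Low": (0.0, 0.0, 50.0),
    "Middle": (0.0, 50.0, 100.0),
    "High": (50.0, 100.0, 100.0),
}


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in the triangle (a, b, c); a == b or b == c gives a shoulder."""
    if x < a or x > c:
        return 0.0
    if x <= b:
        return 1.0 if b == a else (x - a) / (b - a)
    return 1.0 if c == b else (c - x) / (c - b)


def fuzzify(x: float, terms: Dict[str, Tuple[float, float, float]]) -> Dict[str, float]:
    """Membership degree of x in every linguistic term."""
    return {name: triangular(x, *abc) for name, abc in terms.items()}


if __name__ == "__main__":
    print(fuzzify(25.0, AGE_TERMS))  # {'Low': 0.5, 'Middle': 0.5, 'High': 0.0}
```

A 25-year-old is thus half "Low" and half "Middle"; the 2-tuples model discussed next tunes where such MFs sit instead of fixing them by hand.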
  17–18. The 2-tuples linguistic representation
       "if age is Middle then weight is High" becomes "if age is (Middle, 0.3) then weight is (High, -0.1)"
       A 2-tuple is $(s_i, \alpha_i)$ with $s_i \in S$ and $\alpha_i \in [-0.5, 0.5)$.
       F. Herrera, L. Martínez, A 2-tuple fuzzy linguistic representation model for computing with words, IEEE Trans. Fuzzy Systems 8 (6) (2000) 746–752.
  19–22. Example: five labels s0, s1, s2, s3, s4 centered at 0, 1, 2, 3, 4 on the domain; each label can be displaced by α ∈ [-0.5, 0.5). The 2-tuple (s2, -0.3) is the label s2 shifted by α = -0.3, i.e. the value 2 - 0.3 = 1.7 on the domain.
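The translation between a symbolic value and a 2-tuple in this example can be sketched as follows. It assumes the usual Herrera–Martínez definitions, Δ(β) = (s_i, β − i) with i = round(β) and Δ⁻¹(s_i, α) = i + α. Rounding is done half-up so that α stays in [-0.5, 0.5); the function names `delta` and `delta_inv` are hypothetical.

```python
import math
from typing import List, Tuple

LABELS: List[str] = ["s0", "s1", "s2", "s3", "s4"]  # the five-label set from the slides


def delta(beta: float) -> Tuple[str, float]:
    """Delta: symbolic value beta in [0, g] -> 2-tuple (s_i, alpha), alpha in [-0.5, 0.5)."""
    if not 0 <= beta <= len(LABELS) - 1:
        raise ValueError("beta must lie in [0, g]")
    i = math.floor(beta + 0.5)  # round half up, so alpha never equals +0.5
    return LABELS[i], beta - i


def delta_inv(label: str, alpha: float) -> float:
    """Delta^-1: 2-tuple (s_i, alpha) -> symbolic value i + alpha."""
    if not -0.5 <= alpha < 0.5:
        raise ValueError("alpha must lie in [-0.5, 0.5)")
    return LABELS.index(label) + alpha


if __name__ == "__main__":
    print(delta_inv("s2", -0.3))  # the value 1.7 from the slide
    print(delta(1.7))             # ('s2', ~-0.3), up to floating-point rounding
```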
  23–24. Interpretation
       "if age is (Middle, 0.3) then weight is (High, -0.1)" reads as
       "if age is (higher than Middle) then weight is (a bit smaller than High)"
  25–26. (diagram) 2-tuples model; GA
  27–32. Traditional GA (cycle)
       Population (chromosomes) → parents → Evaluation (fitness) → Reproduction: selection into the mating pool → mates → genetic operators (crossover, mutation) for recombination → offspring → back to the Population
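The GA cycle above can be sketched as a minimal generational loop. Everything problem-specific is an assumption: the toy OneMax fitness, binary tournament selection, one-point crossover and bit-flip mutation are generic textbook choices, not the operators used in the paper; the 0.6 crossover and 0.01 mutation rates are borrowed from the parameter table (Hong's column).

```python
import random
from typing import List

Chromosome = List[int]


def fitness(c: Chromosome) -> int:
    """Toy objective (OneMax): number of 1-bits; stands in for a real fitness."""
    return sum(c)


def tournament(pop: List[Chromosome], rng: random.Random, k: int = 2) -> Chromosome:
    """Selection: the fitter of k randomly drawn individuals enters the mating pool."""
    return max(rng.sample(pop, k), key=fitness)


def evolve(n_pop: int = 20, n_bits: int = 16, generations: int = 50,
           p_cross: float = 0.6, p_mut: float = 0.01, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_pop)]
    for _ in range(generations):
        offspring: List[Chromosome] = []
        while len(offspring) < n_pop:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)  # mates
            if rng.random() < p_cross:                            # one-point crossover
                cut = rng.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):                                # bit-flip mutation
                offspring.append([b ^ 1 if rng.random() < p_mut else b for b in child])
        pop = offspring[:n_pop]                                   # offspring replace the population
    return max(pop, key=fitness)
```

Here the offspring replace the whole population each generation; CHC, used in the paper, instead keeps the best individuals among parents and offspring.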
  33–34. GA Used in this paper
       • CHC genetic model
       • MFs codification and initial gene pool
       • Chromosome evaluation
       • Crossover operator
  35–44. Scheme of CHC model
       1. Initialize the population and THRESHOLD.
       2. Crossover of N parents, with incest prevention: two parents mate only if ½ × Hamming distance > L, where L = (#Genes × BITSGENE)/4.
       3. Evaluation of the new individuals.
       4. Selection of the best N individuals among parents and offspring.
       5. If no new individual enters the population, decrement THRESHOLD.
       6. If THRESHOLD < 0, restart the population and THRESHOLD; otherwise go back to step 2.
       L. Eshelman, The CHC adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination, Foundations of Genetic Algorithms, Vol. 1, Morgan Kaufmann, Los Altos, CA, 1991, pp. 265–283.
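The CHC scheme can be sketched as below, again on a toy OneMax objective. Assumptions: half-uniform crossover (HUX, the operator of Eshelman's original CHC) stands in for the paper's PCBLX, the restart keeps only the best individual and re-randomizes the rest, and the population size, gene count and evaluation budget are arbitrary.

```python
import random
from typing import List, Tuple

Chromosome = List[int]


def fitness(c: Chromosome) -> int:
    """Toy objective (OneMax): stands in for the paper's fitness function."""
    return sum(c)


def hamming(a: Chromosome, b: Chromosome) -> int:
    return sum(x != y for x, y in zip(a, b))


def hux(a: Chromosome, b: Chromosome, rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Half-uniform crossover: swap exactly half of the bits in which the parents differ."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    c1, c2 = a[:], b[:]
    for i in rng.sample(diff, len(diff) // 2):
        c1[i], c2[i] = b[i], a[i]
    return c1, c2


def chc(n: int = 20, n_genes: int = 4, bits_per_gene: int = 8,
        max_evals: int = 2000, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    length = n_genes * bits_per_gene
    initial_threshold = length / 4            # L = (#Genes * BITSGENE) / 4

    def random_individual() -> Chromosome:
        return [rng.randint(0, 1) for _ in range(length)]

    pop = [random_individual() for _ in range(n)]
    threshold, evals = initial_threshold, n
    while evals < max_evals:
        rng.shuffle(pop)
        children: List[Chromosome] = []
        for a, b in zip(pop[0::2], pop[1::2]):
            if hamming(a, b) / 2 > threshold:  # incest prevention
                children.extend(hux(a, b, rng))
        evals += len(children)
        # Elitist survival: keep the best n among parents and offspring.
        survivors = sorted(pop + children, key=fitness, reverse=True)[:n]
        child_ids = {id(c) for c in children}
        if not any(id(s) in child_ids for s in survivors):
            threshold -= 1                     # no new individual entered
        pop = survivors
        if threshold < 0:                      # restart the population and THRESHOLD
            best = max(pop, key=fitness)
            pop = [best] + [random_individual() for _ in range(n - 1)]
            threshold = initial_threshold
            evals += n - 1
    return max(pop, key=fitness)
```

Because survival is elitist, the best fitness never decreases; the threshold only controls when the search gives up on the converged population and restarts.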
  45. GA Used in this paper
       • CHC genetic model
       • MFs codification and initial gene pool
       • Chromosome evaluation
       • Crossover operator
  46–47. MFs Codification
       Genes: age → L1, M1, H1; weight → L2, M2, H2
       No displacement:     (L1, M1, H1, L2, M2, H2) = (0, 0, 0, 0, 0, 0)
       Example chromosome:  (L1, M1, H1, L2, M2, H2) = (0.2, 0.4, 0, -0.2, -0.3, -0.5)
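The codification can be read as lateral displacements of predefined MFs. Here is a minimal sketch. It assumes triangular MFs in a uniform partition and assumes that each gene α moves its MF by α times the distance between two consecutive peaks (the usual 2-tuple lateral displacement); the age domain [0, 60] is hypothetical.

```python
from typing import List, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)


def uniform_partition(lo: float, hi: float, m: int) -> List[Triangle]:
    """m evenly spaced triangular MFs over [lo, hi] (assumed expert default)."""
    step = (hi - lo) / (m - 1)
    return [(lo + (k - 1) * step, lo + k * step, lo + (k + 1) * step) for k in range(m)]


def displace(mfs: List[Triangle], genes: List[float], step: float) -> List[Triangle]:
    """Shift each MF laterally by alpha * step, with alpha in [-0.5, 0.5)."""
    return [(a + g * step, b + g * step, c + g * step) for (a, b, c), g in zip(mfs, genes)]


if __name__ == "__main__":
    age = uniform_partition(0.0, 60.0, 3)          # Low, Middle, High peaks at 0, 30, 60
    chromosome = [0.2, 0.4, 0.0, -0.2, -0.3, -0.5]
    print(displace(age, chromosome[:3], 30.0))     # first m = 3 genes belong to age
```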
  48. Initial Gene Pool
       chromosome: (c11, ..., c1m, c21, ..., c2m, ..., cn1, ..., cnm), where each group of m genes encodes one item with m MFs
       • the initial MFs are obtained from expert knowledge
       • individuals are generated at random in [-0.5, 0.5)
  49–50. Implementation: Gray Code
       Decimal   Binary   Gray Code
       0         000      000
       1         001      001
       2         010      011
       3         011      010
       4         100      110
       5         101      111
       6         110      101
       7         111      100
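The Gray code table can be reproduced with the standard binary-reflected Gray code. The slides do not show how a real-valued gene is mapped to its 30 bits, so the linear quantization in `encode_gene` and `decode_gene` is a hypothetical mapping. Gray coding keeps neighbouring values one bit apart, which makes the Hamming distance used for incest prevention track how close two genes are.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code: adjacent integers differ in exactly one bit."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Inverse: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


BITS = 30  # bits per gene, as in the parameter table


def encode_gene(alpha: float) -> int:
    """Quantize alpha in [-0.5, 0.5) to BITS bits, then Gray-encode (hypothetical mapping)."""
    level = int((alpha + 0.5) * ((1 << BITS) - 1))
    return to_gray(level)


def decode_gene(g: int) -> float:
    return from_gray(g) / ((1 << BITS) - 1) - 0.5


if __name__ == "__main__":
    for i in range(8):
        print(i, format(i, "03b"), format(to_gray(i), "03b"))
```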
  51. GA Used in this paper
       • CHC genetic model
       • MFs codification and initial gene pool
       • Chromosome evaluation
       • Crossover operator
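  52. Equation Mania
       $$\text{fitness}(C_q) = \frac{\sum_{x \in L_1} \text{fuzzy support}(x)}{\text{suitability}(C_q)}$$
       $$\text{suitability}(C_q) = \sum_{k=1}^{n} \left[\text{overlap factor}(C_{qk}) + \text{coverage factor}(C_{qk})\right]$$
       $$\text{overlap factor}(C_{qk}) = \sum_{i=1}^{m} \sum_{j=i+1}^{m} \left[\max\left(\frac{\text{overlap}(R_i, R_j)}{\min(\text{spanR}_{R_i}, \text{spanL}_{R_i})}, 1\right) - 1\right]$$
       $$\text{coverage factor}(C_{qk}) = \frac{1}{\text{range}(R_1, \ldots, R_m) / \max(I_k)}$$
       $$\text{suitability}(C_q) = \sum_{k=1}^{n} \left[\text{overlap factor}(C_{qk}) + 1\right]$$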
  53–60. Overlap factor
       $$\text{overlap factor}(C_{qk}) = \sum_{i=1}^{m} \sum_{j=i+1}^{m} \left[\max\left(\frac{\text{overlap}(R_i, R_j)}{\min(\text{spanR}_{R_i}, \text{spanL}_{R_i})}, 1\right) - 1\right]$$
       $C_{qk}$: the kth item of the qth chromosome; $R_i$, $R_j$: two of its MFs.
       Illustration: overlap = width of the region shared by $R_i$ and $R_j$; SpanR and SpanL = right and left spans of an MF. A pair contributes a penalty only when the overlap ratio exceeds 1.
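The overlap factor for one item can be sketched as below. Assumptions: MFs are triangles (left foot, peak, right foot), overlap(R_i, R_j) is taken as the length of the intersection of the two supports, and the denominator follows the formula exactly as written on the slide, min(spanR of R_i, spanL of R_i), so every MF must have non-zero spans.

```python
from typing import List, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)


def overlap(ri: Triangle, rj: Triangle) -> float:
    """Length of the intersection of the two supports (assumed definition)."""
    return max(0.0, min(ri[2], rj[2]) - max(ri[0], rj[0]))


def overlap_factor(mfs: List[Triangle]) -> float:
    """Sum over i < j of max(overlap / min(spanR_Ri, spanL_Ri), 1) - 1, as on the slide."""
    total = 0.0
    for i, ri in enumerate(mfs):
        span_r, span_l = ri[2] - ri[1], ri[1] - ri[0]
        for rj in mfs[i + 1:]:
            total += max(overlap(ri, rj) / min(span_r, span_l), 1.0) - 1.0
    return total


if __name__ == "__main__":
    uniform = [(-30.0, 0.0, 30.0), (0.0, 30.0, 60.0), (30.0, 60.0, 90.0)]
    crowded = [(0.0, 10.0, 20.0), (5.0, 15.0, 25.0)]
    print(overlap_factor(uniform), overlap_factor(crowded))  # 0.0 0.5
```

A uniform partition, where neighbours overlap by exactly one span, incurs no penalty; the two crowded MFs overlap by 1.5 spans and are penalized by 0.5.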
  61–66. Coverage factor
       $$\text{coverage factor}(C_{qk}) = \frac{1}{\text{range}(R_1, \ldots, R_m) / \max(I_k)}$$
       $C_{qk}$: the kth item of the qth chromosome; $\max(I_k)$: the maximum quantity of item k.
       Example: item Milk on [0, 10] with MFs R1, R2, R3; when the range of R1, R2, R3 covers the whole interval, coverage factor(C_qk) = 1.
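Here is a matching sketch of the coverage factor. It assumes range(R_1, ..., R_m) is the part of [0, max(I_k)] spanned from the leftmost to the rightmost foot of the MFs (gaps between MFs are ignored), with MFs again written as (left foot, peak, right foot) triangles.

```python
from typing import List, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)


def coverage_factor(mfs: List[Triangle], max_quantity: float) -> float:
    """1 / (range(R_1..R_m) / max(I_k)), with the range clipped to [0, max(I_k)]."""
    lo = max(0.0, min(a for a, _, _ in mfs))
    hi = min(max_quantity, max(c for _, _, c in mfs))
    return 1.0 / ((hi - lo) / max_quantity)


if __name__ == "__main__":
    full = [(-5.0, 0.0, 5.0), (0.0, 5.0, 10.0), (5.0, 10.0, 15.0)]  # Milk on [0, 10]
    short = [(0.0, 2.0, 4.0), (2.0, 4.0, 6.0), (4.0, 6.0, 8.0)]     # stops at 8
    print(coverage_factor(full, 10.0), coverage_factor(short, 10.0))
```

Full coverage gives exactly 1; the MFs that stop at 8 give 1.25. This is why the paper's simplified suitability replaces the coverage factor by 1 when the 2-tuple MFs always span the domain.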
  67–74. Fuzzy Support (count)
       DB: T transactions over n items; $v_j^{(i)}$ is the quantity of the jth item in the ith transaction.
       Each quantity is transformed into a fuzzy set over the item's m MFs, where $f_{jk}^{(i)}$ is the membership degree in $R_{jk}$:
       $$f_j^{(i)} = \frac{f_{j1}^{(i)}}{R_{j1}} + \cdots + \frac{f_{jm}^{(i)}}{R_{jm}}$$
       The degrees of each MF are summed over all transactions (e.g. bread.low.count):
       $$\text{count}_{jk} = \sum_{i=1}^{T} f_{jk}^{(i)}$$
       $$L_1 = \{R_{jk} \mid \text{count}_{jk} \ge \alpha,\ 1 \le j \le n,\ 1 \le k \le m\}$$
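The count computation can be sketched on the quantitative database from slides 12–13. The Low/Middle/High MFs below (uniform triangles on [0, 12], shared by all items) are hypothetical, since the slides do not list them; items absent from a transaction contribute a degree of 0.

```python
from typing import Dict, List, Set, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)

# Quantitative transactions from the "Real-world Transaction Database" slide.
DB: List[Dict[str, int]] = [
    {"Bread": 3, "Milk": 1},
    {"Bread": 1, "Diaper": 2, "Beer": 3, "Eggs": 12},
    {"Milk": 2, "Diaper": 4, "Beer": 5, "Coke": 2},
    {"Bread": 3, "Milk": 1, "Diaper": 2, "Beer": 12},
    {"Bread": 2, "Milk": 4, "Diaper": 5, "Coke": 3},
]

# Hypothetical MFs shared by every item: uniform Low/Middle/High on [0, 12].
TERMS: Dict[str, Triangle] = {
    "Low": (-6.0, 0.0, 6.0),
    "Middle": (0.0, 6.0, 12.0),
    "High": (6.0, 12.0, 18.0),
}


def degree(v: float, mf: Triangle) -> float:
    """Membership degree f of quantity v in a triangular MF with a < b < c."""
    a, b, c = mf
    if v <= a or v >= c:
        return 0.0
    return (v - a) / (b - a) if v <= b else (c - v) / (c - b)


def fuzzy_counts(db: List[Dict[str, int]], terms: Dict[str, Triangle]) -> Dict[Tuple[str, str], float]:
    """count_jk: sum over all transactions of the degree of item j in MF k."""
    counts: Dict[Tuple[str, str], float] = {}
    for transaction in db:
        for item, v in transaction.items():
            for term, mf in terms.items():
                counts[(item, term)] = counts.get((item, term), 0.0) + degree(v, mf)
    return counts


def large_1_itemsets(counts: Dict[Tuple[str, str], float], alpha: float) -> Set[Tuple[str, str]]:
    """L1 = {R_jk | count_jk >= alpha}."""
    return {key for key, c in counts.items() if c >= alpha}


if __name__ == "__main__":
    counts = fuzzy_counts(DB, TERMS)
    print(counts[("Bread", "Low")])  # bread.low.count, about 2.5
```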
  75–79. Fuzzy Support
       $$\text{fitness}(C_q) = \frac{\sum_{x \in L_1} \text{fuzzy support}(x)}{\text{suitability}(C_q)}, \qquad \text{suitability}(C_q) = \sum_{k=1}^{n} \left[\text{overlap factor}(C_{qk}) + 1\right]$$
       n: number of items; $L_1$: the large 1-itemsets; fuzzy support = count / T, where T is the number of transactions.
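Putting the pieces together, here is a minimal sketch of the fitness. It assumes the per-MF counts and each item's overlap factor have already been computed, that the coverage factor is 1 (as in the simplified suitability), and that L1 is selected with the same count ≥ α test. The argument names and the numbers in the usage line are made up for illustration.

```python
from typing import Dict, List, Tuple


def fitness(counts: Dict[Tuple[str, str], float], n_transactions: int,
            alpha: float, overlap_factors: List[float]) -> float:
    """Sum of fuzzy supports (count / T) over L1, divided by sum_k [overlap_factor_k + 1]."""
    fuzzy_support = sum(c / n_transactions for c in counts.values() if c >= alpha)
    suitability = sum(of + 1.0 for of in overlap_factors)
    return fuzzy_support / suitability


if __name__ == "__main__":
    counts = {("A", "Low"): 3.0, ("A", "High"): 1.0, ("B", "Low"): 2.5}
    # L1 = {A.Low, B.Low}: (0.6 + 0.5) / ((0.0 + 1) + (0.5 + 1)) = 1.1 / 2.5
    print(fitness(counts, 5, 2.0, [0.0, 0.5]))
```

Badly overlapping MFs raise the suitability in the denominator, so a chromosome can only win by increasing fuzzy support without crowding its MFs together.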
  80. GA Used in this paper
       • CHC genetic model
       • MFs codification and initial gene pool
       • Chromosome evaluation
       • Crossover operator
  81–82. PCBLX Crossover
       Parents $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$, with $x_i, y_i \in [a_i, b_i] \subset \mathbb{R}$, $i = 1, \ldots, n$, and $I_i = |x_i - y_i|$.
       Offspring $O_1 = (o_{11}, \ldots, o_{1n})$: $o_{1i}$ is chosen at random from $[l_i^1, u_i^1]$, with $l_i^1 = \max\{a_i, x_i - I_i \cdot \alpha\}$ and $u_i^1 = \min\{b_i, x_i + I_i \cdot \alpha\}$.
       Offspring $O_2 = (o_{21}, \ldots, o_{2n})$: $o_{2i}$ is chosen at random from $[l_i^2, u_i^2]$, with $l_i^2 = \max\{a_i, y_i - I_i \cdot \alpha\}$ and $u_i^2 = \min\{b_i, y_i + I_i \cdot \alpha\}$.
       Illustration: $a_i$, $x_i$, $y_i$, $b_i$ on one axis, comparing the PCBLX and BLX intervals.
       F. Herrera, M. Lozano, A.M. Sánchez, A taxonomy for the crossover operator for real-coded genetic algorithms: An experimental study, Int. J. Intell. Syst. 18 (2003) 309–338.
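The PCBLX definitions translate directly into code. This is a minimal sketch that draws each offspring gene uniformly from its interval; the value α = 0.5 is an arbitrary illustrative default (the slides do not state the paper's α), and the bounds [a_i, b_i] would be [-0.5, 0.5) for the displacement genes.

```python
import random
from typing import List, Optional, Tuple


def pcblx(x: List[float], y: List[float], bounds: List[Tuple[float, float]],
          alpha: float = 0.5, rng: Optional[random.Random] = None
          ) -> Tuple[List[float], List[float]]:
    """Parent-centric BLX: o1_i ~ U[l1_i, u1_i] around x_i, o2_i ~ U[l2_i, u2_i] around y_i."""
    rng = rng or random.Random()
    o1: List[float] = []
    o2: List[float] = []
    for xi, yi, (ai, bi) in zip(x, y, bounds):
        ii = abs(xi - yi)  # I_i = |x_i - y_i|
        o1.append(rng.uniform(max(ai, xi - ii * alpha), min(bi, xi + ii * alpha)))
        o2.append(rng.uniform(max(ai, yi - ii * alpha), min(bi, yi + ii * alpha)))
    return o1, o2


if __name__ == "__main__":
    parents_x, parents_y = [0.1, -0.2], [0.3, 0.2]
    print(pcblx(parents_x, parents_y, [(-0.5, 0.5)] * 2, rng=random.Random(1)))
```

Unlike plain BLX, which samples one interval around both parents, each child stays centred on its own parent, so the search narrows as the parents converge.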
  83–89. Conceptual Flowchart
       Learning Membership Function: the Predefined MFs and the Transaction Database feed the Learning Process, which exchanges MFs with the Evaluation Module (Fitness) and outputs the Definitive MFs.
       Mining Fuzzy Association Rules: the Definitive MFs and the Transaction Database feed Fuzzy mining, which outputs the Fuzzy Association Rules.
  90. Procedures
       Stage 1
         1. initialization
         2. evaluate the initial chromosomes:
            2.1 for all items in each transaction, transform the quantitative values into fuzzy sets
            2.2 calculate the counts and the fuzzy supports
            2.3 calculate the fitness
         3. set the threshold L
         4. generate the next population
         5. apply the CHC procedure
         6. if the number of runs has not been reached, go to step 4
       Stage 2
         Mining fuzzy association rules by (Hong 2001)
  91. Experiments
  92. Parameters
       Proposed                      Hong’s
       • 50 individuals              • 0.01 mutation rate
       • 10,000 evaluations          • 0.35 d factor
       • 30 bits per gene            • 0.6 crossover rate
       • 0.8 fuzzy rule confidence
  93. Data Set: FAM95 (Bureau of the Census), #63,756 instances, #23 attr., #10 attr.
       This data set was obtained from the Statistics Data Sets Archive website http://www.stat.ucla.edu/data/fpp.
  94. 94. Results obtained in the genetic process Proposed approach Hong el al.’s approach Uniform fuzzy partition Sup Fit Fsup Suit #1I Sup Fit Fsup Suit #1I Sup Fit Fsup Suit #1I With three linguistic terms 0.2 0.99 11.68 11.85 20 0.2 0.68 10.83 15.83 19 0.2 0.92 9.24 10.00 16 0.5 0.94 11.68 12.39 17 0.5 0.53 10.28 19.45 15 0.5 0.76 7.55 10.00 10 0.7 0.66 6.98 10.63 9 0.7 0.37 6.55 17.94 8 0.7 0.57 5.71 10.00 7 0.9 0.28 2.80 10.00 3 0.9 0.00 0.00 14.75 0 0.9 0.00 0.00 10.00 0 With five linguistic terms 0.2 0.95 10.46 10.99 22 0.2 0.53 10.22 19.27 22 0.2 0.94 9.43 10.00 21 0.5 0.77 9.92 12.92 15 0.5 0.38 7.95 20.63 12 0.5 0.46 4.57 10.00 7 0.7 0.61 7.69 12.57 10 0.7 0.20 3.96 19.54 5 0.7 0.24 2.36 10.00 3 0.9 0.10 0.92 10.00 1 0.9 0.06 0.90 15.01 1 0.9 0.00 0.00 10.00 0
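In every row of the genetic-process results table, Fit agrees with Fsup / Suit to within 0.01, which is the form of Hong et al.'s fitness (total fuzzy support of the large 1-itemsets divided by the suitability of the chromosome). The sketch below recomputes it from the table values as a consistency check; the function name is hypothetical.

```python
def hong_fitness(total_fuzzy_support: float, suitability: float) -> float:
    """Fitness in the form f(C_q) = sum of fuzzy supports of large 1-itemsets / suitability(C_q)."""
    return total_fuzzy_support / suitability

# (Fsup, Suit, reported Fit) copied from the table: one line per minsup (0.2, 0.5, 0.7, 0.9),
# columns proposed / Hong et al. / uniform; three terms first, then five terms.
ROWS = [
    (11.68, 11.85, 0.99), (10.83, 15.83, 0.68), (9.24, 10.00, 0.92),
    (11.68, 12.39, 0.94), (10.28, 19.45, 0.53), (7.55, 10.00, 0.76),
    (6.98, 10.63, 0.66), (6.55, 17.94, 0.37), (5.71, 10.00, 0.57),
    (2.80, 10.00, 0.28), (0.00, 14.75, 0.00), (0.00, 10.00, 0.00),
    (10.46, 10.99, 0.95), (10.22, 19.27, 0.53), (9.43, 10.00, 0.94),
    (9.92, 12.92, 0.77), (7.95, 20.63, 0.38), (4.57, 10.00, 0.46),
    (7.69, 12.57, 0.61), (3.96, 19.54, 0.20), (2.36, 10.00, 0.24),
    (0.92, 10.00, 0.10), (0.90, 15.01, 0.06), (0.00, 10.00, 0.00),
]

def max_deviation(rows):
    """Largest gap between the reported Fit and Fsup / Suit."""
    return max(abs(hong_fitness(fsup, suit) - fit) for fsup, suit, fit in rows)

print(round(max_deviation(ROWS), 4))
```

The check also makes the trade-off visible: Hong et al.'s approach reaches comparable Fsup but a much larger Suit, which pulls its fitness down.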
  97. Results obtained in the genetic process: Hong et al.'s approach with the 2-tuples

With three linguistic terms
Support  Fitness   Fsup   Suit  #1-itemsets
0.2      0.97     10.90  11.18  20
0.5      0.89     11.36  12.64  18
0.7      0.59      6.20  10.33   7
0.9      0.26      2.79  10.52   3

With five linguistic terms
Support  Fitness   Fsup   Suit  #1-itemsets
0.2      0.93     10.18  10.93  22
0.5      0.64      7.39  11.80  11
0.7      0.41      4.76  11.60   6
0.9      0.08      0.91  10.92   1
  98. Fitness vs Function Evaluations
[Figure: average fitness value (0 to 1) vs number of evaluations (0 to 10,000); the proposed approach and Hong et al.'s approach]
  99. Frequent 1-itemsets vs minsup
[Figure: number of large 1-itemsets vs minimum support (0.10 to 0.90); the proposed approach, Hong et al.'s approach and the uniform fuzzy partition]
  100. MFs with and without lateral displacement (proposed approach)
[Figure: for each attribute X1 to X10, the original MFs l1, l2, l3 and the laterally displaced MFs l1', l2', l3']
X1:  l1' = (l1, 0.4),  l2' = (l2, 0.4),  l3' = (l3, 0.5)
X2:  l1' = (l1, 0.0),  l2' = (l2, -0.2), l3' = (l3, 0.0)
X3:  l1' = (l1, -0.1), l2' = (l2, -0.2), l3' = (l3, 0.2)
X4:  l1' = (l1, 0.0),  l2' = (l2, 0.0),  l3' = (l3, 0.4)
X5:  l1' = (l1, 0.1),  l2' = (l2, -0.2), l3' = (l3, 0.1)
X6:  l1' = (l1, 0.1),  l2' = (l2, -0.5), l3' = (l3, 0.1)
X7:  l1' = (l1, -0.1), l2' = (l2, -0.1), l3' = (l3, 0.4)
X8:  l1' = (l1, 0.0),  l2' = (l2, -0.2), l3' = (l3, -0.2)
X9:  l1' = (l1, 0.0),  l2' = (l2, -0.3), l3' = (l3, 0.1)
X10: l1' = (l1, 0.0),  l2' = (l2, -0.2), l3' = (l3, 0.2)
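A 2-tuple (l_i, α) moves the MF of label l_i sideways by α times the distance between neighbouring label peaks. A minimal sketch, assuming a uniform triangular partition of a hypothetical domain [0, 10]; because the slide lists (l3, 0.5) for X1, this sketch accepts the closed interval [-0.5, 0.5] even though the 2-tuple model defines α on [-0.5, 0.5). Function names are illustrative.

```python
def triangular(x, a, b, c):
    """Membership degree of x in the triangle with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def displaced_triangle(lo, hi, n_terms, label, alpha):
    """Triangle (a, b, c) of label l_label (1-based) in a uniform partition of [lo, hi],
    shifted laterally by the symbolic translation alpha of the 2-tuple (l_label, alpha)."""
    if not -0.5 <= alpha <= 0.5:
        raise ValueError("alpha must lie in [-0.5, 0.5]")
    step = (hi - lo) / (n_terms - 1)          # distance between neighbouring peaks
    peak = lo + (label - 1) * step + alpha * step
    return (peak - step, peak, peak + step)   # the shape is kept, only the position moves

# Hypothetical attribute on [0, 10] with l2' = (l2, -0.2), as for X2 above
original = displaced_triangle(0, 10, 3, 2, 0.0)
shifted = displaced_triangle(0, 10, 3, 2, -0.2)
print(original, shifted)
print(triangular(4, *original), triangular(4, *shifted))
```

Only one real number per label is learned, which keeps the search space smaller than tuning all three points of each triangle.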
  101. Hong's MFs
[Figure: for each attribute X1 to X10, the original MFs l1, l2, l3 and the MFs l1', l2', l3' learned by Hong et al.'s approach]
  102. #rules vs minsup (minconf = 0.8)
[Figure: number of rules vs minimum support (0.10 to 0.90) at minconf = 0.8; proposed approach, Hong et al.'s approach, uniform fuzzy partition]
  103. #rules vs minconf (minsup = 0.2)
[Figure: number of rules vs minimum confidence (0.10 to 0.90) at minsup = 0.2; proposed approach, Hong et al.'s approach, uniform fuzzy partition]
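These plots count the rules that survive the minsup and minconf thresholds in Stage 2. A minimal sketch of how a fuzzy rule's support and confidence can be computed with the minimum t-norm, as in Hong et al.'s fuzzy mining, assuming the membership degrees have already been computed per transaction; the data and function names are hypothetical.

```python
def rule_support_confidence(memberships, antecedent, consequent):
    """Fuzzy support and confidence of antecedent -> consequent.

    memberships: one dict per transaction, (item, term) -> membership degree
    antecedent, consequent: tuples of (item, term) pairs
    The intersection of terms is taken with the minimum t-norm.
    """
    n = len(memberships)
    both = sum(min(m.get(x, 0.0) for x in antecedent + consequent) for m in memberships)
    ante = sum(min(m.get(x, 0.0) for x in antecedent) for m in memberships)
    support = both / n
    confidence = both / ante if ante > 0 else 0.0
    return support, confidence

def keep_rule(memberships, antecedent, consequent, minsup, minconf):
    """Keep the rule only if it reaches both thresholds."""
    support, confidence = rule_support_confidence(memberships, antecedent, consequent)
    return support >= minsup and confidence >= minconf

# Hypothetical membership degrees for "if age is Middle then weight is High"
memberships = [
    {("age", "Middle"): 0.8, ("weight", "High"): 0.6},
    {("age", "Middle"): 0.4, ("weight", "High"): 0.9},
    {("age", "Middle"): 0.0, ("weight", "High"): 0.5},
]
rule = ((("age", "Middle"),), (("weight", "High"),))
print(rule_support_confidence(memberships, *rule))
print(keep_rule(memberships, *rule, minsup=0.3, minconf=0.8))
```

Raising either threshold can only remove rules, which is why every curve in these plots decreases.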
  104. #rules vs minsup for several minconf values
[Figure: number of rules (up to 200,000) vs minimum support (0.10 to 0.90), one curve per minimum confidence: 0.5, 0.6, 0.7, 0.8, 0.9]
  105. #rules vs minconf for several minsup values
[Figure: number of rules (up to 200,000) vs minimum confidence (0.10 to 0.90), one curve per minimum support: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
  106. Time vs #Transactions
[Figure: runtime (minutes) vs number of transactions (10% to 100% of the data set); proposed approach and Hong et al.'s approach]
  107. Time vs #Attributes
[Figure: runtime (minutes) vs number of attributes (2 to 10); proposed approach and Hong et al.'s approach]
  108. Time vs #Linguistic terms
[Figure: runtime (minutes) vs number of linguistic terms (3 to 7); proposed approach and Hong et al.'s approach]
  109. Example of Rules
Classic fuzzy association rule:
If number of children is Low and hours head worked last week is Low then head's personal income is Low (factor of confidence 0.87)
Rule with 2-tuples representation:
If number of children is (Low, -0.16) and hours head worked last week is (Low, -0.06) then head's personal income is (Low, 0.1) (factor of confidence 0.99)
  111. Author's conclusion
2-tuples linguistic representation works!!
  112. Discussion
  113. T. Hong, C. Chen, Y. Wu, Y. Lee, Using divide-and-conquer GA strategy in fuzzy data mining, IEEE Symp. on Fuzzy Systems, Budapest, Hungary, 2004, pp. 116–121.
  114. Pitfalls
• domain knowledge & symmetric assumption
• flowchart
• Hong's method
• inadequate fitness function
• Gray code and crossover
• fuzzy association?
• dataset
• replication?
• scalability
  115. Pitfalls: Hong's method
$\mathrm{suitability}(C_q) = \sum_{k=1}^{n} \left[ \mathrm{overlap\_factor}(C_{qk}) + \mathrm{coverage\_factor}(C_{qk}) \right]$
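The slide shows only the top-level sum. A minimal sketch of the two per-item factors, under our reading of Hong et al.: each MF is an isosceles triangle given by its center c and half-width w; the overlap factor adds max(overlap / min(w_i, w_k), 1) - 1 over each unordered pair of MFs of an item; the coverage factor is max(I_j) divided by the part of [0, max(I_j)] covered by the MFs. The pairing convention, the clipping to [0, max(I_j)] and all function names are assumptions. Under these definitions a uniform partition scores exactly 1 per item, which agrees with the Suit = 10.00 reported for the uniform fuzzy partition over 10 attributes.

```python
from itertools import combinations

def overlap_len(p, q):
    """Length of the intersection of the supports of triangles p = (c, w) and q = (c, w)."""
    (c1, w1), (c2, w2) = p, q
    return max(0.0, min(c1 + w1, c2 + w2) - max(c1 - w1, c2 - w2))

def overlap_factor(mfs):
    """Penalty for MFs overlapping more than the narrower half-width (assumed: unordered pairs)."""
    return sum(max(overlap_len(p, q) / min(p[1], q[1]), 1.0) - 1.0
               for p, q in combinations(mfs, 2))

def coverage_factor(mfs, max_value):
    """max_value divided by the length of [0, max_value] covered by at least one support."""
    spans = sorted((max(0.0, c - w), min(max_value, c + w)) for c, w in mfs)
    covered, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in spans:
        if hi <= lo:                         # support lies outside [0, max_value]
            continue
        if cur_hi is None or lo > cur_hi:    # gap: close the current merged interval
            if cur_hi is not None:
                covered += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:                                # overlapping: extend the merged interval
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo
    return max_value / covered if covered > 0 else float("inf")

def suitability(chromosome, max_values):
    """chromosome: one list of (center, half_width) triangles per item."""
    return sum(overlap_factor(mfs) + coverage_factor(mfs, m)
               for mfs, m in zip(chromosome, max_values))

uniform = [(0.0, 5.0), (5.0, 5.0), (10.0, 5.0)]   # hypothetical item with max quantity 10
print(suitability([uniform] * 10, [10.0] * 10))
```

Both factors equal their minimum (0 and 1) for a uniform partition, so any learned shift that improves support but spreads or bunches the MFs is penalised through Suit.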
  117. References
• L. Eshelman, The CHC adaptive search algorithm: How to have safe search when engaging in nontraditional genetic recombination, in: G. Rawlins (Ed.), Foundations of Genetic Algorithms, Vol. 1, Morgan Kaufmann, Los Altos, CA, 1991, pp. 265–283.
• F. Herrera, L. Martínez, A 2-tuple fuzzy linguistic representation model for computing with words, IEEE Trans. Fuzzy Systems 8 (6) (2000) 746–752.
• F. Herrera, M. Lozano, A.M. Sánchez, A taxonomy for the crossover operator for real-coded genetic algorithms: An experimental study, Int. J. Intell. Syst. 18 (2003) 309–338.
• T. Hong, C. Chen, Y. Wu, Y. Lee, Using divide-and-conquer GA strategy in fuzzy data mining, in: IEEE Symp. on Fuzzy Systems, Budapest, Hungary, 2004, pp. 116–121.
• T. Hong, C. Chen, Y. Wu, Y. Lee, "Genetic-Fuzzy Data Mining with Divide-and-Conquer Strategy", IEEE Trans. Evolutionary Computation 12 (2) 252–265.
• T. Hong, C. Kuo, S. Chi, Trade-off between time complexity and number of rules for fuzzy mining from quantitative data, Int. J. Uncertainty, Fuzziness and Knowledge-Based Systems 9 (5) (2001) 587–604.
• H. Ishibuchi, T. Nakashima, T. Yamamoto, Fuzzy association rules for handling continuous attributes, in: IEEE Internat. Symp. on Industrial Electronics Proceedings, Pusan, Korea, 2001, pp. 118–121.
• P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, May 2005.
  118. Thank you!
  119. Questions?
