Table 4.1
              10      20      30      40
     1       0.74    0.76    1.34    1.75
     2       2.01    2.62    0.87    0.69
     3       0.87    0.60    1.83    1.90
     4       1.73    1.83    0.96    0.93

                                                                2
•
    •
•
    •
•
    •
(2)
•
    •
        •
        •




                  4
(3)
•   2
    •
        •
    •
        •
•
    •       (
        •
    •
        •

                      5
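[Figure: two points x and y in the (x1, x2) plane, separated by dx1 along x1 and dx2 along x2.
 (A) Euclidean distance: $d(x, y) = \sqrt{dx_1^2 + dx_2^2}$
 (B) Manhattan distance: $d(x, y) = |dx_1| + |dx_2|$]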
                                                                   6
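The two distances in panels (A) and (B) can be written directly in Python. This is a minimal sketch that assumes points are equal-length tuples of coordinates. The function names `euclidean` and `manhattan` are our own.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """(A): square root of the sum of squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """(B): sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return sum(abs(a - b) for a, b in zip(x, y))


# dx1 = 3, dx2 = 4
print(euclidean((1.0, 2.0), (4.0, 6.0)))  # 5.0
print(manhattan((1.0, 2.0), (4.0, 6.0)))  # 7.0
```

The Manhattan distance is never smaller than the Euclidean distance. The two agree only when the points differ along a single axis.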
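$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

[Figure: three scatter plots of y against x, with r ≈ 1, r ≈ 0, and r ≈ −1.]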
                                                        7
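The correlation coefficient r above can be computed term by term. In this minimal sketch, `pearson_r` is a hypothetical name. The usage lines apply it to rows of Table 4.1, with the values read from the slide. Rows 1 and 3 rise together, while rows 1 and 2 move in opposite directions.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: product of the square roots of the sums of squared deviations
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)) * math.sqrt(
        sum((y - y_bar) ** 2 for y in ys)
    )
    if den == 0:
        raise ValueError("r is undefined when a sequence is constant")
    return num / den


# Rows 1, 2, 3 of Table 4.1 (values under 10, 20, 30, 40) as read from the slide
row1 = [0.74, 0.76, 1.34, 1.75]
row2 = [2.01, 2.62, 0.87, 0.69]
row3 = [0.87, 0.60, 1.83, 1.90]
print(round(pearson_r(row1, row3), 2))  # 0.94
print(round(pearson_r(row1, row2), 2))  # -0.92
```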
•
•
•
•   (Top-down Clustering, Divisive Clustering)
•   (Bottom-up Clustering, Agglomerative Clustering)

[Figure: (A) seven points A to G in the plane; (B) a dendrogram over the leaves A, B, C, D, E, F, G.]
                                  8
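Bottom-up (agglomerative) clustering builds the dendrogram by repeatedly merging the two closest clusters. The slide does not say how the distance between clusters is measured. This sketch therefore assumes single linkage (the closest pair of members) and Euclidean distance. The coordinates for A to G are hypothetical and are not read from the figure.

```python
import math
from itertools import combinations


def single_linkage(points: dict[str, tuple[float, float]]) -> list[tuple[frozenset, frozenset, float]]:
    """Start from singletons and repeatedly merge the closest pair of clusters.

    Returns the merges in order as (cluster_a, cluster_b, merge_distance);
    drawing them as a tree gives the dendrogram.
    """
    clusters = [frozenset([name]) for name in points]

    def linkage(a: frozenset, b: frozenset) -> float:
        # Single linkage: distance between the closest members of the two clusters
        return min(math.dist(points[p], points[q]) for p in a for q in b)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((a, b, height))
    return merges


# Hypothetical coordinates for A to G (not taken from the figure)
pts = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0), "D": (5.0, 1.5),
       "E": (10.0, 0.0), "F": (10.0, 2.5), "G": (20.0, 0.0)}
for a, b, h in single_linkage(pts):
    print(sorted(a), "+", sorted(b), "at", h)
```

Top-down (divisive) clustering would instead start from one cluster containing everything and split it recursively.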
k-clustering
•   Given k, a distance d, and a set S, divide S into k clusters S1, S2, . . . , Sk (a k-clustering):
    •   S = S1 ∪ S2 ∪ · · · ∪ Sk
    •   Si ∩ Sj = φ (i ≠ j)
•   3-clustering
    [Figure: (A) and (B) two 3-clusterings of the points A to G.]
                                                                          10
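The two conditions in the definition (the clusters cover S, and no two clusters overlap) are easy to check mechanically. In this sketch, `is_k_clustering` is a hypothetical helper, and the example groupings of A to G are made up rather than taken from the figure.

```python
from itertools import combinations


def is_k_clustering(S: set, clusters: list[set], k: int) -> bool:
    """Check that S1..Sk cover S and are pairwise disjoint."""
    if len(clusters) != k:
        return False
    # S = S1 ∪ S2 ∪ ... ∪ Sk
    if set().union(*clusters) != S:
        return False
    # Si ∩ Sj = φ for every pair i ≠ j
    return all(not (a & b) for a, b in combinations(clusters, 2))


S = set("ABCDEFG")
print(is_k_clustering(S, [{"A", "B", "C"}, {"D", "E"}, {"F", "G"}], 3))       # True
print(is_k_clustering(S, [{"A", "B", "C"}, {"C", "D", "E"}, {"F", "G"}], 3))  # False: C is in two clusters
```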
k-                                  (2)
 •    k-


      •    (A)
           •            v(Si)
      •    (B)
           •            q(V)


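(A)
$$ c_i = \frac{1}{|S_i|} \sum_{x \in S_i} x $$
$$ v(S_i) = \frac{1}{|S_i|} \sum_{x \in S_i} \bigl(d(x, c_i)\bigr)^{n} $$

(B)
$$ \mathrm{diameter}(S_i) = \max\{\, d(x_1, x_2) \mid x_1, x_2 \in S_i \,\} $$
$$ q(V) = \max\{\, \mathrm{diameter}(S_i) \mid i = 1, \ldots, k \,\} $$
                                                                    11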
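The quantities in (A) and (B) can be computed as follows. This sketch assumes that d is the Euclidean distance. The parameter `power` stands for the exponent written n in v(Si), and `power=2` gives the mean squared distance to the centroid. All function names are our own.

```python
import math
from itertools import combinations

Point = tuple[float, ...]


def centroid(cluster: list[Point]) -> Point:
    """c_i: coordinate-wise mean of the points in S_i."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def v(cluster: list[Point], power: float) -> float:
    """v(S_i): average of d(x, c_i) ** power over x in S_i (d assumed Euclidean)."""
    c = centroid(cluster)
    return sum(math.dist(x, c) ** power for x in cluster) / len(cluster)


def diameter(cluster: list[Point]) -> float:
    """Largest distance between two points of S_i (0 for a single point)."""
    return max((math.dist(a, b) for a, b in combinations(cluster, 2)), default=0.0)


def q(clusters: list[list[Point]]) -> float:
    """Criterion (B): the largest cluster diameter."""
    return max(diameter(c) for c in clusters)


S1 = [(0.0, 0.0), (2.0, 0.0)]
S2 = [(5.0, 5.0), (5.0, 8.0), (9.0, 5.0)]
print(centroid(S1), v(S1, power=2))  # (1.0, 0.0) 1.0
print(diameter(S2), q([S1, S2]))     # 5.0 5.0
```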
k-                            (3)
•                2
    •       n,       d         O(n^(O(dn)))


        •
    •                                         k-Means
•
    •                    NP-
    •       2




                                                        12
(Hard) K-means(K-   )

•            k-means
•
•                             K
•
    1.
         •                K
             •                K
             •            K
    2.
    3.
    4. 2, 3
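Steps 1 to 4 can be sketched as a short loop. This is not necessarily the exact variant on the slide, whose wording was lost. The sketch assumes Euclidean distance and initial means drawn at random from the data, and it stops when the assignments no longer change. It also assumes that an empty cluster keeps its previous mean.

```python
import math
import random

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Hard K-means. Returns (means, labels) with labels[n] = cluster of points[n]."""
    rng = random.Random(seed)
    # Step 1: choose K initial means (assumption: K data points picked at random)
    means = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to the cluster whose mean is nearest
        new_labels = [min(range(k), key=lambda j: math.dist(x, means[j])) for x in points]
        if new_labels == labels:
            break  # Step 4: stop once repeating steps 2 and 3 changes nothing
        labels = new_labels
        # Step 3: move each mean to the average of the points assigned to it
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, labels = kmeans(data, k=2)
print(labels)
```

The result depends on the initial means. On less well-separated data, different seeds can converge to different local optima.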
7 points, 2 clusters: 2-means

(0) Initial means m(1), m(2)
    [Figure: the 7 points and the two means m(1), m(2).]

(1) k                m
                                                         14
$$ \hat{k} = \arg\min_{k} \{\, d(m^{(k)}, x) \,\} $$
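(2) x is assigned to the nearest mean m(k).
    Example: d(m(1), x) > d(m(2), x), so x is assigned to m(2).
    [Figure: data points drawn as + and □ with the means m(1) and m(2); d(x, y) is the distance between x and y.]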
                                                                  15
[Figure: data points drawn as + and □ with the means m(1) and m(2).]

(3)
$$ m^{(k)} = \frac{\sum_{n} r_k^{(n)} x^{(n)}}{R^{(k)}} $$
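where $r_k^{(n)}$ is 1 if the point $x^{(n)}$ is assigned to cluster $k$ and 0 otherwise, and $R^{(k)} = \sum_n r_k^{(n)}$ is the number of points in cluster $k$.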




                                                         16
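The update in step (3) can also be written with the indicator values $r_k^{(n)}$ stored as a table. This sketch assumes hard assignments, so each row of `r` has exactly one 1. Clusters are indexed from 0 in code, so cluster 0 here corresponds to m(1) on the slide. The name `update_means` is hypothetical.

```python
Point = tuple[float, ...]


def update_means(points: list[Point], r: list[list[int]]) -> list[Point]:
    """Step (3): m(k) = sum_n r_k(n) x(n) / R(k) with hard responsibilities.

    r[n][k] is 1 if point n is assigned to cluster k, otherwise 0.
    """
    num_clusters = len(r[0])
    dim = len(points[0])
    means = []
    for k in range(num_clusters):
        R_k = sum(r[n][k] for n in range(len(points)))  # R(k): points in cluster k
        if R_k == 0:
            raise ValueError(f"cluster {k} has no points, so its mean is undefined")
        means.append(tuple(
            sum(r[n][k] * points[n][i] for n in range(len(points))) / R_k
            for i in range(dim)
        ))
    return means


pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
r = [[1, 0], [1, 0], [0, 1]]  # first two points in cluster 0, last one in cluster 1
print(update_means(pts, r))  # [(1.0, 0.0), (10.0, 10.0)]
```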
[Figure: data points drawn as + and □ with the means m(1) and m(2).]
(4) Repeat (2) and (3).

                                                  17
K-means
            •   5000
            •
            •
  Cluster #     0    1    2    3    4    5    6    7    8    9
      0         -    -    -    -   21    -    -    3    1    7
      1         -    7   14    1    -    1    3    -    4    -
      2        21    -    1    -    -    1   19    -    1    -
      3         -    6    7    2    3    -    -   21    1   14
      4         -    -    1   24    -   21    1    -    1    -
      5         -   37    1    1   17    9    4    6   27   13
      6         -    -    -    -    8    -    -    8    1    9
      7         -    -    -   15    -    6    -    -   10    -
      8        29    -   22    2    -   12   23    -    1    -
      9         -    -    4    5    1    -    -   12    3    7
                                                                 18
K-means
                  •
                       •
                  •        k
                  •
[Figure: two panels (a) and (b), both axes 0 to 10, reproduced from MacKay, p. 288, Chapter 20 (Copyright Cambridge University Press 2003; http://www.inference.phy.cam.ac.uk/mackay/itila/).]
                                                                                                               19
(1)
        •
112                                                                       4
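 (1) Let V be the set of points and C the set of chosen centers; initially C = {}.
 (2) Choose any one point c1 ∈ V and add c1 to C.
 (3) For j = 2, . . . , k, repeat (a) and (b) to add points to C:
     (a) For each x (∈ V − C), let neighbor(x) be the point in C nearest to x.
     (b) Choose cj such that
             d(cj, neighbor(cj)) = max {d(x, neighbor(x)) | x ∈ V − C}
         and add cj to C.

 [Figure: (A) and (B) the procedure on the points A to G.]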
                                                                                                 20
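Steps (1) to (3) describe a greedy selection that is commonly called farthest-first traversal (Gonzalez's algorithm). Each new center is the remaining point that lies farthest from its nearest chosen center. This sketch makes the following assumptions:
- d is Euclidean.
- The arbitrary c1 is the first point in V.
- The points in V are distinct.

The name `farthest_first` is hypothetical.

```python
import math

Point = tuple[float, ...]


def farthest_first(V: list[Point], k: int) -> list[Point]:
    """Choose k centers C following steps (1)-(3); d is assumed Euclidean."""
    if not 1 <= k <= len(V):
        raise ValueError("need 1 <= k <= |V|")
    # (1)-(2): C starts empty, then receives an arbitrary c1 (here: the first point)
    C = [V[0]]
    for _ in range(2, k + 1):  # (3): j = 2, ..., k
        rest = [x for x in V if x not in C]  # V - C (assumes distinct points)
        # (a) d(x, neighbor(x)) is the distance from x to its nearest point in C
        # (b) c_j is the x in V - C that maximizes that distance
        c_j = max(rest, key=lambda x: min(math.dist(x, c) for c in C))
        C.append(c_j)
    return C


V = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (5.0, 8.0)]
print(farthest_first(V, 3))  # [(0.0, 0.0), (11.0, 0.0), (5.0, 8.0)]
```

After the centers are chosen, each point can be placed in the cluster of its nearest center to obtain a k-clustering.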
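 (1) Let V be the set of points and C the set of chosen centers; initially C = {}.
 (2) Choose any one point c1 ∈ V and add c1 to C.
 (3) For j = 2, . . . , k, repeat (a) and (b) to add points to C:
     (a) For each x (∈ V − C), let neighbor(x) be the point in C nearest to x.
     (b) Choose cj such that
             d(cj, neighbor(cj)) = max {d(x, neighbor(x)) | x ∈ V − C}
         and add cj to C.

 [Figure: (C), (D), and (E) continue the procedure on the points A to G; the values 4.38 and 4.10 are marked.]
                                                                                                              21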
(2)
[Figure: (E) and (F) on the points A to G.]




                  2


                                                          22
(Self Organizing Map)
•
•
    •
•
•




                                23
Datamining 7th Kmeans