Chapter 9
(@kisa12012), December 11, 2010
Chapter 9: Mixture Models and EM

Outline
  1. K-means
  2. Mixture of Gaussians, including EM for Gaussian mixtures
  3. An alternative view of EM, including its relation to K-means
  4. EM in general
(Mixture Models)

  - (Section 2.3.9)
  - (Section 9.2)
  - (Chapter 12)
  - (K-means [Lloyd, 1982]), etc.
EM (Expectation-Maximization)


     EM
   K-means                        (9.1   )                                         EM
                       (9.2   )              EM
             EM                                  9.4
   EM

                                                       9.2.1
                                                               10
        (Bishop                     10                                     ...)




Data set: N observations {x_1, ..., x_N} of a D-dimensional variable x ∈ ℜ^D.
Goal: partition the data {x_n} into K clusters, where K is given.
Introduce K D-dimensional vectors µ_k ∈ ℜ^D: {µ_1, µ_2, ..., µ_K}.
µ_k represents the centre of cluster k.
Goal: find the vectors {µ_k} and an assignment of the data points {x_n} to the {µ_k}.
1-of-K coding scheme

For each data point x_n, introduce binary indicator variables r_nk ∈ {0, 1} (k = 1, ..., K) that describe which µ_k it is assigned to.
If x_n is assigned to cluster k, then r_nk = 1 and r_nj = 0 for j ≠ k.
This is known as the 1-of-K coding scheme.
Objective function J (distortion measure) for {x_n} and {µ_k}:

    J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \| x_n - \mu_k \|^2        (1)

Goal: find the {r_nk} and {µ_k} that minimise J.
This is what the K-means algorithm does.
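K-means algorithm

K-means optimises {r_nk} and {µ_k} by alternating between two steps.

K-means:
  1. With {µ_k} fixed, minimise J with respect to {r_nk}.
  2. With {r_nk} fixed, minimise J with respect to {µ_k}.
  3. Repeat steps 1 and 2 until convergence.

The two steps correspond to the E (Expectation) step [1] and the M (Maximization) step [2] of EM.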
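K-means: step 1

Step 1: minimise J with respect to r_nk.

    J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \| x_n - \mu_k \|^2

The terms of J for different n are independent, so for each x_n we set r_nk = 1 for the k that minimises \| x_n - \mu_k \|^2.

Step 1:

    r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \| x_n - \mu_j \|^2 \\ 0 & \text{otherwise} \end{cases}        (2)

That is, x_n is assigned to its nearest µ_k.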
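K-means: step 2

Step 2: with r_nk fixed, minimise J with respect to µ_k.
J is quadratic in µ_k, so J is minimised by setting its derivative with respect to µ_k to 0:

    \frac{\partial J}{\partial \mu_k} = -2 \sum_{n=1}^{N} r_{nk} (x_n - \mu_k) = 0        (3)

Solving for µ_k gives step 2:

    \mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}        (4)

µ_k is the mean of the data points assigned to cluster k, hence the name K-means.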
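The two alternating steps (2) and (4) can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names, the toy data, the initial centres, and the choice to keep the old centre when a cluster becomes empty are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2 between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign(points, centers):
    """Step 1, eq. (2): index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda k: squared_distance(x, centers[k]))
            for x in points]

def update(points, labels, centers):
    """Step 2, eq. (4): each centre becomes the mean of its assigned points."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [x for x, label in zip(points, labels) if label == k]
        if not members:
            # Empty cluster: keep the old centre (an assumption of this sketch).
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(x[d] for x in members) / len(members)
                                 for d in range(len(old))))
    return new_centers

def distortion(points, labels, centers):
    """Objective J from eq. (1) for a hard assignment."""
    return sum(squared_distance(x, centers[k]) for x, k in zip(points, labels))

def kmeans(points, centers, max_iter=100):
    """Alternate steps 1 and 2 until the assignment stops changing."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels

# Toy data: two well-separated groups, with deliberately poor initial centres.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (1.0, 0.0)])
print(labels, centers, distortion(points, labels, centers))
```

Stopping when the assignment no longer changes is one common convergence test; monitoring the decrease in J would work equally well.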
[Figure: nine panels (a) to (i); both axes run from −2 to 2 in every panel]
[Figure: J (vertical axis, 0 to 1000) plotted against 1 to 4 on the horizontal axis]
K-means



                    J
                                       [      9.1]
                                           [MacQueen,1967]
                              [        ]
     K-means++



     [Ramasubramanian+, 1990; Moore, 2000]
     [Hodgson, 1998; Elkan, 2003]
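Online K-means

A sequential (online) version of K-means [MacQueen, 1967] is obtained by applying the Robbins-Monro procedure (Section 2.3.5) to (1).
For each data point x_n in turn, the nearest µ_k is updated:

    \mu_k^{\text{new}} = \mu_k^{\text{old}} + \eta_n (x_n - \mu_k^{\text{old}})        (5)

η_n is the learning rate parameter, which is typically decreased as n grows.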
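Update (5) can be illustrated with a short sketch. The learning rate schedule used here, η_n = 1 / (number of points seen so far by the winning centre), is an assumption of this example and only one of several possible decreasing schedules. The function name and the toy stream are likewise hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def online_kmeans_update(centers, counts, x):
    """Apply eq. (5) to the centre nearest to x and return its index.

    Assumption: eta_n = 1 / counts[k], which makes each centre the running
    mean of the points it has absorbed so far.
    """
    k = min(range(len(centers)), key=lambda j: squared_distance(x, centers[j]))
    counts[k] += 1
    eta = 1.0 / counts[k]
    centers[k] = [m + eta * (xi - m) for m, xi in zip(centers[k], x)]
    return k

# Two 1-D centres; each starts with a count of 1 (the initial value itself).
centers = [[0.0], [10.0]]
counts = [1, 1]
for x in [[2.0], [8.0], [1.0]]:
    online_kmeans_update(centers, counts, x)
print(centers)
```

With this schedule the first centre ends at the mean of 0, 2 and 1, and the second at the mean of 10 and 8.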
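K-medoids

K-means is based on the squared Euclidean distance between µ_k and x_n, which is not robust to outliers (Section 2.3.7).
K-medoids replaces it with a more general dissimilarity measure V(x, x′):

    J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} V(x_n, \mu_k)        (6)

  - E step: as in K-means.
  - M step: each µ_k is restricted to be one of the data points x_n assigned to that cluster.
  - Cost: O(KN) + O(N_k^2), where N_k is the number of data points x_n assigned to cluster k.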
2

 .                                           .
                                        .                                              .
..                                          ..
                  xn       1                              xn
 .     µk                                    .
 ..                                 .        ..                                    .


                                        .




                                                                                       .
      K-means
      9.2

                                                          µk
                               xn                     1


.
                                                           .
..
 1
.                           [Forsyth+, 2003]
..                                                     .




                                                           .
          K-means
            K-means



      (             )




3               (                                       [0,1])
                           1                  xn
                     µk

[Figure: results for K = 2, K = 3 and K = 10, alongside the original image]
Two kinds of data compression:
  - lossless data compression
  - lossy data compression

Vector quantization: each data point x_n is represented by its nearest centre µ_k, and the µ_k are called code-book vectors.
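Each pixel stores {R, G, B} values of 8 bits each, so an image of N pixels takes 24N bits.
With K code-book vectors, each pixel needs log2 K bits to identify its vector.
The K code-book vectors themselves take 24K bits.
Total after compression: 24K + N log2 K bits.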
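The bit counts above are easy to check numerically. The sketch below assumes a hypothetical 240 × 180 image, and it rounds log2 K up to a whole number of bits per pixel, which the formula above leaves implicit.

```python
def bits_original(n_pixels):
    """24 bits (8 each for R, G, B) per pixel."""
    return 24 * n_pixels

def bits_compressed(n_pixels, k):
    """24K bits for the code book plus ceil(log2 K) bits per pixel.

    (k - 1).bit_length() equals ceil(log2 k) for every integer k >= 1.
    """
    index_bits = (k - 1).bit_length()
    return 24 * k + n_pixels * index_bits

n_pixels = 240 * 180  # hypothetical image size, chosen only for illustration
print("original:", bits_original(n_pixels))
for k in (2, 3, 10):
    total = bits_compressed(n_pixels, k)
    print(f"K={k}: {total} bits, ratio {total / bits_original(n_pixels):.3f}")
```

Under these assumptions, K = 10 compresses the image to roughly one sixth of its original size.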
Mixture of Gaussians

The Gaussian mixture distribution was introduced in Section 2.3.9, equation (2.188):

    p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x | \mu_k, \Sigma_k)        (7)
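Introduce a K-dimensional binary random variable z with a 1-of-K representation: z_k ∈ {0, 1} and \sum_k z_k = 1.
The joint distribution p(x, z) is defined in terms of the marginal p(z) and the conditional p(x|z).

Figure: graphical model with the latent variable z as parent of x.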
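The marginal distribution over z is specified by the mixing coefficients π_k:

    p(z_k = 1) = \pi_k

The parameters {π_k} must satisfy (8) and (9):

    0 \le \pi_k \le 1        (8)

    \sum_{k=1}^{K} \pi_k = 1        (9)

Because z uses a 1-of-K representation, the distribution can also be written as

    p(z) = \prod_{k=1}^{K} \pi_k^{z_k}        (10)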
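The conditional distribution of x given a particular value of z is Gaussian:

    p(x | z_k = 1) = \mathcal{N}(x | \mu_k, \Sigma_k)

which can also be written in the form (11):

    p(x | z) = \prod_{k=1}^{K} \mathcal{N}(x | \mu_k, \Sigma_k)^{z_k}        (11)

The joint distribution p(x, z) is given by p(z) p(x|z), and marginalising over z gives

    p(x) = \sum_{z} p(z) p(x | z) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x | \mu_k, \Sigma_k)        (12)

From (12), given observations {x_1, ..., x_N}, every data point x_n has a corresponding latent variable z_n.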
  - Working with the joint distribution p(x, z) instead of p(x) leads to the EM algorithm.
  - The conditional probability p(z|x) of z given x also plays an important role.
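The conditional probability of z given x is denoted γ(z_k). By Bayes' theorem,

    \gamma(z_k) \equiv p(z_k = 1 | x)
                = \frac{p(z_k = 1) \, p(x | z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1) \, p(x | z_j = 1)}
                = \frac{\pi_k \mathcal{N}(x | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x | \mu_j, \Sigma_j)}        (13)

π_k is the prior probability of z_k = 1, and γ(z_k) is the posterior probability of z_k = 1 once x has been observed.
γ(z_k) can be viewed as the responsibility that component k takes for explaining the observation x.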
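Equation (13) is simply a normalised product of prior and likelihood, which the following sketch evaluates. It is restricted to one-dimensional components with scalar variances in place of Σ_k, a simplification made to keep the example dependency-free. The parameter values are made up for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def responsibilities(x, weights, means, variances):
    """Eq. (13): gamma(z_k) = pi_k N(x|mu_k) / sum_j pi_j N(x|mu_j)."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-component mixture.
weights, means, variances = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
gamma_mid = responsibilities(2.0, weights, means, variances)   # halfway between the means
gamma_left = responsibilities(0.0, weights, means, variances)  # on top of the first mean
print(gamma_mid, gamma_left)
```

A point halfway between two otherwise identical components is shared equally, while a point sitting on one mean is claimed almost entirely by that component.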
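Ancestral sampling (Section 8.1.2)

Samples from the Gaussian mixture can be generated by ancestral sampling:
  1. Draw a value of z, denoted ẑ, from the marginal distribution p(z).
  2. Draw a value of x from the conditional distribution p(x | ẑ).

See also Chapter 11.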
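The two sampling steps translate directly into code. The sketch below uses one-dimensional components and Python's standard `random` module; the mixture parameters, the seed, and the function name are assumptions made for the example.

```python
import random

def sample_mixture(weights, means, stds, rng):
    """Ancestral sampling: draw z from p(z), then x from p(x | z)."""
    k = rng.choices(range(len(weights)), weights=weights)[0]  # step 1: component index
    x = rng.gauss(means[k], stds[k])                          # step 2: x given that component
    return k, x

rng = random.Random(0)  # fixed seed so the run is repeatable
weights, means, stds = [0.3, 0.7], [-10.0, 10.0], [1.0, 1.0]
samples = [sample_mixture(weights, means, stds, rng) for _ in range(2000)]

# Keeping (k, x) pairs gives complete data; dropping k leaves incomplete data.
fraction_second = sum(1 for k, _ in samples if k == 1) / len(samples)
print(fraction_second)
```

The fraction of samples drawn from the second component should be close to its mixing coefficient of 0.7.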
[Figure (a): points from the joint distribution p(x, z), with z shown (complete data); both axes from 0 to 1]
[Figure (b): points from the marginal distribution p(x), with z not shown (incomplete data); both axes from 0 to 1]
Responsibility

[Figure (c): each data point x_n shown according to the responsibility p(z_k | x_n) of each component k; both axes from 0 to 1]

    \gamma(z_{nk}) \equiv p(z_k = 1 | x_n)
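Data set: {x_1, ..., x_N}.

Notation:
  - X: the N × D matrix whose n-th row is x_n^T.
  - Z: the N × K matrix whose n-th row is z_n^T.

[Figure: graphical model with nodes z_n and x_n inside a plate of size N, and parameters π, µ and Σ]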
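Log likelihood ln p(X | π, µ, Σ):

    \ln p(X | \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n | \mu_k, \Sigma_k) \right\}        (14)

Consider a mixture whose components have covariance matrices Σ_k = σ_k^2 I.
Suppose the mean µ_j of the j-th component coincides exactly with a data point x_n, i.e. µ_j = x_n. The contribution of x_n to the likelihood is then

    \mathcal{N}(x_n | x_n, \sigma_j^2 I) = \frac{1}{(2\pi)^{1/2}} \frac{1}{\sigma_j}        (15)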
As σ_j → 0, (15) goes to infinity, and so does the log likelihood ln p(X | π, µ, Σ).

[Figure: p(x) plotted against x]
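The divergence is easy to observe numerically. In the sketch below, one component of a one-dimensional two-component mixture is centred exactly on a data point while its standard deviation shrinks; the data values and parameters are made up for illustration.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (math.sqrt(2.0 * math.pi) * std)

def log_likelihood(data, weights, means, stds):
    """Eq. (14) for a 1-D mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)))
        for x in data
    )

data = [-1.0, 0.0, 0.5, 2.0]
# Component 0 stays broad; component 1 sits on the data point x = 2.0 and collapses.
shrinking_stds = (1.0, 1e-2, 1e-4, 1e-6)
values = [log_likelihood(data, [0.5, 0.5], [0.0, 2.0], [1.0, s]) for s in shrinking_stds]
for s, v in zip(shrinking_stds, values):
    print(f"sigma_j = {s:g}: ln p(X) = {v:.3f}")
```

The broad component keeps the other data points at a finite likelihood, so nothing stops the collapsing component from driving the total upward without bound.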
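  - The singularities can be avoided by adopting a Bayesian approach (Section 10.1).
  - Identifiability: for a K-component mixture, any maximum likelihood solution is one of K! equivalent solutions, so there are K! − 1 other solutions that give the same distribution (see also Chapter 12).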
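    \ln p(X | \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n | \mu_k, \Sigma_k) \right\}

Because the sum over the K components appears inside the logarithm, setting the derivatives to 0 does not give a closed-form solution.

Possible approaches:
  - Gradient-based optimisation [Fletcher, 1987; Nocedal+, 1999; Bishop+, 2008] (Chapter 5)
  - EM
  - (Chapter 10)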
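EM

The EM algorithm (expectation-maximization algorithm) is a method for finding maximum likelihood solutions for models with latent variables [Dempster+, 1977; McLachlan+, 1997].

A more general treatment of EM is given later (Section 9.3), and EM can be generalised further (Section 10.1).

    \ln p(X | \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n | \mu_k, \Sigma_k) \right\}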
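µ_k

Set the derivative of the log likelihood with respect to µ_k to 0:

    0 = \frac{\partial}{\partial \mu_k} \left( \sum_{n=1}^{N} \ln \left\{ \sum_{j=1}^{K} \pi_j \mathcal{N}(x_n | \mu_j, \Sigma_j) \right\} \right)
      = \sum_{n=1}^{N} \underbrace{\frac{\pi_k \mathcal{N}(x_n | \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n | \mu_j, \Sigma_j)}}_{\gamma(z_{nk})} \Sigma_k^{-1} (x_n - \mu_k)        (16)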
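Multiplying by Σ_k (assumed to be non-singular) and rearranging gives

    \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) x_n        (17)

where

    N_k = \sum_{n=1}^{N} \gamma(z_{nk})        (18)

N_k can be interpreted as the effective number of points assigned to cluster k.
The mean µ_k of component k is a weighted mean of all the data points, where the weight of x_n is the responsibility γ(z_nk) that component k takes for x_n.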
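Σ_k

Setting the derivative of the log likelihood with respect to Σ_k to 0 gives

    \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T        (19)

[2.34]

Each data point x_n is weighted by its responsibility γ(z_nk), and the denominator N_k is the effective number of points in cluster k.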
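π_k

Maximise with respect to π_k, subject to the constraint (9) that the π_k sum to 1, by introducing a Lagrange multiplier:

    \ln p(X | \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)        (20)

Setting the derivative with respect to π_k to 0:

    0 = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n | \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n | \mu_j, \Sigma_j)} + \lambda        (21)

Multiplying both sides by π_k and summing over k:

    0 = \sum_{k=1}^{K} \left( \sum_{n=1}^{N} \frac{\pi_k \mathcal{N}(x_n | \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n | \mu_j, \Sigma_j)} + \pi_k \lambda \right)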
This gives λ = −N, and hence

    \pi_k = \frac{N_k}{N}        (22)

π_k is the average, over the data points x_n, of the responsibility γ(z_nk) that component k takes.
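The step from (21) to λ = −N and then to (22) is short enough to spell out. The derivation below uses only the definitions already given: the responsibilities γ(z_nk) from (13), which sum to 1 over k, N_k from (18), and the constraint (9).

```latex
% Multiply (21) by \pi_k and sum over k, using \sum_k \gamma(z_{nk}) = 1 and (9):
0 = \sum_{k=1}^{K} \sum_{n=1}^{N} \gamma(z_{nk}) + \lambda \sum_{k=1}^{K} \pi_k
  = \sum_{n=1}^{N} 1 + \lambda
  = N + \lambda
  \quad\Longrightarrow\quad \lambda = -N

% Multiply (21) by \pi_k without summing, then substitute \lambda = -N and (18):
0 = \sum_{n=1}^{N} \gamma(z_{nk}) - N \pi_k
  = N_k - N \pi_k
  \quad\Longrightarrow\quad \pi_k = \frac{N_k}{N}
```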
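EM

The results for µ_k, Σ_k and π_k are not closed-form solutions, because the responsibilities γ(z_nk) depend on these parameters through (13).
→ Solve iteratively: the EM algorithm.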
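EM

E step (expectation step):
  Evaluate the responsibilities using (13) with the current parameter values.

M step (maximization step):
  Using the responsibilities γ(z_nk), re-estimate µ_k, Σ_k and π_k with (17), (19) and (22).

Each E step followed by an M step is guaranteed not to decrease the log likelihood (Section 9.4).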
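[Figure (a): both axes from −2 to 2]

EM applied to the Old Faithful data, with the same initialisation as in the K-means example.
Two Gaussian components are shown (one standard-deviation contours).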
[Figure (b): after the first E step. Figure (c): after the first M step (L = 1). Both axes from −2 to 2.]
[Figures (d), (e), (f): after L = 2, L = 5 and L = 20 iterations. Both axes from −2 to 2.]
EM



     EM                            K-means
                                        K-means
                         K-means




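EM algorithm for Gaussian mixtures

Goal: maximise the likelihood function with respect to the parameters µ, Σ, π.

Step 1 (initialisation):
  Initialise µ_k, Σ_k and π_k.

Step 2 (E step):
  Evaluate the responsibilities γ(z_nk) using the current parameter values:

    \gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_n | \mu_j, \Sigma_j)}        (23)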
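EM for Gaussian mixtures

Step 3 (M step): Re-estimate the parameters using the current responsibilities:

    \mu_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) x_n \qquad (24)

    \Sigma_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k^{new})(x_n - \mu_k^{new})^T \qquad (25)

    \pi_k^{new} = \frac{N_k}{N} \qquad (26)

where

    N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \qquad (27)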
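EM for Gaussian mixtures

Step 4: Evaluate the log likelihood

    \ln p(X|\mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n|\mu_k, \Sigma_k) \right\} \qquad (28)

and check for convergence. If the convergence criterion is not satisfied, return to Step 2.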
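
Steps 1 to 4 can be sketched in NumPy as follows. This is a minimal illustration and not the author's code. The initialization (random data points as means, the overall data covariance for every component), the stopping tolerance and all function names are assumptions of the sketch. It has no guard against the singularities of the likelihood.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Density N(x | mean, cov) evaluated at every row of X."""
    D = X.shape[1]
    diff = X - mean
    solved = np.linalg.solve(cov, diff.T).T           # cov^{-1} (x - mean)
    quad = np.sum(diff * solved, axis=1)              # squared Mahalanobis distance
    log_norm = 0.5 * (D * np.log(2.0 * np.pi) + np.linalg.slogdet(cov)[1])
    return np.exp(-0.5 * quad - log_norm)

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Step 1: initialization (this particular choice is an assumption)
    mu = X[rng.choice(N, size=K, replace=False)].copy()
    Sigma = np.stack([np.atleast_2d(np.cov(X.T))] * K)
    pi = np.full(K, 1.0 / K)
    log_liks = []
    for _ in range(n_iter):
        # pi_k N(x_n | mu_k, Sigma_k) for all n, k
        weighted = np.stack(
            [pi[k] * gaussian_pdf(X, mu[k], Sigma[k]) for k in range(K)], axis=1)
        # Step 4: log likelihood (28) of the current parameters
        log_liks.append(np.sum(np.log(weighted.sum(axis=1))))
        if len(log_liks) > 1 and log_liks[-1] - log_liks[-2] < tol:
            break
        # Step 2 (E step): responsibilities, eq. (23)
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # Step 3 (M step): eqs. (24)-(27)
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pi = Nk / N
    return pi, mu, Sigma, log_liks
```

The returned list `log_liks` should be non-decreasing, which is a convenient sanity check when experimenting with the sketch.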
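An alternative view of EM

Goal of EM: find maximum likelihood solutions for models that have latent variables.

Notation
    X: set of all observed data (the n-th row is x_n^T)
    Z: set of all latent variables (the n-th row is z_n^T)
    θ: set of all model parameters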
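The log likelihood function is given by (29):

    \ln p(X|\theta) = \ln \left\{ \sum_{Z} p(X, Z|\theta) \right\} \qquad (29)

If Z is continuous, the sum over Z is replaced by an integral.

Example (log likelihood of the Gaussian mixture, (28)):

    \ln p(X|\mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n|\mu_k, \Sigma_k) \right\}

The sum over Z sits inside the logarithm, so even when the joint distribution p(X, Z|θ) belongs to the exponential family, the marginal p(X|θ) generally does not.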
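    \ln p(X|\theta) = \ln \left\{ \sum_{Z} p(X, Z|\theta) \right\}

{X, Z} is called the complete data set; maximizing the complete-data log likelihood ln p(X, Z|θ) is assumed to be straightforward.
In practice only the incomplete data X is observed, and everything known about Z is contained in the posterior p(Z|X, θ).
Since ln p(X, Z|θ) cannot be used directly, consider its expected value under the posterior p(Z|X, θ).
    E step: evaluate this expectation.
    M step: maximize it with respect to θ.
The justification for this procedure is given in Section 9.4.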
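E step
    Evaluate the posterior p(Z|X, θ^old) using the current parameter value θ^old.

M step
    Take the expectation of ln p(X, Z|θ) under p(Z|X, θ^old), denoted Q(θ, θ^old):

        Q(\theta, \theta^{old}) = \sum_{Z} p(Z|X, \theta^{old}) \ln p(X, Z|\theta) \qquad (30)

    and determine the revised estimate θ^new:

        \theta^{new} = \arg\max_{\theta} Q(\theta, \theta^{old}) \qquad (31)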
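The general EM algorithm

Given: a joint distribution p(X, Z|θ) over observed variables X and latent variables Z, governed by parameters θ.

Goal: maximize the likelihood function p(X|θ) with respect to θ.

Step 1: Choose an initial setting for the parameters θ^old.

Step 2 (E step): Evaluate p(Z|X, θ^old).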
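Step 3 (M step): Evaluate θ^new given by (32):

    \theta^{new} = \arg\max_{\theta} Q(\theta, \theta^{old}) \qquad (32)

where

    Q(\theta, \theta^{old}) = \sum_{Z} p(Z|X, \theta^{old}) \ln p(X, Z|\theta) \qquad (33)

Step 4: Check for convergence of the log likelihood or of the parameter values. If the convergence criterion is not satisfied, let

    \theta^{old} \leftarrow \theta^{new} \qquad (34)

and return to Step 2.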
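
The four steps of the general algorithm map directly onto a small driver loop. The sketch below is only an illustration: the function names (`run_em`, `e_step`, `m_step`, `log_likelihood`) and the toy model (an equal-weight mixture of two unit-variance 1-D Gaussians with unknown means) are assumptions introduced here, not part of the slides.

```python
import numpy as np

def run_em(theta, e_step, m_step, log_likelihood, tol=1e-10, max_iter=500):
    """Generic EM driver following Steps 1-4; the callables are user supplied."""
    prev = log_likelihood(theta)         # theta is the Step 1 initial value
    for _ in range(max_iter):
        posterior = e_step(theta)        # Step 2: p(Z | X, theta_old)
        theta = m_step(posterior)        # Step 3: argmax over theta of Q(theta, theta_old)
        cur = log_likelihood(theta)      # Step 4: convergence check on ln p(X | theta)
        if abs(cur - prev) < tol:
            break
        prev = cur                       # theta_old <- theta_new, back to Step 2
    return theta

# Toy model (assumption): equal-weight mixture of two unit-variance 1-D
# Gaussians, so theta holds only the two means.
x = np.concatenate([np.linspace(-3.5, -2.5, 50), np.linspace(2.5, 3.5, 50)])

def component_densities(theta):
    # N x 2 matrix with entries N(x_n | theta_k, 1)
    return np.exp(-0.5 * (x[:, None] - theta[None, :]) ** 2) / np.sqrt(2.0 * np.pi)

def e_step(theta):
    dens = component_densities(theta)
    return dens / dens.sum(axis=1, keepdims=True)     # posterior over the component label

def m_step(resp):
    # Maximizer of Q for this model: responsibility-weighted means
    return (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

def log_likelihood(theta):
    return np.sum(np.log(0.5 * component_densities(theta).sum(axis=1)))

means = run_em(np.array([-1.0, 1.0]), e_step, m_step, log_likelihood)
print(means)
```

Keeping the E and M steps as separate callables mirrors the slides: only those two functions change from model to model, while the loop itself stays the same.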
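Other uses of EM

MAP estimation: when a prior p(θ) over the parameters is defined, EM can also be used to find MAP solutions [Exercise 9.4].
The M step then maximizes Q(θ, θ^old) + ln p(θ).

Missing data: EM can also be applied when the unobserved variables are missing values in the data set (Figure 12.11).
This use of EM is valid when the values are missing at random.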
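Gaussian mixtures revisited

Goal of EM: maximize the log likelihood ln p(X|π, µ, Σ) (14).

Difficulty: the sum over k appears inside the logarithm.
Suppose the latent variables Z were observed in addition to X, giving the complete data set {X, Z}.

[Figure: graphical model for the complete data set, with nodes z_n and x_n inside a plate of size N, and parameters π, µ, Σ.]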
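Likelihood function for the complete data set {X, Z}

    p(X, Z|\pi, \mu, \Sigma) = \prod_{n=1}^{N} \prod_{k=1}^{K} \pi_k^{z_{nk}} \mathcal{N}(x_n|\mu_k, \Sigma_k)^{z_{nk}} \qquad (35)

where z_nk denotes the k-th component of z_n. Taking the logarithm:

    \ln p(X, Z|\pi, \mu, \Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \{ \ln \pi_k + \ln \mathcal{N}(x_n|\mu_k, \Sigma_k) \} \qquad (36)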
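    \ln p(X, Z|\pi, \mu, \Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \{ \ln \pi_k + \ln \mathcal{N}(x_n|\mu_k, \Sigma_k) \}

Because z_n is a 1-of-K vector, the complete-data log likelihood is a sum of K independent contributions, one per mixture component.
Maximization with respect to µ and Σ is the same as for a single Gaussian, using only the data points assigned to that component.
Maximization with respect to π_k, subject to the constraint that the π_k sum to 1, gives

    \pi_k = \frac{1}{N} \sum_{n=1}^{N} z_{nk} \qquad (37)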
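Posterior distribution of Z

Using (10), (11) and Bayes' theorem, the posterior distribution of Z takes the form

    p(Z|X, \pi, \mu, \Sigma) \propto \prod_{n=1}^{N} \prod_{k=1}^{K} [\pi_k \mathcal{N}(x_n|\mu_k, \Sigma_k)]^{z_{nk}} \qquad (38)

The posterior factorizes over n, so the {z_n} are independent under it (Exercise 9.5).
This can also be checked from the graphical model with the tools of Chapter 8.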
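Expected value of z_nk

    E[z_{nk}] = \frac{\sum_{z_n} z_{nk} \prod_{k'} [\pi_{k'} \mathcal{N}(x_n|\mu_{k'}, \Sigma_{k'})]^{z_{nk'}}}{\sum_{z_n} \prod_{j} [\pi_j \mathcal{N}(x_n|\mu_j, \Sigma_j)]^{z_{nj}}}
              = \frac{\pi_k \mathcal{N}(x_n|\mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_n|\mu_j, \Sigma_j)} = \gamma(z_{nk}) \qquad (39)

From line 1 to line 2: in the numerator only the configuration with z_nk = 1 contributes.
The expected value of z_nk is therefore the responsibility of component k for data point x_n.

Expected complete-data log likelihood:

    E_Z[\ln p(X, Z|\pi, \mu, \Sigma)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \{ \ln \pi_k + \ln \mathcal{N}(x_n|\mu_k, \Sigma_k) \} \qquad (40)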
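    E_Z[\ln p(X, Z|\pi, \mu, \Sigma)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \{ \ln \pi_k + \ln \mathcal{N}(x_n|\mu_k, \Sigma_k) \}

Alternating the evaluation of the responsibilities with the maximization of (40) reproduces the EM algorithm for Gaussian mixtures derived earlier (Exercise 9.8).
The role of the expected complete-data log likelihood is discussed further in Section 9.4.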
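Relation between K-means and EM

Comparing K-means with EM for Gaussian mixtures shows a close similarity.
K-means can be derived as a particular limit of EM for Gaussian mixtures.

K-means: each data point x_n is assigned to exactly 1 cluster with mean µ_k (hard assignment).
EM: each data point x_n is assigned softly, based on the posterior probabilities (soft assignment).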
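Consider a Gaussian mixture in which every component has covariance ϵI,
where ϵ is a variance parameter shared by all components and I is the identity matrix.
The density of component k is

    p(x|\mu_k, \Sigma_k) = \frac{1}{(2\pi\epsilon)^{D/2}} \exp\left\{ -\frac{1}{2\epsilon} \|x - \mu_k\|^2 \right\} \qquad (41)

For a mixture of K such Gaussians with ϵ held fixed, EM gives the responsibility of component k for data point x_n:

    \gamma(z_{nk}) = \frac{\pi_k \exp\{ -\|x_n - \mu_k\|^2 / 2\epsilon \}}{\sum_{j} \pi_j \exp\{ -\|x_n - \mu_j\|^2 / 2\epsilon \}} \qquad (42)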
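K-means as a limit of EM: E step

Let j* be the index j for which ‖x_n − µ_j‖² is smallest.
In the limit ϵ → 0, the term with k = j* goes to 0 most slowly in (42), so

    \gamma(z_{nk}) \to \begin{cases} 1 & k = j^* \\ 0 & \text{otherwise} \end{cases} \qquad (\forall \pi_k > 0)

This is the K-means assignment rule (2): x_n is assigned to the cluster with the closest mean µ_k, that is, γ(z_nk) → r_nk.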
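
The hardening of the responsibilities as ϵ → 0 can be checked numerically. The snippet below is a small sketch with made-up numbers (one 2-D point, three hypothetical means and mixing coefficients); it evaluates (42) in the log domain to avoid underflow.

```python
import numpy as np

def responsibilities(x, mus, pis, eps):
    """gamma(z_nk) of eq. (42) for a single point x, shared covariance eps * I."""
    sq_dist = np.sum((mus - x) ** 2, axis=1)
    logits = np.log(pis) - sq_dist / (2.0 * eps)
    logits -= logits.max()            # stabilize the exponentials
    w = np.exp(logits)
    return w / w.sum()

# Hypothetical values chosen only for illustration
x = np.array([0.0, 0.0])
mus = np.array([[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0]])
pis = np.array([0.2, 0.5, 0.3])

for eps in (10.0, 1.0, 0.1, 0.01):
    print(eps, np.round(responsibilities(x, mus, pis, eps), 4))
```

For large ϵ the mixing coefficients still influence which component gets the largest responsibility; as ϵ shrinks, all the mass moves to the component whose mean is closest to x, whatever the values of π_k > 0.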
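K-means as a limit of EM: M step

The EM re-estimation equation for µ_k (17) reduces to the K-means result (4).
The re-estimation equation for π_k (22) sets π_k to the fraction of data points assigned to cluster k.

In the limit ϵ → 0, the expected complete-data log likelihood (40) becomes (Exercise 9.11)

    E_Z[\ln p(X, Z|\mu, \Sigma, \pi)] \to -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|^2 + \text{const} \qquad (43)

so in the limit ϵ → 0, maximizing it is equivalent to minimizing the distortion measure J of (1).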
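K-means does not estimate the covariances Σ; it estimates only the means µ.

A hard-assignment version of EM for Gaussian mixtures with general covariance matrices is the elliptical K-means algorithm [Sung+, 1994].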
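Mixtures of Bernoulli distributions and EM

Consider mixtures of discrete binary (2-valued) variables described by Bernoulli distributions.
This model is also known as latent class analysis [Lazarsfeld+ 1968; McLachlan+ 2000].
It also forms a basis for hidden Markov models over discrete variables (Section 13.2).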
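Consider D binary variables x_i (i = 1, ..., D), each governed by a Bernoulli distribution with parameter µ_i:

    p(x|\mu) = \prod_{i=1}^{D} \mu_i^{x_i} (1 - \mu_i)^{(1 - x_i)} \qquad (44)

where x = (x_1, ..., x_D)^T and µ = (µ_1, ..., µ_D)^T.
Given µ, the variables x_i are independent. The mean E[x] and covariance cov[x] are (Section 2.1)

    E[x] = \mu \qquad (45)
    \text{cov}[x] = \text{diag}\{ \mu_i (1 - \mu_i) \} \qquad (46)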
Finite mixture of Bernoulli distributions

    p(x|\mu, \pi) = \sum_{k=1}^{K} \pi_k p(x|\mu_k) \qquad (47)

where µ = {µ_1, ..., µ_K}, π = {π_1, ..., π_K}, and

    p(x|\mu_k) = \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{(1 - x_i)} \qquad (48)

Mean and covariance of the mixture [Exercise 9.12]:

    E[x] = \sum_{k=1}^{K} \pi_k \mu_k \qquad (49)
    \text{cov}[x] = \sum_{k=1}^{K} \pi_k \{ \Sigma_k + \mu_k \mu_k^T \} - E[x] E[x]^T \qquad (50)

where Σ_k = diag{µ_ki (1 − µ_ki)}.
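
Equations (49) and (50) can be verified by brute force for small D, by summing over all 2^D binary vectors. The following sketch does that; the function names and the parameter values in the usage lines are illustrative assumptions.

```python
import itertools
import numpy as np

def bernoulli_mixture_moments(mu, pi):
    """Mean and covariance from eqs. (49) and (50); mu is K x D, pi has length K."""
    mean = pi @ mu
    second = sum(p * (np.diag(m * (1 - m)) + np.outer(m, m)) for p, m in zip(pi, mu))
    return mean, second - np.outer(mean, mean)

def enumerate_moments(mu, pi):
    """The same moments obtained by summing over all 2^D binary vectors."""
    D = mu.shape[1]
    xs = np.array(list(itertools.product([0, 1], repeat=D)), dtype=float)
    # p(x | mu_k) for every x and k, eq. (48); then the mixture, eq. (47)
    comp = np.prod(mu[None] ** xs[:, None, :] * (1 - mu[None]) ** (1 - xs[:, None, :]), axis=2)
    p = comp @ pi
    mean = p @ xs
    cov = (xs * p[:, None]).T @ xs - np.outer(mean, mean)
    return mean, cov

# Hypothetical parameters for a K = 2, D = 3 mixture
mu = np.array([[0.9, 0.8, 0.1], [0.2, 0.3, 0.7]])
pi = np.array([0.4, 0.6])
mean_formula, cov_formula = bernoulli_mixture_moments(mu, pi)
mean_enum, cov_enum = enumerate_moments(mu, pi)
print(np.allclose(mean_formula, mean_enum), np.allclose(cov_formula, cov_enum))
print(cov_formula)
```

The off-diagonal entries of the printed covariance are nonzero even though each component has a diagonal Σ_k, so the mixture induces correlations between the variables.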
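Unlike the single Bernoulli case, cov[x] is no longer diagonal, so the mixture can capture correlations between the variables.

Given a data set X = {x_1, ..., x_N}, the log likelihood function is

    \ln p(X|\mu, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k p(x_n|\mu_k) \right\} \qquad (51)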
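EM for mixtures of Bernoulli distributions

Introduce a latent variable z associated with each observation x.
z = (z_1, ..., z_K)^T is a binary 1-of-K vector (exactly one component is 1, all others are 0).
Given z, the conditional distribution of x is

    p(x|z, \mu) = \prod_{k=1}^{K} p(x|\mu_k)^{z_k} \qquad (52)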
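Conditional distribution of x given z:

    p(x|z, \mu) = \prod_{k=1}^{K} p(x|\mu_k)^{z_k}

Prior distribution of z (the same as for the Gaussian mixture):

    p(z|\pi) = \prod_{k=1}^{K} \pi_k^{z_k} \qquad (53)

Multiplying p(x|z, µ) by p(z|π) and marginalizing over z recovers (47).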
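EM for mixtures of Bernoulli distributions

To derive EM, first write down the complete-data log likelihood:

    \ln p(X, Z|\mu, \pi) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left\{ \ln \pi_k + \sum_{i=1}^{D} [ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) ] \right\} \qquad (54)

where X = {x_n} and Z = {z_n}.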
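Taking the expectation with respect to the posterior distribution of Z:

    E_Z[\ln p(X, Z|\mu, \pi)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \left\{ \ln \pi_k + \sum_{i=1}^{D} [ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) ] \right\} \qquad (55)

where γ(z_nk) = E[z_nk] is the posterior probability (responsibility) of component k given data point x_n.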
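E step

Evaluate the responsibilities γ(z_nk) using Bayes' theorem:

    \gamma(z_{nk}) = E[z_{nk}] = \frac{\sum_{z_n} z_{nk} \prod_{k'} [\pi_{k'} p(x_n|\mu_{k'})]^{z_{nk'}}}{\sum_{z_n} \prod_{j} [\pi_j p(x_n|\mu_j)]^{z_{nj}}}
                   = \frac{\pi_k p(x_n|\mu_k)}{\sum_{j=1}^{K} \pi_j p(x_n|\mu_j)} \qquad (56)

In (55) the responsibilities enter only through the following 2 quantities:

    N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \qquad (57)
    \bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) x_n \qquad (58)

N_k is the effective number of data points associated with component k.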
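M step

Maximize the expected complete-data log likelihood with respect to µ_k and π.

Setting the derivative of (55) with respect to µ_k to 0 gives [Exercise 9.15]

    \mu_k = \bar{x}_k \qquad (59)

so the mean of component k is the responsibility-weighted mean of the data.

Maximizing with respect to π_k, with a Lagrange multiplier enforcing that the π_k sum to 1, gives [Exercise 9.16]

    \pi_k = \frac{N_k}{N} \qquad (60)

so the mixing coefficient of component k is the effective fraction of data points it explains.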
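
The E step (56) and the M step (59), (60) translate into a few lines of NumPy. The sketch below is illustrative only: the uniform random initialization of µ, the small constant `eps` inside the logarithms and the fixed iteration count are assumptions made here, not choices taken from the slides.

```python
import numpy as np

def em_bernoulli_mixture(X, K, n_iter=50, seed=0):
    """EM for a mixture of Bernoulli distributions; X is an N x D binary array."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = rng.uniform(0.25, 0.75, size=(K, D))    # initialization is an assumption
    eps = 1e-10                                   # keeps log() away from log(0)
    for _ in range(n_iter):
        # E step, eq. (56), evaluated in the log domain for stability
        log_p = X @ np.log(mu + eps).T + (1 - X) @ np.log(1 - mu + eps).T   # N x K
        log_w = np.log(pi) + log_p
        log_w -= log_w.max(axis=1, keepdims=True)
        gamma = np.exp(log_w)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step, eqs. (57)-(60)
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]          # mu_k = x_bar_k
        pi = Nk / N                               # pi_k = N_k / N
    return pi, mu, gamma
```

A quick way to try it is to stack copies of two distinct binary prototype vectors into `X` and call `em_bernoulli_mixture(X, K=2)`; the rows of the returned `mu` should approach the two prototypes. A component whose N_k collapses to 0 would need extra handling that this sketch omits.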
Example: N = 600, K = 3, with initialization π_k = 1/K and Σ_j µ_kj = 1.
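Properties of EM for mixtures of Bernoulli distributions

Since 0 ≤ p(x_n|µ_k) ≤ 1, the likelihood function is bounded above [Exercise 9.17].
Unlike Gaussian mixtures, there are no singularities at which the likelihood goes to infinity.
There are singularities at which the likelihood goes to 0, but EM does not find them unless it is initialized at a pathological starting point.

With a conjugate prior over the parameters [Section 2.1.1], EM can be extended to maximize the posterior [Exercise 9.18].

The analysis extends to multinomial discrete variables [Exercise 9.19].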
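EM for Bayesian linear regression

Section 3.5.2 (evidence approximation): the hyperparameters α, β were found by setting the derivatives of the evidence to 0.

Here EM is used instead to estimate α, β.

Marginal likelihood to be maximized with respect to α, β:

    p(t|\alpha, \beta) = \int p(t|w, \beta) p(w|\alpha) \, dw

The parameter vector w is integrated out, so it is treated as a latent variable.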
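E step

Given the current values of α, β, compute the posterior distribution of w.
The posterior of w was already obtained in Chapter 3:

    p(w|t) = \mathcal{N}(w|m_N, S_N)
    m_N = S_N (S_0^{-1} m_0 + \beta \Phi^T t)
    S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi

With the prior p(w|α) = N(w|0, α^{-1} I), m_0 = 0 and S_0 = α^{-1} I.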
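M step

Complete-data log likelihood:

    \ln p(t, w|\alpha, \beta) = \ln p(t|w, \beta) + \ln p(w|\alpha) \qquad (61)

with

    p(t|w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n | w^T \phi(x_n), \beta^{-1})
    p(w|\alpha) = \mathcal{N}(w | 0, \alpha^{-1} I)

Taking the expectation with respect to the posterior of w:

    E[\ln p(t, w|\alpha, \beta)] = \frac{M}{2} \ln\left(\frac{\alpha}{2\pi}\right) - \frac{\alpha}{2} E[w^T w] + \frac{N}{2} \ln\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2} \sum_{n=1}^{N} E[(t_n - w^T \phi_n)^2] \qquad (62)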
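M step

Setting the derivative of (62) with respect to α to 0 gives the re-estimation equation for α [Exercise 9.20]:

    \alpha = \frac{M}{E[w^T w]} = \frac{M}{m_N^T m_N + \text{tr}(S_N)} \qquad (63)

β is obtained in the same way [Exercise 9.21].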
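
The E and M steps for α and β can be put together as below. This is a sketch under stated assumptions: the prior is the zero-mean isotropic N(w|0, α^{-1} I), so m_0 = 0 and S_0 = α^{-1} I. The β update is not shown on the slides; it is obtained here by setting the derivative of (62) with respect to β to 0 and using E[(t_n − w^T ϕ_n)^2] = (t_n − m_N^T ϕ_n)^2 + ϕ_n^T S_N ϕ_n. The function name and the iteration count are also assumptions.

```python
import numpy as np

def em_bayesian_linear_regression(Phi, t, alpha=1.0, beta=1.0, n_iter=200):
    """EM re-estimation of alpha and beta; Phi is the N x M design matrix."""
    N, M = Phi.shape
    for _ in range(n_iter):
        # E step: posterior N(w | m_N, S_N) under the prior N(w | 0, alpha^{-1} I)
        S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
        m_N = beta * S_N @ Phi.T @ t
        # M step for alpha, eq. (63)
        alpha = M / (m_N @ m_N + np.trace(S_N))
        # M step for beta: 1/beta = (1/N) * sum_n E[(t_n - w^T phi_n)^2]
        resid = t - Phi @ m_N
        beta = N / (resid @ resid + np.trace(Phi.T @ Phi @ S_N))
    return alpha, beta, m_N, S_N
```

On data generated from a linear model with Gaussian noise of standard deviation σ, the returned `beta` should end up near 1/σ² and `m_N` near the generating weights.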
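EM and the evidence approximation

Both approaches require the inversion of an M × M matrix.

They give 2 different re-estimation equations for α, but the same stationary point.
In the evidence approximation, γ is the quantity used in (3.92):

    \gamma = M - \alpha \sum_{i=1}^{M} \frac{1}{\lambda_i + \alpha} = M - \alpha \, \text{tr}(S_N) \qquad (64)

and at a stationary point of the evidence

    \alpha \, m_N^T m_N = \gamma = M - \alpha \, \text{tr}(S_N) \qquad (65)

which is the same condition as the EM re-estimation equation at convergence.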
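RVM (relevance vector machine)

Section 7.2.1 determined the hyperparameters α, β by maximizing the marginal likelihood directly.
Treating the weight vector w as a latent variable, the same problem can be solved with EM.

E step: compute the posterior distribution of w, given by (7.81).
M step: maximize the expected complete-data log likelihood

    E_w[\ln \{ p(t \mid X, w, \beta) \, p(w \mid \alpha) \}]    (66)

which gives the re-estimation equations

    \alpha_i^{new} = \frac{1}{m_i^2 + \Sigma_{ii}}    (67)
    (\beta^{new})^{-1} = \frac{\| t - \Phi m \|^2 + \beta^{-1} \sum_i \gamma_i}{N}    (68)
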
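The EM algorithm in general

EM algorithm (expectation-maximization algorithm):
    a general technique for finding maximum likelihood solutions of probabilistic models with latent variables
    [Dempster+, 1977; McLachlan+, 1997]

    A general treatment of the EM algorithm
    A proof that the EM algorithm maximizes the likelihood function
    The basis for the variational inference framework [Section 10.1]

Notation
    X: the set of all observed variables
    Z: the set of all latent variables
    θ: the set of all model parameters
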
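The EM algorithm in general

Likelihood function:

    p(X \mid \theta) = \sum_{Z} p(X, Z \mid \theta)    (69)

If Z is continuous, the sum is replaced by an integral.

Assumptions:
    Direct optimization of p(X | θ) is difficult.
    Optimization of the complete-data log likelihood ln p(X, Z | θ) is much easier.
    A distribution q(Z) over the latent variables is introduced.
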
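Decomposition of the log likelihood:

    \ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \| p)    (70)

where

    \mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \left\{ \frac{p(X, Z \mid \theta)}{q(Z)} \right\}    (71)
    \mathrm{KL}(q \| p) = - \sum_{Z} q(Z) \ln \left\{ \frac{p(Z \mid X, \theta)}{q(Z)} \right\}    (72)

(70) can be verified by substituting

    \ln p(X, Z \mid \theta) = \ln p(Z \mid X, \theta) + \ln p(X \mid \theta)    (73)

into (71).
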
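The decomposition (70) can be checked numerically on a tiny discrete model. The sketch below is only an illustration: the three joint probabilities and the distribution q are made-up numbers, and the function names are hypothetical.

```python
import math


def lower_bound(q, joint):
    """L(q, theta) from (71): sum_Z q(Z) ln{ p(X, Z | theta) / q(Z) }."""
    return sum(qz * math.log(pxz / qz) for qz, pxz in zip(q, joint))


def kl_divergence(q, posterior):
    """KL(q || p) from (72): -sum_Z q(Z) ln{ p(Z | X, theta) / q(Z) }."""
    return -sum(qz * math.log(pz / qz) for qz, pz in zip(q, posterior))


# Hypothetical joint values p(X = x_obs, Z = k | theta) for three latent states.
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)                        # p(X | theta), as in (69)
posterior = [p / evidence for p in joint]    # p(Z | X, theta)

q = [0.2, 0.5, 0.3]                          # an arbitrary distribution over Z
bound = lower_bound(q, joint)
gap = kl_divergence(q, posterior)

# With q equal to the posterior the gap closes and the bound is tight.
tight_bound = lower_bound(posterior, joint)
tight_gap = kl_divergence(posterior, posterior)
```

Here `bound + gap` reproduces `math.log(evidence)` for any valid q, while `gap` vanishes only when q coincides with the posterior.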
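(71): L(q, θ)

    \ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \| p)

L(q, θ) in (71):
    L(q, θ) is a functional of the distribution q(Z) and a function of the parameters θ.
    It contains the joint distribution of X and Z.

    \mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \left\{ \frac{p(X, Z \mid \theta)}{q(Z)} \right\}
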
(72): KL(q ∥ p)

    \ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \| p)

KL(q ∥ p) in (72):
    The KL divergence between q(Z) and the posterior distribution p(Z | X, θ).
    Since KL(q ∥ p) ≥ 0, it follows that L(q, θ) ≤ ln p(X | θ).
    That is, L(q, θ) is a lower bound on ln p(X | θ).

    \mathrm{KL}(q \| p) = - \sum_{Z} q(Z) \ln \left\{ \frac{p(Z \mid X, \theta)}{q(Z)} \right\}

[Figure: decomposition of ln p(X | θ) into the lower bound L(q, θ) and the gap KL(q ∥ p).]

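E step of the EM algorithm

    \ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \| p)

E step:
    Let θ^old be the current value of the parameter vector.
    The E step maximizes L(q, θ^old) with respect to q(Z) while holding θ^old fixed.
    ln p(X | θ^old) does not depend on q(Z), so L(q, θ^old) is largest when the KL divergence vanishes:
    KL(q ∥ p) = 0 ⇔ q(Z) = p(Z | X, θ^old)
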
E step

[Figure: E step. The KL divergence is driven to 0, KL(q ∥ p) = 0, so the lower bound L(q, θ^old) coincides with ln p(X | θ^old).]

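M step

    \ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \| p)

M step:
    Hold q(Z) fixed and maximize L(q, θ) with respect to θ to obtain θ^new.
    Unless L(q, θ) is already at a maximum, it increases, and therefore ln p(X | θ) increases as well.
    q(Z) was determined using θ^old, so it differs from the new posterior p(Z | X, θ^new) and KL(q ∥ p) becomes nonzero.
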
M step

[Figure: M step. Starting from KL(q ∥ p) = 0 at θ^old, maximizing over θ raises the lower bound from L(q, θ^old) to L(q, θ^new). Because q(Z) is held fixed, KL(q ∥ p) becomes nonzero, so ln p(X | θ^new) rises by more than the lower bound.]

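Substituting the E step result q(Z) = p(Z | X, θ^old) into (71)

L(q, θ) after the E step:

    \mathcal{L}(q, \theta) = \sum_{Z} p(Z \mid X, \theta^{old}) \ln p(X, Z \mid \theta) - \sum_{Z} p(Z \mid X, \theta^{old}) \ln p(Z \mid X, \theta^{old})
                           = Q(\theta, \theta^{old}) + \mathrm{const}    (74)

    const is the entropy of q and does not depend on θ.
    The M step therefore maximizes the expectation of the complete-data log likelihood ln p(X, Z | θ).
    θ appears only inside the logarithm, so if p(X, Z | θ) belongs to the exponential family the logarithm cancels the exponential and the M step becomes simple.
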
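EM algorithm in parameter space

[Figure: the log likelihood ln p(X | θ) and the lower bound L(q, θ) plotted against θ, with θ^old and θ^new marked on the horizontal axis. Legend: ln p(X | θ); L(q, θ^old), the bound constructed at θ^old; L(q, θ^new), the bound constructed at θ^new.]
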
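The monotone increase of ln p(X | θ) across EM iterations can be observed in a small experiment. The sketch below is illustrative only: it assumes a one-dimensional mixture of two Gaussians, made-up data and initial values, and hypothetical function names. The E step computes the posterior over the latent assignments and the M step maximizes Q(θ, θ^old) in closed form.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(xs, pis, means, variances):
    """ln p(X | theta) for a 1-D Gaussian mixture."""
    return sum(
        math.log(sum(p * normal_pdf(x, m, v) for p, m, v in zip(pis, means, variances)))
        for x in xs
    )


def e_step(xs, pis, means, variances):
    """q(Z) = p(Z | X, theta_old): one row of responsibilities per data point."""
    resp = []
    for x in xs:
        weights = [p * normal_pdf(x, m, v) for p, m, v in zip(pis, means, variances)]
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp


def m_step(xs, resp):
    """theta_new = argmax Q(theta, theta_old), available in closed form here."""
    pis, means, variances = [], [], []
    for k in range(len(resp[0])):
        n_k = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, xs)) / n_k
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / n_k
        pis.append(n_k / len(xs))
        means.append(mean)
        variances.append(var)
    return pis, means, variances


def run_em(xs, pis, means, variances, iterations=20):
    history = [log_likelihood(xs, pis, means, variances)]
    for _ in range(iterations):
        resp = e_step(xs, pis, means, variances)
        pis, means, variances = m_step(xs, resp)
        history.append(log_likelihood(xs, pis, means, variances))
    return pis, means, variances, history


# Made-up data with two well-separated groups.
xs = [-2.1, -1.9, -2.4, -1.6, -2.0, 1.8, 2.2, 2.5, 1.7, 2.0]
pis, means, variances, history = run_em(xs, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

`history` holds ln p(X | θ) before the first iteration and after each later one; no entry is smaller than its predecessor, up to floating-point error.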
i.i.d. data sets

Notation
    X: a set of N i.i.d. data points {x_n}
    Z: the corresponding set of latent variables {z_n}

i.i.d. assumption:

    p(X, Z) = \prod_{n} p(x_n, z_n)

Marginalizing over {z_n} gives p(X) = \prod_{n} p(x_n).

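i.i.d. data sets

E step:

    p(Z \mid X, \theta) = \frac{p(X, Z \mid \theta)}{\sum_{Z} p(X, Z \mid \theta)}
                        = \frac{\prod_{n=1}^{N} p(x_n, z_n \mid \theta)}{\sum_{Z} \prod_{n=1}^{N} p(x_n, z_n \mid \theta)}
                        = \prod_{n=1}^{N} p(z_n \mid x_n, \theta)    (75)

By (75), the responsibility for x_n depends only on the value of x_n and on the parameters θ, not on the other data points.
This factorization of p(X, Z) over data points makes an incremental form of EM possible.
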
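i.i.d. data sets

Incremental EM for a mixture of Gaussians:
    Only one data point x_m is processed at a time. Using the sufficient statistics (17), (18), the mean and effective count of component k are updated as

    \mu_k^{new} = \mu_k^{old} + \left( \frac{\gamma^{new}(z_{mk}) - \gamma^{old}(z_{mk})}{N_k^{new}} \right) (x_m - \mu_k^{old})    (76)
    N_k^{new} = N_k^{old} + \gamma^{new}(z_{mk}) - \gamma^{old}(z_{mk})    (77)
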
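The updates (76) and (77) give the same result as recomputing the statistics from scratch after one responsibility has changed. The following sketch checks this for a single component in one dimension. The data, the responsibilities, and the function names are made up for illustration.

```python
def batch_statistics(xs, gammas):
    """Effective count N_k and mean mu_k of one component, from all points."""
    n_k = sum(gammas)
    mu_k = sum(g * x for g, x in zip(gammas, xs)) / n_k
    return mu_k, n_k


def incremental_update(mu_old, n_old, x_m, gamma_old, gamma_new):
    """Apply (77) and then (76) after the responsibility of point x_m changes."""
    delta = gamma_new - gamma_old
    n_new = n_old + delta                                # (77)
    mu_new = mu_old + (delta / n_new) * (x_m - mu_old)   # (76)
    return mu_new, n_new


# Made-up data and responsibilities gamma(z_nk) for one component k.
xs = [0.0, 1.0, 4.0]
gammas = [0.9, 0.6, 0.1]
mu_old, n_old = batch_statistics(xs, gammas)

# The E step revisits point m only and changes its responsibility.
m = 2
gamma_new = 0.5
mu_inc, n_inc = incremental_update(mu_old, n_old, xs[m], gammas[m], gamma_new)

# Reference: recompute everything with the updated responsibility.
gammas_updated = gammas[:m] + [gamma_new] + gammas[m + 1:]
mu_batch, n_batch = batch_statistics(xs, gammas_updated)
```

The incremental update touches only one data point, so its cost does not depend on N, whereas the batch recomputation visits every point.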
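MAP estimation with EM

EM can also be used to maximize the posterior distribution p(θ | X) when a prior p(θ) over the parameters is introduced.

    \ln p(\theta \mid X) = \ln p(\theta, X) - \ln p(X)
                         = \ln p(X \mid \theta) + \ln p(\theta) - \ln p(X)    (78)

Using the decomposition (70):

    \ln p(\theta \mid X) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \| p) + \ln p(\theta) - \ln p(X)
                         \geq \mathcal{L}(q, \theta) + \ln p(\theta) - \ln p(X)    (79)
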
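Maximizing the right-hand side of (79) with respect to θ means that the M step maximizes L(q, θ) + ln p(θ) instead of L(q, θ) alone. As a sketch of what this changes, consider only the mixing coefficients π of a mixture model. The symmetric Dirichlet prior with parameter `a` is an assumption made here for illustration and does not come from the slides. The effective counts and the function names are likewise hypothetical.

```python
import math


def map_objective(pis, effective_counts, a):
    """The pi-dependent part of Q + ln p(pi) under a symmetric Dirichlet(a) prior."""
    return sum((n_k + a - 1.0) * math.log(p) for p, n_k in zip(pis, effective_counts))


def map_mixing_coefficients(effective_counts, a):
    """Maximizer of map_objective subject to sum_k pi_k = 1 (valid for a >= 1).

    With a = 1 the prior is flat and the result is the ML update pi_k = N_k / N.
    """
    k = len(effective_counts)
    total = sum(effective_counts) + k * (a - 1.0)
    return [(n_k + a - 1.0) / total for n_k in effective_counts]


# Made-up effective counts N_k produced by some E step.
counts = [6.0, 3.0, 1.0]
pi_ml = map_mixing_coefficients(counts, a=1.0)    # flat prior: ML estimate
pi_map = map_mixing_coefficients(counts, a=3.0)   # prior pulls toward uniform
```

The E step is unchanged. Only the M step picks up the extra ln p(θ) term, which here acts like a - 1 pseudo-counts added to every component.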
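Extensions of EM

When the M step is intractable:

Generalized EM algorithm (GEM):
    Instead of maximizing L(q, θ) with respect to θ in the M step,
    choose θ^new so that L(q, θ) increases.

ECM (expectation conditional maximization):
    Perform several constrained optimizations within each M step [Meng+, 1993].
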
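A generalized M step only has to increase the lower bound, not maximize it. The sketch below contrasts the two for the means of a one-dimensional mixture with unit variances and fixed responsibilities. The single gradient-ascent step with step size 1/N is one possible partial update chosen for illustration, not a method taken from the slides. The data, responsibilities, and function names are hypothetical.

```python
def q_means(xs, resp, means):
    """The mean-dependent part of Q for a 1-D mixture with unit variances."""
    return sum(
        -0.5 * r_k * (x - mu) ** 2
        for x, r in zip(xs, resp)
        for r_k, mu in zip(r, means)
    )


def exact_m_step(xs, resp):
    """Full maximization of Q: responsibility-weighted means."""
    return [
        sum(r[j] * x for r, x in zip(resp, xs)) / sum(r[j] for r in resp)
        for j in range(len(resp[0]))
    ]


def gem_m_step(xs, resp, means):
    """Partial M step: one gradient-ascent step on Q with step size 1 / N.

    Q is quadratic in each mean with curvature -N_k, and N_k <= N, so a step
    of size 1 / N cannot overshoot and Q increases whenever the gradient is nonzero.
    """
    step = 1.0 / len(xs)
    new_means = []
    for j, mu in enumerate(means):
        grad = sum(r[j] * (x - mu) for r, x in zip(resp, xs))  # dQ / d mu_j
        new_means.append(mu + step * grad)
    return new_means


# Made-up data and responsibilities from some E step.
xs = [-2.0, -1.0, 1.0, 3.0]
resp = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
means_old = [0.0, 0.0]

means_gem = gem_m_step(xs, resp, means_old)
means_exact = exact_m_step(xs, resp)

q_old = q_means(xs, resp, means_old)
q_gem = q_means(xs, resp, means_gem)
q_exact = q_means(xs, resp, means_exact)
```

The partial step raises Q above its old value without reaching the maximum found by the exact M step.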
Extensions of EM

When the E step is intractable:

[Neal+, 1999]:
    Perform a partial, rather than complete, optimization of L(q, θ) with respect to q.

Prml9

  • 1. . . . 9 .. . . December 11, 2010 (@kisa12012) 9 December 11, 2010 1 / 120
  • 2. 9 : EM . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 2 / 120
  • 3. (Mixture Models) . . .. (2.3.9 ) (9.2 ) . (12 ) .. . . (K-means [Lloyd, 1982]) etc . . . (@kisa12012) 9 December 11, 2010 3 / 120
  • 4. EM (Expectation-Maximization) EM K-means (9.1 ) EM (9.2 ) EM EM 9.4 EM 9.2.1 10 (Bishop 10 ...) (@kisa12012) 9 December 11, 2010 4 / 120
  • 5. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 5 / 120
  • 6. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 6 / 120
  • 7. . . .. { xk } xk . .. . . . . .. D x ∈ ℜD N . {x1 , . . . , xN } .. . . . . .. {xk } K . K .. . . (@kisa12012) 9 December 11, 2010 7 / 120
  • 8. . . .. K D µk ∈ ℜD . { µ1 , µ2 , . . . , µK } .. . . µk k . . .. { µk } { xn } {xn } { µk } . .. . . (@kisa12012) 9 December 11, 2010 8 / 120
  • 9. K (1-of-K coding scheme) xn µk →2 . 2 . .. xn 2 . rnk ∈ {0, 1} (k = 1, . . . , K ) .. . . 2 xn k rnk = 1 j ̸= k rnk = 0 K (1-of-K coding scheme) (@kisa12012) 9 December 11, 2010 9 / 120
  • 10. { xn } { µk } J . . .. N K J= ∑ ∑ rnk ∥xn − µk ∥2 (1) . n =1 k =1 .. . . (distortion measure) . . .. . J {rnk } { µk } .. . . K-means (@kisa12012) 9 December 11, 2010 10 / 120
  • 11. K-means K-means {rnk } { µk } 2 . K-means . .. ... 1 { µk } {rnk } J ... 2 {rnk } { µk } J . .. . 3 1,2 .. . . 2 EM E(Expectation) [1] M(Maximization) [2] E M (@kisa12012) 9 December 11, 2010 11 / 120
  • 12. K-means : 1 1 rnk J N K J= ∑ ∑ rnk ∥xn − µk ∥2 n =1 k =1 J rnk n xn ∥xn − µk ∥2 k rnk = 1 . 1 . .. { 1 k = arg minj ∥xn − µj ∥2 rnk = (2) 0 otherwise . .. . . xn µk (@kisa12012) 9 December 11, 2010 12 / 120
  • 13. K-means 2 rnk µk J µk 0 J ∂J N = 2 ∑ rnk (xn − µk ) = 0 (3) ∂µk n =1 µk . 2 . .. ∑n rnk xn µk = (4) . ∑n rnk .. . . µk k → K-means (@kisa12012) 9 December 11, 2010 13 / 120
  • 14. 2 (a) 2 (b) 2 (c) 0 0 0 −2 −2 −2 −2 0 2 −2 0 2 −2 0 2 2 (d) 2 (e) 2 (f) 0 0 0 −2 −2 −2 −2 0 2 −2 0 2 −2 0 2 2 (g) 2 (h) 2 (i) 0 0 0 −2 −2 −2 −2 0 2 −2 0 2 −2 0 2 (@kisa12012) 9 December 11, 2010 14 / 120
  • 15. J 1000 J 500 0 1 2 3 4 (@kisa12012) 9 December 11, 2010 15 / 120
  • 16. K-means J [ 9.1] [MacQueen,1967] [ ] K-means++ [Ramasubramanian+, 1990; Moore, 2000] [Hodgson, 1998; Elkan, 2003] (@kisa12012) 9 December 11, 2010 16 / 120
  • 17. K-means K-means K-means [MacQueen, 1967] (1) Robbins-Monro (2.3.5 ) xn µk . . .. ( ) µnew = µold + ηn xn − µold k k k (5) . .. . . ηn n (@kisa12012) 9 December 11, 2010 17 / 120
  • 18. K-medoids K-means µk xn (2.3.7 ) . K-medoids . .. N K J= ∑ ∑ rnk V (xn , µk ) (6) . n =1 k =1 .. . . V (x, x′ ) E K-means M xn µk 2 O (KN ) + O (Nk ) Nk k xn (@kisa12012) 9 December 11, 2010 18 / 120
  • 19. 2 . . . . .. .. xn 1 xn . µk . .. . .. . . . K-means 9.2 µk xn 1 (@kisa12012) 9 December 11, 2010 19 / 120
  • 20. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 20 / 120
  • 21. . . .. 1 . [Forsyth+, 2003] .. . . K-means K-means ( ) (@kisa12012) 9 December 11, 2010 21 / 120
  • 22. 3 ( [0,1]) 1 xn µk K =2 K =3 K = 10 Original image (@kisa12012) 9 December 11, 2010 22 / 120
  • 23. . (lossless data compression) . .. . .. . . . (lossy data compression) . .. . .. . . (vector quantization) xn µk µk (code-book vector) (@kisa12012) 9 December 11, 2010 23 / 120
  • 24. {R,G,B} 8bit N 24Nbit ( ) K 1 log2 K bit 24K bit 24K + N log2 K bit ———————– (@kisa12012) 9 December 11, 2010 24 / 120
  • 25. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 25 / 120
  • 26. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 26 / 120
  • 27. 2.3.9 EM (2.188) . . .. K p (x) = ∑ πk N (x| µk , Σk ) (7) . k =1 .. . . (@kisa12012) 9 December 11, 2010 27 / 120
  • 28. K 2 z 1-of-K zk ∈ {0, 1} ∑k zk = 1 p (z) p (x|z) p (x, z) Figure: z x (@kisa12012) 9 December 11, 2010 28 / 120
  • 29. z πk p ( zk = 1 ) = π k { πk } (8)(9) 0 ≤ πk ≤ 1 (8) K ∑ πk = 1 (9) k =1 z K p (z) = ∏ πk z k (10) k =1 (@kisa12012) 9 December 11, 2010 29 / 120
  • 30. z x p ( x | zk = 1 ) = N ( x | µ k , Σ k ) (11) K p (x|z) = ∏ N (x| µk , Σk )z k (11) k =1 p (x, z) p (z)p (x|z) K p (x) = ∑ p (z)p (x|z) = ∑ πk N (x| µk , Σk ) (12) z k =1 (12) {x1 , . . . , xN } xn zn (@kisa12012) 9 December 11, 2010 30 / 120
  • 31. . . .. p (x, z) EM x z p (z|x) . .. . . (@kisa12012) 9 December 11, 2010 31 / 120
  • 32. x z γ ( zk ) γ ( zk ) p ( zk = 1 ) p ( x | zk = 1 ) γ ( zk ) ≡ p ( zk = 1 | x ) = K ∑ p(zj = 1)p(x|zj = 1) j =1 πk N (x| µk , Σk ) = K (13) ∑ πj N (x| µj , Σj ) j =1 πk zk = 1 γ ( zk ) x zk = 1 γ ( zk ) k x (responsibility) (@kisa12012) 9 December 11, 2010 32 / 120
  • 33. (ancestral sampling) (8.1.2 ) . . .. ... 1 z ˆ z p (z) .. .. 2 x p (x|z) ˆ .. . . 11 (@kisa12012) 9 December 11, 2010 33 / 120
  • 34. p ( x, z ) 1 (a) 0.5 0 0 0.5 1 p (x, z) z (complete) (@kisa12012) 9 December 11, 2010 34 / 120
  • 35. p (x) 1 (b) 0.5 0 0 0.5 1 p (x) z (incomplete) (@kisa12012) 9 December 11, 2010 35 / 120
  • 36. (responsibility) xn k p ( zk | x n ) 1 (c) 0.5 0 0 0.5 1 xn γ(znk ) ≡ p (zk = 1|xn ) (@kisa12012) 9 December 11, 2010 36 / 120
  • 37. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 37 / 120
  • 38. { x1 , . . . , xN } . Notation .. . X n xT n N ×D . Z n zT n N ×K .. . . xn zn π xn µ Σ N (@kisa12012) 9 December 11, 2010 38 / 120
  • 39. ln p (X|π, µ, Σ) { } N K ln p (X|π, µ, Σ) = ∑ ln ∑ πk N (xn |µk , Σk ) (14) n =1 k =1 Σk = σk I 2 j µj xn µ j = xn xn 1 1 N (xn |xn , σj2 I) = (15) (2π ) σj 1 2 (@kisa12012) 9 December 11, 2010 39 / 120
  • 40. σj → 0 (15) ln p (X|π, µ, Σ) 1 0 0 p(x) x (@kisa12012) 9 December 11, 2010 40 / 120
  • 41. . . .. (10.1 ) . .. . . . . .. K K! . K! − 1 .. . . 12 (@kisa12012) 9 December 11, 2010 41 / 120
  • 42. { } N K ln p (X|π, µ, Σ) = ∑ ln ∑ πk N (xn |µk , Σk ) n =1 k =1 K 0 . . .. [Fletcher, 1987; Nocedal+, 1999; Bishop+, 2008] 5 EM . 10 .. . . (@kisa12012) 9 December 11, 2010 42 / 120
  • 43. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 43 / 120
  • 44. EM . EM (expectation-maximization algorithm) . .. . [Dempster+, 1977; McLachlan+, 1997] .. . . EM EM (9.3 ) (10.1 ) { } N K ln p (X|π, µ, Σ) = ∑ ln ∑ πk N (xn |µk , Σk ) n =1 k =1 (@kisa12012) 9 December 11, 2010 44 / 120
  • 45. µk µk µk 0 ( { }) ∂ N K 0= ∂µk ∑ ln ∑ πj N (xn |µj , Σj ) n =1 j =1 N πk N (xn |µk , Σk ) −1 =− ∑ Σ (xn − µk ) π N (xn |µj , Σj ) k (16) n =1 ∑ j j γ(znk ) ( ) (@kisa12012) 9 December 11, 2010 45 / 120
  • 46. Σk ( ) 1 N µk = ∑ γ(znk )xn Nk n = 1 (17) N Nk = ∑ γ(znk ) (18) n =1 Nk k k µk xn k xn γ(znk ) (@kisa12012) 9 December 11, 2010 46 / 120
  • 47. Σk Σk 0 1 N Σk = ∑ γ(znk )(xn − µk )(xn − µk )T Nk n = 1 (19) [ 2.34] xn γ(znk ) Nk k (@kisa12012) 9 December 11, 2010 47 / 120
  • 48. πk πk k 1 (9) ( ) K ln p (X|π, µ, Σ) + λ ∑ πk − 1 (20) k =1 N N ( xn | µ k , Σ k ) 0= ∑ π N ( xn | µ j , Σ j ) +λ (21) n =1 ∑ j j ( ) K N πk N (xn |µk , Σk ) = ∑ ∑ + πk λ k =1 n =1 ∑ j j π N (xn |µj , Σj ) (@kisa12012) 9 December 11, 2010 48 / 120
  • 49. λ = −N Nk πk = (22) N πk xn k γ(znk ) (@kisa12012) 9 December 11, 2010 49 / 120
  • 50. EM µk Σk πk γ(znk ) (13) → EM (@kisa12012) 9 December 11, 2010 50 / 120
  • 51. EM . E (expectation step) . .. (13) . .. . . . M (maximization step) . .. γ(znk ) µk Σk . πk (17) (19) (22) .. . . M E M (9.4 ) (@kisa12012) 9 December 11, 2010 51 / 120
  • 52. 2 0 −2 −2 0 (a) 2 Old Faithful EM K-means 2 (1 ) (@kisa12012) 9 December 11, 2010 52 / 120
  • 53. 2 2 L=1 0 0 −2 −2 −2 0 (b) 2 −2 0 (c) 2 E M (@kisa12012) 9 December 11, 2010 53 / 120
  • 54. 2 2 2 L=2 L=5 L = 20 0 0 0 −2 −2 −2 −2 0 (d) 2 −2 0 (e) 2 −2 0 (f) 2 2 5 20 (@kisa12012) 9 December 11, 2010 54 / 120
  • 55. EM EM K-means K-means K-means (@kisa12012) 9 December 11, 2010 55 / 120
  • 56. EM . . .. µ, Σ, π . .. . . . 1 . .. . µk Σk πk .. . . . 2(E ) . .. γ(znk ) πk N (xn |µk , Σk ) γ(znk ) = K (23) ∑ π j N ( xn | µ k , Σ k ) . j =1 .. . . (@kisa12012) 9 December 11, 2010 56 / 120
  • 57. EM . 3(M ) . .. 1 N µnew k = ∑ γ(znk )xn Nk n = 1 (24) 1 N Σnew = k ∑ γ(znk )(xn − µk )(xn − µk )T Nk n = 1 (25) N πk new = k (26) N N Nk = ∑ γ(znk ) (27) . n =1 .. . . (@kisa12012) 9 December 11, 2010 57 / 120
  • 58. EM . 4 . .. { } N K ln p (X|µ, Σ, π ) = ∑ ln ∑ πk N (xn |µk , Σk ) (28) n =1 k =1 . 2 .. . . (@kisa12012) 9 December 11, 2010 58 / 120
  • 59. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 59 / 120
  • 60. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 60 / 120
  • 61. EM EM EM . EM . .. . .. . . . Notation .. . X( n n xn ) Z( n n zn ) . θ .. . . (@kisa12012) 9 December 11, 2010 61 / 120
  • 62. (29) { } ln p (X|θ) = ln ∑ p(X, Z|θ) (29) Z Z . Example ( (28)) . .. { } N K ln p (X|µ, Σ, π ) = ∑ ln ∑ π k N ( xn | µ k , Σ k ) . n =1 k =1 .. . . Z p (X, Z|θ) p (X|θ) (@kisa12012) 9 December 11, 2010 62 / 120
  • 63. . { } . ln p (X|θ) = ln ∑ p(X, Z|θ) . Z .. . . ( {X, Z} ) ln p (X, Z|θ) X Z p (Z|X, θ) ln p (X, Z|θ) p (Z|X, θ) E θ M 9.4 (@kisa12012) 9 December 11, 2010 63 / 120
  • 64. . E . .. p (Z|X, θold ) . θold .. . . . M . .. ln p (X, Z|θ) p (Z|X, θold ) Q(θ, θold ) Q(θ, θold ) = ∑ p (Z|X, θold ) ln p (X, Z|θ) (30) Z θnew θnew = arg max Q(θ, θold ) (31) θ . .. . . (@kisa12012) 9 December 11, 2010 64 / 120
  • 65. EM X Z p (X, Z|θ) θ . . .. . p (X|θ) .. . . . 1 . .. . θold .. . . . 2(E ) . .. . p (Z|X, θold ) .. . . (@kisa12012) 9 December 11, 2010 65 / 120
  • 66. . 3(M ) . .. (32) θnew θnew = arg max Q(θ, θold ) (32) θ Q(θ, θold ) = ∑ p (Z|X, θold ) ln p (X, Z|θ) (33) . Z .. . . . 4 . .. θold ← θnew (34) . 2 .. . . (@kisa12012) 9 December 11, 2010 66 / 120
  • 67. EM p (θ) MAP EM [ 9.4] M Q(θ, θold ) + ln p (θ) EM ( 12.11) EM (missing at random) (@kisa12012) 9 December 11, 2010 67 / 120
  • 68. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 68 / 120
  • 69. EM . EM . .. . ln p (X|π, µ, Σ) (14) .. . . k Z {X, Z} zn π . Example . .. xn . .. . µ . Σ N (@kisa12012) 9 December 11, 2010 69 / 120
  • 70. . . .. . {X, Z} .. . . N K p (X, Z|π, µ, Σ) = ∏ ∏ πk z nk N (xn |µk , Σk )znk (35) n =1 k =1 znk zn k N K ln p (X, Z|π, µ, Σ) = ∑ ∑ znk {ln πk + ln N (xn |µk , Σk )} (36) n =1 k =1 (@kisa12012) 9 December 11, 2010 70 / 120
  • 71. N K ln p (X, Z|π, µ, Σ) = ∑ ∑ znk {ln πk + ln N (xn |µk , Σk )} n =1 k =1 zn 1-of-K K µ Σ πk 1 N πk = ∑ znk N n =1 (37) (@kisa12012) 9 December 11, 2010 71 / 120
  • 72. Z (10), (11) Z N K ∏ ∏ [πk N (xn |µk , Σk )] znk p (Z|X, π, µ, Σ) ∝ (38) n =1 k =1 n { zn } ( 9.5) 8 / (@kisa12012) 9 December 11, 2010 72 / 120
  • 73. znk ∑ znk ∏ [πk ′ N (xn |µk ′ , Σk ′ )] znk ′ zn ′ k E [znk ] = [ ]znj ∑∏ π j N ( xn | µ j , Σ j ) zn j π k N ( xn | µ k , Σ k ) = K = γ(znk ) (39) ∑ πj N (xn |µj , Σj ) j =1 1 znk = 1 n,k 2 znk k xn N K EZ [ln p (X, Z|π, µ, Σ)] = ∑ ∑ γ(znk ){ln πk + ln N (xn |π k , µk )} n =1 k =1 (40) (@kisa12012) 9 December 11, 2010 73 / 120
  • 74. N K EZ [ln p (X, Z|π, µ, Σ)] = ∑ ∑ γ(znk ){ln πk + ln N (xn |π k , µk )} n =1 k =1 EM EM ( 9.8) 9.4 (@kisa12012) 9 December 11, 2010 74 / 120
  • 75. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 75 / 120
  • 76. K-means EM K-means EM K-means EM K-means EM . . . . .. .. xn 1 xn . µk . .. . . .. . . (@kisa12012) 9 December 11, 2010 76 / 120
  • 77. ϵI ϵ I k { } 1 1 p (x| µk , Σk ) = exp − ∥x − µk ∥2 (41) (2πϵ) 2 D 2ϵ K EM xn k πk exp{−∥xn − µk ∥2 }/2ϵ γ(znk ) = (42) ∑j πj exp{−∥xn − µj ∥2 }/2ϵ (@kisa12012) 9 December 11, 2010 77 / 120
  • 78. K-means E ∥ xn − µ j ∥ 2 j j∗ ϵ→0 k = j∗ 0 { 1 k = j∗ ∀πk > 0 γ(znk ) → 0 otherwise K-means (2) xn µk γ(znk ) → rnk (@kisa12012) 9 December 11, 2010 78 / 120
  • 79. K-means M EM µk (17) K-means (4) πk (22) πk k (40) ϵ→0 ( 9.11) 1 N K EZ [ln p (X, Z|µ, Σ, π )] → − ∑ ∑ rnk ∥xn − µk ∥2 + const (43) 2 n =1 k =1 ϵ→0 (1) J (@kisa12012) 9 December 11, 2010 79 / 120
  • 80. K-means Σ µ EM K-means (elliptical K-means algorithm) [Sung+, 1994] (@kisa12012) 9 December 11, 2010 80 / 120
  • 81. . . . K-means 1 . .. 2 (Mixture of Gaussians) EM . .. 3 EM K-means EM .. .4 EM (@kisa12012) 9 December 11, 2010 81 / 120
  • 82. EM 2 (latent class analysis) [Lazarsfeld+ 1968; McLachlan+ 2000] Markov (13.2 ) (@kisa12012) 9 December 11, 2010 82 / 120
  • 83. D 2 xi ( i = 1 , . . . , D ) D p (x| µ ) = ∏ µ x ( 1 − µ i ) (1−x ) i i i (44) i =1 x = ( x1 , . . . , xD ) T µ = ( µ1 , . . . , µD )T µ xi E [x] cov [x] (2.1 ) E [x] = µ (45) cov [x] = diag {µi (1 − µi )} (46) (@kisa12012) 9 December 11, 2010 83 / 120
  • 84. . . .. K p (x|µ, π ) = ∑ πk p (x| µk ) (47) . k =1 .. . . µ = { µ1 , . . . , µK } π = { π1 , . . . , πK } D p (x| µk ) = ∏ µx (1 − µki )(1−x ) ki i i (48) i =1 [ 9.12] K E [x] = ∑ πk µk (49) k =1 K { } cov [x] = ∑ πk Σk + µk µT k − E [x]E [x]T (50) k =1 Σk = diag {µki (1 − µki )} (@kisa12012) 9 December 11, 2010 84 / 120
  • 85. cov [x] X = {x1 , . . . , xN } . . .. { } N K ln p (X|µ, π ) = ∑ ln ∑ πk p(xn |µk ) (51) . n =1 k =1 .. . . (@kisa12012) 9 December 11, 2010 85 / 120
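  • 86. EM for the Bernoulli mixture: introduce a latent variable $\mathbf{z}$ associated with each $\mathbf{x}$, where $\mathbf{z} = (z_1,\dots,z_K)^{\mathrm{T}}$ is a 1-of-K binary vector. The conditional distribution of $\mathbf{x}$ given $\mathbf{z}$ is
    $$p(\mathbf{x}\mid\mathbf{z},\boldsymbol{\mu}) = \prod_{k=1}^{K}p(\mathbf{x}\mid\boldsymbol{\mu}_k)^{z_k} \tag{52}$$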
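  • 87. Conditional distribution of $\mathbf{x}$ given $\mathbf{z}$:
    $$p(\mathbf{x}\mid\mathbf{z},\boldsymbol{\mu}) = \prod_{k=1}^{K}p(\mathbf{x}\mid\boldsymbol{\mu}_k)^{z_k}$$
    Prior distribution of $\mathbf{z}$:
    $$p(\mathbf{z}\mid\boldsymbol{\pi}) = \prod_{k=1}^{K}\pi_k^{z_k} \tag{53}$$
    Marginalizing the product $p(\mathbf{x}\mid\mathbf{z},\boldsymbol{\mu})\,p(\mathbf{z}\mid\boldsymbol{\pi})$ over $\mathbf{z}$ recovers (47).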
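  • 88. For EM, write the complete-data log likelihood:
    $$\ln p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\mu},\boldsymbol{\pi}) = \sum_{n=1}^{N}\sum_{k=1}^{K}z_{nk}\left\{\ln\pi_k + \sum_{i=1}^{D}\left[x_{ni}\ln\mu_{ki} + (1-x_{ni})\ln(1-\mu_{ki})\right]\right\} \tag{54}$$
    where $\mathbf{X} = \{\mathbf{x}_n\}$ and $\mathbf{Z} = \{\mathbf{z}_n\}$.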
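  • 89. Taking the expectation with respect to the posterior of $\mathbf{Z}$:
    $$\mathbb{E}_{\mathbf{Z}}[\ln p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\mu},\boldsymbol{\pi})] = \sum_{n=1}^{N}\sum_{k=1}^{K}\gamma(z_{nk})\left\{\ln\pi_k + \sum_{i=1}^{D}\left[x_{ni}\ln\mu_{ki} + (1-x_{ni})\ln(1-\mu_{ki})\right]\right\} \tag{55}$$
    where $\gamma(z_{nk}) = \mathbb{E}[z_{nk}]$ is the posterior probability (responsibility) of component $k$ for data point $\mathbf{x}_n$.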
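  • 90. E step: compute the responsibilities $\gamma(z_{nk})$.
    $$\gamma(z_{nk}) = \mathbb{E}[z_{nk}] = \frac{\sum_{\mathbf{z}_n} z_{nk}\prod_{k'}\left[\pi_{k'}\,p(\mathbf{x}_n\mid\boldsymbol{\mu}_{k'})\right]^{z_{nk'}}}{\sum_{\mathbf{z}_n}\prod_{j}\left[\pi_j\,p(\mathbf{x}_n\mid\boldsymbol{\mu}_j)\right]^{z_{nj}}} = \frac{\pi_k\,p(\mathbf{x}_n\mid\boldsymbol{\mu}_k)}{\sum_{j=1}^{K}\pi_j\,p(\mathbf{x}_n\mid\boldsymbol{\mu}_j)} \tag{56}$$
    In (55) the responsibilities enter only through 2 terms:
    $$N_k = \sum_{n=1}^{N}\gamma(z_{nk}) \tag{57}$$
    $$\bar{\mathbf{x}}_k = \frac{1}{N_k}\sum_{n=1}^{N}\gamma(z_{nk})\,\mathbf{x}_n \tag{58}$$
    $N_k$ is the effective number of data points associated with component $k$.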
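  • 91. M step: maximize (55) with respect to $\boldsymbol{\mu}_k$ and $\boldsymbol{\pi}$. Setting the derivative with respect to $\boldsymbol{\mu}_k$ to 0 gives [Exercise 9.15]
    $$\boldsymbol{\mu}_k = \bar{\mathbf{x}}_k \tag{59}$$
    Maximizing with respect to $\pi_k$ under the constraint $\sum_k \pi_k = 1$ gives [Exercise 9.16]
    $$\pi_k = \frac{N_k}{N} \tag{60}$$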
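The E step (56) and the M step (57) to (60) can be put together as a short loop. The following is a minimal sketch in plain Python on synthetic binary data; the data generator, the random initialization of the means in (0.25, 0.75), the clipping constant and all function names are assumptions made for this illustration.

```python
import math
import random

def bernoulli_log_prob(x, mu):
    """ln p(x | mu_k) from Eq. (48) for a binary vector x."""
    return sum(math.log(m) if xi else math.log(1.0 - m) for xi, m in zip(x, mu))

def e_step(X, pis, mus):
    """Eq. (56): responsibilities gammas[n][k], computed in the log domain."""
    gammas = []
    for x in X:
        logw = [math.log(p) + bernoulli_log_prob(x, mu) for p, mu in zip(pis, mus)]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        s = sum(w)
        gammas.append([v / s for v in w])
    return gammas

def m_step(X, gammas, clip=1e-6):
    """Eqs. (57)-(60): N_k, mu_k = x_bar_k and pi_k = N_k / N."""
    N, D, K = len(X), len(X[0]), len(gammas[0])
    Nk = [sum(g[k] for g in gammas) for k in range(K)]
    mus = []
    for k in range(K):
        xbar = [sum(g[k] * x[i] for g, x in zip(gammas, X)) / Nk[k] for i in range(D)]
        # Clipping is an added safeguard (not in the slides) so log(0) never occurs.
        mus.append([min(max(m, clip), 1.0 - clip) for m in xbar])
    return [n / N for n in Nk], mus

def log_likelihood(X, pis, mus):
    """Eq. (51), using log-sum-exp for stability."""
    total = 0.0
    for x in X:
        logw = [math.log(p) + bernoulli_log_prob(x, mu) for p, mu in zip(pis, mus)]
        top = max(logw)
        total += top + math.log(sum(math.exp(v - top) for v in logw))
    return total

# Synthetic data: two hypothetical prototypes, 200 points, D = 4.
rng = random.Random(0)
true_mus = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
X = [[1 if rng.random() < m else 0 for m in true_mus[n % 2]] for n in range(200)]

pis = [0.5, 0.5]                                            # pi_k = 1 / K
mus = [[rng.uniform(0.25, 0.75) for _ in range(4)] for _ in range(2)]
history = []
for _ in range(20):
    gammas = e_step(X, pis, mus)
    pis, mus = m_step(X, gammas)
    history.append(log_likelihood(X, pis, mus))
print(history[0], history[-1])
```

The recorded log likelihood should not decrease from one iteration to the next, which is a useful sanity check on any EM implementation.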
  • 92. Example: $N = 600$, $K = 3$, initialized with $\pi_k = 1/K$ and $\boldsymbol{\mu}_k$ normalized so that $\sum_j \mu_{kj} = 1$.
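  • 95. Properties of EM for the Bernoulli mixture: because $0 \le p(\mathbf{x}_n\mid\boldsymbol{\mu}_k) \le 1$, the likelihood function is bounded above [Exercise 9.17], so there are no singularities at which it diverges. Singularities at which the likelihood goes to 0 exist, but EM does not reach them unless it is initialized pathologically. With a conjugate prior [Section 2.1.1], EM can also be used for MAP estimation [Exercise 9.18]. The model extends to multinomial variables [Exercise 9.19].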
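  • 97. EM for Bayesian linear regression: in Section 3.5.2, $\alpha, \beta$ were found by setting the derivatives of the evidence to 0. Here EM is used to estimate $\alpha, \beta$ instead. The evidence function to maximize is
    $$p(\mathbf{t}\mid\alpha,\beta) = \int p(\mathbf{t}\mid\mathbf{w},\beta)\,p(\mathbf{w}\mid\alpha)\,\mathrm{d}\mathbf{w}$$
    Since $\mathbf{w}$ is marginalized out, it is treated as the latent variable.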
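  • 98. E step: with the current $\alpha, \beta$, compute the posterior distribution of $\mathbf{w}$, which was already derived in Chapter 3:
    $$p(\mathbf{w}\mid\mathbf{t}) = \mathcal{N}(\mathbf{w}\mid\mathbf{m}_N,\mathbf{S}_N)$$
    $$\mathbf{m}_N = \mathbf{S}_N\left(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}\right)$$
    $$\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$$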
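  • 99. M step: the complete-data log likelihood is
    $$\ln p(\mathbf{t},\mathbf{w}\mid\alpha,\beta) = \ln p(\mathbf{t}\mid\mathbf{w},\beta) + \ln p(\mathbf{w}\mid\alpha) \tag{61}$$
    with
    $$p(\mathbf{t}\mid\mathbf{w},\beta) = \prod_{n=1}^{N}\mathcal{N}(t_n\mid\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n),\beta^{-1}), \qquad p(\mathbf{w}\mid\alpha) = \mathcal{N}(\mathbf{w}\mid\mathbf{0},\alpha^{-1}\mathbf{I})$$
    Taking the expectation with respect to the posterior of $\mathbf{w}$:
    $$\mathbb{E}[\ln p(\mathbf{t},\mathbf{w}\mid\alpha,\beta)] = \frac{M}{2}\ln\left(\frac{\alpha}{2\pi}\right) - \frac{\alpha}{2}\mathbb{E}[\mathbf{w}^{\mathrm{T}}\mathbf{w}] + \frac{N}{2}\ln\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2}\sum_{n=1}^{N}\mathbb{E}\left[(t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n)^2\right] \tag{62}$$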
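  • 100. M step: setting the derivative of (62) with respect to $\alpha$ to 0 gives the update for $\alpha$ [Exercise 9.20]:
    $$\alpha = \frac{M}{\mathbb{E}[\mathbf{w}^{\mathrm{T}}\mathbf{w}]} = \frac{M}{\mathbf{m}_N^{\mathrm{T}}\mathbf{m}_N + \operatorname{tr}(\mathbf{S}_N)} \tag{63}$$
    The update for $\beta$ is obtained in the same way [Exercise 9.21].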
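The alternation between the E step (posterior of $\mathbf{w}$) and the M-step update (63) can be illustrated in the simplest possible setting. The sketch below assumes a single basis function ($M = 1$), so that $\mathbf{m}_N$ and $\mathbf{S}_N$ are scalars, uses the zero-mean prior $p(\mathbf{w}\mid\alpha)$ (that is, $\mathbf{m}_0 = \mathbf{0}$, $\mathbf{S}_0^{-1} = \alpha$), and holds $\beta$ fixed. The data and function names are made up for illustration.

```python
import random

def posterior(alpha, beta, phis, ts):
    """E step for M = 1: S_N^{-1} = alpha + beta * sum(phi^2), m_N = beta * S_N * sum(phi * t)."""
    s_n = 1.0 / (alpha + beta * sum(p * p for p in phis))
    m_n = beta * s_n * sum(p * t for p, t in zip(phis, ts))
    return m_n, s_n

def em_alpha(phis, ts, beta, alpha=1.0, iters=100):
    """Alternate the E step with the M-step update of Eq. (63), specialised to M = 1."""
    for _ in range(iters):
        m_n, s_n = posterior(alpha, beta, phis, ts)
        alpha = 1.0 / (m_n * m_n + s_n)   # alpha = M / (m_N^T m_N + tr(S_N))
    return alpha

# Synthetic data (assumption): t = 2 * phi + Gaussian noise with std 0.2.
rng = random.Random(1)
phis = [n / 10.0 for n in range(1, 21)]
ts = [2.0 * p + rng.gauss(0.0, 0.2) for p in phis]
beta = 25.0  # noise precision 1 / 0.2**2, held fixed in this sketch

alpha = em_alpha(phis, ts, beta)
m_n, s_n = posterior(alpha, beta, phis, ts)
print(alpha, m_n, s_n)
```

At convergence the returned `alpha` satisfies `alpha * (m_n**2 + s_n) == 1` up to rounding, which is the fixed-point condition of (63) for $M = 1$.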
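  • 101. Comparison of EM with direct evidence maximization: both require inverting (or eigendecomposing) an $M\times M$ matrix, and the 2 ways of determining $\alpha$ converge to the same result. The quantity $\gamma$ in (3.92) is
    $$\gamma = M - \alpha\sum_{i=1}^{M}\frac{1}{\lambda_i+\alpha} = M - \alpha\operatorname{tr}(\mathbf{S}_N) \tag{64}$$
    At a stationary point of the evidence function,
    $$\alpha\,\mathbf{m}_N^{\mathrm{T}}\mathbf{m}_N = \gamma = M - \alpha\operatorname{tr}(\mathbf{S}_N) \tag{65}$$
    and solving for $\alpha$ gives exactly the EM update (63).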
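  • 102. RVM (relevance vector machine): in Section 7.2.1, $\alpha, \beta$ were found by direct maximization of the marginal likelihood. Treating $\mathbf{w}$ as the latent variable, EM can be used instead. The E step computes the posterior of $\mathbf{w}$ (7.81); the M step maximizes
    $$\mathbb{E}_{\mathbf{w}}\left[\ln\{p(\mathbf{t}\mid\mathbf{X},\mathbf{w},\beta)\,p(\mathbf{w}\mid\boldsymbol{\alpha})\}\right] \tag{66}$$
    which gives
    $$\alpha_i^{\text{new}} = \frac{1}{m_i^2 + \Sigma_{ii}} \tag{67}$$
    $$(\beta^{\text{new}})^{-1} = \frac{\|\mathbf{t}-\boldsymbol{\Phi}\mathbf{m}\|^2 + \beta^{-1}\sum_i\gamma_i}{N} \tag{68}$$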
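  • 105. The EM algorithm in general: the EM (expectation-maximization) algorithm is a general technique for finding maximum likelihood solutions of models with latent variables [Dempster+, 1977; McLachlan+, 1997]. The general treatment of EM given here is also the basis of the variational inference framework [Section 10.1]. Notation: $\mathbf{X}$ denotes the observed variables, $\mathbf{Z}$ the latent variables, and $\boldsymbol{\theta}$ the parameters.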
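  • 106. Goal of EM: maximize the likelihood function
    $$p(\mathbf{X}\mid\boldsymbol{\theta}) = \sum_{\mathbf{Z}}p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta}) \tag{69}$$
    Here $\mathbf{Z}$ is assumed discrete; for continuous $\mathbf{Z}$ the sum becomes an integral. Assumption: optimizing $p(\mathbf{X}\mid\boldsymbol{\theta})$ directly is difficult, but optimizing $\ln p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})$ is easy. Introduce a distribution $q(\mathbf{Z})$ over the latent variables.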
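  • 107. Decomposition of the log likelihood:
    $$\ln p(\mathbf{X}\mid\boldsymbol{\theta}) = \mathcal{L}(q,\boldsymbol{\theta}) + \mathrm{KL}(q\,\|\,p) \tag{70}$$
    $$\mathcal{L}(q,\boldsymbol{\theta}) = \sum_{\mathbf{Z}}q(\mathbf{Z})\ln\left\{\frac{p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})}{q(\mathbf{Z})}\right\} \tag{71}$$
    $$\mathrm{KL}(q\,\|\,p) = -\sum_{\mathbf{Z}}q(\mathbf{Z})\ln\left\{\frac{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta})}{q(\mathbf{Z})}\right\} \tag{72}$$
    To derive (70), substitute
    $$\ln p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta}) = \ln p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}) + \ln p(\mathbf{X}\mid\boldsymbol{\theta}) \tag{73}$$
    into (71).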
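The decomposition (70) holds for any choice of $q(\mathbf{Z})$, which is easy to confirm numerically on a toy model. In the sketch below the latent variable takes 3 values and the joint probabilities $p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})$ for the observed $\mathbf{X}$ are made-up numbers; `lower_bound` and `kl` are hypothetical helper names for (71) and (72).

```python
import math

# Hypothetical values of p(X, Z = z | theta) for the observed X, z = 0, 1, 2.
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)                       # p(X | theta), Eq. (69)
posterior = [j / evidence for j in joint]   # p(Z | X, theta)

def lower_bound(q):
    """Eq. (71): L(q, theta) = sum_Z q(Z) ln{ p(X, Z | theta) / q(Z) }."""
    return sum(qz * math.log(jz / qz) for qz, jz in zip(q, joint) if qz > 0)

def kl(q):
    """Eq. (72): KL(q || p) = -sum_Z q(Z) ln{ p(Z | X, theta) / q(Z) }."""
    return -sum(qz * math.log(pz / qz) for qz, pz in zip(q, posterior) if qz > 0)

q = [0.2, 0.5, 0.3]                         # an arbitrary distribution over Z
print(math.log(evidence), lower_bound(q) + kl(q))

# Choosing q equal to the posterior closes the gap.
print(kl(posterior), lower_bound(posterior))
```

The first line prints the same number twice (up to rounding) for any valid `q`; the second shows that with `q` equal to the posterior the KL term vanishes and the bound equals the log likelihood.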
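  • 108. About (71), $\mathcal{L}(q,\boldsymbol{\theta})$, the first term of the decomposition (70): $\mathcal{L}(q,\boldsymbol{\theta})$ is a functional of $q(\mathbf{Z})$ and a function of $\boldsymbol{\theta}$, and it contains the joint distribution of $\mathbf{X}$ and $\mathbf{Z}$:
    $$\mathcal{L}(q,\boldsymbol{\theta}) = \sum_{\mathbf{Z}}q(\mathbf{Z})\ln\left\{\frac{p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})}{q(\mathbf{Z})}\right\}$$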
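  • 109. About (72), $\mathrm{KL}(q\,\|\,p)$, the second term of the decomposition (70): it is the KL divergence between $q(\mathbf{Z})$ and the posterior $p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta})$. Since $\mathrm{KL}(q\,\|\,p) \ge 0$, it follows that $\mathcal{L}(q,\boldsymbol{\theta}) \le \ln p(\mathbf{X}\mid\boldsymbol{\theta})$, so $\mathcal{L}(q,\boldsymbol{\theta})$ is a lower bound on $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$.
    $$\mathrm{KL}(q\,\|\,p) = -\sum_{\mathbf{Z}}q(\mathbf{Z})\ln\left\{\frac{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta})}{q(\mathbf{Z})}\right\}$$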
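  • 110. [Figure: the decomposition $\ln p(\mathbf{X}\mid\boldsymbol{\theta}) = \mathcal{L}(q,\boldsymbol{\theta}) + \mathrm{KL}(q\,\|\,p)$, showing $\mathcal{L}(q,\boldsymbol{\theta})$, the gap $\mathrm{KL}(q\,\|\,p)$, and $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$.]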
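  • 111. E step of EM, in terms of the decomposition $\ln p(\mathbf{X}\mid\boldsymbol{\theta}) = \mathcal{L}(q,\boldsymbol{\theta}) + \mathrm{KL}(q\,\|\,p)$: with the current parameters $\boldsymbol{\theta}^{\text{old}}$ held fixed, maximize $\mathcal{L}(q,\boldsymbol{\theta}^{\text{old}})$ with respect to $q(\mathbf{Z})$. Because $\ln p(\mathbf{X}\mid\boldsymbol{\theta}^{\text{old}})$ does not depend on $q(\mathbf{Z})$, the maximum is reached when
    $$\mathrm{KL}(q\,\|\,p) = 0 \iff q(\mathbf{Z}) = p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})$$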
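  • 112. E step: the KL divergence is driven to 0. [Figure, before the E step: $\mathrm{KL}(q\,\|\,p)$, $\mathcal{L}(q,\boldsymbol{\theta})$, $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$.]
  • 113. [Figure, after the E step: $\mathrm{KL}(q\,\|\,p) = 0$, so $\mathcal{L}(q,\boldsymbol{\theta}^{\text{old}}) = \ln p(\mathbf{X}\mid\boldsymbol{\theta}^{\text{old}})$.]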
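  • 114. M step of EM, in terms of the decomposition $\ln p(\mathbf{X}\mid\boldsymbol{\theta}) = \mathcal{L}(q,\boldsymbol{\theta}) + \mathrm{KL}(q\,\|\,p)$: with $q(\mathbf{Z})$ held fixed, maximize $\mathcal{L}(q,\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$ to obtain $\boldsymbol{\theta}^{\text{new}}$. Increasing $\mathcal{L}(q,\boldsymbol{\theta})$ also increases $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$. Because $q(\mathbf{Z})$ was computed from the old parameters, $\mathrm{KL}(q\,\|\,p)$ is in general nonzero afterwards.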
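  • 115. M step: $\mathcal{L}(q,\boldsymbol{\theta})$ and $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$ increase while $q(\mathbf{Z})$ is held fixed, so $\mathrm{KL}(q\,\|\,p)$ becomes nonzero. [Figure: $\mathrm{KL}(q\,\|\,p)$, $\mathcal{L}(q,\boldsymbol{\theta})$, $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$.]
  • 116. [Figure, before the M step: $\mathrm{KL}(q\,\|\,p) = 0$, $\mathcal{L}(q,\boldsymbol{\theta}^{\text{old}})$, $\ln p(\mathbf{X}\mid\boldsymbol{\theta}^{\text{old}})$.]
  • 117. [Figure, after the M step: $\mathrm{KL}(q\,\|\,p)$, $\mathcal{L}(q,\boldsymbol{\theta}^{\text{new}})$, $\ln p(\mathbf{X}\mid\boldsymbol{\theta}^{\text{new}})$.]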
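  • 118. After the E step, substitute $q(\mathbf{Z}) = p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})$ into (71). The lower bound $\mathcal{L}(q,\boldsymbol{\theta})$ after the E step is
    $$\mathcal{L}(q,\boldsymbol{\theta}) = \sum_{\mathbf{Z}}p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})\ln p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta}) - \sum_{\mathbf{Z}}p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})\ln p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}}) = Q(\boldsymbol{\theta},\boldsymbol{\theta}^{\text{old}}) + \text{const} \tag{74}$$
    The const term is the entropy of $q$ and does not depend on $\boldsymbol{\theta}$. The M step therefore maximizes the expectation of $\ln p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})$; when $p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})$ belongs to the exponential family, the logarithm cancels the exponential.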
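  • 119. EM viewed in parameter space. [Figure: the curve $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$ and the lower bound $\mathcal{L}(q,\boldsymbol{\theta})$, with $\boldsymbol{\theta}^{\text{old}}$ and $\boldsymbol{\theta}^{\text{new}}$ marked.] At $\boldsymbol{\theta}^{\text{old}}$ the bound $\mathcal{L}(q,\boldsymbol{\theta}^{\text{old}})$ touches $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$; maximizing the bound gives $\boldsymbol{\theta}^{\text{new}}$ and $\mathcal{L}(q,\boldsymbol{\theta}^{\text{new}})$.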
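  • 120. The i.i.d. case. Notation: $\mathbf{X}$ is a set of $N$ i.i.d. data points $\{\mathbf{x}_n\}$, and $\mathbf{Z}$ is the corresponding set of latent variables $\{\mathbf{z}_n\}$. From the i.i.d. assumption,
    $$p(\mathbf{X},\mathbf{Z}) = \prod_{n}p(\mathbf{x}_n,\mathbf{z}_n)$$
    and marginalizing over $\{\mathbf{z}_n\}$ gives $p(\mathbf{X}) = \prod_n p(\mathbf{x}_n)$.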
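  • 121. The i.i.d. case, E step: the posterior factorizes over data points.
    $$p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}) = \frac{p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})}{\sum_{\mathbf{Z}}p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})} = \frac{\prod_{n=1}^{N}p(\mathbf{x}_n,\mathbf{z}_n\mid\boldsymbol{\theta})}{\sum_{\mathbf{Z}}\prod_{n=1}^{N}p(\mathbf{x}_n,\mathbf{z}_n\mid\boldsymbol{\theta})} = \prod_{n=1}^{N}p(\mathbf{z}_n\mid\mathbf{x}_n,\boldsymbol{\theta}) \tag{75}$$
    By (75), the responsibility for $\mathbf{x}_n$ depends only on $\mathbf{x}_n$ and $\boldsymbol{\theta}$, not on the other data points. This factorization of $p(\mathbf{X},\mathbf{Z})$ can be exploited in an incremental form of EM.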
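  • 122. The i.i.d. case, incremental EM: for a Gaussian mixture, when only the responsibilities of one data point $\mathbf{x}_m$ are re-evaluated, the statistics of (17) and (18) can be updated incrementally:
    $$\boldsymbol{\mu}_k^{\text{new}} = \boldsymbol{\mu}_k^{\text{old}} + \left(\frac{\gamma^{\text{new}}(z_{mk}) - \gamma^{\text{old}}(z_{mk})}{N_k^{\text{new}}}\right)\left(\mathbf{x}_m - \boldsymbol{\mu}_k^{\text{old}}\right) \tag{76}$$
    $$N_k^{\text{new}} = N_k^{\text{old}} + \gamma^{\text{new}}(z_{mk}) - \gamma^{\text{old}}(z_{mk}) \tag{77}$$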
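The incremental updates (76) and (77) should give exactly the same mean and effective count as recomputing both from scratch after one responsibility changes. The sketch below checks this for a single component $k$ on scalar data; the data, responsibilities and function names are made-up illustration values.

```python
def batch_stats(xs, gammas):
    """N_k and mu_k recomputed from scratch over all data points."""
    n_k = sum(gammas)
    mu_k = sum(g * x for g, x in zip(gammas, xs)) / n_k
    return n_k, mu_k

def incremental_update(n_old, mu_old, x_m, g_old, g_new):
    """Update N_k and mu_k after the responsibility of one point x_m changes."""
    delta = g_new - g_old
    n_new = n_old + delta                                  # Eq. (77)
    mu_new = mu_old + (delta / n_new) * (x_m - mu_old)     # Eq. (76)
    return n_new, mu_new

# Illustration values (assumptions): scalar data and responsibilities for one component.
xs = [0.5, 1.5, 2.0, 4.0]
gammas = [0.9, 0.7, 0.4, 0.1]
n_old, mu_old = batch_stats(xs, gammas)

m, g_new = 2, 0.8                       # point m gets a new responsibility
n_inc, mu_inc = incremental_update(n_old, mu_old, xs[m], gammas[m], g_new)

gammas_new = list(gammas)
gammas_new[m] = g_new
n_ref, mu_ref = batch_stats(xs, gammas_new)
print((n_inc, mu_inc), (n_ref, mu_ref))
```

The incremental path touches only one data point per update, whereas the batch path sums over all $N$ points, which is the practical motivation for the incremental form.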
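  • 123. MAP estimation with EM: EM can also maximize the posterior $p(\boldsymbol{\theta}\mid\mathbf{X})$ when a prior $p(\boldsymbol{\theta})$ is introduced.
    $$\ln p(\boldsymbol{\theta}\mid\mathbf{X}) = \ln p(\boldsymbol{\theta},\mathbf{X}) - \ln p(\mathbf{X}) = \ln p(\mathbf{X}\mid\boldsymbol{\theta}) + \ln p(\boldsymbol{\theta}) - \ln p(\mathbf{X}) \tag{78}$$
    Using the decomposition of $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$:
    $$\ln p(\boldsymbol{\theta}\mid\mathbf{X}) = \mathcal{L}(q,\boldsymbol{\theta}) + \mathrm{KL}(q\,\|\,p) + \ln p(\boldsymbol{\theta}) - \ln p(\mathbf{X}) \ge \mathcal{L}(q,\boldsymbol{\theta}) + \ln p(\boldsymbol{\theta}) - \ln p(\mathbf{X}) \tag{79}$$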
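  • 124. Extensions of EM when the M step is intractable. Generalized EM (GEM) algorithm: in the M step, instead of maximizing $\mathcal{L}(q,\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$, change $\boldsymbol{\theta}$ to a $\boldsymbol{\theta}^{\text{new}}$ that increases $\mathcal{L}(q,\boldsymbol{\theta})$. ECM (expectation conditional maximization): perform several constrained optimizations within each M step [Meng+, 1993].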
  • 125. Extensions of EM when the E step is intractable [Neal+, 1999]: optimize $\mathcal{L}(q,\boldsymbol{\theta})$ only partially with respect to $q$, rather than completely.