The document discusses the EM algorithm and K-means clustering. It begins by introducing mixture models and the EM algorithm for parameter estimation. It then describes the K-means clustering algorithm and how it can be viewed as a special case of EM. The document concludes by explaining how EM can be applied to Gaussian mixture models, deriving the E and M steps, and introducing the responsibilities that indicate cluster assignments.
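17. Online K-means
The K-means algorithm described so far is a batch method; a sequential (online) version also exists [MacQueen, 1967].
It follows from applying the Robbins-Monro procedure (Section 2.3.5) to the derivative of the objective J in (1) with respect to $\mu_k$.
For each data point $x_n$ in turn, update the nearest prototype $\mu_k$:
$$\mu_k^{\mathrm{new}} = \mu_k^{\mathrm{old}} + \eta_n \left( x_n - \mu_k^{\mathrm{old}} \right) \tag{5}$$
$\eta_n$ is the learning rate, typically decreased monotonically as the number of processed points n grows.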
(@kisa12012) 9 December 11, 2010 17 / 120
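The sequential update (5) can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the function name `sequential_kmeans` and the learning-rate schedule $\eta_n = 1/n_k$ (one over the number of points assigned to prototype k so far) are assumptions made for the example.

```python
def sequential_kmeans(points, centers):
    """Online K-means: move the nearest prototype toward each arriving point."""
    centers = [list(c) for c in centers]
    counts = [0] * len(centers)
    for x in points:
        # Find the nearest prototype by squared Euclidean distance.
        k = min(range(len(centers)),
                key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, centers[j])))
        counts[k] += 1
        eta = 1.0 / counts[k]  # assumed schedule: eta decreases with each assignment
        # Equation (5): mu_new = mu_old + eta * (x - mu_old)
        centers[k] = [ci + eta * (xi - ci) for xi, ci in zip(x, centers[k])]
    return centers


stream = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
result = sequential_kmeans(stream, [(1.0, 1.0), (9.0, 9.0)])
```

With this particular schedule each prototype ends up as the running mean of the points assigned to it; other decreasing schedules are equally compatible with (5).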
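18. K-medoids
K-means measures dissimilarity by the squared Euclidean distance between $x_n$ and $\mu_k$, which limits the data types it can handle and makes the cluster means sensitive to outliers (Section 2.3.7).
K-medoids algorithm: replace the squared Euclidean distance with a general dissimilarity measure $V(x, x')$ and minimize
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, V(x_n, \mu_k) \tag{6}$$
E step: as in K-means, assign each $x_n$ to the prototype $\mu_k$ with the smallest dissimilarity.
M step: generally harder than in K-means, so each prototype $\mu_k$ is restricted to be one of the data points assigned to cluster k.
Cost per iteration: $O(KN) + O(N_k^2)$ evaluations of V, where $N_k$ is the number of points $x_n$ assigned to cluster k.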
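A small sketch of the two K-medoids steps, under stated assumptions: the function name `k_medoids`, the iteration cap, and the absolute-difference dissimilarity used in the example are all illustrative choices, not taken from the slides.

```python
def k_medoids(points, medoids, dissim, n_iter=10):
    """Alternate assignment (E step) and medoid re-selection (M step)."""
    medoids = list(medoids)
    for _ in range(n_iter):
        # E step: O(KN) evaluations of the dissimilarity V.
        clusters = [[] for _ in medoids]
        for x in points:
            k = min(range(len(medoids)), key=lambda j: dissim(x, medoids[j]))
            clusters[k].append(x)
        # M step: each prototype must be a member of its own cluster,
        # which costs O(N_k^2) evaluations of V per cluster.
        new_medoids = []
        for k, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[k])  # keep the old medoid for an empty cluster
                continue
            best = min(members, key=lambda m: sum(dissim(x, m) for x in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Objective (6) with hard assignments to the nearest medoid.
    cost = sum(min(dissim(x, m) for m in medoids) for x in points)
    return medoids, cost


data = [1, 2, 3, 10, 11, 12, 100]
medoids, cost = k_medoids(data, [1, 10], lambda a, b: abs(a - b))
```

Because the medoid must be an actual data point, the outlier 100 cannot drag the prototype of the second cluster far from the bulk of that cluster.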
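24. Image compression: bit counts
Original image: N pixels, each storing {R,G,B} at 8 bits per channel, for a total of 24N bits.
Compressed image (the K-means centers serve as the code book):
for each pixel, the index of its nearest center, costing $\log_2 K$ bits per pixel;
the K code-book vectors themselves, costing 24K bits.
Total: $24K + N \log_2 K$ bits.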
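The bit counts above are easy to evaluate. The sketch below assumes that a pixel index is stored in a whole number of bits, so $\log_2 K$ is rounded up; the function names and the 43,200-pixel image size are hypothetical values chosen for illustration.

```python
import math


def original_bits(n_pixels):
    """24 bits per pixel: 8 bits for each of R, G, B."""
    return 24 * n_pixels


def compressed_bits(n_pixels, k):
    """24K bits for the code book plus an index of ceil(log2 K) bits per pixel."""
    bits_per_index = math.ceil(math.log2(k))  # assumption: whole bits per index
    return 24 * k + n_pixels * bits_per_index


n = 43200  # hypothetical image size
ratios = {k: compressed_bits(n, k) / original_bits(n) for k in (2, 3, 10)}
```

For these settings the compressed size stays below 17% of the original even at K = 10.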
25. Outline
1. K-means clustering
2. Mixture of Gaussians
   EM for Gaussian mixtures
3. An alternative view of EM
   Relation to K-means
   EM for Bayesian linear regression
4. The EM algorithm in general
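27. Mixture of Gaussians
Section 2.3.9 introduced the Gaussian mixture as a linear superposition of Gaussians (2.188). Here it is reformulated with discrete latent variables, which motivates the EM algorithm.
Gaussian mixture distribution:
$$p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{7}$$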
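28. Latent variable z
Introduce a K-dimensional binary random variable z with a 1-of-K representation: $z_k \in \{0, 1\}$ and $\sum_k z_k = 1$.
The joint distribution $p(x, z)$ is defined through the marginal $p(z)$ and the conditional $p(x \mid z)$.
Figure: graphical model of the mixture, with an arrow from z to x.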
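29. Marginal distribution of z
The marginal over z is specified by the mixing coefficients $\pi_k$:
$$p(z_k = 1) = \pi_k$$
The parameters $\{\pi_k\}$ must satisfy (8) and (9):
$$0 \le \pi_k \le 1 \tag{8}$$
$$\sum_{k=1}^{K} \pi_k = 1 \tag{9}$$
Because z uses a 1-of-K representation, this can be written as
$$p(z) = \prod_{k=1}^{K} \pi_k^{z_k} \tag{10}$$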
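30. Conditional distribution of x given z
$$p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
which can also be written in the form (11):
$$p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k} \tag{11}$$
The joint distribution is $p(x, z) = p(z)\,p(x \mid z)$, and marginalizing over z gives
$$p(x) = \sum_{z} p(z)\,p(x \mid z) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{12}$$
So (12) recovers the Gaussian mixture. For observations $\{x_1, \dots, x_N\}$, every $x_n$ has a corresponding latent variable $z_n$.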
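31. Why introduce latent variables
We can work with the joint distribution $p(x, z)$ instead of the marginal $p(x)$, which leads to significant simplifications through the EM algorithm.
The conditional probability $p(z \mid x)$ of z given x also plays an important role.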
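32. Responsibility $\gamma(z_k)$
The conditional probability of z given x is written $\gamma(z_k)$ and follows from Bayes' theorem:
$$\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{p(z_k = 1)\, p(x \mid z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1)\, p(x \mid z_j = 1)} = \frac{\pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)} \tag{13}$$
$\pi_k$ is the prior probability of $z_k = 1$; $\gamma(z_k)$ is the corresponding posterior probability once x has been observed.
$\gamma(z_k)$ can also be read as the responsibility that component k takes for explaining the observation x.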
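As a concrete instance of (13), the sketch below evaluates responsibilities for a one-dimensional mixture. Restricting to 1-D (so each covariance is a scalar variance) and the names `gauss_pdf` and `responsibilities` are simplifying assumptions made for the example.

```python
import math


def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(x | mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, pis, mus, variances):
    """Equation (13): posterior probability of each component given x."""
    weighted = [p * gauss_pdf(x, m, v) for p, m, v in zip(pis, mus, variances)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Two equally weighted unit-variance components centered at -1 and +1.
gamma = responsibilities(1.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

For the point x = 1 the second component takes most of the responsibility, and the two values sum to one as a posterior must.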
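33. Generating samples (ancestral sampling)
Random samples distributed according to the Gaussian mixture can be generated by ancestral sampling (Section 8.1.2):
1. Draw a value $\hat{z}$ of z from the marginal $p(z)$.
2. Draw a value of x from the conditional $p(x \mid \hat{z})$.
Sampling methods are treated in Chapter 11.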
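The two sampling steps can be sketched for a one-dimensional mixture as follows. The function name `sample_mixture`, the fixed seed, and the parameter values are assumptions chosen for illustration only.

```python
import random


def sample_mixture(n, pis, mus, sigmas, seed=0):
    """Ancestral sampling: draw z from p(z), then x from p(x | z)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Step 1: component index k (the position of the 1 in the 1-of-K vector z).
        k = rng.choices(range(len(pis)), weights=pis)[0]
        # Step 2: x from the Gaussian of the chosen component.
        x = rng.gauss(mus[k], sigmas[k])
        samples.append((x, k))
    return samples


samples = sample_mixture(2000, pis=[0.3, 0.7], mus=[0.0, 5.0], sigmas=[0.5, 0.5])
```

Keeping the pair (x, k) corresponds to a sample from the joint $p(x, z)$; discarding k leaves a sample from the marginal $p(x)$.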
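34. Samples from the joint distribution $p(x, z)$
Figure (a): samples from $p(x, z)$ on the unit square, each point labeled by its state of z. This is the complete data set.
35. Samples from the marginal distribution $p(x)$
Figure (b): the same samples drawn from $p(x)$ with the values of z ignored. This is the incomplete data set.
36. Responsibilities
Figure (c): the same samples, with each $x_n$ shown according to its responsibilities $\gamma(z_{nk}) \equiv p(z_k = 1 \mid x_n)$ for the components k.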
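38. Maximum likelihood: setting
We want to fit a Gaussian mixture to the observations $\{x_1, \dots, x_N\}$.
Notation
X: the $N \times D$ matrix whose n-th row is $x_n^{\mathrm{T}}$.
Z: the $N \times K$ matrix whose n-th row is $z_n^{\mathrm{T}}$.
Figure: graphical model with latent $z_n$ and observed $x_n$ inside a plate of size N, and parameters $\pi$, $\mu$, $\Sigma$.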
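39. Log likelihood $\ln p(X \mid \pi, \mu, \Sigma)$ and its singularities
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \Big\} \tag{14}$$
Consider components with covariance $\Sigma_k = \sigma_k^2 I$, and suppose the mean of the j-th component coincides with a data point, $\mu_j = x_n$. That data point then contributes the term
$$\mathcal{N}(x_n \mid x_n, \sigma_j^2 I) = \frac{1}{(2\pi)^{1/2}} \frac{1}{\sigma_j} \tag{15}$$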
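40. Singularity of the likelihood
As $\sigma_j \to 0$, the term (15) goes to infinity, so $\ln p(X \mid \pi, \mu, \Sigma)$ also goes to infinity.
Figure: a mixture in which one component collapses onto a single data point (axes: x and p(x)).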
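The divergence can be observed numerically. The sketch below uses a hypothetical four-point 1-D data set and a two-component mixture whose second mean sits exactly on one data point; the function name `log_lik` and all numeric values are assumptions for illustration.

```python
import math


def log_lik(data, pis, mus, sigmas):
    """Equation (14) for 1-D data, with standard deviations sigmas."""
    total = 0.0
    for x in data:
        mix = sum(p * math.exp(-(x - m) ** 2 / (2.0 * s * s)) / (math.sqrt(2.0 * math.pi) * s)
                  for p, m, s in zip(pis, mus, sigmas))
        total += math.log(mix)
    return total


data = [-1.0, 0.0, 0.5, 2.0]
# Second component is centered on the data point x = 2.0 and its width shrinks.
values = [log_lik(data, [0.5, 0.5], [0.5, 2.0], [1.0, s])
          for s in (1.0, 1e-2, 1e-4, 1e-6)]
```

The broad first component keeps the density of the other points finite, so the log likelihood grows without bound as the second component narrows.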
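42. Maximizing the log likelihood
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \Big\}$$
Because of the sum over K inside the logarithm, setting the derivatives to 0 does not give a closed-form solution.
Two approaches:
Gradient-based optimization [Fletcher, 1987; Nocedal+, 1999; Bishop+, 2008] (see Chapter 5).
The EM algorithm (its extension to variational inference appears in Chapter 10).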
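44. EM for Gaussian mixtures
EM (expectation-maximization algorithm): a method for finding maximum likelihood solutions of models with latent variables [Dempster+, 1977; McLachlan+, 1997].
Here EM is first motivated informally for the Gaussian mixture; a general treatment of EM follows (Section 9.3), and its variational extension is in Section 10.1.
The function to maximize is
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \Big\}$$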
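45. Condition for $\mu_k$
Set the derivative of the log likelihood with respect to $\mu_k$ to 0:
$$0 = \frac{\partial}{\partial \mu_k} \sum_{n=1}^{N} \ln \Big\{ \sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j) \Big\} = \sum_{n=1}^{N} \underbrace{\frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}}_{\gamma(z_{nk})} \Sigma_k^{-1} (x_n - \mu_k) \tag{16}$$
The responsibilities $\gamma(z_{nk})$ of (13) appear naturally on the right-hand side.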
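46. Solution for $\mu_k$
Multiplying (16) by $\Sigma_k$ (assumed nonsingular) and rearranging:
$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \tag{17}$$
$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \tag{18}$$
$N_k$ is the effective number of points assigned to cluster k.
The mean $\mu_k$ of component k is a weighted mean of all data points, where the weight of $x_n$ is the responsibility $\gamma(z_{nk})$ of component k for $x_n$.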
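47. Solution for $\Sigma_k$
Setting the derivative with respect to $\Sigma_k$ to 0 gives
$$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^{\mathrm{T}} \tag{19}$$
This has the same form as the single-Gaussian result [Exercise 2.34], except that each $x_n$ is weighted by $\gamma(z_{nk})$ and the denominator is the effective count $N_k$ for component k.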
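48. Condition for $\pi_k$
Maximize with respect to $\pi_k$ for each component k, subject to the constraint that the coefficients sum to 1 (9). Using a Lagrange multiplier, maximize
$$\ln p(X \mid \pi, \mu, \Sigma) + \lambda \Big( \sum_{k=1}^{K} \pi_k - 1 \Big) \tag{20}$$
Setting the derivative with respect to $\pi_k$ to 0:
$$0 = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \lambda \tag{21}$$
Multiplying both sides by $\pi_k$ and summing over k:
$$0 = \sum_{k=1}^{K} \Big( \sum_{n=1}^{N} \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \pi_k \lambda \Big)$$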
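49. Solution for $\pi_k$
Using the constraint (9), the summed condition gives $\lambda = -N$, and therefore
$$\pi_k = \frac{N_k}{N} \tag{22}$$
The mixing coefficient $\pi_k$ is the average responsibility $\gamma(z_{nk})$ that component k takes for the data points $x_n$.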
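50. Toward the EM algorithm
The results for $\mu_k$, $\Sigma_k$, and $\pi_k$ are not closed-form solutions, because the responsibilities $\gamma(z_{nk})$ depend on those parameters through (13).
→ They suggest an iterative scheme: the EM algorithm.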
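51. The two steps of EM
E step (expectation step): evaluate the posterior probabilities (responsibilities) with (13), using the current parameter values.
M step (maximization step): using the responsibilities $\gamma(z_{nk})$, re-estimate $\mu_k$, $\Sigma_k$, and $\pi_k$ with (17), (19), and (22).
In the M step the new means are computed first and then used to compute the new covariances.
Each E step followed by an M step is guaranteed not to decrease the log likelihood (Section 9.4).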
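52. Example: EM on the Old Faithful data
Figure (a): EM applied to the Old Faithful data set, with the same initialization as in the K-means example. The contours of the 2 Gaussian components are drawn at 1 standard deviation. Both axes run from −2 to 2.
53. Example: first E and M steps
Figure (b): result of the initial E step. Figure (c): result after the first M step (L = 1). Both axes run from −2 to 2.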
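55. Practical notes on EM
EM needs many more iterations than K-means to converge, and each iteration costs more.
A common approach is to run K-means first and use its result to initialize the Gaussian mixture.
The K-means cluster means, sample covariances, and cluster fractions then serve as initial values.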
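56. EM algorithm for Gaussian mixtures
Goal: maximize the likelihood function with respect to the parameters $\mu$, $\Sigma$, $\pi$.
Step 1 (initialization): initialize the means $\mu_k$, covariances $\Sigma_k$, and mixing coefficients $\pi_k$.
Step 2 (E step): evaluate the responsibilities $\gamma(z_{nk})$ with the current parameter values:
$$\gamma(z_{nk}) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \tag{23}$$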
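57. EM algorithm for Gaussian mixtures (continued)
Step 3 (M step): re-estimate the parameters using the current responsibilities:
$$\mu_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \tag{24}$$
$$\Sigma_k^{\mathrm{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k^{\mathrm{new}})(x_n - \mu_k^{\mathrm{new}})^{\mathrm{T}} \tag{25}$$
$$\pi_k^{\mathrm{new}} = \frac{N_k}{N} \tag{26}$$
where
$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \tag{27}$$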
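58. EM algorithm for Gaussian mixtures (continued)
Step 4 (convergence check): evaluate the log likelihood
$$\ln p(X \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \Big\} \tag{28}$$
and check for convergence of the parameters or of the log likelihood. If the criterion is not met, return to step 2.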
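Steps 1 to 4 can be put together in a short sketch. To stay within the standard library it is restricted to one-dimensional data, so each $\Sigma_k$ becomes a scalar variance; the function names, the tolerance, and the toy data are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, pis, mus, variances):
    """Equation (28) for 1-D data."""
    return sum(math.log(sum(p * normal_pdf(x, m, v)
                            for p, m, v in zip(pis, mus, variances)))
               for x in data)


def em_gmm_1d(data, pis, mus, variances, tol=1e-9, max_iter=200):
    n, K = len(data), len(pis)
    history = [log_likelihood(data, pis, mus, variances)]
    for _ in range(max_iter):
        # Step 2 (E step): responsibilities, equation (23).
        gamma = []
        for x in data:
            w = [p * normal_pdf(x, m, v) for p, m, v in zip(pis, mus, variances)]
            total = sum(w)
            gamma.append([wk / total for wk in w])
        # Step 3 (M step): equations (24)-(27); variances use the new means.
        Nk = [sum(g[k] for g in gamma) for k in range(K)]
        mus = [sum(g[k] * x for g, x in zip(gamma, data)) / Nk[k] for k in range(K)]
        variances = [sum(g[k] * (x - mus[k]) ** 2 for g, x in zip(gamma, data)) / Nk[k]
                     for k in range(K)]
        pis = [Nk[k] / n for k in range(K)]
        # Step 4: evaluate the log likelihood and test for convergence.
        history.append(log_likelihood(data, pis, mus, variances))
        if history[-1] - history[-2] < tol:
            break
    return pis, mus, variances, history


# Step 1: a deliberately rough initialization on two well-separated groups.
data = [-2.2, -2.1, -2.0, -1.9, -1.8, 2.8, 2.9, 3.0, 3.1, 3.2]
pis, mus, variances, history = em_gmm_1d(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

The recorded `history` makes it possible to check that the log likelihood never decreases from one iteration to the next. The sketch has no guard against the collapsing-component singularity, which a practical implementation would need.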
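61. An alternative view of EM
This part gives a more abstract view of EM in which the latent variables play the central role.
Goal of EM: find maximum likelihood solutions for models that have latent variables.
Notation
X: the set of all observed data (n-th row $x_n^{\mathrm{T}}$).
Z: the set of all latent variables (n-th row $z_n^{\mathrm{T}}$).
$\theta$: the set of all model parameters.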
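62. Log likelihood with latent variables
The log likelihood is given by (29):
$$\ln p(X \mid \theta) = \ln \Big\{ \sum_{Z} p(X, Z \mid \theta) \Big\} \tag{29}$$
If Z is continuous, the sum over Z is replaced by an integral.
Example (equation (28)):
$$\ln p(X \mid \mu, \Sigma, \pi) = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \Big\}$$
The sum over Z sits inside the logarithm, so even when the joint $p(X, Z \mid \theta)$ belongs to the exponential family, the marginal $p(X \mid \theta)$ typically does not.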
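63. Complete and incomplete data
$$\ln p(X \mid \theta) = \ln \Big\{ \sum_{Z} p(X, Z \mid \theta) \Big\}$$
Suppose that for the complete data set $\{X, Z\}$, maximizing the complete-data log likelihood $\ln p(X, Z \mid \theta)$ is straightforward.
In practice only the incomplete data X are observed.
All we know about Z is its posterior distribution $p(Z \mid X, \theta)$.
We therefore use the expected value of $\ln p(X, Z \mid \theta)$ under the posterior $p(Z \mid X, \theta)$: computing it is the E step.
Maximizing that expectation with respect to $\theta$ is the M step.
The justification is given in Section 9.4.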
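64. E step and M step
E step: evaluate the posterior $p(Z \mid X, \theta^{\mathrm{old}})$ using the current parameter values $\theta^{\mathrm{old}}$.
M step: take the expectation of $\ln p(X, Z \mid \theta)$ under $p(Z \mid X, \theta^{\mathrm{old}})$, denoted $Q(\theta, \theta^{\mathrm{old}})$:
$$Q(\theta, \theta^{\mathrm{old}}) = \sum_{Z} p(Z \mid X, \theta^{\mathrm{old}}) \ln p(X, Z \mid \theta) \tag{30}$$
and determine the revised estimate $\theta^{\mathrm{new}}$:
$$\theta^{\mathrm{new}} = \arg\max_{\theta} Q(\theta, \theta^{\mathrm{old}}) \tag{31}$$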
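65. The general EM algorithm
Given: a joint distribution $p(X, Z \mid \theta)$ over observed variables X and latent variables Z, governed by parameters $\theta$.
Goal: maximize the likelihood function $p(X \mid \theta)$ with respect to $\theta$.
Step 1 (initialization): choose initial parameter values $\theta^{\mathrm{old}}$.
Step 2 (E step): evaluate $p(Z \mid X, \theta^{\mathrm{old}})$.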
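67. Other uses of EM
With a prior $p(\theta)$ over the parameters, EM can also find MAP solutions [Exercise 9.4]: the M step maximizes $Q(\theta, \theta^{\mathrm{old}}) + \ln p(\theta)$.
EM also applies when some observed values are missing (Figure 12.11).
This use of EM is valid when the values are missing at random.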
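69. Gaussian mixtures revisited
Goal of EM: maximize the log likelihood $\ln p(X \mid \pi, \mu, \Sigma)$ of (14).
The sum over k inside the logarithm makes this difficult.
Suppose Z were observed as well, so that the complete data set $\{X, Z\}$ is available.
Example: graphical model for the complete data, in which both $z_n$ and $x_n$ are observed (plate of size N, parameters $\pi$, $\mu$, $\Sigma$).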
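70. Complete-data likelihood
Likelihood of the complete data set $\{X, Z\}$:
$$p(X, Z \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} \prod_{k=1}^{K} \pi_k^{z_{nk}}\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)^{z_{nk}} \tag{35}$$
where $z_{nk}$ is the k-th element of $z_n$. Taking the logarithm:
$$\ln p(X, Z \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \{ \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \} \tag{36}$$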
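71. Maximizing the complete-data log likelihood
$$\ln p(X, Z \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \{ \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \}$$
Because $z_n$ is a 1-of-K vector, this is a sum of K independent contributions.
$\mu$ and $\Sigma$: the same as fitting a single Gaussian to the points assigned to each component.
$\pi_k$: the fraction of points assigned to component k,
$$\pi_k = \frac{1}{N} \sum_{n=1}^{N} z_{nk} \tag{37}$$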
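72. Posterior distribution of Z
From (10), (11), and Bayes' theorem, the posterior of Z is
$$p(Z \mid X, \pi, \mu, \Sigma) \propto \prod_{n=1}^{N} \prod_{k=1}^{K} \left[ \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right]^{z_{nk}} \tag{38}$$
It factorizes over n, so the $\{z_n\}$ are independent under the posterior (Exercise 9.5).
This can also be checked from the graphical model using the d-separation criterion of Chapter 8.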
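73. Expected value of $z_{nk}$
$$E[z_{nk}] = \frac{\sum_{z_n} z_{nk} \prod_{k'} \left[ \pi_{k'}\, \mathcal{N}(x_n \mid \mu_{k'}, \Sigma_{k'}) \right]^{z_{nk'}}}{\sum_{z_n} \prod_{j} \left[ \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j) \right]^{z_{nj}}} = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} = \gamma(z_{nk}) \tag{39}$$
In the numerator only the configuration with $z_{nk} = 1$ contributes. The result is the responsibility of component k for $x_n$.
Expected complete-data log likelihood:
$$E_Z[\ln p(X, Z \mid \pi, \mu, \Sigma)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \{ \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \} \tag{40}$$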
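74. Resulting EM procedure
$$E_Z[\ln p(X, Z \mid \pi, \mu, \Sigma)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \{ \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \}$$
Maximizing this expectation gives exactly the EM algorithm for Gaussian mixtures derived earlier (Exercise 9.8).
The role of the expected complete-data log likelihood is justified in Section 9.4.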
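76. K-means and EM
Comparing K-means with EM for Gaussian mixtures:
K-means: hard assignment. Each $x_n$ is assigned to exactly 1 cluster, the one with the nearest $\mu_k$.
EM: soft assignment. Each $x_n$ is assigned to the clusters according to posterior probabilities.
K-means can be derived as a particular limit of EM for Gaussian mixtures.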
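77. Gaussian mixture with covariance $\epsilon I$
Take every component to have covariance $\epsilon I$, where $\epsilon$ is a fixed variance shared by all components and I is the identity matrix. Component k is then
$$p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi\epsilon)^{D/2}} \exp \Big\{ -\frac{1}{2\epsilon} \| x - \mu_k \|^2 \Big\} \tag{41}$$
Apply EM to a mixture of K such Gaussians. The responsibility of component k for $x_n$ is
$$\gamma(z_{nk}) = \frac{\pi_k \exp \{ -\| x_n - \mu_k \|^2 / 2\epsilon \}}{\sum_j \pi_j \exp \{ -\| x_n - \mu_j \|^2 / 2\epsilon \}} \tag{42}$$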
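The behavior of (42) as $\epsilon$ shrinks can be checked numerically. The sketch below evaluates the responsibilities in the log domain (subtracting the largest exponent before exponentiating) so that small $\epsilon$ does not underflow; the function name `soft_assign` and the example values are assumptions for illustration.

```python
import math


def soft_assign(x, mus, pis, eps):
    """Equation (42), evaluated stably via the log-sum-exp trick."""
    logits = [math.log(p) - sum((a - b) ** 2 for a, b in zip(x, m)) / (2.0 * eps)
              for p, m in zip(pis, mus)]
    top = max(logits)
    w = [math.exp(l - top) for l in logits]
    total = sum(w)
    return [wk / total for wk in w]


x = (0.4, 0.0)                      # closer to the first center
mus = [(0.0, 0.0), (1.0, 0.0)]
pis = [0.1, 0.9]                    # prior strongly favors the second component
broad = soft_assign(x, mus, pis, eps=10.0)
sharp = soft_assign(x, mus, pis, eps=1e-4)
```

For large $\epsilon$ the responsibilities stay close to the mixing coefficients, while for small $\epsilon$ the nearest center takes essentially all the responsibility regardless of $\pi_k$, which is the hard assignment of K-means.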
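79. K-means as the M step in the limit
The EM re-estimation equation (17) for $\mu_k$ reduces to the K-means result (4).
The re-estimation equation (22) for $\pi_k$ resets $\pi_k$ to the fraction of points assigned to cluster k, but these coefficients no longer play an active role.
In the limit $\epsilon \to 0$, the expected complete-data log likelihood (40) becomes (Exercise 9.11)
$$E_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] \to -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \| x_n - \mu_k \|^2 + \mathrm{const} \tag{43}$$
So in the limit $\epsilon \to 0$, maximizing the expected complete-data log likelihood is equivalent to minimizing the distortion measure J of (1).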
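80. Elliptical K-means
K-means does not estimate the covariances $\Sigma$ of the clusters, only the means $\mu$.
A hard-assignment version of EM for Gaussian mixtures with general covariance matrices is the elliptical K-means algorithm [Sung+, 1994].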
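82. Mixtures of Bernoulli distributions
A second example of EM: mixtures of discrete binary (2-valued) variables described by Bernoulli distributions.
This model is also known as latent class analysis [Lazarsfeld+ 1968; McLachlan+ 2000].
It also provides the foundation for hidden Markov models over discrete variables (Section 13.2).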
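83. Bernoulli distribution over D binary variables
Consider D binary variables $x_i$ ($i = 1, \dots, D$), each governed by a Bernoulli distribution with parameter $\mu_i$:
$$p(x \mid \mu) = \prod_{i=1}^{D} \mu_i^{x_i} (1 - \mu_i)^{(1 - x_i)} \tag{44}$$
where $x = (x_1, \dots, x_D)^{\mathrm{T}}$ and $\mu = (\mu_1, \dots, \mu_D)^{\mathrm{T}}$.
Given $\mu$, the variables $x_i$ are independent.
Mean $E[x]$ and covariance $\mathrm{cov}[x]$ (Section 2.1):
$$E[x] = \mu \tag{45}$$
$$\mathrm{cov}[x] = \mathrm{diag}\{ \mu_i (1 - \mu_i) \} \tag{46}$$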
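84. Mixture of Bernoulli distributions
$$p(x \mid \mu, \pi) = \sum_{k=1}^{K} \pi_k\, p(x \mid \mu_k) \tag{47}$$
where $\mu = \{ \mu_1, \dots, \mu_K \}$, $\pi = \{ \pi_1, \dots, \pi_K \}$, and
$$p(x \mid \mu_k) = \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{(1 - x_i)} \tag{48}$$
Mean and covariance of the mixture [Exercise 9.12]:
$$E[x] = \sum_{k=1}^{K} \pi_k \mu_k \tag{49}$$
$$\mathrm{cov}[x] = \sum_{k=1}^{K} \pi_k \left\{ \Sigma_k + \mu_k \mu_k^{\mathrm{T}} \right\} - E[x] E[x]^{\mathrm{T}} \tag{50}$$
with $\Sigma_k = \mathrm{diag}\{ \mu_{ki} (1 - \mu_{ki}) \}$.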
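85. Log likelihood of the Bernoulli mixture
The covariance $\mathrm{cov}[x]$ is no longer diagonal, so the mixture can capture correlations between variables that a single Bernoulli distribution cannot.
For a data set $X = \{ x_1, \dots, x_N \}$ the log likelihood is
$$\ln p(X \mid \mu, \pi) = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k\, p(x_n \mid \mu_k) \Big\} \tag{51}$$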
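86. EM for the Bernoulli mixture: latent variable
Introduce a latent variable z associated with each observation x.
$z = (z_1, \dots, z_K)^{\mathrm{T}}$ is a binary K-dimensional variable in 1-of-K form (one element equal to 1, all others 0).
Conditional distribution of x given z:
$$p(x \mid z, \mu) = \prod_{k=1}^{K} p(x \mid \mu_k)^{z_k} \tag{52}$$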
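87. Prior over the latent variable
Conditional distribution of x given z:
$$p(x \mid z, \mu) = \prod_{k=1}^{K} p(x \mid \mu_k)^{z_k}$$
Prior distribution of z:
$$p(z \mid \pi) = \prod_{k=1}^{K} \pi_k^{z_k} \tag{53}$$
Forming the product of $p(x \mid z, \mu)$ and $p(z \mid \pi)$ and marginalizing over z recovers (47).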
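88. Complete-data log likelihood
To derive EM, first write the complete-data log likelihood:
$$\ln p(X, Z \mid \mu, \pi) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \Big\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln (1 - \mu_{ki}) \right] \Big\} \tag{54}$$
where $X = \{ x_n \}$ and $Z = \{ z_n \}$.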
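89. Expectation with respect to the posterior of Z
$$E_Z[\ln p(X, Z \mid \mu, \pi)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \Big\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln (1 - \mu_{ki}) \right] \Big\} \tag{55}$$
where $\gamma(z_{nk}) = E[z_{nk}]$ is the responsibility of component k for the data point $x_n$ (its posterior probability).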
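90. E step
Evaluate the responsibilities $\gamma(z_{nk})$ with Bayes' theorem:
$$\gamma(z_{nk}) = E[z_{nk}] = \frac{\sum_{z_n} z_{nk} \prod_{k'} \left[ \pi_{k'}\, p(x_n \mid \mu_{k'}) \right]^{z_{nk'}}}{\sum_{z_n} \prod_{j} \left[ \pi_j\, p(x_n \mid \mu_j) \right]^{z_{nj}}} = \frac{\pi_k\, p(x_n \mid \mu_k)}{\sum_{j=1}^{K} \pi_j\, p(x_n \mid \mu_j)} \tag{56}$$
In (55) the responsibilities enter only through 2 quantities:
$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \tag{57}$$
$$\bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \tag{58}$$
$N_k$ is the effective number of data points associated with component k.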
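91. M step
Maximize the expected complete-data log likelihood with respect to $\mu_k$ and $\pi$.
Setting the derivative of (55) with respect to $\mu_k$ to 0 [Exercise 9.15]:
$$\mu_k = \bar{x}_k \tag{59}$$
The mean of component k is the responsibility-weighted mean of the data.
Maximizing with respect to $\pi_k$ under the sum-to-one constraint [Exercise 9.16]:
$$\pi_k = \frac{N_k}{N} \tag{60}$$
The mixing coefficient of component k is the effective fraction of points it explains.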
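The E step (56) and M step (57) to (60) for the Bernoulli mixture can be sketched as follows. The function name `bernoulli_mixture_em`, the clipping of $\mu_{ki}$ away from 0 and 1 (a numerical guard so that the logarithms stay finite, not part of the derivation), and the toy data are assumptions for this example.

```python
import math


def bernoulli_mixture_em(X, pis, mus, n_iter=50, clip=1e-6):
    N, D, K = len(X), len(X[0]), len(pis)
    gamma = []
    for _ in range(n_iter):
        # E step: responsibilities (56), computed in the log domain.
        gamma = []
        for x in X:
            logw = [math.log(pis[k]) +
                    sum(math.log(mus[k][i]) if x[i] else math.log(1.0 - mus[k][i])
                        for i in range(D))
                    for k in range(K)]
            top = max(logw)
            w = [math.exp(l - top) for l in logw]
            total = sum(w)
            gamma.append([wk / total for wk in w])
        # M step: effective counts (57), weighted means (58)-(59), coefficients (60).
        Nk = [sum(g[k] for g in gamma) for k in range(K)]
        mus = [[min(max(sum(gamma[n][k] * X[n][i] for n in range(N)) / Nk[k], clip),
                    1.0 - clip)  # assumed guard: keep mu_ki strictly inside (0, 1)
                for i in range(D)]
               for k in range(K)]
        pis = [Nk[k] / N for k in range(K)]
    return pis, mus, gamma


# Two binary patterns, four copies each.
X = [[1, 1, 0, 0]] * 4 + [[0, 0, 1, 1]] * 4
pis, mus, gamma = bernoulli_mixture_em(
    X, [0.5, 0.5], [[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]])
```

On this toy data each component locks onto one of the two patterns and the mixing coefficients settle at one half each.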
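92. Example: experimental setup
Number of data points N = 600, number of components K = 3.
Initialization: mixing coefficients $\pi_k = 1/K$; the initial parameters are normalized so that $\sum_j \mu_{kj} = 1$.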
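95. Remarks on EM for the Bernoulli mixture
The likelihood function is bounded above because $0 \le p(x_n \mid \mu_k) \le 1$ [Exercise 9.17], so unlike the Gaussian mixture there are no singularities where the likelihood diverges.
There are points where the likelihood goes to 0, but EM does not find them unless it is initialized pathologically.
A conjugate prior for the Bernoulli parameters can be introduced [Section 2.1.1].
EM can then be used to maximize the posterior [Exercise 9.18].
The analysis extends to multinomial variables [Exercise 9.19].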
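97. EM for Bayesian linear regression
Section 3.5.2 determined the hyperparameters $\alpha$, $\beta$ by the evidence approximation, setting the derivatives of the evidence to 0.
The same problem can be solved with EM, where $\alpha$, $\beta$ are the parameters to estimate.
Marginal likelihood for $\alpha$, $\beta$:
$$p(t \mid \alpha, \beta) = \int p(t \mid w, \beta)\, p(w \mid \alpha)\, dw$$
The weight vector w is marginalized out, so it plays the role of the latent variable.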
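98. E step
Given the current values of $\alpha$, $\beta$, compute the posterior distribution of w.
This posterior was already obtained in Chapter 3:
$$p(w \mid t) = \mathcal{N}(w \mid m_N, S_N)$$
$$m_N = S_N \left( S_0^{-1} m_0 + \beta \Phi^{\mathrm{T}} t \right)$$
$$S_N^{-1} = S_0^{-1} + \beta \Phi^{\mathrm{T}} \Phi$$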
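99. M step
Complete-data log likelihood:
$$\ln p(t, w \mid \alpha, \beta) = \ln p(t \mid w, \beta) + \ln p(w \mid \alpha) \tag{61}$$
with
$$p(t \mid w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid w^{\mathrm{T}} \phi(x_n), \beta^{-1})$$
$$p(w \mid \alpha) = \mathcal{N}(w \mid 0, \alpha^{-1} I)$$
Taking the expectation with respect to the posterior of w:
$$E[\ln p(t, w \mid \alpha, \beta)] = \frac{M}{2} \ln \Big( \frac{\alpha}{2\pi} \Big) - \frac{\alpha}{2} E[w^{\mathrm{T}} w] + \frac{N}{2} \ln \Big( \frac{\beta}{2\pi} \Big) - \frac{\beta}{2} \sum_{n=1}^{N} E[(t_n - w^{\mathrm{T}} \phi_n)^2] \tag{62}$$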
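100. M step (continued)
Setting the derivative of (62) with respect to $\alpha$ to 0 gives the re-estimation equation for $\alpha$ [Exercise 9.20]:
$$\alpha = \frac{M}{E[w^{\mathrm{T}} w]} = \frac{M}{m_N^{\mathrm{T}} m_N + \mathrm{tr}(S_N)} \tag{63}$$
The update for $\beta$ is obtained in the same way [Exercise 9.21].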
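The E step and the $\alpha$ update (63) can be tried in the simplest possible setting. The sketch below assumes a single basis function (M = 1), a zero-mean prior (so $m_0 = 0$ and $S_0^{-1} = \alpha$), and a fixed, known $\beta$; the function names and data values are hypothetical.

```python
def posterior(phi, t, alpha, beta):
    """E step for M = 1: posterior mean m_N and variance S_N of the weight w."""
    s_n = 1.0 / (alpha + beta * sum(p * p for p in phi))
    m_n = beta * s_n * sum(p * tn for p, tn in zip(phi, t))
    return m_n, s_n


def em_alpha(phi, t, beta, alpha=1.0, n_iter=100):
    """Alternate the E step with the M-step update (63) for alpha."""
    for _ in range(n_iter):
        m_n, s_n = posterior(phi, t, alpha, beta)   # E step
        alpha = 1.0 / (m_n * m_n + s_n)             # M step: (63) with M = 1
    return alpha


phi = [1.0, 2.0, 3.0, 4.0]      # basis-function values phi(x_n)
t = [1.1, 1.9, 3.2, 3.9]        # targets
alpha = em_alpha(phi, t, beta=25.0)
m_n, s_n = posterior(phi, t, alpha, beta=25.0)
```

At convergence the value of $\alpha$ satisfies $\alpha\, m_N^2 = 1 - \alpha S_N$, which is the M = 1 form of the stationarity condition obtained from the evidence framework.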
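101. EM versus the evidence approximation
Both EM and the evidence approximation require inverting an $M \times M$ matrix.
The 2 approaches give different re-estimation equations for $\alpha$.
The evidence approximation uses the quantity $\gamma$ of (3.92), which can be written as (64):
$$\gamma = M - \alpha \sum_{i=1}^{M} \frac{1}{\lambda_i + \alpha} = M - \alpha\, \mathrm{tr}(S_N) \tag{64}$$
At a stationary point of the evidence,
$$\alpha\, m_N^{\mathrm{T}} m_N = \gamma = M - \alpha\, \mathrm{tr}(S_N) \tag{65}$$
which is also satisfied by the EM update, so both approaches converge to the same result.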
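102. RVM (relevance vector machine)
Section 7.2.1 estimated the hyperparameters $\alpha$, $\beta$ by direct maximization of the marginal likelihood.
Treating w as a latent variable, EM can be applied instead.
E step: compute the posterior of w (7.81).
M step: maximize the expected complete-data log likelihood
$$E_w \left[ \ln \{ p(t \mid X, w, \beta)\, p(w \mid \alpha) \} \right] \tag{66}$$
which gives
$$\alpha_i^{\mathrm{new}} = \frac{1}{m_i^2 + \Sigma_{ii}} \tag{67}$$
$$(\beta^{\mathrm{new}})^{-1} = \frac{\| t - \Phi m \|^2 + \beta^{-1} \sum_i \gamma_i}{N} \tag{68}$$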
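105. The EM algorithm in general
EM (expectation-maximization algorithm): a general technique for finding maximum likelihood solutions of probabilistic models with latent variables [Dempster+, 1977; McLachlan+, 1997].
This part gives a general treatment of EM and shows that EM does maximize the likelihood function.
The same argument is the basis of the variational inference framework [Section 10.1].
Notation
X: all observed variables.
Z: all latent variables.
$\theta$: the set of parameters.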
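106. Setting for general EM
Goal: maximize the likelihood function
$$p(X \mid \theta) = \sum_{Z} p(X, Z \mid \theta) \tag{69}$$
If Z is continuous, the sum is replaced by an integral.
Assumptions:
direct optimization of $p(X \mid \theta)$ is difficult;
optimization of the complete-data log likelihood $\ln p(X, Z \mid \theta)$ is significantly easier.
Introduce a distribution $q(Z)$ over the latent variables.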
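107. Decomposition of the log likelihood
For any choice of $q(Z)$:
$$\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \| p) \tag{70}$$
where
$$\mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \Big\{ \frac{p(X, Z \mid \theta)}{q(Z)} \Big\} \tag{71}$$
$$\mathrm{KL}(q \| p) = - \sum_{Z} q(Z) \ln \Big\{ \frac{p(Z \mid X, \theta)}{q(Z)} \Big\} \tag{72}$$
To verify (70), use the product rule
$$\ln p(X, Z \mid \theta) = \ln p(Z \mid X, \theta) + \ln p(X \mid \theta) \tag{73}$$
and substitute it into (71).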
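The decomposition (70) can be verified numerically on a tiny discrete model. In the sketch below, `joint` holds hypothetical values of $p(X, Z = z \mid \theta)$ for three latent states at a fixed observation X, and `q` is an arbitrary distribution over those states; all names and numbers are assumptions for illustration.

```python
import math


def bound_and_kl(joint, q):
    """Return L(q, theta) from (71), KL(q || p) from (72), and ln p(X | theta)."""
    evidence = sum(joint)                       # p(X | theta) = sum_Z p(X, Z | theta)
    post = [j / evidence for j in joint]        # p(Z | X, theta)
    lower = sum(qz * math.log(j / qz) for j, qz in zip(joint, q) if qz > 0)
    kl = -sum(qz * math.log(p / qz) for p, qz in zip(post, q) if qz > 0)
    return lower, kl, math.log(evidence)


joint = [0.1, 0.3, 0.05]
lower, kl, log_ev = bound_and_kl(joint, [0.2, 0.5, 0.3])

# Choosing q equal to the posterior makes the bound tight.
posterior_q = [j / sum(joint) for j in joint]
tight_lower, tight_kl, _ = bound_and_kl(joint, posterior_q)
```

For an arbitrary q the KL term is positive, so $\mathcal{L}(q, \theta)$ lies strictly below the log likelihood; with q set to the posterior the KL term vanishes.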
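115. E step and M step as operations on the bound
$\mathcal{L}(q, \theta)$ is a lower bound on $\ln p(X \mid \theta)$, and the gap between them is the divergence $\mathrm{KL}(q \| p)$ between $q(Z)$ and the posterior.
Figure (three stages): the general decomposition of $\ln p(X \mid \theta)$ into $\mathcal{L}(q, \theta)$ and $\mathrm{KL}(q \| p)$; the state after the E step, where $\mathrm{KL}(q \| p) = 0$ and $\mathcal{L}(q, \theta^{\mathrm{old}}) = \ln p(X \mid \theta^{\mathrm{old}})$; and the state after the M step, showing $\mathcal{L}(q, \theta^{\mathrm{new}})$, $\ln p(X \mid \theta^{\mathrm{new}})$, and a nonzero $\mathrm{KL}(q \| p)$.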
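118. The bound after the E step
After the E step, $q(Z) = p(Z \mid X, \theta^{\mathrm{old}})$. Substituting this q into (71):
$$\mathcal{L}(q, \theta) = \sum_{Z} p(Z \mid X, \theta^{\mathrm{old}}) \ln p(X, Z \mid \theta) - \sum_{Z} p(Z \mid X, \theta^{\mathrm{old}}) \ln p(Z \mid X, \theta^{\mathrm{old}}) = Q(\theta, \theta^{\mathrm{old}}) + \mathrm{const} \tag{74}$$
The constant is the entropy of q and does not depend on $\theta$.
The M step therefore maximizes the expected complete-data log likelihood $\ln p(X, Z \mid \theta)$.
If $p(X, Z \mid \theta)$ is a member of the exponential family, the logarithm cancels the exponential, which makes the M step much simpler.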
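119. EM in parameter space
Figure: the log likelihood $\ln p(X \mid \theta)$ and the lower bound $\mathcal{L}(q, \theta)$ plotted against $\theta$. The bound $\mathcal{L}(q, \theta^{\mathrm{old}})$ touches the log likelihood at $\theta^{\mathrm{old}}$; its maximum gives $\theta^{\mathrm{new}}$, where a new bound $\mathcal{L}(q, \theta^{\mathrm{new}})$ is constructed.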
120. The i.i.d. case
Notation
X: N i.i.d. data points $\{x_n\}$.
Z: the corresponding latent variables $\{z_n\}$.
From the i.i.d. assumption:
$$p(X, Z) = \prod_{n} p(x_n, z_n)$$
Marginalizing over $\{z_n\}$ gives $p(X) = \prod_n p(x_n)$.
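121. The i.i.d. case: E step
Posterior evaluated in the E step:
$$p(Z \mid X, \theta) = \frac{p(X, Z \mid \theta)}{\sum_{Z} p(X, Z \mid \theta)} = \frac{\prod_{n=1}^{N} p(x_n, z_n \mid \theta)}{\sum_{Z} \prod_{n=1}^{N} p(x_n, z_n \mid \theta)} = \prod_{n=1}^{N} p(z_n \mid x_n, \theta) \tag{75}$$
By (75), the responsibility for $x_n$ depends only on $x_n$ and the parameters $\theta$, not on the other data points.
Because $p(X, Z)$ factorizes in this way, EM can be run in an incremental form that processes one data point at a time.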
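122. The i.i.d. case: incremental EM for Gaussian mixtures
Incremental EM: after re-evaluating the responsibilities of a single data point $x_m$, update the sufficient statistics of (17) and (18) instead of recomputing them from all data:
$$\mu_k^{\mathrm{new}} = \mu_k^{\mathrm{old}} + \Big( \frac{\gamma^{\mathrm{new}}(z_{mk}) - \gamma^{\mathrm{old}}(z_{mk})}{N_k^{\mathrm{new}}} \Big) (x_m - \mu_k^{\mathrm{old}}) \tag{76}$$
$$N_k^{\mathrm{new}} = N_k^{\mathrm{old}} + \gamma^{\mathrm{new}}(z_{mk}) - \gamma^{\mathrm{old}}(z_{mk}) \tag{77}$$
The incremental version can converge faster than batch EM.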
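A quick check that the incremental updates (76) and (77) agree with recomputing (17) and (18) from scratch. The sketch uses one-dimensional data and a single component k; the function name `incremental_update` and the numbers are assumptions for illustration.

```python
def incremental_update(mu_old, nk_old, x_m, gamma_old, gamma_new):
    """Apply (77) and then (76) after the responsibility of point x_m changes."""
    nk_new = nk_old + gamma_new - gamma_old                              # (77)
    mu_new = mu_old + (gamma_new - gamma_old) / nk_new * (x_m - mu_old)  # (76)
    return mu_new, nk_new


xs = [1.0, 2.0, 4.0]
gammas = [0.2, 0.5, 0.9]            # responsibilities of component k
nk = sum(gammas)                    # (18)
mu = sum(g * x for g, x in zip(gammas, xs)) / nk   # (17)

# The responsibility of the second point changes from 0.5 to 0.8.
mu_inc, nk_inc = incremental_update(mu, nk, xs[1], 0.5, 0.8)

# Batch recomputation with the updated responsibilities, for comparison.
gammas_new = [0.2, 0.8, 0.9]
nk_batch = sum(gammas_new)
mu_batch = sum(g * x for g, x in zip(gammas_new, xs)) / nk_batch
```

Only the statistics touched by the changed responsibility are updated, so the cost of one incremental step does not depend on N.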
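123. EM for MAP estimation
EM can also maximize the posterior $p(\theta \mid X)$ when a prior $p(\theta)$ is introduced.
$$\ln p(\theta \mid X) = \ln p(\theta, X) - \ln p(X) = \ln p(X \mid \theta) + \ln p(\theta) - \ln p(X) \tag{78}$$
Using the decomposition (70):
$$\ln p(\theta \mid X) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \| p) + \ln p(\theta) - \ln p(X) \ge \mathcal{L}(q, \theta) + \ln p(\theta) - \ln p(X) \tag{79}$$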
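124. Extensions of EM: the M step
For complex models the M step may be intractable.
Generalized EM (GEM) algorithm: instead of maximizing $\mathcal{L}(q, \theta)$ with respect to $\theta$ in the M step, choose $\theta^{\mathrm{new}}$ so that $\mathcal{L}(q, \theta)$ increases.
ECM (expectation conditional maximization): perform several constrained optimizations within each M step [Meng+, 1993].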
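125. Extensions of EM: the E step
The E step can be generalized in the same way [Neal+, 1999]: perform a partial, rather than complete, optimization of $\mathcal{L}(q, \theta)$ with respect to q.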