HMM, MEMM, CRF Notes


This is a revised version of the material uploaded to handsOut on 2010-06-21, with the obvious errors corrected.

Published in: Technology, Business

  1. HMM, MEMM, CRF
  2. Hidden Markov Model
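  3. HMM: a first-order generative model of the joint probability of the observation sequence $X$ and the hidden state sequence $Y$.
     $$P(X, Y) = P(Y)\,P(X \mid Y) = \prod_t P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t)$$
     Graphical model: $Y_{t-1} \to Y_t \to Y_{t+1}$, where each state $Y_t$ emits the observation $X_t$.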
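  4. Problems solved with an HMM:
     - Probability of an observation sequence $X$ → forward algorithm:
       $$P(X) = \sum_Y P(X, Y)$$
     - Most likely state sequence $Y$ for $X$ → Viterbi algorithm:
       $$\arg\max_Y P(Y \mid X) = \arg\max_Y \frac{P(X, Y)}{P(X)} = \arg\max_Y P(X, Y)$$
     - Parameter estimation by EM → Baum-Welch algorithm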
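The factorization on slide 3 and the identities on slide 4 can be checked by brute force on a tiny model. The sketch below is only an illustration: the two states, the two symbols, and every probability value are hypothetical, and it enumerates all state sequences, which is feasible only for very short inputs.

```python
import itertools

# Toy first-order HMM; all numbers are hypothetical, chosen only for illustration.
STATES = ["s1", "s2"]
START = {"s1": 0.6, "s2": 0.4}            # P(Y_1)
TRANS = {"s1": {"s1": 0.7, "s2": 0.3},    # P(Y_t | Y_{t-1})
         "s2": {"s1": 0.4, "s2": 0.6}}
EMIT = {"s1": {"a": 0.9, "b": 0.1},       # P(X_t | Y_t)
        "s2": {"a": 0.2, "b": 0.8}}


def joint(x, y):
    """P(X = x, Y = y) = prod_t P(y_t | y_{t-1}) P(x_t | y_t)."""
    p = START[y[0]] * EMIT[y[0]][x[0]]
    for t in range(1, len(x)):
        p *= TRANS[y[t - 1]][y[t]] * EMIT[y[t]][x[t]]
    return p


def all_paths(length):
    """Every state sequence of the given length."""
    return itertools.product(STATES, repeat=length)


def marginal(x):
    """P(X = x) = sum_Y P(x, Y), by enumerating every state sequence."""
    return sum(joint(x, y) for y in all_paths(len(x)))


def most_likely_states(x):
    """arg max_Y P(Y | x) = arg max_Y P(x, Y), because P(x) does not depend on Y."""
    return max(all_paths(len(x)), key=lambda y: joint(x, y))


x = ("a", "b")
print(marginal(x), most_likely_states(x))
```

Enumeration costs $O(|S|^T)$, which is why the dynamic-programming algorithms on the following slides matter.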
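  5. Forward algorithm: computes $P(X = x)$ for $x = x_1 \cdots x_T$ with $x_t \in O$ and $y_t \in S$, in $O(|S|^2 T)$ time.
     $$P(X = x) = \sum_y P(X_1 = x_1, \cdots, X_T = x_T, Y = y)$$
     Forward variable (probability of emitting $x_1 \cdots x_t$ and being in state $s_i$ at time $t$):
     $$\alpha_t(x, s_i) = P(X_1 = x_1, \cdots, X_t = x_t, Y_t = s_i)$$
     1. Initialization from the start state $s_S$:
        $$\alpha_1(x, s_i) = P(Y_1 = s_i \mid Y_S = s_S)\,P(X_1 = x_1 \mid Y_1 = s_i)$$
     2. Recursion for $t = 1, \ldots, T-1$:
        $$\alpha_{t+1}(x, s_i) = \left[\sum_j \alpha_t(x, s_j)\,P(s_i \mid s_j)\right] P(x_{t+1} \mid s_i)$$
     3. Termination at the end state $s_E$:
        $$P(x) = \sum_j \alpha_T(x, s_j)\,P(s_E \mid s_j)$$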
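  6. [Figure: trellis for $|S| = 3$ and $T = 4$ with states $s_S, s_1, s_2, s_3, s_E$, state variables $Y_S, Y_1, \ldots, Y_4, Y_E$ and observations $X_1, \ldots, X_4$, highlighting the paths summed in $\alpha_2(x, s_2) = P(x_1, x_2, Y_2 = s_2)$.]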
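The forward recursion of slide 5 can be sketched as follows. This is a minimal illustration rather than a reference implementation: the model parameters are hypothetical, the end-state probability $P(s_E \mid s_j)$ is stored in a separate `END` table, and `brute_force` exists only to confirm that the recursion returns the same value as summing over every state sequence.

```python
import itertools

# Toy HMM with an explicit end state; all numbers are hypothetical.
STATES = ["s1", "s2"]
START = {"s1": 0.6, "s2": 0.4}                 # P(s_i | s_S)
TRANS = {"s1": {"s1": 0.5, "s2": 0.3},         # P(s_i | s_j), row = previous state
         "s2": {"s1": 0.4, "s2": 0.4}}
END = {"s1": 0.2, "s2": 0.2}                   # P(s_E | s_j)
EMIT = {"s1": {"a": 0.9, "b": 0.1},            # P(x_t | s_i)
        "s2": {"a": 0.2, "b": 0.8}}


def forward(x):
    """P(x) by the forward algorithm, O(|S|^2 T)."""
    # Initialization: alpha_1(x, s_i) = P(s_i | s_S) P(x_1 | s_i)
    alpha = {s: START[s] * EMIT[s][x[0]] for s in STATES}
    # Recursion: alpha_{t+1}(x, s_i) = [sum_j alpha_t(x, s_j) P(s_i | s_j)] P(x_{t+1} | s_i)
    for obs in x[1:]:
        alpha = {si: sum(alpha[sj] * TRANS[sj][si] for sj in STATES) * EMIT[si][obs]
                 for si in STATES}
    # Termination: P(x) = sum_j alpha_T(x, s_j) P(s_E | s_j)
    return sum(alpha[sj] * END[sj] for sj in STATES)


def brute_force(x):
    """P(x) by summing the joint probability over all |S|^T state sequences."""
    total = 0.0
    for y in itertools.product(STATES, repeat=len(x)):
        p = START[y[0]] * EMIT[y[0]][x[0]]
        for t in range(1, len(x)):
            p *= TRANS[y[t - 1]][y[t]] * EMIT[y[t]][x[t]]
        total += p * END[y[-1]]
    return total


x = list("abba")
print(forward(x), brute_force(x))
```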
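  7. Backward algorithm: computes $P(x)$ for $x = x_1 \cdots x_T$ from the end of the sequence, in $O(|S|^2 T)$ time.
     Backward variable (probability of emitting $x_{t+1} \cdots x_T$ and then reaching $s_E$, given state $s_i$ at time $t$):
     $$\beta_t(x, s_i) = \begin{cases} P(x_{t+1}, \cdots, x_T \mid Y_t = s_i) & \text{if } t = 1, \cdots, T-1 \\ P(s_E \mid Y_t = s_i) & \text{if } t = T \end{cases}$$
     1. Initialization from the end state $s_E$:
        $$\beta_T(x, s_i) = P(s_E \mid s_i)$$
     2. Recursion for $t = T-1, \ldots, 1$:
        $$\beta_t(x, s_i) = \sum_j P(s_j \mid s_i)\,P(x_{t+1} \mid s_j)\,\beta_{t+1}(x, s_j)$$
     3. Termination at the start state $s_S$:
        $$P(x) = \sum_j P(s_j \mid s_S)\,P(x_1 \mid s_j)\,\beta_1(x, s_j)$$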
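  8. [Figure: trellis for $|S| = 3$ and $T = 4$ with states $s_S, s_1, s_2, s_3, s_E$, state variables $Y_S, Y_1, \ldots, Y_4, Y_E$ and observations $X_1, \ldots, X_4$, highlighting the paths summed in $\beta_3(x, s_1) = P(x_4 \mid Y_3 = s_1)$.]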
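A minimal sketch of the backward recursion of slide 7, checked against a forward pass on the same toy model. The parameter values and the separate `END` table for $P(s_E \mid s_i)$ are assumptions made for the example only.

```python
# Toy HMM with an explicit end state; all numbers are hypothetical.
STATES = ["s1", "s2"]
START = {"s1": 0.6, "s2": 0.4}                 # P(s_j | s_S)
TRANS = {"s1": {"s1": 0.5, "s2": 0.3},         # P(s_j | s_i), row = previous state
         "s2": {"s1": 0.4, "s2": 0.4}}
END = {"s1": 0.2, "s2": 0.2}                   # P(s_E | s_i)
EMIT = {"s1": {"a": 0.9, "b": 0.1},
        "s2": {"a": 0.2, "b": 0.8}}


def backward(x):
    """P(x) by the backward algorithm."""
    # Initialization: beta_T(x, s_i) = P(s_E | s_i)
    beta = dict(END)
    # Recursion for t = T-1, ..., 1:
    # beta_t(x, s_i) = sum_j P(s_j | s_i) P(x_{t+1} | s_j) beta_{t+1}(x, s_j)
    for obs in reversed(x[1:]):
        beta = {si: sum(TRANS[si][sj] * EMIT[sj][obs] * beta[sj] for sj in STATES)
                for si in STATES}
    # Termination: P(x) = sum_j P(s_j | s_S) P(x_1 | s_j) beta_1(x, s_j)
    return sum(START[sj] * EMIT[sj][x[0]] * beta[sj] for sj in STATES)


def forward(x):
    """P(x) by the forward algorithm, used here only as a cross-check."""
    alpha = {s: START[s] * EMIT[s][x[0]] for s in STATES}
    for obs in x[1:]:
        alpha = {si: sum(alpha[sj] * TRANS[sj][si] for sj in STATES) * EMIT[si][obs]
                 for si in STATES}
    return sum(alpha[sj] * END[sj] for sj in STATES)


x = list("abba")
print(backward(x), forward(x))
```

Both passes return the same $P(x)$; the backward variables become useful in their own right in Baum-Welch, where they are combined with the forward variables.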
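  9. Viterbi algorithm: finds $\hat{y} = \arg\max_y P(y \mid x)$ for $x = x_1 \cdots x_T$ in $O(|S|^2 T)$ time.
     Viterbi variable (probability of the best state sequence that emits $x_1 \cdots x_t$ and is in state $s_i$ at time $t$):
     $$\delta_t(x, s_i) = \max_{y_1 \cdots y_{t-1}} P(x_1, \cdots, x_t, y_1, \cdots, y_{t-1}, Y_t = s_i)$$
     1. Initialization from the start state $s_S$:
        $$\delta_1(x, s_i) = P(s_i \mid s_S)\,P(x_1 \mid s_i)$$
     2. Recursion for $t = 1, \ldots, T-1$, storing a back pointer $\psi_t$:
        $$\delta_{t+1}(x, s_i) = \max_{s_j}\left[\delta_t(x, s_j)\,P(s_i \mid s_j)\right] P(x_{t+1} \mid s_i)$$
        $$\psi_t(x, s_i) = \arg\max_{s_j}\left[\delta_t(x, s_j)\,P(s_i \mid s_j)\right]$$
     3. Termination at the end state $s_E$:
        $$\max_y P(x, y) = \max_{s_j}\left[\delta_T(x, s_j)\,P(s_E \mid s_j)\right], \qquad \hat{y}_T = \arg\max_{s_j}\left[\delta_T(x, s_j)\,P(s_E \mid s_j)\right]$$
     4. Backtracking for $t = T-1, \ldots, 1$:
        $$\hat{y}_t = \psi_t(x, \hat{y}_{t+1})$$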
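  10. [Figure: Viterbi backtracking on a trellis for $|S| = 3$ and $T = 4$ with states $s_S, s_1, s_2, s_3, s_E$, state variables $Y_S, Y_1, \ldots, Y_4, Y_E$ and observations $x_1, \ldots, x_4$.] Back pointers $\psi_t(x, s_j)$ for $t = 1, 2, 3$:
      - $s_j = s_1$: $s_1, s_3, s_1$
      - $s_j = s_2$: $s_1, s_2, s_3$
      - $s_j = s_3$: $s_2, s_1, s_1$
      Starting from $\hat{y}_4 = s_1$, backtracking gives the path $s_S \to s_2 \to s_3 \to s_1 \to s_1 \to s_E$.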
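The four Viterbi steps (initialization, recursion with back pointers, termination, backtracking) can be sketched as below. The toy parameters are hypothetical, and the end-state factor $P(s_E \mid s_j)$ is applied in the termination step; `joint` is included only so the result can be compared with exhaustive search.

```python
import itertools

# Toy HMM with an explicit end state; all numbers are hypothetical.
STATES = ["s1", "s2"]
START = {"s1": 0.6, "s2": 0.4}
TRANS = {"s1": {"s1": 0.5, "s2": 0.3},
         "s2": {"s1": 0.4, "s2": 0.4}}
END = {"s1": 0.2, "s2": 0.2}
EMIT = {"s1": {"a": 0.9, "b": 0.1},
        "s2": {"a": 0.2, "b": 0.8}}


def joint(x, y):
    """P(x, y) including the final transition into the end state."""
    p = START[y[0]] * EMIT[y[0]][x[0]]
    for t in range(1, len(x)):
        p *= TRANS[y[t - 1]][y[t]] * EMIT[y[t]][x[t]]
    return p * END[y[-1]]


def viterbi(x):
    """Return (best state sequence, max_y P(x, y))."""
    # 1. Initialization: delta_1(x, s_i) = P(s_i | s_S) P(x_1 | s_i)
    delta = {s: START[s] * EMIT[s][x[0]] for s in STATES}
    back_pointers = []
    # 2. Recursion: keep the best predecessor psi_t(x, s_i) of every state
    for obs in x[1:]:
        psi = {si: max(STATES, key=lambda sj: delta[sj] * TRANS[sj][si])
               for si in STATES}
        delta = {si: delta[psi[si]] * TRANS[psi[si]][si] * EMIT[si][obs]
                 for si in STATES}
        back_pointers.append(psi)
    # 3. Termination: include the transition into the end state
    last = max(STATES, key=lambda sj: delta[sj] * END[sj])
    best = delta[last] * END[last]
    # 4. Backtracking: y_t = psi_t(x, y_{t+1})
    path = [last]
    for psi in reversed(back_pointers):
        path.append(psi[path[-1]])
    path.reverse()
    return path, best


x = list("abba")
path, best = viterbi(x)
print(path, best)
```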
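  11. Baum-Welch algorithm: expected state transitions under the current parameters $\theta$ (the dependence on $\theta$ is written explicitly).
      $$\xi_t(s_i, s_j \mid \theta) = P(Y_t = s_i, Y_{t+1} = s_j \mid x, \theta) = \frac{\alpha_t(s_i \mid \theta)\,P(s_j \mid s_i, \theta)\,P(x_{t+1} \mid s_j, \theta)\,\beta_{t+1}(s_j \mid \theta)}{\sum_k \alpha_T(s_k \mid \theta)\,P(s_E \mid s_k, \theta)}$$
      The denominator equals $P(x \mid \theta)$. At the last position:
      $$\xi_T(s_i, \cdot \mid \theta) = \gamma_T(s_i \mid \theta) = \frac{\alpha_T(s_i \mid \theta)\,P(s_E \mid s_i, \theta)}{\sum_k \alpha_T(s_k \mid \theta)\,P(s_E \mid s_k, \theta)}$$
      [Figure: trellis with states $s_S, s_1, s_2, s_3, s_E$, state variables $Y_S, Y_1, \ldots, Y_4, Y_E$ and observations $X_1, \ldots, X_4$.]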
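  12. Parameter re-estimation, from $\theta$ to $\bar\theta$ ($O$ is the set of observation symbols):
      $$\theta = (\pi_1, \cdots, \pi_{|S|}, a_{11}, \cdots, a_{|S||S|}, b_{11}, \cdots, b_{|S||O|})$$
      $$\bar\theta = (\cdots, \bar\pi_i, \cdots, \bar a_{ij}, \cdots, \bar b_{ik}, \cdots)$$
      $$\gamma_t(s_i \mid \theta) = \sum_{s_j \in S} \xi_t(s_i, s_j \mid \theta), \qquad t = 1, \cdots, T-1$$
      $$\bar\pi_i = \bar P(s_i \mid s_S, \theta) = \gamma_1(s_i \mid \theta)$$
      $$\bar a_{ij} = \bar P(s_j \mid s_i, \theta) = \frac{\sum_{t=1}^{T-1} \xi_t(s_i, s_j \mid \theta)}{\sum_{t=1}^{T-1} \gamma_t(s_i \mid \theta)}$$
      $$\bar b_{ik} = \bar P(o_k \mid s_i, \theta) = \frac{\sum_{t : x_t = o_k} \gamma_t(s_i \mid \theta)}{\sum_{t=1}^{T} \gamma_t(s_i \mid \theta)}$$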
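One Baum-Welch update can be sketched as follows. To keep the example short it makes two simplifying assumptions that differ from the slides: there is a single training sequence, and the end-state transition is dropped, so $\beta_T(s_i) = 1$ and $P(x \mid \theta) = \sum_k \alpha_T(s_k)$. The initial parameter values are hypothetical.

```python
STATES = ["s1", "s2"]
SYMBOLS = ["a", "b"]


def forward_backward(x, pi, a, b):
    """Return forward variables, backward variables and P(x | theta)."""
    T = len(x)
    alpha = [{i: pi[i] * b[i][x[0]] for i in STATES}]
    for t in range(1, T):
        alpha.append({j: sum(alpha[t - 1][i] * a[i][j] for i in STATES) * b[j][x[t]]
                      for j in STATES})
    beta = [None] * T
    beta[T - 1] = {i: 1.0 for i in STATES}  # assumption: no end-state transition
    for t in range(T - 2, -1, -1):
        beta[t] = {i: sum(a[i][j] * b[j][x[t + 1]] * beta[t + 1][j] for j in STATES)
                   for i in STATES}
    return alpha, beta, sum(alpha[T - 1].values())


def baum_welch_step(x, pi, a, b):
    """One EM update of (pi, a, b) using gamma and xi."""
    alpha, beta, px = forward_backward(x, pi, a, b)
    T = len(x)
    # gamma_t(s_i) = P(Y_t = s_i | x, theta)
    gamma = [{i: alpha[t][i] * beta[t][i] / px for i in STATES} for t in range(T)]
    # xi_t(s_i, s_j) = P(Y_t = s_i, Y_{t+1} = s_j | x, theta)
    xi = [{(i, j): alpha[t][i] * a[i][j] * b[j][x[t + 1]] * beta[t + 1][j] / px
           for i in STATES for j in STATES} for t in range(T - 1)]
    new_pi = {i: gamma[0][i] for i in STATES}
    new_a = {i: {j: sum(xi[t][(i, j)] for t in range(T - 1))
                 / sum(gamma[t][i] for t in range(T - 1))
                 for j in STATES} for i in STATES}
    new_b = {i: {o: sum(gamma[t][i] for t in range(T) if x[t] == o)
                 / sum(gamma[t][i] for t in range(T))
                 for o in SYMBOLS} for i in STATES}
    return new_pi, new_a, new_b


# Hypothetical starting point and training sequence.
x = list("aabbbab")
pi = {"s1": 0.5, "s2": 0.5}
a = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.3, "s2": 0.7}}
b = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.4, "b": 0.6}}

before = forward_backward(x, pi, a, b)[2]
pi, a, b = baum_welch_step(x, pi, a, b)
after = forward_backward(x, pi, a, b)[2]
print(before, after)  # EM never decreases the likelihood
```

In practice the step is repeated until the likelihood stops improving, and long sequences need scaling or log-space arithmetic to avoid underflow.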
  13. Maximum Entropy Markov Model
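  14. Motivation: an HMM generates $X$ from $Y$, so it is awkward to use arbitrary properties (features) of an observation, for example whether a word ends in 'er'. The MEMM instead predicts $Y$ from features of $X$.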
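  15. MEMM: models the state sequence $Y$ conditioned on the observation sequence $X$, whereas the HMM models their joint probability. Graphical model: $Y_{t-1} \to Y_t \to Y_{t+1}$, with each $X_t$ pointing to $Y_t$.
      There is one maximum entropy (ME) model for each previous state $s$:
      $$P_s(Y_t \mid X_t) = \frac{1}{Z(X_t, s)} \exp\left(\sum_a \lambda_a f_a(X_t, Y_t)\right)$$
      where $Z(X_t, Y_{t-1})$ is the normalization factor. The probability of the whole sequence is
      $$P(Y \mid X) = \prod_t \prod_s P_s(Y_t \mid X_t)^{[[Y_{t-1} = s]]}$$
      with the indicator
      $$[[x = y]] = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{if } x \neq y \end{cases}$$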
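  16. Example of features: labeling the lines of a Usenet FAQ. $X_t$ is one line, and $Y_t$ is one of the labels head, question, answer, tail.
      Binary predicates $b$ on a line: begins-with-number, begins-with-ordinal, begins-with-punctuation, begins-with-question-word, begins-with-subject, blank, contains-alphanum, contains-bracketed-number, contains-http, contains-non-space, contains-number, contains-pipe, contains-question-mark, contains-question-word, ends-with-question-mark, first-alpha-is-capitalized, indented, indented-1-to-4, indented-5-to-10, more-than-one-third-space, only-punctuation, prev-is-blank, prev-begins-with-ordinal, shorter-than-30.
      Each feature pairs a predicate $b$ with a label $s$:
      $$f_{\langle b, s \rangle}(X_t, Y_t) = \begin{cases} 1 & \text{if } b(X_t) \text{ is true and } Y_t = s \\ 0 & \text{otherwise} \end{cases}$$
      For example, $f_{\langle \text{begins-with-number}, \text{question} \rangle} = 1$ when line $t$ begins with a number and its label is question.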
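The feature definition above and the per-state exponential model of slide 15 can be sketched together. Only three of the listed predicates are implemented, their Python definitions are rough guesses at the intended meaning, and the weights are hypothetical values for one fixed previous state; nothing here is taken from the original experiments.

```python
import math

LABELS = ["head", "question", "answer", "tail"]

# Rough, assumed implementations of three of the listed predicates b(X_t).
PREDICATES = {
    "begins-with-number": lambda line: line[:1].isdigit(),
    "ends-with-question-mark": lambda line: line.rstrip().endswith("?"),
    "blank": lambda line: line.strip() == "",
}


def feature(b, s, line, label):
    """f_<b,s>(X_t, Y_t) = 1 if b(X_t) is true and Y_t = s, else 0."""
    return 1 if PREDICATES[b](line) and label == s else 0


# Hypothetical weights lambda_<b,s> of the ME model for one previous state.
WEIGHTS = {
    ("begins-with-number", "question"): 1.5,
    ("ends-with-question-mark", "question"): 2.0,
    ("blank", "head"): 0.5,
}


def next_label_distribution(line, weights):
    """P_s(Y_t | X_t) = exp(sum_a lambda_a f_a(X_t, Y_t)) / Z(X_t, s)."""
    scores = {y: math.exp(sum(w * feature(b, s, line, y)
                              for (b, s), w in weights.items()))
              for y in LABELS}
    z = sum(scores.values())  # normalization factor Z(X_t, s)
    return {y: v / z for y, v in scores.items()}


dist = next_label_distribution("2.1 What is this FAQ about?", WEIGHTS)
print(max(dist, key=dist.get), dist)
```

A full MEMM keeps one such weight table per previous state and selects the table by $Y_{t-1}$.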
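  17. Viterbi algorithm for MEMM: finds $\hat{y} = \arg\max_y P(y \mid x)$ for $x = x_1 \cdots x_T$.
      Viterbi variable (probability of the best state sequence for $x_1 \cdots x_t$ that is in state $s_i$ at time $t$):
      $$\delta_t(s_i \mid x) = \max_{y_1 \cdots y_{t-1}} P(y_1, \cdots, y_{t-1}, Y_t = s_i \mid x_1, \cdots, x_t)$$
      1. Initialization from the start state $s_S$:
         $$\delta_1(s_i \mid x) = P(s_i \mid s_S, x_1)$$
      2. Recursion for $t = 1, \ldots, T-1$, storing a back pointer $\psi_t$:
         $$\delta_{t+1}(s_i \mid x) = \max_{s_j}\left[\delta_t(s_j \mid x)\,P(s_i \mid s_j, x_{t+1})\right]$$
         $$\psi_t(s_i \mid x) = \arg\max_{s_j}\left[\delta_t(s_j \mid x)\,P(s_i \mid s_j, x_{t+1})\right]$$
      3. Termination:
         $$\max_y P(y \mid x) = \max_{s_j} \delta_T(s_j \mid x), \qquad \hat{y}_T = \arg\max_{s_j} \delta_T(s_j \mid x)$$
      4. Backtracking for $t = T-1, \ldots, 1$:
         $$\hat{y}_t = \psi_t(\hat{y}_{t+1} \mid x)$$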
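A minimal sketch of MEMM decoding with the recursion above. The local model `local(prev, obs)` stands in for $P(s_i \mid s_j, x_{t+1})$; its two weight tables are hypothetical and only serve to produce a locally normalized distribution. `sequence_prob` allows a brute-force check of the result.

```python
import itertools
import math

STATES = ["s1", "s2"]
START = "sS"

# Hypothetical weights of the per-state maximum entropy models.
W_PREV = {"sS": {"s1": 0.5, "s2": 0.0},
          "s1": {"s1": 1.0, "s2": 0.0},
          "s2": {"s1": 0.0, "s2": 1.0}}
W_OBS = {"a": {"s1": 1.0, "s2": -1.0},
         "b": {"s1": -1.0, "s2": 1.0}}


def local(prev, obs):
    """P(Y_t | Y_{t-1} = prev, X_t = obs), normalized for each (prev, obs)."""
    scores = {s: math.exp(W_PREV[prev][s] + W_OBS[obs][s]) for s in STATES}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}


def sequence_prob(x, y):
    """P(y | x) as the product of the local conditional probabilities."""
    p, prev = 1.0, START
    for obs, cur in zip(x, y):
        p *= local(prev, obs)[cur]
        prev = cur
    return p


def memm_viterbi(x):
    """Return (best state sequence, max_y P(y | x))."""
    delta = local(START, x[0])                    # delta_1(s_i | x)
    back_pointers = []
    for obs in x[1:]:
        step = {sj: local(sj, obs) for sj in STATES}
        psi = {si: max(STATES, key=lambda sj: delta[sj] * step[sj][si])
               for si in STATES}
        delta = {si: delta[psi[si]] * step[psi[si]][si] for si in STATES}
        back_pointers.append(psi)
    last = max(STATES, key=lambda s: delta[s])    # y_T
    path = [last]
    for psi in reversed(back_pointers):           # y_t = psi_t(y_{t+1} | x)
        path.append(psi[path[-1]])
    path.reverse()
    return path, delta[last]


x = list("abba")
print(memm_viterbi(x))
```

Because every local model is normalized, the probabilities of all state sequences for a given $x$ sum to one without any global normalization.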
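  18. Parameter estimation for MEMM from training data $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$ with Generalized Iterative Scaling (GIS). The ME model of each previous state $s$ is trained separately.
      1. For all $o, s$, make the features sum to a constant $C$ by adding a correction feature $f_c$:
         $$C = \sum_a f_a(o, s), \qquad f_c(o, s) = C - \sum_a f_a(o, s), \qquad f_c(o, s) \geq 0 \quad \forall o, s$$
      2. Compute the empirical expectation of each feature, where $m_s^{(i)}$ is the number of positions in sequence $i$ whose previous state is $s$:
         $$\tilde{E}[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t : y_{t-1} = s} f_a(x_t^{(i)}, y_t^{(i)})$$
      3. Compute the expectation under the current model for the training observations $x$:
         $$E[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t : y_{t-1} = s} \sum_{y \in S} P_s(y \mid x_t^{(i)}, \lambda)\, f_a(x_t^{(i)}, y)$$
      4. Update the parameters:
         $$\lambda_a^{\text{new}} = \lambda_a + \frac{1}{C} \log\left(\frac{\tilde{E}[f_a]}{E[f_a]}\right)$$
      5. Repeat steps 3 and 4.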
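The GIS loop can be sketched for a single ME model, that is, for one fixed previous state $s$. The tiny data set, the three features, and the iteration count are hypothetical; expectations are plain averages over the training pairs, which matches the slide's formula when every pair comes from positions with the same previous state.

```python
import math

LABELS = ["question", "answer"]

# Hypothetical (observation type, label) pairs that share one previous state.
DATA = [("q", "question"), ("q", "question"), ("q", "answer"),
        ("num", "question"), ("num", "answer"),
        ("plain", "answer"), ("plain", "answer"), ("plain", "question")]

FEATURES = {
    "q&question": lambda o, s: int(o == "q" and s == "question"),
    "num&question": lambda o, s: int(o == "num" and s == "question"),
    "plain&answer": lambda o, s: int(o == "plain" and s == "answer"),
}
C = 1  # here at most one feature fires for any (o, s)


def correction(o, s):
    """f_c(o, s) = C - sum_a f_a(o, s), so that all features sum to C."""
    return C - sum(f(o, s) for f in FEATURES.values())


ALL_FEATURES = dict(FEATURES, correction=correction)


def predict(o, lam):
    """P_s(y | o, lambda) for every label y."""
    scores = {s: math.exp(sum(lam[a] * f(o, s) for a, f in ALL_FEATURES.items()))
              for s in LABELS}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}


def empirical():
    """Empirical expectation of every feature."""
    return {a: sum(f(o, s) for o, s in DATA) / len(DATA)
            for a, f in ALL_FEATURES.items()}


def expected(lam):
    """Model expectation of every feature under the current lambda."""
    totals = {a: 0.0 for a in ALL_FEATURES}
    for o, _ in DATA:
        p = predict(o, lam)
        for a, f in ALL_FEATURES.items():
            totals[a] += sum(p[s] * f(o, s) for s in LABELS) / len(DATA)
    return totals


def gis(iterations=200):
    lam = {a: 0.0 for a in ALL_FEATURES}
    target = empirical()
    for _ in range(iterations):
        model = expected(lam)
        for a in ALL_FEATURES:
            lam[a] += math.log(target[a] / model[a]) / C
    return lam


lam = gis()
print(predict("q", lam))
```

The update needs $\tilde{E}[f_a] > 0$ for every feature, which the toy data guarantees; real implementations usually drop or smooth features that never fire.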
  19. Conditional Random Fields
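  20. Problem of MEMM (label bias). [Figure: state graph over observations $x_1, x_2, x_3$ with transition probabilities $s_0 \to s_1$: 0.65, $s_0 \to s_5$: 0.35, $s_1 \to s_2$: 0.5, $s_1 \to s_4$: 0.5, $s_2 \to s_3$: 1, $s_4 \to s_3$: 1, $s_5 \to s_6$: 1, $s_6 \to s_3$: 1.] Each state has its own locally normalized ME model, so the path probabilities are:
      - $s_0 \to s_1 \to s_2 \to s_3$: 0.325
      - $s_0 \to s_1 \to s_4 \to s_3$: 0.325
      - $s_0 \to s_5 \to s_6 \to s_3$: 0.35
      Although $s_0 \to s_1$ is the most probable first transition, the path through $s_5$ wins, because states with a single outgoing transition pass on all their probability mass (probability 1) whatever the observation is.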
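The path probabilities on slide 20 follow directly from multiplying the local transition probabilities. The sketch below reproduces the arithmetic; the assignment of the two 0.5 values to the transitions leaving $s_1$ is read off from the listed path probabilities and is therefore an assumption about the figure.

```python
# Locally normalized transition probabilities, as read from the slide.
TRANS = {
    "s0": {"s1": 0.65, "s5": 0.35},
    "s1": {"s2": 0.5, "s4": 0.5},
    "s2": {"s3": 1.0},
    "s4": {"s3": 1.0},
    "s5": {"s6": 1.0},
    "s6": {"s3": 1.0},
}

PATHS = [
    ["s0", "s1", "s2", "s3"],
    ["s0", "s1", "s4", "s3"],
    ["s0", "s5", "s6", "s3"],
]


def path_probability(path):
    """Product of the local transition probabilities along the path."""
    p = 1.0
    for prev, cur in zip(path, path[1:]):
        p *= TRANS[prev][cur]
    return p


for path in PATHS:
    print(" -> ".join(path), path_probability(path))

best = max(PATHS, key=path_probability)
print("best:", " -> ".join(best))
```

The winning path starts with the less probable first transition, which is the behavior a globally normalized model such as the CRF is meant to avoid.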
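  21. CRF: normalizes over the whole state sequence instead of per state as in the MEMM. Graphical model: an undirected chain $Y_{t-1} - Y_t - Y_{t+1}$ (compare the directed models of the HMM and the MEMM).
      $$P(Y \mid X) = \frac{1}{Z(X)} \exp\left(\sum_{t,i} \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_{t,j} \mu_j g_j(Y_t, X, t)\right)$$
      $Z(X)$ is the normalization factor for the observation sequence $X$.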
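  22. An HMM can be written as a CRF. Starting from $P(X, Y) = P(Y)\,P(X \mid Y) = \prod_t P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t)$:
      $$\log\left(\prod_t P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t)\right) = \sum_t \left\{\log P(Y_t \mid Y_{t-1}) + \log P(X_t \mid Y_t)\right\}$$
      $$= \sum_t \left\{\sum_{\langle s, s' \rangle} [[Y_{t-1} = s']][[Y_t = s]] \log P(s \mid s') + \sum_{\langle o, s \rangle} [[X_t = o]][[Y_t = s]] \log P(o \mid s)\right\}$$
      $$= \sum_{t, \langle s, s' \rangle} \log P(s \mid s')\,[[Y_{t-1} = s']][[Y_t = s]] + \sum_{t, \langle o, s \rangle} \log P(o \mid s)\,[[X_t = o]][[Y_t = s]]$$
      Therefore
      $$P(Y \mid X) = \frac{1}{P(X)} \exp\left(\sum_{t, \langle s, s' \rangle} \lambda_{\langle s, s' \rangle} f_{\langle s, s' \rangle}(Y_{t-1}, Y_t, X, t) + \sum_{t, \langle o, s \rangle} \mu_{\langle o, s \rangle} g_{\langle o, s \rangle}(Y_t, X, t)\right)$$
      with
      $$f_{\langle s, s' \rangle}(Y_{t-1}, Y_t, X, t) = [[Y_{t-1} = s']][[Y_t = s]], \qquad \lambda_{\langle s, s' \rangle} = \log P(s \mid s')$$
      $$g_{\langle o, s \rangle}(Y_t, X, t) = [[X_t = o]][[Y_t = s]], \qquad \mu_{\langle o, s \rangle} = \log P(o \mid s)$$
      An HMM is thus a CRF whose parameters satisfy the constraints
      $$\sum_s \exp(\lambda_{\langle s, s' \rangle}) = 1, \qquad \sum_o \exp(\mu_{\langle o, s \rangle}) = 1$$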
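The correspondence between HMM parameters and CRF weights can be checked numerically. In this sketch the HMM values are hypothetical, the start state is written `"sS"`, and the weights are set to log-probabilities; the exponentiated CRF score then equals the HMM joint probability.

```python
import math

# Toy HMM; all numbers are hypothetical.
STATES = ["s1", "s2"]
START = {"s1": 0.6, "s2": 0.4}                  # P(s | s_S)
TRANS = {"s1": {"s1": 0.7, "s2": 0.3},          # P(s | s'), row = previous state s'
         "s2": {"s1": 0.4, "s2": 0.6}}
EMIT = {"s1": {"a": 0.9, "b": 0.1},             # P(o | s)
        "s2": {"a": 0.2, "b": 0.8}}

# Transition weights lambda_<s,s'> = log P(s | s'), keyed as (s, s').
LAMBDA = {(s, sp): math.log(TRANS[sp][s]) for sp in STATES for s in STATES}
LAMBDA.update({(s, "sS"): math.log(START[s]) for s in STATES})
# State weights mu_<o,s> = log P(o | s), keyed as (o, s).
MU = {(o, s): math.log(EMIT[s][o]) for s in STATES for o in EMIT[s]}


def crf_score(x, y):
    """Sum over t of the weights of the indicator features that fire."""
    total, prev = 0.0, "sS"
    for obs, cur in zip(x, y):
        # Exactly one transition feature and one state feature are 1 at each t.
        total += LAMBDA[(cur, prev)] + MU[(obs, cur)]
        prev = cur
    return total


def hmm_joint(x, y):
    """P(x, y) computed directly from the HMM tables."""
    p = START[y[0]] * EMIT[y[0]][x[0]]
    for t in range(1, len(x)):
        p *= TRANS[y[t - 1]][y[t]] * EMIT[y[t]][x[t]]
    return p


x, y = list("abb"), ["s1", "s2", "s2"]
print(math.exp(crf_score(x, y)), hmm_joint(x, y))
```

Dividing $\exp(\text{score})$ by $P(x)$ then gives $P(y \mid x)$, so $P(x)$ plays the role of $Z(x)$.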
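  23. Viterbi algorithm for CRF. Because $Z(X)$ does not depend on $Y$ and $\exp$ is monotonic:
      $$\hat{y} = \arg\max_y P(y \mid x) = \arg\max_y \left[\sum_{t,i} \lambda_i f_i(y_{t-1}, y_t, x, t) + \sum_{t,j} \mu_j g_j(y_t, x, t)\right]$$
      Define the local score
      $$h_t(Y_{t-1}, Y_t, X) = \sum_i \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_j \mu_j g_j(Y_t, X, t)$$
      and the Viterbi variable (best score for $x$ of a state sequence that is in state $s_l$ at position $k$):
      $$\delta_k(x, s_l) = \max_{y_1 \cdots y_{k-1}} \left[\sum_{t=1}^{k-1} h_t(y_{t-1}, y_t, x) + h_k(y_{k-1}, s_l, x)\right]$$
      1. Initialization from the start state $s_S$:
         $$\delta_1(x, s_l) = h_1(s_S, s_l, x)$$
      2. Recursion for $k = 1, \ldots, T-1$, storing a back pointer $\psi_k$:
         $$\delta_{k+1}(x, s_l) = \max_{s_m}\left[\delta_k(x, s_m) + h_{k+1}(s_m, s_l, x)\right]$$
         $$\psi_k(x, s_l) = \arg\max_{s_m}\left[\delta_k(x, s_m) + h_{k+1}(s_m, s_l, x)\right]$$
      3. Termination:
         $$\hat{y}_T = \arg\max_{s_m} \delta_T(x, s_m)$$
      4. Backtracking for $t = T-1, \ldots, 1$:
         $$\hat{y}_t = \psi_t(x, \hat{y}_{t+1})$$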
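A sketch of the additive (log-domain) Viterbi recursion for a chain CRF. The local score `h` is an assumed toy form, a transition weight plus a state weight with hypothetical values; any function of $(t, y_{t-1}, y_t, x)$ could be substituted.

```python
import itertools

STATES = ["s1", "s2"]

# Hypothetical weights; "sS" is the start state.
W_TRANS = {"sS": {"s1": 0.5, "s2": 0.0},
           "s1": {"s1": 1.0, "s2": -0.5},
           "s2": {"s1": -0.5, "s2": 1.0}}
W_STATE = {"a": {"s1": 1.0, "s2": -1.0},
           "b": {"s1": -1.0, "s2": 1.0}}


def h(t, prev, cur, x):
    """Local score h_t(prev, cur, x) for position t = 1, ..., T (1-indexed)."""
    return W_TRANS[prev][cur] + W_STATE[x[t - 1]][cur]


def total_score(x, y):
    """sum_t h_t(y_{t-1}, y_t, x), with y_0 = s_S."""
    prev, total = "sS", 0.0
    for t, cur in enumerate(y, start=1):
        total += h(t, prev, cur, x)
        prev = cur
    return total


def crf_viterbi(x):
    """Return (best state sequence, its unnormalized log score)."""
    T = len(x)
    delta = {sl: h(1, "sS", sl, x) for sl in STATES}           # delta_1
    back_pointers = []
    for k in range(1, T):
        psi = {sl: max(STATES, key=lambda sm: delta[sm] + h(k + 1, sm, sl, x))
               for sl in STATES}
        delta = {sl: delta[psi[sl]] + h(k + 1, psi[sl], sl, x) for sl in STATES}
        back_pointers.append(psi)
    last = max(STATES, key=lambda sm: delta[sm])                # y_T
    path = [last]
    for psi in reversed(back_pointers):                         # y_t = psi_t(x, y_{t+1})
        path.append(psi[path[-1]])
    path.reverse()
    return path, delta[last]


x = list("abba")
print(crf_viterbi(x))
```

No normalization factor is needed for decoding, since $Z(x)$ is the same for every candidate sequence.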
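  24. Forward-backward for CRF in matrix form. With the start state $s_S$ and the end state $s_E$ added there are $|S| + 2$ states, so $M_t(X)$ is an $(|S| + 2) \times (|S| + 2)$ matrix and $\alpha_t(X)$, $\beta_t(X)$ are $(|S| + 2)$-dimensional vectors.
      $$M_t(s_l, s_m \mid X) = \exp h_t(s_l, s_m, X)$$
      $$\alpha_0(Y \mid X) = \begin{cases} 1 & \text{if } Y = s_S \\ 0 & \text{otherwise} \end{cases} \qquad \alpha_t(X)^\top = \alpha_{t-1}(X)^\top M_t(X)$$
      $$\beta_{T+1}(Y \mid X) = \begin{cases} 1 & \text{if } Y = s_E \\ 0 & \text{otherwise} \end{cases} \qquad \beta_t(X) = M_{t+1}(X)\,\beta_{t+1}(X)$$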
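The matrix recursions can be sketched in plain Python with nested dictionaries as matrices. The local score `h` and its weights are hypothetical, and two modeling choices are assumptions of this example: transitions not allowed at a position get a matrix entry of 0, and the final transition into $s_E$ has score 0 (entry 1). The normalization factor is read from $\alpha_{T+1}(s_E)$ or, equivalently, from $\beta_0(s_S)$.

```python
import itertools
import math

STATES = ["s1", "s2"]
ALL_STATES = ["sS"] + STATES + ["sE"]   # |S| + 2 states

# Hypothetical weights.
W_TRANS = {"sS": {"s1": 0.5, "s2": 0.0},
           "s1": {"s1": 1.0, "s2": -0.5},
           "s2": {"s1": -0.5, "s2": 1.0}}
W_STATE = {"a": {"s1": 1.0, "s2": -1.0},
           "b": {"s1": -1.0, "s2": 1.0}}


def h(t, prev, cur, x):
    """Local score h_t(prev, cur, x) for t = 1, ..., T (1-indexed)."""
    return W_TRANS[prev][cur] + W_STATE[x[t - 1]][cur]


def matrix(t, x):
    """M_t(s_l, s_m | x) = exp h_t(s_l, s_m, x); disallowed transitions are 0."""
    T = len(x)
    M = {l: {m: 0.0 for m in ALL_STATES} for l in ALL_STATES}
    if t == T + 1:
        for l in STATES:
            M[l]["sE"] = 1.0            # assumption: h_{T+1} = 0
    else:
        sources = ["sS"] if t == 1 else STATES
        for l in sources:
            for m in STATES:
                M[l][m] = math.exp(h(t, l, m, x))
    return M


def partition_forward(x):
    """Z(x) = alpha_{T+1}(s_E), using alpha_t^T = alpha_{t-1}^T M_t."""
    alpha = {s: 1.0 if s == "sS" else 0.0 for s in ALL_STATES}
    for t in range(1, len(x) + 2):
        M = matrix(t, x)
        alpha = {m: sum(alpha[l] * M[l][m] for l in ALL_STATES) for m in ALL_STATES}
    return alpha["sE"]


def partition_backward(x):
    """Z(x) = beta_0(s_S), using beta_t = M_{t+1} beta_{t+1}."""
    beta = {s: 1.0 if s == "sE" else 0.0 for s in ALL_STATES}
    for t in range(len(x) + 1, 0, -1):
        M = matrix(t, x)
        beta = {l: sum(M[l][m] * beta[m] for m in ALL_STATES) for l in ALL_STATES}
    return beta["sS"]


def partition_brute_force(x):
    """Z(x) by summing exp(score) over all state sequences."""
    total = 0.0
    for y in itertools.product(STATES, repeat=len(x)):
        prev, score = "sS", 0.0
        for t, cur in enumerate(y, start=1):
            score += h(t, prev, cur, x)
            prev = cur
        total += math.exp(score)
    return total


x = list("abba")
print(partition_forward(x), partition_backward(x), partition_brute_force(x))
```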
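  25. Parameter estimation for CRF from training data $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$ with Generalized Iterative Scaling. The constant $C$ is handled as in the MEMM case.
      1. Make the features sum to a constant $C$ by adding a correction feature $c$:
         $$C = \sum_{t,i} f_i(y_{t-1}, y_t, x, t) + \sum_{t,j} g_j(y_t, x, t)$$
         $$c(x, y) = C - \sum_{t,i} f_i(y_{t-1}, y_t, x, t) - \sum_{t,j} g_j(y_t, x, t), \qquad c(x^{(k)}, y^{(k)}) \geq 0, \quad k = 1, \cdots, n$$
      2. Compute the empirical expectation of each feature:
         $$\tilde{E}[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t f_i(y_{t-1}^{(k)}, y_t^{(k)}, x^{(k)}, t)$$
      3. Compute the expectation under the current model for the training observations $x$:
         $$E[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t \sum_{s_l, s_m} \frac{\alpha_{t-1}(s_l \mid x^{(k)}, \lambda, \mu)\, M_t(s_l, s_m \mid x^{(k)}, \lambda, \mu)\, \beta_t(s_m \mid x^{(k)}, \lambda, \mu)}{Z(x^{(k)} \mid \lambda, \mu)}\, f_i(s_l, s_m, x^{(k)}, t)$$
         where the fraction equals $P(Y_{t-1} = s_l, Y_t = s_m \mid x, \lambda, \mu)$ and
         $$Z(x) = \left(\prod_t M_t(x)\right)_{s_S, s_E}$$
      4. Update the parameters:
         $$\lambda_i^{\text{new}} = \lambda_i + \frac{1}{C} \log\left(\frac{\tilde{E}[f_i]}{E[f_i]}\right)$$
      5. Repeat steps 3 and 4.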
  26. References
      • , ( ). . , 1999.
      • A. McCallum, D. Freitag, and F. Pereira. Maximum entropy Markov models for information extraction and segmentation. Proc. ICML, pp. 591-598, 2000.
      • J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proc. ICML, pp. 282-289, 2001.
      • Charles Elkan. Log-Linear Models and Conditional Random Fields. Notes for a tutorial at CIKM, 2008.
      • Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania, 2004.
      • , , . Conditional Random Fields . , pp. 89-96, 2004.
      • http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706b.pdf
      • http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706a.pdf
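  27. Conditional Random Fields as undirected graphical models (cf. PRML, chapter 8). [Figure: undirected graphs connecting the observation $X$ with the labels $Y_1, \ldots, Y_6$.]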
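  28. The distribution factorizes over the cliques $C$ of the graph (cf. PRML, chapter 8):
      $$p(y \mid x) = \frac{1}{Z} \prod_C \psi_C(y_C \mid x) = \frac{1}{Z} \exp\left(-\sum_C E(y_C \mid x)\right)$$
      In chain-structured CRFs the cliques are the pairs of adjacent labels, such as $\{Y_1, Y_2\}$ and $\{Y_2, Y_3\}$. [Figure: chain $Y_1, \ldots, Y_6$ connected to $X$.]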
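  29. The energy $E$ of a clique is a function of $y_{i-1}$ and $y_i$, built from two kinds of features: transition features $t_j$ and state features $s_k$.
      $$E(y_C \mid x) = -\sum_j \lambda_j t_j(y_{i-1}, y_i, x, i) - \sum_k \mu_k s_k(y_i, x, i)$$
      A state feature that depends only on $y_i$ can be written as $s_k(y_i, x, i) = s_k(y_{i-1}, y_i, x, i)$, so both kinds can be written uniformly as $f_j$:
      $$E(y_C \mid x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$
      $$p(y \mid x) = \frac{1}{Z} \prod_C \psi_C(y_C \mid x) = \frac{1}{Z} \exp\left(-\sum_C E(y_C \mid x)\right)$$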
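  30. In a chain CRF, summing over the cliques $C$ is the same as summing over the positions $i$ of $y_i$:
      $$E(y_C \mid x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$
      $$\sum_C E(y_C \mid x) = -\sum_C \sum_j \lambda_j f_j(y_{i-1}, y_i, x, i) = -\sum_j \lambda_j F_j(y, x)$$
      $$F_j(y, x) = \sum_C f_j(y_{i-1}, y_i, x, i)$$
      $$p(y \mid x) = \frac{1}{Z} \exp\left(\sum_j \lambda_j F_j(y, x)\right)$$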
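The global feature form $p(y \mid x) = \frac{1}{Z} \exp(\sum_j \lambda_j F_j(y, x))$ can be sketched with two toy features. The feature functions, the weights, and the use of `"sS"` as the label before the first position are all hypothetical, and $Z$ is computed by enumeration, which only works for short sequences.

```python
import itertools
import math

STATES = ["s1", "s2"]


def f_same_label(prev, cur, x, i):
    """Hypothetical transition feature: fires when the label is repeated."""
    return int(prev == cur)


def f_a_is_s1(prev, cur, x, i):
    """Hypothetical state feature: fires when x_i is 'a' and y_i is s1."""
    return int(x[i] == "a" and cur == "s1")


FEATURES = [f_same_label, f_a_is_s1]
LAMBDAS = [0.8, 1.2]  # hypothetical weights lambda_j


def global_features(y, x):
    """F_j(y, x) = sum over positions i of f_j(y_{i-1}, y_i, x, i)."""
    return [sum(f(y[i - 1] if i > 0 else "sS", y[i], x, i) for i in range(len(x)))
            for f in FEATURES]


def unnormalized(y, x):
    """exp(sum_j lambda_j F_j(y, x))."""
    return math.exp(sum(lam * F for lam, F in zip(LAMBDAS, global_features(y, x))))


def prob(y, x):
    """p(y | x), with Z obtained by enumerating every label sequence."""
    z = sum(unnormalized(cand, x)
            for cand in itertools.product(STATES, repeat=len(x)))
    return unnormalized(y, x) / z


x = "aab"
y = ("s1", "s1", "s2")
print(global_features(y, x), prob(y, x))
```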
