Notes on HMM, MEMM, and CRF


This is a revised version of the material uploaded to handsOut on 2010-06-21, with its obvious errors corrected.



Transcript

  • 1. HMM, MEMM, CRF
  • 2. Hidden Markov Model
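  • 3. HMM: a model of the joint probability $P(X, Y)$ of an observation sequence $X$ and a state sequence $Y$, under a first-order Markov assumption.
    $$P(X, Y) = P(Y)\,P(X \mid Y) = \prod_t P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t)$$
    Figure: graphical model of the HMM, a chain $Y_{t-1} \to Y_t \to Y_{t+1}$ in which each state $Y_t$ emits the observation $X_t$.
  • 4. Problems solved with an HMM
    Likelihood of an observation sequence $X$: $P(X) = \sum_Y P(X, Y)$ → forward (or backward) algorithm
    Most likely state sequence $Y$ for a given $X$: $\arg\max_Y P(Y \mid X) = \arg\max_Y \frac{P(X, Y)}{P(X)} = \arg\max_Y P(X, Y)$ → Viterbi algorithm
    Parameter estimation with EM → Baum-Welch algorithm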
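  • 5. Forward algorithm
    Let $x_t \in O$ (observation symbols) and $y_t \in S$ (states). The likelihood of $x = x_1 \cdots x_T$ is
    $$P(X = x) = \sum_y P(X_1 = x_1, \cdots, X_T = x_T, Y = y)$$
    Forward probability: the probability of observing $x_1 \cdots x_t$ and being in state $s_i$ at time $t$:
    $$\alpha_t(x, s_i) = P(X_1 = x_1, \cdots, X_t = x_t, Y_t = s_i)$$
    Time complexity: $O(|S|^2 T)$.
    1. Initialization from the start state $s_S$: $\alpha_1(x, s_i) = P(Y_1 = s_i \mid Y_S = s_S)\,P(X_1 = x_1 \mid Y_1 = s_i)$
    2. Recursion for $t = 1, \dots, T-1$: $\alpha_{t+1}(x, s_i) = \Big[\sum_j \alpha_t(x, s_j)\,P(s_i \mid s_j)\Big]\,P(x_{t+1} \mid s_i)$
    3. Termination at the end state $s_E$, with $P(s_E \mid s_j) = 1$: $P(x) = \sum_j \alpha_T(x, s_j)\,P(s_E \mid s_j)$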
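The forward recursion on this slide can be sketched in a few lines of Python. The two-state model below, its probabilities, and the names `forward` and `brute_force` are illustrative assumptions, not taken from the slides. The brute-force sum over all state sequences is included only to check the recursion.

```python
from itertools import product

# Hypothetical toy HMM; all numbers are illustrative assumptions.
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}                      # P(s_i | s_S)
trans = {"s1": {"s1": 0.7, "s2": 0.3},              # trans[s_j][s_i] = P(s_i | s_j)
         "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"a": 0.9, "b": 0.1},                 # emit[s_i][o] = P(o | s_i)
        "s2": {"a": 0.2, "b": 0.8}}

def forward(x):
    """Return P(x) and the list of alpha_t tables (index 0 holds alpha_1)."""
    alpha = [{s: start[s] * emit[s][x[0]] for s in states}]
    for obs in x[1:]:
        prev = alpha[-1]
        alpha.append({s: sum(prev[r] * trans[r][s] for r in states) * emit[s][obs]
                      for s in states})
    # Termination with P(s_E | s_j) = 1.
    return sum(alpha[-1].values()), alpha

def brute_force(x):
    """Sum P(x, y) over every state sequence y (exponential; for checking only)."""
    total = 0.0
    for y in product(states, repeat=len(x)):
        p = start[y[0]] * emit[y[0]][x[0]]
        for t in range(1, len(x)):
            p *= trans[y[t - 1]][y[t]] * emit[y[t]][x[t]]
        total += p
    return total

x = ["a", "b", "b", "a"]
p_x, alpha = forward(x)
print(p_x, brute_force(x))
```

The recursion touches each pair of states once per time step, which is where the $O(|S|^2 T)$ cost comes from, whereas the brute-force sum enumerates $|S|^T$ sequences.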
  • 6. Figure: forward-algorithm trellis for $|S| = 3$ and a sequence of length 4 (columns $Y_S, Y_1, \dots, Y_4, Y_E$; each $Y_t$ ranges over $s_1, s_2, s_3$), illustrating $\alpha_2(x, s_2) = P(x_1, x_2, Y_2 = s_2)$.
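  • 7. Backward algorithm
    For $x = x_1 \cdots x_T$, the backward probability is the probability of observing $x_{t+1} \cdots x_T$ given state $s_i$ at time $t$:
    $$\beta_t(x, s_i) = \begin{cases} P(x_{t+1}, \cdots, x_T \mid Y_t = s_i) & \text{if } t = 1, \cdots, T-1 \\ 1 & \text{if } t = T \end{cases}$$
    Time complexity: $O(|S|^2 T)$.
    1. Initialization at the end state $s_E$: $\beta_T(x, s_i) = P(s_E \mid s_i) = 1$
    2. Recursion for $t = T-1, \dots, 1$: $\beta_t(x, s_i) = \sum_j P(s_j \mid s_i)\,P(x_{t+1} \mid s_j)\,\beta_{t+1}(x, s_j)$
    3. Termination at the start state $s_S$: $P(x) = \sum_j P(s_j \mid s_S)\,P(x_1 \mid s_j)\,\beta_1(x, s_j)$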
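A minimal sketch of the backward recursion, assuming the same kind of hypothetical two-state model (all numbers and function names are illustrative). It checks that the termination step gives the same $P(x)$ as $\sum_i \alpha_t(x, s_i)\,\beta_t(x, s_i)$ at every time step.

```python
# Hypothetical toy HMM; all numbers are illustrative assumptions.
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}

def forward(x):
    """alpha[t][s] for t = 0..T-1 (0-based), used here only as a cross-check."""
    alpha = [{s: start[s] * emit[s][x[0]] for s in states}]
    for obs in x[1:]:
        prev = alpha[-1]
        alpha.append({s: sum(prev[r] * trans[r][s] for r in states) * emit[s][obs]
                      for s in states})
    return alpha

def backward(x):
    """beta[t][s] for t = 0..T-1 (0-based); beta at the last step is 1."""
    T = len(x)
    beta = [None] * T
    beta[T - 1] = {s: 1.0 for s in states}          # beta_T = P(s_E | s_i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = {s: sum(trans[s][r] * emit[r][x[t + 1]] * beta[t + 1][r]
                          for r in states)
                   for s in states}
    return beta

x = ["a", "b", "b", "a"]
alpha, beta = forward(x), backward(x)
# Termination: P(x) = sum_j P(s_j | s_S) P(x_1 | s_j) beta_1(x, s_j)
p_from_beta = sum(start[r] * emit[r][x[0]] * beta[0][r] for r in states)
# sum_i alpha_t(s_i) * beta_t(s_i) should equal P(x) at every t.
p_per_t = [sum(alpha[t][s] * beta[t][s] for s in states) for t in range(len(x))]
print(p_from_beta, p_per_t)
```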
  • 8. Figure: backward-algorithm trellis for $|S| = 3$ and a sequence of length 4 (columns $Y_S, Y_1, \dots, Y_4, Y_E$; each $Y_t$ ranges over $s_1, s_2, s_3$), illustrating $\beta_3(x, s_1) = P(x_4 \mid Y_3 = s_1)$.
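  • 9. Viterbi algorithm
    Goal: $\hat{y} = \arg\max_y P(y \mid x)$ for $x = x_1 \cdots x_T$.
    The maximum probability of observing $x_1 \cdots x_t$ and reaching state $s_i$ at time $t$:
    $$\delta_t(x, s_i) = \max_{y_1 \cdots y_{t-1}} P(x_1, \cdots, x_t, y_1, \cdots, y_{t-1}, Y_t = s_i)$$
    Time complexity: $O(|S|^2 T)$.
    1. Initialization from the start state $s_S$: $\delta_1(x, s_i) = P(s_i \mid s_S)\,P(x_1 \mid s_i)$
    2. Recursion for $t = 1, \dots, T-1$:
       $\delta_{t+1}(x, s_i) = \max_{s_j} \big[\delta_t(x, s_j)\,P(s_i \mid s_j)\big]\,P(x_{t+1} \mid s_i)$
       $\psi_t(x, s_i) = \arg\max_{s_j} \big[\delta_t(x, s_j)\,P(s_i \mid s_j)\big]$ ← backpointer
    3. Termination at the end state $s_E$, with $P(s_E \mid s_j) = 1$:
       $\max_y P(x, y) = \max_{s_j} \big[\delta_T(x, s_j)\big]\,P(s_E \mid s_j)$
       $\hat{y}_T = \arg\max_{s_j} \delta_T(x, s_j)$
    4. Backtracking for $t = T-1, \dots, 1$: $\hat{y}_t = \psi_t(x, \hat{y}_{t+1})$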
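The four Viterbi steps can be sketched as follows. The toy model and the helper names (`joint`, `viterbi`) are assumptions made for illustration. The result is compared against an exhaustive search over all state sequences.

```python
from itertools import product

# Hypothetical toy HMM; all numbers are illustrative assumptions.
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}

def joint(x, y):
    """P(x, y) under the HMM factorization."""
    p = start[y[0]] * emit[y[0]][x[0]]
    for t in range(1, len(x)):
        p *= trans[y[t - 1]][y[t]] * emit[y[t]][x[t]]
    return p

def viterbi(x):
    """Return the most likely state sequence and its joint probability."""
    delta = {s: start[s] * emit[s][x[0]] for s in states}      # step 1
    psi = []                                                    # backpointers
    for obs in x[1:]:                                           # step 2
        back = {s: max(states, key=lambda r: delta[r] * trans[r][s])
                for s in states}
        delta = {s: delta[back[s]] * trans[back[s]][s] * emit[s][obs]
                 for s in states}
        psi.append(back)
    last = max(states, key=delta.get)                           # step 3
    path = [last]
    for back in reversed(psi):                                  # step 4
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[last]

x = ["a", "b", "b", "a"]
path, p_best = viterbi(x)
p_brute = max(joint(x, y) for y in product(states, repeat=len(x)))
print(path, p_best, p_brute)
```

For long sequences the products underflow, so a practical version would usually work with log probabilities and sums instead.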
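  • 10. Viterbi algorithm: example with $|S| = 3$ and a sequence of length 4
    Figure: trellis over $Y_S, Y_1, \dots, Y_4, Y_E$ for the observations $x_1, \dots, x_4$, with $\arg\max_{s_j} \delta_4(x, s_j) = s_1$.
    Backpointer table $\psi_t(x, s_j)$:
      s_j   t=1   t=2   t=3
      s1    s1    s3    s1
      s2    s1    s2    s3
      s3    s2    s1    s1
    Following the backpointers from $Y_4 = s_1$ gives the path $s_S \to s_2 \to s_3 \to s_1 \to s_1 \to s_E$.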
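  • 11. Baum-Welch algorithm: computing $\gamma$
    $$\gamma_t(s_i, s_j \mid \theta) = P(Y_t = s_i, Y_{t+1} = s_j \mid x, \theta) = \frac{\alpha_t(s_i \mid \theta)\,P(s_j \mid s_i, \theta)\,P(x_{t+1} \mid s_j, \theta)\,\beta_{t+1}(s_j \mid \theta)}{P(x \mid \theta)}$$
    $$P(x \mid \theta) = \sum_k \alpha_T(s_k \mid \theta)\,P(s_E \mid s_k, \theta), \qquad P(s_E \mid s_k, \theta) = 1$$
    $$\gamma_T(s_i, \cdot \mid \theta) = \frac{\alpha_T(s_i \mid \theta)}{\sum_k \alpha_T(s_k \mid \theta)}$$
    ※ $\theta$ denotes the model parameters.
    Figure: trellis over $Y_S, Y_1, \dots, Y_4, Y_E$ with observations $X_1, \dots, X_4$ and states $s_1, s_2, s_3$.
  • 12. Baum-Welch algorithm: parameter update
    $$\theta = (\pi_1, \cdots, \pi_{|S|}, a_{11}, \cdots, a_{|S||S|}, b_{11}, \cdots, b_{|S||O|})$$
    ※ $O$ is the set of observation symbols.
    $$\gamma_t(s_i \mid \theta) = \sum_{s_j \in S} \gamma_t(s_i, s_j \mid \theta), \qquad t = 1, \cdots, T-1$$
    $$\bar{\pi}_i = \bar{P}(s_i \mid s_S, \theta) = \gamma_1(s_i \mid \theta)$$
    $$\bar{a}_{ij} = \bar{P}(s_j \mid s_i, \theta) = \frac{\sum_{t=1}^{T-1} \gamma_t(s_i, s_j \mid \theta)}{\sum_{t=1}^{T-1} \gamma_t(s_i \mid \theta)}$$
    $$\bar{b}_{ik} = \bar{P}(o_k \mid s_i, \theta) = \frac{\sum_{t:\,x_t = o_k} \gamma_t(s_i \mid \theta)}{\sum_{t=1}^{T} \gamma_t(s_i \mid \theta)}$$
    Update: $\theta \leftarrow \bar{\theta} = (\cdots, \bar{\pi}_i, \cdots, \bar{a}_{ij}, \cdots, \bar{b}_{ik}, \cdots)$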
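One re-estimation step built from these formulas might look like the sketch below. It handles a single training sequence; the initial parameters, the sequence, and all function names are illustrative assumptions. Because Baum-Welch is an instance of EM, the likelihood $P(x \mid \theta)$ should not decrease from one iteration to the next, which the loop records.

```python
S = ["s1", "s2"]          # states
O = ["a", "b"]            # observation symbols

def forward(x, pi, a, b):
    alpha = [{s: pi[s] * b[s][x[0]] for s in S}]
    for o in x[1:]:
        prev = alpha[-1]
        alpha.append({s: sum(prev[r] * a[r][s] for r in S) * b[s][o] for s in S})
    return alpha

def backward(x, a, b):
    beta = [{s: 1.0 for s in S}]                       # beta_T = 1
    for o in reversed(x[1:]):
        nxt = beta[0]
        beta.insert(0, {s: sum(a[s][r] * b[r][o] * nxt[r] for r in S) for s in S})
    return beta

def baum_welch_step(x, pi, a, b):
    """One update theta -> theta_bar; also returns P(x | theta) for the OLD theta."""
    T = len(x)
    alpha, beta = forward(x, pi, a, b), backward(x, a, b)
    px = sum(alpha[-1].values())
    # pair[t][i][j] = gamma_t(s_i, s_j | theta) for t = 1..T-1 (0-based index t)
    pair = [{i: {j: alpha[t][i] * a[i][j] * b[j][x[t + 1]] * beta[t + 1][j] / px
                 for j in S} for i in S} for t in range(T - 1)]
    gamma = [{i: sum(pair[t][i].values()) for i in S} for t in range(T - 1)]
    gamma.append({i: alpha[-1][i] / px for i in S})    # gamma_T
    new_pi = dict(gamma[0])
    new_a = {i: {j: sum(pair[t][i][j] for t in range(T - 1)) /
                    sum(gamma[t][i] for t in range(T - 1)) for j in S} for i in S}
    new_b = {i: {o: sum(gamma[t][i] for t in range(T) if x[t] == o) /
                    sum(gamma[t][i] for t in range(T)) for o in O} for i in S}
    return new_pi, new_a, new_b, px

# Hypothetical starting point and training sequence (illustrative only).
pi = {"s1": 0.6, "s2": 0.4}
a = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
b = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}
x = list("aabbbabaab")

liks = []
for _ in range(6):
    pi, a, b, px = baum_welch_step(x, pi, a, b)
    liks.append(px)
print(liks)
```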
  • 13. Maximum Entropy Markov Model
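  • 14. Limitation of the HMM: it models $X$ and $Y$ jointly → we would like to describe each observation by features, for example whether a word ends in 'er' …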
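  • 15. MEMM: models the conditional probability $P(Y \mid X)$ directly (※ the HMM models the joint probability $P(X, Y)$).
    $$P(Y \mid X) = \prod_t \sum_s P_s(Y_t \mid X_t)\,[[Y_{t-1} = s]]$$
    $$P_s(Y_t \mid X_t) = \frac{1}{Z(X_t, s)} \exp\Big(\sum_a \lambda_a f_a(X_t, Y_t)\Big)$$
    One maximum entropy (ME) model is used per previous state $s$; $Z(X_t, Y_{t-1})$ is its normalization term.
    ※ $[[x = y]] = 1$ if $x = y$, and $0$ if $x \neq y$.
    Figure: graphical model of the MEMM, in which $Y_t$ depends on $Y_{t-1}$ and $X_t$.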
  • 16. Features: binary-valued
    $$f_{\langle b, s \rangle}(X_t, Y_t) = \begin{cases} 1 & \text{if } b(X_t) \text{ is true and } Y_t = s \\ 0 & \text{otherwise} \end{cases}$$
    Example: segmenting Usenet FAQs, where $X_t$ is one line of text.
    Labels: head, question, answer, tail
    Predicates $b$ on a line:
      begins-with-number, begins-with-ordinal, begins-with-punctuation, begins-with-question-word, begins-with-subject, blank,
      contains-alphanum, contains-bracketed-number, contains-http, contains-non-space, contains-number, contains-pipe,
      contains-question-mark, contains-question-word, ends-with-question-mark, first-alpha-is-capitalized, indented, indented-1-to-4,
      indented-5-to-10, more-than-one-third-space, only-punctuation, prev-is-blank, prev-begins-with-ordinal, shorter-than-30
    ※ $f_{\langle \text{begins-with-number}, \text{question} \rangle} = 1$ when line $t$ begins with a number and the label at $t$ is question.
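To make the feature definition concrete, here is a small sketch of two of the listed predicates and of the per-state distribution $P_s(Y_t \mid X_t)$ they feed into. The predicate implementations, the weights, and the example line are hypothetical; only the feature names and labels come from the slide.

```python
import math

LABELS = ["head", "question", "answer", "tail"]

# Two of the predicates b(X_t) named on the slide, applied to one line of text.
# Their exact definitions here are assumptions.
def begins_with_number(line):
    return line.lstrip()[:1].isdigit()

def contains_question_mark(line):
    return "?" in line

PREDICATES = {"begins-with-number": begins_with_number,
              "contains-question-mark": contains_question_mark}

def feature(b, s, line, label):
    """f_<b,s>(X_t, Y_t): 1 if b(X_t) is true and Y_t = s, else 0."""
    return 1 if PREDICATES[b](line) and label == s else 0

# Hypothetical weights lambda_<b,s> of the ME model for ONE previous state s.
weights = {("begins-with-number", "question"): 1.2,
           ("contains-question-mark", "question"): 2.0,
           ("contains-question-mark", "answer"): -0.5}

def p_s(line):
    """P_s(Y_t | X_t) = exp(sum_a lambda_a f_a(X_t, Y_t)) / Z(X_t, s)."""
    scores = {y: math.exp(sum(w * feature(b, s, line, y)
                              for (b, s), w in weights.items()))
              for y in LABELS}
    z = sum(scores.values())                 # normalization term Z(X_t, s)
    return {y: v / z for y, v in scores.items()}

line = "2.1 What is a MEMM?"
dist = p_s(line)
print(dist)
```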
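  • 17. Viterbi algorithm for the MEMM
    Goal: $\hat{y} = \arg\max_y P(y \mid x)$ for $x = x_1 \cdots x_T$.
    The maximum probability of reaching state $s_i$ at time $t$ given $x_1 \cdots x_t$:
    $$\delta_t(s_i \mid x) = \max_{y_1 \cdots y_{t-1}} P(y_1, \cdots, y_{t-1}, Y_t = s_i \mid x_1, \cdots, x_t)$$
    1. Initialization from the start state $s_S$: $\delta_1(s_i \mid x) = P(s_i \mid s_S, x_1)$
    2. Recursion for $t = 1, \dots, T-1$:
       $\delta_{t+1}(s_i \mid x) = \max_{s_j} \big[\delta_t(s_j \mid x)\,P(s_i \mid s_j, x_{t+1})\big]$
       $\psi_t(s_i \mid x) = \arg\max_{s_j} \big[\delta_t(s_j \mid x)\,P(s_i \mid s_j, x_{t+1})\big]$ ← backpointer
    3. Termination: $\max_y P(y \mid x) = \max_{s_j} \delta_T(s_j \mid x)$, $\hat{y}_T = \arg\max_{s_j} \delta_T(s_j \mid x)$
    4. Backtracking for $t = T-1, \dots, 1$: $\hat{y}_t = \psi_t(\hat{y}_{t+1} \mid x)$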
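A minimal sketch of MEMM decoding, assuming the per-state distributions $P(s_i \mid s_j, x_t)$ are already available as a lookup table. The table `local`, its numbers, and the state names are hypothetical; in a real MEMM each row would come from a trained ME model.

```python
from itertools import product

states = ["A", "B"]
# Hypothetical tables: local[prev][obs][cur] = P(cur | prev, obs).
# "S" stands for the start state s_S. Each inner dict sums to 1.
local = {
    "S": {"u": {"A": 0.8, "B": 0.2}, "v": {"A": 0.3, "B": 0.7}},
    "A": {"u": {"A": 0.6, "B": 0.4}, "v": {"A": 0.1, "B": 0.9}},
    "B": {"u": {"A": 0.5, "B": 0.5}, "v": {"A": 0.2, "B": 0.8}},
}

def conditional(y, x):
    """P(y | x) as the product of per-state conditionals."""
    p, prev = 1.0, "S"
    for obs, cur in zip(x, y):
        p *= local[prev][obs][cur]
        prev = cur
    return p

def memm_viterbi(x):
    delta = dict(local["S"][x[0]])                       # step 1
    psi = []
    for obs in x[1:]:                                    # step 2
        back = {s: max(states, key=lambda r: delta[r] * local[r][obs][s])
                for s in states}
        delta = {s: delta[back[s]] * local[back[s]][obs][s] for s in states}
        psi.append(back)
    last = max(states, key=delta.get)                    # step 3
    path = [last]
    for back in reversed(psi):                           # step 4
        path.append(back[path[-1]])
    return path[::-1], delta[last]

x = ["u", "v", "v", "u"]
path, p_best = memm_viterbi(x)
p_brute = max(conditional(y, x) for y in product(states, repeat=len(x)))
p_total = sum(conditional(y, x) for y in product(states, repeat=len(x)))
print(path, p_best)
```

Unlike the HMM case there is no emission term: the observation enters only through the choice of conditional table at each step.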
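  • 18. Training the MEMM: Generalized Iterative Scaling (GIS)
    1. For every pair $o, s$, the feature sum must be a constant $C$: $C = \sum_a f_a(o, s)$
       ※ If it is not, add a correction feature $f_c(o, s) = C - \sum_a f_a(o, s)$, with $C$ chosen so that $f_c(o, s) \geq 0$ for all $o, s$.
    2. From the training data $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$, compute the empirical expectation of each feature, where $m_s^{(i)}$ is the number of positions $t$ in sequence $i$ with $y_{t-1}^{(i)} = s$:
       $$\tilde{E}[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t:\,y_{t-1}^{(i)} = s} f_a(x_t^{(i)}, y_t^{(i)})$$
    3. Compute the expectation of each feature under the current model:
       $$E[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t:\,y_{t-1}^{(i)} = s} \sum_{y \in S} P_s(y \mid x_t^{(i)}, \lambda)\,f_a(x_t^{(i)}, y)$$
    4. Update the weights:
       $$\lambda_a^{\text{new}} = \lambda_a + \frac{1}{C} \log \frac{\tilde{E}[f_a]}{E[f_a]}$$
    5. Repeat steps 3 and 4 until convergence.
    This is carried out for the ME model of each state $s$.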
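The GIS loop for the ME model of a single previous state can be sketched as below. The training pairs, the three base features, and the iteration count are illustrative assumptions. For simplicity the expectations are averaged over all training pairs rather than per sequence. GIS is expected to increase the conditional log-likelihood monotonically, which the recorded history lets one inspect.

```python
import math

LABELS = ["question", "answer"]
# Hypothetical training pairs (observation type, label) that share ONE previous state s.
data = [("q", "question"), ("q", "question"), ("q", "answer"),
        ("num", "answer"), ("num", "question"),
        ("plain", "answer"), ("plain", "question")]

# Hypothetical binary features f_a(o, y).
base = [lambda o, y: int(o == "q" and y == "question"),
        lambda o, y: int(o == "num" and y == "answer"),
        lambda o, y: int(y == "answer")]

# Step 1: constant C and the correction feature f_c = C - sum_a f_a >= 0.
C = max(sum(f(o, y) for f in base) for o, _ in data for y in LABELS)
features = base + [lambda o, y: C - sum(f(o, y) for f in base)]

def prob(lam, o):
    """P_s(y | o, lambda) for every label y, in the order of LABELS."""
    scores = [math.exp(sum(l * f(o, y) for l, f in zip(lam, features)))
              for y in LABELS]
    z = sum(scores)
    return [s / z for s in scores]

def log_likelihood(lam):
    return sum(math.log(prob(lam, o)[LABELS.index(y)]) for o, y in data)

n = len(data)
# Step 2: empirical expectations.
empirical = [sum(f(o, y) for o, y in data) / n for f in features]

def gis_step(lam):
    # Step 3: model expectations under the current weights.
    expected = [sum(p * f(o, y)
                    for o, _ in data
                    for y, p in zip(LABELS, prob(lam, o))) / n
                for f in features]
    # Step 4: lambda_a <- lambda_a + (1/C) log(E~[f_a] / E[f_a]).
    return [l + math.log(e_emp / e_mod) / C
            for l, e_emp, e_mod in zip(lam, empirical, expected)]

lam = [0.0] * len(features)
history = [log_likelihood(lam)]
for _ in range(50):                      # Step 5: repeat steps 3 and 4.
    lam = gis_step(lam)
    history.append(log_likelihood(lam))
print(history[0], history[-1])
```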
  • 19. Conditional Random Fields
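  • 20. Problem with the MEMM (label bias)
    Figure: state-transition diagram for the observations $x_1, x_2, x_3$, with transition probabilities normalized per state by an ME model:
      $s_0 \to s_1$: 0.65, $s_0 \to s_5$: 0.35, $s_1 \to s_2$: 0.5, $s_1 \to s_4$: 0.5, $s_2 \to s_3$: 1, $s_4 \to s_3$: 1, $s_5 \to s_6$: 1, $s_6 \to s_3$: 1
    Path probabilities:
      $s_0 \to s_1 \to s_2 \to s_3$: 0.325
      $s_0 \to s_1 \to s_4 \to s_3$: 0.325
      $s_0 \to s_5 \to s_6 \to s_3$: 0.35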
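The arithmetic behind this slide can be checked directly. The edge probabilities below are the ones shown in the figure, assigned to edges so that they reproduce the three path probabilities (that assignment is an inference from the flattened figure). The point of the sketch is that the single best path goes through $s_5$, even though paths through $s_1$ carry more total probability.

```python
# Per-state transition probabilities (edge assignment inferred from the figure).
trans = {("s0", "s1"): 0.65, ("s0", "s5"): 0.35,
         ("s1", "s2"): 0.5, ("s1", "s4"): 0.5,
         ("s2", "s3"): 1.0, ("s4", "s3"): 1.0,
         ("s5", "s6"): 1.0, ("s6", "s3"): 1.0}

paths = [["s0", "s1", "s2", "s3"],
         ["s0", "s1", "s4", "s3"],
         ["s0", "s5", "s6", "s3"]]

def path_prob(path):
    """Product of the per-state transition probabilities along the path."""
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= trans[(src, dst)]
    return p

probs = [path_prob(p) for p in paths]
best = max(paths, key=path_prob)
mass_via_s1 = sum(path_prob(p) for p in paths if p[1] == "s1")
mass_via_s5 = sum(path_prob(p) for p in paths if p[1] == "s5")
print(probs, best, mass_via_s1, mass_via_s5)
```

Because each state's outgoing probabilities must sum to 1, a state with a single successor passes all of its mass along regardless of the observation, so branches with few outgoing transitions are favored.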
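  • 21. CRF: a model that addresses the problems of the MEMM and the HMM
    $$P(Y \mid X) = \frac{1}{Z(X)} \exp\Big(\sum_{t,i} \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_{t,j} \mu_j g_j(Y_t, X, t)\Big)$$
    $Z(X)$ is the normalization term, computed over the whole sequence.
    Figure: graphical model of the CRF, an undirected chain $Y_{t-1} - Y_t - Y_{t+1}$ conditioned on the entire observation sequence $X$.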
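  • 22. Expressing an HMM as a CRF
    $$P(X, Y) = P(Y)\,P(X \mid Y) = \prod_t P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t)$$
    Taking the logarithm:
    $$\begin{aligned} \log \prod_t P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t) &= \sum_t \{\log P(Y_t \mid Y_{t-1}) + \log P(X_t \mid Y_t)\} \\ &= \sum_t \Big\{ \sum_{\langle s, s' \rangle} [[Y_{t-1} = s']]\,[[Y_t = s]] \log P(s \mid s') + \sum_{\langle o, s \rangle} [[X_t = o]]\,[[Y_t = s]] \log P(o \mid s) \Big\} \\ &= \sum_{t, \langle s, s' \rangle} \log P(s \mid s')\,[[Y_{t-1} = s']]\,[[Y_t = s]] + \sum_{t, \langle o, s \rangle} \log P(o \mid s)\,[[X_t = o]]\,[[Y_t = s]] \end{aligned}$$
    Define
      $\lambda_{\langle s, s' \rangle} = \log P(s \mid s')$, $f_{\langle s, s' \rangle}(Y_{t-1}, Y_t, X, t) = [[Y_{t-1} = s']]\,[[Y_t = s]]$
      $\mu_{\langle o, s \rangle} = \log P(o \mid s)$, $g_{\langle o, s \rangle}(Y_t, X, t) = [[X_t = o]]\,[[Y_t = s]]$
    Then the HMM is a CRF:
    $$P(Y \mid X) = \frac{1}{P(X)} \exp\Big(\sum_{t, \langle s, s' \rangle} \lambda_{\langle s, s' \rangle} f_{\langle s, s' \rangle}(Y_{t-1}, Y_t, X, t) + \sum_{t, \langle o, s \rangle} \mu_{\langle o, s \rangle} g_{\langle o, s \rangle}(Y_t, X, t)\Big)$$
    with the constraints
    $$\sum_s \exp(\lambda_{\langle s, s' \rangle}) = 1, \qquad \sum_o \exp(\mu_{\langle o, s \rangle}) = 1$$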
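The equivalence can be verified numerically on a toy model: with weights set to log transition and log emission probabilities and indicator features, the exponentiated CRF score equals the HMM joint probability, so the normalizer equals $P(x)$. The model and function names below are illustrative assumptions.

```python
import math
from itertools import product

# Hypothetical toy HMM; all numbers are illustrative assumptions.
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}                       # P(s | s_S)
trans = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}

def hmm_joint(x, y):
    p = start[y[0]] * emit[y[0]][x[0]]
    for t in range(1, len(x)):
        p *= trans[y[t - 1]][y[t]] * emit[y[t]][x[t]]
    return p

def crf_score(x, y):
    """Sum over t of the active transition weight plus the active emission weight.

    Exactly one transition indicator and one emission indicator fire at each t,
    so the double sum over features reduces to two table lookups per position.
    """
    score, prev = 0.0, None
    for obs, cur in zip(x, y):
        w_trans = math.log(start[cur] if prev is None else trans[prev][cur])
        w_emit = math.log(emit[cur][obs])
        score += w_trans + w_emit
        prev = cur
    return score

x = ["a", "b", "b"]
all_y = list(product(states, repeat=len(x)))
z = sum(math.exp(crf_score(x, y)) for y in all_y)    # normalizer, equals P(x)
p_x = sum(hmm_joint(x, y) for y in all_y)
posterior = {y: math.exp(crf_score(x, y)) / z for y in all_y}
print(z, p_x)
```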
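  • 23. Viterbi algorithm for the CRF
    $$\hat{y} = \arg\max_y P(Y \mid X) = \arg\max_y \Big[\sum_{t,i} \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_{t,j} \mu_j g_j(Y_t, X, t)\Big]$$
    Score at position $t$:
    $$h_t(Y_{t-1}, Y_t, X) = \sum_i \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_j \mu_j g_j(Y_t, X, t)$$
    The maximum score of reaching state $s_l$ at position $k$ for the sequence $x$, with $y_0 = s_S$:
    $$\delta_k(x, s_l) = \max_{y_1 \cdots y_{k-1}} \Big[\sum_{t=1}^{k-1} h_t(y_{t-1}, y_t, x) + h_k(y_{k-1}, s_l, x)\Big]$$
    1. Initialization from the start state $s_S$: $\delta_1(x, s_l) = h_1(s_S, s_l, x)$
    2. Recursion for $k = 1, \dots, T-1$:
       $\delta_{k+1}(x, s_l) = \max_{s_m} \big[\delta_k(x, s_m) + h_{k+1}(s_m, s_l, x)\big]$
       $\psi_k(x, s_l) = \arg\max_{s_m} \big[\delta_k(x, s_m) + h_{k+1}(s_m, s_l, x)\big]$ ← backpointer
    3. Termination: $\hat{y}_T = \arg\max_{s_m} \delta_T(x, s_m)$
    4. Backtracking for $t = T-1, \dots, 1$: $\hat{y}_t = \psi_t(x, \hat{y}_{t+1})$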
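Since the normalizer does not affect the arg max, CRF decoding is a max-sum recursion over the position scores $h_t$. The sketch below assumes a hypothetical score made of one edge weight and one node weight per position; the labels, words, and weights are illustrative only.

```python
from itertools import product

states = ["N", "V"]
# Hypothetical weights: edge[(prev, cur)] plays the role of the lambda terms,
# node[(cur, word)] the role of the mu terms. "S" is the start state s_S.
edge = {("S", "N"): 0.5, ("S", "V"): -0.5,
        ("N", "N"): -0.2, ("N", "V"): 1.0,
        ("V", "N"): 0.8, ("V", "V"): -1.0}
node = {("N", "dogs"): 1.5, ("V", "dogs"): 0.1,
        ("N", "bark"): 0.3, ("V", "bark"): 1.1}

def h(t, prev, cur, x):
    """Position score h_t(y_{t-1}, y_t, x); t is a 0-based index into x."""
    return edge[(prev, cur)] + node[(cur, x[t])]

def total_score(x, y):
    prev, total = "S", 0.0
    for t, cur in enumerate(y):
        total += h(t, prev, cur, x)
        prev = cur
    return total

def crf_viterbi(x):
    delta = {s: h(0, "S", s, x) for s in states}                 # step 1
    psi = []
    for t in range(1, len(x)):                                   # step 2
        back = {s: max(states, key=lambda r: delta[r] + h(t, r, s, x))
                for s in states}
        delta = {s: delta[back[s]] + h(t, back[s], s, x) for s in states}
        psi.append(back)
    last = max(states, key=delta.get)                            # step 3
    path = [last]
    for back in reversed(psi):                                   # step 4
        path.append(back[path[-1]])
    return path[::-1], delta[last]

x = ["dogs", "bark", "dogs"]
path, best_score = crf_viterbi(x)
brute_score = max(total_score(x, y) for y in product(states, repeat=len(x)))
print(path, best_score)
```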
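  • 24. Forward-backward computation in matrix form
    $M_t(X)$: a $(|S| + 2) \times (|S| + 2)$ matrix over the states together with $s_S$ and $s_E$, with rows indexed by $s_l$ and columns by $s_m$:
    $$M_t(s_l, s_m \mid X) = \exp h_t(s_l, s_m, X)$$
    $\alpha_t(X)$: a $(|S| + 2)$-dimensional forward vector
    $$\alpha_0(Y \mid X) = \begin{cases} 1 & \text{if } Y = s_S \\ 0 & \text{otherwise} \end{cases} \qquad \alpha_t(X)^T = \alpha_{t-1}(X)^T M_t(X)$$
    $\beta_t(X)$: a $(|S| + 2)$-dimensional backward vector
    $$\beta_{T+1}(Y \mid X) = \begin{cases} 1 & \text{if } Y = s_E \\ 0 & \text{otherwise} \end{cases} \qquad \beta_t(X) = M_{t+1}(X)\,\beta_{t+1}(X)$$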
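A plain-Python sketch of these matrix recursions follows. The edge and node weights are hypothetical, and two modeling choices are assumptions of this sketch: disallowed transitions (for example into $s_S$) get a matrix entry of 0, and the final transition into $s_E$ gets score 0. The normalizer read off the forward vector is checked against brute-force enumeration.

```python
import math
from itertools import product

states = ["N", "V"]
ALL = ["S"] + states + ["E"]          # |S| + 2 entries: start first, end last
# Hypothetical weights (illustrative only).
edge = {("S", "N"): 0.5, ("S", "V"): -0.5,
        ("N", "N"): -0.2, ("N", "V"): 1.0,
        ("V", "N"): 0.8, ("V", "V"): -1.0}
node = {("N", "dogs"): 1.5, ("V", "dogs"): 0.1,
        ("N", "bark"): 0.3, ("V", "bark"): 1.1}

def M(t, x):
    """Matrix M_t(s_l, s_m | x) = exp h_t(s_l, s_m, x) for t = 1..T+1."""
    T = len(x)
    m = [[0.0] * len(ALL) for _ in ALL]
    for i, l in enumerate(ALL):
        for j, r in enumerate(ALL):
            if t == T + 1:                       # final step: any state -> s_E
                if l in states and r == "E":
                    m[i][j] = 1.0                # exp(0), an assumption
            elif r in states and (l == "S" if t == 1 else l in states):
                m[i][j] = math.exp(edge[(l, r)] + node[(r, x[t - 1])])
    return m

def forward_backward(x):
    T, n = len(x), len(ALL)
    alpha = [[1.0 if s == "S" else 0.0 for s in ALL]]            # alpha_0
    for t in range(1, T + 2):                                    # alpha_t^T = alpha_{t-1}^T M_t
        m = M(t, x)
        alpha.append([sum(alpha[-1][i] * m[i][j] for i in range(n))
                      for j in range(n)])
    beta = [None] * (T + 2)
    beta[T + 1] = [1.0 if s == "E" else 0.0 for s in ALL]        # beta_{T+1}
    for t in range(T, -1, -1):                                   # beta_t = M_{t+1} beta_{t+1}
        m = M(t + 1, x)
        beta[t] = [sum(m[i][j] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    z = alpha[T + 1][ALL.index("E")]                             # Z(x)
    return alpha, beta, z

def brute_force_z(x):
    total = 0.0
    for y in product(states, repeat=len(x)):
        prev, score = "S", 0.0
        for t, cur in enumerate(y):
            score += edge[(prev, cur)] + node[(cur, x[t])]
            prev = cur
        total += math.exp(score)
    return total

x = ["dogs", "bark", "dogs"]
alpha, beta, z = forward_backward(x)
z_brute = brute_force_z(x)
dots = [sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(x) + 2)]
print(z, z_brute)
```

The inner product of $\alpha_t$ and $\beta_t$ equals $Z(x)$ at every $t$, which is a convenient sanity check when implementing the recursions.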
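  • 25. Training the CRF: Generalized Iterative Scaling (GIS)
    1. The feature sum must be a constant $C$:
       $$C = \sum_{t,i} f_i(y_{t-1}, y_t, x, t) + \sum_{t,j} g_j(y_t, x, t)$$
       ※ If it is not, add a correction feature $c(x, y) = C - \sum_{t,i} f_i(y_{t-1}, y_t, x, t) - \sum_{t,j} g_j(y_t, x, t)$, with $C$ chosen so that $c(x^{(k)}, y^{(k)}) \geq 0$ for $k = 1, \cdots, n$ ($C$ is handled as in the MEMM case).
    2. From the training data $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$, compute the empirical expectation of each feature:
       $$\tilde{E}[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t f_i(y_{t-1}^{(k)}, y_t^{(k)}, x^{(k)}, t)$$
    3. Compute the expectation of each feature under the current model:
       $$E[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t \sum_{s_l, s_m} \frac{\alpha_{t-1}(s_l \mid x^{(k)}, \lambda, \mu)\,M_t(s_l, s_m \mid x^{(k)}, \lambda, \mu)\,\beta_t(s_m \mid x^{(k)}, \lambda, \mu)}{Z(x^{(k)} \mid \lambda, \mu)}\,f_i(s_l, s_m, x^{(k)}, t)$$
       The fraction is $P(Y_{t-1} = s_l, Y_t = s_m \mid x, \lambda, \mu)$, and
       $$Z(x) = \Big[\prod_{t=1}^{T+1} M_t(x)\Big]_{s_S, s_E}$$
    4. Update the weights:
       $$\lambda_i^{\text{new}} = \lambda_i + \frac{1}{C} \log \frac{\tilde{E}[f_i]}{E[f_i]}$$
    5. Repeat steps 3 and 4 until convergence.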
  • 26. References
    • , ( ). . , 1999.
    • A. McCallum, D. Freitag, and F. Pereira. Maximum entropy Markov models for information extraction and segmentation. Proc. ICML, pp. 591-598, 2000.
    • J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proc. ICML, pp. 282-289, 2001.
    • Charles Elkan. Log-Linear Models and Conditional Random Fields. Notes for a tutorial at CIKM, 2008.
    • Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania, 2004.
    • , , . Conditional Random Fields . , pp. 89-96, 2004.
    • http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706b.pdf
    • http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706a.pdf
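  • 27. Conditional Random Fields (general form): an undirected graphical model over the labels $Y$, conditioned on the observations $X$ (cf. PRML, chapter 8). Figure: example graph over $Y_1, \dots, Y_6$ conditioned on $X$.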
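  • 28. chain-structured CRFs
    Figure: chain $Y_1 - Y_2 - \cdots - Y_6$ conditioned on $X$; the cliques are pairs of adjacent labels such as $(Y_1, Y_2)$ and $(Y_2, Y_3)$ (cf. PRML, chapter 8).
    $$p(y \mid x) = \frac{1}{Z} \prod_C \psi_C(y_C \mid x) = \frac{1}{Z} \exp\Big(-\sum_C E(y_C \mid x)\Big)$$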
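  • 29. Energy function of the chain-structured CRF
    $$p(y \mid x) = \frac{1}{Z} \prod_C \psi_C(y_C \mid x) = \frac{1}{Z} \exp\Big(-\sum_C E(y_C \mid x)\Big)$$
    Define the energy function $E$ as
    $$E(y_C \mid x) = -\sum_j \lambda_j t_j(y_{i-1}, y_i, x, i) - \sum_k \mu_k s_k(y_i, x, i)$$
    where $t_j$ depends on $y_{i-1}$ and $y_i$, and $s_k$ depends only on $y_i$. Writing $s_k(y_i, x, i)$ as $s_k(y_{i-1}, y_i, x, i)$, the two kinds of feature can be merged into one:
    $$E(y_C \mid x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$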
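  • 30. From the energy function to the CRF
    $$E(y_C \mid x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$
    $$-\sum_C E(y_C \mid x) = \sum_C \sum_j \lambda_j f_j(y_{i-1}, y_i, x, i) = \sum_j \lambda_j F_j(y, x)$$
    Each clique $C$ corresponds to one position $i$, so the sum over $C$ is a sum over $i$:
    $$F_j(y, x) = \sum_i f_j(y_{i-1}, y_i, x, i)$$
    This gives the CRF:
    $$p(y \mid x) = \frac{1}{Z} \exp\Big(\sum_j \lambda_j F_j(y, x)\Big)$$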