# HMM, MEMM, CRF Notes

This is a revision of the material uploaded to handsOut on 2010-06-21, with its obvious errors corrected.


### Transcript of "HMM, MEMM, CRF Notes"

1. HMM, MEMM, CRF Notes
2. Hidden Markov Model
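3. HMM: a model of the joint probability $P(X, Y)$ of an observation sequence $X$ and a state sequence $Y$, with first-order Markov dependence between states.

    $$P(X, Y) = P(Y)\,P(X|Y) = \prod_t P(Y_t|Y_{t-1})\,P(X_t|Y_t)$$

    Figure: graphical model of an HMM, a chain $Y_{t-1} \to Y_t \to Y_{t+1}$ in which each $Y_t$ emits $X_t$.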
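4. The basic HMM problems and the algorithms that solve them:

    - Probability of an observation sequence $X$: $P(X) = \sum_Y P(X, Y)$ → forward and backward algorithms.
    - Most likely state sequence $Y$ for a given $X$: $\arg\max_Y P(Y|X) = \arg\max_Y \frac{P(X, Y)}{P(X)} = \arg\max_Y P(X, Y)$ → Viterbi algorithm.
    - Parameter estimation with EM → Baum-Welch algorithm.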
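5. Forward algorithm. Let $x_t \in O$ (observation symbols) and $y_t \in S$ (states). The probability of an observation sequence $x = x_1 \cdots x_T$ is

    $$P(X = x) = \sum_y P(X_1 = x_1, \cdots, X_T = x_T, Y = y)$$

    The forward probability is the probability of observing $x_1 \cdots x_t$ and being in state $s_i$ at time $t$:

    $$\alpha_t(x, s_i) = P(X_1 = x_1, \cdots, X_t = x_t, Y_t = s_i)$$

    It is computed in $O(|S|^2 T)$ time:

    1. Initialization from the start state $s_S$: $\alpha_1(x, s_i) = P(Y_1 = s_i|Y_S = s_S)\,P(X_1 = x_1|Y_1 = s_i)$
    2. For $t = 1, \ldots, T-1$: $\alpha_{t+1}(x, s_i) = \left[\sum_j \alpha_t(x, s_j)\,P(s_i|s_j)\right] P(x_{t+1}|s_i)$
    3. Termination at the end state $s_E$: $P(x) = \sum_j \alpha_T(x, s_j)\,P(s_E|s_j)$, where $P(s_E|s_j) = 1$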
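6. Forward probability example with $|S| = 3$ and a sequence of length 4: $\alpha_2(x, s_2) = P(x_1, x_2, Y_2 = s_2)$.

    Figure: trellis over $Y_S, Y_1, \ldots, Y_4, Y_E$ with states $s_1, s_2, s_3$ at each position, start state $s_S$, end state $s_E$, and observations $x_1, x_2$ marked.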
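The forward recursion above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the two-state, two-symbol model (`START`, `TRANS`, `EMIT`) is a made-up toy example, states and symbols are integer indices, `START[i]` plays the role of $P(s_i|s_S)$, and $P(s_E|s_j) = 1$ is assumed as on the slide.

```python
# Toy HMM (hypothetical numbers, chosen only for illustration).
START = [0.6, 0.4]                  # START[i]    = P(s_i | s_S)
TRANS = [[0.7, 0.3], [0.4, 0.6]]    # TRANS[j][i] = P(s_i | s_j)
EMIT = [[0.9, 0.1], [0.2, 0.8]]     # EMIT[i][o]  = P(o | s_i)


def forward(obs, start, trans, emit):
    """Return (alpha, P(x)); alpha[t][i] is alpha_{t+1}(x, s_i) in slide notation."""
    n = len(start)
    # Step 1: initialization from the start state.
    alpha = [[start[i] * emit[i][obs[0]] for i in range(n)]]
    # Step 2: sum over predecessor states, then emit the next symbol.
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            sum(prev[j] * trans[j][i] for j in range(n)) * emit[i][obs[t]]
            for i in range(n)
        ])
    # Step 3: termination, assuming P(s_E | s_j) = 1.
    return alpha, sum(alpha[-1])


alpha, prob = forward([0, 1], START, TRANS, EMIT)
print(prob)
```

For the observation sequence `[0, 1]` the result is about 0.209, and summing the result over all observation sequences of a fixed length gives 1, which is a quick sanity check.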
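7. Backward algorithm. For $x = x_1 \cdots x_T$, the backward probability is the probability of observing $x_{t+1} \cdots x_T$ given state $s_i$ at time $t$:

    $$\beta_t(x, s_i) = \begin{cases} P(x_{t+1}, \cdots, x_T|Y_t = s_i) & \text{if } t = 1, \cdots, T-1 \\ 1 & \text{if } t = T \end{cases}$$

    It is computed in $O(|S|^2 T)$ time:

    1. Initialization at the end state $s_E$: $\beta_T(x, s_i) = P(s_E|s_i) = 1$
    2. For $t = T-1, \ldots, 1$: $\beta_t(x, s_i) = \sum_j P(s_j|s_i)\,P(x_{t+1}|s_j)\,\beta_{t+1}(x, s_j)$
    3. Termination at the start state $s_S$: $P(x) = \sum_j P(s_j|s_S)\,P(x_1|s_j)\,\beta_1(x, s_j)$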
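8. Backward probability example with $|S| = 3$ and a sequence of length 4: $\beta_3(x, s_1) = P(x_4|Y_3 = s_1)$.

    Figure: trellis over $Y_S, Y_1, \ldots, Y_4, Y_E$ with states $s_1, s_2, s_3$ at each position, start state $s_S$, end state $s_E$, and observation $x_4$ marked.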
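A matching sketch of the backward recursion, under the same assumptions as before: a hypothetical two-state toy model with integer-indexed states and symbols, and $P(s_E|s_i) = 1$. The function names are illustrative only.

```python
# Toy HMM (hypothetical numbers, chosen only for illustration).
START = [0.6, 0.4]                  # START[j]    = P(s_j | s_S)
TRANS = [[0.7, 0.3], [0.4, 0.6]]    # TRANS[i][j] = P(s_j | s_i)
EMIT = [[0.9, 0.1], [0.2, 0.8]]     # EMIT[j][o]  = P(o | s_j)


def backward(obs, trans, emit):
    """Return beta, where beta[t][i] is beta_{t+1}(x, s_i) in slide notation."""
    n = len(trans)
    T = len(obs)
    # Step 1: beta_T = 1 for every state.
    beta = [[1.0] * n for _ in range(T)]
    # Step 2: move one step to the right, emit x_{t+1}, continue with beta_{t+1}.
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ]
    return beta


def sequence_probability(obs, start, trans, emit):
    """Step 3: combine beta_1 with the transition out of the start state."""
    beta = backward(obs, trans, emit)
    return sum(start[j] * emit[j][obs[0]] * beta[0][j] for j in range(len(start)))


print(sequence_probability([0, 1], START, TRANS, EMIT))
```

The value agrees with the forward computation for the same toy model (about 0.209 for `[0, 1]`), as it should, since both compute $P(x)$.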
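9. Viterbi algorithm: find $\hat{y} = \arg\max_y P(y|x)$ for $x = x_1 \cdots x_T$. Define the largest joint probability of observing $x_1 \cdots x_t$ along a state path that ends in $s_i$ at time $t$:

    $$\delta_t(x, s_i) = \max_{y_1 \cdots y_{t-1}} P(x_1, \cdots, x_t, y_1, \cdots, y_{t-1}, Y_t = s_i)$$

    The Viterbi algorithm runs in $O(|S|^2 T)$ time:

    1. Initialization from the start state $s_S$: $\delta_1(x, s_i) = P(s_i|s_S)\,P(x_1|s_i)$
    2. For $t = 1, \ldots, T-1$:
       $\delta_{t+1}(x, s_i) = \max_{s_j}\left[\delta_t(x, s_j)\,P(s_i|s_j)\right] P(x_{t+1}|s_i)$
       $\psi_t(x, s_i) = \arg\max_{s_j}\left[\delta_t(x, s_j)\,P(s_i|s_j)\right]$ ← back pointer
    3. Termination at the end state $s_E$: $\max_y P(x, y) = \max_{s_j}\left[\delta_T(x, s_j)\right] P(s_E|s_j)$ with $P(s_E|s_j) = 1$, and $\hat{y}_T = \arg\max_{s_j} \delta_T(x, s_j)$
    4. Backtracking for $t = T-1, \ldots, 1$: $\hat{y}_t = \psi_t(x, \hat{y}_{t+1})$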
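10. Viterbi example with $|S| = 3$ and a sequence of length 4, where $\arg\max_{s_j} \delta_4(x, s_j) = s_1$. Back pointers $\psi_t(x, s_j)$:

    | $s_j$ | $t = 1$ | $t = 2$ | $t = 3$ |
    |-------|---------|---------|---------|
    | $s_1$ | $s_1$   | $s_3$   | $s_1$   |
    | $s_2$ | $s_1$   | $s_2$   | $s_3$   |
    | $s_3$ | $s_2$   | $s_1$   | $s_1$   |

    Following the back pointers from $\hat{y}_4 = s_1$ gives the path $Y_S, Y_1, \ldots, Y_4, Y_E = s_S, s_2, s_3, s_1, s_1, s_E$.

    Figure: trellis over $Y_S, Y_1, \ldots, Y_4, Y_E$ with observations $x_1, \ldots, x_4$ and states $s_1, s_2, s_3$ at each position.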
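The Viterbi recursion and backtracking can be sketched as below. As with the other sketches, the model numbers are hypothetical, states and symbols are integer indices, and $P(s_E|s_j) = 1$ is assumed, so the end state does not affect the maximization.

```python
# Toy HMM (hypothetical numbers, chosen only for illustration).
START = [0.6, 0.4]                  # START[i]    = P(s_i | s_S)
TRANS = [[0.7, 0.3], [0.4, 0.6]]    # TRANS[j][i] = P(s_i | s_j)
EMIT = [[0.9, 0.1], [0.2, 0.8]]     # EMIT[i][o]  = P(o | s_i)


def viterbi(obs, start, trans, emit):
    """Return (best state path, max_y P(x, y))."""
    n = len(start)
    # Step 1: initialization.
    delta = [start[i] * emit[i][obs[0]] for i in range(n)]
    back = []  # back[t][i] plays the role of psi_{t+1}(x, s_i)
    # Step 2: keep only the best predecessor for each state.
    for t in range(1, len(obs)):
        pointers, new_delta = [], []
        for i in range(n):
            best_j = max(range(n), key=lambda j: delta[j] * trans[j][i])
            pointers.append(best_j)
            new_delta.append(delta[best_j] * trans[best_j][i] * emit[i][obs[t]])
        back.append(pointers)
        delta = new_delta
    # Step 3: best final state.
    last = max(range(n), key=lambda i: delta[i])
    # Step 4: follow the back pointers from the end to the start.
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


print(viterbi([0, 1], START, TRANS, EMIT))
```

For the toy model and observations `[0, 1]`, the best path is `[0, 1]` with joint probability of about 0.1296.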
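11. Baum-Welch algorithm: posterior probabilities $\gamma$ computed under the current parameters $\theta$.

    $$\gamma_t(s_i, s_j|\theta) = P(Y_t = s_i, Y_{t+1} = s_j|x, \theta) = \frac{\alpha_t(s_i|\theta)\,P(s_j|s_i, \theta)\,P(x_{t+1}|s_j, \theta)\,\beta_{t+1}(s_j|\theta)}{P(x|\theta)}$$

    $$\gamma_T(s_i, \cdot|\theta) = \frac{\alpha_T(s_i|\theta)}{\sum_k \alpha_T(s_k|\theta)}$$

    $$P(x|\theta) = \sum_k \alpha_T(s_k|\theta)\,P(s_E|s_k, \theta) = \sum_k \alpha_T(s_k|\theta) \quad \text{since } P(s_E|s_k, \theta) = 1$$

    Figure: trellis over $Y_S, Y_1, \ldots, Y_4, Y_E$ with observations $X_1, \ldots, X_4$ and states $s_1, s_2, s_3$ at each position.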
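12. Baum-Welch re-estimation. The parameters are $\theta = (\pi_1, \cdots, \pi_{|S|}, a_{11}, \cdots, a_{|S||S|}, b_{11}, \cdots, b_{|S||O|})$, where $O$ is the set of observation symbols. With

    $$\gamma_t(s_i|\theta) = \sum_{s_j \in S} \gamma_t(s_i, s_j|\theta), \quad t = 1, \cdots, T-1$$

    the re-estimated parameters are

    $$\bar{\pi}_i = \bar{P}(s_i|s_S, \theta) = \gamma_1(s_i|\theta)$$

    $$\bar{a}_{ij} = \bar{P}(s_j|s_i, \theta) = \frac{\sum_{t=1}^{T-1} \gamma_t(s_i, s_j|\theta)}{\sum_{t=1}^{T-1} \gamma_t(s_i|\theta)}$$

    $$\bar{b}_{ik} = \bar{P}(o_k|s_i, \theta) = \frac{\sum_{t: x_t = o_k} \gamma_t(s_i|\theta)}{\sum_{t=1}^{T} \gamma_t(s_i|\theta)}$$

    and the parameters are updated as $\theta \leftarrow \bar{\theta} = (\cdots, \bar{\pi}_i, \cdots, \bar{a}_{ij}, \cdots, \bar{b}_{ik}, \cdots)$.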
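One Baum-Welch re-estimation step for a single observation sequence might look like the sketch below. It follows the $\gamma$ formulas above, but it is only an illustration: the toy model is hypothetical, the helper names (`forward`, `backward`, `baum_welch_step`) are mine, there is no scaling or log-space arithmetic for long sequences, and every state is assumed to have non-zero posterior mass so the divisions are safe.

```python
# Toy HMM (hypothetical numbers, chosen only for illustration).
START = [0.6, 0.4]                  # pi_i = P(s_i | s_S)
TRANS = [[0.7, 0.3], [0.4, 0.6]]    # a_ij = P(s_j | s_i)
EMIT = [[0.9, 0.1], [0.2, 0.8]]     # b_ik = P(o_k | s_i)


def forward(obs, start, trans, emit):
    n = len(start)
    alpha = [[start[i] * emit[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[-1][j] * trans[j][i] for j in range(n)) * emit[i][obs[t]]
                      for i in range(n)])
    return alpha


def backward(obs, trans, emit):
    n, T = len(trans), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    return beta


def baum_welch_step(obs, start, trans, emit):
    """Return (new_start, new_trans, new_emit, P(x | old parameters))."""
    n, m, T = len(start), len(emit[0]), len(obs)
    alpha, beta = forward(obs, start, trans, emit), backward(obs, trans, emit)
    px = sum(alpha[-1])
    # pair[t][i][j] = gamma_{t+1}(s_i, s_j | theta) in slide notation.
    pair = [[[alpha[t][i] * trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j] / px
              for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # gamma[t][i]: sum over s_j for t < T, and alpha_T / sum_k alpha_T for t = T.
    gamma = [[sum(pair[t][i]) for i in range(n)] for t in range(T - 1)]
    gamma.append([alpha[-1][i] / px for i in range(n)])
    new_start = gamma[0][:]
    new_trans = [[sum(pair[t][i][j] for t in range(T - 1)) /
                  sum(gamma[t][i] for t in range(T - 1))
                  for j in range(n)] for i in range(n)]
    new_emit = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
                 sum(gamma[t][i] for t in range(T))
                 for k in range(m)] for i in range(n)]
    return new_start, new_trans, new_emit, px


obs = [0, 0, 1, 0, 1, 1]
new_start, new_trans, new_emit, likelihood = baum_welch_step(obs, START, TRANS, EMIT)
print(likelihood, new_start)
```

Calling `baum_welch_step` repeatedly with its own output should never decrease the likelihood of the training sequence, which is the usual EM guarantee.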
13. Maximum Entropy Markov Model
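14. Motivation: an HMM models the joint probability of $X$ and $Y$, but labeling only needs the conditional probability of $Y$ given $X$, and we would like to describe observations with arbitrary features, for example whether a word ends in 'er'. Such overlapping features are hard to accommodate in an HMM.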
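15. MEMM: models the conditional probability of $Y$ given $X$ directly. Unlike the HMM, the arrows point from $X_t$ to $Y_t$.

    $$P(Y|X) = \prod_t \sum_s [[Y_{t-1} = s]]\,P_s(Y_t|X_t)$$

    $$P_s(Y_t|X_t) = \frac{1}{Z(X_t, s)} \exp\left(\sum_a \lambda_a f_a(X_t, Y_t)\right)$$

    There is one maximum entropy (ME) model per previous state $s$, and $Z(X_t, Y_{t-1})$ is its normalization term.

    ※ $[[x = y]] = 1$ if $x = y$, and $0$ if $x \neq y$.

    Figure: graphical model of an MEMM, with edges $Y_{t-1} \to Y_t \to Y_{t+1}$ and $X_t \to Y_t$.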
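16. Features: binary (two-valued) features are built from a predicate $b$ on the observation and a state $s$:

    $$f_{b,s}(X_t, Y_t) = \begin{cases} 1 & \text{if } b(X_t) \text{ is true and } Y_t = s \\ 0 & \text{otherwise} \end{cases}$$

    Example: segmenting a Usenet FAQ, where $X_t$ is one line and the labels are head, question, answer, tail. The predicates $b$ are:

    - begins-with-number
    - begins-with-ordinal
    - begins-with-punctuation
    - begins-with-question-word
    - begins-with-subject
    - blank
    - contains-alphanum
    - contains-bracketed-number
    - contains-http
    - contains-non-space
    - contains-number
    - contains-pipe
    - contains-question-mark
    - contains-question-word
    - ends-with-question-mark
    - first-alpha-is-capitalized
    - indented
    - indented-1-to-4
    - indented-5-to-10
    - more-than-one-third-space
    - only-punctuation
    - prev-is-blank
    - prev-begins-with-ordinal
    - shorter-than-30

    ※ For example, $f_{\text{begins-with-number},\,\text{question}} = 1$ when line $t$ begins with a number and the label of line $t$ is question.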
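To make the per-state ME model concrete, here is a small sketch of $P_s(Y_t|X_t)$ with binary features of the form $f_{b,s}$. Only three of the predicates are implemented, their definitions are my own guesses at what the names mean, and the weights are invented for illustration; none of this comes from the slides or the original paper's code.

```python
import math

LABELS = ["head", "question", "answer", "tail"]

# Hypothetical implementations of three predicates b(X_t) on a line of text.
PREDICATES = {
    "begins-with-number": lambda line: line.lstrip()[:1].isdigit(),
    "contains-question-mark": lambda line: "?" in line,
    "blank": lambda line: line.strip() == "",
}


def feature_value(b, s, line, label):
    """f_{b,s}(X_t, Y_t): 1 if b(X_t) is true and Y_t = s, else 0."""
    return 1.0 if PREDICATES[b](line) and label == s else 0.0


def local_distribution(weights, line):
    """P_s(. | X_t) for one previous state s.

    weights maps (b, s) pairs to lambda values; each previous state would
    have its own weights dictionary.
    """
    scores = {
        label: sum(lam * feature_value(b, s, line, label)
                   for (b, s), lam in weights.items())
        for label in LABELS
    }
    z = sum(math.exp(v) for v in scores.values())  # Z(X_t, s)
    return {label: math.exp(v) / z for label, v in scores.items()}


# Invented weights for the model attached to some previous state.
weights = {
    ("begins-with-number", "question"): 2.0,
    ("contains-question-mark", "question"): 1.0,
}
print(local_distribution(weights, "3. What is a CRF?"))
```

Because normalization happens inside `local_distribution`, the probabilities leaving each previous state always sum to 1 regardless of the observation.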
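17. Finding $\hat{y} = \arg\max_y P(y|x)$ with an MEMM. For $x = x_1 \cdots x_T$, define the largest conditional probability of a state path that ends in $s_i$ at time $t$ given $x_1 \cdots x_t$:

    $$\delta_t(s_i|x) = \max_{y_1 \cdots y_{t-1}} P(y_1, \cdots, y_{t-1}, Y_t = s_i|x_1, \cdots, x_t)$$

    Viterbi algorithm:

    1. Initialization from the start state $s_S$: $\delta_1(s_i|x) = P(s_i|s_S, x_1)$
    2. For $t = 1, \ldots, T-1$:
       $\delta_{t+1}(s_i|x) = \max_{s_j}\left[\delta_t(s_j|x)\,P(s_i|s_j, x_{t+1})\right]$
       $\psi_t(s_i|x) = \arg\max_{s_j}\left[\delta_t(s_j|x)\,P(s_i|s_j, x_{t+1})\right]$ ← back pointer
    3. Termination: $\max_y P(y|x) = \max_{s_j} \delta_T(s_j|x)$ and $\hat{y}_T = \arg\max_{s_j} \delta_T(s_j|x)$
    4. Backtracking for $t = T-1, \ldots, 1$: $\hat{y}_t = \psi_t(\hat{y}_{t+1}|x)$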
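18. MEMM parameter estimation with Generalized Iterative Scaling (GIS):

    1. Choose a constant $C$ such that $C = \sum_a f_a(o, s)$ for all $o, s$.
       ※ If the sum is not constant, add a correction feature $f_c(o, s) = C - \sum_a f_a(o, s)$ with $f_c(o, s) \geq 0$ for all $o, s$.
    2. Compute the empirical expectation of each feature from the training data $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$:
       $$\tilde{E}[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t: y_{t-1}^{(i)} = s} f_a(x_t^{(i)}, y_t^{(i)})$$
       where $m_s^{(i)}$ is the number of positions $t$ in sequence $i$ with $y_{t-1}^{(i)} = s$.
    3. Compute the expectation of each feature under the current parameters:
       $$E[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t: y_{t-1}^{(i)} = s} \sum_{y \in S} P_s(y|x_t^{(i)}, \lambda)\,f_a(x_t^{(i)}, y)$$
    4. Update the parameters:
       $$\lambda_a^{\text{new}} = \lambda_a + \frac{1}{C} \log \frac{\tilde{E}[f_a]}{E[f_a]}$$
    5. Repeat steps 3 and 4 until convergence. This is done for the ME model of each state $s$.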
19. Conditional Random Fields
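20. A problem with the MEMM (the label bias problem). Example for the observations $x_1, x_2, x_3$, with these transition probabilities:

    - $s_0 \to s_1$: 0.65, $s_0 \to s_5$: 0.35
    - $s_1 \to s_2$: 0.5, $s_1 \to s_4$: 0.5
    - $s_2 \to s_3$: 1, $s_4 \to s_3$: 1, $s_5 \to s_6$: 1, $s_6 \to s_3$: 1

    Path probabilities:

    - $s_0 \to s_1 \to s_2 \to s_3$: 0.325
    - $s_0 \to s_1 \to s_4 \to s_3$: 0.325
    - $s_0 \to s_5 \to s_6 \to s_3$: 0.35

    Because each state's ME model is normalized locally, the path through $s_5$ wins even though the first step $s_0 \to s_1$ is more probable than $s_0 \to s_5$.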
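The arithmetic of this example is easy to check with a few lines of Python. The transition table below is my reading of the figure from the path probabilities listed on the slide, so treat the edge assignments as an assumption.

```python
# Transition probabilities as read from the example (assumed edge assignment).
TRANSITIONS = {
    ("s0", "s1"): 0.65, ("s0", "s5"): 0.35,
    ("s1", "s2"): 0.5, ("s1", "s4"): 0.5,
    ("s2", "s3"): 1.0, ("s4", "s3"): 1.0,
    ("s5", "s6"): 1.0, ("s6", "s3"): 1.0,
}

PATHS = [
    ["s0", "s1", "s2", "s3"],
    ["s0", "s1", "s4", "s3"],
    ["s0", "s5", "s6", "s3"],
]


def path_probability(path):
    """Product of the locally normalized transition probabilities along a path."""
    p = 1.0
    for prev, cur in zip(path, path[1:]):
        p *= TRANSITIONS[(prev, cur)]
    return p


for path in PATHS:
    print(" -> ".join(path), path_probability(path))

best = max(PATHS, key=path_probability)
print("best:", " -> ".join(best))
```

The two paths through `s1` split its 0.65 of probability mass evenly, so each ends up below the single path through `s5`, whose later transitions have probability 1 whatever the observations are.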
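21. CRF: unlike the MEMM, which normalizes per state, a CRF normalizes once over the whole sequence.

    $$P(Y|X) = \frac{1}{Z(X)} \exp\left(\sum_{t,i} \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_{t,j} \mu_j g_j(Y_t, X, t)\right)$$

    $Z(X)$ is the normalization term.

    Figure: graphical model of a CRF, an undirected chain $Y_{t-1} - Y_t - Y_{t+1}$ in which every $Y_t$ is connected to $X$.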
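22. Writing an HMM as a CRF. Start from

    $$P(X, Y) = P(Y)\,P(X|Y) = \prod_t P(Y_t|Y_{t-1})\,P(X_t|Y_t)$$

    and take the logarithm:

    $$\log \prod_t P(Y_t|Y_{t-1})\,P(X_t|Y_t) = \sum_t \left\{\log P(Y_t|Y_{t-1}) + \log P(X_t|Y_t)\right\}$$

    $$= \sum_t \left\{\sum_{s,s'} [[Y_{t-1} = s']][[Y_t = s]] \log P(s|s') + \sum_{o,s} [[X_t = o]][[Y_t = s]] \log P(o|s)\right\}$$

    $$= \sum_{t,s,s'} \log P(s|s')\,[[Y_{t-1} = s']][[Y_t = s]] + \sum_{t,o,s} \log P(o|s)\,[[X_t = o]][[Y_t = s]]$$

    Setting $\lambda_{s',s} = \log P(s|s')$, $f_{s',s}(Y_{t-1}, Y_t, X, t) = [[Y_{t-1} = s']][[Y_t = s]]$, $\mu_{o,s} = \log P(o|s)$ and $g_{o,s}(Y_t, X, t) = [[X_t = o]][[Y_t = s]]$ gives the CRF form

    $$P(Y|X) = \frac{1}{P(X)} \exp\left(\sum_{t,s,s'} \lambda_{s',s} f_{s',s}(Y_{t-1}, Y_t, X, t) + \sum_{t,o,s} \mu_{o,s} g_{o,s}(Y_t, X, t)\right)$$

    with the constraints $\sum_{s} \exp(\lambda_{s',s}) = 1$ and $\sum_{o} \exp(\mu_{o,s}) = 1$.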
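The correspondence can be checked numerically: with log-probabilities as weights and indicator features, the CRF score of a labeled sequence equals $\log P(X, Y)$ under the HMM. The sketch below uses a hypothetical two-state model with string-valued states and symbols; the names `TRANSITION_WEIGHT`, `EMISSION_WEIGHT`, `crf_score` and `hmm_joint` are illustrative only.

```python
import math

# Toy HMM (hypothetical numbers, chosen only for illustration).
START = {"A": 0.6, "B": 0.4}                                     # P(s | s_S)
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}   # TRANS[prev][cur]
EMIT = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.2, "b": 0.8}}    # EMIT[s][o]

# CRF weights are the log-probabilities of the HMM.
TRANSITION_WEIGHT = {(p, c): math.log(TRANS[p][c]) for p in TRANS for c in TRANS[p]}
TRANSITION_WEIGHT.update({("s_S", c): math.log(START[c]) for c in START})
EMISSION_WEIGHT = {(o, s): math.log(EMIT[s][o]) for s in EMIT for o in EMIT[s]}


def crf_score(x, y):
    """Sum of weight * indicator feature over all positions."""
    score = 0.0
    prev = "s_S"
    for obs, cur in zip(x, y):
        # Exactly one transition indicator and one emission indicator fire here.
        score += TRANSITION_WEIGHT[(prev, cur)] + EMISSION_WEIGHT[(obs, cur)]
        prev = cur
    return score


def hmm_joint(x, y):
    """P(X, Y) computed directly from the HMM factorization."""
    p = 1.0
    prev = None
    for obs, cur in zip(x, y):
        p *= (START[cur] if prev is None else TRANS[prev][cur]) * EMIT[cur][obs]
        prev = cur
    return p


x, y = ["a", "b", "b"], ["A", "B", "B"]
print(math.exp(crf_score(x, y)), hmm_joint(x, y))
```

The two printed numbers agree up to floating-point error. A general CRF drops the requirement that the exponentiated weights sum to 1, which is what makes it strictly more expressive than the HMM.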
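23. Finding the best label sequence with a CRF. Since $Z(X)$ does not depend on $Y$ and $\exp$ is monotonic,

    $$\hat{y} = \arg\max_y P(Y|X) = \arg\max_y \left[\sum_{t,i} \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_{t,j} \mu_j g_j(Y_t, X, t)\right]$$

    Define the score at position $t$:

    $$h_t(Y_{t-1}, Y_t, X) = \sum_i \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_j \mu_j g_j(Y_t, X, t)$$

    and the best score for $x$ over paths that end in state $s_l$ at position $k$:

    $$\delta_k(x, s_l) = \max_{y_1 \cdots y_{k-1}} \left[\sum_{t=1}^{k-1} h_t(y_{t-1}, y_t, x) + h_k(y_{k-1}, s_l, x)\right]$$

    Viterbi algorithm:

    1. Initialization from the start state $s_S$: $\delta_1(x, s_l) = h_1(s_S, s_l, x)$
    2. For $k = 1, \ldots, T-1$:
       $\delta_{k+1}(x, s_l) = \max_{s_m}\left[\delta_k(x, s_m) + h_{k+1}(s_m, s_l, x)\right]$
       $\psi_k(x, s_l) = \arg\max_{s_m}\left[\delta_k(x, s_m) + h_{k+1}(s_m, s_l, x)\right]$ ← back pointer
    3. Termination: $\hat{y}_T = \arg\max_{s_m} \delta_T(x, s_m)$
    4. Backtracking for $t = T-1, \ldots, 1$: $\hat{y}_t = \psi_t(x, \hat{y}_{t+1})$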
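Because the CRF scores add rather than multiply, Viterbi decoding works directly on sums of $h_t$. A minimal sketch, assuming the position score is supplied as a Python callable `h(t, prev, cur, x)` with 1-based `t`; the weights inside `toy_h` are invented for illustration.

```python
START = "s_S"
STATES = ["A", "B"]

# Hypothetical weights: transitions prefer staying in the same state,
# and state features tie symbol "a" to label A and "b" to label B.
TRANSITION = {("s_S", "A"): 0.5, ("A", "A"): 1.0, ("A", "B"): -0.5,
              ("B", "A"): -0.5, ("B", "B"): 1.0}
STATE = {("a", "A"): 2.0, ("b", "B"): 2.0}


def toy_h(t, prev, cur, x):
    """h_t(y_{t-1}, y_t, x) for the toy weights; x[t - 1] is the symbol at position t."""
    return TRANSITION.get((prev, cur), 0.0) + STATE.get((x[t - 1], cur), 0.0)


def crf_viterbi(x, states, h):
    """Return (best label sequence, its unnormalized log-score)."""
    # Step 1: initialization from the start state.
    delta = {s: h(1, START, s, x) for s in states}
    back = []
    # Step 2: max over predecessors, in the log domain (sums, not products).
    for k in range(2, len(x) + 1):
        new_delta, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda m: delta[m] + h(k, m, s, x))
            pointers[s] = best
            new_delta[s] = delta[best] + h(k, best, s, x)
        back.append(pointers)
        delta = new_delta
    # Steps 3 and 4: best final state, then backtracking.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


print(crf_viterbi(["a", "a", "b"], STATES, toy_h))
```

For the toy weights and input `["a", "a", "b"]` the best labeling is `["A", "A", "B"]` with score 7.0. Note that $Z(X)$ is never computed, since it does not change the argmax.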
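24. Forward and backward vectors in matrix form. $M_t(X)$ is a $(|S| + 2) \times (|S| + 2)$ matrix whose rows and columns range over the states $s_m$ together with $s_S$ and $s_E$:

    $$M_t(s_l, s_m|X) = \exp h_t(s_l, s_m, X)$$

    The forward vector $\alpha_t(X)$ has $|S| + 2$ components, one per state $s_l$ including $s_S$ and $s_E$:

    $$\alpha_0(Y|X) = \begin{cases} 1 & \text{if } Y = s_S \\ 0 & \text{otherwise} \end{cases} \qquad \alpha_t(X)^T = \alpha_{t-1}(X)^T M_t(X)$$

    The backward vector $\beta_t(X)$ also has $|S| + 2$ components:

    $$\beta_{T+1}(Y|X) = \begin{cases} 1 & \text{if } Y = s_E \\ 0 & \text{otherwise} \end{cases} \qquad \beta_t(X) = M_{t+1}(X)\,\beta_{t+1}(X)$$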
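The matrix recursions can be sketched with plain dictionaries standing in for the $(|S| + 2) \times (|S| + 2)$ matrices. Several details here are my assumptions rather than something the slide states: transitions into $s_S$ and out of $s_E$ get weight 0, entering $s_E$ is only possible at position $T + 1$ with weight $\exp(0) = 1$, and `toy_h` uses invented weights.

```python
import math

START, END = "s_S", "s_E"
STATES = ["A", "B"]

# Hypothetical weights, for illustration only.
TRANSITION = {("s_S", "A"): 0.5, ("A", "A"): 1.0, ("A", "B"): -0.5,
              ("B", "A"): -0.5, ("B", "B"): 1.0}
STATE = {("a", "A"): 2.0, ("b", "B"): 2.0}


def toy_h(t, prev, cur, x):
    return TRANSITION.get((prev, cur), 0.0) + STATE.get((x[t - 1], cur), 0.0)


def transition_matrix(t, x, states, h):
    """M_t as a nested dict over the |S| + 2 states (1-based t, up to T + 1)."""
    all_states = [START] + states + [END]
    M = {l: {m: 0.0 for m in all_states} for l in all_states}
    for l in [START] + states:
        if t <= len(x):
            for m in states:
                M[l][m] = math.exp(h(t, l, m, x))
        else:
            M[l][END] = 1.0  # assumed: entering s_E carries no weight
    return M


def forward_partition(x, states, h):
    """Z(x) from alpha_t^T = alpha_{t-1}^T M_t, starting at alpha_0 = [[Y = s_S]]."""
    all_states = [START] + states + [END]
    alpha = {s: 1.0 if s == START else 0.0 for s in all_states}
    for t in range(1, len(x) + 2):
        M = transition_matrix(t, x, states, h)
        alpha = {m: sum(alpha[l] * M[l][m] for l in all_states) for m in all_states}
    return alpha[END]


def backward_partition(x, states, h):
    """Z(x) from beta_{t-1} = M_t beta_t, starting at beta_{T+1} = [[Y = s_E]]."""
    all_states = [START] + states + [END]
    beta = {s: 1.0 if s == END else 0.0 for s in all_states}
    for t in range(len(x) + 1, 0, -1):
        M = transition_matrix(t, x, states, h)
        beta = {l: sum(M[l][m] * beta[m] for m in all_states) for l in all_states}
    return beta[START]


x = ["a", "a", "b"]
print(forward_partition(x, STATES, toy_h), backward_partition(x, STATES, toy_h))
```

Both directions yield the same value, the $(s_S, s_E)$ entry of the product $M_1 M_2 \cdots M_{T+1}$, which is the normalization term $Z(x)$ summed over all label sequences.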
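25. CRF parameter estimation with Generalized Iterative Scaling:

    1. Choose a constant $C$ such that
       $$C = \sum_{t,i} f_i(y_{t-1}, y_t, x, t) + \sum_{t,j} g_j(y_t, x, t)$$
       ※ If the sum is not constant, add a correction feature $c(x, y) = C - \sum_{t,i} f_i(y_{t-1}, y_t, x, t) - \sum_{t,j} g_j(y_t, x, t)$ with $c(x^{(k)}, y^{(k)}) \geq 0$ for $k = 1, \cdots, n$.
    2. Compute the empirical expectation of each feature from the training data $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$:
       $$\tilde{E}[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t f_i(y_{t-1}^{(k)}, y_t^{(k)}, x^{(k)}, t)$$
    3. Compute the expectation of each feature under the current parameters:
       $$E[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t \sum_{s_l, s_m} P(Y_{t-1} = s_l, Y_t = s_m|x^{(k)}, \lambda, \mu)\,f_i(s_l, s_m, x^{(k)}, t)$$
       $$P(Y_{t-1} = s_l, Y_t = s_m|x, \lambda, \mu) = \frac{\alpha_{t-1}(s_l|x, \lambda, \mu)\,M_t(s_l, s_m|x, \lambda, \mu)\,\beta_t(s_m|x, \lambda, \mu)}{Z(x|\lambda, \mu)}$$
       $$Z(x) = \left[\prod_{t=1}^{T+1} M_t(x)\right]_{s_S, s_E}$$
    4. Update the parameters:
       $$\lambda_i^{\text{new}} = \lambda_i + \frac{1}{C} \log \frac{\tilde{E}[f_i]}{E[f_i]}$$
    5. Repeat steps 3 and 4 until convergence.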
26. References

    - [...]. [...], 1999.
    - A. McCallum, D. Freitag, and F. Pereira. Maximum entropy Markov models for information extraction and segmentation. Proc. ICML, pp. 591-598, 2000.
    - J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proc. ICML, pp. 282-289, 2001.
    - Charles Elkan. Log-Linear Models and Conditional Random Fields. Notes for a tutorial at CIKM, 2008.
    - Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania, 2004.
    - [...]. Conditional Random Fields [...], pp. 89-96, 2004.
    - http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706b.pdf
    - http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706a.pdf
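27. Conditional Random Fields in general: a Markov random field over $Y$ conditioned on $X$ (cf. PRML, chapter 8).

    Figure: an undirected graph over $Y_1, \ldots, Y_6$, conditioned on $X$.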
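28. Chain-structured CRFs: the labels $Y_1, \ldots, Y_6$ form a chain conditioned on $X$, and the cliques are formed by adjacent labels. Following PRML, chapter 8:

    $$p(y|x) = \frac{1}{Z} \prod_C \psi_C(y_C|x) = \frac{1}{Z} \exp\left(-\sum_C E(y_C|x)\right)$$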
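29. Energy function. With

    $$p(y|x) = \frac{1}{Z} \prod_C \psi_C(y_C|x) = \frac{1}{Z} \exp\left(-\sum_C E(y_C|x)\right)$$

    the energy function $E$ is

    $$E(y_C|x) = -\sum_j \lambda_j t_j(y_{i-1}, y_i, x, i) - \sum_k \mu_k s_k(y_i, x, i)$$

    where $t_j$ depends on $y_{i-1}$ and $y_i$, and $s_k$ depends on $y_i$ only. Writing $s_k$ as $s_k(y_{i-1}, y_i, x, i)$ merges the 2 kinds of features into one:

    $$E(y_C|x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$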
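30. From the energy function to the CRF. Starting from

    $$E(y_C|x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$

    summing over the cliques gives

    $$-\sum_C E(y_C|x) = \sum_C \sum_j \lambda_j f_j(y_{i-1}, y_i, x, i) = \sum_j \lambda_j F_j(y, x)$$

    $$F_j(y, x) = \sum_i f_j(y_{i-1}, y_i, x, i)$$

    where each clique $C$ corresponds to a position $i$, that is, to the pair $(y_{i-1}, y_i)$. This yields the CRF:

    $$p(y|x) = \frac{1}{Z} \exp\left(\sum_j \lambda_j F_j(y, x)\right)$$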