HMM, MEMM, CRF Notes
Hidden Markov Model
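An HMM models the joint probability of an observation sequence $X$ and a hidden state sequence $Y$:

$$P(X, Y) = P(Y)\,P(X \mid Y) = \prod_t P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t)$$

[Figure: graphical model of the HMM, with the state chain $Y_{t-1} \to Y_t \to Y_{t+1}$ and one observation ($X_{t-1}$, $X_t$, $X_{t+1}$) emitted from each state.]

This is a first-order HMM: each state depends only on the previous state. The Viterbi algorithm finds the state sequence that maximizes $P(X, Y)$.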
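The probability of an observation sequence is obtained by summing out the states:

$$P(X) = \sum_Y P(X, Y)$$

Since $P(X)$ does not depend on $Y$, the most probable state sequence can be found from the joint probability alone:

$$\arg\max_Y P(Y \mid X) = \arg\max_Y \frac{P(X, Y)}{P(X)} = \arg\max_Y P(X, Y)$$

Problems for HMMs:
• Probability of an observation sequence $X$ → forward algorithm
• Most probable state sequence $Y$ for $X$ → Viterbi algorithm
• Parameter estimation by EM → Baum-Welch algorithm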
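As a small illustration of the two formulas above, the following sketch evaluates $P(X, Y)$, $P(X)$ and $\arg\max_Y P(X, Y)$ by brute-force enumeration. The two-state model, its probabilities and the names `trans`, `emit`, `joint`, `marginal` and `best_sequence` are all hypothetical, and the explicit start/end states `sS`/`sE` are an assumption borrowed from the later slides.

```python
from itertools import product

START, END = "sS", "sE"
STATES = ["s1", "s2"]

# Hypothetical parameters (not from the slides).
# trans[prev][cur] = P(Y_t = cur | Y_{t-1} = prev); each row sums to 1.
trans = {
    START: {"s1": 0.6, "s2": 0.4},
    "s1": {"s1": 0.5, "s2": 0.3, END: 0.2},
    "s2": {"s1": 0.4, "s2": 0.4, END: 0.2},
}
# emit[state][obs] = P(X_t = obs | Y_t = state)
emit = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.2, "b": 0.8}}


def joint(xs, ys):
    """P(X = xs, Y = ys): product of transition and emission probabilities."""
    p, prev = 1.0, START
    for x, y in zip(xs, ys):
        p *= trans[prev][y] * emit[y][x]
        prev = y
    return p * trans[prev][END]  # assumed final transition into the end state


def marginal(xs):
    """P(X = xs) by summing the joint over every state sequence (exponential cost)."""
    return sum(joint(xs, ys) for ys in product(STATES, repeat=len(xs)))


def best_sequence(xs):
    """arg max_Y P(X, Y), which equals arg max_Y P(Y | X)."""
    return max(product(STATES, repeat=len(xs)), key=lambda ys: joint(xs, ys))
```

Enumeration costs $O(|S|^T)$, which is why the dynamic-programming algorithms are needed in practice.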
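Forward algorithm

Computes $P(X = x)$ for $x = x_1 \cdots x_T$, with $x_t \in O$ and $y_t \in S$, in $O(|S|^2 T)$ time.

$$P(X = x) = \sum_y P(X_1 = x_1, \cdots, X_T = x_T, Y = y)$$

The forward variable is the probability of emitting $x_1 \cdots x_t$ and being in state $s_i$ at time $t$:

$$\alpha_t(x, s_i) = P(X_1 = x_1, \cdots, X_t = x_t, Y_t = s_i)$$

1. Initialization (from the start state $s_S$):
$$\alpha_1(x, s_i) = P(Y_1 = s_i \mid Y_S = s_S)\,P(X_1 = x_1 \mid Y_1 = s_i)$$
2. Recursion ($t = 1, ..., T - 1$):
$$\alpha_{t+1}(x, s_i) = \left[\sum_j \alpha_t(x, s_j)\,P(s_i \mid s_j)\right] P(x_{t+1} \mid s_i)$$
3. Termination (into the end state $s_E$):
$$P(x) = \sum_j \alpha_T(x, s_j)\,P(s_E \mid s_j)$$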
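The three steps of the forward algorithm can be sketched as follows. This is a minimal sketch under assumed toy parameters: the model in `trans` and `emit` is hypothetical, and `forward` is an illustrative name, not something defined in the slides.

```python
START, END = "sS", "sE"
STATES = ["s1", "s2"]

# Hypothetical parameters (not from the slides).
trans = {
    START: {"s1": 0.6, "s2": 0.4},
    "s1": {"s1": 0.5, "s2": 0.3, END: 0.2},
    "s2": {"s1": 0.4, "s2": 0.4, END: 0.2},
}
emit = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.2, "b": 0.8}}


def forward(xs):
    """Return P(x) using the forward recursion in O(|S|^2 T) time."""
    # 1. Initialization from the start state.
    alpha = {i: trans[START][i] * emit[i][xs[0]] for i in STATES}
    # 2. Recursion over the remaining observations.
    for x in xs[1:]:
        alpha = {
            i: sum(alpha[j] * trans[j][i] for j in STATES) * emit[i][x]
            for i in STATES
        }
    # 3. Termination into the end state.
    return sum(alpha[j] * trans[j][END] for j in STATES)
```

For the observation sequence `["a", "b"]` the intermediate values are $\alpha_1 = (0.42, 0.08)$ and $\alpha_2 = (0.0726, 0.1264)$, giving $P(x) = 0.0398$.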
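[Figure: trellis for $|S| = 3$ and $T = 4$ (states $s_S, s_1, s_2, s_3, s_E$; variables $Y_S, Y_1, \ldots, Y_4, Y_E$; observations $X_1, \ldots, X_4$), highlighting the paths that contribute to $\alpha_2(x, s_2) = P(x_1, x_2, Y_2 = s_2)$.]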
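Backward algorithm

Also computes $P(x)$ for $x = x_1 \cdots x_T$ in $O(|S|^2 T)$ time, working from the end of the sequence. The backward variable is the probability of emitting the remaining observations $x_{t+1} \cdots x_T$ and reaching the end state, given state $s_i$ at time $t$:

$$\beta_t(x, s_i) = \begin{cases} P(x_{t+1}, \cdots, x_T, Y_E = s_E \mid Y_t = s_i) & \text{if } t = 1, \cdots, T - 1 \\ P(Y_E = s_E \mid Y_t = s_i) & \text{if } t = T \end{cases}$$

1. Initialization (from the end state $s_E$):
$$\beta_T(x, s_i) = P(s_E \mid s_i)$$
2. Recursion ($t = T - 1, ..., 1$):
$$\beta_t(x, s_i) = \sum_j P(s_j \mid s_i)\,P(x_{t+1} \mid s_j)\,\beta_{t+1}(x, s_j)$$
3. Termination (at the start state $s_S$):
$$P(x) = \sum_j P(s_j \mid s_S)\,P(x_1 \mid s_j)\,\beta_1(x, s_j)$$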
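The backward pass can be sketched in the same way. The toy parameters and the name `backward` are hypothetical; the sketch treats $\beta_t$ as a probability conditioned on the state at time $t$, which is the standard reading of the recursion.

```python
START, END = "sS", "sE"
STATES = ["s1", "s2"]

# Hypothetical parameters (not from the slides).
trans = {
    START: {"s1": 0.6, "s2": 0.4},
    "s1": {"s1": 0.5, "s2": 0.3, END: 0.2},
    "s2": {"s1": 0.4, "s2": 0.4, END: 0.2},
}
emit = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.2, "b": 0.8}}


def backward(xs):
    """Return P(x) using the backward recursion in O(|S|^2 T) time."""
    # 1. Initialization: beta_T(s_i) = P(s_E | s_i).
    beta = {i: trans[i][END] for i in STATES}
    # 2. Recursion for t = T-1, ..., 1; x_next plays the role of x_{t+1}.
    for x_next in reversed(xs[1:]):
        beta = {
            i: sum(trans[i][j] * emit[j][x_next] * beta[j] for j in STATES)
            for i in STATES
        }
    # 3. Termination at the start state, emitting x_1.
    return sum(trans[START][j] * emit[j][xs[0]] * beta[j] for j in STATES)
```

On `["a", "b"]` this gives $\beta_1 = (0.078, 0.088)$ and $P(x) = 0.0398$, the same value the forward recursion yields for this toy model.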
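[Figure: trellis for $|S| = 3$ and $T = 4$ (states $s_S, s_1, s_2, s_3, s_E$; variables $Y_S, Y_1, \ldots, Y_4, Y_E$; observations $X_1, \ldots, X_4$), highlighting the paths that contribute to $\beta_3(x, s_1)$, the probability of the remaining observation $x_4$ given $Y_3 = s_1$.]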
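Viterbi algorithm

Finds the most probable state sequence $\hat{y} = \arg\max_y P(y \mid x)$ for $x = x_1 \cdots x_T$ in $O(|S|^2 T)$ time. $\delta_t(x, s_i)$ is the probability of the best state sequence that emits $x_1 \cdots x_t$ and is in state $s_i$ at time $t$:

$$\delta_t(x, s_i) = \max_{y_1 \cdots y_{t-1}} P(x_1, \cdots, x_t, y_1, \cdots, y_{t-1}, Y_t = s_i)$$

1. Initialization (from the start state $s_S$):
$$\delta_1(x, s_i) = P(s_i \mid s_S)\,P(x_1 \mid s_i)$$
2. Recursion ($t = 1, ..., T - 1$), storing the back pointers $\psi_t$:
$$\delta_{t+1}(x, s_i) = \max_{s_j} \left[\delta_t(x, s_j)\,P(s_i \mid s_j)\right] P(x_{t+1} \mid s_i)$$
$$\psi_t(x, s_i) = \arg\max_{s_j} \left[\delta_t(x, s_j)\,P(s_i \mid s_j)\right]$$
3. Termination (into the end state $s_E$):
$$\max_y P(x, y) = \max_{s_j} \left[\delta_T(x, s_j)\,P(s_E \mid s_j)\right]$$
$$\hat{y}_T = \arg\max_{s_j} \left[\delta_T(x, s_j)\,P(s_E \mid s_j)\right]$$
4. Backtracking ($t = T - 1, ..., 1$), following the back pointers:
$$\hat{y}_t = \psi_t(x, \hat{y}_{t+1})$$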
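A minimal sketch of the four Viterbi steps, again on a hypothetical two-state model; `viterbi`, `trans` and `emit` are illustrative names and values, and the final transition into `sE` is included in the last maximization as an assumption.

```python
START, END = "sS", "sE"
STATES = ["s1", "s2"]

# Hypothetical parameters (not from the slides).
trans = {
    START: {"s1": 0.6, "s2": 0.4},
    "s1": {"s1": 0.5, "s2": 0.3, END: 0.2},
    "s2": {"s1": 0.4, "s2": 0.4, END: 0.2},
}
emit = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.2, "b": 0.8}}


def viterbi(xs):
    """Return (best state sequence, max_y P(x, y))."""
    # 1. Initialization.
    delta = {i: trans[START][i] * emit[i][xs[0]] for i in STATES}
    back_pointers = []
    # 2. Recursion: psi[i] is the best predecessor of state i.
    for x in xs[1:]:
        psi = {i: max(STATES, key=lambda j: delta[j] * trans[j][i]) for i in STATES}
        delta = {i: delta[psi[i]] * trans[psi[i]][i] * emit[i][x] for i in STATES}
        back_pointers.append(psi)
    # 3. Termination, including the transition into the end state.
    last = max(STATES, key=lambda j: delta[j] * trans[j][END])
    best = delta[last] * trans[last][END]
    # 4. Backtracking.
    path = [last]
    for psi in reversed(back_pointers):
        path.append(psi[path[-1]])
    path.reverse()
    return path, best
```

Replacing the sum of the forward recursion by a max is the only change to the recursion itself; the back pointers are what make the sequence recoverable.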
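Viterbi example ($|S| = 3$, $T = 4$)

Back pointers $\psi_t(x, s_j)$:

s_j   t=1   t=2   t=3
s_1   s_1   s_3   s_1
s_2   s_1   s_2   s_3
s_3   s_2   s_1   s_1

The final step selects $\hat{y}_4 = s_1$. Following the back pointers gives $\hat{y}_3 = \psi_3(x, s_1) = s_1$, $\hat{y}_2 = \psi_2(x, s_1) = s_3$ and $\hat{y}_1 = \psi_1(x, s_3) = s_2$, so the best path is

s_S → s_2 → s_3 → s_1 → s_1 → s_E

[Figure: trellis over $Y_S, Y_1, \ldots, Y_4, Y_E$ and $x_1, \ldots, x_4$ with the best path highlighted.]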
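The backtracking step on this example can be replayed in a few lines. The table `psi` below is transcribed from the slide (rows $s_j$, columns $t = 1, 2, 3$); the function name `backtrack` is hypothetical.

```python
# psi[t][s_j]: best predecessor at time t of state s_j at time t + 1,
# transcribed from the slide's back-pointer table.
psi = {
    1: {"s1": "s1", "s2": "s1", "s3": "s2"},
    2: {"s1": "s3", "s2": "s2", "s3": "s1"},
    3: {"s1": "s1", "s2": "s3", "s3": "s1"},
}


def backtrack(psi, last_state, T):
    """Follow back pointers from y_T down to y_1 and return y_1 ... y_T."""
    path = [last_state]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return path


best_path = backtrack(psi, "s1", 4)
```

`best_path` is `["s2", "s3", "s1", "s1"]`, which matches the highlighted path between `sS` and `sE`.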
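Baum-Welch algorithm

Under the current parameters $\theta$, the posterior probability of each transition is computed from the forward and backward variables:

$$\xi_t(s_i, s_j \mid \theta) = P(Y_t = s_i, Y_{t+1} = s_j \mid x, \theta) = \frac{\alpha_t(s_i \mid \theta)\,P(s_j \mid s_i, \theta)\,P(x_{t+1} \mid s_j, \theta)\,\beta_{t+1}(s_j \mid \theta)}{\sum_k \alpha_T(s_k \mid \theta)\,P(s_E \mid s_k, \theta)}$$

The denominator is $P(x \mid \theta)$. For the last position,

$$\xi_T(s_i, \cdot \mid \theta) = \frac{\alpha_T(s_i \mid \theta)}{\sum_k \alpha_T(s_k \mid \theta)}$$

which is the state posterior $\gamma_T(s_i \mid \theta)$.

[Figure: trellis over $Y_S, Y_1, \ldots, Y_4, Y_E$ and $X_1, \ldots, X_4$ with states $s_S, s_1, s_2, s_3, s_E$.]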
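The parameters $\theta$ are re-estimated as $\bar{\theta}$:

$$\theta = (\pi_1, \cdots, \pi_{|S|}, a_{11}, \cdots, a_{|S||S|}, b_{11}, \cdots, b_{|S||O|})$$
$$\bar{\theta} = (\cdots, \bar{\pi}_i, \cdots, \bar{a}_{ij}, \cdots, \bar{b}_{ik}, \cdots)$$

where $O$ is the set of observation symbols. With the state posterior

$$\gamma_t(s_i \mid \theta) = \sum_{s_j \in S} \xi_t(s_i, s_j \mid \theta), \quad t = 1, \cdots, T - 1$$

the updates are

$$\bar{\pi}_i = \bar{P}(s_i \mid s_S, \theta) = \gamma_1(s_i \mid \theta)$$
$$\bar{a}_{ij} = \bar{P}(s_j \mid s_i, \theta) = \frac{\sum_{t=1}^{T-1} \xi_t(s_i, s_j \mid \theta)}{\sum_{t=1}^{T-1} \gamma_t(s_i \mid \theta)}$$
$$\bar{b}_{ik} = \bar{P}(o_k \mid s_i, \theta) = \frac{\sum_{t: x_t = o_k} \gamma_t(s_i \mid \theta)}{\sum_{t=1}^{T} \gamma_t(s_i \mid \theta)}$$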
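One Baum-Welch iteration on a single observation sequence can be sketched as below. This is a simplified sketch, not the slides' exact setup: it assumes no explicit end state (equivalently, $P(s_E \mid s_k)$ is treated as constant), the parameters `pi`, `a`, `b` are hypothetical, and $\gamma_t$ is computed as $\alpha_t \beta_t / P(x)$, which equals $\sum_j \xi_t$ for $t < T$.

```python
STATES = ["s1", "s2"]
OBS = ["a", "b"]

# Hypothetical current parameters theta = (pi, a, b).
pi = {"s1": 0.6, "s2": 0.4}
a = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
b = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.2, "b": 0.8}}


def forward_backward(xs):
    """Return the lists alpha[t] and beta[t] (0-based t) as dicts over states."""
    T = len(xs)
    alpha = [{i: pi[i] * b[i][xs[0]] for i in STATES}]
    for t in range(1, T):
        prev = alpha[-1]
        alpha.append({i: sum(prev[j] * a[j][i] for j in STATES) * b[i][xs[t]]
                      for i in STATES})
    beta = [{i: 1.0 for i in STATES} for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = {i: sum(a[i][j] * b[j][xs[t + 1]] * beta[t + 1][j] for j in STATES)
                   for i in STATES}
    return alpha, beta


def baum_welch_step(xs):
    """Compute xi and gamma, then return the re-estimated (pi, a, b)."""
    T = len(xs)
    alpha, beta = forward_backward(xs)
    px = sum(alpha[-1].values())  # P(x | theta)
    xi = [{(i, j): alpha[t][i] * a[i][j] * b[j][xs[t + 1]] * beta[t + 1][j] / px
           for i in STATES for j in STATES} for t in range(T - 1)]
    gamma = [{i: alpha[t][i] * beta[t][i] / px for i in STATES} for t in range(T)]
    new_pi = dict(gamma[0])
    new_a = {i: {j: sum(x_t[(i, j)] for x_t in xi) / sum(g[i] for g in gamma[:-1])
                 for j in STATES} for i in STATES}
    new_b = {i: {o: sum(g[i] for g, x in zip(gamma, xs) if x == o)
                 / sum(g[i] for g in gamma) for o in OBS} for i in STATES}
    return new_pi, new_a, new_b
```

Each re-estimated distribution sums to one by construction; in practice the step is repeated until the likelihood stops improving, and the counts are accumulated over all training sequences.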
Maximum Entropy Markov Model
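In an HMM it is difficult to use arbitrary features of the observations, for example whether a word ends in ‘er’.
→ Describe each observation by features.
→ Model the conditional probability of $Y$ given $X$ directly.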
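Notation:

$$[[x = y]] = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{if } x \neq y \end{cases}$$

MEMM

An MEMM models the conditional probability of $Y$ given $X$. For each previous state $s$ there is one maximum entropy (ME) model:

$$P_s(Y_t \mid X_t) = \frac{1}{Z(X_t, s)} \exp\left(\sum_a \lambda_a f_a(X_t, Y_t)\right)$$

where $Z(X_t, Y_{t-1})$ is the normalization term. The probability of the whole sequence is

$$P(Y \mid X) = \prod_t \prod_s P_s(Y_t \mid X_t)^{[[Y_{t-1} = s]]}$$

[Figure: graphical model of the MEMM over $Y_{t-1}, Y_t, Y_{t+1}$ and $X_{t-1}, X_t, X_{t+1}$; compared with the HMM, the arrows between $X_t$ and $Y_t$ are reversed.]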
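Example of binary features

$f_{\langle \text{begins-with-number},\, \text{question} \rangle} = 1$ when the line begins with a number and its label is question.

Line features used for segmenting Usenet FAQ documents: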
begins-with-number
begins-with-ordinal
begins-with-punctuation
begins-with-question-word
begins-with-subject
blank
contains-alphanum
contains-bracketed-number
contains-http
contains-non-space
contains-number
contains-pipe
contains-question-mark
contains-question-word
ends-with-question-mark
first-alpha-is-capitalized
indented
indented-1-to-4
indented-5-to-10
more-than-one-third-space
only-punctuation
prev-is-blank
prev-begins-with-ordinal
shorter-than-30
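The prev- features look at the previous line $X_{t-1}$.

States: head, question, answer, tail

$$f_{\langle b, s \rangle}(X_t, Y_t) = \begin{cases} 1 & \text{if } b(X_t) \text{ is true and } Y_t = s \\ 0 & \text{otherwise} \end{cases}$$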
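The feature definition and the per-state exponential model can be sketched together. Only two of the listed predicates are implemented, the label set is cut down to two states, and the weights in `lam` are hypothetical values chosen for illustration, not trained parameters.

```python
import math

LABELS = ["question", "answer"]


def active_features(line, label):
    """Return the features f_<b, s> that fire (value 1) for this line and label."""
    feats = []
    if line.rstrip().endswith("?"):
        feats.append(("ends-with-question-mark", label))
    if line[:1].isdigit():
        feats.append(("begins-with-number", label))
    return feats


# lam[s]: weights of the ME model used when the previous state is s (hypothetical).
lam = {
    "question": {("ends-with-question-mark", "question"): 1.0,
                 ("begins-with-number", "answer"): 0.5},
    "answer": {("ends-with-question-mark", "question"): 2.0},
}


def p_s(prev_state, line):
    """P_s(Y_t | X_t): locally normalized distribution over the next label."""
    scores = {
        y: math.exp(sum(lam[prev_state].get(f, 0.0) for f in active_features(line, y)))
        for y in LABELS
    }
    z = sum(scores.values())  # Z(X_t, s)
    return {y: v / z for y, v in scores.items()}
```

For example, `p_s("answer", "What is this?")` puts probability $e^2 / (1 + e^2) \approx 0.88$ on `question`, because only the question-mark feature fires and only the `question` label has a weight for it.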
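Viterbi algorithm for MEMM

$$\hat{y} = \arg\max_y P(y \mid x)$$

For $x = x_1 \cdots x_T$, $\delta_t(s_i \mid x)$ is the probability of the best state sequence for $x_1 \cdots x_t$ that is in state $s_i$ at time $t$:

$$\delta_t(s_i \mid x) = \max_{y_1 \cdots y_{t-1}} P(y_1, \cdots, y_{t-1}, Y_t = s_i \mid x_1, \cdots, x_t)$$

1. Initialization (from the start state $s_S$):
$$\delta_1(s_i \mid x) = P(s_i \mid s_S, x_1)$$
2. Recursion ($t = 1, ..., T - 1$):
$$\delta_{t+1}(s_i \mid x) = \max_{s_j} \left[\delta_t(s_j \mid x)\,P(s_i \mid s_j, x_{t+1})\right]$$
$$\psi_t(s_i \mid x) = \arg\max_{s_j} \left[\delta_t(s_j \mid x)\,P(s_i \mid s_j, x_{t+1})\right]$$
3. Termination:
$$\max_y P(y \mid x) = \max_{s_j} \delta_T(s_j \mid x)$$
$$\hat{y}_T = \arg\max_{s_j} \delta_T(s_j \mid x)$$
4. Backtracking ($t = T - 1, ..., 1$), following the back pointers:
$$\hat{y}_t = \psi_t(\hat{y}_{t+1} \mid x)$$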
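Parameter estimation for MEMM

Training data: $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$

The parameters are estimated with Generalized Iterative Scaling, separately for the ME model of each state $s$:

1. Choose a constant $C$ such that the feature values sum to $C$ for every $o, s$, adding a correction feature $f_c$:
$$f_c(o, s) = C - \sum_a f_a(o, s), \qquad f_c(o, s) \ge 0 \quad \forall o, s$$
With $f_c$ included in the sum, $C = \sum_a f_a(o, s)$.
2. Compute the empirical expectation of each feature:
$$\tilde{E}[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t: y_{t-1}^{(i)} = s} f_a(x_t^{(i)}, y_t^{(i)})$$
3. Compute the expectation of each feature under the current model, using the observed $x$:
$$E[f_a] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_s^{(i)}} \sum_{t: y_{t-1}^{(i)} = s} \sum_{y \in S} P_s(y \mid x_t^{(i)}, \lambda)\,f_a(x_t^{(i)}, y)$$
4. Update the parameters:
$$\lambda_a^{\text{new}} = \lambda_a + \frac{1}{C} \log\left(\frac{\tilde{E}[f_a]}{E[f_a]}\right)$$
5. Repeat steps 3 and 4 until convergence.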
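The update rule can be tried on a toy problem. The sketch below is a simplification under stated assumptions: it trains the ME model of a single previous state, averages over tokens rather than using the per-sequence normalization $1/m_s^{(i)}$, and uses indicator features $f_{\langle o', y' \rangle}(o, y) = [[o = o']][[y = y']]$, for which exactly one feature fires per pair, so $C = 1$ and no correction feature is needed. The data and all names are hypothetical.

```python
import math

LABELS = ["q", "a"]
# Hypothetical (observation, label) pairs that share the same previous state s.
data = [("x1", "q"), ("x1", "q"), ("x1", "a"),
        ("x2", "a"), ("x2", "a"), ("x2", "a"), ("x2", "q")]
FEATURES = [(o, y) for o in ("x1", "x2") for y in LABELS]
C = 1  # exactly one indicator feature is active for every (o, y)


def f(feature, o, y):
    """Indicator feature f_<o', y'>(o, y)."""
    return 1.0 if feature == (o, y) else 0.0


def prob(lam, o):
    """P_s(y | o, lambda) for the current weights."""
    scores = {y: math.exp(sum(lam[k] * f(k, o, y) for k in FEATURES)) for y in LABELS}
    z = sum(scores.values())
    return {y: v / z for y, v in scores.items()}


def gis(iterations=5):
    """Run GIS updates lambda <- lambda + (1/C) log(E~[f] / E[f])."""
    lam = {k: 0.0 for k in FEATURES}
    n = len(data)
    empirical = {k: sum(f(k, o, y) for o, y in data) / n for k in FEATURES}
    for _ in range(iterations):
        expected = {
            k: sum(prob(lam, o)[y] * f(k, o, y) for o, _label in data for y in LABELS) / n
            for k in FEATURES
        }
        lam = {k: lam[k] + math.log(empirical[k] / expected[k]) / C for k in FEATURES}
    return lam
```

With these saturated indicator features the model reproduces the empirical conditional frequencies, $P(q \mid x1) = 2/3$ and $P(a \mid x2) = 3/4$. A feature whose empirical expectation is zero would need special handling, since its log-ratio is undefined.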
Conditional Random Fields
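Problem with MEMM: label bias

[Figure: state transition diagram over states $s_0, \ldots, s_6$ for the observations $x_1 x_2 x_3$. Transition probabilities given by the ME models: $s_0 \to s_1$: 0.65, $s_0 \to s_5$: 0.35, $s_1 \to s_2$: 0.5, $s_1 \to s_4$: 0.5, and 1 for $s_2 \to s_3$, $s_4 \to s_3$, $s_5 \to s_6$ and $s_6 \to s_3$.]

Path probabilities:
• s0 → s1 → s2 → s3 : 0.325
• s0 → s1 → s4 → s3 : 0.325
• s0 → s5 → s6 → s3 : 0.35

Because each ME model is normalized per state, a state with a single successor passes on its whole probability mass (probability 1) whatever the observation. The path through $s_5$ is therefore selected, although $s_1$ is the more probable first transition.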
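The arithmetic behind the three path probabilities can be checked with a short script. The assignment of the values 0.65, 0.35, 0.5 and 1 to individual edges is inferred from the path probabilities on the slide, and the helper `paths` is a hypothetical name.

```python
# Per-state transition distributions, inferred from the slide's path probabilities.
trans = {
    "s0": {"s1": 0.65, "s5": 0.35},
    "s1": {"s2": 0.5, "s4": 0.5},
    "s2": {"s3": 1.0},
    "s4": {"s3": 1.0},
    "s5": {"s6": 1.0},
    "s6": {"s3": 1.0},
}


def paths(state, goal):
    """Enumerate (path, probability) pairs from state to goal."""
    if state == goal:
        return [([state], 1.0)]
    found = []
    for nxt, p in trans[state].items():
        for tail, q in paths(nxt, goal):
            found.append(([state] + tail, p * q))
    return found


all_paths = paths("s0", "s3")
best_path, best_prob = max(all_paths, key=lambda item: item[1])
mass_via_s1 = sum(p for path, p in all_paths if path[1] == "s1")
```

`best_path` is `["s0", "s5", "s6", "s3"]` with probability 0.35, while the two paths through `s1` carry a combined mass of 0.65 that is split evenly between them.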
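A CRF models the conditional probability of the whole state sequence with a single exponential model:

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\left(\sum_{t,i} \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_{t,j} \mu_j g_j(Y_t, X, t)\right)$$

$Z(X)$ is the normalization term for the observation sequence $X$. Unlike the MEMM, the normalization is over whole state sequences rather than per state.

[Figure: graphical model of the CRF over $Y_{t-1}, Y_t, Y_{t+1}$, conditioned on $X$, compared with the MEMM and the HMM.]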
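HMM as a special case of CRF

Taking the logarithm of the HMM joint probability $P(X, Y) = P(Y)\,P(X \mid Y) = \prod_t P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t)$:

$$\log\left(\prod_t P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t)\right) = \sum_t \left\{\log P(Y_t \mid Y_{t-1}) + \log P(X_t \mid Y_t)\right\}$$
$$= \sum_t \left\{ \sum_{\langle s, s' \rangle} [[Y_{t-1} = s']][[Y_t = s]] \log P(s \mid s') + \sum_{\langle o, s \rangle} [[X_t = o]][[Y_t = s]] \log P(o \mid s) \right\}$$
$$= \sum_{t, \langle s, s' \rangle} \log P(s \mid s')\,[[Y_{t-1} = s']][[Y_t = s]] + \sum_{t, \langle o, s \rangle} \log P(o \mid s)\,[[X_t = o]][[Y_t = s]]$$

Define the transition features and weights $f_{\langle s, s' \rangle}(Y_{t-1}, Y_t, X, t) = [[Y_{t-1} = s']][[Y_t = s]]$, $\lambda_{\langle s, s' \rangle} = \log P(s \mid s')$, and the state features and weights $g_{\langle o, s \rangle}(Y_t, X, t) = [[X_t = o]][[Y_t = s]]$, $\mu_{\langle o, s \rangle} = \log P(o \mid s)$. Then

$$P(Y \mid X) = \frac{1}{P(X)} \exp\left(\sum_{t, \langle s, s' \rangle} \lambda_{\langle s, s' \rangle} f_{\langle s, s' \rangle}(Y_{t-1}, Y_t, X, t) + \sum_{t, \langle o, s \rangle} \mu_{\langle o, s \rangle} g_{\langle o, s \rangle}(Y_t, X, t)\right)$$

with the constraints

$$\sum_s \exp(\lambda_{\langle s, s' \rangle}) = 1, \qquad \sum_o \exp(\mu_{\langle o, s \rangle}) = 1$$

An HMM is therefore a CRF whose weights are constrained to be log-probabilities, with $Z(X) = P(X)$.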
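The identity can be verified numerically: the log of the HMM joint probability equals a weighted sum of indicator features whose weights are log-probabilities. The toy parameters and the names `w_trans`, `w_emit`, `log_joint` and `loglinear_score` are hypothetical, and the end state is omitted for brevity.

```python
import math

START = "sS"
# Hypothetical HMM parameters (not from the slides).
trans = {
    START: {"s1": 0.6, "s2": 0.4},
    "s1": {"s1": 0.7, "s2": 0.3},
    "s2": {"s1": 0.4, "s2": 0.6},
}
emit = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.2, "b": 0.8}}

# Log-linear weights: one per <s, s'> transition pair and one per <o, s> emission pair.
w_trans = {(s, prev): math.log(p) for prev, row in trans.items() for s, p in row.items()}
w_emit = {(o, s): math.log(p) for s, row in emit.items() for o, p in row.items()}


def log_joint(xs, ys):
    """log P(X, Y) computed directly from the HMM factorization."""
    total, prev = 0.0, START
    for x, y in zip(xs, ys):
        total += math.log(trans[prev][y]) + math.log(emit[y][x])
        prev = y
    return total


def loglinear_score(xs, ys):
    """The same quantity as a sum over positions of weight * indicator feature."""
    total, prev = 0.0, START
    for x, y in zip(xs, ys):
        total += sum(w * (prev == sp and y == s) for (s, sp), w in w_trans.items())
        total += sum(w * (x == o and y == s) for (o, s), w in w_emit.items())
        prev = y
    return total
```

At each position exactly one transition feature and one emission feature are active, so the two functions agree for every pair of sequences.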
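Viterbi algorithm for CRF

Since $Z(X)$ does not depend on $Y$, only the exponent has to be maximized:

$$\hat{y} = \arg\max_y P(y \mid x) = \arg\max_y \left[\sum_{t,i} \lambda_i f_i(y_{t-1}, y_t, x, t) + \sum_{t,j} \mu_j g_j(y_t, x, t)\right]$$

Define the local score

$$h_t(Y_{t-1}, Y_t, X) = \sum_i \lambda_i f_i(Y_{t-1}, Y_t, X, t) + \sum_j \mu_j g_j(Y_t, X, t)$$

and let $\delta_k(x, s_l)$ be the best total score of a state sequence that is in state $s_l$ at position $k$:

$$\delta_k(x, s_l) = \max_{y_1 \cdots y_{k-1}} \left[\sum_{t=1}^{k-1} h_t(y_{t-1}, y_t, x) + h_k(y_{k-1}, s_l, x)\right]$$

1. Initialization (from the start state $s_S$):
$$\delta_1(x, s_l) = h_1(s_S, s_l, x)$$
2. Recursion ($k = 1, ..., T - 1$):
$$\delta_{k+1}(x, s_l) = \max_{s_m} \left[\delta_k(x, s_m) + h_{k+1}(s_m, s_l, x)\right]$$
$$\psi_k(x, s_l) = \arg\max_{s_m} \left[\delta_k(x, s_m) + h_{k+1}(s_m, s_l, x)\right]$$
3. Termination:
$$\hat{y}_T = \arg\max_{s_m} \delta_T(x, s_m)$$
4. Backtracking ($t = T - 1, ..., 1$), following the back pointers:
$$\hat{y}_t = \psi_t(x, \hat{y}_{t+1})$$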
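A minimal sketch of this additive-score Viterbi recursion. The local score `h` is built from hypothetical transition and state weights (`w_trans`, `w_state`); nothing about these values comes from the slides.

```python
START = "sS"
STATES = ["s1", "s2"]

# Hypothetical weights: one transition feature per state pair, one state feature
# per (observation, state) pair.
w_trans = {("sS", "s1"): 0.5, ("sS", "s2"): 0.0,
           ("s1", "s1"): 1.0, ("s1", "s2"): -0.5,
           ("s2", "s1"): 0.2, ("s2", "s2"): 0.8}
w_state = {("a", "s1"): 1.0, ("a", "s2"): -1.0,
           ("b", "s1"): -0.3, ("b", "s2"): 0.9}


def h(t, prev, cur, xs):
    """Local score h_t(y_{t-1}, y_t, x); positions t are 1-based."""
    return w_trans[(prev, cur)] + w_state[(xs[t - 1], cur)]


def crf_viterbi(xs):
    """Return (best state sequence, its unnormalized log score)."""
    delta = {l: h(1, START, l, xs) for l in STATES}
    back_pointers = []
    for k in range(1, len(xs)):
        psi = {l: max(STATES, key=lambda m: delta[m] + h(k + 1, m, l, xs))
               for l in STATES}
        delta = {l: delta[psi[l]] + h(k + 1, psi[l], l, xs) for l in STATES}
        back_pointers.append(psi)
    last = max(STATES, key=lambda m: delta[m])
    path = [last]
    for psi in reversed(back_pointers):
        path.append(psi[path[-1]])
    return path[::-1], delta[last]
```

For `["a", "b", "b"]` the best sequence is `["s1", "s2", "s2"]` with score 3.6. The normalizer $Z(x)$ is never needed for decoding.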
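Matrix form

For each position $t$, $M_t(X)$ is a $(|S| + 2) \times (|S| + 2)$ matrix over the states together with $s_S$ and $s_E$. Its entry for the transition from $s_l$ to $s_m$ is

$$M_t(s_l, s_m \mid X) = \exp h_t(s_l, s_m, X)$$

The forward vector $\alpha_t(X)$ and the backward vector $\beta_t(X)$ are $(|S| + 2)$-dimensional:

$$\alpha_0(Y \mid X) = \begin{cases} 1 & \text{if } Y = s_S \\ 0 & \text{otherwise} \end{cases}$$
$$\beta_{T+1}(Y \mid X) = \begin{cases} 1 & \text{if } Y = s_E \\ 0 & \text{otherwise} \end{cases}$$
$$\alpha_t(X)^\top = \alpha_{t-1}(X)^\top M_t(X)$$
$$\beta_t(X) = M_{t+1}(X)\,\beta_{t+1}(X)$$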
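The forward recursion in matrix form yields the normalizer as the $s_E$ component of the final forward vector. The sketch below uses plain dictionaries instead of a matrix library; the local score `h` is hypothetical, and the assumptions that disallowed transitions have matrix entry 0 and that the final step into `sE` has score 0 are choices made for this example.

```python
import math

START, END = "sS", "sE"
STATES = ["s1", "s2"]
ALL = [START] + STATES + [END]  # the |S| + 2 row/column labels


def h(t, prev, cur, xs):
    """Hypothetical local score h_t(prev, cur, x); positions t are 1-based."""
    return 0.5 * (prev == cur) + (1.0 if xs[t - 1] == "a" and cur == "s1" else 0.0)


def M(t, xs):
    """(|S|+2) x (|S|+2) matrix M_t as nested dicts; impossible moves get 0."""
    T = len(xs)
    m = {l: {k: 0.0 for k in ALL} for l in ALL}
    for l in ALL:
        for k in ALL:
            if t == 1 and l == START and k in STATES:
                m[l][k] = math.exp(h(t, l, k, xs))
            elif 1 < t <= T and l in STATES and k in STATES:
                m[l][k] = math.exp(h(t, l, k, xs))
            elif t == T + 1 and l in STATES and k == END:
                m[l][k] = 1.0  # assumed score 0 for the final step into sE
    return m


def partition(xs):
    """Z(x): run alpha_t^T = alpha_{t-1}^T M_t for t = 1..T+1, read off sE."""
    alpha = {s: 1.0 if s == START else 0.0 for s in ALL}
    for t in range(1, len(xs) + 2):
        m = M(t, xs)
        alpha = {k: sum(alpha[l] * m[l][k] for l in ALL) for k in ALL}
    return alpha[END]
```

The result equals the sum of $\exp \sum_t h_t$ over all $|S|^T$ state sequences, but is obtained in $O(|S|^2 T)$ time.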
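Parameter estimation for CRF

Training data: $(x^{(1)}, y^{(1)}), \cdots, (x^{(n)}, y^{(n)})$

As for the MEMM, the parameters can be estimated with Generalized Iterative Scaling:

1. Choose a constant $C$ and define the correction feature
$$c(x, y) = C - \sum_{t,i} f_i(y_{t-1}, y_t, x, t) - \sum_{t,j} g_j(y_t, x, t)$$
with $C$ large enough that $c(x^{(k)}, y^{(k)}) \ge 0$ for $k = 1, \cdots, n$. With $c$ included, the feature values sum to $C$:
$$C = \sum_{t,i} f_i(y_{t-1}, y_t, x, t) + \sum_{t,j} g_j(y_t, x, t) + c(x, y)$$
2. Compute the empirical expectation of each feature:
$$\tilde{E}[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t f_i(y_{t-1}^{(k)}, y_t^{(k)}, x^{(k)}, t)$$
3. Compute the expectation of each feature under the current model, using the observed $x$:
$$E[f_i] = \frac{1}{n} \sum_{k=1}^{n} \sum_t \sum_{s_l, s_m} \frac{\alpha_{t-1}(s_l \mid x^{(k)}, \lambda, \mu)\,M_t(s_l, s_m \mid x^{(k)}, \lambda, \mu)\,\beta_t(s_m \mid x^{(k)}, \lambda, \mu)}{Z(x^{(k)} \mid \lambda, \mu)}\,f_i(s_l, s_m, x^{(k)}, t)$$
The fraction is the pairwise marginal $P(Y_{t-1} = s_l, Y_t = s_m \mid x, \lambda, \mu)$, and the normalization term is the $(s_S, s_E)$ entry of the matrix product:
$$Z(x) = \left(\prod_{t=1}^{T+1} M_t(x)\right)_{s_S, s_E}$$
4. Update the parameters ($\mu_j$ is updated in the same way, using $g_j$):
$$\lambda_i^{\text{new}} = \lambda_i + \frac{1}{C} \log\left(\frac{\tilde{E}[f_i]}{E[f_i]}\right)$$
5. Repeat steps 3 and 4 until convergence.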
• , ( ). . , 1999.
• A. McCallum, D. Freitag, and F. Pereira. Maximum entropy Markov models for
information extraction and segmentation. Proc. ICML, pp. 591-598, 2000.
• J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: probabilistic
models for segmenting and labeling sequence data. Proc. ICML, pp. 282-289, 2001.
• Charles Elkan. Log-Linear Models and Conditional Random Fields. Notes for a
tutorial at CIKM, 2008.
• Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report
MS-CIS-04-21. Department of Computer and Information Science, University of
Pennsylvania, 2004.
• , , . Conditional Random Fields
. , pp. 89-96, 2004.
• http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706b.pdf
• http://www.dbl.k.hosei.ac.jp/~miurat/readings/Nov0706a.pdf
Conditional Random Fields
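[Figure: graphical model of a general CRF with labels $Y_1, \ldots, Y_6$, all conditioned on $X$; for example, $Y_5$ is adjacent to $Y_4$ and $Y_6$. cf. PRML, Chapter 8.]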
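The conditional distribution factorizes over the cliques $C$ of the graph:

$$p(y \mid x) = \frac{1}{Z} \prod_C \psi_C(y_C \mid x) = \frac{1}{Z} \exp\left(-\sum_C E(y_C \mid x)\right)$$

chain-structured CRFs

[Figure: chain-structured CRF over $Y_1, \ldots, Y_6$, conditioned on $X$; $Y_1$ is adjacent to $Y_2$, and $Y_2$ is adjacent to $Y_1$ and $Y_3$. cf. PRML, Chapter 8.]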
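In a chain-structured CRF the energy of a clique consists of two kinds of terms: transition features $t_j$, which depend on $y_{i-1}$ and $y_i$, and state features $s_k$, which depend on $y_i$ only:

$$E(y_C \mid x) = -\sum_j \lambda_j t_j(y_{i-1}, y_i, x, i) - \sum_k \mu_k s_k(y_i, x, i)$$

Writing $s_k(y_i, x, i) = s_k(y_{i-1}, y_i, x, i)$, both kinds can be treated as features $f_j$ of the same form:

$$E(y_C \mid x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$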
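Here the clique $C$ at position $i$ consists of $y_{i-1}$ and $y_i$. With the energy of the chain-structured CRF

$$E(y_C \mid x) = -\sum_j \lambda_j f_j(y_{i-1}, y_i, x, i)$$

the sum over cliques becomes

$$\sum_C E(y_C \mid x) = -\sum_C \sum_j \lambda_j f_j(y_{i-1}, y_i, x, i) = -\sum_j \lambda_j F_j(y, x)$$

where

$$F_j(y, x) = \sum_C f_j(y_{i-1}, y_i, x, i)$$

so that

$$p(y \mid x) = \frac{1}{Z} \exp\left(\sum_j \lambda_j F_j(y, x)\right)$$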
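The global feature form can be illustrated with a brute-force sketch. The two feature functions, the weights in `LAMBDA`, the label set and the start symbol are all hypothetical, and $Z$ is computed by enumerating every label sequence, which is only feasible for tiny examples.

```python
import math
from itertools import product

LABELS = ["N", "V"]
START = "<s>"  # assumed label for position 0


def f_trans(prev_y, y, xs, i):
    """Hypothetical transition feature: fires on an N -> V transition."""
    return 1.0 if (prev_y, y) == ("N", "V") else 0.0


def f_state(prev_y, y, xs, i):
    """Hypothetical state feature (ignores prev_y): word ends in 's', label V."""
    return 1.0 if xs[i].endswith("s") and y == "V" else 0.0


FEATURES = [f_trans, f_state]
LAMBDA = [0.8, 1.2]  # hypothetical weights


def F(j, ys, xs):
    """Global feature F_j(y, x): sum of f_j over all positions of the chain."""
    prevs = [START] + list(ys[:-1])
    return sum(FEATURES[j](prev_y, y, xs, i) for i, (prev_y, y) in enumerate(zip(prevs, ys)))


def score(ys, xs):
    """exp(sum_j lambda_j F_j(y, x)), the unnormalized probability."""
    return math.exp(sum(lam * F(j, ys, xs) for j, lam in enumerate(LAMBDA)))


def prob(ys, xs):
    """p(y | x) with Z obtained by enumerating all label sequences."""
    z = sum(score(cand, xs) for cand in product(LABELS, repeat=len(xs)))
    return score(ys, xs) / z
```

Writing the state feature with the same signature as the transition feature is what lets a single list of $f_j$ cover both kinds.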
