rinko2010

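Slides from a talk given at the mathematical informatics seminar (数理輪講) of the Graduate School of Information Science and Technology, University of Tokyo. They explain CRF, Structured Perceptron, DPLVM (LD-CRF), and Latent Variable Perceptron, and are also aimed at people who do not specialize in machine learning.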

  1. CRF. 2010-12-10.
  3. The 4 papers covered:
     [Lafferty+, 01] Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. John Lafferty, Andrew McCallum, Fernando Pereira. Proceedings of ICML'01, 2001.
     [Collins, 02] Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. Michael Collins. Proceedings of EMNLP'02, 2002.
     [Morency+, 07] Latent-Dynamic Discriminative Models for Continuous Gesture Recognition. Louis-Philippe Morency, Ariadna Quattoni, Trevor Darrell. Proceedings of CVPR'07, 2007.
     [Sun+, 09] Latent Variable Perceptron Algorithm for Structured Classification. Xu Sun, Takuya Matsuzaki, Daisuke Okanohara, Jun'ichi Tsujii. Proceedings of IJCAI'09, 2009.
  4. Outline: (1) CRF (Conditional Random Field); (2) Structured Perceptron; (3) DPLVM (Discriminative Probabilistic Latent Variable Model); (4) Latent Variable Perceptron.
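  5. Sequence labeling: the input is a sequence $x = x_1 x_2 \cdots x_m$ and the output is a label sequence $y = y_1 y_2 \cdots y_m$ with $y_1, \dots, y_m \in Y$.
  6. Example (NP-chunking), with $Y = \{B, I, O\}$:
       j     1    2    3    4        5
       x_j   He   is   her  brother  .
       y_j   B    O    B    I        O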
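The B/I/O labels in the NP-chunking example can be read back into chunks. The following is a minimal sketch, assuming the usual convention that B begins a chunk, I continues it, and O is outside any chunk; the function name `bio_chunks` is hypothetical and not from the slides.

```python
def bio_chunks(tokens, labels):
    """Group tokens into chunks: B starts a chunk, I continues it, O is outside."""
    chunks, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                chunks.append(current)
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:  # "O", or a stray "I" with no open chunk
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


tokens = ["He", "is", "her", "brother", "."]
labels = ["B", "O", "B", "I", "O"]
print(bio_chunks(tokens, labels))  # ['He', 'her brother']
```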
  7. Outline: (1) CRF (Conditional Random Field); (2) Structured Perceptron; (3) DPLVM (Discriminative Probabilistic Latent Variable Model); (4) Latent Variable Perceptron.
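  8. Learning: the model $P(y \mid x, \Theta)$ has parameters $\Theta$. Given training data $\{(x_i, y_i^*)\}_{i=1}^{d}$, choose $\Theta$ so that $P(y_i^* \mid x_i, \Theta)$ is high for each of the $d$ examples.
  9. Prediction: given $\Theta$ and an input $x$, output $\hat{y} = \operatorname{argmax}_{y} P(y \mid x, \Theta)$.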
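  10. Feature map and score: each pair $(x, y)$ is mapped to a feature vector $f(y, x) = (f_1(y, x), \dots, f_n(y, x))^\top$, and with parameters $\Theta = (\Theta_1, \dots, \Theta_n)^\top$ the score is $F(y \mid x, \Theta) = f(y, x) \cdot \Theta = \sum_{k=1}^{n} f_k(y, x)\,\Theta_k$.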
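  11. Log-linear model: $P(y \mid x, \Theta) = \frac{1}{Z} \exp F(y \mid x, \Theta)$, where $Z = \sum_{y'} \exp F(y' \mid x, \Theta)$ and $F(y \mid x, \Theta) = f(y, x) \cdot \Theta$. Because $1/Z$ does not depend on $y$, $\operatorname{argmax}_{y} P(y \mid x, \Theta) = \operatorname{argmax}_{y} F(y \mid x, \Theta)$.
  12. The same model, annotated with $O(|Y|^m)$: both the sum in $Z = \sum_{y'} \exp F(y' \mid x, \Theta)$ and $\operatorname{argmax}_{y} F(y \mid x, \Theta)$ range over all $|Y|^m$ label sequences of length $m$.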
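The log-linear model can be evaluated directly by enumerating every label sequence, which makes the $|Y|^m$ cost concrete. This is a minimal sketch under stated assumptions: the feature map (counts of word/label and label/label pairs) and the weights in `theta` are hypothetical choices for illustration, not the features used in the talk.

```python
import itertools
import math

LABELS = ["B", "I", "O"]


def features(y, x):
    """Hypothetical f(y, x): counts of (word, label) and (previous label, label) pairs."""
    f = {}
    for j, (word, label) in enumerate(zip(x, y)):
        keys = [("word", word, label)]
        if j > 0:
            keys.append(("trans", y[j - 1], label))
        for key in keys:
            f[key] = f.get(key, 0) + 1
    return f


def score(y, x, theta):
    """F(y|x, Theta) = f(y, x) . Theta, with Theta stored as a sparse dict."""
    return sum(v * theta.get(k, 0.0) for k, v in features(y, x).items())


def distribution(x, theta):
    """P(y|x, Theta) for every y in Y^m, by explicit enumeration of |Y|^m sequences."""
    candidates = list(itertools.product(LABELS, repeat=len(x)))
    scores = [score(y, x, theta) for y in candidates]
    z = sum(math.exp(s) for s in scores)  # normalizer Z
    return {y: math.exp(s) / z for y, s in zip(candidates, scores)}


x = ["He", "is", "her", "brother", "."]
# Hand-set weights, for illustration only.
theta = {
    ("word", "He", "B"): 2.0, ("word", "is", "O"): 1.5, ("word", "her", "B"): 1.0,
    ("word", "brother", "I"): 1.0, ("word", ".", "O"): 2.0, ("trans", "B", "I"): 0.5,
}
p = distribution(x, theta)
best_by_p = max(p, key=p.get)
best_by_f = max(p, key=lambda y: score(y, x, theta))
print(len(p), best_by_p == best_by_f, best_by_p)
```

The two argmax computations agree because dividing by $Z$ and applying $\exp$ do not change the ordering of the scores.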
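  13. CRF: Conditional Random Field (sequential). Features are restricted to depend only on adjacent labels $y_{j-1}, y_j$: state features $s(j, x, y_j)$ and transition features $t(j, x, y_{j-1}, y_j)$. With this restriction, $Z$ and the argmax can be computed without enumerating all $|Y|^m$ label sequences.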
  15. CRF training: maximize $\sum_{i=1}^{d} \log P(y_i^* \mid x_i, \Theta) - R(\Theta)$, where $R(\Theta)$ is a regularization term on $\Theta$.
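When the score decomposes into $s(j, x, y_j)$ and $t(j, x, y_{j-1}, y_j)$, the best label sequence can be found by dynamic programming (the Viterbi algorithm) in $O(m|Y|^2)$ time. The sketch below is an illustration, not the talk's implementation; the particular `s` and `t` functions are hypothetical hand-set scores, and the brute-force search is included only to check the result.

```python
import itertools

LABELS = ["B", "I", "O"]


def viterbi(x, s, t):
    """Return (score, labels) maximizing sum_j [s(j, x, y_j) + t(j, x, y_{j-1}, y_j)]."""
    # best[label] = (score of the best prefix ending in label, that prefix)
    best = {lab: (s(0, x, lab), [lab]) for lab in LABELS}
    for j in range(1, len(x)):
        new_best = {}
        for lab in LABELS:
            prev = max(LABELS, key=lambda q: best[q][0] + t(j, x, q, lab))
            total = best[prev][0] + t(j, x, prev, lab) + s(j, x, lab)
            new_best[lab] = (total, best[prev][1] + [lab])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])


def brute_force(x, s, t):
    """Same argmax by enumerating all |Y|^m sequences, for comparison only."""
    def total(y):
        return sum(
            s(j, x, y[j]) + (t(j, x, y[j - 1], y[j]) if j > 0 else 0.0)
            for j in range(len(x))
        )
    y = max(itertools.product(LABELS, repeat=len(x)), key=total)
    return total(y), list(y)


# Hypothetical scores: a tiny lexicon for s, and a penalty on I following O for t.
def s(j, x, label):
    lexicon = {"He": "B", "is": "O", "her": "B", "brother": "I", ".": "O"}
    return 1.0 if lexicon.get(x[j]) == label else 0.0


def t(j, x, prev_label, label):
    return -5.0 if (prev_label == "O" and label == "I") else 0.0


x = ["He", "is", "her", "brother", "."]
print(viterbi(x, s, t))
print(brute_force(x, s, t))
```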
  16. Outline: (1) CRF (Conditional Random Field); (2) Structured Perceptron; (3) DPLVM (Discriminative Probabilistic Latent Variable Model); (4) Latent Variable Perceptron.
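  17. Structured Perceptron. Goal: for each training example $(x_i, y_i^*)$, make the score $F(y_i^* \mid x_i, \Theta) = \Theta \cdot f(y_i^*, x_i)$ the highest. For each $(x_i, y_i^*)$, compute $y_i = \operatorname{argmax}_{y} F(y \mid x_i, \Theta^i)$. If $y_i \neq y_i^*$, set $\Theta^{i+1} = \Theta^i + f(y_i^*, x_i) - f(y_i, x_i)$; if $y_i = y_i^*$, set $\Theta^{i+1} = \Theta^i$.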
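  18. Structured Perceptron: effect of the update $\Theta^{i+1} = \Theta^i + f(y_i^*, x_i) - f(y_i, x_i)$. Taking the inner product of both sides with $f(y_i^*, x_i) - f(y_i, x_i)$ gives $\Theta^{i+1} \cdot (f(y_i^*, x_i) - f(y_i, x_i)) = \Theta^i \cdot (f(y_i^*, x_i) - f(y_i, x_i)) + \|f(y_i^*, x_i) - f(y_i, x_i)\|_2^2$, that is, $F(y_i^* \mid x_i, \Theta^{i+1}) - F(y_i \mid x_i, \Theta^{i+1}) = F(y_i^* \mid x_i, \Theta^i) - F(y_i \mid x_i, \Theta^i) + \|f(y_i^*, x_i) - f(y_i, x_i)\|_2^2$, where the last term is $\ge 0$.
  19. Structured Perceptron: the update $\Theta^{i+1} = \Theta^i + f(y_i^*, x_i) - f(y_i, x_i)$ therefore raises the score of $y_i^*$ relative to $y_i$, since $F(y_i^* \mid x_i, \Theta^{i+1}) - F(y_i \mid x_i, \Theta^{i+1}) = F(y_i^* \mid x_i, \Theta^i) - F(y_i \mid x_i, \Theta^i) + \|f(y_i^*, x_i) - f(y_i, x_i)\|_2^2$ and the added term is $\ge 0$.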
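  20. Structured Perceptron: $d$ is the number of training examples and $M$ is the number of mistakes (parameter updates) made during training.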
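  21. Separability. Let $G(x_i) = \{\text{all possible label sequences for an example } x_i\}$ and $\bar{G}(x_i) = G(x_i) - \{y_i^*\}$. The training data $\{(x_i, y_i^*)\}_{i=1}^{d}$ is separable with margin $\delta > 0$ if there exists a vector $U$ with $\|U\|_2 = 1$ such that $\forall i, \forall z \in \bar{G}(x_i)$: $F(y_i^* \mid x_i, U) - F(z \mid x_i, U) \ge \delta$.
  22. Mistake bound. If the training data $\{(x_i, y_i^*)\}_{i=1}^{d}$ is separable with margin $\delta > 0$, the number of mistakes $M$ satisfies $M \le R^2 / \delta^2$, where $R$ is a constant such that $\forall i, \forall z \in \bar{G}(x_i)$: $\|f(y_i^*, x_i) - f(z, x_i)\|_2 \le R$. The bound does not depend on the number of examples $d$.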
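The structured perceptron update can be tried on a toy problem. This is a minimal sketch, assuming a hypothetical feature map (counts of word/label and label/label pairs) and using brute-force enumeration for the argmax; a practical implementation would use Viterbi decoding instead. The two training sentences are the NP-chunking examples from the slides.

```python
import itertools

LABELS = ["B", "I", "O"]


def features(y, x):
    """Hypothetical f(y, x): counts of (word, label) and (previous label, label) pairs."""
    f = {}
    for j, (word, label) in enumerate(zip(x, y)):
        keys = [("word", word, label)]
        if j > 0:
            keys.append(("trans", y[j - 1], label))
        for key in keys:
            f[key] = f.get(key, 0) + 1
    return f


def score(y, x, theta):
    """F(y|x, Theta) = Theta . f(y, x)."""
    return sum(v * theta.get(k, 0.0) for k, v in features(y, x).items())


def predict(x, theta):
    """argmax_y F(y|x, Theta) by enumeration (ties go to the first candidate)."""
    return max(itertools.product(LABELS, repeat=len(x)), key=lambda y: score(y, x, theta))


def train(data, epochs=100):
    """Repeat passes over the data until one pass makes no mistake."""
    theta, mistakes = {}, 0
    for _ in range(epochs):
        clean_pass = True
        for x, y_star in data:
            y_hat = predict(x, theta)
            if y_hat != tuple(y_star):
                # Theta <- Theta + f(y*, x) - f(y_hat, x)
                for k, v in features(y_star, x).items():
                    theta[k] = theta.get(k, 0.0) + v
                for k, v in features(y_hat, x).items():
                    theta[k] = theta.get(k, 0.0) - v
                mistakes += 1
                clean_pass = False
        if clean_pass:
            break
    return theta, mistakes


data = [
    ("He is her brother .".split(), list("BOBIO")),
    ("They gave her flowers .".split(), list("BOBBO")),
]
theta, mistakes = train(data)
for x, y_star in data:
    print(" ".join(x), "->", "".join(predict(x, theta)))
print("mistakes:", mistakes)
```

Every word in this toy data has a single correct label, so the data is separable and the number of updates is finite, in line with the mistake bound.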
  23. Outline: (1) CRF (Conditional Random Field); (2) Structured Perceptron; (3) DPLVM (Discriminative Probabilistic Latent Variable Model); (4) Latent Variable Perceptron.
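  24. Two NP-chunking examples:
        They  are   her  flowers  .     B O B I O
        They  gave  her  flowers  .     B O B B O
  25. The same two examples, annotated with the latent labels $B_1$ (first sentence) and $B_2$ (second sentence):
        They  are   her  flowers  .     B O B I O     B1
        They  gave  her  flowers  .     B O B B O     B2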
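  26. DPLVM (Discriminative Probabilistic Latent Variable Model): with $Y = \{B, I, O\}$, each label has its own set of latent labels, for example $H_B = \{B_1, \dots, B_{|H_B|}\}$, where $|H_B|$ is the number of latent labels for B.
  27. DPLVM (Discriminative Probabilistic Latent Variable Model): for a label sequence $y = y_1 y_2 \cdots y_m$ and a latent sequence $h = h_1 h_2 \cdots h_m$, define $\mathrm{Proj}(h) = y \overset{\text{def}}{\iff} \forall j,\ h_j \in H_{y_j}$.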
  28. DPLVM feature map and score: each pair $(x, h)$ is mapped to a feature vector $f(h, x) = (f_1(h, x), \dots, f_n(h, x))^\top$, and with parameters $\Theta = (\Theta_1, \dots, \Theta_n)^\top$ the score is $F(h \mid x, \Theta) = f(h, x) \cdot \Theta$.
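  29. DPLVM distribution over latent sequences $h$: $P(h \mid x, \Theta) = \frac{1}{Z} \exp F(h \mid x, \Theta)$, where $Z = \sum_{h'} \exp F(h' \mid x, \Theta)$ and $F(h \mid x, \Theta) = f(h, x) \cdot \Theta$, so that $\operatorname{argmax}_{h} P(h \mid x, \Theta) = \operatorname{argmax}_{h} F(h \mid x, \Theta)$.
  30. DPLVM: the training data $(x_i, y_i^*)$ contains no $h$, so $P(h \mid x, \Theta)$ is turned into a distribution over $y$ by summing out $h$: $P(y \mid x, \Theta) = \sum_{h} P(y \mid h, x, \Theta)\, P(h \mid x, \Theta) = \sum_{h : \mathrm{Proj}(h) = y} P(h \mid x, \Theta)$.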
  31. DPLVM training: maximize $\sum_{i=1}^{d} \log P(y_i^* \mid x_i, \Theta) - R(\Theta)$, where $R(\Theta)$ is a regularization term on $\Theta$.
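The marginalization $P(y \mid x, \Theta) = \sum_{h : \mathrm{Proj}(h) = y} P(h \mid x, \Theta)$ can be written out by brute force for short sequences. This is a sketch under stated assumptions: the label sets $Y = \{A, B\}$ with two latent labels each, the emission/transition features, and the weights are hypothetical and chosen only to keep the enumeration small.

```python
import itertools
import math

LATENT = {"A": ["A1", "A2"], "B": ["B1", "B2"]}  # H_A and H_B
PROJ = {h: y for y, hs in LATENT.items() for h in hs}  # latent label -> label
ALL_LATENT = sorted(PROJ)


def score(h, x, theta):
    """F(h|x, Theta) with hypothetical (symbol, latent) and (latent, latent) features."""
    total = sum(theta.get(("emit", xj, hj), 0.0) for xj, hj in zip(x, h))
    total += sum(theta.get(("trans", a, b), 0.0) for a, b in zip(h, h[1:]))
    return total


def label_distribution(x, theta):
    """P(y|x, Theta): sum exp F(h|x, Theta) over h with Proj(h) = y, then normalize."""
    weights = {}
    for h in itertools.product(ALL_LATENT, repeat=len(x)):
        y = tuple(PROJ[hj] for hj in h)
        weights[y] = weights.get(y, 0.0) + math.exp(score(h, x, theta))
    z = sum(weights.values())  # equals Z, the sum over all latent sequences
    return {y: w / z for y, w in weights.items()}


# Hand-set weights, for illustration only.
theta = {("emit", "a", "A2"): 1.0, ("emit", "b", "B2"): 1.0, ("trans", "A2", "B2"): 0.5}
p = label_distribution(["a", "b", "a"], theta)
for y, prob in sorted(p.items(), key=lambda item: -item[1]):
    print("".join(y), round(prob, 3))
```

Enumeration costs $O(|H|^m)$; for chain-structured features the same sums can be computed with the forward algorithm, which this sketch does not show.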
  32. Outline: (1) CRF (Conditional Random Field); (2) Structured Perceptron; (3) DPLVM (Discriminative Probabilistic Latent Variable Model); (4) Latent Variable Perceptron.
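  33. Latent Variable Perceptron. For each training example $(x_i, y_i^*)$, compute $h_i = \operatorname{argmax}_{h} F(h \mid x_i, \Theta^i)$ and $y_i = \mathrm{Proj}(h_i)$. If $y_i \neq y_i^*$, set $\Theta^{i+1} = \Theta^i + f(h_i^*, x_i) - f(h_i, x_i)$, where $h_i^* = \operatorname{argmax}_{h : \mathrm{Proj}(h) = y_i^*} F(h \mid x_i, \Theta^i)$; if $y_i = y_i^*$, set $\Theta^{i+1} = \Theta^i$.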
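  34. Mistake bound. If the training data $\{(x_i, y_i^*)\}_{i=1}^{d}$ is separable with margin $\delta > 0$, the number of mistakes is at most $2T^2M^2 / \delta^2$, with the constant $T$ as in [Sun+, 09] and, on this slide, $M = \max_{i, y} \|f(y, x_i)\|_2$. The bound does not depend on the number of examples $d$.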
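One step of the latent variable perceptron can be sketched as follows. The label sets, the emission/transition feature map, and the brute-force argmax are assumptions made for illustration; the helper names `best_latent` and `update` are hypothetical.

```python
import itertools

LATENT = {"A": ["A1", "A2"], "B": ["B1", "B2"]}  # H_A and H_B
PROJ = {h: y for y, hs in LATENT.items() for h in hs}
ALL_LATENT = sorted(PROJ)


def features(h, x):
    """Hypothetical f(h, x): counts of (symbol, latent) and (latent, latent) pairs."""
    f = {}
    for j, (xj, hj) in enumerate(zip(x, h)):
        keys = [("emit", xj, hj)]
        if j > 0:
            keys.append(("trans", h[j - 1], hj))
        for key in keys:
            f[key] = f.get(key, 0) + 1
    return f


def score(h, x, theta):
    return sum(v * theta.get(k, 0.0) for k, v in features(h, x).items())


def best_latent(x, theta, y=None):
    """argmax_h F(h|x, Theta); if y is given, restrict to h with Proj(h) = y."""
    choices = [ALL_LATENT] * len(x) if y is None else [LATENT[yj] for yj in y]
    return max(itertools.product(*choices), key=lambda h: score(h, x, theta))


def update(theta, x, y_star):
    """One latent perceptron step; returns True if Theta was changed."""
    h_hat = best_latent(x, theta)
    if [PROJ[hj] for hj in h_hat] == list(y_star):
        return False
    h_star = best_latent(x, theta, y_star)  # best latent sequence consistent with y*
    for k, v in features(h_star, x).items():
        theta[k] = theta.get(k, 0.0) + v
    for k, v in features(h_hat, x).items():
        theta[k] = theta.get(k, 0.0) - v
    return True


theta = {}
x, y_star = ["a", "b", "a"], ["B", "A", "B"]
changed_first = update(theta, x, y_star)
changed_second = update(theta, x, y_star)
print(changed_first, changed_second, best_latent(x, theta))
```

The only difference from the structured perceptron step is that the "correct" feature vector comes from $h_i^*$, the highest-scoring latent sequence that projects to $y_i^*$ under the current parameters.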
  35. Outline: (1) CRF (Conditional Random Field); (2) Structured Perceptron; (3) DPLVM (Discriminative Probabilistic Latent Variable Model); (4) Latent Variable Perceptron.
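  36. Experiment on synthetic data:
      ‣ Input symbols: $X = \{a, b\}$
      ‣ Labels: $Y = \{A, B\}$
      ‣ Latent labels: $H_A = \{A_1, A_2\}$, $H_B = \{B_1, B_2\}$
      ‣ Sequences $h$ and $x$ are generated from $P(h_j \mid h_{j-1})$ and $P(x_j \mid h_j)$
      ‣ $y = \mathrm{Proj}(h)$
      ‣ Training data: $\{(x_i, y_i^*)\}_{i=1}^{d}$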
  37. Parameters of the synthetic data.
      ‣ Transition probabilities, controlled by a parameter $p$:
          from \ to   A1        A2        B1        B2
          A1          (1-p)/3   (1-p)/3   (1-p)/3   p
          A2          p         (1-p)/3   (1-p)/3   (1-p)/3
          B1          (1-p)/3   p         (1-p)/3   (1-p)/3
          B2          (1-p)/3   (1-p)/3   p         (1-p)/3
      ‣ Emission probabilities $P(x_i = a \mid h_i)$:
          h_i = A1   h_i = A2   h_i = B1   h_i = B2
          0.1        0.7        0.7        0.6
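The transition and emission tables above are enough to sketch a data generator. Two points are assumptions, since the transcript does not state them: the table is read with rows as the previous state and columns as the next state, and the first latent state is drawn uniformly. The sequence length and the value of `p` in the usage lines are arbitrary.

```python
import random

STATES = ["A1", "A2", "B1", "B2"]
P_A_GIVEN_H = {"A1": 0.1, "A2": 0.7, "B1": 0.7, "B2": 0.6}  # P(x = a | h)
# For each previous state, the next state whose table entry equals p.
FAVOURED = {"A1": "B2", "A2": "A1", "B1": "A2", "B2": "B1"}


def transition_row(h_prev, p):
    """P(h_j | h_{j-1} = h_prev): p for the favoured successor, (1 - p)/3 otherwise."""
    return [p if h == FAVOURED[h_prev] else (1.0 - p) / 3.0 for h in STATES]


def sample_sequence(m, p, rng):
    """Sample (x, y, h) of length m; the uniform initial state is an assumption."""
    h = [rng.choice(STATES)]
    while len(h) < m:
        h.append(rng.choices(STATES, weights=transition_row(h[-1], p))[0])
    x = ["a" if rng.random() < P_A_GIVEN_H[hj] else "b" for hj in h]
    y = [hj[0] for hj in h]  # Proj: A1, A2 -> A and B1, B2 -> B
    return x, y, h


rng = random.Random(0)
# Training pairs (x, y*); the latent sequence h is discarded, as in the setup.
data = [sample_sequence(8, 0.9, rng)[:2] for _ in range(3)]
for x, y in data:
    print("".join(x), "".join(y))
```

With $p$ close to 1 the latent chain almost deterministically cycles A1, B2, B1, A2, so the label at each position depends strongly on which latent state produced the previous one.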
  38. [Figure: accuracy [%] (vertical axis, 40 to 100) against $p$ (horizontal axis, 0.5 to 0.95) for the Latent Variable Perceptron and the Structured Perceptron.]
