rinko2010

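Slides from a talk given at the mathematical informatics seminar (数理輪講) of the Graduate School of Information Science and Technology, University of Tokyo. They explain CRF, Structured Perceptron, DPLVM (LD-CRF), and Latent Variable Perceptron, and are also aimed at readers who do not specialize in machine learning.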

Published in: Technology, Education

  1. CRF. 2010/12/10.
  3. The four papers covered:
     [Lafferty+, 01] Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. John Lafferty, Andrew McCallum, Fernando Pereira. Proceedings of ICML’01, 2001.
     [Collins, 02] Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. Michael Collins. Proceedings of EMNLP’02, 2002.
     [Morency+, 07] Latent-Dynamic Discriminative Models for Continuous Gesture Recognition. Louis-Philippe Morency, Ariadna Quattoni, and Trevor Darrell. Proceedings of CVPR’07, 2007.
     [Sun+, 09] Latent Variable Perceptron Algorithm for Structured Classification. Xu Sun, Takuya Matsuzaki, Daisuke Okanohara, and Jun’ichi Tsujii. Proceedings of IJCAI’09, 2009.
  4. Outline: (1) CRF (Conditional Random Field); (2) Structured Perceptron; (3) DPLVM (Discriminative Probabilistic Latent Variable Model); (4) Latent Variable Perceptron.
  5. Input sequence $x = x_1 x_2 \cdots x_m$; label sequence $y = y_1 y_2 \cdots y_m$, with $y_1, \dots, y_m \in Y$.
  6. Example (NP-chunking), with $Y = \{B, I, O\}$:
       x:   x1   x2   x3    x4        x5
            He   is   her   brother   .
       y:   B    O    B     I         O
            y1   y2   y3    y4        y5
  7. Outline, part 1: CRF (Conditional Random Field).
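  8. Model $P(y \mid x, \Theta)$ with parameters $\Theta$. Training: given $d$ examples $\{(x_i, y_i^*)\}_{i=1}^{d}$, choose $\Theta$ so that each $P(y_i^* \mid x_i, \Theta)$ is large.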
  9. Prediction: given $\Theta$ and an input $x$, output $\hat{y} = \operatorname{argmax}_{y} P(y \mid x, \Theta)$.
  10. Each pair $(x, y)$ is mapped to an $n$-dimensional feature vector $f(y, x) = (f_1(y, x), f_2(y, x), \dots, f_n(y, x))^\top$. With weights $\Theta = (\Theta_1, \Theta_2, \dots, \Theta_n)^\top$, the score is the inner product $F(y \mid x, \Theta) = f(y, x) \cdot \Theta$.
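  11. $P(y \mid x, \Theta) = \frac{1}{Z} \exp F(y \mid x, \Theta)$, where $Z = \sum_{y'} \exp F(y' \mid x, \Theta)$ and $F(y \mid x, \Theta) = f(y, x) \cdot \Theta$. Because $1/Z$ does not depend on $y$, $\operatorname{argmax}_{y} P(y \mid x, \Theta) = \operatorname{argmax}_{y} F(y \mid x, \Theta)$.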
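  12. $P(y \mid x, \Theta) = \frac{1}{Z} \exp F(y \mid x, \Theta)$, $Z = \sum_{y'} \exp F(y' \mid x, \Theta)$, $\operatorname{argmax}_{y} P(y \mid x, \Theta) = \operatorname{argmax}_{y} F(y \mid x, \Theta)$, annotated with $O(|Y|^m)$: both the sum in $Z$ and the argmax range over all $|Y|^m$ label sequences of length $m$.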
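To make the $O(|Y|^m)$ cost concrete, here is a minimal brute-force sketch of the log-linear model in Python. The feature map (word/label and label/label counts) and the weight values are illustrative assumptions, not taken from the slides; only the formulas for $F$, $Z$, and $P$ follow the text.

```python
import itertools
import math

LABELS = ["B", "I", "O"]  # Y from the NP-chunking example


def features(y, x):
    """Hypothetical feature map f(y, x): counts of (word, label) and (prev label, label)."""
    f = {}
    for j, (word, label) in enumerate(zip(x, y)):
        f[("word", word, label)] = f.get(("word", word, label), 0) + 1
        if j > 0:
            key = ("trans", y[j - 1], label)
            f[key] = f.get(key, 0) + 1
    return f


def score(y, x, theta):
    """F(y|x, Theta) = f(y, x) . Theta (missing weights count as 0)."""
    return sum(v * theta.get(k, 0.0) for k, v in features(y, x).items())


def prob(y, x, theta):
    """P(y|x, Theta) = exp F(y|x, Theta) / Z, with Z summed over all |Y|^m sequences."""
    z = sum(math.exp(score(yp, x, theta))
            for yp in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x, theta)) / z


def predict(x, theta):
    """argmax_y F(y|x, Theta), again by enumerating all |Y|^m sequences."""
    return max(itertools.product(LABELS, repeat=len(x)),
               key=lambda y: score(y, x, theta))


# Illustrative weights (made up for this sketch).
x = ["He", "is", "her", "brother", "."]
theta = {("word", "He", "B"): 2.0, ("word", "is", "O"): 2.0,
         ("word", "her", "B"): 2.0, ("word", "brother", "I"): 2.0,
         ("word", ".", "O"): 2.0, ("trans", "B", "I"): 1.0}
best = predict(x, theta)
```

Enumeration is only feasible for tiny inputs: here $3^5 = 243$ sequences are scored for every call.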
  13. CRF: Conditional Random Field (sequential). Features involve only the adjacent labels $y_{j-1}$ and $y_j$: state features $s(j, x, y_j)$ and transition features $t(j, x, y_{j-1}, y_j)$.
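When the score decomposes into per-position terms over $y_j$ and $(y_{j-1}, y_j)$, the argmax can be found by dynamic programming instead of enumeration. The sketch below is a standard Viterbi decoder; `state_score` and `trans_score` are hypothetical stand-ins for the weighted sums of the state and transition features, and their values are made up for illustration.

```python
def viterbi(x, labels, state_score, trans_score):
    """Return argmax_y of sum_j [state_score(j, x, y_j) + trans_score(j, x, y_{j-1}, y_j)].

    Runs in O(m * |Y|^2) time for a non-empty input of length m.
    """
    best = [{y: state_score(0, x, y) for y in labels}]  # best[j][y]: best prefix score ending in y
    back = []                                           # back[j-1][y]: best predecessor of y at position j
    for j in range(1, len(x)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda yp: best[-1][yp] + trans_score(j, x, yp, y))
            scores[y] = best[-1][prev] + trans_score(j, x, prev, y) + state_score(j, x, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


LABELS = ["B", "I", "O"]
# Illustrative scores (assumptions for this sketch, not values from the slides).
WORD_SCORES = {("He", "B"): 2.0, ("is", "O"): 2.0, ("her", "B"): 2.0,
               ("brother", "I"): 2.0, (".", "O"): 2.0}


def state_score(j, x, y):
    return WORD_SCORES.get((x[j], y), 0.0)


def trans_score(j, x, y_prev, y):
    # Penalize I directly after O, which is not a valid chunk start.
    return -5.0 if (y_prev == "O" and y == "I") else 0.0


decoded = viterbi(["He", "is", "her", "brother", "."], LABELS, state_score, trans_score)
print(decoded)  # ['B', 'O', 'B', 'I', 'O']
```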
  15. CRF training: maximize $\sum_{i=1}^{d} \log P(y_i^* \mid x_i, \Theta) - R(\Theta)$ over $\Theta$, where $R(\Theta)$ is a regularization term on $\Theta$.
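For reference, the gradient of one log-likelihood term follows directly from $P(y \mid x, \Theta) = \frac{1}{Z} \exp(f(y, x) \cdot \Theta)$. This is the standard identity for log-linear models rather than something stated on the slide; the gradient of $R(\Theta)$ is subtracted separately.

```latex
\begin{aligned}
\nabla_\Theta \log P(y_i^* \mid x_i, \Theta)
  &= \nabla_\Theta \Bigl( f(y_i^*, x_i) \cdot \Theta - \log \sum_{y} \exp\bigl(f(y, x_i) \cdot \Theta\bigr) \Bigr) \\
  &= f(y_i^*, x_i) - \sum_{y} P(y \mid x_i, \Theta)\, f(y, x_i)
\end{aligned}
```

The gradient is the observed feature vector minus its expectation under the current model.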
  16. Outline, part 2: Structured Perceptron.
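  17. Structured Perceptron
      ‣ Score of the correct label sequence for a training example $(x_i, y_i^*)$: $F(y_i^* \mid x_i, \Theta) = \Theta \cdot f(y_i^*, x_i)$.
      ‣ For each $(x_i, y_i^*)$, compute $y_i = \operatorname{argmax}_{y} F(y \mid x_i, \Theta^i)$. If $y_i \neq y_i^*$, update $\Theta^{i+1} = \Theta^i + f(y_i^*, x_i) - f(y_i, x_i)$; if $y_i = y_i^*$, keep $\Theta^{i+1} = \Theta^i$.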
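The update rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the feature map is a hypothetical one (word/label and label/label counts), the argmax is done by brute-force enumeration rather than dynamic programming, and the training set contains only the NP-chunking example from the earlier slide.

```python
import itertools
from collections import Counter

LABELS = ["B", "I", "O"]


def features(y, x):
    """Hypothetical feature map f(y, x)."""
    f = Counter()
    for j, (word, label) in enumerate(zip(x, y)):
        f[("word", word, label)] += 1
        if j > 0:
            f[("trans", y[j - 1], label)] += 1
    return f


def score(y, x, theta):
    """F(y|x, Theta) = Theta . f(y, x)."""
    return sum(v * theta[k] for k, v in features(y, x).items())


def predict(x, theta):
    """argmax_y F(y|x, Theta) by enumeration (fine only for short inputs)."""
    return max(itertools.product(LABELS, repeat=len(x)),
               key=lambda y: score(y, x, theta))


def train(data, epochs=5):
    """Structured perceptron: update Theta only when the prediction is wrong."""
    theta = Counter()
    for _ in range(epochs):
        for x, y_star in data:
            y_pred = predict(x, theta)
            if y_pred != tuple(y_star):
                theta.update(features(y_star, x))   # + f(y*, x)
                theta.subtract(features(y_pred, x)) # - f(y, x)
    return theta


data = [(["He", "is", "her", "brother", "."], ["B", "O", "B", "I", "O"])]
theta = train(data)
```

Passing over the data several times (`epochs`) is a common practical choice and an assumption here; the slide describes a single update step.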
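  18. Structured Perceptron: effect of the update $\Theta^{i+1} = \Theta^i + f(y_i^*, x_i) - f(y_i, x_i)$. Taking the inner product of both sides with $f(y_i^*, x_i) - f(y_i, x_i)$:
      $\Theta^{i+1} \cdot (f(y_i^*, x_i) - f(y_i, x_i)) = \Theta^i \cdot (f(y_i^*, x_i) - f(y_i, x_i)) + \| f(y_i^*, x_i) - f(y_i, x_i) \|_2^2$
      $\Leftrightarrow F(y_i^* \mid x_i, \Theta^{i+1}) - F(y_i \mid x_i, \Theta^{i+1}) = F(y_i^* \mid x_i, \Theta^i) - F(y_i \mid x_i, \Theta^i) + \| f(y_i^*, x_i) - f(y_i, x_i) \|_2^2$,
      where the last term is $\geq 0$.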
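  19. Structured Perceptron: the update $\Theta^{i+1} = \Theta^i + f(y_i^*, x_i) - f(y_i, x_i)$ raises the score of $y_i^*$ relative to $y_i$:
      $F(y_i^* \mid x_i, \Theta^{i+1}) - F(y_i \mid x_i, \Theta^{i+1}) = F(y_i^* \mid x_i, \Theta^i) - F(y_i \mid x_i, \Theta^i) + \| f(y_i^*, x_i) - f(y_i, x_i) \|_2^2$,
      and the added term is $\geq 0$.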
  20. Structured Perceptron: $d$ is the number of training examples and $M$ is the number of mistakes (updates).
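  21. Separability. Let $G(x_i)$ = {all possible label sequences for an example $x_i$} and $\bar{G}(x_i) = G(x_i) - \{y_i^*\}$. The training set $\{(x_i, y_i^*)\}_{i=1}^{d}$ is separable with margin $\delta > 0$ if there exists a vector $U$ with $\|U\|_2 = 1$ such that $\forall i, \forall z \in \bar{G}(x_i)$: $F(y_i^* \mid x_i, U) - F(z \mid x_i, U) \geq \delta$.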
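  22. Mistake bound. If the training set $\{(x_i, y_i^*)\}_{i=1}^{d}$ is separable with margin $\delta > 0$, the number of mistakes $M$ satisfies $M \leq \frac{R^2}{\delta^2}$, where $R$ is a constant such that $\forall i, \forall z \in \bar{G}(x_i)$: $\| f(y_i^*, x_i) - f(z, x_i) \|_2 \leq R$.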
  23. Outline, part 3: DPLVM (Discriminative Probabilistic Latent Variable Model).
  24. Two NP-chunking examples:
        They   are    her   flowers   .
        B      O      B     I         O
        They   gave   her   flowers   .
        B      O      B     B         O
  25. The same two examples, the first marked with the latent label $B_1$ and the second with $B_2$:
        They   are    her   flowers   .
        B      O      B     I         O      ($B_1$)
        They   gave   her   flowers   .
        B      O      B     B         O      ($B_2$)
  26. DPLVM - Discriminative Probabilistic Latent Variable Model. Label set $Y = \{B, I, O\}$. A label such as $B$ has its own set of latent labels $H_B = \{B_1, \dots, B_{|H_B|}\}$, where $|H_B|$ is the number of latent labels for $B$.
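  27. DPLVM - Discriminative Probabilistic Latent Variable Model. For a label sequence $y = y_1 y_2 \cdots y_m$ and a latent sequence $h = h_1 h_2 \cdots h_m$: $\forall j,\ h_j \in H_{y_j} \overset{\text{def.}}{\iff} \mathrm{Proj}(h) = y$.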
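The projection from latent sequences to label sequences is easy to state in code. In this minimal sketch the latent labels are plain strings such as `"B1"`, and the choice of two latent labels per label is an assumption made only for illustration.

```python
import itertools

# H_B, H_I, H_O: hypothetical latent label sets (two per label, chosen for illustration).
LATENT = {"B": ["B1", "B2"], "I": ["I1", "I2"], "O": ["O1", "O2"]}
# Inverse map: latent label -> the label it belongs to.
PROJ = {h: y for y, hs in LATENT.items() for h in hs}


def proj(h):
    """Proj(h): replace each latent label h_j by the label y_j with h_j in H_{y_j}."""
    return tuple(PROJ[hj] for hj in h)


def latent_sequences(y):
    """All latent sequences h with Proj(h) = y."""
    return list(itertools.product(*(LATENT[yj] for yj in y)))


h = ("B1", "O2", "B2", "I1", "O1")
y = proj(h)
```

The number of latent sequences consistent with a given $y$ is the product of the $|H_{y_j}|$, so it grows exponentially with the sequence length.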
  28. DPLVM: each pair $(x, h)$ is mapped to a feature vector $f(h, x) = (f_1(h, x), f_2(h, x), \dots, f_n(h, x))^\top$. With weights $\Theta = (\Theta_1, \Theta_2, \dots, \Theta_n)^\top$, the score is $F(h \mid x, \Theta) = f(h, x) \cdot \Theta$.
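  29. DPLVM: the model is defined over latent sequences $h$. $P(h \mid x, \Theta) = \frac{1}{Z} \exp F(h \mid x, \Theta)$, where $Z = \sum_{h'} \exp F(h' \mid x, \Theta)$ and $F(h \mid x, \Theta) = f(h, x) \cdot \Theta$, so that $\operatorname{argmax}_{h} P(h \mid x, \Theta) = \operatorname{argmax}_{h} F(h \mid x, \Theta)$.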
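  30. DPLVM: the training data $(x_i, y_i^*)$ contain no $h$, whereas the model defines $P(h \mid x, \Theta)$. The probability of a label sequence $y$ is obtained by summing over $h$: $P(y \mid x, \Theta) = \sum_{h} P(y \mid h, x, \Theta) P(h \mid x, \Theta) = \sum_{h : \mathrm{Proj}(h) = y} P(h \mid x, \Theta)$.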
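The marginalization over latent sequences can be checked numerically on a tiny input. The sketch below enumerates every $h$, which is only feasible for very short sequences; the latent label sets, features, and weights are all assumptions chosen for illustration.

```python
import itertools
import math

# Hypothetical latent label sets: two latent labels for B, one each for I and O.
LATENT = {"B": ["B1", "B2"], "I": ["I1"], "O": ["O1"]}
ALL_LATENT = [h for hs in LATENT.values() for h in hs]


def score(h, x, theta):
    """F(h|x, Theta) with hypothetical (word, latent label) and transition features."""
    total = 0.0
    for j, (word, hj) in enumerate(zip(x, h)):
        total += theta.get(("word", word, hj), 0.0)
        if j > 0:
            total += theta.get(("trans", h[j - 1], hj), 0.0)
    return total


def prob_y(y, x, theta):
    """P(y|x, Theta): sum of P(h|x, Theta) over all h with Proj(h) = y."""
    z = sum(math.exp(score(h, x, theta))
            for h in itertools.product(ALL_LATENT, repeat=len(x)))
    consistent = itertools.product(*(LATENT[yj] for yj in y))
    return sum(math.exp(score(h, x, theta)) for h in consistent) / z


# Illustrative weights (made up for this sketch).
x = ["They", "gave", "her"]
theta = {("word", "They", "B1"): 1.0, ("word", "gave", "O1"): 1.5,
         ("word", "her", "B2"): 1.0, ("trans", "O1", "B2"): 0.5}
p_bob = prob_y(("B", "O", "B"), x, theta)
```

Because every latent sequence projects to exactly one label sequence, the values `prob_y(y, x, theta)` sum to 1 over all $y$.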
  31. DPLVM training: maximize $\sum_{i=1}^{d} \log P(y_i^* \mid x_i, \Theta) - R(\Theta)$ over $\Theta$, where $R(\Theta)$ is a regularization term on $\Theta$.
  32. Outline, part 4: Latent Variable Perceptron.
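  33. Latent Variable Perceptron. For each $(x_i, y_i^*)$, compute $h_i = \operatorname{argmax}_{h} F(h \mid x_i, \Theta^i)$ and $y_i = \mathrm{Proj}(h_i)$. If $y_i \neq y_i^*$, update $\Theta^{i+1} = \Theta^i + f(h_i^*, x_i) - f(h_i, x_i)$; if $y_i = y_i^*$, keep $\Theta^{i+1} = \Theta^i$. Here $h_i^*$ is the best latent sequence consistent with the correct labels: $h_i^* = \operatorname{argmax}_{h : \mathrm{Proj}(h) = y_i^*} F(h \mid x_i, \Theta^i)$.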
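A minimal Python sketch of this update follows. The label and latent sets mirror $Y = \{A, B\}$, $H_A = \{A_1, A_2\}$, $H_B = \{B_1, B_2\}$; the feature map, the two toy training pairs, and the brute-force argmax are assumptions made for illustration, not the implementation of [Sun+, 09].

```python
import itertools
from collections import Counter

LATENT = {"A": ["A1", "A2"], "B": ["B1", "B2"]}        # H_A, H_B
ALL_LATENT = [h for hs in LATENT.values() for h in hs]  # A1, A2, B1, B2
PROJ = {h: y for y, hs in LATENT.items() for h in hs}


def proj(h):
    """Proj(h): map each latent label to its label."""
    return tuple(PROJ[hj] for hj in h)


def features(h, x):
    """Hypothetical feature map f(h, x): emission and transition counts."""
    f = Counter()
    for j, (xj, hj) in enumerate(zip(x, h)):
        f[("emit", xj, hj)] += 1
        if j > 0:
            f[("trans", h[j - 1], hj)] += 1
    return f


def score(h, x, theta):
    """F(h|x, Theta) = f(h, x) . Theta."""
    return sum(v * theta[k] for k, v in features(h, x).items())


def best(candidates, x, theta):
    """Highest-scoring latent sequence among the candidates (brute force)."""
    return max(candidates, key=lambda h: score(h, x, theta))


def train(data, epochs=10):
    theta = Counter()
    for _ in range(epochs):
        for x, y_star in data:
            h_pred = best(itertools.product(ALL_LATENT, repeat=len(x)), x, theta)
            if proj(h_pred) != tuple(y_star):
                # h*: best latent sequence among those with Proj(h) = y*.
                h_star = best(itertools.product(*(LATENT[y] for y in y_star)), x, theta)
                theta.update(features(h_star, x))
                theta.subtract(features(h_pred, x))
    return theta


def predict(x, theta):
    return proj(best(itertools.product(ALL_LATENT, repeat=len(x)), x, theta))


# Toy training pairs (made up for this sketch).
data = [(("a", "b"), ("A", "B")), (("b", "a"), ("B", "A"))]
theta = train(data)
```

The only differences from the structured perceptron are that the mistake test compares projections and that the positive term uses $h_i^*$, which depends on the current weights.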
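  34. Mistake bound. If the training set $\{(x_i, y_i^*)\}_{i=1}^{d}$ is separable with margin $\delta > 0$, the number of mistakes of the Latent Variable Perceptron is at most $\frac{2 T^2 M^2}{\delta^2}$, where $M = \max_{i, y} \| f(y, x_i) \|_2$ and $T$ is as defined in [Sun+, 09].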
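  36. Experimental setting:
      ‣ $X = \{a, b\}$
      ‣ $Y = \{A, B\}$
      ‣ $H_A = \{A_1, A_2\}$, $H_B = \{B_1, B_2\}$
      ‣ $h$ and $x$ are generated from $P(h_j \mid h_{j-1})$ and $P(x_j \mid h_j)$
      ‣ $y = \mathrm{Proj}(h)$
      ‣ Training data: $\{(x_i, y_i^*)\}_{i=1}^{d}$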
  37. Experimental setting, with a parameter $p$:
      ‣ Transition probabilities $P(h_j \mid h_{j-1})$ (rows: from, columns: to):
          from \ to   A1        A2        B1        B2
          A1          (1-p)/3   (1-p)/3   (1-p)/3   p
          A2          p         (1-p)/3   (1-p)/3   (1-p)/3
          B1          (1-p)/3   p         (1-p)/3   (1-p)/3
          B2          (1-p)/3   (1-p)/3   p         (1-p)/3
      ‣ Emission probabilities $P(x_j = a \mid h_j)$:
          h_j          A1    A2    B1    B2
          P(x_j = a)   0.1   0.7   0.7   0.6
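The generating process described by these two tables can be sketched as a small sampler. The transition and emission values are the ones in the tables; the uniform initial state, the sequence length, and the function names are assumptions, since the slides do not specify them.

```python
import random

STATES = ["A1", "A2", "B1", "B2"]
EMIT_A = {"A1": 0.1, "A2": 0.7, "B1": 0.7, "B2": 0.6}  # P(x_j = a | h_j)
# For each source state, the destination that receives probability p.
FAVORED = {"A1": "B2", "A2": "A1", "B1": "A2", "B2": "B1"}


def transition(p):
    """Transition table P(h_j | h_{j-1}): p for the favored state, (1 - p) / 3 otherwise."""
    return {src: {dst: (p if dst == FAVORED[src] else (1 - p) / 3) for dst in STATES}
            for src in STATES}


def sample(m, p, rng):
    """Draw one (x, y, h) of length m."""
    trans = transition(p)
    h = [rng.choice(STATES)]  # assumption: uniform initial state (not given on the slide)
    for _ in range(m - 1):
        row = trans[h[-1]]
        h.append(rng.choices(STATES, weights=[row[s] for s in STATES])[0])
    x = ["a" if rng.random() < EMIT_A[s] else "b" for s in h]
    y = [s[0] for s in h]     # Proj: A1, A2 -> A and B1, B2 -> B
    return x, y, h


rng = random.Random(0)
x, y, h = sample(10, 0.7, rng)
```

With $p$ close to 1 the latent chain is nearly deterministic (A1, B2, B1, A2, A1, ...), so the label at each position depends strongly on which latent state preceded it.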
  38. [Figure: accuracy [%] (vertical axis, 40 to 100) against $p$ (horizontal axis, 0.5 to 0.95) for Latent Variable Perceptron and Structured Perceptron.]