
Reinforcement learning for NLP coreference

Slides for TFUG
https://tfug-tokyo.connpass.com/event/75524/
Paper introduction



  1. NN論文を肴に酒を飲む会 #5 (NN-papers-over-drinks meetup #5). Presenter: Shitian Ni (倪石天). Paper: EMNLP 2016, Deep Reinforcement Learning for Mention-Ranking Coreference Models, Kevin Clark and Christopher D. Manning, Computer Science Department, Stanford University
  2. About me: Shitian Ni (倪石天), Tokyo Institute of Technology, School of Engineering 1/15
  3. About me: Shitian Ni (倪石天), Tokyo Institute of Technology, School of Engineering • Topcoder blue • Kaggle silver medalist (Recruit Restaurant Visitor Forecasting) • NVIDIA Deep Learning Institute TA 1/15
  4. Coreference • Identify all noun phrases (mentions) that refer to the same real-world entity • The grammatical relation between two or more expressions that share the same referent (同一指示, coreference) 2/15
  5. Coreference • Identify all noun phrases (mentions) that refer to the same real-world entity • The grammatical relation between two or more expressions that share the same referent (同一指示, coreference) • Example: My university that has TSUBAME 3.0, which is a TOP500 supercomputer that accelerates my research but cost Tokyo Tech a lot of money, is located in Oookayama. 2/15
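To make the clustering view concrete, here is a minimal sketch (not from the slides; the mention spans and cluster assignments are my own illustrative reading of the example sentence) of how mentions group into coreference clusters, one cluster per real-world entity:

```python
# Illustrative only: mentions from the example sentence grouped into clusters,
# one cluster per real-world entity. Span choices are an assumption, not gold data.
example = ("My university that has TSUBAME 3.0, which is a TOP500 supercomputer "
           "that accelerates my research but cost Tokyo Tech a lot of money, "
           "is located in Oookayama.")

clusters = [
    ["My university", "Tokyo Tech"],                     # the university entity
    ["TSUBAME 3.0", "a TOP500 supercomputer", "which"],   # the supercomputer entity
    ["my research"],                                      # singleton (no coreferent mention)
]

for mentions in clusters:
    print(" <-> ".join(mentions))
```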
  6. Applications • Full text understanding 3/15
  7. Applications • Full text understanding • Text summarization 3/15
  8. Applications • Full text understanding • Text summarization • Information retrieval 3/15
  9. Applications • Full text understanding • Text summarization • Information retrieval • Machine translation 3/15
  10. Applications • Full text understanding • Text summarization • Information retrieval • Machine translation • I have a dog. It is 2 years old. <-> 2歳の犬を飼っている ("I have a 2-year-old dog") 3/15
  11. Applications • Full text understanding • Text summarization • Information retrieval • Machine translation • I have a dog. It is 2 years old. <-> 2歳の犬を飼っている ("I have a 2-year-old dog") • Chatbot question answering 3/15
  12. Applications • Full text understanding • Text summarization • Information retrieval • Machine translation • I have a dog. It is 2 years old. <-> 2歳の犬を飼っている ("I have a 2-year-old dog") • Chatbot question answering • I want to eat Japanese food. Where can I find that? 3/15
  13. Neural Mention-Ranking Model • m: mention • c: candidate antecedent • s(c, m): compatibility of c and m for coreference • Architecture: input layer -> hidden layer -> scoring layer producing s(c, m) 4/15
  14. Neural Mention-Ranking Model • m: mention • c: candidate antecedent • s(c, m): compatibility of c and m for coreference • Architecture: input layer -> hidden layer -> scoring layer producing s(c, m) • Trained with heuristic loss functions whose error penalties are tuned as hyperparameters 4/15
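As a rough picture of the architecture on this slide, here is a minimal NumPy sketch of a mention-ranking scorer s(c, m). The feature extraction, layer sizes, and candidate names are placeholder assumptions, not the paper's actual configuration:

```python
import numpy as np

# Minimal sketch of a mention-ranking scorer: a feed-forward network that maps
# features of (candidate antecedent c, mention m) to a single score s(c, m).
rng = np.random.default_rng(0)
FEAT_DIM, HIDDEN_DIM = 500, 128                            # assumed dimensions

W1 = rng.normal(scale=0.1, size=(HIDDEN_DIM, FEAT_DIM))    # input -> hidden
b1 = np.zeros(HIDDEN_DIM)
w2 = rng.normal(scale=0.1, size=HIDDEN_DIM)                # hidden -> scoring layer
b2 = 0.0

def score(features):
    """s(c, m): input layer -> hidden layer (ReLU) -> scoring layer (scalar)."""
    hidden = np.maximum(0.0, W1 @ features + b1)
    return float(w2 @ hidden + b2)

# Rank the candidate antecedents of a mention; "NA" means "start a new entity".
candidate_feats = {
    "NA": rng.normal(size=FEAT_DIM),
    "a gift": rng.normal(size=FEAT_DIM),
    "a chocolate": rng.normal(size=FEAT_DIM),
}
predicted = max(candidate_feats, key=lambda c: score(candidate_feats[c]))
print("predicted antecedent:", predicted)
```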
  15. Challenge • Finding effective error penalties for the loss calculation • Some errors are severe, some errors are minor 5/15
  16. Challenge • Finding effective error penalties for the loss calculation • Some errors are severe, some errors are minor • Bill’s girlfriend is a friend of Michael’s wife. 5/15
  17. Challenge • Finding effective error penalties for the loss calculation • Some errors are severe, some errors are minor • Bill’s girlfriend is a friend of Michael’s wife. (severe error) 5/15
  18. Challenge • Finding effective error penalties for the loss calculation • Some errors are severe, some errors are minor • It is raining. That is my dog. (minor error) 5/15
  19. Error types • False New: a word whose referent has already appeared is wrongly recognized as introducing a new entity • Example: [New] I bought a gift which is a chocolate for my girlfriend. 6/15
  20. Error types • False New: a word whose referent has already appeared is wrongly recognized as introducing a new entity • Example: [New] I bought a gift which is a chocolate for my girlfriend. 6/15
  21. Error types • False New: a word whose referent has already appeared is wrongly recognized as introducing a new entity • False Anaphoric: a word that actually introduces a new entity is wrongly linked as coreferent (anaphoric) with another word • Example (one figure per error type): [New] I bought a gift which is a chocolate for my girlfriend. 6/15
  22. Error types • False New: a word whose referent has already appeared is wrongly recognized as introducing a new entity • False Anaphoric: a word that actually introduces a new entity is wrongly linked as coreferent (anaphoric) with another word • False Link: an anaphoric word is linked as coreferent with the wrong word • Example (one figure per error type): [New] I bought a gift which is a chocolate for my girlfriend. 6/15
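A small sketch of how a single linking decision can be classified into these three error types; the mention names are illustrative, and the conditions mirror the cost function on the next slide:

```python
NA = "NA"   # the "this mention starts a new entity" decision

def error_type(c, true_antecedents):
    """Classify the decision "link mention m to c" against the gold set T(m).
    T(m) == {NA} means m genuinely introduces a new entity."""
    if c in true_antecedents:
        return "correct"
    if c == NA and true_antecedents != {NA}:
        return "false new"        # m had an antecedent, but we said "new"
    if c != NA and true_antecedents == {NA}:
        return "false anaphoric"  # m was new, but we linked it to something
    return "wrong link"           # m was anaphoric, but we picked the wrong antecedent

print(error_type(NA, {"a gift"}))             # false new
print(error_type("my girlfriend", {NA}))      # false anaphoric
print(error_type("a chocolate", {"a gift"}))  # wrong link
```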
  24. Prior work: Heuristic Loss Function • Max-margin loss: L(θ) = ∑i max over c ∈ C(mi) of Δh(c, mi) · (1 + s(c, mi) − s(ti, mi)), where the max ranges over candidate coreference decisions, Δh(c, mi) is the cost of the decision, (1 + s(c, mi) − s(ti, mi)) is the loss for scoring the decision too highly, and ti is the highest-scoring true antecedent of mi • Costs for linking mi to a candidate antecedent c ∈ C(mi): Δh(c, mi) = 0 if c ∈ T(mi) (c and mi are coreferent); αFN if c = NA ∧ T(mi) != {NA} (false new error); αFA if c != NA ∧ T(mi) = {NA} (false anaphoric error); αWL if c != NA ∧ c ∉ T(mi) (wrong link error) 7/15
  25. Prior work: Heuristic Loss Function • Same max-margin loss; the error penalties αFN, αFA, αWL are the hyperparameters to tune 7/15
  26. Prior work: Heuristic Loss Function • Disadvantage: requires a grid search over the hyperparameters αFN, αFA, αWL of the cost Δh(c, mi) • Grid search: automatically searching over combinations of a machine-learning model's hyperparameters to find the best setting 7/15
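Putting the two previous slides together, a minimal sketch of the heuristic max-margin loss for one mention; the α values are the hyperparameters that prior work tunes by grid search (the numbers below are placeholders, not the tuned values):

```python
def cost(c, true_antecedents, alpha_fn=0.8, alpha_fa=0.4, alpha_wl=1.0):
    """Mistake-specific cost Δh(c, m). The alpha penalties are hyperparameters
    found by grid search in prior work; the defaults here are placeholders."""
    if c in true_antecedents:
        return 0.0                       # correct decision
    if c == "NA":
        return alpha_fn                  # false new error
    if true_antecedents == {"NA"}:
        return alpha_fa                  # false anaphoric error
    return alpha_wl                      # wrong link error

def heuristic_loss(scores, true_antecedents):
    """Max-margin loss for one mention m.
    scores maps each candidate c in C(m), including "NA", to its score s(c, m)."""
    t = max(true_antecedents, key=lambda c: scores[c])   # highest-scoring true antecedent
    return max(cost(c, true_antecedents) * (1 + scores[c] - scores[t])
               for c in scores)

scores = {"NA": 0.2, "a gift": 1.1, "a chocolate": 0.9}
print(heuristic_loss(scores, {"a gift"}))   # 1.0 * (1 + 0.9 - 1.1) ≈ 0.8 for the wrong link
```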
  27. Proposed Reinforcement Learning methods • The model takes a sequence of actions -> receives a reward • REINFORCE algorithm • Reward rescaling • Example: [New] I bought a gift which is a chocolate for my girlfriend. with one action a1, a2, a3, a4 per mention 8/15
  28. REINFORCE algorithm • Define a probability distribution over actions • Maximize the expected reward • Sample trajectories of actions to approximate the gradient (policy gradient) 9/15
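A toy sketch of the REINFORCE update in this setting: a softmax over the scores s(c, m) defines the policy, trajectories (one antecedent choice per mention) are sampled, and the gradient of the expected reward is approximated with (reward - baseline) * grad log p(action). For simplicity the gradient is taken with respect to the score vectors directly rather than the network weights, and the reward function is left abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def sample_trajectory(score_vectors):
    """Sample one action (antecedent choice) per mention from softmax policies."""
    actions, grad_logps = [], []
    for s in score_vectors:
        p = softmax(s)
        a = int(rng.choice(len(s), p=p))
        actions.append(a)
        grad_logps.append(np.eye(len(s))[a] - p)   # d/ds log p(a | s)
    return actions, grad_logps

def reinforce(score_vectors, reward_fn, n_samples=200, lr=0.5):
    """Approximate the expected-reward gradient from sampled trajectories and
    ascend it. reward_fn scores a whole action sequence (e.g. with B^3)."""
    baseline = 0.0
    for _ in range(n_samples):
        actions, grad_logps = sample_trajectory(score_vectors)
        r = reward_fn(actions)
        baseline = 0.9 * baseline + 0.1 * r        # running-average baseline
        for s, g in zip(score_vectors, grad_logps):
            s += lr * (r - baseline) * g           # in-place gradient step

# Toy usage: mention 0 should pick candidate 1, mention 1 should pick candidate 0.
scores = [np.zeros(3), np.zeros(2)]
reinforce(scores, reward_fn=lambda acts: float(acts == [1, 0]))
print([softmax(s).round(2) for s in scores])
```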
  29. REINFORCE algorithm • Competitive with the heuristic loss 10/15
  30. REINFORCE algorithm • Competitive with the heuristic loss • But not by much 10/15
  31. REINFORCE algorithm • Cons: • REINFORCE maximizes performance in expectation, i.e. it prefers actions with better expected outcomes • But we only need the highest-scoring action to be correct, i.e. what matters is which action gets the best score • It only links the current mention to a single antecedent, yet is trained to assign high probability to all correct antecedents 10/15
  32. Reward Rescaling • Incorporate the reward into the max-margin objective's slack rescaling: the heuristic cost Δh(c, mi) (0 if c ∈ T(mi); αFN for a false new error; αFA for a false anaphoric error; αWL for a wrong link error) is the term that gets replaced 11/15
  34. Reward Rescaling • Since actions are independent, we can change an action a to a different action a' and see what reward (under the B³ coreference metric) we would have received instead 12/15
  35. Reward Rescaling • Since actions are independent, we can change an action a to a different action a' and see what reward (under the B³ coreference metric) we would have received instead • Example: [New] I bought a chocolate for my girlfriend. Action a: Reward = 1, Regret = 99 12/15
  36. Reward Rescaling • Since actions are independent, we can change an action a to a different action a' and see what reward we would have received instead • Example: [New] I bought a chocolate for my girlfriend. Action a': Reward = 35, Regret = 65 12/15
  37. Reward Rescaling • Since actions are independent, we can change an action a to a different action a' and see what reward we would have received instead • Example: [New] I bought a chocolate for my girlfriend. Action a'': Reward = 100, Regret = 0 12/15
  38. Reward Rescaling • The cost is the regret of taking the action: Δr(c, mi) = max over a' of R(a1, …, a', …, aT) (reward for the best action) minus R(a1, …, (c, mi), …, aT) (reward for the current action) • It replaces the heuristic cost • Benefits from the max-margin loss while directly optimizing for the coreference metric 13/15
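A minimal sketch of the regret computation: swap in every alternative action at position i, recompute the reward with everything else fixed, and take the gap to the best one. The reward values below are just the numbers from the example slides; the candidate names are assumptions:

```python
def regret_cost(actions, i, alternatives, reward_fn):
    """Δr for the i-th decision: reward of the best possible action at position i
    minus the reward of the action actually taken, all other actions held fixed."""
    current = reward_fn(actions)
    best = max(reward_fn(actions[:i] + [a] + actions[i + 1:]) for a in alternatives)
    return best - current

# Toy usage matching the example slides' rewards (1, 35, 100): the decision actually
# taken gets reward 1, so its regret is 100 - 1 = 99. Candidate names are illustrative.
toy_rewards = {"NA": 1, "my girlfriend": 35, "a gift": 100}
reward_fn = lambda acts: toy_rewards[acts[0]]
print(regret_cost(["NA"], 0, list(toy_rewards), reward_fn))   # 99
```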
  40. Experiment • The B³ coreference metric is used as the reward for an action sequence • MUC has the flaw of treating all errors equally • CEAFφ4 is slow to compute 14/15
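For reference, a small sketch of the B³ metric itself (standard definition, not taken from the slides): for each mention, compare the predicted cluster containing it with the gold cluster containing it, then average precision and recall over mentions:

```python
def b_cubed(gold_clusters, predicted_clusters):
    """B^3 precision, recall, and F1 over the mentions present in both clusterings."""
    gold = {m: frozenset(c) for c in gold_clusters for m in c}
    pred = {m: frozenset(c) for c in predicted_clusters for m in c}
    mentions = gold.keys() & pred.keys()
    precision = sum(len(gold[m] & pred[m]) / len(pred[m]) for m in mentions) / len(mentions)
    recall = sum(len(gold[m] & pred[m]) / len(gold[m]) for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [{"a gift", "a chocolate"}, {"my girlfriend"}]
pred = [{"a gift"}, {"a chocolate", "my girlfriend"}]
print(b_cubed(gold, pred))   # roughly (0.67, 0.67, 0.67) for this prediction
```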
  41. Experiment result • The reward-rescaling model makes more errors • However, the errors are less severe: ~0.7% lower cost on average • Compared to the heuristic loss, Reward Rescaling makes more errors on false anaphoric and false new decisions, but fewer errors on wrong links 14/15
  42. Thank you • Questions and comments? 15/15 References • Deep Reinforcement Learning for Mention-Ranking Coreference Models (Kevin Clark, Christopher D. Manning) • Stanford CS224n Lecture 15: Coreference Resolution https://www.youtube.com/watch?v=rpwEWLaueRk • https://github.com/clarkkev/deep-coref
