Reinforcement learning for NLP coreference

NN論文を肴に酒を飲む会 #5 (a meetup for discussing neural network papers over drinks)
Presenter: Shitian Ni (倪石天)
EMNLP 2016
Deep Reinforcement Learning for Mention-Ranking Coreference Models
Kevin Clark, Christopher D. Manning
Computer Science Department, Stanford University

Self-introduction
Shitian Ni (倪石天)
School of Engineering, Tokyo Institute of Technology
• Topcoder blue
• Kaggle silver medalist (Recruit Restaurant Visitor Forecasting)
• Nvidia Deep Learning Institute TA

Coreference
• Identify all noun phrases (mentions) that refer to the same real-world entity
• A grammatical relation between two or more words that share a common referent (Japanese: 同一指示)
Example
My university that has TSUBAME 3.0, which is a TOP500 supercomputer that accelerates my research but cost Tokyo Tech a lot of money, is located in Ookayama.

Applications
• Full text understanding
• Text summary
• Information retrieval
• Machine translation
  • "I have a dog. It is 2 years old." <-> 「2歳の犬を飼っている」 ("I have a two-year-old dog")
• Chat bot question answering
  • "I want to eat Japanese food. Where can I find that?"
Neural Mention-Ranking Model
• m: mention
• c: candidate antecedent
• s(c,m): compatibility score for coreference between c and m
[Figure: Input Layer -> Hidden Layer -> Scoring Layer -> s(c,m)]
• Trained with heuristic loss functions that are tuned via hyperparameters
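The scoring pipeline in the figure can be sketched as a small feed-forward network. This is a minimal illustration, not the authors' implementation: a single ReLU hidden layer, no bias terms, hand-picked weights, and the helper names `score` and `rank_antecedents` are all assumptions made for the example.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def score(features, w_hidden, w_out):
    """s(c, m): input layer -> hidden layer (ReLU) -> scalar scoring layer."""
    hidden = relu(matvec(w_hidden, features))
    return sum(w * h for w, h in zip(w_out, hidden))

def rank_antecedents(candidate_features, w_hidden, w_out):
    """Score every candidate antecedent of one mention and pick the best."""
    scores = [score(f, w_hidden, w_out) for f in candidate_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical pair features for three candidate antecedents of one mention.
w_hidden = [[1.0, 0.0], [0.0, 1.0]]   # assumed weights, for illustration only
w_out = [1.0, -1.0]
candidates = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
best, scores = rank_antecedents(candidates, w_hidden, w_out)
```

The mention is linked to the candidate with the highest s(c,m); here that is the second candidate.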

Challenge
• Finding effective error penalties for the loss calculation
• Some errors are severe, others are minor
• Severe error: "Bill’s girlfriend is a friend of Michael’s wife."
• Minor error: "It is raining. That is my dog."

Error types
• False New: a mention refers to an entity that an earlier word already referred to, but it is classified as appearing for the first time
• False Anaphoric: a mention refers to an entity that appears for the first time, but it is wrongly classified as coreferent with another word (anaphoric)
• False Link: a mention refers to an entity that appears two or more times, but it is wrongly classified as coreferent with a different word
Example: "New I bought a gift which is a chocolate for my girlfriend."
Prior work: Heuristic Loss Function
• Use a max-margin loss:
$$L(\theta) = \sum_{i=1}^{N} \max_{c \in C(m_i)} \Delta_h(c, m_i)\,\bigl(1 + s(c, m_i) - s(t_i, m_i)\bigr)$$
  • $\max_{c \in C(m_i)}$: max over the candidate coreference decisions
  • $\Delta_h(c, m_i)$: cost for this coreference decision
  • $1 + s(c, m_i) - s(t_i, m_i)$: loss for scoring this decision too highly
  • $t_i$ := the highest-scoring true antecedent of $m_i$
• Costs for linking $m_i$ to a candidate antecedent $c \in C(m_i)$, where $T(m_i)$ is the set of true antecedents of $m_i$ and NA means "no antecedent":
$$\Delta_h(c, m_i) = \begin{cases}
0 & \text{if } c \in T(m_i) & \text{($c$ and $m_i$ are coreferent)} \\
\alpha_{FN} & \text{if } c = \mathrm{NA} \wedge T(m_i) \neq \{\mathrm{NA}\} & \text{(false new error)} \\
\alpha_{FA} & \text{if } c \neq \mathrm{NA} \wedge T(m_i) = \{\mathrm{NA}\} & \text{(false anaphoric error)} \\
\alpha_{WL} & \text{if } c \neq \mathrm{NA} \wedge c \notin T(m_i) & \text{(wrong link error)}
\end{cases}$$
• Disadvantage: the costs $\alpha_{FN}$, $\alpha_{FA}$, $\alpha_{WL}$ are hyperparameters that must be tuned with a grid search
• Grid search: automatically optimizes the hyperparameters of a machine learning model
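The loss and cost table above can be sketched for a single mention as follows. This is a minimal sketch under stated assumptions: scores are passed in as a plain dict, the candidate names `m1` and `m2` are hypothetical, and the alpha values in the usage lines are illustrative placeholders, not tuned settings.

```python
NA = "NA"  # dummy antecedent: the mention starts a new entity

def heuristic_cost(c, true_antecedents, a_fn, a_fa, a_wl):
    """Cost of linking a mention to candidate c, given its true antecedents T."""
    if c in true_antecedents:
        return 0.0       # correct decision
    if c == NA:
        return a_fn      # false new: the mention has a real antecedent
    if true_antecedents == {NA}:
        return a_fa      # false anaphoric: the mention is actually new
    return a_wl          # wrong link: linked to the wrong antecedent

def mention_loss(scores, true_antecedents, a_fn, a_fa, a_wl):
    """Slack-rescaled max-margin loss for one mention.

    scores maps each candidate (including NA) to s(c, m_i).
    """
    t_score = max(scores[c] for c in true_antecedents)  # s(t_i, m_i)
    return max(
        heuristic_cost(c, true_antecedents, a_fn, a_fa, a_wl)
        * (1.0 + scores[c] - t_score)
        for c in scores
    )

# Illustrative values only: m1 is the true antecedent, m2 is scored too highly.
scores = {NA: 0.5, "m1": 1.0, "m2": 2.0}
loss = mention_loss(scores, {"m1"}, a_fn=0.8, a_fa=0.4, a_wl=1.0)
```

The document-level loss L(θ) is the sum of `mention_loss` over all mentions. A grid search would rerun training once for each combination of the three alpha values.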

Proposed Reinforcement Learning methods
• The model takes a sequence of actions (a1, a2, a3, a4, ...) and then receives a reward
• REINFORCE algorithm
• Reward rescaling

REINFORCE algorithm
• Define a probability distribution over actions
• Maximize the expected reward
• Sample trajectories of actions to approximate the gradient (policy gradient)
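The three steps above can be sketched as a Monte Carlo policy-gradient estimate. This is a generic REINFORCE sketch, not the paper's exact procedure: it assumes a softmax over the candidate scores of each mention, takes the gradient with respect to the scores rather than network weights, and uses a toy reward (the number of actions that match a hypothetical gold answer).

```python
import math
import random

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_gradient(scores_per_mention, reward_fn, num_samples, rng):
    """Estimate d E[R] / d s for every score by sampling action sequences."""
    probs = [softmax(s) for s in scores_per_mention]
    grads = [[0.0] * len(s) for s in scores_per_mention]
    for _ in range(num_samples):
        # One trajectory: sample one action (candidate index) per mention.
        actions = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        reward = reward_fn(actions)
        for i, (p, a) in enumerate(zip(probs, actions)):
            for k in range(len(p)):
                # d log p(a) / d s_k = 1[k == a] - p_k for a softmax policy
                indicator = 1.0 if k == a else 0.0
                grads[i][k] += reward * (indicator - p[k]) / num_samples
    return grads

rng = random.Random(0)
gold = [1, 0]  # hypothetical correct action for each of two mentions
scores = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_reward = lambda acts: sum(a == g for a, g in zip(acts, gold))
grads = reinforce_gradient(scores, toy_reward, 2000, rng)
```

Gradient ascent on these estimates raises the scores of actions that tend to lead to higher reward.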

REINFORCE algorithm
• Competitive with the heuristic loss, but not much better
• Con:
  • REINFORCE maximizes performance in expectation (over all sampled actions)
  • At test time, only the highest-scoring action needs to be correct
  • The model links the current mention to only a single antecedent, but is trained to assign high probability to all correct antecedents

Reward Rescaling
• Incorporate the reward into the slack rescaling of the max-margin objective, i.e. into its cost term $\Delta_h(c, m_i)$

Reward Rescaling
• Since actions are independent, we can change an action a to a different action a’ and see what reward (B3 coreference metric) we would have received instead.
• Example: "New I bought a chocolate for my girlfriend."
  • Action a: Reward = 1, Regret = 99
  • Action a’: Reward = 35, Regret = 65
  • Action a’’: Reward = 100, Regret = 0

Reward Rescaling
• The cost is the regret of taking the action
• It replaces the heuristic cost
• Benefits from the max-margin loss while also directly optimizing for coreference metrics
$$\Delta_h(c, m_i) = \max_{a'} R(a_1, \dots, a', \dots, a_T) - R(a_1, \dots, (c, m_i), \dots, a_T)$$
  • First term: reward for the best action
  • Second term: reward for the current action
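The regret cost can be sketched in a few lines. The function name `regret_costs` and the dict layout are assumptions for illustration. The rewards are the ones from the example slide, with all other actions in the sequence held fixed.

```python
def regret_costs(rewards):
    """Map each candidate action to (best achievable reward - its reward).

    rewards maps an action to R(a_1, ..., action, ..., a_T), computed with
    every other action in the sequence left unchanged.
    """
    best = max(rewards.values())
    return {action: best - reward for action, reward in rewards.items()}

costs = regret_costs({"a": 1, "a'": 35, "a''": 100})
# costs == {"a": 99, "a'": 65, "a''": 0}
```

These costs take the place of the fixed alpha values in the max-margin loss, so no grid search over error penalties is needed.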

Experiment
• The B3 coreference metric is used as the reward for an action sequence
• MUC has the flaw of treating all errors equally
• CEAFφ4 is slow to compute
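For reference, the B3 metric can be sketched as below. This is a simplified version that assumes the predicted and gold clusterings cover exactly the same mentions. The function name `b_cubed` and the integer mention ids are illustrative.

```python
def b_cubed(predicted, gold):
    """B3 precision, recall and F1 for two clusterings of the same mentions.

    predicted and gold are lists of sets of mention ids.
    """
    pred_of = {m: cluster for cluster in predicted for m in cluster}
    gold_of = {m: cluster for cluster in gold for m in cluster}
    mentions = list(gold_of)
    # Per-mention overlap between its predicted and its gold cluster.
    precision = sum(
        len(pred_of[m] & gold_of[m]) / len(pred_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(pred_of[m] & gold_of[m]) / len(gold_of[m]) for m in mentions
    ) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = [{1, 2, 3}, {4}]
predicted = [{1, 2}, {3, 4}]
precision, recall, f1 = b_cubed(predicted, gold)
# precision = 0.75, recall = 2/3, f1 = 12/17
```

Because each mention is scored by how much its predicted cluster overlaps its gold cluster, an error that merges or splits large clusters costs more than one that affects a single mention.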

Experiment result
• The reward-rescaling model makes more errors
• However, the errors are less severe
  • ~0.7% lower cost on average
• Compared with the heuristic loss, reward rescaling makes
  • More errors on
    • False anaphoric
    • False new
  • Fewer errors on
    • Wrong link

Thank you
• Questions and comments?
Reference
• Kevin Clark, Christopher D. Manning. Deep Reinforcement Learning for Mention-Ranking Coreference Models. EMNLP 2016.
• Stanford CS224n, Lecture 15: Coreference Resolution. https://www.youtube.com/watch?v=rpwEWLaueRk
• https://github.com/clarkkev/deep-coref
