NN論文を肴に酒を飲む会 #5 (a meetup for discussing neural network papers over drinks)
Presenter: Shitian Ni (倪石天)
EMNLP 2016
Deep Reinforcement Learning for Mention-Ranking
Coreference Models
Kevin Clark Christopher D. Manning
Computer Science Department
Stanford University
Self-introduction
Shitian Ni (倪石天)
Tokyo Institute of Technology, School of Engineering
• Topcoder blue
• Kaggle silver medalist (Recruit Restaurant Visitor Forecasting)
• NVIDIA Deep Learning Institute TA
Coreference
• Identify all noun phrases (mentions) that refer to the same real-world entity
• A grammatical relation between two or more words that share a common referent
• Japanese term: 同一指示
Example
My university that has TSUBAME 3.0,
which is a TOP500 supercomputer that accelerates my research
but cost Tokyo Tech a lot of money,
is located in Ookayama.
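The output of coreference resolution can be pictured as a set of clusters, one per entity. The sketch below is only an illustration: the entity ids and the grouping of mentions are an assumed reading of the example sentence, since the slide's highlighting is not preserved here.

```python
from collections import defaultdict

# Mentions from the example sentence, each tagged with a hypothetical entity id.
# The grouping is an assumed reading, not something stated on the slide.
mentions = [
    ("My university", "tokyo_tech"),
    ("TSUBAME 3.0", "tsubame"),
    ("a TOP500 supercomputer", "tsubame"),
    ("Tokyo Tech", "tokyo_tech"),
    ("Ookayama", "ookayama"),
]


def build_clusters(tagged_mentions):
    """Group mention strings by entity id, preserving text order."""
    clusters = defaultdict(list)
    for text, entity in tagged_mentions:
        clusters[entity].append(text)
    return dict(clusters)


clusters = build_clusters(mentions)
for entity, group in clusters.items():
    print(entity, "->", group)
```

A coreference system has to recover this grouping from the raw text alone.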
Applications
• Full text understanding
• Text summarization
• Information retrieval
• Machine translation
• Example: I have a dog. It is 2 years old. <-> 2歳の犬を飼っている ("I have a two-year-old dog")
• Chatbot question answering
• Example: I want to eat Japanese food. Where can I find that?
Neural Mention-Ranking Model
• m: mention
• c: candidate antecedent
• s(c,m): compatibility score for coreference between c and m
• Figure: Input Layer -> Hidden Layer -> Scoring Layer -> s(c,m)
• Trained with heuristic loss functions that are tuned via hyperparameters
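A minimal sketch of how such a scorer could produce s(c,m) and rank candidates. Everything concrete here is an assumption for illustration: the single hidden layer, the layer sizes, the random weights, and the feature vectors are hypothetical and do not come from the slides.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def affine(W, b, x):
    """Compute W x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def make_layer(n_out, n_in, rng):
    """Random weights and zero biases (untrained, for illustration only)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def score(pair_features, hidden, scoring):
    """s(c, m): input layer -> hidden layer (ReLU) -> scoring layer (scalar)."""
    h = relu(affine(hidden[0], hidden[1], pair_features))
    return affine(scoring[0], scoring[1], h)[0]


rng = random.Random(0)
hidden = make_layer(8, 4, rng)   # hypothetical sizes
scoring = make_layer(1, 8, rng)

# Hypothetical feature vectors for the candidates of one mention.
# "NA" stands for "no antecedent" (the mention starts a new entity).
candidates = {
    "NA": [0.0, 0.0, 0.0, 1.0],
    "a gift": [0.9, 0.1, 0.3, 0.0],
    "I": [0.1, 0.8, 0.2, 0.0],
}
scores = {c: score(f, hidden, scoring) for c, f in candidates.items()}
best = max(scores, key=scores.get)  # mention-ranking: pick the top-scoring candidate
print(scores, best)
```

The mention is linked to whichever candidate scores highest, so training only has to get the ranking right.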
Challenge
• Finding effective error penalties for the loss calculation
• Some errors are severe, some errors are minor
• Severe error: Bill’s girlfriend is a friend of Michael’s wife.
• Minor error: It is raining. That is my dog.
Error types
• False New: a word referring to the same thing appeared earlier, but the mention is recognized as appearing for the first time
• False Anaphoric (照応): the mention refers to something appearing for the first time, but is wrongly recognized as coreferent with another word
• False Link: the mention refers to something that has appeared two or more times, but is wrongly recognized as coreferent with a different word
Example (used for all three error types): New I bought a gift which is a chocolate for my girlfriend.
Prior work: Heuristic Loss Function
• Use a max-margin loss:
$$L(\theta) = \sum_{i} \max_{c \in C(m_i)} h(c, m_i)\,\bigl(1 + s(c, m_i) - s(t_i, m_i)\bigr)$$
• \max_{c \in C(m_i)}: max over candidate coreference decisions
• h(c, m_i): cost for this coreference decision
• 1 + s(c, m_i) - s(t_i, m_i): loss for scoring this decision too highly
• t_i := the highest-scoring true antecedent of m_i
• Costs for linking m_i to a candidate antecedent c ∈ C(m_i):
$$
h(c, m_i) =
\begin{cases}
0 & \text{if } c \in T(m_i) \quad \text{(c and } m_i \text{ are coreferent)} \\
\alpha_{FN} & \text{if } c = \mathrm{NA} \wedge T(m_i) \neq \{\mathrm{NA}\} \quad \text{(false new error)} \\
\alpha_{FA} & \text{if } c \neq \mathrm{NA} \wedge T(m_i) = \{\mathrm{NA}\} \quad \text{(false anaphoric error)} \\
\alpha_{WL} & \text{if } c \neq \mathrm{NA} \wedge c \notin T(m_i) \quad \text{(wrong link error)}
\end{cases}
$$
• The penalties αFN, αFA, and αWL have to be tuned.
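The loss for a single mention can be sketched as below. The scores, the candidate names, and the penalty values passed in are placeholders chosen for illustration, not values taken from the slides.

```python
NA = "NA"  # the dummy "no antecedent" candidate


def heuristic_cost(c, true_antecedents, a_fn, a_fa, a_wl):
    """h(c, m_i): mistake-specific cost, following the case table."""
    if c in true_antecedents:
        return 0.0
    if c == NA:                       # mention has a real antecedent
        return a_fn                   # false new
    if true_antecedents == {NA}:      # mention actually starts a new entity
        return a_fa                   # false anaphoric
    return a_wl                       # wrong link


def mention_loss(scores, true_antecedents, a_fn, a_fa, a_wl):
    """max over c of h(c, m_i) * (1 + s(c, m_i) - s(t_i, m_i)) for one mention."""
    t_score = max(scores[c] for c in true_antecedents)  # s(t_i, m_i)
    return max(
        heuristic_cost(c, true_antecedents, a_fn, a_fa, a_wl) * (1.0 + s - t_score)
        for c, s in scores.items()
    )


# Placeholder scores s(c, m_i) for one mention whose true antecedent is "a gift".
scores = {NA: 1.5, "a gift": 2.0, "I": 0.0}
loss = mention_loss(scores, {"a gift"}, a_fn=0.8, a_fa=0.4, a_wl=1.0)
print(loss)
```

Here NA scores within the margin of the true antecedent, so the false-new penalty is what produces the loss; the full objective sums this quantity over all mentions.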
Prior work: Heuristic Loss Function
• Disadvantage: the costs αFN, αFA, and αWL have to be found by grid search over hyperparameters
• Grid search: automatically optimizes the hyperparameters of a machine learning model
Proposed Reinforcement Learning methods
• The model takes a sequence of actions -> receives a reward
• REINFORCE algorithm
• Reward rescaling
Example (actions a1, a2, a3, a4): New I bought a gift which is a chocolate for my girlfriend.
REINFORCE algorithm (policy gradient)
• Define a probability distribution over actions
• Maximize the expected reward
• Sample trajectories of actions to approximate the gradient
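The three steps can be sketched on a toy problem. This is a generic REINFORCE estimator under stated assumptions: the policy is a softmax over per-action scores, the gradient is taken with respect to those scores rather than network weights, and the reward function and gold actions are hypothetical stand-ins for a coreference metric.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def reinforce_gradient(score_rows, reward_fn, rng, n_samples=2000):
    """Monte Carlo estimate of d E[R] / d scores via R * grad log p(trajectory)."""
    probs = [softmax(row) for row in score_rows]
    grad = [[0.0] * len(row) for row in score_rows]
    for _ in range(n_samples):
        actions = [sample(p, rng) for p in probs]   # one sampled trajectory
        reward = reward_fn(actions)
        for t, a in enumerate(actions):
            for k, p in enumerate(probs[t]):
                # d log softmax_a / d score_k = 1[k == a] - p_k
                grad[t][k] += reward * ((1.0 if k == a else 0.0) - p) / n_samples
    return grad


# Toy problem: two mentions, three candidate actions each.
# Hypothetical reward: number of mentions whose action matches the gold action.
gold = [1, 2]
reward_fn = lambda actions: sum(a == g for a, g in zip(actions, gold))
rng = random.Random(0)
grad = reinforce_gradient([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], reward_fn, rng)
print(grad)
```

Following this gradient raises the scores of the gold actions and lowers the others, which is the sense in which sampling approximates the gradient of the expected reward.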
REINFORCE algorithm
• Competitive with the heuristic loss, but not much better
• Cons:
• REINFORCE maximizes performance in expectation, i.e., averaged over actions sampled from the model's distribution
• At test time, only the highest-scoring action needs to be correct
• The model links the current mention to only a single antecedent (先行詞), but is trained to assign high probability to all correct antecedents
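Reward Rescaling
• Incorporate the reward into the max-margin objective’s slack rescaling
• Heuristic cost used so far in the max-margin objective:
$$
h(c, m_i) =
\begin{cases}
0 & \text{if } c \in T(m_i) \quad \text{(c and } m_i \text{ are coreferent)} \\
\alpha_{FN} & \text{if } c = \mathrm{NA} \wedge T(m_i) \neq \{\mathrm{NA}\} \quad \text{(false new error)} \\
\alpha_{FA} & \text{if } c \neq \mathrm{NA} \wedge T(m_i) = \{\mathrm{NA}\} \quad \text{(false anaphoric error)} \\
\alpha_{WL} & \text{if } c \neq \mathrm{NA} \wedge c \notin T(m_i) \quad \text{(wrong link error)}
\end{cases}
$$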
Reward Rescaling
• Since actions are independent, we can change an action a to a different action a’ and see what reward (B3 coreference metric) we would have gotten instead.
Example: New I bought a chocolate for my girlfriend.
• Action a: Reward = 1, Regret = 99
• Action a’: Reward = 35, Regret = 65
• Action a’’: Reward = 100, Regret = 0
Reward Rescaling
• The cost is the regret of taking the action
• Replaces the heuristic cost
• Benefits from the max-margin loss while also directly optimizing for coreference metrics
$$h(c, m_i) = \max_{a'} R(a_1, \dots, a', \dots, a_T) - R(a_1, \dots, (c, m_i), \dots, a_T)$$
• First term: reward for the best action; second term: reward for the current action
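The regret cost can be sketched as follows. The reward table is a hypothetical stand-in that reproduces the example rewards of 1, 35, and 100 for three alternative actions; in practice the reward would be a coreference metric computed on the whole action sequence.

```python
def regret_costs(actions, t, candidates, reward_fn):
    """Cost of each candidate at step t: best achievable reward minus its reward,
    holding all other actions in the sequence fixed."""
    rewards = {}
    for c in candidates:
        alternative = actions[:t] + [c] + actions[t + 1:]
        rewards[c] = reward_fn(alternative)
    best = max(rewards.values())
    return {c: best - r for c, r in rewards.items()}


# Hypothetical reward table for a one-mention sequence.
example_rewards = {"a": 1, "a_prime": 35, "a_double_prime": 100}
reward_fn = lambda actions: example_rewards[actions[0]]

costs = regret_costs(["a"], 0, ["a", "a_prime", "a_double_prime"], reward_fn)
print(costs)  # {'a': 99, 'a_prime': 65, 'a_double_prime': 0}
```

These costs take the place of the hand-tuned penalties in the max-margin loss, so no grid search over αFN, αFA, and αWL is needed.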
Experiment
• The B3 coreference metric is used as the reward for an action sequence
• MUC has the flaw of treating all errors equally
• CEAFφ4 is slow to compute
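A minimal sketch of the standard B3 (B-cubed) computation, assuming the gold (key) and predicted (response) clusterings cover the same set of mentions. The mention strings and clusters are hypothetical examples.

```python
def b_cubed(key_clusters, response_clusters):
    """Return (precision, recall, F1) of B-cubed, averaged over mentions.

    Assumes both clusterings contain exactly the same mentions.
    """
    key_of = {m: frozenset(c) for c in key_clusters for m in c}
    resp_of = {m: frozenset(c) for c in response_clusters for m in c}
    mentions = list(key_of)
    n = len(mentions)
    # Per-mention overlap between its gold cluster and its predicted cluster.
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / n
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted clusters over five mentions.
key = [["I", "my"], ["a gift", "a chocolate"], ["my girlfriend"]]
response = [["I", "my"], ["a gift"], ["a chocolate", "my girlfriend"]]
p, r, f = b_cubed(key, response)
print(p, r, f)
```

Because the score is averaged per mention, a wrong decision on a mention in a large cluster changes the reward more than one on a singleton, which is what lets the reward distinguish severe from minor errors.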
Experiment result
• The reward-rescaling model makes more errors
• However, the errors are less severe
• ~0.7% lower cost on average
• Compared to the heuristic loss, reward rescaling makes
• more false anaphoric (照応) and false new errors
• fewer wrong link errors
Thank you
• Questions and comments?
Reference
• Kevin Clark and Christopher D. Manning. Deep Reinforcement Learning for Mention-Ranking Coreference Models. EMNLP 2016.
• Stanford CS224n, Lecture 15: Coreference Resolution. https://www.youtube.com/watch?v=rpwEWLaueRk
• https://github.com/clarkkev/deep-coref

More Related Content

Similar to Reinforcement learning for NLP coreference

Startup finance: valuation of tech companies
Startup finance: valuation of tech companiesStartup finance: valuation of tech companies
Startup finance: valuation of tech companies
Rianne Vogels
 
Optimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with StatisticsOptimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with Statistics
Optimizely
 
Asko Relas: Machine Learning for conversion optimization – How to be relevant...
Asko Relas: Machine Learning for conversion optimization – How to be relevant...Asko Relas: Machine Learning for conversion optimization – How to be relevant...
Asko Relas: Machine Learning for conversion optimization – How to be relevant...
Loihde Advisory
 
Test review
Test reviewTest review
Test reviewbweldon
 
Chris Stuccio - Data science - Conversion Hotel 2015
Chris Stuccio - Data science - Conversion Hotel 2015Chris Stuccio - Data science - Conversion Hotel 2015
Chris Stuccio - Data science - Conversion Hotel 2015
Webanalisten .nl
 
"Portfolio Optimisation When You Don’t Know the Future (or the Past)" by Rob...
"Portfolio Optimisation When You Don’t Know the Future (or the Past)" by Rob..."Portfolio Optimisation When You Don’t Know the Future (or the Past)" by Rob...
"Portfolio Optimisation When You Don’t Know the Future (or the Past)" by Rob...
Quantopian
 
PRESENT WORTH ANALYSIS.pptx
PRESENT WORTH ANALYSIS.pptxPRESENT WORTH ANALYSIS.pptx
PRESENT WORTH ANALYSIS.pptx
ismailshah64
 
Linear programming
Linear programmingLinear programming
Linear programming
Surekha98
 
Matt gershoff
Matt gershoffMatt gershoff
Matt gershoff
Rising Media, Inc.
 
Your A/B Tests are Lying to You
Your A/B Tests are Lying to YouYour A/B Tests are Lying to You
Your A/B Tests are Lying to You
John Clevenger
 
Your A/B Tests are Lying to You
Your A/B Tests are Lying to YouYour A/B Tests are Lying to You
Your A/B Tests are Lying to You
John Clevenger
 
Increasing reporting value with statistics
Increasing reporting value with statisticsIncreasing reporting value with statistics
Increasing reporting value with statisticsvraopolisetti
 
Stop Flying Blind! Quantifying Risk with Monte Carlo Simulation
Stop Flying Blind! Quantifying Risk with Monte Carlo SimulationStop Flying Blind! Quantifying Risk with Monte Carlo Simulation
Stop Flying Blind! Quantifying Risk with Monte Carlo Simulation
Sam McAfee
 
New Product Development Cost Assessment PowerPoint Presentation Slides
New Product Development Cost Assessment PowerPoint Presentation SlidesNew Product Development Cost Assessment PowerPoint Presentation Slides
New Product Development Cost Assessment PowerPoint Presentation Slides
SlideTeam
 
Section 8 Ensure Valid Test and Survey Results Trough .docx
Section 8 Ensure Valid Test and Survey Results Trough .docxSection 8 Ensure Valid Test and Survey Results Trough .docx
Section 8 Ensure Valid Test and Survey Results Trough .docx
kenjordan97598
 
Translating Lift into Dollar Value & Tracking Revenue in Optimizely
Translating Lift into Dollar Value & Tracking Revenue in OptimizelyTranslating Lift into Dollar Value & Tracking Revenue in Optimizely
Translating Lift into Dollar Value & Tracking Revenue in OptimizelyOptimizely
 
Part 1 of 8 - Question 1 of 17 1.0 Points A pha.docx
Part 1 of 8 -  Question 1 of 17 1.0 Points A pha.docxPart 1 of 8 -  Question 1 of 17 1.0 Points A pha.docx
Part 1 of 8 - Question 1 of 17 1.0 Points A pha.docx
herbertwilson5999
 
Week 3 Ppt.pptx
Week 3 Ppt.pptxWeek 3 Ppt.pptx
Week 3 Ppt.pptx
ImranMohammed971139
 
VSSML18. Evaluations
VSSML18. EvaluationsVSSML18. Evaluations
VSSML18. Evaluations
BigML, Inc
 
Pricing strategies 2017 nz
Pricing strategies 2017 nzPricing strategies 2017 nz
Pricing strategies 2017 nz
Robert Magnus, DVM, MBA
 

Similar to Reinforcement learning for NLP coreference (20)

Startup finance: valuation of tech companies
Startup finance: valuation of tech companiesStartup finance: valuation of tech companies
Startup finance: valuation of tech companies
 
Optimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with StatisticsOptimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with Statistics
 
Asko Relas: Machine Learning for conversion optimization – How to be relevant...
Asko Relas: Machine Learning for conversion optimization – How to be relevant...Asko Relas: Machine Learning for conversion optimization – How to be relevant...
Asko Relas: Machine Learning for conversion optimization – How to be relevant...
 
Test review
Test reviewTest review
Test review
 
Chris Stuccio - Data science - Conversion Hotel 2015
Chris Stuccio - Data science - Conversion Hotel 2015Chris Stuccio - Data science - Conversion Hotel 2015
Chris Stuccio - Data science - Conversion Hotel 2015
 
"Portfolio Optimisation When You Don’t Know the Future (or the Past)" by Rob...
"Portfolio Optimisation When You Don’t Know the Future (or the Past)" by Rob..."Portfolio Optimisation When You Don’t Know the Future (or the Past)" by Rob...
"Portfolio Optimisation When You Don’t Know the Future (or the Past)" by Rob...
 
PRESENT WORTH ANALYSIS.pptx
PRESENT WORTH ANALYSIS.pptxPRESENT WORTH ANALYSIS.pptx
PRESENT WORTH ANALYSIS.pptx
 
Linear programming
Linear programmingLinear programming
Linear programming
 
Matt gershoff
Matt gershoffMatt gershoff
Matt gershoff
 
Your A/B Tests are Lying to You
Your A/B Tests are Lying to YouYour A/B Tests are Lying to You
Your A/B Tests are Lying to You
 
Your A/B Tests are Lying to You
Your A/B Tests are Lying to YouYour A/B Tests are Lying to You
Your A/B Tests are Lying to You
 
Increasing reporting value with statistics
Increasing reporting value with statisticsIncreasing reporting value with statistics
Increasing reporting value with statistics
 
Stop Flying Blind! Quantifying Risk with Monte Carlo Simulation
Stop Flying Blind! Quantifying Risk with Monte Carlo SimulationStop Flying Blind! Quantifying Risk with Monte Carlo Simulation
Stop Flying Blind! Quantifying Risk with Monte Carlo Simulation
 
New Product Development Cost Assessment PowerPoint Presentation Slides
New Product Development Cost Assessment PowerPoint Presentation SlidesNew Product Development Cost Assessment PowerPoint Presentation Slides
New Product Development Cost Assessment PowerPoint Presentation Slides
 
Section 8 Ensure Valid Test and Survey Results Trough .docx
Section 8 Ensure Valid Test and Survey Results Trough .docxSection 8 Ensure Valid Test and Survey Results Trough .docx
Section 8 Ensure Valid Test and Survey Results Trough .docx
 
Translating Lift into Dollar Value & Tracking Revenue in Optimizely
Translating Lift into Dollar Value & Tracking Revenue in OptimizelyTranslating Lift into Dollar Value & Tracking Revenue in Optimizely
Translating Lift into Dollar Value & Tracking Revenue in Optimizely
 
Part 1 of 8 - Question 1 of 17 1.0 Points A pha.docx
Part 1 of 8 -  Question 1 of 17 1.0 Points A pha.docxPart 1 of 8 -  Question 1 of 17 1.0 Points A pha.docx
Part 1 of 8 - Question 1 of 17 1.0 Points A pha.docx
 
Week 3 Ppt.pptx
Week 3 Ppt.pptxWeek 3 Ppt.pptx
Week 3 Ppt.pptx
 
VSSML18. Evaluations
VSSML18. EvaluationsVSSML18. Evaluations
VSSML18. Evaluations
 
Pricing strategies 2017 nz
Pricing strategies 2017 nzPricing strategies 2017 nz
Pricing strategies 2017 nz
 

Recently uploaded

Bitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXOBitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Matjaž Lipuš
 
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
OECD Directorate for Financial and Enterprise Affairs
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
OWASP Beja
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Access Innovations, Inc.
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
Sebastiano Panichella
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Sebastiano Panichella
 
María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
eCommerce Institute
 
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
Howard Spence
 
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Sebastiano Panichella
 
Tom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issueTom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issue
amekonnen
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
IP ServerOne
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
faizulhassanfaiz1670
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
gharris9
 
Burning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdfBurning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdf
kkirkland2
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
Faculty of Medicine And Health Sciences
 
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Dutch Power
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
khadija278284
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
Vladimir Samoylov
 
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Dutch Power
 
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AwangAniqkmals
 

Recently uploaded (20)

Bitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXOBitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXO
 
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
 
María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
 
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
 
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
 
Tom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issueTom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issue
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
 
Burning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdfBurning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdf
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
 
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
 
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
 
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
 

Reinforcement learning for NLP coreference

  • 1. NN論文を肴に酒を飲む会 #5 紹介者 Shitian Ni (倪石天) ENMLP 2016 Deep Reinforcement Learning for Mention-Ranking Coreference Models Kevin Clark Christopher D. Manning Computer Science Department Stanford University Computer Science Department Stanford University
  • 3. 自己紹介 Shitian Ni (倪石天) 東京工業大学 工学部 • Topcoder blue • Kaggle Silver medalist (Recruit Restaurant Visitor Forecasting) • Nvidia Deep Learning Institute TA 1/15
  • 4. Coreference • Identify all noun phrases (mentions) that refer to the same real world identity • 共通の指示対象を持つ2つ以上の単語の文法的関係 • 同一指示 2/15
  • 5. Coreference • Identify all noun phrases (mentions) that refer to the same real world identity • 共通の指示対象を持つ2つ以上の単語の文法的関係 • 同一指示 Example 2/15 My university that has TSUBAME 3.0, which is a TOP500 supercomputer that accelerates my research but cost Tokyo Tech a lot of money, is located in Oookayama.
  • 6. Applications • Full text understanding 3/15
  • 7. Applications • Full text understanding • Text summary 3/15
  • 8. Applications • Full text understanding • Text summary • Information retrieval 3/15
  • 9. Applications • Full text understanding • Text summary • Information retrieval • Machine translation 3/15
  • 10. Applications • Full text understanding • Text summary • Information retrieval • Machine translation • I have a dog. It is 2 years old. <-> 2歳の犬を飼っている 3/15
  • 11. Applications • Full text understanding • Text summary • Information retrieval • Machine translation • I have a dog. It is 2 years old. <-> 2歳の犬を飼っている • Chat bot question answering 3/15
  • 12. Applications • Full text understanding • Text summary • Information retrieval • Machine translation • I have a dog. It is 2 years old. <-> 2歳の犬を飼っている • Chat bot question answering • I want to eat Japanese food. Where can I find that? 3/15
  • 13. Neural Mention-Ranking Model • m: mention • c: candidate antecedent • s(c,m): compatibility for coreference Hidden Layer Input Layer Scoring Layer s(c,m) 4/15
  • 14. Neural Mention-Ranking Model • m: mention • c: candidate antecedent • s(c,m): compatibility for coreference Hidden Layer Input Layer Scoring Layer s(c,m) trained with heuristic loss functions tuned via hyperparameters 4/15
  • 15. Challenge • Finding Effective Error Penalties for loss calculations. • Some errors are severe, some errors are minor 5/15
  • 16. Challenge • Finding Effective Error Penalties for loss calculations. • Some errors are severe, some errors are minor • Bill’s girlfriend is a friend of Michael’s wife. 5/15
  • 17. Challenge • Finding Effective Error Penalties for loss calculations. • Some errors are severe, some errors are minor • Bill’s girlfriend is a friend of Michael’s wife. 5/15 Severe error
  • 18. Challenge • Finding Effective Error Penalties for loss calculations. • Some errors are severe, some errors are minor • It is raining. That is my dog. Minor error 5/15
  • 19. Error types • False New New I bought a gift which is a chocolate for my girlfriend. 6/15 以前同一ものを指す単語が現れたが、初めてのものと認識される
  • 20. Error types • False New New I bought a gift which is a chocolate for my girlfriend. 6/15 False New 以前同一ものを指す単語が現れたが、初めて現れたものと認識される
  • 21. Error types • False New • False Anaphoric New I bought a gift which is a chocolate for my girlfriend. New I bought a gift which is a chocolate for my girlfriend. 6/15 False New False Anaphoric 以前同一ものを指す単語が現れたが、初めて現れたものと認識される 初めて現れたものを指す単語なのに、他の単語と同一指示関係にあると誤認識 (照応)
  • 22. Error types • False New • False Anaphoric • False Link New I bought a gift which is a chocolate for my girlfriend. New I bought a gift which is a chocolate for my girlfriend. New I bought a gift which is a chocolate for my girlfriend. 6/15 False New False Anaphoric False Link 以前同一ものを指す単語が現れたが、初めて現れたものと認識される 初めて現れたものを指す単語なのに、他の単語と同一指示関係にあると誤認識 二回以上現れたものを指す単語が、他の単語と同一指示関係にあると誤認識 (照応)
  • 23. Error types • False New • False Anaphoric • False Link New I bought a gift which is a chocolate for my girlfriend. New I bought a gift which is a chocolate for my girlfriend. New I bought a gift which is a chocolate for my girlfriend. 6/15 False New False Anaphoric False Link 以前同一ものを指す単語が現れたが、初めて現れたものと認識される 初めて現れたものを指す単語なのに、他の単語と同一指示関係にあると誤認識 二回以上現れたものを指す単語が、他の単語と同一指示関係にあると誤認識 (照応)
  • 24. Prior work: Heuristic Loss Function • Use max margin loss (c,mi) (1 + s(c, mi) - s(ti, mi))hL(θ) = ∑ max C Max over candidate coreference decision Cost for this coref decision Loss for scoring this decision too highly h (c,mi) = 0 if c ∈ T (mi) if c and mi are coreferent αFN if c = NA ∧ T (mi) != {NA} if false new error αFA if c != NA ∧ T (mi) = {NA} if false anaphoric error αWL if c != NA ∧ c ∉ T (mi) if wrong link error 7/15 Costs for linking mi to a candidate antecedent c ∈ C(mi): ti := the highest scoring true antecedent of mi
  • 25. Prior work: Heuristic Loss Function • Use max margin loss (c,mi) (1 + s(c, mi) - s(ti, mi))hL(θ) = max C Max over candidate coreference decision Cost for this coref decision Loss for scoring this decision too highly h (c,mi) = 0 if c ∈ T (mi) if c and mi are coreferent αFN if c = NA ∧ T (mi) != {NA} if false new error αFA if c != NA ∧ T (mi) = {NA} if false anaphoric error αWL if c != NA ∧ c ∉ T (mi) if wrong link error 7/15 Costs for linking mi to a candidate antecedent c ∈ C(mi): ti := the highest scoring true antecedent of mi Tune !
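  • 26. Prior work: Heuristic Loss Function • Disadvantage: requires a grid search over the cost hyperparameters • Grid search: automatically optimizes the hyperparameters of a machine learning model • Costs for linking $m_i$ to a candidate antecedent $c \in C(m_i)$: $\Delta_h(c, m_i) = \begin{cases} 0 & \text{if } c \in T(m_i) \text{ (}c\text{ and }m_i\text{ are coreferent)} \\ \alpha_{FN} & \text{if } c = \mathrm{NA} \land T(m_i) \neq \{\mathrm{NA}\} \text{ (false new error)} \\ \alpha_{FA} & \text{if } c \neq \mathrm{NA} \land T(m_i) = \{\mathrm{NA}\} \text{ (false anaphoric error)} \\ \alpha_{WL} & \text{if } c \neq \mathrm{NA} \land c \notin T(m_i) \text{ (wrong link error)} \end{cases}$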
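The heuristic cost and the max-margin loss for a single mention can be sketched in a few lines of Python. This is only an illustration of the formulas on the slides, not the authors' implementation: the function names, the candidate scores, and the cost values passed as `alphas` are all assumptions made for the example.

```python
NA = "NA"  # placeholder for "no antecedent": the mention starts a new entity

def heuristic_cost(c, true_antecedents, alpha_fn, alpha_fa, alpha_wl):
    """Cost of linking a mention to candidate c, given its true antecedents T(m_i)."""
    if c in true_antecedents:
        return 0.0        # correct decision
    if c == NA:
        return alpha_fn   # false new: an antecedent exists but NA was chosen
    if true_antecedents == {NA}:
        return alpha_fa   # false anaphoric: linked although the mention is new
    return alpha_wl       # wrong link: linked to an incorrect antecedent

def mention_loss(scores, true_antecedents, alphas):
    """Max-margin loss for one mention; scores maps each candidate c to s(c, m_i)."""
    # t_i is the highest-scoring true antecedent of the mention.
    t_score = max(scores[c] for c in true_antecedents)
    return max(
        heuristic_cost(c, true_antecedents, *alphas) * (1.0 + s - t_score)
        for c, s in scores.items()
    )

# Hypothetical scores for the mention "a chocolate"; the cost values are illustrative.
alphas = (0.8, 0.4, 1.0)  # (alpha_FN, alpha_FA, alpha_WL)
scores = {NA: 1.5, "a gift": 1.0, "I": 0.2}
loss = mention_loss(scores, {"a gift"}, alphas)
print(loss)  # about 1.2: the false-new candidate NA dominates the max
```

Summing `mention_loss` over all mentions in a document gives the full loss. The three values in `alphas` are exactly the hyperparameters that the grid search has to tune.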
  • 27. Proposed Reinforcement Learning methods • The model takes a sequence of actions, then receives a reward • REINFORCE algorithm • Reward rescaling • Example: actions $a_1$, $a_2$, $a_3$, $a_4$ taken over the sentence "I bought a gift which is a chocolate for my girlfriend."
  • 28. REINFORCE algorithm • Define a probability distribution over actions • Maximize the expected reward • Sample trajectories of actions to approximate the gradient (policy gradient)
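The sampling idea can be sketched for a single decision. The example below assumes a softmax over candidate scores as the action distribution and a fixed, made-up reward per action; it compares the sampled policy-gradient estimate with the exact gradient of the expected reward. All names and numbers are hypothetical, and a real model would sample whole action sequences and backpropagate through a network.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into a probability distribution over actions."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_gradient(scores, reward, num_samples, rng):
    """Monte Carlo estimate of d E[R] / d scores using sampled actions."""
    probs = softmax(scores)
    grad = [0.0] * len(scores)
    for _ in range(num_samples):
        a = rng.choices(range(len(scores)), weights=probs)[0]
        r = reward(a)
        for k in range(len(scores)):
            indicator = 1.0 if k == a else 0.0
            # For a softmax policy, d log p(a) / d s_k = indicator - p_k.
            grad[k] += r * (indicator - probs[k]) / num_samples
    return grad

def exact_gradient(scores, reward):
    """Closed form of the same gradient: p_k * (R_k - E[R])."""
    probs = softmax(scores)
    expected = sum(p * reward(a) for a, p in enumerate(probs))
    return [probs[k] * (reward(k) - expected) for k in range(len(scores))]

REWARDS = [0.01, 0.35, 1.0]  # hypothetical reward for each of three actions

def reward(a):
    return REWARDS[a]

scores = [0.0, 1.0, 2.0]
estimate = reinforce_gradient(scores, reward, 20000, random.Random(0))
exact = exact_gradient(scores, reward)
print(estimate)
print(exact)
```

With enough samples the two printed vectors should agree closely, which is the sense in which sampled trajectories approximate the gradient.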
  • 29. REINFORCE algorithm • Competitive with the heuristic loss
  • 30. REINFORCE algorithm • Competitive with the heuristic loss • But offers little improvement over it
  • 31. REINFORCE algorithm • Cons: • REINFORCE maximizes performance in expectation (it favors actions with better expected results) • But only the highest-scoring action needs to be correct (it is enough for the correct action to score best) • The model links the current mention to only a single antecedent, but is trained to assign high probability to all correct antecedents
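  • 32. Reward Rescaling • Incorporate the reward into the max-margin objective's slack rescaling • Costs in the max-margin objective: $\Delta_h(c, m_i) = \begin{cases} 0 & \text{if } c \in T(m_i) \text{ (}c\text{ and }m_i\text{ are coreferent)} \\ \alpha_{FN} & \text{if } c = \mathrm{NA} \land T(m_i) \neq \{\mathrm{NA}\} \text{ (false new error)} \\ \alpha_{FA} & \text{if } c \neq \mathrm{NA} \land T(m_i) = \{\mathrm{NA}\} \text{ (false anaphoric error)} \\ \alpha_{WL} & \text{if } c \neq \mathrm{NA} \land c \notin T(m_i) \text{ (wrong link error)} \end{cases}$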
  • 34. Reward Rescaling • Since actions are independent, we can change an action a to a different action a' and see what reward (the B³ coreference metric) we would have obtained instead
  • 35. Reward Rescaling • Since actions are independent, we can change an action a to a different action a' and see what reward (the B³ coreference metric) we would have obtained instead • Example on "I bought a chocolate for my girlfriend.": action a gives Reward = 1, Regret = 99
  • 36. Reward Rescaling • Since actions are independent, we can change an action a to a different action a' and see what reward we would have obtained instead • Example on "I bought a chocolate for my girlfriend.": action a' gives Reward = 35, Regret = 65
  • 37. Reward Rescaling • Since actions are independent, we can change an action a to a different action a' and see what reward we would have obtained instead • Example on "I bought a chocolate for my girlfriend.": action a'' gives Reward = 100, Regret = 0
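The three cases above can be reproduced with a small sketch: swap one action in the sequence for each alternative, score the resulting sequence, and subtract from the best achievable reward. The function name and the lookup-table reward are assumptions for illustration; in the slides the reward comes from the B³ metric on the full action sequence.

```python
def action_regrets(actions, i, candidates, reward_fn):
    """Regret of each candidate for position i, holding all other actions fixed."""
    rewards = {}
    for c in candidates:
        # Replace only the i-th action; this is valid because actions are independent.
        alternative = actions[:i] + [c] + actions[i + 1:]
        rewards[c] = reward_fn(alternative)
    best = max(rewards.values())
    return {c: best - r for c, r in rewards.items()}

# Toy reward table using the numbers from the slides (hypothetical stand-in for B3).
REWARD_TABLE = {"a": 1, "a'": 35, "a''": 100}

def toy_reward(sequence):
    return REWARD_TABLE[sequence[0]]

regrets = action_regrets(["a"], 0, ["a", "a'", "a''"], toy_reward)
print(regrets)  # {'a': 99, "a'": 65, "a''": 0}
```

The best action always has regret 0, so a correct decision contributes no cost, just as in the heuristic version.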
  • 38. Reward Rescaling • The cost is the regret of taking the action • It replaces the heuristic cost • Benefits from the max-margin loss while directly optimizing for coreference metrics • $\Delta_h(c, m_i) = \max_{a'} R(a_1, \ldots, a', \ldots, a_T) - R(a_1, \ldots, (c, m_i), \ldots, a_T)$, that is, the reward for the best action minus the reward for the current action
  • 40. Experiment • The B³ coreference metric is used as the reward for an action sequence • MUC has the flaw of treating all errors equally • $\mathrm{CEAF}_{\phi_4}$ is slow to compute
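For reference, B³ scores each mention by the overlap between its predicted cluster and its gold cluster, then averages over mentions. The sketch below assumes the predicted and gold clusterings cover the same set of mentions; the clusters themselves are a made-up annotation of the example sentence, not data from the paper.

```python
def b_cubed(predicted, gold):
    """Return (precision, recall, F1) of the B3 metric for two clusterings."""
    pred_of = {m: frozenset(c) for c in predicted for m in c}
    gold_of = {m: frozenset(c) for c in gold for m in c}
    mentions = list(gold_of)
    # Per-mention precision: share of the predicted cluster that is truly coreferent.
    precision = sum(
        len(pred_of[m] & gold_of[m]) / len(pred_of[m]) for m in mentions
    ) / len(mentions)
    # Per-mention recall: share of the gold cluster that was recovered.
    recall = sum(
        len(pred_of[m] & gold_of[m]) / len(gold_of[m]) for m in mentions
    ) / len(mentions)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1

# Hypothetical clusters for "I bought a gift which is a chocolate for my girlfriend."
gold = [{"I"}, {"a gift", "a chocolate"}, {"my girlfriend"}]
predicted = [{"I"}, {"a gift"}, {"a chocolate", "my girlfriend"}]
p, r, f1 = b_cubed(predicted, gold)
print(p, r, f1)  # 0.75 0.75 0.75
```

Because every mention contributes its own term, a single wrong link in a large cluster costs more than one in a small cluster, unlike a metric that treats all errors equally.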
  • 41. Experiment result • The reward-rescaling model makes more errors • However, the errors are less severe • ~0.7% lower cost on average • Compared to the heuristic loss, reward rescaling makes • More errors on • False anaphoric (anaphora) • False new • Fewer errors on • Wrong link
  • 42. Thank you • Questions and comments? • References • Kevin Clark and Christopher D. Manning, "Deep Reinforcement Learning for Mention-Ranking Coreference Models" • Stanford CS224n Lecture 15: Coreference Resolution, https://www.youtube.com/watch?v=rpwEWLaueRk • https://github.com/clarkkev/deep-coref