Personal Project: Using reinforcement learning in an NLP task whose goal is to predict the missing pronouns in sentences, in order to improve dialogue systems.
1. RL in Zero-Pronoun Resolution
Olivia Zhi
September 28, 2018
Insight AI Fellow
2. Motivation
Human: Hey Siri, I want to buy some shoes.
Siri: OK, here’s what I found: …
Human: In white please.
3. Motivation
Human: Hey Siri, I want to buy some shoes.
Siri: OK, here’s what I found: …
Human: Those in white please.
4. Concepts
• Zero: In linguistics, a zero (denoted by "∅") is a segment which is not pronounced or written.
• Zero-Pronoun: A pronoun that is not written out in a sentence.
Zero pronouns occur frequently in languages such as Chinese and Japanese for coherence, and sometimes in English as well.
5. Data
• Resource: Chinese portion of the OntoNotes 5.0 dataset
[Chart: # of annotated ZPs]
• E.g. zero pronoun (∅)
- Chinese: 我 觉得 ∅ 还可以。
(- English: I think ∅ OK.)
• E.g. antecedents
- Chinese: 餐厅, 食物, 环境, 它, 那里, 那个
(- English: restaurant, food, environment, it, there, that)
6. Goal
• What: Predicting which antecedent (a previously occurring noun phrase) a zero pronoun refers to.
• Why: Helping machines understand text better in translation and dialogue tasks.
• Use Case: Intelligent Personal Assistant / Personal Digital Assistant (PDA)
• Techniques: Reinforcement Learning
7. Why RL?
• Traditional deep learning models make coreference decisions locally:
they consider only the relationship between the zero pronoun and a single
candidate antecedent at a time, overlooking the impact of each decision on
future ones. RL addresses this by including the zero pronoun's previously
predicted antecedents in the current state.
• Reinforcement Learning is flexible: the model can be adapted to different aims through the reward.
• Reinforcement Learning is state of the art and has already shown impact on
such tasks.
8. Models/Algorithms
Existing model (NP: candidate antecedent)
• State: the ZP, handcrafted features, candidate antecedents, and the antecedents already predicted by the model
• Action: whether the candidate antecedent NPt is the antecedent the ZP refers to (1 if NP is the antecedent, 0 otherwise)
• Reward: F1 score of the selected antecedents
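The state/action/reward formulation above can be sketched as an episodic selection loop. This is a minimal illustration, not the existing model itself: the state representation is simplified (handcrafted ZP features are omitted), and the function names are hypothetical.

```python
# Hypothetical sketch of the episodic antecedent-selection loop: the agent
# visits each candidate NP in turn, takes a binary action (select / reject),
# and the reward is the F1 score of the selected antecedent set.

def f1_reward(selected, gold):
    """F1 score of the selected antecedent set against the gold set."""
    if not selected or not gold:
        return 0.0
    tp = len(set(selected) & set(gold))  # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def run_episode(candidates, gold, policy):
    """Visit each candidate NP; action 1 selects it as an antecedent."""
    selected = []
    for np_t in candidates:
        # The state carries the current candidate and the antecedents
        # selected so far (ZP features omitted for brevity).
        state = (np_t, tuple(selected))
        if policy(state) == 1:
            selected.append(np_t)
    return selected, f1_reward(selected, gold)

# Toy policy that selects every candidate:
selected, reward = run_episode(["餐厅", "它"], ["它"], policy=lambda s: 1)
```

Note how the state includes previously selected antecedents, which is exactly what lets the agent account for the impact of earlier decisions.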
9. Models/Algorithms
My current model
• Actor-Critic (AC) Network:
- The actor and critic branches share the weights of the first layer.
- Actor: learns a policy π(a|s) (to pick the anaphoric or non-anaphoric action) using feedback from the critic.
- Critic: learns a value function V(s), which serves as a baseline for how advantageous it is to be in a state.
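A minimal forward pass for such a two-headed network might look like the following. All layer sizes are illustrative assumptions; only the structure (one shared first layer, a policy head and a value head) follows the slide.

```python
import numpy as np

# Sketch of the actor-critic network: a shared first layer feeding a
# policy head (actor) and a value head (critic). Sizes are assumptions.
rng = np.random.default_rng(0)
STATE_DIM, HIDDEN, N_ACTIONS = 8, 16, 2  # actions: anaphoric / non-anaphoric

W_shared = rng.normal(size=(STATE_DIM, HIDDEN))  # shared first layer
W_actor = rng.normal(size=(HIDDEN, N_ACTIONS))   # policy head, for pi(a|s)
W_critic = rng.normal(size=(HIDDEN, 1))          # value head, for V(s)

def forward(state):
    h = np.tanh(state @ W_shared)        # shared representation
    logits = h @ W_actor
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax over the two actions
    value = float(h @ W_critic)          # scalar state value V(s)
    return probs, value

probs, value = forward(rng.normal(size=STATE_DIM))
```

During training, the critic's V(s) serves as the baseline: the actor is updated in the direction of the advantage, roughly reward + V(s') − V(s).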
10. Reward function
My current model
● For each zero-pronoun / candidate-antecedent pair:
- Large positive reward for correctly identifying the antecedent (TP)
- Large negative reward for misidentifying a non-antecedent as the antecedent (FP)
- Small negative reward for failing to identify the antecedent (FN)
- Small positive reward for correctly identifying that the candidate is not the antecedent (TN)
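The per-pair reward scheme above can be written as a small lookup. The magnitudes here are illustrative assumptions; only their signs and relative sizes (large TP/FP, small FN/TN) follow the slide.

```python
# Sketch of the per-pair reward scheme; magnitudes are assumptions.
REWARDS = {
    "TP": +2.0,  # large positive: correctly identified the antecedent
    "FP": -2.0,  # large negative: selected a non-antecedent
    "FN": -0.5,  # small negative: missed the true antecedent
    "TN": +0.5,  # small positive: correctly rejected a non-antecedent
}

def pair_reward(action, is_antecedent):
    """action: 1 = select candidate, 0 = reject; is_antecedent: gold label."""
    if action == 1:
        return REWARDS["TP"] if is_antecedent else REWARDS["FP"]
    return REWARDS["FN"] if is_antecedent else REWARDS["TN"]
```

Making the FP penalty large relative to FN discourages the agent from greedily selecting every candidate, while the small TN reward still encourages confident rejections.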
12. Problem & Solution
• Problem: Preventing overfitting
- The AC model reaches its best result at the 15th episode vs. the 30th episode for the previous model.
- However, performance (precision, recall, and F1 score) decreases after the 15th episode.
• Solution: Further split the training set into training and validation sets, select the best model based on the validation set ("save the best model here!"), and report performance on the test set.
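The model-selection procedure above can be sketched as a simple loop that checkpoints the best validation score. The function names and signatures are hypothetical; `train_episode` stands in for one RL training episode and `validate` for an F1 evaluation on the held-out validation set.

```python
# Sketch of validation-based model selection: train for a fixed number of
# episodes, keep the weights from the episode with the best validation F1,
# then evaluate only that checkpoint on the test set.

def select_best_model(train_episode, validate, n_episodes=30):
    best_f1, best_episode, best_weights = -1.0, -1, None
    for episode in range(n_episodes):
        weights = train_episode(episode)  # one RL training episode
        f1 = validate(weights)            # F1 on the validation set
        if f1 > best_f1:                  # save the best model here!
            best_f1, best_episode, best_weights = f1, episode, weights
    return best_episode, best_weights
```

Because the checkpoint is chosen on the validation split rather than the test set, the final test score remains an unbiased estimate of generalization.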
13. About Me
Congcong (Olivia) Zhi
Data Science Program
"At Waterloo, we’re all nerds!"
Skills
● Python
● R
● PySpark
● SQL
● Git
● Reinforcement Learning
● Deep Learning
Interests
● NLP (dialogue systems, text mining, information retrieval)
● Reinforcement Learning
● Other Deep Learning fields
https://www.linkedin.com/in/congcongoliviazhi/