Abductive Commonsense
Reasoning
San Kim
2020.07.03
Chandra Bhagavatula(1), Ronan Le Bras(1), Chaitanya Malaviya(1), Keisuke Sakaguchi(1),
Ari Holtzman(1), Hannah Rashkin(1), Doug Downey(1), Scott Wen-tau Yih(2), Yejin Choi(1,3)
1. Allen Institute for AI
2. Facebook AI
3. Paul G. Allen School of Computer Science & Engineering
Contributions
1. Abductive Commonsense Reasoning Challenges
1. Abductive NLI (𝛼𝑁𝐿𝐼)
2. Abductive NLG (𝛼𝑁𝐿𝐺)
2. ART: Dataset for Abductive Reasoning in Narrative Text
1. 20K Narrative Contexts
2. 200K Hypotheses
3. Experiments & Key Insights
Abductive Commonsense Reasoning
αNLI is formulated as a multiple-choice problem consisting of a pair of
observations as context and a pair of hypothesis choices.
O1: the observation at time t1.
O2: the observation at time t2 > t1.
h+: a plausible hypothesis that explains the two observations O1 and O2.
h−: an implausible (or less plausible) hypothesis for observations O1 and O2.
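An αNLI instance can be sketched as a tiny multiple-choice record. The field names below are illustrative, not the dataset's actual schema, and the scorer is a deliberately naive stand-in for a trained model:

```python
# Hypothetical αNLI-style instance (illustrative field layout, not ART's schema).
instance = {
    "obs1": "Carl went to the store desperately searching for flour tortillas for a recipe.",
    "obs2": "Carl left the store very frustrated.",
    "hyp1": "The cashier was rude.",
    "hyp2": "The store had corn tortillas, but not flour ones.",
}

def choose(instance, score):
    """Pick the hypothesis with the higher plausibility score (1 or 2)."""
    s1 = score(instance["obs1"], instance["hyp1"], instance["obs2"])
    s2 = score(instance["obs1"], instance["hyp2"], instance["obs2"])
    return 1 if s1 > s2 else 2

def overlap_score(o1, h, o2):
    """Toy scorer: count hypothesis words shared with each observation."""
    hw = set(h.lower().split())
    return len(hw & set(o1.lower().split())) + len(hw & set(o2.lower().split()))

print(choose(instance, overlap_score))
```

A real αNLI system replaces `overlap_score` with a learned plausibility model; the selection rule stays the same argmax over the two hypotheses.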
Abductive Natural Language Generation
αNLG is the task of generating a valid hypothesis h+ given the two
observations O1 and O2. Formally, the task is to maximize P(h+ | O1, O2).
Abductive Natural Language Inference
Abductive Reasoning
Adopted from [1]
A Probabilistic Framework for 𝛼𝑁𝐿𝐼
The αNLI task is to select the hypothesis h* that is most probable given the
observations:
h* = arg max_{h_i} P(H = h_i | O1, O2)
Rewriting the objective using Bayes' rule conditioned on O1, we have:
P(h_i | O1, O2) ∝ P(O2 | h_i, O1) · P(h_i | O1)
Illustration of the graphical models described in the probabilistic framework.
Adopted from [1]
A Probabilistic Framework for 𝛼𝑁𝐿𝐼
(a) Hypothesis Only
• Strongest assumption: the hypothesis is entirely independent of both observations, i.e.
H ⊥ (O1, O2).
• Maximize the marginal P(H).
(b, c) First (or Second) Observation Only
• Weaker assumption: the hypothesis depends on only one of the observations, either the
first O1 or the second O2.
• Maximize the conditional probability P(H | O1) or P(O2 | H).
Adopted from [1]
A Probabilistic Framework for 𝛼𝑁𝐿𝐼
(d) Linear Chain
• Uses both observations, but considers each observation's influence on the hypothesis
independently.
• The model assumes that the three variables ⟨O1, H, O2⟩ form a linear Markov chain:
the second observation is conditionally independent of the first given the hypothesis
(i.e. O1 ⊥ O2 | H).
• h* = arg max_{h_i} P(O2 | h_i) · P(h_i | O1)
(e) Fully Connected
• Jointly models all three random variables.
• Combines information across both observations to choose the correct hypothesis.
• h* = arg max_{h_i} P(O2 | h_i, O1) · P(h_i | O1)
Adopted from [1]
Difference between the Linear Chain and the Fully Connected model
𝑂1: Carl went to the store desperately searching for flour tortillas for a recipe.
𝑂2: Carl left the store very frustrated.
ℎ1: The cashier was rude. ℎ2: The store had corn tortillas, but not flour ones.
Linear Chain: considered in isolation, both P(h | O1) and P(O2 | h) are high for h1 and
for h2, so both hypotheses look plausible.
Fully Connected: scoring P(O2 | h, O1) conditions on Carl's search for flour tortillas,
under which h2 explains his frustration far better than h1, so the model correctly
prefers h2 over h1.
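The contrast between the two factorizations can be made concrete with hand-assigned toy probabilities (invented for illustration, not taken from the paper's trained models):

```python
# Toy, hand-assigned probabilities for the flour-tortilla example.
# h1: "The cashier was rude."
# h2: "The store had corn tortillas, but not flour ones."

p_h_given_o1 = {"h1": 0.5, "h2": 0.5}      # P(h | O1): both plausible after O1
p_o2_given_h = {"h1": 0.7, "h2": 0.7}      # P(O2 | h): each alone explains frustration
p_o2_given_h_o1 = {"h1": 0.3, "h2": 0.9}   # P(O2 | h, O1): only h2 fits O1's context

def linear_chain(h):
    # scores the two influences independently: P(O2 | h) * P(h | O1)
    return p_o2_given_h[h] * p_h_given_o1[h]

def fully_connected(h):
    # conditions the second observation on both h and O1: P(O2 | h, O1) * P(h | O1)
    return p_o2_given_h_o1[h] * p_h_given_o1[h]

# the linear chain cannot separate the two hypotheses; the fully connected model can
print(linear_chain("h1"), linear_chain("h2"))        # equal scores
print(fully_connected("h1"), fully_connected("h2"))  # h2 wins
```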
αNLG model
Given h+ = w_1^h ⋯ w_l^h, O1 = w_1^{o1} ⋯ w_m^{o1}, and O2 = w_1^{o2} ⋯ w_n^{o2}
as sequences of tokens, the αNLG task can be modeled as
P(h+ | O1, O2) = ∏_i P(w_i^h | w_{<i}^h, w_1^{o1} ⋯ w_m^{o1}, w_1^{o2} ⋯ w_n^{o2})
Optionally, the model can also be conditioned on background knowledge 𝒦, giving the
training loss
ℒ = − Σ_{i=1}^{N} log P(w_i^h | w_{<i}^h, w_1^{o1} ⋯ w_m^{o1}, w_1^{o2} ⋯ w_n^{o2}, 𝒦)
Adopted from [1]
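Under this formulation the training objective is ordinary token-level negative log-likelihood with teacher forcing. A minimal sketch, assuming the per-token probabilities have already been produced by some model:

```python
import math

# Toy sketch of the αNLG objective: negative log-likelihood of the gold
# hypothesis tokens under teacher forcing. The probabilities below are invented;
# a real model produces P(w_i^h | w_<i^h, O1, O2, K) at each decoding step.

def nlg_loss(token_probs):
    """Sum of -log P over the hypothesis tokens."""
    return -sum(math.log(p) for p in token_probs)

probs = [0.9, 0.6, 0.8, 0.7]   # a 4-token hypothesis, one probability per token
print(round(nlg_loss(probs), 3))  # → 1.196
```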
ConceptNet [5]
• ConceptNet is a multilingual knowledge base constructed of:
• Nodes (a word or a short phrase)
• Edges (relation and weight)
• Example: Birds are not capable of … {bark, chew their food, breathe water}
• Related work: WordNet, Microsoft Concept Graph, Google Knowledge Graph
Adopted from [6]
ATOMIC [2]
1. xIntent: Why does X cause an event?
2. xNeed: What does X need to do before the
event?
3. xAttr: How would X be described?
4. xEffect: What effects does the event have on X?
5. xWant: What would X likely want to do after
the event?
6. xReaction: How does X feel after the event?
7. oReact: How do others feel after the event?
8. oWant: What would others likely want to do
after the event?
9. oEffect: What effects does the event have on
others?
If-Then Relation Types
• If-Event-Then-Mental-State
• If-Event-Then-Event
• If-Event-Then-Persona
Inference dimension
Adopted from [2]
ATOMIC
• If-Event-Then-Mental-State: three relations relating to the mental pre- and post-conditions of an event.
X compliments Y X wants to be nice
X feels good
Y feels flattered
xIntent: likely intents of the event
xReaction: likely (emotional) reactions of the event’s subject
oReaction: likely (emotional) reactions of others
Adopted from [2]
ATOMIC
• If-Event-Then-Event: five relations relating to events that constitute probable pre- and post-conditions of
a given event.
Event: X calls the police
xNeed: X needs to dial 911
xEffect: X starts to panic
xWant: X wants to explain everything to the police
oWant: others want to dispatch some officers
Event: X pays Y a compliment → oEffect: Y will smile
• If-Event-Then-Persona: a stative relation that describes how the subject of an event is described or
perceived.
Event: X pays Y a compliment → xAttr: X is flattering; X is caring
ATOMIC
Adopted from [2]
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph
Construction [3]
Adopted from [3]
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph
Construction
Loss function
ℒ = − Σ_{t=|s|+|r|}^{|s|+|r|+|o|} log P(x_t | x_{<t})
Input Template
Notation
• Natural language tuples in {s: subject, r: relation, o: object} format
• Subject tokens: X^s = {x_0^s, ⋯, x_{|s|}^s}
• Relation tokens: X^r = {x_0^r, ⋯, x_{|r|}^r}
• Object tokens: X^o = {x_0^o, ⋯, x_{|o|}^o}
Adopted from [3]
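A minimal sketch of this input layout and loss masking, with invented example tokens and toy probabilities (COMET itself trains a GPT-style Transformer over such sequences):

```python
import math

# Sketch of COMET's input template: the model reads
# [subject tokens][relation tokens][object tokens], and the loss is taken only
# over the object positions (t from |s|+|r| to |s|+|r|+|o|, 0-indexed here).

s = ["X", "pays", "Y", "a", "compliment"]
r = ["xAttr"]
o = ["X", "is", "flattering"]
seq = s + r + o  # the concatenated input sequence
loss_positions = range(len(s) + len(r), len(s) + len(r) + len(o))

# toy per-position probabilities the model assigns to the correct next token
p = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
loss = -sum(math.log(p[t]) for t in loss_positions)
print(round(loss, 4))  # → 0.3798
```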
Adversarial Filtering Algorithm
• A significant challenge in creating datasets is avoiding annotation artifacts.
• Annotation artifacts: unintentional patterns in the data that leak information about the
target label.
In each iteration i:
• M_i: the adversarial model
• 𝒯_i: a random subset for training
• 𝒱_i: the validation set
• h_k+, h_k−: the plausible and implausible hypotheses for an instance k
• δ = Δ_{M_i}(h_k+, h_k−): the difference in the model's evaluation of h_k+ and h_k−
With probability t_i, update each instance k that M_i gets correct with a pair
(h+, h−) ∈ ℋ_k+ × ℋ_k− of hypotheses that reduces the value of δ, where ℋ_k+
(resp. ℋ_k−) is the pool of plausible (resp. implausible) hypotheses for instance k.
Adopted from [1]
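The loop above can be sketched schematically. The adversary here is a toy scorer that exploits a surface cue, standing in for the BERT-based adversary the paper actually trains, and the data layout is invented for illustration:

```python
import random

# Schematic sketch of the adversarial-filtering loop; the adversary, data, and
# scoring are toy stand-ins, not the paper's implementation.

def adversarial_filter(instances, train_adversary, iterations=3, t=1.0, seed=0):
    """instances: dicts with current 'h_plus'/'h_minus' and candidate pools."""
    rng = random.Random(seed)
    for _ in range(iterations):
        score = train_adversary(instances)      # adversary M_i fit on a split
        for inst in instances:
            delta = score(inst["h_plus"]) - score(inst["h_minus"])
            if delta > 0 and rng.random() < t:  # M_i solves instance k
                # swap in the (h+, h-) pair from the pools that minimizes delta
                pairs = [(hp, hm) for hp in inst["pool_plus"]
                                  for hm in inst["pool_minus"]]
                inst["h_plus"], inst["h_minus"] = min(
                    pairs, key=lambda pr: score(pr[0]) - score(pr[1]))
    return instances

# toy adversary: exploits a giveaway surface cue ("!") instead of real semantics
toy_adversary = lambda data: (lambda h: h.count("!"))
data = [{"h_plus": "good!", "h_minus": "bad",
         "pool_plus": ["good!", "good"], "pool_minus": ["bad", "bad!"]}]
out = adversarial_filter(data, toy_adversary)
print(out[0]["h_plus"], out[0]["h_minus"])
```

After filtering, the surviving pair is exactly the one the cue-based adversary can no longer separate, which is the point of the procedure.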
Performance of baselines (𝛼𝑁𝐿𝐼)
Adopted from [1]
Performance of baselines (𝛼𝑁𝐿𝐺)
Adopted from [1]
BERT-Score
Evaluating text generation with BERT: a learned similarity metric for generated text, analogous to Inception Score, Fréchet Inception Distance, and Kernel Inception Distance in image generation.
Adopted from [4]
BERT-Score
R_BERT = (1/|x|) Σ_{x_i ∈ x} max_{x̂_j ∈ x̂} x_i⊺ x̂_j
P_BERT = (1/|x̂|) Σ_{x̂_j ∈ x̂} max_{x_i ∈ x} x_i⊺ x̂_j
F_BERT = 2 · P_BERT · R_BERT / (P_BERT + R_BERT)
Importance weighting using inverse document frequency (idf):
idf(w) = − log (1/M) Σ_{i=1}^{M} 𝕀[w ∈ x^(i)], where 𝕀[⋅] is an indicator function.
R_BERT = Σ_{x_i ∈ x} idf(x_i) max_{x̂_j ∈ x̂} x_i⊺ x̂_j / Σ_{x_i ∈ x} idf(x_i)
Baseline rescaling: compute a baseline b from Common Crawl monolingual datasets.
R̂_BERT = (R_BERT − b) / (1 − b)
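The greedy-matching recall and precision above can be sketched over toy pre-normalized embeddings (pure-Python stand-ins for contextual BERT vectors; idf weighting and baseline rescaling omitted):

```python
# Minimal sketch of BERTScore's greedy matching over toy, unit-norm token
# embeddings. Real BERTScore uses contextual BERT vectors, not these stand-ins.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bert_score(ref, cand):
    """ref, cand: lists of unit-norm embedding vectors. Returns (R, P, F1)."""
    # recall: each reference token greedily matched to its best candidate token
    r = sum(max(dot(x, y) for y in cand) for x in ref) / len(ref)
    # precision: each candidate token greedily matched to its best reference token
    p = sum(max(dot(x, y) for x in ref) for y in cand) / len(cand)
    f = 2 * p * r / (p + r)
    return r, p, f

ref = [[1.0, 0.0], [0.0, 1.0]]   # two reference tokens
cand = [[1.0, 0.0]]              # candidate covers only the first one
r, p, f = bert_score(ref, cand)
print(round(r, 3), round(p, 3), round(f, 3))  # → 0.5 1.0 0.667
```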
BERT-Score
Adopted from [4]
Abductive Commonsense Reasoning
1. O1 ↛ h−: h− is unlikely to follow the first observation O1.
2. h− ↛ O2: h− is plausible after O1 but unlikely to precede the second observation O2.
3. Plausible: (O1, h−, O2) is a coherent narrative and forms a plausible alternative,
but it is less plausible than (O1, h+, O2).
Adopted from [1]
BERT performance
Abductive Commonsense Reasoning
Adopted from [1]
References
[1] Chandra Bhagavatula et al., Abductive Commonsense Reasoning, ICLR 2020.
[2] Maarten Sap et al., ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning, arXiv:1811.00146.
[3] Antoine Bosselut et al., COMET: Commonsense Transformers for Automatic Knowledge Graph Construction, ACL 2019.
[4] Tianyi Zhang et al., BERTScore: Evaluating Text Generation with BERT, ICLR 2020.
[5] Robyn Speer et al., ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, arXiv:1612.03975.
[6] MIT Media Lab, ConceptNet, https://conceptnet.io/
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 

Abductive Commonsense Reasoning

  • 1. Abductive Commonsense Reasoning San Kim 2020.07.03 Chandra Bhagavatula(1), Ronan Le Bras(1), Chaitanya Malaviya(1), Keisuke Sakaguchi(1), Ari Holtzman(1), Hannah Rashkin(1), Doug Downey(1), Scott Wen-tau Yih(2), Yejin Choi(1,3) 1. Allen Institute for AI 2. Facebook AI 3. Paul G. Allen School of Computer Science & Engineering
  • 2. Contributions 1. Abductive Commonsense Reasoning Challenges 1. Abductive NLI (αNLI) 2. Abductive NLG (αNLG) 2. ART: Dataset for Abductive Reasoning in Narrative Text 1. 20K Narrative Contexts 2. 200K Hypotheses 3. Experiments & Key Insights
  • 3. Abductive Commonsense Reasoning. Abductive Natural Language Inference: αNLI is formulated as a multiple-choice problem consisting of a pair of observations as context and a pair of hypothesis choices. O1: the observation at time t1. O2: the observation at time t2 > t1. h+: a plausible hypothesis that explains the two observations O1 and O2. h-: an implausible (or less plausible) hypothesis for observations O1 and O2. Abductive Natural Language Generation: αNLG is the task of generating a valid hypothesis h+ given the two observations O1 and O2. Formally, the task is to maximize P(h+ | O1, O2).
  • 5. A Probabilistic Framework for αNLI. The αNLI task is to select the hypothesis h* that is most probable given the observations: h* = argmax_{h_i} P(H = h_i | O1, O2). Rewriting the objective using Bayes' rule conditioned on O1, we have: P(h_i | O1, O2) ∝ P(O2 | h_i, O1) P(h_i | O1). Illustration of the graphical models described in the probabilistic framework. Adapted from [1]
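The selection rule above can be illustrated with toy numbers; this is a minimal sketch in which the probability values and hypothesis labels are invented for illustration (no trained model is involved):

```python
# Toy illustration of the decomposition
#   P(h_i | O1, O2) ∝ P(O2 | h_i, O1) * P(h_i | O1).
# All probability values and hypothesis names below are invented.

def score(p_h_given_o1, p_o2_given_h_o1):
    """Unnormalized posterior of a hypothesis under the decomposition."""
    return p_o2_given_h_o1 * p_h_given_o1

hypotheses = {
    # h: (P(h | O1), P(O2 | h, O1))
    "h_plus":  (0.30, 0.80),  # plausible after O1 and explains O2 well
    "h_minus": (0.40, 0.05),  # plausible after O1 but does not lead to O2
}

best = max(hypotheses, key=lambda h: score(*hypotheses[h]))
print(best)  # h_plus wins: 0.30*0.80 = 0.24 beats 0.40*0.05 = 0.02
```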
  • 6. A Probabilistic Framework for αNLI. (a) Hypothesis Only • Strong assumption: the hypothesis is entirely independent of both observations, i.e. H ⊥ (O1, O2). • Maximizes the marginal P(H). (b, c) First (or Second) Observation Only • Weaker assumption: the hypothesis depends on only one of the observations, the first O1 or the second O2. • Maximizes the conditional probability P(H | O1) or P(O2 | H). Adapted from [1]
  • 7. A Probabilistic Framework for αNLI. (d) Linear Chain • Uses both observations, but considers each observation's influence on the hypothesis independently. • The model assumes that the three variables <O1, H, O2> form a linear Markov chain, so the second observation is conditionally independent of the first given the hypothesis (i.e. O1 ⊥ O2 | H). • h* = argmax_{h_i} P(O2 | h_i) P(h_i | O1), where O1 ⊥ O2 | H. (e) Fully Connected • Jointly models all three random variables and can combine information across both observations to choose the correct hypothesis. • h* = argmax_{h_i} P(O2 | h_i, O1) P(h_i | O1). Adapted from [1]
  • 8. Difference between the Linear Chain and the Fully Connected model. O1: Carl went to the store desperately searching for flour tortillas for a recipe. O2: Carl left the store very frustrated. h1: The cashier was rude. h2: The store had corn tortillas, but not flour ones. Under the linear chain model, each link is judged on its own: P(h1 | O1) and P(O2 | h1) are both plausible, and so are P(h2 | O1) and P(O2 | h2), so the model has no clear preference. Under the fully connected model, P(O2 | h, O1) is scored with both observations in view: given that Carl specifically wanted flour tortillas (O1), h2 explains his frustration (O2) better than h1, so h2 is ranked higher.
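The contrast can be sketched numerically; the probabilities below are invented for illustration and only encode the intuition of the Carl example:

```python
# Toy comparison of the two scoring rules for the Carl example above.
# h1: "The cashier was rude."
# h2: "The store had corn tortillas, but not flour ones."
# All probabilities are invented for illustration.

# Linear chain: score(h) = P(O2 | h) * P(h | O1); O2 is judged without O1.
linear_chain = {
    "h1": 0.5 * 0.5,  # a rude cashier plausibly follows O1 and precedes O2
    "h2": 0.5 * 0.5,  # so does the missing-flour hypothesis: a tie
}

# Fully connected: score(h) = P(O2 | h, O1) * P(h | O1); conditioning O2 on
# O1 as well (Carl wanted *flour* tortillas) lets h2 explain his frustration
# better than h1.
fully_connected = {
    "h1": 0.4 * 0.5,
    "h2": 0.9 * 0.5,
}

print(max(fully_connected, key=fully_connected.get))  # h2
```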
  • 9. αNLG model. Given h+ = (w_1^h, …, w_l^h), O1 = (w_1^{o1}, …, w_m^{o1}), and O2 = (w_1^{o2}, …, w_n^{o2}) as sequences of tokens, the αNLG task can be modeled as P(h+ | O1, O2) = Π_i P(w_i^h | w_{<i}^h, w_1^{o1} … w_m^{o1}, w_1^{o2} … w_n^{o2}). Optionally, the model can also be conditioned on background knowledge K. The training loss is L = − Σ_{i=1}^{N} log P(w_i^h | w_{<i}^h, w_1^{o1} … w_m^{o1}, w_1^{o2} … w_n^{o2}, K). Adapted from [1]
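The loss reduces to summing negative log-probabilities of the gold hypothesis tokens; in this minimal sketch, `token_probs` stands in for the per-token conditionals a trained left-to-right language model would produce, and the values are invented:

```python
import math

# Sketch of the alpha-NLG training loss
#   L = -sum_i log P(w_i^h | w_<i^h, O1, O2, K).
# `token_probs` stands in for the conditional probabilities a trained
# conditional language model would assign to the gold hypothesis tokens;
# the values below are invented.

def nlg_loss(token_probs):
    """Negative log-likelihood of the gold hypothesis tokens."""
    return -sum(math.log(p) for p in token_probs)

probs = [0.4, 0.7, 0.9, 0.6]  # one conditional probability per gold token
print(round(nlg_loss(probs), 4))
```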
  • 10. ConceptNet [5] • ConceptNet is a multilingual knowledge base • It is constructed of: • Nodes (a word or a short phrase) • Edges (relation and weight) • Example: "bird" — not capable of → {bark, chew their food, breathe water} • Related work: WordNet, Microsoft Concept Net, Google knowledge base. Adapted from [6]
  • 11. ATOMIC [2] If-Then relation types: • If-Event-Then-Mental-State • If-Event-Then-Event • If-Event-Then-Persona. Inference dimensions: 1. xIntent: why does X cause an event? 2. xNeed: what does X need to do before the event? 3. xAttr: how would X be described? 4. xEffect: what effects does the event have on X? 5. xWant: what would X likely want to do after the event? 6. xReaction: how does X feel after the event? 7. oReact: how do others feel after the event? 8. oWant: what would others likely want to do after the event? 9. oEffect: what effects does the event have on others? Adapted from [2]
  • 12. ATOMIC • If-Event-Then-Mental-State: three relations relating to the mental pre- and post-conditions of an event. Example: "X compliments Y" → xIntent (likely intent of the event): X wants to be nice; xReaction (likely emotional reaction of the event's subject): X feels good; oReaction (likely emotional reaction of others): Y feels flattered. Adapted from [2]
  • 13. ATOMIC • If-Event-Then-Event: five relations (xNeed, xEffect, xWant, oWant, oEffect) relating to events that constitute probable pre- and post-conditions of a given event. Example: "X calls the police" → xNeed: X needs to dial 911; xEffect: X starts to panic; xWant: X wants to explain everything to the police; oWant: others want to dispatch some officers. "X pays Y a compliment" → oEffect: Y will smile. • If-Event-Then-Persona: a stative relation that describes how the subject of an event is described or perceived. Example: "X pays Y a compliment" → xAttr: X is flattering; xAttr: X is caring.
  • 15. COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction [3] Adapted from [3]
  • 16. COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction Adapted from [3]
  • 17. COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction. Input template notation • Natural language tuples {s: subject, r: relation, o: object} • Subject tokens: X^s = (x_0^s, …, x_{|s|}^s) • Relation tokens: X^r = (x_0^r, …, x_{|r|}^r) • Object tokens: X^o = (x_0^o, …, x_{|o|}^o). Loss function: L = − Σ_{t=|s|+|r|}^{|s|+|r|+|o|} log P(x_t | x_{<t}), i.e. the loss is taken only over the object tokens. Adapted from [3]
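The input template can be sketched as a flat token sequence with the loss restricted to object positions; the whitespace tokenization and the example tuple below are simplifications for illustration (COMET itself operates on subword tokens):

```python
# Sketch of the COMET input layout: a tuple (s, r, o) is flattened into one
# token sequence, and the loss runs only over the object-token positions
# (t from |s|+|r| onward). Whitespace tokenization and the example tuple
# are illustrative simplifications.

s = "X pays Y a compliment".split()  # subject tokens X^s
r = ["xReact"]                       # relation token X^r
o = "Y feels flattered".split()      # object tokens X^o

sequence = s + r + o
# 0-indexed positions the loss is computed over: the object tokens only
loss_positions = list(range(len(s) + len(r), len(sequence)))

print(sequence)
print(loss_positions)  # [6, 7, 8]
```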
  • 18. COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction
  • 19. Adversarial Filtering Algorithm • A significant challenge in creating datasets is avoiding annotation artifacts. • Annotation artifacts: unintentional patterns in the data that leak information about the target label. In each iteration i: • M_i: adversarial model • T_i: a random subset for training • V_i: validation set • h_k^+, h_k^-: plausible and implausible hypotheses for an instance k • δ = Δ_{M_i}(h_k^+, h_k^-): the difference in the model's evaluation of h_k^+ and h_k^-. With probability t_i, update each instance k that M_i gets correct with a pair (h+, h-) ∈ H_k^+ × H_k^- of hypotheses that reduces the value of δ, where H_k^+ (resp. H_k^-) is the pool of plausible (resp. implausible) hypotheses for instance k. Adapted from [1]
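One iteration of this update can be sketched as follows; the toy instances and the length-based margin function are invented stand-ins for the adversary M_i and its score gap δ:

```python
import random

# Schematic sketch of one Adversarial Filtering update, following the
# description above. `margin` stands in for delta = Delta_{M_i}(h_k^+, h_k^-),
# the adversary's score gap; the toy data and length-based margin below are
# invented purely for illustration.

def filter_step(instances, margin, t):
    """With probability t, re-pair each instance the adversary gets right
    (positive margin) using the pair from the pools that minimizes the margin."""
    for inst in instances:
        if margin(inst["h+"], inst["h-"]) > 0 and random.random() < t:
            inst["h+"], inst["h-"] = min(
                ((hp, hn) for hp in inst["H+"] for hn in inst["H-"]),
                key=lambda pair: margin(*pair),
            )
    return instances

toy = [{"h+": "aaaa", "h-": "b",
        "H+": ["aaaa", "aa"], "H-": ["b", "bbb"]}]
out = filter_step(toy, margin=lambda hp, hn: len(hp) - len(hn), t=1.0)
print(out[0]["h+"], out[0]["h-"])  # the hardest pair from the pools
```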
  • 20. Performance of baselines (αNLI) Adapted from [1]
  • 21. Performance of baselines (αNLI) Adapted from [1]
  • 22. Performance of baselines (αNLG) Adapted from [1]
  • 23. BERT-Score: Evaluating Text Generation with BERT (analogous to Inception Score, Fréchet Inception Distance, and Kernel Inception Distance in image generation). Adapted from [4]
  • 24. BERT-Score. For reference tokens x and candidate tokens x̂ with contextual embeddings x_i and x̂_j: R_BERT = (1/|x|) Σ_{x_i ∈ x} max_{x̂_j ∈ x̂} x_i^T x̂_j; P_BERT = (1/|x̂|) Σ_{x̂_j ∈ x̂} max_{x_i ∈ x} x_i^T x̂_j; F_BERT = 2 · P_BERT · R_BERT / (P_BERT + R_BERT). Importance weighting using inverse document frequency (idf): idf(w) = −log (1/M) Σ_{i=1}^{M} I[w ∈ x^(i)], where I[·] is an indicator function, giving R_BERT = Σ_{x_i ∈ x} idf(x_i) max_{x̂_j ∈ x̂} x_i^T x̂_j / Σ_{x_i ∈ x} idf(x_i). Baseline rescaling: compute a baseline b using Common Crawl monolingual datasets and rescale R̂_BERT = (R_BERT − b) / (1 − b).
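The greedy-matching scores can be sketched with random vectors standing in for BERT token embeddings, so the numbers are meaningless and only the mechanics are shown:

```python
import numpy as np

# Sketch of the BERTScore matching above, with random vectors in place of
# BERT token embeddings (scores are meaningless; only the mechanics of the
# greedy matching are shown).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))       # 5 reference-token embeddings
x_hat = rng.normal(size=(4, 8))   # 4 candidate-token embeddings
x /= np.linalg.norm(x, axis=1, keepdims=True)          # unit-normalize so the
x_hat /= np.linalg.norm(x_hat, axis=1, keepdims=True)  # dot product is cosine

sim = x @ x_hat.T                   # pairwise similarities x_i^T x_hat_j
recall = sim.max(axis=1).mean()     # greedily match each reference token
precision = sim.max(axis=0).mean()  # greedily match each candidate token
f1 = 2 * precision * recall / (precision + recall)
print(float(recall), float(precision), float(f1))
```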
  • 26. Abductive Commonsense Reasoning. Categories of implausible hypotheses h-: 1. O1 ↛ h-: h- is unlikely to follow the first observation O1. 2. h- ↛ O2: h- is plausible after O1 but unlikely to precede the second observation O2. 3. Plausible: (O1, h-, O2) is a coherent narrative and forms a plausible alternative, but is less plausible than (O1, h+, O2). BERT performance. Adapted from [1]
  • 29. References [1] Chandra Bhagavatula et al., Abductive Commonsense Reasoning, ICLR 2020. [2] Maarten Sap et al., ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning, arXiv:1811.00146. [3] Antoine Bosselut et al., COMET: Commonsense Transformers for Automatic Knowledge Graph Construction, ACL 2019. [4] Tianyi Zhang et al., BERTScore: Evaluating Text Generation with BERT, ICLR 2020. [5] Robyn Speer et al., ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, arXiv:1612.03975. [6] MIT Media Lab, https://conceptnet.io/