Tassilo Klein, Senior Research Scientist, SAP AI Research
September 7th, 2020
Towards Unsupervised Commonsense Reasoning
GPT-2 Media Coverage
Source: https://github.com/minimaxir/gpt-3-experiments/, https://twitter.com/yoavgo, https://lacker.io/
Q: Who were the founders of Microsoft?
A: Bill Gates and Paul Allen were the founders of Microsoft.
Q: What are cubs and cowboys?
A: Cubs are baby bears. Cowboys are people who work on ranches.
Q: Which is heavier, a mouse or an elephant?
A: An elephant is heavier than a mouse.
Q: Are there any animals with three legs?
A: No, there are no animals with three legs.
Q: What do dolphins and eagles have in common?
A: Both are birds.
Q: What do the words ‘dolphins’ and ‘eagles’ have in common?
A: Both words start with the letter ‘e’.
GPT-3 Media Coverage
What is common sense?
• Concept is ambiguous
• Depends on many factors, e.g., culture, age, education, etc.
• Generic: ranges from natural laws and social conventions to unwritten rules
• Examples: “Knives cut things”, “Your parents are older than you”,
“Objects don’t disappear when you cover them with a blanket.”,
“Animals don’t drive cars”.
Definitions of common sense
• Merriam-Webster – “sound and prudent judgment based
on a simple perception of the situation or facts”
• Cambridge Dictionary – “the basic level of practical
knowledge and judgment that we all need to help us live in
a reasonable and safe way.”
• […]
• “Sound judgment derived from experience rather than
study”
• “Sound and prudent judgment based on a simple
perception of the situation or facts”
Common Sense in AI
“We shall therefore say that a program has common sense if it automatically
deduces for itself a sufficiently wide class of immediate consequences of anything
it is told and what it already knows.
[…]
Our ultimate objective is to make programs that learn from their experience as
effectively as humans do.”
John McCarthy, “Programs with Common Sense”, 1958
“The great irony of common sense – and indeed AI itself – is that it is stuff that
everybody knows, yet nobody seems to know what exactly it is or how to build
machines that have it.”
Gary Marcus. “Rebooting AI: Building Artificial Intelligence We Can Trust”, 2019
AI’s struggle with common sense
• Common sense completeness issue - the lack of a precise definition
• Supervised learning intractable
• Common training data for LM (e.g., Wikipedia) does not contain
commonsense knowledge (assumed triviality)
• Deep learning
• Great at pattern recognition, poor at adaptation
• Hard to incorporate abstract knowledge
• Common AI training proceeds in a goal-oriented fashion, e.g., backpropagation
• Issue: pure goal-orientation leads to shortcuts, an inability to generalize, and no human-like
reasoning
• Ideal goal: fundamentally rethinking learning by leveraging existing knowledge
vs
Human commonsense reasoning
• Human-like reasoning
• Extremely complex
• Intrinsics are far from being fully understood
• Captures time, space, causality, basic knowledge of physical objects
and their interactions
• Mechanisms such as conceptualization and compositionality
• Conceptualization is an abstract, simplified view of the world that we wish to
represent for some purpose (Gruber, 1995)
• “Concepts are the glue that holds our mental world together” (Murphy, 2002)
• Compositionality is the capacity to understand and produce novel
combinations from known components (Montague, 1970)
“The Big Book of Concepts”, Gregory Murphy, 2002
“Universal grammar”, Richard Montague, 1970
“Toward principles for the design of ontologies used for knowledge sharing”, Gruber, 1995
Applications: Human-Centered AI & Robust AI
• Human-centered AI
• Advanced chatbots
• Assistants
• Interpretable AI
• Robust AI
• Generalization: distribution of events is long-tailed
• Infrequent & significant, e.g., “black swans”
“The Black Swan: The Impact of the Highly Improbable”, Nassim Nicholas Taleb, 2007
Testing Common Sense Reasoning – Winograd Schemas
• Alternative to the Turing Test for commonsense reasoning
• Winograd schema introduced by Terry Winograd (1972)
• Schema structure:
• A sentence with two parties
• An ambiguous pronoun referring to one of them
• Trigger word(s) that flip the answer
• Objective: What does the pronoun refer to?
• Winograd Schema Challenge (WSC): 273 Multiple-choice questions
• “Google-proof” Winograd schemas, manually curated by AI experts
• Easy for humans, hard for machines
Turing, “Computing machinery and intelligence”, 1950, Winograd, “Understanding Natural Language”, 1972,
Levesque et al., “The Winograd Schema Challenge”, 2012
Example: “The trophy does not fit into the suitcase, because it is too big.”
“The trophy does not fit into the suitcase, because it is too small.”
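The schema structure described above (two parties, one ambiguous pronoun, trigger words that flip the answer) can be written down as a small data structure. This is only an illustrative sketch: the class name `WinogradSchema`, its fields, and the `{trigger}` placeholder convention are assumptions made here, not part of the WSC data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class WinogradSchema:
    """Hypothetical container for one Winograd schema (twin pair)."""
    template: str                 # sentence with a "{trigger}" placeholder
    candidates: Tuple[str, str]   # the two parties the pronoun may refer to
    answers: Dict[str, str]       # trigger word -> correct referent

    def twin_sentences(self) -> List[Tuple[str, str]]:
        """Return (sentence, correct referent) for every trigger word."""
        return [
            (self.template.format(trigger=trigger), referent)
            for trigger, referent in self.answers.items()
        ]


# The trophy/suitcase example from the slide, with its two trigger words.
trophy = WinogradSchema(
    template="The trophy does not fit into the suitcase, because it is too {trigger}.",
    candidates=("the trophy", "the suitcase"),
    answers={"big": "the trophy", "small": "the suitcase"},
)

for sentence, referent in trophy.twin_sentences():
    print(f"{sentence} -> it = {referent}")
```

Keeping the trigger word separate from the template makes the twin-pair property explicit: swapping one word changes the referent while the rest of the sentence stays fixed.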
Approaches
• Feature-based approaches: explicit rules from knowledge bases, internet search
queries, logic-based systems
• Neural-network-based approaches: semantic similarities on word embeddings,
RNNs/LSTMs to encode the local context, pre-trained on unstructured data
• Recently: Leveraging language model (LM) trained on large amounts of
unsupervised text, e.g., BERT
• Unsupervised: BERT LM likelihood scoring
• Supervised: BERT Masked LM Model
The trophy does not fit into the suitcase, because [MASK] is too big.
Answer: The trophy
Devlin et al., “Bert: Pre-training of deep bidirectional transformers for language understanding”, 2018, Kocijan et al., “A Surprisingly Robust Trick for Winograd Schema Challenge”, ACL, 2019,
Trinh and Le, “A Simple Method for Commonsense Reasoning”, 2018, Kocijan et al. “A Review of Winograd Schema Challenge Datasets and Approaches”, 2020
ScoreLM(“The trophy does not fit into the suitcase, because the trophy is too big.”)
ScoreLM(“The trophy does not fit into the suitcase, because the suitcase is too big.”)
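The likelihood-scoring idea (substitute each candidate for the pronoun, score the full sentence, keep the better one) can be sketched as below. The names `resolve_by_lm_score` and `toy_score_lm` are hypothetical, and the toy scorer returns hard-coded placeholder numbers rather than real model outputs; in practice `score_lm` would wrap a language model such as BERT.

```python
from typing import Callable, Sequence


def resolve_by_lm_score(
    template: str,
    candidates: Sequence[str],
    score_lm: Callable[[str], float],
) -> str:
    """Fill the [MASK] slot with each candidate and return the candidate
    whose completed sentence receives the highest score."""
    return max(candidates, key=lambda c: score_lm(template.replace("[MASK]", c)))


def toy_score_lm(sentence: str) -> float:
    """Stand-in for a real LM scorer. The values are placeholders chosen
    for illustration only; they are NOT produced by any model."""
    return -1.0 if "because the trophy is too big" in sentence else -2.0


template = "The trophy does not fit into the suitcase, because [MASK] is too big."
answer = resolve_by_lm_score(template, ["the trophy", "the suitcase"], toy_score_lm)
print(answer)
```

Passing the scorer in as a function keeps the selection logic independent of the particular language model used.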
BERT - Transformer Encoder Stack
Source: http://jalammar.github.io/illustrated-transformer/,
“Deconstructing BERT, Part 2: Visualizing the Inner
Workings of Attention” https://towardsdatascience.com/
• $A \in \mathbb{R}^{L \times H \times |C|}$
• L: #layers
• H: #heads
• C: sequence
• Idea: Leverage the attention tensor $A$
Devlin et al., “Bert: Pre-training of deep bidirectional transformers for language understanding”, 2018
Maximum Attention Score (MAS)
• Re-implementation of BERT for commonsense reasoning
• Exploitation of the associative leverage of self-attention
• Idea: max-pooling on attention level
• Retaining attention of a candidate only where it is most dominant
• Frequency of occurrence to weight the importance
• Implementation:
• Slicing attention tensor $A$ into attention matrices $A_c$, one per candidate $c$
• Isolate dominant links with a binary mask matrix $M_c$
• Score = ratio of a candidate’s summed masked attention values to the sum over all candidates
ACL’19
“Attention Is (not) All You Need for Commonsense Reasoning”, Klein and Nabi, 2019
[Figure: example binary mask $M_c$ = [0 0 0; 1 0 0; 1 0 1], applied to the attention matrix $A_c$]
MAS – Schematic Illustration
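The trophy doesn’t fit in the suitcase because IT is too small
A1 = [0.2 0.1 0.5; 0.1 0.7 0.5; 0 0.2 0.2], A2 = [0.1 0 0.4; 0.2 0.6 0.4; 0.1 0.1 0.3]
Masked (each entry is kept only where that candidate’s attention is the larger of the two):
A1 ∘ M1 = [0.2 0.1 0.5; 0 0.7 0.5; 0 0.2 0], A2 ∘ M2 = [0 0 0; 0.2 0 0; 0.1 0 0.3]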
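The masking and ratio steps of the schematic can be reproduced in a few lines. This is a minimal sketch, not the authors' implementation: the function name `maximum_attention_scores` is made up here, each matrix is assumed to hold one attention value per (layer, head) position for one candidate, ties go to the first candidate, and the numbers are the toy values from the schematic.

```python
from typing import Dict, List

Matrix = List[List[float]]


def maximum_attention_scores(attn: Dict[str, Matrix]) -> Dict[str, float]:
    """For every matrix position keep only the candidate with the largest
    attention (max-pooling across candidates), sum what each candidate
    keeps, and normalise by the total kept attention."""
    names = list(attn)
    n_rows = len(attn[names[0]])
    n_cols = len(attn[names[0]][0])
    kept = {name: 0.0 for name in names}
    for r in range(n_rows):
        for c in range(n_cols):
            # Binary mask: exactly one candidate "wins" this position.
            winner = max(names, key=lambda n: attn[n][r][c])
            kept[winner] += attn[winner][r][c]
    total = sum(kept.values())
    if total == 0.0:
        return {name: 0.0 for name in names}
    return {name: kept[name] / total for name in names}


# Toy attention values taken from the schematic illustration.
A1 = [[0.2, 0.1, 0.5], [0.1, 0.7, 0.5], [0.0, 0.2, 0.2]]
A2 = [[0.1, 0.0, 0.4], [0.2, 0.6, 0.4], [0.1, 0.1, 0.3]]

scores = maximum_attention_scores({"candidate 1": A1, "candidate 2": A2})
print(scores)  # roughly 0.79 for candidate 1 and 0.21 for candidate 2
```

With these values candidate 1 keeps 2.2 of the total 2.8 masked attention and candidate 2 keeps 0.6, so candidate 1 would be selected.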
Quantitative Results
Davis et al., “Human tests of materials for the winograd schema challenge”, 2016, Levesque et al., “The Winograd Schema Challenge”, 2012
Kocijan et al., “A Surprisingly Robust Trick for Winograd Schema Challenge”, 2019
Qualitative Results
[Figure: probability (axis from 0.0 to 1.0) assigned to the candidates for the sentence pairs below]
The drain is clogged with hair. It has to be cleaned.
The drain is clogged with hair. It has to be removed.
Steve follows Fred's example in everything. He admires him hugely.
Steve follows Fred's example in everything. He influences him hugely.
The fish ate the worm . It was hungry.
The fish ate the worm . It was tasty.
The trophy doesn’t fit into the suitcase, because it is too big.
The trophy doesn’t fit into the suitcase, because it is too small.
The foxes are attacking the chickens at night. I have to kill them.
The foxes are attacking the chickens at night. I have to guard them.
Can we do better?
• Task: Devise a difficult task that captures a deeper notion of
common sense and generalizes, without labels
• Idea:
• Exploit Structural Prior (no labels needed) → Mutual-Exclusivity
• Find consistency in answers
The trophy does not fit into the suitcase, because the trophy is too big.
The trophy does not fit into the suitcase, because the trophy is too small.
The trophy does not fit into the suitcase, because the suitcase is too big.
The trophy does not fit into the suitcase, because the suitcase is too small.
or
The trophy doesn’t fit into the suitcase, because it is too big/too small.
Contrastive Self-Supervised (CSS) - Method I
• BERT Masked LM Model
• Pair-level: “soft” mutual-exclusiveness using LM likelihoods (MEx)
The trophy does not fit into the suitcase, because [MASK] is too big.
Candidate 1: The trophy Candidate 2: The suitcase
Devlin et al., “Bert: Pre-training of deep bidirectional transformers for language understanding”, 2018
Sajjadi et al., “Regularization with stochastic transformations and perturbations for deep semi-supervised learning”, 2016
The trophy does not fit into the suitcase, because [MASK] is too small.
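$c_{i,j} \in \{0,1\}$: candidate $j$ in sentence $i$
$p_{i,j} \in [0,1]$: LM likelihood of $c_{i,j}$
$\oplus$: XOR operator
Logical mutual-exclusivity condition for a twin pair (sentences $i$ and $i+1$):
$$(c_{i,1} \oplus c_{i+1,1}) \wedge (c_{i,2} \oplus c_{i+1,2}) \wedge (c_{i,1} \oplus c_{i,2})$$
Replacing logical operators with their soft counterparts:
$$\bigwedge_{i=1}^{k} c_i \Longrightarrow \prod_{i=1}^{k} p_i, \qquad \bigvee_{i=1}^{k} c_i \Longrightarrow \sum_{i=1}^{k} p_i, \qquad \neg c_i \Longrightarrow (1 - p_i)$$
Soft mutual-exclusivity term:
$$p_{i,1}\, p_{i+1,2}\, (1 - p_{i,2}\, p_{i+1,1}) + p_{i,2}\, p_{i+1,1}\, (1 - p_{i,1}\, p_{i+1,2})$$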
ACL’20
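To make the soft mutual-exclusivity term concrete, the sketch below evaluates it for one twin pair. The function name `mex_soft_term` and the example probabilities are illustrative assumptions; only the formula itself comes from the slide.

```python
def mex_soft_term(p_i1: float, p_i2: float, p_n1: float, p_n2: float) -> float:
    """Soft mutual-exclusivity term for a twin pair.

    p_i1, p_i2: LM likelihoods of candidates 1 and 2 in sentence i
    p_n1, p_n2: LM likelihoods of candidates 1 and 2 in sentence i+1
    """
    flip_a = p_i1 * p_n2  # candidate 1 in sentence i, candidate 2 in sentence i+1
    flip_b = p_i2 * p_n1  # candidate 2 in sentence i, candidate 1 in sentence i+1
    return flip_a * (1.0 - flip_b) + flip_b * (1.0 - flip_a)


# Made-up probabilities: the preferred candidate flips between the twins ...
flipped = mex_soft_term(0.9, 0.1, 0.1, 0.9)
# ... versus the same candidate being preferred in both sentences.
same = mex_soft_term(0.9, 0.1, 0.9, 0.1)
print(flipped, same)
```

The term is large when the two sentences of a pair prefer different candidates and small when they agree, which is the behaviour the structural prior is meant to reward without using labels.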
Contrastive Self-Supervised (CSS) - Method II
• Sentence Level: Contrastive margin (CM)
• Training
• Joint loss:
• Self-supervised fine-tuning on DPR (no labels needed)
• Definite Pronoun Resolution (DPR) is similar to WSC-273
• Relaxed Winograd schema constraints, i.e., the dataset is not Google-proof.
• 1322 training samples
Devlin et al., “Bert: Pre-training of deep bidirectional transformers for language understanding”, 2018, Rahman and Ng, “Resolving complex cases of definite pronouns: The Winograd schema challenge”, 2012
$\mathfrak{L}(f_\theta) = \mathfrak{L}(f_\theta)_{MEx} + \mathfrak{L}(f_\theta)_{CM}$
Language model $f$, parameterized by $\theta$
The trophy does not fit into the suitcase, because [MASK] is too big.
Candidate 1: The trophy Candidate 2: The suitcase
$\max \, (p_{i,j} - p_{i,j+1})$
ACL’20
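One possible way to combine the two terms into a joint loss is sketched below. The slide gives only the sum of the two loss terms and the margin expression, so several details here are assumptions rather than the published formulation: the MEx loss is taken as the negative log of the soft mutual-exclusivity term, the margin is taken as the absolute difference of the two candidate likelihoods, and `alpha` and `eps` are hypothetical parameters.

```python
import math


def mex_term(p_i1: float, p_i2: float, p_n1: float, p_n2: float) -> float:
    """Soft mutual-exclusivity term for sentences i and i+1 of a twin pair."""
    a = p_i1 * p_n2
    b = p_i2 * p_n1
    return a * (1.0 - b) + b * (1.0 - a)


def joint_loss(
    p_i1: float,
    p_i2: float,
    p_n1: float,
    p_n2: float,
    alpha: float = 1.0,   # assumed weight of the contrastive margin
    eps: float = 1e-12,   # assumed constant to keep the logarithm finite
) -> float:
    # Assumption: MEx loss = negative log of the soft term.
    loss_mex = -math.log(mex_term(p_i1, p_i2, p_n1, p_n2) + eps)
    # Assumption: the margin is the absolute likelihood gap per sentence;
    # maximising it corresponds to minimising its negative.
    margin = abs(p_i1 - p_i2) + abs(p_n1 - p_n2)
    loss_cm = -alpha * margin
    return loss_mex + loss_cm


confident = joint_loss(0.9, 0.1, 0.1, 0.9)   # made-up likelihoods
undecided = joint_loss(0.5, 0.5, 0.5, 0.5)
print(confident, undecided)
```

Under these assumptions a pair that is answered confidently and consistently with the flip receives a lower loss than a pair where the model is undecided.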
CSS - Schematic Loss Illustration
[Figure: LM likelihood (axis from 0.0 to 1.0) of Candidate 1 and Candidate 2 for the twin sentences S1 and S2, annotated with the LM loss, the MEx loss, and the contrastive margin]
S1: The trophy does not fit into the suitcase, because it is too big.
S2: The trophy does not fit into the suitcase, because it is too small.
Results
Davis et al., “Human tests of materials for the winograd schema challenge”, 2016, Levesque et al., “The Winograd Schema Challenge”, 2012, Rahman and Ng, “Resolving complex cases of definite
pronouns: The Winograd schema challenge”, 2012, Emami et al., “The knowref coreference corpus: Removing gender and number cues for difficult pronominal anaphora resolution”, 2019
Conclusion
• BERT implicitly establishes complex relationships between entities
• Self-supervision is possible for commonsense reasoning
• Leveraging structural prior (mutual-exclusivity) instead of direct
supervision
• Outperforming all unsupervised approaches
• Comparable performance to supervised approaches
• Future work
• Relaxing the structural prior of twin-pairs
• Transferal of inductive bias to other commonsense-demanding downstream
tasks, e.g., Q&A
Q: What are the names of the papers presented?
A: “Attention Is (not) All You Need for Commonsense Reasoning”, “Contrastive Self-Supervised Learning for Commonsense Reasoning”
Q: What’s the GitHub repo for the papers?
A: https://github.com/SAP-samples/acl2019-commonsense,
https://github.com/SAP-samples/acl2020-commonsense
Q: Does SAP AI Research offer internships?
A: Yes, check out: https://jobs.sap.com/
Q: How to contact the presenter?
A: tassilo.klein@sap.com, tjklein.github.io
Thanks for your attention
SAP AI Research, Berlin
Q: Is there anything else I should know?
A: Yes, check out our research blog:
https://medium.com/sap-machine-learning-research

More Related Content

What's hot

From deep learning to deep reasoning
From deep learning to deep reasoningFrom deep learning to deep reasoning
From deep learning to deep reasoningDeakin University
 
Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1Deakin University
 
Connectivism: Education & Artificial Intelligence
Connectivism: Education & Artificial IntelligenceConnectivism: Education & Artificial Intelligence
Connectivism: Education & Artificial IntelligenceAlaa Al Dahdouh
 
Intro to deep learning
Intro to deep learning Intro to deep learning
Intro to deep learning David Voyles
 
Meaning and the Semantic Web
Meaning and the Semantic WebMeaning and the Semantic Web
Meaning and the Semantic WebPhiloWeb
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013CS, NcState
 
Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deakin University
 

What's hot (7)

From deep learning to deep reasoning
From deep learning to deep reasoningFrom deep learning to deep reasoning
From deep learning to deep reasoning
 
Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1
 
Connectivism: Education & Artificial Intelligence
Connectivism: Education & Artificial IntelligenceConnectivism: Education & Artificial Intelligence
Connectivism: Education & Artificial Intelligence
 
Intro to deep learning
Intro to deep learning Intro to deep learning
Intro to deep learning
 
Meaning and the Semantic Web
Meaning and the Semantic WebMeaning and the Semantic Web
Meaning and the Semantic Web
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013
 
Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2
 

Similar to Towads Unsupervised Commonsense Reasoning in AI

Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionDarian Frajberg
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSangwoo Mo
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Matthew Lease
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsAndre Freitas
 
Reimagining authentic curriculum in the age of AI
Reimagining authentic curriculum in the age of AIReimagining authentic curriculum in the age of AI
Reimagining authentic curriculum in the age of AICharles Darwin University
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018HJ van Veen
 
DSL for Leadership Education
DSL for Leadership EducationDSL for Leadership Education
DSL for Leadership EducationAdam Jessep
 
Core Methods In Educational Data Mining
Core Methods In Educational Data MiningCore Methods In Educational Data Mining
Core Methods In Educational Data Miningebelani
 
Deep Neural Networks for Machine Learning
Deep Neural Networks for Machine LearningDeep Neural Networks for Machine Learning
Deep Neural Networks for Machine LearningJustin Beirold
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep LearningAndre Freitas
 
BSSML17 - Topic Models
BSSML17 - Topic ModelsBSSML17 - Topic Models
BSSML17 - Topic ModelsBigML, Inc
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsSeldon
 
ACM Hypertext and Social Media Conference Tutorial on Knowledge-infused Deep ...
ACM Hypertext and Social Media Conference Tutorial on Knowledge-infused Deep ...ACM Hypertext and Social Media Conference Tutorial on Knowledge-infused Deep ...
ACM Hypertext and Social Media Conference Tutorial on Knowledge-infused Deep ...Artificial Intelligence Institute at UofSC
 
Generative AI: Shifting the AI Landscape
Generative AI: Shifting the AI LandscapeGenerative AI: Shifting the AI Landscape
Generative AI: Shifting the AI LandscapeDeakin University
 
Ncsm.4.21.10.part1
Ncsm.4.21.10.part1Ncsm.4.21.10.part1
Ncsm.4.21.10.part1ihor
 
Introduction to Data and Computation: Essential capabilities for everyone in ...
Introduction to Data and Computation: Essential capabilities for everyone in ...Introduction to Data and Computation: Essential capabilities for everyone in ...
Introduction to Data and Computation: Essential capabilities for everyone in ...Kim Flintoff
 
Future of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptxFuture of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptxGreg Makowski
 

Similar to Towads Unsupervised Commonsense Reasoning in AI (20)

Where Does It Break?
Where Does It Break?Where Does It Break?
Where Does It Break?
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolution
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture Note
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge Graphs
 
Reimagining authentic curriculum in the age of AI
Reimagining authentic curriculum in the age of AIReimagining authentic curriculum in the age of AI
Reimagining authentic curriculum in the age of AI
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
BCII 2016 - Visualizing Complexity
BCII 2016 - Visualizing ComplexityBCII 2016 - Visualizing Complexity
BCII 2016 - Visualizing Complexity
 
DSL for Leadership Education
DSL for Leadership EducationDSL for Leadership Education
DSL for Leadership Education
 
AI Presentation 1
AI Presentation 1AI Presentation 1
AI Presentation 1
 
Core Methods In Educational Data Mining
Core Methods In Educational Data MiningCore Methods In Educational Data Mining
Core Methods In Educational Data Mining
 
Deep Neural Networks for Machine Learning
Deep Neural Networks for Machine LearningDeep Neural Networks for Machine Learning
Deep Neural Networks for Machine Learning
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep Learning
 
BSSML17 - Topic Models
BSSML17 - Topic ModelsBSSML17 - Topic Models
BSSML17 - Topic Models
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative models
 
ACM Hypertext and Social Media Conference Tutorial on Knowledge-infused Deep ...
ACM Hypertext and Social Media Conference Tutorial on Knowledge-infused Deep ...ACM Hypertext and Social Media Conference Tutorial on Knowledge-infused Deep ...
ACM Hypertext and Social Media Conference Tutorial on Knowledge-infused Deep ...
 
Generative AI: Shifting the AI Landscape
Generative AI: Shifting the AI LandscapeGenerative AI: Shifting the AI Landscape
Generative AI: Shifting the AI Landscape
 
Ncsm.4.21.10.part1
Ncsm.4.21.10.part1Ncsm.4.21.10.part1
Ncsm.4.21.10.part1
 
Introduction to Data and Computation: Essential capabilities for everyone in ...
Introduction to Data and Computation: Essential capabilities for everyone in ...Introduction to Data and Computation: Essential capabilities for everyone in ...
Introduction to Data and Computation: Essential capabilities for everyone in ...
 
Future of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptxFuture of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptx
 

Recently uploaded

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Handwritten Text Recognition for manuscripts and early printed texts

Towards Unsupervised Commonsense Reasoning in AI

  • 1. Tassilo Klein, Senior Research Scientist, SAP AI Research. September 7th, 2020. Towards Unsupervised Commonsense Reasoning
  • 3. Source: https://github.com/minimaxir/gpt-3-experiments/, https://twitter.com/yoavgo, https://lacker.io/ Q: Who were the founders of Microsoft? A: Bill Gates and Paul Allen were the founders of Microsoft. Q: What are cubs and cowboys? A: Cubs are baby bears. Cowboys are people who work on ranches. Q: Which is heavier, a mouse or an elephant? A: An elephant is heavier than a mouse. Q: Are there any animals with three legs? A: No, there are no animals with three legs.
  • 4. Source: https://github.com/minimaxir/gpt-3-experiments/, https://twitter.com/yoavgo, https://lacker.io/ Q: What do dolphins and eagles have in common? A: Both are birds. Q: What do the words ‘dolphins’ and ‘eagles’ have in common? A: Both words start with the letter ‘e’.
  • 6. What is common sense? • Concept is ambiguous • Depends on many factors, e.g., culture, age, education, etc. • Generic: natural laws, social conventions to unwritten rules • Examples: “Knives cut things”, “Your parents are older than you”, “Objects don’t disappear when you cover them with a blanket.”, “Animals don’t drive cars”.
  • 7. Definitions of common sense • Merriam Webster - “sound and prudent judgment based on a simple perception of the situation or facts” • Cambridge Dictionary – “the basic level of practical knowledge and judgment that we all need to help us live in a reasonable and safe way.” • […..] • “Sound judgment derived from experience rather than study” • “Sound and prudent judgment based on a simple perception of the situation or facts”
  • 8. Common Sense in AI “We shall therefore say that a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows. […] Our ultimate objective is to make programs that learn from their experience as effectively as humans do.” John McCarthy, “Programs with Common Sense”, 1958 “The great irony of common sense—and indeed AI itself—is that it is stuff that everybody knows, yet nobody seems to know what exactly it is or how to build machines that have it.” Gary Marcus, “Rebooting AI: Building Artificial Intelligence We Can Trust”, 2019
  • 9. AI’s struggle with common sense • Common sense completeness issue - the lack of a precise definition • Supervised learning intractable • Common training data for LM (e.g., Wikipedia) does not contain commonsense knowledge (assumed triviality) • Deep learning • Great at pattern recognition, poor at adaptation • Hard to incorporate abstract knowledge • Common AI training proceeds in a goal-oriented fashion, e.g., backpropagation • Issue: pure goal-orientation leads to shortcuts, an inability to generalize, and no human-like reasoning • Ideal goal: fundamentally rethinking learning - leveraging existing knowledge vs
  • 10. Human commonsense reasoning • Human-like reasoning • Extremely complex • Intrinsics are far from being fully understood • Captures time, space, causality, basic knowledge of physical objects and their interactions • Mechanisms such as conceptualization and compositionality • Conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose, Gruber, 1995 • “Concepts are the glue that holds our mental world together”, Murphy, 2002 • Compositionality is the capacity to understand and produce novel combinations from known components, Montague, 1970 “The Big Book of Concepts”, Gregory Murphy, 2002, “Universal grammar”, Richard Montague, 1970, “Toward principles for the design of ontologies used for knowledge sharing”, Gruber, 1995
  • 11. Applications: Human-Centered AI & Robust AI • Human-centered AI • Advanced chatbots • Assistants • Interpretable AI • Robust AI • Generalization: distribution of events is long-tailed • Infrequent & significant, e.g., “black swans” “The Black Swan: The Impact of the Highly Improbable”, Nassim Nicholas Taleb, 2007
  • 12. Testing Common Sense Reasoning – Winograd Schemas • Alternative to the Turing Test for commonsense reasoning • Winograd schema introduced by Terry Winograd (1972) • Schema structure: • A sentence with two parties • An ambiguous pronoun referring to one of them • Trigger word(s) that flip the answer • Objective: What does the pronoun refer to? • Winograd Schema Challenge (WSC): 273 multiple-choice questions • “Google-proof” Winograd schemas, manually curated by AI experts • Easy for humans, hard for machines Turing, “Computing machinery and intelligence”, 1950, Winograd, “Understanding Natural Language”, 1972, Levesque et al., “The Winograd Schema Challenge”, 2012 Example: “The trophy does not fit into the suitcase, because it is too big.” “The trophy does not fit into the suitcase, because it is too small.”
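The schema structure on this slide (two parties, one ambiguous pronoun, trigger words that flip the answer) can be written down as a small record. This is only an illustrative sketch: the class name `WinogradSchema` and its fields are hypothetical and are not taken from the WSC data format or from the presenter's code.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    """Hypothetical container for one Winograd schema (twin sentences)."""
    template: str                 # sentence with a {trigger} slot
    pronoun: str                  # the ambiguous pronoun
    candidates: Tuple[str, str]   # the two parties
    answers: Dict[str, str]       # trigger word -> correct referent

    def instances(self) -> Iterator[Tuple[str, str]]:
        """Yield (sentence, correct referent) for every trigger word."""
        for trigger, answer in self.answers.items():
            yield self.template.format(trigger=trigger), answer


# The trophy/suitcase example from the slide.
trophy = WinogradSchema(
    template="The trophy does not fit into the suitcase, because it is too {trigger}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answers={"big": "the trophy", "small": "the suitcase"},
)

for sentence, answer in trophy.instances():
    print(sentence, "->", answer)
```

Swapping the trigger word changes nothing but one token in the sentence, yet the correct referent flips, which is what makes the task hard to solve from surface statistics.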
  • 13. Approaches • Feature-based approaches: explicit rules from knowledge bases, internet search queries, logic-based systems • Neural-network-based approaches: semantic similarities on word embeddings, RNNs/LSTMs to encode the local context, pre-trained on unstructured data • Recently: Leveraging language model (LM) trained on large amounts of unsupervised text, e.g., BERT • Unsupervised: BERT LM likelihood scoring • Supervised: BERT Masked LM Model The trophy does not fit into the suitcase, because [MASK] is too big. Answer: The trophy Devlin et al., “Bert: Pre-training of deep bidirectional transformers for language understanding”, 2018, Kocijan et al., “A Surprisingly Robust Trick for Winograd Schema Challenge”, ACL, 2019, Trinh and Le, “A Simple Method for Commonsense Reasoning”, 2018, Kocijan et al. “A Review of Winograd Schema Challenge Datasets and Approaches”, 2020 ScoreLM(“The trophy does not fit into the suitcase, because the trophy is too big.”) ScoreLM(“The trophy does not fit into the suitcase, because the suitcase is too big.”)
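The unsupervised scoring idea above (substitute each candidate for the pronoun and compare `ScoreLM` values) can be sketched as follows. The scorer here is a hand-set lookup table standing in for a real language model; the function names and the numeric scores are assumptions made purely for illustration and are not results from BERT or from the talk.

```python
from typing import Callable, Dict, Sequence, Tuple


def resolve_by_lm_score(
    template: str,
    candidates: Sequence[str],
    score_lm: Callable[[str], float],
) -> Tuple[str, Dict[str, float]]:
    """Fill [MASK] with each candidate and return the highest-scoring one."""
    scores = {c: score_lm(template.replace("[MASK]", c)) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods; a real system would query a language model.
TOY_LOG_LIKELIHOOD = {
    "The trophy does not fit into the suitcase, because the trophy is too big.": -21.0,
    "The trophy does not fit into the suitcase, because the suitcase is too big.": -24.5,
}


def toy_score_lm(sentence: str) -> float:
    """Stand-in for ScoreLM: look the sentence up in the toy table."""
    return TOY_LOG_LIKELIHOOD[sentence]


template = "The trophy does not fit into the suitcase, because [MASK] is too big."
best, scores = resolve_by_lm_score(
    template, ["the trophy", "the suitcase"], toy_score_lm
)
print(best, scores)
```

Keeping the scorer as a plain callable means the same resolution logic would work with any model that returns a sentence-level score.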
  • 14. BERT - Transformer Encoder Stack • $A \in \mathbb{R}^{L \times H \times |C|}$ • L: #layers • H: #heads • C: sequence • Idea: Leverage the attention tensor $A$ Sources: http://jalammar.github.io/illustrated-transformer/, “Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention”, https://towardsdatascience.com/ Devlin et al., “Bert: Pre-training of deep bidirectional transformers for language understanding”, 2018
  • 15. Maximum Attention Score (MAS) • Re-implementation of BERT for commonsense reasoning • Exploitation of the associative leverage of self-attention • Idea: max-pooling on attention level • Retaining attention of a candidate only where it is most dominant • Frequency of occurrence to weight the importance • Implementation: • Slicing attention tensor $A$ into attention matrices $A_c$ • Isolate dominant links with a binary mask matrix $M_c$ • Score = Ratio of sum of masked attention values [Figure: example binary mask $M_c$ = 0 0 0 1 0 0 1 0 1 applied to $A_c$] “Attention Is (not) All You Need for Commonsense Reasoning”, Klein and Nabi, ACL’19
  • 16. MAS – Schematic Illustration • Example: “The trophy doesn’t fit in the suitcase because IT is too small” [Figure: attention values per candidate, A1: 0.2 0.1 0.5 0.1 0.7 0.5 0 0.2 0.2 vs A2: 0.1 0 0.4 0.2 0.6 0.4 0.1 0.1 0.3; after masking, A1: 0.2 0.1 0.5 0 0.7 0.5 0 0.2 0 and A2: 0 0 0 0.2 0 0 0.1 0 0.3; the masked values of each candidate are summed and compared] “Attention Is (not) All You Need for Commonsense Reasoning”, Klein and Nabi, ACL’19
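The masking-and-ratio step can be traced by hand with the numbers shown on the schematic slide. The sketch below is a simplified reading of the MAS description, not the authors' implementation: it assumes each candidate's attention has already been flattened into one value per (layer, head) cell, and the function name is hypothetical.

```python
from typing import List, Sequence


def maximum_attention_scores(attn: Sequence[Sequence[float]]) -> List[float]:
    """attn[c][k] is the attention of candidate c in (layer, head) cell k.

    A candidate keeps a cell's value only if it has the largest value in that
    cell (binary mask); its score is its masked sum divided by the masked sum
    over all candidates. Ties go to the first candidate, and the input is
    assumed to contain at least one non-zero value.
    """
    masked_sums = [0.0] * len(attn)
    for k in range(len(attn[0])):
        column = [a[k] for a in attn]
        winner = column.index(max(column))
        masked_sums[winner] += column[winner]
    total = sum(masked_sums)
    return [s / total for s in masked_sums]


# Attention values A1 and A2 as they appear on the slide.
A1 = [0.2, 0.1, 0.5, 0.1, 0.7, 0.5, 0.0, 0.2, 0.2]
A2 = [0.1, 0.0, 0.4, 0.2, 0.6, 0.4, 0.1, 0.1, 0.3]

mas = maximum_attention_scores([A1, A2])
print(mas)
```

With these values, candidate 1 wins six of the nine cells (masked sum 2.2) and candidate 2 wins three (masked sum 0.6), so the scores come out at roughly 0.79 and 0.21.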
  • 17. Quantitative Results Davis et al., “Human tests of materials for the winograd schema challenge”, 2016, Levesque et al., “The Winograd Schema Challenge”, 2012 Kocijan et al., “A Surprisingly Robust Trick for Winograd Schema Challenge”, 2019
  • 18. Qualitative Results [Figure: probability (0.0 to 1.0) per candidate for each sentence] • The drain is clogged with hair. It has to be cleaned. • The drain is clogged with hair. It has to be removed. • Steve follows Fred's example in everything. He admires him hugely. • Steve follows Fred's example in everything. He influences him hugely. • The fish ate the worm. It was hungry. • The fish ate the worm. It was tasty. • The trophy doesn’t fit into the suitcase, because it is too big. • The trophy doesn’t fit into the suitcase, because it is too big. • The foxes are attacking the chickens at night. I have to kill them. • The foxes are attacking the chickens at night. I have to guard them.
  • 19. Can we do better? • Task: Devising a difficult task that makes it possible to capture a deeper notion of commonsense and to generalize, without labels • Idea: • Exploit Structural Prior (no labels needed) → Mutual-Exclusivity • Find consistency in answers The trophy does not fit into the suitcase, because the trophy is too big. The trophy does not fit into the suitcase, because the trophy is too small. The trophy does not fit into the suitcase, because the suitcase is too big. The trophy does not fit into the suitcase, because the suitcase is too small. or The trophy doesn’t fit into the suitcase, because it is too big/too small.
  • 20. Contrastive Self-Supervised (CSS) - Method I • BERT Masked LM Model • Pair-level: “soft” mutual-exclusiveness using LM likelihoods (MEx) • Twin sentences: The trophy does not fit into the suitcase, because [MASK] is too big. The trophy does not fit into the suitcase, because [MASK] is too small. Candidate 1: The trophy Candidate 2: The suitcase • Notation: $c_{i,j} \in \{0,1\}$ is candidate $j$ in sentence $i$; $p_{i,j} \in [0,1]$ is the LM likelihood of $c_{i,j}$; $\oplus$ is the XOR operator • Logic to probabilities: $\bigwedge_{i=1}^{k} c_i \Rightarrow \prod_{i=1}^{k} p_i$, $\bigvee_{i=1}^{k} c_i \Rightarrow \sum_{i=1}^{k} p_i$, $\neg c_i \Rightarrow (1 - p_i)$ • Constraint: $(c_{i,1} \oplus c_{i+1,1}) \wedge (c_{i,2} \oplus c_{i+1,2}) \wedge (c_{i,1} \oplus c_{i,2})$ • Resulting term: $p_{i,1}\, p_{i+1,2}\, (1 - p_{i,2}\, p_{i+1,1}) + p_{i,2}\, p_{i+1,1}\, (1 - p_{i,1}\, p_{i+1,2})$ Devlin et al., “Bert: Pre-training of deep bidirectional transformers for language understanding”, 2018, Sajjadi et al., “Regularization with stochastic transformations and perturbations for deep semi-supervised learning”, 2016 ACL’20
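To see how the mutual-exclusivity term behaves, it helps to evaluate it for a consistent and an inconsistent pair of answers. The sketch below computes the term $p_{i,1} p_{i+1,2} (1 - p_{i,2} p_{i+1,1}) + p_{i,2} p_{i+1,1} (1 - p_{i,1} p_{i+1,2})$ from the slide; turning it into a loss via a negative logarithm is an assumption made here, as are the function names and the example probabilities.

```python
import math


def mex_term(p_i1: float, p_i2: float, p_n1: float, p_n2: float) -> float:
    """Soft mutual-exclusivity term for a twin pair.

    p_i1, p_i2: LM likelihoods of candidates 1 and 2 in sentence i.
    p_n1, p_n2: LM likelihoods of candidates 1 and 2 in sentence i+1.
    """
    return p_i1 * p_n2 * (1 - p_i2 * p_n1) + p_i2 * p_n1 * (1 - p_i1 * p_n2)


def mex_loss(p_i1: float, p_i2: float, p_n1: float, p_n2: float) -> float:
    """Assumed loss form: negative log of the term (probabilities in (0, 1))."""
    return -math.log(mex_term(p_i1, p_i2, p_n1, p_n2))


# Consistent: candidate 1 wins in sentence i, candidate 2 wins in its twin.
consistent = mex_term(0.9, 0.1, 0.1, 0.9)
# Inconsistent: candidate 1 wins in both twin sentences.
inconsistent = mex_term(0.9, 0.1, 0.9, 0.1)

print(consistent, inconsistent)
print(mex_loss(0.9, 0.1, 0.1, 0.9), mex_loss(0.9, 0.1, 0.9, 0.1))
```

The term is large when the two twin sentences pick different candidates and small when they pick the same one, so no label for the correct candidate is needed to produce a training signal.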
  • 21. Contrastive Self-Supervised (CSS) - Method II • Sentence level: Contrastive margin (CM), a $\max$ term over the likelihood gap $p_{i,j} - p_{i,j+1}$ between the two candidates of a sentence • Example: The trophy does not fit into the suitcase, because [MASK] is too big. Candidate 1: The trophy Candidate 2: The suitcase • Training • Joint loss: $\mathcal{L}(f_\theta) = \mathcal{L}(f_\theta)_{MEx} + \mathcal{L}(f_\theta)_{CM}$, with language model $f$ parameterized by $\theta$ • Self-supervised fine-tuning on DPR (no labels needed) • Definite Pronoun Resolution (DPR) is similar to WSC-273 • Relaxed Winograd schema constraints, i.e., the dataset is not Google-proof • 1322 training samples Devlin et al., “Bert: Pre-training of deep bidirectional transformers for language understanding”, 2018, Rahman and Ng, “Resolving complex cases of definite pronouns: The Winograd schema challenge”, 2012 ACL’20
  • 22. CSS - Schematic Loss Illustration [Figure: LM likelihood (0.0 to 1.0) of Candidate 1 and Candidate 2 for the twin sentences S1: “The trophy does not fit into the suitcase, because it is too big.” and S2: “The trophy does not fit into the suitcase, because it is too small.”, annotated with the LM loss, the MEx loss, and the contrastive margin]
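The joint loss simply adds the pair-level and the sentence-level terms. The slide shows only fragments of the contrastive margin, so the sketch below assumes one possible form, $-\alpha \cdot \max(|p_{i,1} - p_{i,2}|, \beta)$; the absolute value, the hyperparameters `alpha` and `beta`, their default values, and all function names are assumptions for illustration and should be checked against the paper.

```python
def contrastive_margin(p_1: float, p_2: float,
                       alpha: float = 1.0, beta: float = 0.5) -> float:
    """Assumed margin term: -alpha * max(|p_1 - p_2|, beta).

    p_1, p_2: LM likelihoods of the two candidates in one sentence.
    alpha, beta: hypothetical hyperparameters (not given on the slide).
    """
    return -alpha * max(abs(p_1 - p_2), beta)


def joint_loss(mex: float, cm: float) -> float:
    """Joint loss from the slide: L = L_MEx + L_CM."""
    return mex + cm


# A confident sentence (large gap) versus an undecided one (small gap).
confident = contrastive_margin(0.9, 0.1)
undecided = contrastive_margin(0.55, 0.45)
print(confident, undecided)

# Combining with an already computed pair-level MEx loss value.
print(joint_loss(0.22, confident))
```

Under this assumed form, a larger likelihood gap between the two candidates lowers the loss, which pushes the model toward a clear decision within each sentence while the MEx term keeps the decisions across twin sentences consistent.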
  • 23. Results Davis et al., “Human tests of materials for the winograd schema challenge”, 2016, Levesque et al., “The Winograd Schema Challenge”, 2012, Rahman and Ng, “Resolving complex cases of definite pronouns: The Winograd schema challenge”, 2012, Emami et al., “The knowref coreference corpus: Removing gender and number cues for difficult pronominal anaphora resolution”, 2019
  • 24. Conclusion • BERT implicitly establishes complex relationships between entities • Self-supervision is possible for commonsense reasoning • Leveraging structural prior (mutual-exclusivity) instead of direct supervision • Outperforming all unsupervised approaches • Comparable performance to supervised approaches • Future work • Relaxing the structural prior of twin-pairs • Transferal of inductive bias to other commonsense-demanding downstream tasks, e.g., Q&A
  • 25. Q: What are the names of the papers presented? A: “Attention Is (not) All You Need for Commonsense Reasoning”, “Contrastive Self-Supervised Learning for Commonsense Reasoning” Q: What’s the GitHub repo for the papers? A: https://github.com/SAP-samples/acl2019-commonsense, https://github.com/SAP-samples/acl2020-commonsense Q: Does SAP AI Research offer internships? A: Yes, check out: https://jobs.sap.com/ Q: How to contact the presenter? A: tassilo.klein@sap.com, tjklein.github.io Q: Is there anything else I should know? A: Yes, check out our research blog: https://medium.com/sap-machine-learning-research Thanks for your attention. SAP AI Research, Berlin