SlideShare a Scribd company logo
Near human performance
in question answering?
Yoav Goldberg
Bar Ilan University
Near Human Performance
on the SQUAD Dataset
• "Neural systems achieve near-human performance
on Question Answering".
• Not really, let's see why.
Restricted QA Setup
• Restricted to questions that can be answered by
span selection.
• Need to find the answer in a given paragraph.
• The answer is guaranteed to be in the paragraph.
• Annotators see the paragraph when asking the
question, resulting in high lexical similarity
between question and answer.
Near Human Performance?
• Human performance: 91.2 F1.

Current best system: 84.0 F1.
• Humans on MTurk. Who are instructed to answer 5
answers in 2 minutes. That's 16 cent per question.
• Humans are wrong mostly in span boundaries.
• Also, when doing max-vote between several
humans, human perf goes up substantially.
How hard is the dataset?
• Do the questions require complex reasoning, or
can they be "cheated" using superficial clues?
The Rankie cycle is sometimes referred to as _________
words from questions some verb only NP in sent
______ have powers of ..... and veto .....
words from questionNoun Phrase
______ have powers of ..... and veto .....
words from questionNoun Phrase
all of this can be ignored...
... Shakespeare scholar _____
What Shakespeare scholar is ....?
hold
hold Noun Phrase
collection
collection
what is goal of criminal punishment
is goal of criminal punishment_____
all can be "cheated away" using
some smart template matching.
all can be "cheated away" using
some smart template matching.
and the template-matching systems
can be easily fooled by tailored examples
that don't fool humans
(Percy Liang, personal communication)
all can be "cheated away" using
some smart template matching.
and the template-matching systems
can be easily fooled by tailored examples
that don't fool humans
(Percy Liang, personal communication)Update: paper by Jia and Liang demonstrates this:
https://arxiv.org/abs/1707.07328
To Summarize
• DL methods gets near human performance on SQUAD but:
• Still 84 F1 vs. 91.2 F1.
• Restricted QA Setting (span selection, within paragraph,
answer always present, high lexical overlap).
• Compared to under-incentivized humans.
• (91.2 is a low estimate of human performance)
• Questions can be answered with "cheating".
• (84.0 is a high estimate of DL performance)
Take away
• Neural systems / RNNs / ConvNets do very clever
pattern matching. Not "intelligence", not
"reasoning".
• Not everything can be pattern-matched.
• Pattern matchers can be easily fooled.

More Related Content

Similar to Near human performance in question answering?

Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...
Ana Marasović
 
Demystifying Machine Learning
Demystifying Machine LearningDemystifying Machine Learning
Demystifying Machine Learning
Ayodele Odubela
 
Sippin: A Mobile Application Case Study presented at Techfest Louisville
Sippin: A Mobile Application Case Study presented at Techfest LouisvilleSippin: A Mobile Application Case Study presented at Techfest Louisville
Sippin: A Mobile Application Case Study presented at Techfest Louisville
Dawn Yankeelov
 
AI - history and recent breakthroughs
AI - history and recent breakthroughs AI - history and recent breakthroughs
AI - history and recent breakthroughs
Armando Vieira
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
HJ van Veen
 
A Review On Genetic Algorithm And Its Applications
A Review On Genetic Algorithm And Its ApplicationsA Review On Genetic Algorithm And Its Applications
A Review On Genetic Algorithm And Its Applications
Karen Gomez
 
Biological Foundations for Deep Learning: Towards Decision Networks
 Biological Foundations for Deep Learning: Towards Decision Networks Biological Foundations for Deep Learning: Towards Decision Networks
Biological Foundations for Deep Learning: Towards Decision Networks
diannepatricia
 
The 4th New Science
The 4th New ScienceThe 4th New Science
The 4th New Science
grplunkett
 
QuestionPro Integrates with TryMyUI to Launch the Survey Respondent Score
QuestionPro Integrates with TryMyUI to Launch the Survey Respondent ScoreQuestionPro Integrates with TryMyUI to Launch the Survey Respondent Score
QuestionPro Integrates with TryMyUI to Launch the Survey Respondent Score
James Wirth
 
Cost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention ProcessCost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention Process
MLAI2
 
Rinse and Repeat : The Spiral of Applied Machine Learning
Rinse and Repeat : The Spiral of Applied Machine LearningRinse and Repeat : The Spiral of Applied Machine Learning
Rinse and Repeat : The Spiral of Applied Machine Learning
Anna Chaney
 
Data Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic AlgorithmsData Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic Algorithms
Derek Kane
 
Rigourous evaluation of nlp models in real world deployment
Rigourous evaluation of nlp models in real world deploymentRigourous evaluation of nlp models in real world deployment
Rigourous evaluation of nlp models in real world deployment
Sandy Man
 
From Human Intelligence to Machine Intelligence
From Human Intelligence to Machine IntelligenceFrom Human Intelligence to Machine Intelligence
From Human Intelligence to Machine Intelligence
NUS-ISS
 
Machine Learning Introduction.pptx
Machine Learning Introduction.pptxMachine Learning Introduction.pptx
Machine Learning Introduction.pptx
Jeeva Nantham
 
Automatically Labeling Facts in a Never-Ending Langue Learning system
Automatically Labeling Facts in a Never-Ending Langue Learning systemAutomatically Labeling Facts in a Never-Ending Langue Learning system
Automatically Labeling Facts in a Never-Ending Langue Learning system
Estevam Hruschka
 
Predicting Gene Loss in Plants: Lessons Learned From Laptop-Scale Data
Predicting Gene Loss in Plants: Lessons Learned From Laptop-Scale DataPredicting Gene Loss in Plants: Lessons Learned From Laptop-Scale Data
Predicting Gene Loss in Plants: Lessons Learned From Laptop-Scale Data
philippbayer
 
Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0
Cameron Aaron
 
Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0Cameron Aaron
 
Understanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking GeneralizationUnderstanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking Generalization
Ahmet Kuzubaşlı
 

Similar to Near human performance in question answering? (20)

Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...
 
Demystifying Machine Learning
Demystifying Machine LearningDemystifying Machine Learning
Demystifying Machine Learning
 
Sippin: A Mobile Application Case Study presented at Techfest Louisville
Sippin: A Mobile Application Case Study presented at Techfest LouisvilleSippin: A Mobile Application Case Study presented at Techfest Louisville
Sippin: A Mobile Application Case Study presented at Techfest Louisville
 
AI - history and recent breakthroughs
AI - history and recent breakthroughs AI - history and recent breakthroughs
AI - history and recent breakthroughs
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
A Review On Genetic Algorithm And Its Applications
A Review On Genetic Algorithm And Its ApplicationsA Review On Genetic Algorithm And Its Applications
A Review On Genetic Algorithm And Its Applications
 
Biological Foundations for Deep Learning: Towards Decision Networks
 Biological Foundations for Deep Learning: Towards Decision Networks Biological Foundations for Deep Learning: Towards Decision Networks
Biological Foundations for Deep Learning: Towards Decision Networks
 
The 4th New Science
The 4th New ScienceThe 4th New Science
The 4th New Science
 
QuestionPro Integrates with TryMyUI to Launch the Survey Respondent Score
QuestionPro Integrates with TryMyUI to Launch the Survey Respondent ScoreQuestionPro Integrates with TryMyUI to Launch the Survey Respondent Score
QuestionPro Integrates with TryMyUI to Launch the Survey Respondent Score
 
Cost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention ProcessCost-effective Interactive Attention Learning with Neural Attention Process
Cost-effective Interactive Attention Learning with Neural Attention Process
 
Rinse and Repeat : The Spiral of Applied Machine Learning
Rinse and Repeat : The Spiral of Applied Machine LearningRinse and Repeat : The Spiral of Applied Machine Learning
Rinse and Repeat : The Spiral of Applied Machine Learning
 
Data Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic AlgorithmsData Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic Algorithms
 
Rigourous evaluation of nlp models in real world deployment
Rigourous evaluation of nlp models in real world deploymentRigourous evaluation of nlp models in real world deployment
Rigourous evaluation of nlp models in real world deployment
 
From Human Intelligence to Machine Intelligence
From Human Intelligence to Machine IntelligenceFrom Human Intelligence to Machine Intelligence
From Human Intelligence to Machine Intelligence
 
Machine Learning Introduction.pptx
Machine Learning Introduction.pptxMachine Learning Introduction.pptx
Machine Learning Introduction.pptx
 
Automatically Labeling Facts in a Never-Ending Langue Learning system
Automatically Labeling Facts in a Never-Ending Langue Learning systemAutomatically Labeling Facts in a Never-Ending Langue Learning system
Automatically Labeling Facts in a Never-Ending Langue Learning system
 
Predicting Gene Loss in Plants: Lessons Learned From Laptop-Scale Data
Predicting Gene Loss in Plants: Lessons Learned From Laptop-Scale DataPredicting Gene Loss in Plants: Lessons Learned From Laptop-Scale Data
Predicting Gene Loss in Plants: Lessons Learned From Laptop-Scale Data
 
Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0
 
Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0Machine creativity TED Talk 2.0
Machine creativity TED Talk 2.0
 
Understanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking GeneralizationUnderstanding Deep Learning Requires Rethinking Generalization
Understanding Deep Learning Requires Rethinking Generalization
 

More from MLReview

Bayesian Non-parametric Models for Data Science using PyMC
 Bayesian Non-parametric Models for Data Science using PyMC Bayesian Non-parametric Models for Data Science using PyMC
Bayesian Non-parametric Models for Data Science using PyMC
MLReview
 
Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
  Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...  Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
MLReview
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
MLReview
 
PixelGAN Autoencoders
  PixelGAN Autoencoders  PixelGAN Autoencoders
PixelGAN Autoencoders
MLReview
 
Representing and comparing probabilities: Part 2
Representing and comparing probabilities: Part 2Representing and comparing probabilities: Part 2
Representing and comparing probabilities: Part 2
MLReview
 
Representing and comparing probabilities
Representing and comparing probabilitiesRepresenting and comparing probabilities
Representing and comparing probabilities
MLReview
 
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
MLReview
 
Theoretical Neuroscience and Deep Learning Theory
Theoretical Neuroscience and Deep Learning TheoryTheoretical Neuroscience and Deep Learning Theory
Theoretical Neuroscience and Deep Learning Theory
MLReview
 
2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems
MLReview
 
Deep Learning for Semantic Composition
Deep Learning for Semantic CompositionDeep Learning for Semantic Composition
Deep Learning for Semantic Composition
MLReview
 
Tutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial NetworksTutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial Networks
MLReview
 
Real-time Edge-aware Image Processing with the Bilateral Grid
Real-time Edge-aware Image Processing with the Bilateral GridReal-time Edge-aware Image Processing with the Bilateral Grid
Real-time Edge-aware Image Processing with the Bilateral Grid
MLReview
 
Yoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and WhitherYoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and Whither
MLReview
 

More from MLReview (13)

Bayesian Non-parametric Models for Data Science using PyMC
 Bayesian Non-parametric Models for Data Science using PyMC Bayesian Non-parametric Models for Data Science using PyMC
Bayesian Non-parametric Models for Data Science using PyMC
 
Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
  Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...  Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
Machine Learning and Counterfactual Reasoning for "Personalized" Decision- ...
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
 
PixelGAN Autoencoders
  PixelGAN Autoencoders  PixelGAN Autoencoders
PixelGAN Autoencoders
 
Representing and comparing probabilities: Part 2
Representing and comparing probabilities: Part 2Representing and comparing probabilities: Part 2
Representing and comparing probabilities: Part 2
 
Representing and comparing probabilities
Representing and comparing probabilitiesRepresenting and comparing probabilities
Representing and comparing probabilities
 
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
 
Theoretical Neuroscience and Deep Learning Theory
Theoretical Neuroscience and Deep Learning TheoryTheoretical Neuroscience and Deep Learning Theory
Theoretical Neuroscience and Deep Learning Theory
 
2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems2017 Tutorial - Deep Learning for Dialogue Systems
2017 Tutorial - Deep Learning for Dialogue Systems
 
Deep Learning for Semantic Composition
Deep Learning for Semantic CompositionDeep Learning for Semantic Composition
Deep Learning for Semantic Composition
 
Tutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial NetworksTutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial Networks
 
Real-time Edge-aware Image Processing with the Bilateral Grid
Real-time Edge-aware Image Processing with the Bilateral GridReal-time Edge-aware Image Processing with the Bilateral Grid
Real-time Edge-aware Image Processing with the Bilateral Grid
 
Yoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and WhitherYoav Goldberg: Word Embeddings What, How and Whither
Yoav Goldberg: Word Embeddings What, How and Whither
 

Recently uploaded

Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
frank0071
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
sanjana502982
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 

Recently uploaded (20)

Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 

Near human performance in question answering?

  • 1. Near human performance in question answering? Yoav Goldberg Bar Ilan University
  • 2.
  • 3. Near Human Performance on the SQUAD Dataset • "Neural systems achieve near-human performance on Question Answering". • Not really, let's see why.
  • 4. Restricted QA Setup • Restricted to questions that can be answered by span selection. • Need to find the answer in a given paragraph. • The answer is guaranteed to be in the paragraph. • Annotators see the paragraph when asking the question, resulting in high lexical similarity between question and answer.
  • 5. Near Human Performance? • Human performance: 91.2 F1.
 Current best system: 84.0 F1. • Humans on MTurk. Who are instructed to answer 5 answers in 2 minutes. That's 16 cent per question. • Humans are wrong mostly in span boundaries. • Also, when doing max-vote between several humans, human perf goes up substantially.
  • 6. How hard is the dataset? • Do the questions require complex reasoning, or can they be "cheated" using superficial clues?
  • 7.
  • 8. The Rankie cycle is sometimes referred to as _________ words from questions some verb only NP in sent
  • 9. ______ have powers of ..... and veto ..... words from questionNoun Phrase
  • 10. ______ have powers of ..... and veto ..... words from questionNoun Phrase all of this can be ignored...
  • 11. ... Shakespeare scholar _____ What Shakespeare scholar is ....?
  • 13. what is goal of criminal punishment is goal of criminal punishment_____
  • 14. all can be "cheated away" using some smart template matching.
  • 15. all can be "cheated away" using some smart template matching. and the template-matching systems can be easily fooled by tailored examples that don't fool humans (Percy Liang, personal communication)
  • 16. all can be "cheated away" using some smart template matching. and the template-matching systems can be easily fooled by tailored examples that don't fool humans (Percy Liang, personal communication)Update: paper by Jia and Liang demonstrates this: https://arxiv.org/abs/1707.07328
  • 17. To Summarize • DL methods gets near human performance on SQUAD but: • Still 84 F1 vs. 91.2 F1. • Restricted QA Setting (span selection, within paragraph, answer always present, high lexical overlap). • Compared to under-incentivized humans. • (91.2 is a low estimate of human performance) • Questions can be answered with "cheating". • (84.0 is a high estimate of DL performance)
  • 18. Take away • Neural systems / RNNs / ConvNets do very clever pattern matching. Not "intelligence", not "reasoning". • Not everything can be pattern-matched. • Pattern matchers can be easily fooled.