Recent Benchmarks and Tasks for Natural Language Inference
Based on a survey paper covering benchmarks, resources, and approaches in NLI.*
* Shane Storks, Qiaozi Gao, and Joyce Y. Chai. (2020). Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches.
Natural Language Inference
Natural language inference requires a deep understanding of language: the
ability to draw inferences using reasoning and external knowledge about the
world (e.g., commonsense knowledge).
● For humans this ability comes naturally, but for machines it remains very challenging.
● Humans draw on a great deal of external context and knowledge, beyond what is
written in the text, to reach conclusions; machines lack this external
knowledge.
● Recent years have seen a surge of research in this area as large-scale
datasets and benchmarks for NLP tasks have grown in number.
Role of Benchmarks
● Provide training, validation, and test datasets, along with an evaluation
framework for NLP tasks.
● Help evaluate and compare different approaches to a task and facilitate the
development of new approaches.
● Early benchmarks focused on tasks that could be solved using the linguistic
context alone.
● Recent efforts focus on benchmarks that require a deeper understanding of the
context, going beyond the linguistic context itself.
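The evaluation role described above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code of any particular benchmark: the `accuracy` function, the toy `test_split`, and both baseline systems are hypothetical and invented for this example.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, str]  # (input text, gold label)


def accuracy(predict: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Return the fraction of examples whose prediction matches the gold label."""
    if not examples:
        raise ValueError("examples must be non-empty")
    correct = sum(1 for text, gold in examples if predict(text) == gold)
    return correct / len(examples)


# Hypothetical held-out test split (illustrative data, not from a real benchmark).
test_split = [
    ("The cup fell off the table. Did it hit the floor?", "yes"),
    ("The lamp is switched off. Is the room lit by it?", "no"),
    ("She opened her umbrella. Is it likely raining?", "yes"),
    ("He has never left Europe. Has he visited Japan?", "no"),
]


def always_yes(text: str) -> str:
    """Trivial baseline that ignores its input."""
    return "yes"


def never_keyword(text: str) -> str:
    """Surface heuristic: answer "no" only when the word "never" appears."""
    return "no" if "never" in text.lower() else "yes"


# Scoring both systems on the same split makes them directly comparable.
scores = {
    "always_yes": accuracy(always_yes, test_split),
    "never_keyword": accuracy(never_keyword, test_split),
}
```

Because both systems are scored on the same fixed split with the same metric, their results can be compared directly, which is the comparison a shared benchmark is meant to enable.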
NLP Tasks
Reference Resolution
● Involves identifying the entity being referred to in a linguistic context
(for example, determining what a pronoun in a given sentence refers to).
● The task becomes challenging when a sentence contains multiple candidate
entities; resolving the resulting ambiguity can require external commonsense
knowledge.
● A common example of this type of benchmark is the Winograd Schema Challenge
(examples shown on the next slide).
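One way to picture a reference resolution item is as a small record holding the sentence, the ambiguous pronoun, the candidate entities, and the correct referent. The sketch below is an assumption about how such data might be represented, not the official format of the Winograd Schema Challenge; the class and function names are hypothetical, and the sentence pair paraphrases the well-known trophy and suitcase example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class WinogradInstance:
    sentence: str
    pronoun: str
    candidates: Tuple[str, str]
    answer: str

    def __post_init__(self) -> None:
        # The gold referent must be one of the two candidates.
        if self.answer not in self.candidates:
            raise ValueError("answer must be one of the candidates")


# A schema pair: changing one word ("big" / "small") flips the correct referent.
pair = [
    WinogradInstance(
        "The trophy doesn't fit in the brown suitcase because it is too big.",
        "it",
        ("the trophy", "the suitcase"),
        "the trophy",
    ),
    WinogradInstance(
        "The trophy doesn't fit in the brown suitcase because it is too small.",
        "it",
        ("the trophy", "the suitcase"),
        "the suitcase",
    ),
]


def first_mention(instance: WinogradInstance) -> str:
    """Surface heuristic: always pick the first candidate."""
    return instance.candidates[0]


def score(resolver: Callable[[WinogradInstance], str],
          instances: Sequence[WinogradInstance]) -> float:
    """Fraction of instances resolved to the gold referent."""
    hits = sum(1 for inst in instances if resolver(inst) == inst.answer)
    return hits / len(instances)


baseline_score = score(first_mention, pair)
```

A heuristic that ignores meaning gets exactly one half of the pair right, which illustrates why paired items of this kind call for commonsense knowledge and not just surface cues.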
NLP Tasks (Reference Resolution Example)
Example of reference resolution; answers in bold. Image credit: taken from the research paper cited on the first
slide.
NLP Tasks
Question Answering
● Involves answering questions about a passage to demonstrate
comprehension of that passage (also known as Machine Reading
Comprehension).
● Requires a mix of language processing and reasoning abilities.
● Some benchmarks pose questions over a given corpus and require
reasoning across multiple sentences, which makes the task
non-trivial.
● Benchmarks that require external knowledge include MCScript, CoQA, and
OpenBookQA.
NLP Tasks (QA Benchmark Example)
Example of a QA task; answers in bold. Image credit: taken from the research paper cited on the first slide.
NLP Tasks
Textual Entailment
● To entail here means to imply something as a logical consequence. In
this task, a text and a hypothesis are given, and the system must
determine whether the text entails the hypothesis.
● The task requires commonsense knowledge along with simpler language
processing skills.
● Many benchmarks extend the task by also requiring the system to recognise
contradictions.
● Example benchmarks for this task are the RTE Challenges.
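The difference between the basic task (does the text entail the hypothesis?) and the extended task (which also recognises contradictions) can be sketched as two label schemes. Everything below is illustrative: the `Label` names, the `EntailmentPair` record, and the three text and hypothesis pairs are assumptions made for this example and are not taken from the RTE data.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    ENTAILMENT = "entailment"        # the text implies the hypothesis
    CONTRADICTION = "contradiction"  # the text rules the hypothesis out
    UNKNOWN = "unknown"              # the text neither implies nor rules it out


@dataclass(frozen=True)
class EntailmentPair:
    text: str
    hypothesis: str
    label: Label


def to_two_way(label: Label) -> bool:
    """Collapse the three-way scheme to the basic yes/no entailment decision."""
    return label is Label.ENTAILMENT


# Hypothetical pairs sharing one text, each with a different relation.
text = "A man is playing a guitar on stage."
pairs = [
    EntailmentPair(text, "A man is playing an instrument.", Label.ENTAILMENT),
    EntailmentPair(text, "Nobody is on the stage.", Label.CONTRADICTION),
    EntailmentPair(text, "The man is a famous musician.", Label.UNKNOWN),
]

two_way = [to_two_way(p.label) for p in pairs]
```

Under the two-way scheme the contradicted and the undetermined hypotheses receive the same answer, so the extended scheme asks the system for a strictly finer distinction.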
NLP Tasks (Textual Entailment Benchmark Example)
Example of an RTE task; answers in bold. Image credit: taken from the research paper cited on the first slide.
NLP Tasks
Plausible Inference
● Plausible inference requires the system to infer hypothetical, intermediate, or
uncertain conclusions from the limited context given in the text.
● The task requires reasoning over both linguistic context and external knowledge.
NLP Tasks (Plausible Inference Benchmark Example)
Example of a plausible inference task; answers in bold. Image credit: taken from the research paper cited on the
first slide.
NLP Tasks
Intuitive Psychology
● This task is similar to plausible inference, but specific to the domain of
intuitive psychology.
● The task involves inferring emotions and intentions from the
behaviour described in the text.
● Solving the task requires intuitive social and psychological
commonsense knowledge.
NLP Tasks (Intuitive Psychology Benchmark Example)
Example of an intuitive psychology task; answers in bold. Image credit: taken from the research paper cited on the
first slide.
