The document surveys recent benchmarks and tasks for natural language inference. It summarizes key benchmarks: the Winograd Schema Challenge for reference resolution, MCScript and CoQA for question answering that requires external knowledge, the RTE Challenges for textual entailment, and benchmarks for plausible inference and intuitive psychology that require commonsense reasoning. Recent efforts aim to create benchmarks that require deeper understanding than the linguistic context alone can provide.