
MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims

EMNLP 2019 paper: https://arxiv.org/abs/1909.03242

====

We contribute the largest publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. It is collected from 26 fact checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists. We present an in-depth analysis of the dataset, highlighting characteristics and challenges. Further, we present results for automatic veracity prediction, both with established baselines and with a novel method for joint ranking of evidence pages and predicting veracity that outperforms all baselines. Significant performance increases are achieved by encoding evidence, and by modelling metadata.
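As a concrete illustration of working with the released data, here is a minimal Python sketch that reads a MultiFC-style tab-separated file of claims. The file name and column names are illustrative assumptions, not the official schema; consult the dataset download page below for the actual layout.

    # Minimal sketch: iterating over a MultiFC-style TSV of claims.
    # "multifc_train.tsv" and the column names below are hypothetical;
    # check the released files for the real schema.
    import csv

    def load_claims(path):
        """Yield one dict per claim from a tab-separated claims file."""
        with open(path, encoding="utf-8") as f:
            reader = csv.DictReader(
                f,
                delimiter="\t",
                fieldnames=["claim_id", "claim", "label", "domain"],  # assumed columns
            )
            for row in reader:
                yield row

    if __name__ == "__main__":
        for claim in load_claims("multifc_train.tsv"):  # hypothetical file name
            print(claim["domain"], claim["label"], claim["claim"][:80])
            break

Because each claim is labelled by the portal it came from, keeping the portal (domain) alongside the claim and label is what makes the multi-task setup below possible.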


Poster

Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, Jakob Grue Simonsen
University of Copenhagen
{augenstein | c.lioma | wang | lcl | c.hansen | chrh | simonsen}@di.ku.dk

Contributions
  • Novel fact checking dataset: the largest such dataset with naturally occurring claims, comprising 34,918 claims from 26 English fact checking portals, with rich additional metadata and 10 evidence pages per claim.
  • Joint veracity prediction and evidence ranking model: treats claims from different portals as different tasks/domains and encodes their disparate label sets with label embeddings (see the sketch after this section).

Other poster panels (figures not recoverable from this transcription): Joint Veracity Prediction & Evidence Ranking, Claims in MultiFC, Entities in Claims, Fact Checking Portals, Confusion Matrix, Overall Results.

Dataset download: https://copenlu.github.io/publication/2019_emnlp_augenstein/

Error Analysis
  • Metadata: topic tags are the most important feature; entities are the least important.
  • Correctly predicting 'true' claims is much easier than 'false' ones.
  • Most confusions happen between close labels.
  • Longer claims are harder to classify correctly.
  • High token overlap between claim and evidence goes together with high evidence ranking.
  • General topic tags frequently co-occur with incorrect predictions; more specific tags often co-occur with correct predictions.
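The label-embedding idea from the contributions can be sketched as follows: embed every label string from all portals in one shared space, encode the claim (plus evidence), and normalise scores only over the labels that the claim's own portal actually uses. This is a minimal PyTorch sketch under assumed dimensions, with a random stand-in for the claim encoder; it is not the paper's exact architecture.

    # Hedged sketch of label embeddings for disparate per-portal label sets.
    # Dimensions, the label inventory, and the claim encoding are illustrative
    # assumptions, not the model described in the paper.
    import torch
    import torch.nn as nn

    class LabelEmbeddingScorer(nn.Module):
        def __init__(self, num_labels_total, hidden_dim):
            super().__init__()
            # One embedding per label string across all portals (shared space).
            self.label_emb = nn.Embedding(num_labels_total, hidden_dim)

        def forward(self, claim_vec, candidate_label_ids):
            # claim_vec: (hidden_dim,) encoding of the claim (and its evidence)
            # candidate_label_ids: (k,) ids of the labels valid for this portal
            cand = self.label_emb(candidate_label_ids)  # (k, hidden_dim)
            scores = cand @ claim_vec                   # (k,) similarity scores
            # Distribution over this portal's own labels only
            return torch.log_softmax(scores, dim=0)

    # Toy usage: a portal with three labels drawn from a global inventory of 10.
    scorer = LabelEmbeddingScorer(num_labels_total=10, hidden_dim=16)
    claim_vec = torch.randn(16)                  # stand-in for a learned encoder
    portal_labels = torch.tensor([0, 3, 7])      # e.g. "true", "half-true", "false"
    print(scorer(claim_vec, portal_labels))

Sharing one embedding space lets similar labels from different portals (e.g. "mostly false" and "half-true") end up close together, which is one plausible reading of why most confusions in the error analysis happen between close labels.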


