Beyond Fact Checking —
Modelling Information Change in
Scientific Communication
Isabelle Augenstein*
AAAI
11 February 2023
*credit for some slides: Dustin Wright
Scientists → Journalists → The Public
How science is communicated matters
News headline: “HIV Vaccine may raise risk” → reader: “I can still do that”
News claim: “Scientists have found that HIV vaccine has many side effects!” → reader: “Never!”
This affects trust in science and future actions.
Kuru et al. (2019); Gustafson and Rice (2019); Fischhoff (2012); Morton (2010). https://www.nature.com/articles/450325a
The science communication process
Scientists → Journalists → The Public
The public relies on journalists to learn scientific findings
The public perception of science is largely shaped by how journalists present science, rather than by the science itself.
… despite seeing substantial issues with how science is reported
The lack of domain-specific scientific knowledge makes it difficult to critically evaluate science news coverage.
Skewed reporting of science undermines trust in science
Hyped-up, polarised news articles (“caffeine causes cancer” / “coffee cures cancer”) lead to uncertainty and erosion of trust in scientists.
Schoenfeld and Ioannidis: “Is everything we eat associated with cancer? A systematic cookbook review.” American Journal of Clinical Nutrition, 2013. https://pubmed.ncbi.nlm.nih.gov/23193004/
https://www.vox.com/science-and-health/2019/6/11/18652225/hype-science-press-releases
It’s easy for the message to change
Paper (Fang et al., 2016): “Increasing dietary magnesium intake is associated with a reduced risk of stroke, heart failure, diabetes, and all-cause mortality.”
News (Reuters, 2016): “The study findings suggest that increased consumption of magnesium-rich foods may have health benefits.”
Twitter: “#Magnesium saves lives”
The message isn’t necessarily false, but it can be misleading and inaccurate and lead to behavior change.
Modelling Information Change – Automatic Fact Checking
The standard pipeline, illustrated on the claim “Magnesium saves lives”:
1. Claim Check-Worthiness Detection: “Magnesium saves lives” → check-worthy / not check-worthy
2. Evidence Document Retrieval and Ranking: retrieve and rank evidence for “Magnesium saves lives”
3. Recognising Textual Entailment: <“Magnesium saves lives”, “Increasing dietary magnesium intake is associated with a reduced risk of stroke, heart failure, diabetes, and all-cause mortality”> → positive / negative / neutral
4. Veracity Prediction: true / false / not enough info
A schematic sketch of this pipeline follows below.
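To make the four stages concrete, here is a schematic Python sketch. Every component is a toy stand-in (a length heuristic, word-overlap retrieval, a dummy entailment step), not the models used in real fact-checking systems.

```python
# Schematic sketch of the four-stage fact-checking pipeline above.
# All components are toy placeholders, not real models.

def is_check_worthy(claim: str) -> bool:
    # Stage 1: claim check-worthiness detection (toy length heuristic)
    return len(claim.split()) >= 3

def rank_evidence(claim: str, corpus: list[str], k: int = 3) -> list[str]:
    # Stage 2: evidence retrieval and ranking, here by naive word overlap
    words = set(claim.lower().split())
    return sorted(corpus,
                  key=lambda s: len(words & set(s.lower().split())),
                  reverse=True)[:k]

def stance(claim: str, evidence: str) -> str:
    # Stage 3: recognising textual entailment; a real system would run an
    # NLI model returning positive / negative / neutral
    return "neutral"

def verdict(stances: list[str]) -> str:
    # Stage 4: veracity prediction by aggregating per-evidence stances
    pos, neg = stances.count("positive"), stances.count("negative")
    if pos > neg:
        return "true"
    if neg > pos:
        return "false"
    return "not enough info"

claim = "Magnesium saves lives"
corpus = ["Increasing dietary magnesium intake is associated with a reduced "
          "risk of stroke, heart failure, diabetes, and all-cause mortality."]
if is_check_worthy(claim):
    print(verdict([stance(claim, e) for e in rank_evidence(claim, corpus)]))
```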
Evidence Ranking for Automatic Fact Checking
Evidence Document Retrieval and Ranking: <“Magnesium saves lives”, “The study findings suggest that increasing dietary magnesium intake is associated with a reduced risk of stroke, heart failure, diabetes, and all-cause mortality”>
● Notion of similarity matters
○ Strict textual similarity (most prior work)
○ Similarity of information content (proposed here)
● Domain differences increase task difficulty
○ Measure similarity between <claim, evidence> from <news, news> (most prior work)
○ Measure similarity between <claim, evidence> from <news, press release/twitter> (proposed here)
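As a concrete illustration of ranking claim–evidence pairs by similarity, here is a minimal sketch using Sentence-BERT; the all-MiniLM-L6-v2 checkpoint and the candidate sentences are illustrative choices, not necessarily those used in this work.

```python
# Minimal evidence-ranking sketch with Sentence-BERT
# (Reimers and Gurevych, 2019).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint

claim = "Magnesium saves lives"
candidates = [
    "The study findings suggest that increasing dietary magnesium intake is "
    "associated with a reduced risk of stroke, heart failure, diabetes, and "
    "all-cause mortality.",
    "Magnesium is the twelfth element of the periodic table.",
]

claim_emb = model.encode(claim, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(claim_emb, cand_embs)[0]  # cosine similarity per pair

# Rank candidate evidence sentences by similarity to the claim
for score, sent in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{score:.2f}  {sent[:60]}...")
```

Note that an off-the-shelf STS-style model like this measures strict textual similarity; the point of the work described next is to instead measure similarity of information content.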
Overview of Today’s Talk
● Introduction
○ The Life Cycle of Science Communication
● Part 1: Exaggeration Detection
○ Measuring differences in stated causal relationships
○ Experiments with health science press releases
● Part 2: Modelling Information Change
○ Modelling information change in communicating scientific findings more broadly
○ Experiments with press releases and tweets in different scientific domains
● Outlook and Conclusion
○ Future research challenges
Exaggeration Detection of Science Press Releases
Paper (Fang et al., 2016): “Increasing dietary magnesium intake is associated with a reduced risk of stroke, heart failure, diabetes, and all-cause mortality.”
News (Reuters, 2016): “The study findings suggest that increased consumption of magnesium-rich foods may have health benefits.”
Problem: the strength of the claim changes from a correlational statement in the paper (“associated with”) to conditionally causal in the news (“suggest”, “may”).
Exaggeration in Science Journalism
Sumner et al. (2014) and Bratton et al. (2019): InSciOut
Objective: to identify the source (press releases or news) of distortions, exaggerations, or changes to the main conclusions drawn from research that could potentially influence a reader’s health-related behaviour.
Conclusions:
• 33% of press releases contain exaggerations of the conclusions of scientific papers
• Exaggeration in news is strongly associated with exaggeration in press releases
Sumner, P., Vivian-Griffiths, S., Boivin, J., Williams, A., Venetis, C. A., Davies, A., ... & Chambers, C. D. (2014). The association between exaggeration in health related science news and academic press releases: retrospective observational study. BMJ, 349.
Bratton, L., Adams, R. C., Challenger, A., Boivin, J., Bott, L., Chambers, C. D., & Sumner, P. (2019). The association between exaggeration in health-related science news and academic press releases: a replication study. Wellcome Open Research, 4.
Modelling Information Change – Causal Claim Strength Prediction

Label  Type                Language Cues
0      No Relation         —
1      Correlational       association, associated with, predictor, at high risk of
2      Conditional causal  increase, decrease, lead to, effect on, contribute to, result in (cues indicating doubt: may, might, appear to, probably)
3      Direct causal       increase, decrease, lead to, effective on, contribute to, reduce, can

Li et al. “An NLP Analysis of Exaggerated Claims in Science News.” NLPmJ@EMNLP, 2017.
Yu et al. “Measuring Correlation-to-Causation Exaggeration in Press Releases.” COLING 2020.
Our Work on Exaggeration Detection in Science
● Formalize the task of scientific exaggeration detection: predicting when a press release exaggerates a scientific paper
● Curate a dataset from expert annotations to benchmark performance
○ Input: the primary finding of the paper as written in the abstract and the press release
● Investigate and develop methods for automatic scientific exaggeration detection
○ Semi-supervised method based on Pattern Exploiting Training (PET)
Wright et al. “Semi-Supervised Exaggeration Detection of Health Science Press Releases.” EMNLP 2021. https://aclanthology.org/2021.emnlp-main.845/
Task Formulations
(Claim strength labels and language cues as in the table above; Li et al., 2017)

Exaggeration detection (T1)
• Entailment-like task
• Paired (press release, abstract) data
• Label set ℒ_T1 = {0: Downplays, 1: Same, 2: Exaggerates}

Causal claim strength prediction (T2)
• Text classification task
• Unpaired press releases and abstracts
• Final prediction compares the strength of the paired press release and abstract
• Label set ℒ_T2 = {0: No Relation, 1: Correlational, 2: Conditional Causal, 3: Direct Causal}
Pattern Exploiting Training (Schick et al., 2020)
Traditional classifier C: “Eating chocolate causes happiness” → label distribution over {0, 1, 2, 3}, e.g. (0.01, 0.21, 0.15, 0.63)
PET: “Eating chocolate causes happiness. The claim strength is [MASK]” → masked language model ℳ → (0.01, 0.21, 0.15, 0.63)
● Pattern: transform the input to a cloze-style question
● Verbalizer: predict tokens from a large pretrained language model which reflect the data’s labels
● Training: one masked language model ℳ_0, ℳ_1, ℳ_2 is trained per pattern P_0, P_1, P_2; the ensemble soft-labels unlabelled data U, and a final classifier C is distilled from these soft labels using a KL-divergence loss.
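Below is a minimal sketch of the cloze reformulation at the heart of PET, using Hugging Face transformers. The pattern mirrors the slide’s example; the verbalizer words are borrowed from the claim-strength verbalisers shown later in this talk, and multi-token verbalizers are simplified to their first sub-token. Full PET additionally trains an ensemble over several patterns and distils a final classifier from soft labels, which this sketch omits.

```python
# Toy sketch of PET's cloze reformulation: recast classification as a
# masked-token prediction and read label scores off an MLM's logits.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base")
mlm.eval()

# Verbalizer: one indicative token per label (illustrative subset)
verbalizer = {"estimated": "correlational", "proven": "direct causal"}

def claim_strength(sentence: str) -> dict[str, float]:
    # Pattern: transform the input into a cloze-style question
    prompt = f"{sentence} The claim strength is {tok.mask_token}."
    enc = tok(prompt, return_tensors="pt")
    mask_pos = (enc.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]
    # Score each label by the logit of its verbalizer token
    ids = [tok(" " + w, add_special_tokens=False).input_ids[0]
           for w in verbalizer]  # first sub-token, as a simplification
    probs = torch.softmax(logits[ids], dim=0)
    return dict(zip(verbalizer.values(), probs.tolist()))

print(claim_strength("Eating chocolate causes happiness."))
```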
MT-PET
Main task pattern P_m (exaggeration detection): “Scientists claim eating chocolate sometimes causes happiness. Reporters claim eating chocolate causes happiness. The reporters claims are [MASK]” → (0.01, 0.05, 0.94)
Auxiliary task pattern P_a (claim strength): “Eating chocolate causes happiness. The claim strength is [MASK]” → (0.01, 0.21, 0.15, 0.63)
Training mirrors PET, but complementary pattern-verbalizer pairs for the main task (P_m^0, P_m^1) and the auxiliary task (P_a^0, P_a^1) are trained jointly on their respective data D_m and D_a; the resulting models ℳ_0, ℳ_1 soft-label the unlabelled main-task data U_m, from which the final classifier C is distilled with a KL-divergence loss.
MT-PET for Exaggeration Detection

Name     Pattern
P_T1^0   Scientists claim s. Reporters claim t. The reporters claims are [MASK]
P_T2^0   [Scientists|Reporters] say [s|t]. The claim strength is [MASK]
P_T1^1   Academic literature claims s. Popular media claims t. The media claims are [MASK]
P_T2^1   [Academic literature|Popular media] says [s|t]. The claim strength is [MASK]

Our tasks are T1 (exaggeration prediction) and T2 (claim strength prediction).
We develop patterns by hand and verbalizers semi-automatically using PETAL (Schick et al., 2020).
s and t are the claim text in the abstract and press release, respectively.
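As a small illustration of how these templates instantiate, here are the two pattern types as hypothetical Python functions (s and t as defined above):

```python
# Hypothetical template functions for the MT-PET patterns in the table.
def pattern_t1_0(s: str, t: str) -> str:
    # T1 (exaggeration prediction): paired input, one masked slot
    return (f"Scientists claim {s} Reporters claim {t} "
            "The reporters claims are [MASK]")

def pattern_t2_0(source: str, x: str) -> str:
    # T2 (claim strength prediction): one source at a time
    return f"{source} say {x} The claim strength is [MASK]"

print(pattern_t2_0("Reporters", "eating chocolate causes happiness."))
```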
Exaggeration Detection Verbalisers

Pattern  Label          Verbalizers
P_T1^0   Downplays      preliminary, competing, uncertainties
         Same           following, explicit
         Exaggerates    mistaken, wrong, hollow, naive, false, lies
P_T1^1   Downplays      hypothetical, theoretical, conditional
         Same           identical
         Exaggerates    mistaken, wrong, premature, fantasy, noisy, artificial
P_T2^*   No Relation    sufficient, enough, authentic, medium
         Correlational  inferred, estimated, calculated, borderline, approximately, variable, roughly
         Cond. Causal   cautious, premature, uncertain, conflicting, limited
         Causal         touted, proven, replicated, promoted, distorted
T1 (Exaggeration Detection) with MT-PET

Method      P      R      F1
Supervised  28.06  33.10  29.05
PET         41.90  39.87  39.12
MT-PET      47.80  47.99  47.35

Substantial improvements when using PET (10 points).
Further improvements with MT-PET (8 points).
Demonstrates transfer of knowledge from claim strength prediction to exaggeration prediction.
Learning Dynamics for T2 (Claim Strength Prediction)
● MT-PET with 200 samples approaches the performance of vanilla PET with 500 samples
● MT-PET with 200 samples approaches the performance of supervised learning with 4,500 samples
● PET always outperforms supervised learning
Overview of Today’s Talk
● Introduction
○ The Life Cycle of Science Communication
● Part 1: Exaggeration Detection
○ Measuring differences in stated causal relationships
○ Experiments with health science press releases
● Part 2: Modelling Information Change
○ Modelling information change in communicating scientific findings more broadly
○ Experiments with press releases and tweets in different scientific domains
Modelling Information Change in Scientific Communication
Paper (Fang et al., 2016): “Increasing dietary magnesium intake is associated with a reduced risk of stroke, heart failure, diabetes, and all-cause mortality.”
News (Reuters, 2016): “The study findings suggest that increased consumption of magnesium-rich foods may have health benefits.”
Twitter: “#Magnesium saves lives”
Problem: the message isn’t necessarily false, but it can be misleading and inaccurate and lead to behavior change.
Proposal: General Model of Information Change for SciComm
“The study findings suggest that increased consumption of magnesium-rich foods may have health benefits.” ↔ “Increasing dietary magnesium intake is associated with a reduced risk of stroke, heart failure, diabetes, and all-cause mortality.” → score 4.09
“In California, drone delivery of a small package would result in about 0.42 kg of greenhouse gas emissions.” ↔ (the same paper finding) → score 1.14
Wright et al. “Modeling Information Change in Science Communication with Semantically Matched Paraphrases.” EMNLP 2022. https://aclanthology.org/2022.emnlp-main.117/
Information Matching Score (IMS)
A 5-point measure of the similarity of the scientific findings described by two sentences: 1 = completely different, 5 = completely the same. For matched findings, scores between 2 and 4 grade the degree of change, from substantial change up to no change.
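The evaluation protocol described in the talk notes converts a model’s cosine similarity to this scale by clipping negative values to 0 and rescaling linearly to [1, 5]; a one-line sketch of that mapping, as an assumption based on those notes:

```python
# Hedged sketch: map a cosine similarity onto the 1-5 IMS scale by
# clipping negatives to 0 and rescaling linearly (per the talk notes).
def cosine_to_ims(cos_sim: float) -> float:
    return 1.0 + 4.0 * max(cos_sim, 0.0)

assert cosine_to_ims(1.0) == 5.0   # completely the same
assert cosine_to_ims(-0.2) == 1.0  # completely different
```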
Data
News + paper processing:
● Abstract parser: RoBERTa fine-tuned on PubMed abstracts (F1 > 0.9), labelling sentences as Background, Objective, Methods, Results, or Conclusion
● 17,668 / 41,388 / 733,755 source documents, yielding 45.7M potential <news, paper> pairs and 35.6M potential <tweet, paper> pairs
● Sentence-BERT (Reimers and Gurevych, 2019) scores each pair in [0, 1]
● Bucketed sample: 2,400 <news, paper> pairs and 1,200 <tweet, paper> pairs selected for annotation (see the sampling sketch below)
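A hedged sketch of the bucketed sampling step referenced above, assuming 0.05-wide similarity buckets sampled evenly; the data structures and parameters are illustrative:

```python
# Sample pairs evenly across SBERT-similarity buckets so that annotation
# covers the whole similarity range, not just the most similar pairs.
import random

def bucketed_sample(scored_pairs, per_bucket=10, width=0.05, seed=0):
    # scored_pairs: iterable of (pair, sbert_similarity in [0, 1])
    buckets: dict[int, list] = {}
    for pair, sim in scored_pairs:
        buckets.setdefault(int(min(sim, 0.9999) / width), []).append(pair)
    rng = random.Random(seed)
    sample = []
    for bucket in buckets.values():
        sample.extend(rng.sample(bucket, min(per_bucket, len(bucket))))
    return sample
```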
Annotation
● Domains: Computer Science, Medicine, Biology, Psychology
● 2,400 <news, paper> pairs and 1,200 <tweet, paper> pairs for annotation
● 🥔 POTATO annotation UI
● Domain expert annotators
Semantic Paraphrase and Information Change Dataset (SPICED)
● Domains: Computer Science, Medicine, Biology, Psychology
● 2,400 <news, paper> pairs and 1,200 <tweet, paper> pairs, expert-annotated
● Supplemented with 1,200 <news, paper> and 1,200 <tweet, paper> easy matched and unmatched pairs, selected based on similarity
● Total: 3,600 annotated pairs and 2,400 easy pairs
SPICED vs. Semantic Textual Similarity
News: “Beckley, who is in the department of psychology and neuroscience at Duke, said that the adult-onset group had a history of anti-social behavior back to childhood, but reported committing relatively fewer crimes.”
Paper: “Our results showed that most of the adult onset men began their antisocial activities during early childhood.”
STS: 0.38 (max 1) vs. 🌶 SPICED: 4.4 (max 5)
SPICED vs. other sentence matching tasks
Average normalised edit distance across matching sentence pairs $(s, t)$ in the training set: $\frac{1}{|D|} \sum_{(s,t) \in D} \frac{d(s,t)}{\max(|s|, |t|)}$, where $d$ is the edit distance. Matching pairs in SPICED exhibit much greater lexical change than in STS or NLI.
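For completeness, a small self-contained implementation of the normalised edit distance assumed above (Levenshtein distance normalised by the longer string’s length):

```python
def levenshtein(s: str, t: str) -> int:
    # Classic dynamic-programming edit distance over characters
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]

def normalised_edit_distance(s: str, t: str) -> float:
    return levenshtein(s, t) / max(len(s), len(t), 1)

print(normalised_edit_distance("magnesium saves lives",
                               "dietary magnesium reduces mortality"))
```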
Benchmarking
⛄ Zero-Shot Transfer: Paraphrase Detection, NLI, MiniLM, MPNet
🌶 Fine-Tuning on SPICED: RoBERTa, SciBERT, CiteBERT, MiniLM-FT, MPNet-FT
Benchmarking: ⛄ Zero-Shot Transfer
● Paraphrase Detection: RoBERTa fine-tuned on adversarial paraphrases (Nighojkar and Licato, 2021)
● NLI: RoBERTa fine-tuned on SNLI, MNLI, FEVER, and ANLI
● MiniLM: Sentence-BERT based on MiniLM (Wang et al., 2020)
● MPNet: SBERT based on MPNet (Song et al., 2020)
Both SBERT models are pre-trained on a corpus of >1B sentence pairs using contrastive learning.
Benchmarking: 🌶 Fine-Tuning on SPICED
● RoBERTa, SciBERT, CiteBERT: BERT-style models pretrained on general-domain text (RoBERTa) and scientific text (SciBERT, CiteBERT), fine-tuned on SPICED by minimizing the mean squared error between the model’s prediction and the ground-truth IMS
● MiniLM-FT, MPNet-FT: SBERT models fine-tuned on SPICED by minimizing the cosine distance between the model’s prediction and the ground-truth IMS
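A minimal sketch of the SBERT fine-tuning setup, using the sentence-transformers library; the base checkpoint, the toy data, and the rescaling of the 1–5 IMS into the [0, 1] range expected by CosineSimilarityLoss are all illustrative assumptions:

```python
# Fine-tune an SBERT model so that the cosine similarity of the two
# finding embeddings matches the (rescaled) ground-truth IMS.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint

spiced_rows = [  # (paper finding, reported finding, IMS) - toy examples
    ("Dietary magnesium intake is associated with reduced stroke risk.",
     "#Magnesium saves lives", 2.0),
    ("Dietary magnesium intake is associated with reduced stroke risk.",
     "Magnesium-rich diets are linked to lower stroke risk.", 5.0),
]
train = [InputExample(texts=[s, t], label=(ims - 1.0) / 4.0)  # IMS -> [0, 1]
         for s, t, ims in spiced_rows]

loader = DataLoader(train, batch_size=16, shuffle=True)
loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```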
Results (Pearson correlation, reported overall and separately for news and tweets)
● Paraphrase/NLI models perform poorly
● Best overall is SBERT + fine-tuning
● Tweets are harder than news
● Potentially much room for improvement (STS tasks see scores in the 90s)
Zero-shot scientific evidence retrieval
Does training on 🌶 SPICED improve performance on scientific evidence retrieval for real-world claims?
● CoVERT: 300 claims from Twitter matched with 717 evidence sentences from news articles
● COVID-Fact: 4,086 claims from Reddit matched with 3,219 unique evidence sentences from news articles
Training on SPICED, testing on CoVERT and COVID-Fact.
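For reference, a small sketch of one of the reported metrics: mean reciprocal rank over per-claim ranked evidence lists (MAP is computed analogously over all gold sentences). The inputs here are toy placeholders.

```python
def mean_reciprocal_rank(ranked: list[list[str]],
                         gold: list[set[str]]) -> float:
    # ranked[i]: evidence sentences for claim i, ordered by model score
    # gold[i]:   the gold evidence sentences for claim i
    total = 0.0
    for docs, gold_set in zip(ranked, gold):
        rank = next((r for r, d in enumerate(docs, 1) if d in gold_set), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked)

print(mean_reciprocal_rank([["e2", "e1"]], [{"e1"}]))  # 0.5
```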
Results (MAP / MRR; ± values are standard deviations across runs)

                 CoVERT                    COVID-Fact
Method     MAP          MRR          MAP          MRR
BM25       12.45±0.00   20.78±0.00   35.18±0.00   52.98±0.00
MiniLM     26.84±0.00   37.98±0.00   50.11±0.00   64.78±0.00
 + FT      28.23±0.08   40.81±0.16   52.66±0.10   66.91±0.09
MPNet      25.21±0.00   35.54±0.00   52.39±0.00   66.21±0.00
 + FT      26.84±0.19   37.65±0.32   53.61±0.33   67.46±0.28

On both datasets, the best performance is achieved by an SBERT model fine-tuned on SPICED, and fine-tuning yields gains for both SBERT variants.
RQ1: Do findings reported by different types of outlets express different degrees of information change from their respective papers?
(This question targets the journalist stage of the communication process.)
RQ1: Do findings reported by different types of outlets express different degrees of information change from their respective papers?
Linear mixed-effects regression model over 1.1M matched <news, paper> pairs (a sketch follows below):
● IV: outlet type – Press Release, Sci&Tech, General outlet
● DV: information matching score
● Controls – fixed effect: subjects; random effect: paper
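A hedged sketch of this kind of analysis using statsmodels on synthetic data; the column names and toy data generation are illustrative, not the study’s actual variables:

```python
# Linear mixed-effects model: fixed effects for outlet type and subject,
# random intercept per paper, fitted on synthetic IMS scores.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "outlet": rng.choice(["press_release", "sci_tech", "general"], n),
    "subject": rng.choice(["medicine", "biology", "psychology", "cs"], n),
    "paper": rng.choice([f"p{i}" for i in range(40)], n),
})
df["ims"] = 3.5 + rng.normal(0.0, 0.5, n)  # synthetic scores

model = smf.mixedlm("ims ~ C(outlet) + C(subject)", df, groups=df["paper"])
print(model.fit().summary())
```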
RQ1: Do findings reported by different types of outlets express different degrees of information change from their respective papers?
YES: scientific findings covered by press releases and sci/tech outlets generally show less information change than findings presented in general outlets, consistent with audience design in journalism (Roland, 2009).
RQ2: Do different types of social media users systematically vary in information change when discussing scientific findings?
Scientists → Journalists → The Public
RQ2: Do different types of social media users systematically vary in information change when discussing scientific findings?
Linear mixed-effects regression model over 182K matched <tweet, paper> pairs:
● IVs: organizational account, account age, following, followers, verified
● DV: information matching score
● Controls – fixed effect: subjects; random effect: paper
RQ2: Do different types of social media users systematically vary in information change when discussing scientific findings?
YES
● Organizational Twitter accounts keep more of the original information from the paper’s findings
● Verified accounts and accounts with more followers change information more
RQ3: Which parts of a paper are more likely to be miscommunicated by the media?
Scientists → Journalists: the same finding can be a good translation, an overstatement, or an exaggeration, depending on the paper section it is drawn from (Abstract, Introduction, Results, ...).
Measures, computed over 1.1M matched <news, paper> pairs:
● Certainty (Pei and Jurgens, 2021): e.g. “HIV Vaccine may raise the risk of certain diseases” reported as “Scientists have found that HIV vaccine has many side effects!”
● Exaggeration (Wright and Augenstein, 2021): e.g. “Our new NLP model performs better than several human baselines” reported as “AI is conquering the world!”
Findings:
● Journalists tend to downplay the certainty and strength of findings in abstracts (Pei and Jurgens, 2021)
● Compared with findings presented in other sections, and especially in the limitations, news findings are more likely to be exaggerated and overstated
● Journalists might fail to report the limitations of scientific findings (Fischhoff, 2012)
● Only studying abstracts is not enough!
Overview of Today’s Talk
● Introduction
○ The Life Cycle of Science Communication
● Part 1: Exaggeration Detection
○ Measuring differences in stated causal relationships
○ Experiments with health science press releases
● Part 2: Modelling Information Change
○ Modelling information change in communicating scientific findings more broadly
○ Experiments with press releases and tweets in different scientific domains
● Outlook and Conclusion
○ Future research challenges
Major Takeaways
● Careful science communication is important
○ The general public relies on general news outlets for science news
○ Overhyping of science news erodes trust
○ Exaggeration of findings can lead to behaviour change (e.g. “#Magnesium saves lives” on Twitter)
Major Takeaways
● Proposal: general model of information change
○ Prior work: focus on semantic textual similarity
○ Example – News: “Beckley, who is in the department of psychology and neuroscience at Duke, said that the adult-onset group had a history of anti-social behavior back to childhood, but reported committing relatively fewer crimes.” vs. Paper: “Our results showed that most of the adult onset men began their antisocial activities during early childhood.” – STS: 0.38 (max 1); 🌶 SPICED: 4.4 (max 5)
Major Takeaways
● New task definition, datasets and benchmarking for modelling information change in science communication
○ Diverse benchmark consisting of data from four scientific domains and three textual domains (publications, press releases, tweets)
○ Poor zero-shot performance of related tasks (paraphrasing, natural language inference) demonstrates the novelty of the task
○ Downstream improvements for scientific fact checking highlight the task’s importance
Model: copenlu/spiced · Dataset: copenlu/spiced · Code: copenlu/scientific-information-change · PyPI package: pip install scientific-information-change
Major Takeaways
● Opens the door to asking new research questions about broad trends in science communication
○ E.g. RQ1: Do findings reported by different types of outlets express different degrees of information change from their respective papers? YES: findings covered by press releases and sci/tech outlets generally show less information change than findings presented in general outlets (audience design in journalism; Roland, 2009)
Future Work
● Information change prediction as auxiliary task for other downstream
scientific NLP tasks
○ E.g. Measuring selective reporting of findings in related work descriptions,
generating faithful summaries of scientific articles
● There is selective reporting of science news – what factors affect
journalists’ selection of scientific findings?
○ E.g. societal relevance, economic implications, entertainment value
● Information is changed in different ways throughout the science communication process – which types of change exist and are prevalent?
○ Taxonomy of information change needed
Acknowledgements
CopeNLU
https://copenlu.github.io/
Jiaxin Pei David Jurgens
Dustin Wright
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie
Skłodowska-Curie grant agreement No 801199.
References
Dustin Wright, Isabelle Augenstein. Semi-Supervised Exaggeration Detection of Health Science Press Releases. EMNLP 2021.
Paper: https://aclanthology.org/2021.emnlp-main.845/
Code, data and models: https://github.com/copenlu/scientific-exaggeration-detection
Dustin Wright, Jiaxin Pei, David Jurgens, Isabelle Augenstein. Modeling Information Change in
Science Communication with Semantically Matched Paraphrases. EMNLP 2022.
Paper: https://aclanthology.org/2022.emnlp-main.117/
Code, data and models: http://www.copenlu.com/publication/2022_emnlp_wright/
Open positions
1 PhD student, 1 postdoc – explainable fact checking
funded by an ERC starting grant
application deadline: 1 March 2023
start date: Autumn 2023
PhD: https://jobportal.ku.dk/phd/?show=158207
Postdoc: https://jobportal.ku.dk/videnskabelige-stillinger/?show=158206
1 PhD student – fair and accountable NLP
funded by Carlsberg Semper Ardens project
application deadline: 28 February 2023
start date: Autumn 2023
PhD: https://employment.ku.dk/all-vacancies/?show=158390
Thank you! Questions?
Paper (Fang et al., 2016): “Increasing dietary magnesium intake is associated with a reduced risk of stroke, heart failure, diabetes, and all-cause mortality.”
News (Reuters, 2016): “The study findings suggest that increased consumption of magnesium-rich foods may have health benefits.”
Twitter: “#Magnesium saves lives”
Editor's Notes

  • #3 There’s strong scientific evidence that how science is communicated matters. For example, how a particular vaccine is framed in the media has an impact on vaccine uptake.
  • #4 This communication is done through process of translation from complex scientific papers written by scientists to accessible news stories written by journalists and finally to public discussions among peer groups.
  • #5 As a result, the public perception of and trust in science is largely shaped by how journalists present science instead of the science itself.
  • #6 The public consumes science via general news media, even though they see substantial issues with how science is reported. Additionally, the lack of specialised domain knowledge makes it difficult to critically evaluate science news coverage.
  • #7 Meta-analysis of common cookbook ingredients, extracted relative risk of cancer
  • #8 At the same time, it's very easy for the message of science to change, as demonstrated by this real-world example. Multiple aspects of the scientific finding described in Fang et al. are altered here, including:
  • #9 The independent variable being generalized from dietary magnesium to all magnesium
  • #10 The strength of the claim changing from a correlational statement to conditionally causal in the news and causal on Twitter
  • #11 And the diseases in the finding being generalized
  • #12 As a result, the translated claim isn’t necessarily false, but it can potentially be misleading or inaccurate and lead to behavior change. Two options: 1) aim at creating better news coverage – better communication with journalists, uphill battle but we can do our bit as scientists; 2) build tools and resources to automatically understand and analyze these changes in information in the context of science communication, in order to broadly understand and improve the science communication process.
  • #13 Now there is a relationship between modelling information change in scientific communication and automatic fact checking. As just said, a difference in veracity and information change in science communication are not exactly the same. Just because there is a difference in information content does not mean there has to be a difference in veracity. To illustrate this, let’s look at how automatic fact checking generally works.
  • #14 Let's have a closer look at the evidence ranking step. Prior work defines similarity very strictly, looking at the textual similarity, where all words in one sentence have to overlap with all words in the other sentence for them to be ranked highly. We are only concerned with the information contained in the findings and ignore all extraneous information such as “The study findings suggest...”, whereas trained STS models take that into account.
  • #17 Note: press releases are high-quality, from the universities’ websites, and EurekAlert.org
  • #20 T2 can be converted to T1 at decoding / testing time
  • #21 - The two primary components of PET are patterns and verbalizers. Patterns are cloze-style sentences which mask a single token. - Verbalizers are single tokens which capture the meaning of the task's labels in natural language, and which the model should predict to fill in the masked slots in the provided patterns. - Given a set of pattern-verbalizer pairs (PVPs), an ensemble of models is trained on a small labeled seed dataset to predict the appropriate verbalizations of the labels in the masked slots. - These models are then applied on unlabeled data, and the raw logits are combined as a weighted average to provide soft labels for the unlabeled data. - A final classifier is then trained on the soft-labeled data using a distillation loss based on KL-divergence. - M are the masked LMs resulting from using the different patterns, applied to U (the set of unlabelled data) to get soft probability distributions; the final model is a distilled version of this ensemble of masked LMs, trained using a KL-divergence loss between the predictions of the PET model and the target logits. Distillation is part of training the final classifier; the original one-hot vector is not used.
  • #22 T1 is exaggeration prediction and T2 is claim strength prediction. The main task uses P_m; P_a is the complementary task. At training time, P_a is applied to both popular science communication and scientific articles; at test time, two independent predictions can be used to infer the final label. U_m is the unlabelled main-task data.
  • #24 Logits of multiple verbalisers are averaged for the prediction of that class
  • #28 As a result, the translated claim isn’t necessarily false, but it can potentially be misleading or inaccurate and lead to behavior change. The goal of this work is to build tools and resources to automatically understand and analyze these changes in information in the context of science communication, in order to broadly understand and improve the science communication process.
  • #29 At a high level, the task is to measure the similarity of the scientific findings described by two scientific sentences. Here, a scientific finding is defined as “a statement that describes a particular research output of a scientific study, which could be a result, conclusion, product, or other research output.” We wish to build models which predict a scalar value for this information similarity, in order to determine which findings are matching and the degree to which the information in matched findings changes.
  • #30 This led us to define the information matching score, or IMS, which is a 5-point measure of the similarity of the scientific findings described by two sentences.
  • #31 We start by matching scientific papers with news articles and Tweets using Altmetric, an aggregator of mentions of scientific articles online. From this pool of data, we extract potential finding sentences from the scientific papers and news articles automatically and take Tweets as is, pairing all extracted sentences between papers and news, and papers and tweets. This yields an unlabelled pool of 45.7M potential (news, paper) pairs and 35.6M (tweet, paper) pairs. To limit our set of data for annotation, we pre-select potential matches using a SentenceBERT model trained on over 1B sentence pairs. To get a range of potentially highly similar and highly dissimilar pairs, we do a bucketed sample based on the similarity predicted by SBERT, sampling evenly from the unlabelled data in 0.05-increment buckets. We thus select a final set of 2,400 (news, paper) pairs and 1,200 (tweet, paper) pairs for annotation, distributed evenly between four scientific fields: medicine, biology, psychology, and computer science.
  • #32 To annotate this data, we use the POTATO annotation interface developed by David's lab at the University of Michigan, and recruit domain experts via the Prolific platform.
  • #33 Finally, after acquiring 5 expert annotations for each pair and filtering for annotator competence using MACE, we supplement the training split of the data with 1,200 very highly similar and very dissimilar pairs as predicted by SBERT, yielding a final dataset of 3,600 annotated pairs and 2,400 supplemented easy pairs. We call the dataset "SPICED".
  • #34 To contrast this task with STS, we can see stark differences in the prediction of SBERT for examples in SPICED which have very high similarity.
  • #35 Additionally, we see much greater lexical changes between matching pairs in SPICED and other tasks such as STS and NLI
  • #36 We look at multiple settings and models, including Zero-shot transfer to SPICED as well as Fine-tuning on SPICED
  • #37 For zero-shot transfer, we experiment with models pretrained on paraphrase detection and natural language inference, as well as SBERT models pretrained on over 1B sentences, including scientific text. MiniLM is a BERT model trained by distilling multiple BERTs into one; MPNet is a language model trained using permuted language modeling. SBERT uses siamese BERT networks to obtain sentence embeddings for pairs of sentences, trained to minimize the distance between these two embeddings.
  • #38 For fine-tuning, we look at both vanilla BERT family models for both general and scientific text as well as the same SBERT models in the zero-shot transfer setting
  • #39 In terms of Pearson correlation on a held-out test set, we see that SBERT does well for zero-shot transfer; the best performance is achieved by SBERT fine-tuned on SPICED; tweets are much harder than news; and there is potentially much room for improvement, considering how well the same models perform on other similarity tasks.
  • #40 The first application I'll talk about is zero-shot evidence retrieval for real-world scientific claims. Here, the setting is SBERT both with and without fine-tuning on SPICED and with no fine-tuning on the downstream datasets. We look at two datasets: CoVERT, which is sourced from Twitter with evidence from the news, and COVID-Fact, which is sourced from Reddit, also with evidence from the news. CoVERT [163] is a dataset of scientific claims sourced from Twitter, mostly in the domain of biomedicine. We use the 300 claims and the 717 unique evidence sentences in the corpus in our experiment. COVID-Fact [206] is a semi-automatically curated dataset of claims related to COVID-19 sourced from Reddit. The corpus contains 4,086 claims with 3,219 unique evidence sentences.
  • #41 Starting with CoVERT, we see that in terms of mean average precision and mean reciprocal rank for retrieving ground-truth evidence, the best performing model by a large margin is an SBERT model fine-tuned on SPICED. MiniLM: SBERT with MiniLM (a BERT model trained by distilling multiple BERTs into one) as the base network [244]; SBERT uses siamese BERT networks to obtain sentence embeddings for pairs of sentences, trained to minimize the distance between these two embeddings. We obtain sentence embeddings for pairs of findings and measure the cosine similarity between these two embeddings, clip the lowest score to 0, and convert this score to the range [1,5]. Note that this model was trained on over 1B sentence pairs, including from scientific text, using a contrastive learning approach where the embeddings of sentences known to be similar are trained to be closer than the embeddings of negatively sampled sentences. SBERT models represent a very strong baseline on this task, and have been used in the context of other matching tasks for fact checking, including detecting previously fact-checked claims [216]. MPNet: the same setting and training data as MiniLM but with MPNet as the base network [220]. MiniLM-FT: the same MiniLM model from the zero-shot transfer setup but further fine-tuned on SPICED; the training objective is to minimize the distance between the IMS and the cosine similarity of the output embeddings of the pair of findings.
  • #42 Additionally, we see gains for both SBERT models when fine-tuned
  • #43 We see a similar story on Covid-Fact, where the best performing model involves fine-tuning
  • #44 And fine-tuning yields improvements for both SBERT variants. This is encouraging, as it shows that SPICED enables transfer for two models on two different datasets which contain pairs and domains that don’t exist in SPICED, namely (Twitter, news) and (Reddit, news) pairs.
  • #45 Finally, I’ll describe large scale trends in science communication that our models have allowed us to reveal First, we ask if the type of news outlet has an effect on information change.
  • #46 For this, we build a linear mixed effect model to predict the IMS of 1.1M <news, paper> pairs which have been matched by our best performing SBERT model. We consider a pair to be matching if the predicted IMS is above 3. We include fixed effects for type of news outlet (here “Press Release”, “Science and Technology”, and “General News”) as well as the paper subject, and a random effect for each paper with >30 pairs.
  • #47 We find that the answer to this question is Yes Looking at the fixed effects, we find that Scientific findings covered by Press Release and SciTech generally have fewer informational changes compared with findings presented in General Outlets
  • #48 Next, we ask if different social media users systematically vary in information change when discussing scientific findings
  • #49 For this, we build a linear mixed effect model over 182 thousand matched (tweet, paper) pairs using our best model, and include fixed effects for different social factors
  • #50 Here we again find Yes Looking at the fixed effects, we find Organisational accounts tend to be more faithful
  • #51 Additionally, verified accounts and accounts with more followers tend to change information more.
  • #52 Finally, we ask which parts of a paper are more likely to be miscommunicated by the media
  • #53 Here, we further analyse the 1.1M matched findings from our first research question by classifying their degree of certainty and causal claim strength using models from both of our labs’ previous work
  • #54 We find that journalists tend to downplay the certainty and strength of findings in the abstracts, mirroring previous findings
  • #55 And that findings as presented in the limitations are more likely to be exaggerated and overstated
  • #56 This could be explained by known problems in the reporting of limitations by science journalists
  • #57 Ultimately, our finding also reveals that studying abstracts alone is not enough in the context of science communication
  • #67 Final message: the way science is communicated affects behaviour. Be careful in your communication with journalists, and be careful what you write on Twitter