Determining the Credibility of Science Communication
Isabelle Augenstein*
augenstein@di.ku.dk
@IAugenstein
http://isabelleaugenstein.github.io/
*partial slide credit: Dustin Wright
SDP Workshop
10 June 2021
Introduction
Supporting the Life Cycle of Research
26/08/2021 3
[Figure: the research life cycle and the tasks that support it]
Stages: Information Discovery, Conducting Experiments, Paper Writing, Peer Review, Research Impact Tracking
Support areas: Writing Assistance, Reviewing Support, Citation Analysis
Tasks: Information Extraction, Summarisation, Citation Prediction, Reviewer Matching, Review Score Prediction, Citation Trend Analysis
Scholarly Document Processing
• Goal: to automatically process scientific text to support scholars
• Example NLP tasks
• Extract information about scientific concepts, e.g. drugs and proteins
• Recommend relevant papers to cite
• Challenges
• Supervised learning is hard: annotation is expensive, requiring
domain experts
• Language used is diverse across fields
• Different modalities
• Meta-data also important
Press Release
BBC DailyMail
The Express
Etc...
Credibility and Veracity of Science Communication
Fact Checking
Focus on veracity
What about more subtle
forms of misinformation?
Credibility and Veracity of Science Communication
• Shortcomings of prior work
• Assumes scientific writing is credible
• Assumes claims made are supported by underlying evidence
• Example issues
• When writing a paper
• Making claims not backed up by literature
• Missing important citations
• Presenting conclusions not supported by data
• Popular science communication
• Distortion of findings
• Exaggerations
• Outright misrepresentations
Challenges Addressed In This Talk
• Cite-worthiness detection
• Detecting if a sentence should include a citation to prior work
• Useful for assistive writing of scientific papers
• Similar to claim detection in fact checking
• Exaggeration detection
• Detecting if a news article exaggerates claims made in a scientific paper
• Useful for assistive writing & quality check of press releases
• Related to veracity prediction, but a more nuanced task
Overview of Today’s Talk
• Introduction
• The Life Cycle of Scientific Research
• Part 1: Cite-Worthiness Detection
• The CiteWorth dataset
• Methods for cite-worthiness detection
• Part 2: Exaggeration Detection
• Task framing
• Semi-supervised learning for exaggeration detection
• Conclusion
• Future research challenges
CiteWorth: Cite-Worthiness Detection
for Improved Scientific Document
Understanding
Dustin Wright, Isabelle Augenstein
ACL 2021 (Findings)
Scholarly Document Processing
• Challenges
• Supervised learning is hard: annotation is expensive, requiring
domain experts
• The text is diverse across fields
• How can we improve tools for scholarly document processing
across fields?
• What training data is readily available?
[Figure: components of a paper: Abstract, Sections, Figures, Captions, Citances, Paper Field]
Citances in Machine Learning
We use the model from the original BERT paper (Devlin et al. 2019).
Cite-worthiness: Is this a citance? Yes
Recommendation: What paper should be cited? Devlin et al. (2019)
Influence: Was this an influential paper? Yes
Intent: What is the purpose of the citation? Method
Cite-Worthiness Uses
Example: We use the model from the original BERT paper (Devlin et al. 2019).
• As an auxiliary task in a multi-task setup (predicted labels: CITE-WORTHY, METHOD)
• As a first step in citation recommendation (predicted label: CITE-WORTHY)
• For assistive document editing (predicted label: CITE-WORTHY)
Cite-Worthiness Datasets
• Tend to be small and limited to only a few domains (e.g. Computer Science)
• No attention is paid to how clean the data is, e.g. removing a citation can leave an ungrammatical phrase
Example: We use the model from Devlin et al. (2019) as a baseline.
CiteWorth: Dataset Curation
RQ1: How can a dataset for cite-worthiness detection be automatically curated with low noise?
• Source data: S2ORC¹, millions of extracted scientific documents from Semantic Scholar
• We limit citances as follows
• Parenthetical author/year and bracketed numerical citations only
• Citations must be at the end of a sentence
Examples of accepted citances:
We use the model from the original BERT paper (Devlin et al. 2019).
We use the model from the original BERT paper [1].
1. https://github.com/allenai/s2orc
CiteWorth: Cleaning the Data
We use the model from the original BERT paper (Devlin et al. 2019). This
model uses self-attention and masked language modeling.
1. Extract whole paragraphs: data is curated at the paragraph level
2. Check that all gold citation spans are parenthetical author/year or bracketed numerical
3. Check that all citation spans have been extracted for each sentence
4. Check that all citation spans come at the end of a sentence
5. Remove citation spans using the gold spans
6. Check whether any citation markers are left over (e.g. hanging prepositions/punctuation)
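The cleaning steps above can be sketched for a single sentence as follows. This is a minimal illustration, not the CiteWorth implementation: the regular expressions, the list of hanging prepositions, the function name `clean_sentence`, and the assumption that sentences end with a full stop are all hypothetical choices made for this example.

```python
import re

# Assumed patterns for the two accepted citation formats (illustrative only).
AUTHOR_YEAR = re.compile(r"^\([^()]*\d{4}[a-z]?\)$")   # e.g. (Devlin et al. 2019)
NUMERIC = re.compile(r"^\[\d+(?:\s*[,-]\s*\d+)*\]$")    # e.g. [1] or [1, 2]
# Assumed debris check: a hanging preposition or punctuation mark at the end.
HANGING = re.compile(r"(?:\b(?:in|by|from|of|see)|[,;:(\[])$", re.IGNORECASE)


def clean_sentence(sentence, spans):
    """Return (clean_text, is_cite_worthy), or None if a check fails.

    `spans` holds (start, end) character offsets of the gold citation spans.
    """
    text = sentence.rstrip()
    body = text[:-1] if text.endswith(".") else text
    if not spans:
        return text, False
    spans = sorted(spans)
    # Step 2: each span must be parenthetical author/year or bracketed numeric.
    for start, end in spans:
        span_text = sentence[start:end]
        if not (AUTHOR_YEAR.match(span_text) or NUMERIC.match(span_text)):
            return None
    # Step 4: nothing but citation spans may follow the first span.
    tail_start = spans[0][0]
    remainder = body[tail_start:]
    for start, end in spans:
        remainder = remainder.replace(sentence[start:end], "", 1)
    if remainder.strip(" ,;"):
        return None
    # Step 5: remove the citation spans.
    cleaned = body[:tail_start].rstrip()
    # Steps 3 and 6: reject leftover citation markers or hanging debris.
    if HANGING.search(cleaned) or re.search(r"\[\d+\]|\(\D*\d{4}\)", cleaned):
        return None
    return cleaned + ".", True
```

A sentence that fails any check returns `None`; at the paragraph level, one might then discard the whole paragraph, since the data is curated per paragraph.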
CiteWorth Final Dataset
• 1,181,793 sentences
• 10 different fields, 20,000+ paragraphs per field
• Much cleaner than a naive baseline which only
removes citation text based on gold spans
Method             Sentences Clean (%)   Citation Markers Removed (%)
Naive Baseline     92.07                 92.78
CiteWorth (Ours)   98.90                 98.10
Predicting on Individual Sentences
RQ2: What methods are most effective for automatically detecting cite-worthy sentences?
Models compared:
• Logistic Regression
• Convolutional Recurrent Net (1)
• Transformer Network
• Pretrained Language Models
[Figure: Transformer encoder block: Embedding, Multi-Head Attention, Add & Norm, Feed-Forward, Add & Norm (image source: 2)]
1. Michael Färber, Alexander Thiemann, and Adam Jatowt. 2018. To Cite, or Not to Cite? Detecting Citation Contexts in Text. In European Conference on Information Retrieval, pages 598–603. Springer.
2. https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
Predicting on Individual Sentences
Method                P      R      F1
Logistic Regression   46.65  64.88  54.28
CRNN                  50.87  62.21  55.97
Transformer           47.92  71.59  57.39
BERT                  55.04  69.02  61.23
SciBERT               57.03  68.08  62.06
Can context improve performance?
* Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, and Sven Dähne. 2016. Investigating the Influence of Noise and
Distractors on the Interpretation of Neural Networks. arXiv preprint arXiv:1611.07270.
Predicting Multiple Sentences at Once
Longformer*
Input: [CLS] $t_1^1$ $t_1^2$ [SEP] $t_2^1$ $t_2^2$ $t_2^3$ [SEP] … ($t_i^j$: token $j$ of sentence $i$)
The tokens of each sentence are pooled and classified separately (Pooling, then Classify).
Method            P      R      F1
SciBERT           57.03  68.08  62.06
Longformer-Solo   57.21  68.00  62.14
Longformer-Ctx    59.92  77.15  67.45  (Δ 5 pts)
Are there variations across fields?
* Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. CoRR, abs/2004.05150.
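The per-sentence pooling and classification over a whole paragraph can be illustrated with a small sketch. It assumes the contextual token vectors have already been produced by an encoder such as Longformer, which is not run here. Mean pooling, the logistic classification head, and the names `mean_pool` and `classify_sentences` are assumptions made for illustration; the slides do not specify them.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def classify_sentences(token_ids, token_vectors, weights, bias, cls_id, sep_id):
    """Return one cite-worthiness probability per sentence in a paragraph.

    The paragraph is encoded as [CLS] sent_1 [SEP] sent_2 [SEP] ...; each
    sentence's token vectors are pooled and scored by a shared linear layer.
    """
    probs, current = [], []
    for tok, vec in zip(token_ids, token_vectors):
        if tok == cls_id:
            continue  # the [CLS] vector is not part of any sentence
        if tok == sep_id:
            if current:
                pooled = mean_pool(current)
                logit = sum(w * x for w, x in zip(weights, pooled)) + bias
                probs.append(1.0 / (1.0 + math.exp(-logit)))
            current = []
        else:
            current.append(vec)
    return probs


# Toy paragraph with two sentences and 2-dimensional "contextual" vectors.
ids = [0, 5, 6, 1, 7, 8, 9, 1]  # 0 = [CLS], 1 = [SEP]
vecs = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.0, 0.0],
        [-1.0, 0.0], [0.0, 0.0], [-2.0, 0.0], [0.0, 0.0]]
paragraph_probs = classify_sentences(ids, vecs, [1.0, 0.0], 0.0, cls_id=0, sep_id=1)
```

Because every sentence is scored from vectors computed over the full paragraph, each prediction can draw on the neighbouring sentences, which is the point of the contextual (Longformer-Ctx) setting.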
Transfer Learning
• Pretrain a model and fine-tune on 10 tasks (NER, relation extraction, text classification)
• Base: Original SciBERT model fine-tuned on downstream tasks
• LM: SciBERT with MLM fine-tuning on CiteWorth
• Cite: SciBERT fine-tuned on cite-worthiness detection
• LMCite: SciBERT with MLM fine-tuning on CiteWorth + fine-tuned on cite-worthiness
RQ4: Can large scale cite-worthiness data be used to perform transfer learning to downstream scientific text tasks?
Transfer Learning
[Bar chart: Average Across Tasks for Base, LM, Cite, LMCite; y-axis from 78.2 to 78.75]
The best average performance across tasks is MLM + cite-worthiness fine-tuning
Conclusions
• We introduce CiteWorth – a large, rigorously cleaned
dataset for citation-related tasks
• We show that paragraph level context is crucial to
perform cite-worthiness detection
• We show that the data is diverse with a significant
domain effect
• We show that cite-worthiness is a highly transferable
task for scientific text
Open Questions
• How to improve domain adaptation for scientific text?
• What other useful features are there?
• Author network
• Document level context
• Other types of structure (case study: Discourse Structure)
• Other tasks using this data e.g. citation
recommendation
Semi-Supervised Exaggeration
Detection of Health Science Press
Releases
Dustin Wright, Isabelle Augenstein
EMNLP 2021 (Main)
Science Communication
Press Release
BBC DailyMail
The Express
Etc...
Problem
https://www.sciencedaily.com/releases/2021/05/210525101658.htm
Yijun Bao, Somayyeh Soltanian-Zadeh, Sina Farsiu, Yiyang Gong. Segmentation of neurons from fluorescence calcium
recordings beyond real time. Nature Machine Intelligence, 2021; DOI: 10.1038/s42256-021-00342-x
The abstract makes a conditionally causal claim (“potentially enabling”) while the press release makes a direct causal claim.
Our Contributions
• Formalisation of the problem of scientific exaggeration
detection
• Curation of benchmark dataset for scientific
exaggeration detection
• Semi-supervised method based on Pattern Exploiting
Training (PET) to address the task
Prior Work on Understanding Exaggeration in Science
• Manual attempts
• Sumner et al. 2014 and Bratton et al. 2019: InSciOut
• Manually label 823 pairs of press releases and abstracts
• Labels: causal claim strength of conclusions, advice given,
independent and dependent variables, etc.
• Find that about 33% of press releases contain exaggerated
conclusions
• Major problem: press releases are the “dominant link between academia and the media”
• Automatic attempts
• Li et al. 2017, Yu et al. 2019, Yu et al. 2020
• Predict causal claim strength of conclusion sentences in
abstract and press release
• No clean paired data for evaluation
Our Work on Exaggeration Detection in Science
• The focus of this work is predicting when a press release
exaggerates a scientific paper
• We focus on predicting this using the primary finding of the
paper as written in the abstract and the press release
• We build on previous work which focuses on causal claim
strength prediction of these primary findings
Task Formulations and Evaluation Data
Formal Problem Definition
$D = \{(t_i, s_i, y_i) \mid i \in [0 \ldots N)\}$
Dataset $D$
Source documents $s_i$
Target documents $t_i$ written about $s_i$
Labels $y_i$, where
$y_i \in \{0\ \text{(Downplays)},\ 1\ \text{(Same)},\ 2\ \text{(Exaggerates)}\}$
indicates if $t_i$ exaggerates, downplays, or faithfully represents $s_i$
Learning goal: predict $y$ given $s$ and $t$
Task Formulations
• T1
• Entailment-like task to predict exaggeration label
• Paired (press release, abstract) data
• Labels: $\mathcal{L}_{T1} = \{0\ \text{(Downplays)},\ 1\ \text{(Same)},\ 2\ \text{(Exaggerates)}\}$
• T2
• Text classification task to predict causal claim strength
• Unpaired press releases and abstracts
• Final prediction compares strength of paired press release and abstract
• Labels: $\mathcal{L}_{T2} = \{0\ \text{(No Relation)},\ 1\ \text{(Correlational)},\ 2\ \text{(Conditional Causal)},\ 3\ \text{(Direct Causal)}\}$
Claim strength labels and language cues (Li et al. 2017):
Label  Type                Language Cue
0      No Relation
1      Correlational       association, associated with, predictor, at high risk of
2      Conditional causal  increase, decrease, lead to, effect on, contribute to, result in
                           (cues indicating doubt: may, might, appear to, probably)
3      Direct causal       increase, decrease, lead to, effective on, contribute to, reduce, can
Evaluation Dataset Creation
1. Start with the 823 labeled pairs from Sumner et al. 2014 and Bratton et al. 2019 (InSciOut)
2. Collect original abstract text from Semantic Scholar
3. Match original conclusion sentences to paraphrased annotations via ROUGE score
4. Manually inspect and discard missing or incorrect abstracts
5. Final label: compare annotated claim strength ($l_t$ for press release, $l_s$ for abstract)
   Downplays: $l_t < l_s$
   Same: $l_t = l_s$
   Exaggerates: $l_t > l_s$
Total data: 663 pairs (100 training, 553 test)
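The final labelling rule, comparing the claim strength of the press release with that of the abstract, can be written down directly. The sketch below uses the label values given in the task formulations; the class and function names are hypothetical and chosen only for this example.

```python
from enum import IntEnum


class ClaimStrength(IntEnum):
    """Claim strength labels (T2)."""
    NO_RELATION = 0
    CORRELATIONAL = 1
    CONDITIONAL_CAUSAL = 2
    DIRECT_CAUSAL = 3


class Exaggeration(IntEnum):
    """Exaggeration labels (T1)."""
    DOWNPLAYS = 0
    SAME = 1
    EXAGGERATES = 2


def exaggeration_label(press_release, abstract):
    """Compare the claim strength of a press release with that of its abstract."""
    if press_release < abstract:
        return Exaggeration.DOWNPLAYS
    if press_release > abstract:
        return Exaggeration.EXAGGERATES
    return Exaggeration.SAME


# An abstract with a conditional causal claim paired with a press release
# that makes a direct causal claim is labelled as exaggerated.
example = exaggeration_label(ClaimStrength.DIRECT_CAUSAL,
                             ClaimStrength.CONDITIONAL_CAUSAL)
```

The same function covers the T2 route to an exaggeration prediction: predict a claim strength for each document separately, then compare the two.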
Few-Shot Learning: Multi-Task PET (MT-PET)
PET (Schick et al. 2020)
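[Figure: traditional classifier vs. PET]
Traditional classifier: the input “Eating chocolate causes happiness” is mapped directly to a distribution over the labels 0, 1, 2, 3 (e.g. 0.01, 0.21, 0.15, …).
PET: the input is rewritten as “Eating chocolate causes happiness. The claim strength is [MASK]” and a large pretrained language model ℳ scores candidate tokens for [MASK] (e.g. medium, estimated, cautious, distorted).
Pattern: transform the input to a cloze-style question
Verbalizer: predict tokens from the language model which reflect the data’s labels
Models trained with different patterns produce soft labels for unlabelled data; a final classifier is trained on these soft labels with a KL-divergence loss.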
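The pattern and verbalizer idea can be sketched without a real language model. In the sketch below, the logits at the [MASK] position are supplied by hand rather than computed, and the verbalizer mapping from labels to tokens is hypothetical: the slide lists example tokens but not which label each one stands for. The function names are also assumptions for this example.

```python
import math


def apply_pattern(claim):
    """Pattern: rewrite the input as a cloze-style question."""
    return f"{claim.rstrip('.')}. The claim strength is [MASK]"


def verbalizer_probs(mask_logits, verbalizer):
    """Turn language-model logits at [MASK] into a distribution over labels.

    `mask_logits` maps vocabulary tokens to logits; `verbalizer` maps each
    label to the token that represents it. The softmax runs only over the
    verbalizer tokens, so the result sums to 1 across labels.
    """
    scores = {label: mask_logits[token] for label, token in verbalizer.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical verbalizer and hand-written logits, for illustration only.
verbalizer = {0: "medium", 1: "estimated", 2: "cautious", 3: "distorted"}
logits = {"medium": -2.0, "estimated": 0.5, "cautious": 0.2, "distorted": 1.7,
          "banana": 3.0}  # tokens outside the verbalizer are ignored
prompt = apply_pattern("Eating chocolate causes happiness")
label_probs = verbalizer_probs(logits, verbalizer)
predicted = max(label_probs, key=label_probs.get)
```

Restricting the softmax to the verbalizer tokens is what lets a masked language model act as a classifier with very few labelled examples.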
MT-PET
[Figure: MT-PET]
T2 (claim strength) pattern: “Eating chocolate causes happiness. The claim strength is [MASK]”, with candidate tokens such as medium, estimated, cautious, distorted.
T1 (exaggeration) pattern: “Scientists claim eating chocolate sometimes causes happiness. Reporters claim eating chocolate causes happiness. The reporters claims are [MASK]”, with candidate tokens such as preliminary, identical, naive.
The language model ℳ is trained with patterns from both tasks. As in PET, the resulting models produce soft labels for unlabelled data, and the final classifier is trained on them with a KL-divergence loss.
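The soft-label step shared by PET and MT-PET can be illustrated as follows. This is a simplified sketch: it averages the label probabilities of the pattern-specific models with equal weight, which is an assumption made here for brevity and not necessarily the exact combination rule used in the papers. The function names are hypothetical.

```python
import math


def soft_labels(ensemble_probs):
    """Average the label distributions predicted by several models."""
    n_models = len(ensemble_probs)
    n_labels = len(ensemble_probs[0])
    return [sum(p[k] for p in ensemble_probs) / n_models for k in range(n_labels)]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)


# Three pattern-specific models label one unlabelled example (3 classes).
ensemble = [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.0, 0.5, 0.5]]
target = soft_labels(ensemble)
# Loss of a final classifier whose current prediction is far from the target.
loss = kl_divergence(target, [0.6, 0.3, 0.1])
```

Minimising this loss over the unlabelled data pushes the final classifier toward the ensemble's averaged prediction.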
Evaluation
T1 (Exaggeration Detection) with MT-PET
Method       P      R      F1
Supervised   28.06  33.10  29.05
PET          41.90  39.87  39.12
MT-PET       47.80  47.99  47.35
Substantial improvements when using PET (10 points)
Further improvements with MT-PET (8 points)
Demonstrates transfer of knowledge from claim strength prediction to exaggeration prediction
T2 (Claim Strength Prediction) with MT-PET
200 samples from T2, 100 samples from T1:
Method       P      R      F1
Supervised   49.28  51.07  49.03
PET          55.76  58.58  56.57
MT-PET       56.68  60.13  57.44
4500 samples from T2, 100 samples from T1:
Supervised: P 58.20, R 59.99, F1 58.66
Remaining bar labels (PET and MT-PET): 58.53, 61.84, 60.45, 60.09, 61.11
MT-PET outperforms PET in both scenarios
MT-PET with 200 samples approaches supervised performance with 4,500 samples
Error Analysis
• All models:
• disproportionately get pairs involving direct causal claims
incorrect
• do best for correlational claims from abstracts and claims
from press releases which are correlational or stronger
• MT-PET:
• helps the most for the most difficult category -- causal claims
Summary
• We formalize the problem of scientific exaggeration
detection, providing two task formulations for the
problem
• We curate a set of benchmark data to evaluate
automatic methods for performing the task
• We propose MT-PET, a few-shot learning method
based on PET, which we demonstrate outperforms
strong baselines
Wrap-Up
Supporting the Life Cycle of Research
[Figure: the research life cycle and its supporting tasks, now with Credibility Detection added as a NEW task]
Overall Take-Aways
• Why scholarly document processing?
• Supporting the life cycle of research, from information discovery to
research impact tracking
• Why credibility detection for scholarly communication?
• Detect claims which should be backed up by evidence
(cite-worthiness detection)
• Detect inconsistencies between primary and secondary sources of
information (exaggeration detection)
Overall Take-Aways
• Overarching challenges
• Difficult NLP tasks (require understanding of pragmatics)
• Domain effects, importance of context pose further challenges
• Not well-studied yet
• Scarcity of available benchmarks
• Many opportunities for future work
• Explore more diverse settings
• Gather more datasets
• Methods for domain adaptation & few-shot learning
• Tools for journalists & authors
Thank you!
isabelleaugenstein.github.io
augenstein@di.ku.dk
@IAugenstein
github.com/isabelleaugenstein
Acknowledgements
CopeNLU
https://copenlu.github.io/
This project has received funding from the European Union’s Horizon 2020 research and
innovation programme under the Marie Skłodowska-Curie grant agreement No 801199.
PhD student: Dustin Wright
Presented Papers
Isabelle Augenstein. Determining the Credibility of Science
Communication. SDP workshop, 2021.
Dustin Wright, Isabelle Augenstein. CiteWorth: Cite-Worthiness
Detection for Improved Scientific Document Understanding. ACL
Findings, 2021.
Dustin Wright, Isabelle Augenstein. Semi-Supervised Exaggeration
Detection of Health Science Press Releases. EMNLP, 2021.

More Related Content

What's hot

Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Johann Petrak
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Anubhav Jain
 
Document
DocumentDocument
Icml2018 naver review
Icml2018 naver reviewIcml2018 naver review
Icml2018 naver review
NAVER Engineering
 
Genomics data analysis in Julia
Genomics data analysis in JuliaGenomics data analysis in Julia
Genomics data analysis in Julia
Jiahao Chen
 
EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...
EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...
EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...
Francesco Osborne
 
Question Answering System using machine learning approach
Question Answering System using machine learning approachQuestion Answering System using machine learning approach
Question Answering System using machine learning approach
Garima Nanda
 
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
Punit Sharnagat
 
A Julia package for iterative SVDs with applications to genomics data analysis
A Julia package for iterative SVDs with applications to genomics data analysisA Julia package for iterative SVDs with applications to genomics data analysis
A Julia package for iterative SVDs with applications to genomics data analysis
Jiahao Chen
 
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET Journal
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
Anubhav Jain
 
Lec1-Into
Lec1-IntoLec1-Into
Lec1-Into
butest
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Anubhav Jain
 
Integrating natural language processing and software engineering
Integrating natural language processing and software engineeringIntegrating natural language processing and software engineering
Integrating natural language processing and software engineering
Nakul Sharma
 
Resource Allocation Using Metaheuristic Search
Resource Allocation Using Metaheuristic SearchResource Allocation Using Metaheuristic Search
Resource Allocation Using Metaheuristic Search
csandit
 
An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
 An Investigation of Keywords Extraction from Textual Documents using Word2Ve... An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
IJCSIS Research Publications
 
Non-parametric Subject Prediction
Non-parametric Subject PredictionNon-parametric Subject Prediction
Non-parametric Subject Prediction
Shenghui Wang
 
Mining Product Reputations On the Web
Mining Product Reputations On the WebMining Product Reputations On the Web
Mining Product Reputations On the Web
feiwin
 
college resume.hannah.new
college resume.hannah.newcollege resume.hannah.new
college resume.hannah.new
Hannah Peeler
 
D1802023136
D1802023136D1802023136
D1802023136
IOSR Journals
 

What's hot (20)

Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
 
Document
DocumentDocument
Document
 
Icml2018 naver review
Icml2018 naver reviewIcml2018 naver review
Icml2018 naver review
 
Genomics data analysis in Julia
Genomics data analysis in JuliaGenomics data analysis in Julia
Genomics data analysis in Julia
 
EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...
EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...
EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts ...
 
Question Answering System using machine learning approach
Question Answering System using machine learning approachQuestion Answering System using machine learning approach
Question Answering System using machine learning approach
 
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit...
 
A Julia package for iterative SVDs with applications to genomics data analysis
A Julia package for iterative SVDs with applications to genomics data analysisA Julia package for iterative SVDs with applications to genomics data analysis
A Julia package for iterative SVDs with applications to genomics data analysis
 
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
IRJET- Finding the Original Writer of an Anonymous Text using Naïve Bayes Cla...
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Lec1-Into
Lec1-IntoLec1-Into
Lec1-Into
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
 
Integrating natural language processing and software engineering
Integrating natural language processing and software engineeringIntegrating natural language processing and software engineering
Integrating natural language processing and software engineering
 
Resource Allocation Using Metaheuristic Search
Resource Allocation Using Metaheuristic SearchResource Allocation Using Metaheuristic Search
Resource Allocation Using Metaheuristic Search
 
An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
 An Investigation of Keywords Extraction from Textual Documents using Word2Ve... An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
 
Non-parametric Subject Prediction
Non-parametric Subject PredictionNon-parametric Subject Prediction
Non-parametric Subject Prediction
 
Mining Product Reputations On the Web
Mining Product Reputations On the WebMining Product Reputations On the Web
Mining Product Reputations On the Web
 
college resume.hannah.new
college resume.hannah.newcollege resume.hannah.new
college resume.hannah.new
 
D1802023136
D1802023136D1802023136
D1802023136
 

Similar to Determining the Credibility of Science Communication

Exploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining Applications
IRJET Journal
 
Automatic Grading of Handwritten Answers
Automatic Grading of Handwritten AnswersAutomatic Grading of Handwritten Answers
Automatic Grading of Handwritten Answers
IRJET Journal
 
Text Segmentation for Online Subjective Examination using Machine Learning
Text Segmentation for Online Subjective Examination using Machine   LearningText Segmentation for Online Subjective Examination using Machine   Learning
Text Segmentation for Online Subjective Examination using Machine Learning
IRJET Journal
 
Semantic Annotation of Documents
Semantic Annotation of DocumentsSemantic Annotation of Documents
Semantic Annotation of Documents
subash chandra
 
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemKnowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
IRJET Journal
 
Master defence 2020 - Oleh Onyshchak - Image Recommendation for Wikipedia Ar...
 Master defence 2020 - Oleh Onyshchak - Image Recommendation for Wikipedia Ar... Master defence 2020 - Oleh Onyshchak - Image Recommendation for Wikipedia Ar...
Master defence 2020 - Oleh Onyshchak - Image Recommendation for Wikipedia Ar...
Lviv Data Science Summer School
 
Migration strategies for object oriented system to component based system
Migration strategies for object oriented system to component based systemMigration strategies for object oriented system to component based system
Migration strategies for object oriented system to component based system
ijfcstjournal
 
A Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing CostsA Recommender Story: Improving Backend Data Quality While Reducing Costs
A Recommender Story: Improving Backend Data Quality While Reducing Costs
Databricks
 
C24011018
C24011018C24011018
C24011018
IJERA Editor
 
How to Write an Effective Technical Paper (1).pdf
How to Write an Effective Technical Paper (1).pdfHow to Write an Effective Technical Paper (1).pdf
How to Write an Effective Technical Paper (1).pdf
khalid khan
 
On Using Network Science in Mining Developers Collaboration in Software Engin...
On Using Network Science in Mining Developers Collaboration in Software Engin...On Using Network Science in Mining Developers Collaboration in Software Engin...
On Using Network Science in Mining Developers Collaboration in Software Engin...
IJDKP
 
On Using Network Science in Mining Developers Collaboration in Software Engin...
On Using Network Science in Mining Developers Collaboration in Software Engin...On Using Network Science in Mining Developers Collaboration in Software Engin...
On Using Network Science in Mining Developers Collaboration in Software Engin...
IJDKP
 
Enase20.ppt
Enase20.pptEnase20.ppt
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep Learning
Andre Freitas
 
Text Summarization and Conversion of Speech to Text
Text Summarization and Conversion of Speech to TextText Summarization and Conversion of Speech to Text
Text Summarization and Conversion of Speech to Text
IRJET Journal
 
Development of Computer Aided Learning Software for Use in Electric Circuit A...
Development of Computer Aided Learning Software for Use in Electric Circuit A...Development of Computer Aided Learning Software for Use in Electric Circuit A...
Development of Computer Aided Learning Software for Use in Electric Circuit A...
drboon
 
IRJET- Natural Language Query Processing
IRJET- Natural Language Query ProcessingIRJET- Natural Language Query Processing
IRJET- Natural Language Query Processing
IRJET Journal
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
Claremont Report on Database Research: Research Directions (Le Gruenwald)
Claremont Report on Database Research: Research Directions (Le Gruenwald)Claremont Report on Database Research: Research Directions (Le Gruenwald)
Claremont Report on Database Research: Research Directions (Le Gruenwald)
infoblog
 
論文サーベイ(Sasaki)
論文サーベイ(Sasaki)論文サーベイ(Sasaki)
論文サーベイ(Sasaki)
Hajime Sasaki
 

Determining the Credibility of Science Communication

  • 1. Determining the Credibility of Science Communication Isabelle Augenstein* augenstein@di.ku.dk @IAugenstein http://isabelleaugenstein.github.io/ *partial slide credit: Dustin Wright SDP Workshop 10 June 2021
  • 3. Supporting the Life Cycle of Research 26/08/2021 3 Reviewing Support Citation Analysis Writing Assistance Information Discovery Conducting Experiments Paper Writing Peer Review Research Impact Tracking Information Extraction Summarisation Citation Prediction Reviewer Matching Review Score Prediction Citation Prediction Citation Trend Analysis
  • 4. Scholarly Document Processing • Goal: to automatically process scientific text to support scholars • Example NLP tasks • Extract information about scientific concepts, e.g. drugs and proteins • Recommend relevant papers to cite • Challenges • Supervised learning is hard: annotation is expensive, requiring domain experts • Language used is diverse across fields • Different modalities • Meta-data also important
  • 5. Credibility and Veracity of Science Communication • Press Release • BBC • DailyMail • The Express • Etc.
  • 6. Fact Checking • Focus on veracity • What about more subtle forms of misinformation?
  • 7. Credibility and Veracity of Science Communication • Shortcomings of prior work • Assumes scientific writing is credible • Assumes claims made are supported by underlying evidence • Example issues • When writing a paper • Making claims not backed up by literature • Missing important citations • Presenting conclusions not supported by data • Popular science communication • Distortion of findings • Exaggerations • Outright misrepresentations
  • 8. Challenges Addressed In This Talk • Cite-worthiness detection • Detecting if a sentence should include a citation to prior work • Useful for assistive writing of scientific papers • Similar to claim detection in fact checking • Exaggeration detection • Detecting if a news article exaggerates claims made in a scientific paper • Useful for assistive writing & quality check of press releases • Related to veracity prediction, but a more nuanced task
  • 9. Overview of Today’s Talk • Introduction • The Life Cycle of Scientific Research • Part 1: Cite-Worthiness Detection • The CiteWorth dataset • Methods for cite-worthiness detection • Part 2: Exaggeration Detection • Task framing • Semi-supervised learning for exaggeration detection • Conclusion • Future research challenges
  • 10. CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding. Dustin Wright, Isabelle Augenstein. ACL 2021 (Findings)
  • 11. Scholarly Document Processing • Challenges • Supervised learning is hard: annotation is expensive, requiring domain experts • The text is diverse across fields • How can we improve tools for scholarly document processing across fields? • What training data is readily available?
  • 13. Citances in Machine Learning
      Example: "We use the model from the original BERT paper (Devlin et al. 2019)."
      • Cite-worthiness: Is this a citance? Yes
      • Recommendation: What paper should be cited? Devlin et al. (2019)
      • Influence: Was this an influential paper? Yes
      • Intent: What is the purpose of the citation? Method
  • 14. Cite-Worthiness Uses: as an auxiliary task in a multi-task setup. Example: "We use the model from the original BERT paper (Devlin et al. 2019)." Labels: CITE-WORTHY, METHOD
  • 15. Cite-Worthiness Uses: as a first step in citation recommendation. Example: "We use the model from the original BERT paper (Devlin et al. 2019)." Label: CITE-WORTHY
  • 16. Cite-Worthiness Uses: for assistive document editing. Example: "We use the model from the original BERT paper (Devlin et al. 2019)." Label: CITE-WORTHY
  • 17. Cite-Worthiness Datasets • Tend to be small and limited to only a few domains (e.g. Computer Science) • No attention paid to how clean the data is (e.g. ungrammatical phrases). Example: "We use the model from Devlin et al. (2019) as a baseline."
  • 18. CiteWorth: Dataset Curation
      RQ1: How can a dataset for cite-worthiness detection be automatically curated with low noise?
      • Source data: S2ORC (https://github.com/allenai/s2orc), millions of extracted scientific documents from Semantic Scholar
      • We limit citances as follows
        • Parenthetical author/year and bracketed numerical citations only, e.g. "We use the model from the original BERT paper (Devlin et al. 2019)." and "We use the model from the original BERT paper [1]."
        • Citations must be at the end of a sentence
  • 19. CiteWorth: Cleaning the Data
      RQ1: How can a dataset for cite-worthiness detection be automatically curated with low noise?
      Example paragraph: "We use the model from the original BERT paper (Devlin et al. 2019). This model uses self-attention and masked language modeling."
      1. Extract whole paragraphs (data is curated at the paragraph level)
      2. Check whether all gold citation spans are parenthetical author/year or bracketed numerical
      3. Check whether all citation spans have been extracted for each sentence
      4. Check whether all citation spans come at the end of a sentence
      5. Remove citation spans using the gold spans
      6. Check whether any citation markers are left over (e.g. hanging prepositions/punctuation)
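The cleaning checks on slide 19 can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the regular expressions, the list of "hanging" words, the function names, and the choice to drop a whole paragraph when any sentence fails a check are all assumptions made for the example.

```python
import re

# Hypothetical patterns: the slides only name the two accepted citation styles.
PAREN_AUTHOR_YEAR = re.compile(r"\([A-Z][^()]*\d{4}[a-z]?\)")
BRACKET_NUMERIC = re.compile(r"\[\d+(?:\s*[,\-]\s*\d+)*\]")
# Hypothetical leftovers: a dangling preposition or punctuation mark at the end.
HANGING = re.compile(r"(?:\b(?:in|by|from|see|of)|[,;:(\[])\s*$", re.IGNORECASE)


def has_citation_marker(text):
    """True if the text still contains something that looks like a citation."""
    return bool(PAREN_AUTHOR_YEAR.search(text) or BRACKET_NUMERIC.search(text))


def clean_sentence(sentence, gold_spans):
    """Return (cleaned_text, is_cite_worthy), or None if any check fails.

    gold_spans is a list of (start, end) character offsets of citation spans.
    """
    if not gold_spans:
        # A sentence without gold spans must not contain unextracted citations.
        return None if has_citation_marker(sentence) else (sentence, False)

    body = sentence.rstrip()
    end_punct = ""
    if body and body[-1] in ".!?":
        body, end_punct = body[:-1].rstrip(), body[-1]

    spans = sorted(gold_spans)
    # Every gold span must use one of the two accepted styles.
    for start, end in spans:
        text = sentence[start:end]
        if not (PAREN_AUTHOR_YEAR.fullmatch(text) or BRACKET_NUMERIC.fullmatch(text)):
            return None

    # Walking backwards, the spans must sit contiguously at the sentence end.
    cursor = len(body)
    for start, end in reversed(spans):
        if sentence[end:cursor].strip():
            return None
        cursor = start

    # Remove the spans, then look for leftovers.
    cleaned = sentence[:cursor].rstrip()
    if HANGING.search(cleaned) or has_citation_marker(cleaned):
        return None
    return cleaned + end_punct, True


def clean_paragraph(sentences_with_spans):
    """Keep a paragraph only if every sentence passes (an assumed policy)."""
    results = []
    for sentence, spans in sentences_with_spans:
        result = clean_sentence(sentence, spans)
        if result is None:
            return None
        results.append(result)
    return results
```

Applied to the example paragraph from the slide, the first sentence loses its trailing "(Devlin et al. 2019)" and is labelled cite-worthy, while the second is kept unchanged and labelled not cite-worthy.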
  • 20. CiteWorth Final Dataset
      RQ1: How can a dataset for cite-worthiness detection be automatically curated with low noise?
      • 1,181,793 sentences
      • 10 different fields, 20,000+ paragraphs per field
      • Much cleaner than a naive baseline which only removes citation text based on gold spans

      Method              Sentences Clean (%)   Citation Markers Removed (%)
      Naive Baseline      92.07                 92.78
      CiteWorth (Ours)    98.90                 98.10
  • 21. Predicting on Individual Sentences
      RQ2: What methods are most effective for automatically detecting cite-worthy sentences?
      • Logistic Regression
      • Convolutional Recurrent Net [1]
      • Transformer Network
      • Pretrained Language Models
      [Diagram [2]: Transformer block with the components Embedding, Multi-Head Attention, Add & Norm, Feed-Forward]
      [1] Michael Färber, Alexander Thiemann, and Adam Jatowt. 2018b. To Cite, or Not to Cite? Detecting Citation Contexts in Text. In European Conference on Information Retrieval, pages 598–603. Springer.
      [2] https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
  • 22. Predicting on Individual Sentences
      RQ2: What methods are most effective for automatically detecting cite-worthy sentences?

      Method                P       R       F1
      Logistic Regression   46.65   64.88   54.28
      CRNN                  50.87   62.21   55.97
      Transformer           47.92   71.59   57.39
      BERT                  55.04   69.02   61.23
      SciBERT               57.03   68.08   62.06

      Can context improve performance?
      * Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, and Sven Dähne. 2016. Investigating the Influence of Noise and Distractors on the Interpretation of Neural Networks. arXiv preprint arXiv:1611.07270.
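As a reading aid for the P, R and F1 columns on slide 22, the sketch below recomputes F1 as the harmonic mean of precision and recall from the reported values. The function name and the dictionary layout are illustrative. The recomputed numbers agree with the reported F1 only to within a few hundredths of a point; the slides do not say why, and rounding or averaging over several runs would both be plausible explanations.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as shown on the slide.
reported = {
    "Logistic Regression": (46.65, 64.88, 54.28),
    "CRNN": (50.87, 62.21, 55.97),
    "Transformer": (47.92, 71.59, 57.39),
    "BERT": (55.04, 69.02, 61.23),
    "SciBERT": (57.03, 68.08, 62.06),
}

for method, (p, r, f1) in reported.items():
    print(f"{method:20s} recomputed={f1_score(p, r):.2f} reported={f1:.2f}")
```

In every row recall is well above precision, so the models flag more sentences as cite-worthy than actually carry a citation.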
  • 23. Predicting Multiple Sentences at Once
      RQ2: What methods are most effective for automatically detecting cite-worthy sentences?
      [Diagram: Longformer* input of the form [CLS] <tokens of sentence 1> [SEP] <tokens of sentence 2> [SEP] …, with a separate Pooling and Classify step for each sentence]

      Method            P       R       F1
      SciBERT           57.03   68.08   62.06
      Longformer-Solo   57.21   68.00   62.14
      Longformer-Ctx    59.92   77.15   67.45   (Δ 5 pts)

      Are there variations across fields?
      * Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. CoRR, abs/2004.05150.
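The paragraph-level setup on slide 23 packs several sentences into one input, separates them with [SEP], and pools and classifies each sentence on its own. The sketch below shows only the bookkeeping for that idea in plain Python. The function names are hypothetical, mean pooling is an assumption (the slide says "Pooling" without naming the kind), sentences are assumed to be non-empty, and the per-token vectors would in practice come from the encoder rather than being supplied by hand.

```python
def pack_paragraph(sentences, cls="[CLS]", sep="[SEP]"):
    """Concatenate tokenised sentences into one sequence.

    Returns the token list and, for each sentence, the (start, end) index
    range of its tokens in that list (end exclusive, separators excluded).
    """
    tokens = [cls]
    spans = []
    for sentence in sentences:
        start = len(tokens)
        tokens.extend(sentence)
        spans.append((start, len(tokens)))
        tokens.append(sep)
    return tokens, spans


def mean_pool(vectors, span):
    """Average the token vectors that belong to one sentence."""
    start, end = span
    chunk = vectors[start:end]
    dim = len(chunk[0])
    return [sum(vec[i] for vec in chunk) / len(chunk) for i in range(dim)]


# Usage: two sentences, with stand-in one-dimensional "encoder outputs".
tokens, spans = pack_paragraph([["we", "use", "bert"], ["it", "works"]])
fake_outputs = [[float(i)] for i in range(len(tokens))]
sentence_vectors = [mean_pool(fake_outputs, span) for span in spans]
```

Each entry of `sentence_vectors` would then go to a classifier that predicts cite-worthy or not, so every sentence is scored with the rest of its paragraph in view.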
  • 24. Transfer Learning
    RQ4: Can large-scale cite-worthiness data be used to perform transfer learning to downstream scientific text tasks?
    • Pretrain a model and fine-tune on 10 tasks (NER, relation extraction, text classification)
    • Base: Original SciBERT model fine-tuned on downstream tasks
    • LM: SciBERT with MLM fine-tuning on CiteWorth
    • Cite: SciBERT fine-tuned on cite-worthiness detection
    • LMCite: SciBERT with MLM fine-tuning on CiteWorth + fine-tuned on cite-worthiness detection
  • 25. Transfer Learning
    RQ4: Can large-scale cite-worthiness data be used to perform transfer learning to downstream scientific text tasks?
    Figure: bar chart of the average score across tasks for Base, LM, Cite and LMCite (axis range 78.2 to 78.75)
    The best average performance across tasks is achieved by MLM + cite-worthiness fine-tuning (LMCite)
  • 26. Conclusions • We introduce CiteWorth, a large, rigorously cleaned dataset for citation-related tasks • We show that paragraph-level context is crucial for cite-worthiness detection • We show that the data is diverse, with a significant domain effect • We show that cite-worthiness is a highly transferable task for scientific text
  • 27. Open Questions • How to improve domain adaptation for scientific text? • What other useful features are there? • Author network • Document level context • Other types of structure (case study: Discourse Structure) • Other tasks using this data e.g. citation recommendation
  • 28. Overview of Today’s Talk • Introduction • The Life Cycle of Scientific Research • Part 1: Cite-Worthiness Detection • The CiteWorth dataset • Methods for cite-worthiness detection • Part 2: Exaggeration Detection • Task framing • Semi-supervised learning for exaggeration detection • Conclusion • Future research challenges
  • 29. Semi-Supervised Exaggeration Detection of Health Science Press Releases Dustin Wright, Isabelle Augenstein EMNLP 2021 (Main)
  • 30. Science Communication Figure: a press release and the news outlets reporting on it (BBC, DailyMail, The Express, etc.)
  • 31. Problem
    The abstract makes a conditionally causal claim ("potentially enabling") while the press release makes a direct causal claim
    Press release: https://www.sciencedaily.com/releases/2021/05/210525101658.htm
    Paper: Yijun Bao, Somayyeh Soltanian-Zadeh, Sina Farsiu, Yiyang Gong. Segmentation of neurons from fluorescence calcium recordings beyond real time. Nature Machine Intelligence, 2021; DOI: 10.1038/s42256-021-00342-x
  • 32. Our Contributions • Formalisation of the problem of scientific exaggeration detection • Curation of benchmark dataset for scientific exaggeration detection • Semi-supervised method based on Pattern Exploiting Training (PET) to address the task
  • 33. Prior Work on Understanding Exaggeration in Science
    • Manual attempts
      • Sumner et al. 2014 and Bratton et al. 2019: InSciOut
      • Manually label 823 pairs of press releases and abstracts
      • Labels: causal claim strength of conclusions, advice given, independent and dependent variables, etc.
      • Find that about 33% of press releases contain exaggerated conclusions
      • Major problem: press releases are the "dominant link between academia and the media"
    • Automatic attempts
      • Li et al. 2017, Yu et al. 2019, Yu et al. 2020
      • Predict causal claim strength of conclusion sentences in abstract and press release
      • No clean paired data for evaluation
  • 34. Our Work on Exaggeration Detection in Science • The focus of this work is predicting when a press release exaggerates a scientific paper • We focus on predicting this using the primary finding of the paper as written in the abstract and the press release • We build on previous work which focuses on causal claim strength prediction of these primary findings
  • 35. Task Formulations and Evaluation Data
  • 36. Formal Problem Definition
    Dataset: $D = \{(t_i, s_i, y_i) \mid i \in [0 \ldots N)\}$
    • Source documents $s_i$
    • Target documents $t_i$ written about $s_i$
    • Labels $y_i \in \{0, 1, 2\}$ (0 Downplays, 1 Same, 2 Exaggerates), where $y_i$ indicates if $t_i$ exaggerates, downplays, or faithfully represents $s_i$
    Learning goal: predict $y$ given $s$ and $t$
  • 37. Task Formulations
    • T1
      • Entailment-like task to predict exaggeration label
      • Paired (press release, abstract) data
      • Labels: $\mathcal{L}_{T1}$ = {0 Downplays, 1 Same, 2 Exaggerates}
    • T2
      • Text classification task to predict causal claim strength
      • Unpaired press releases and abstracts
      • Final prediction compares strength of paired press release and abstract
      • Labels: $\mathcal{L}_{T2}$ = {0 No Relation, 1 Correlational, 2 Conditional Causal, 3 Direct Causal}
    Claim strength labels and language cues (Li et al. 2017):
    Label   Type                 Language Cue
    0       No relation          -
    1       Correlational        association, associated with, predictor, at high risk of
    2       Conditional causal   increase, decrease, lead to, effect on, contribute to, result in (cues indicating doubt: may, might, appear to, probably)
    3       Direct causal        increase, decrease, lead to, effective on, contribute to, reduce, can
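To make the cue table concrete, here is a toy rule-based reading of it. This is not the method of Li et al. 2017 or of the talk (both use learned classifiers); the precedence of the rules, the exact-phrase matching, and the function name `claim_strength` are assumptions made for illustration.

```python
import re

# Cue phrases copied from the table; the rule order below is an assumption.
CORRELATIONAL_CUES = ["association", "associated with", "predictor", "at high risk of"]
CAUSAL_CUES = ["increase", "decrease", "lead to", "effect on", "effective on",
               "contribute to", "result in", "reduce", "can"]
DOUBT_CUES = ["may", "might", "appear to", "probably"]


def contains_cue(text: str, cues: list) -> bool:
    """True if any cue occurs as a whole word or phrase (exact forms only)."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(cue) + r"\b", lowered) for cue in cues)


def claim_strength(sentence: str) -> int:
    """Toy labeller: 0 no relation, 1 correlational, 2 conditional causal, 3 direct causal."""
    if contains_cue(sentence, CAUSAL_CUES):
        # A causal cue hedged by a doubt cue counts as conditional causal.
        return 2 if contains_cue(sentence, DOUBT_CUES) else 3
    if contains_cue(sentence, CORRELATIONAL_CUES):
        return 1
    return 0


for example in [
    "We surveyed 200 adults about their diet.",
    "Coffee intake is associated with lower risk.",
    "Coffee intake may reduce the risk of disease.",
    "Exercise can reduce stress.",
]:
    print(claim_strength(example), example)
```

Exact-phrase matching misses inflected forms such as "leads to", which is one reason a learned classifier is the more realistic choice for this task.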
  • 38. Evaluation Dataset Creation
    • Start with the 823 labeled pairs from Sumner et al. 2014 and Bratton et al. 2019 (InSciOut)
    • Collect original abstract text from Semantic Scholar
    • Match original conclusion sentences to paraphrased annotations via ROUGE score
    • Manually inspect and discard missing or incorrect abstracts
    • Final label: compare annotated claim strength ($l_t$ for press release, $l_s$ for abstract): Downplays if $l_t < l_s$, Same if $l_t = l_s$, Exaggerates if $l_t > l_s$
    Total data: 663 pairs (100 training, 553 test)
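The final labelling step on this slide (comparing the annotated claim strength of the press release with that of the abstract) can be sketched directly. The enum and function names are hypothetical; the label values follow the task definitions given earlier in the talk.

```python
from enum import IntEnum


class ClaimStrength(IntEnum):
    NO_RELATION = 0
    CORRELATIONAL = 1
    CONDITIONAL_CAUSAL = 2
    DIRECT_CAUSAL = 3


class Exaggeration(IntEnum):
    DOWNPLAYS = 0
    SAME = 1
    EXAGGERATES = 2


def exaggeration_label(press_release: ClaimStrength, abstract: ClaimStrength) -> Exaggeration:
    """Compare the press release's claim strength with the abstract's."""
    if press_release > abstract:
        return Exaggeration.EXAGGERATES
    if press_release < abstract:
        return Exaggeration.DOWNPLAYS
    return Exaggeration.SAME


# The kind of pair shown on the "Problem" slide: conditional causal abstract,
# direct causal press release.
label = exaggeration_label(ClaimStrength.DIRECT_CAUSAL, ClaimStrength.CONDITIONAL_CAUSAL)
print(label.name)
```

The same comparison is what turns two T2 claim-strength predictions into a final exaggeration prediction.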
  • 39. Few-Shot Learning: Multi-Task PET (MT-PET)
  • 40. PET (Schick et al. 2020)
    • Traditional classifier: input "Eating chocolate causes happiness" → classifier → probabilities over the labels 0, 1, 2, 3
    • PET: input "Eating chocolate causes happiness. The claim strength is [MASK]" → large pretrained language model ℳ → probabilities over the verbalizer tokens (medium, estimated, cautious, distorted)
    • Pattern: transform the input to a cloze-style question
    • Verbalizer: predict tokens from the language model which reflect the data's labels
    • An ensemble of models, one per pattern, assigns soft labels to unlabelled data; a classifier is then trained on these soft labels with a KL-divergence loss
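The pattern and verbalizer idea can be sketched without a language model. In the sketch below the [MASK] logits are made-up numbers standing in for language model output, and the assignment of verbalizer tokens to labels simply follows the order on the slide, which is an assumption: the slide does not state which token maps to which label.

```python
import math
from typing import Dict

# ASSUMPTION: token-to-label assignment follows slide order, for illustration only.
VERBALIZER = {0: "medium", 1: "estimated", 2: "cautious", 3: "distorted"}


def apply_pattern(text: str) -> str:
    """Pattern: turn the input into a cloze-style question."""
    return f"{text} The claim strength is [MASK]"


def label_probabilities(mask_logits: Dict[str, float]) -> Dict[int, float]:
    """Softmax over the verbalizer tokens only; other vocabulary items are ignored."""
    scores = {label: mask_logits[token] for label, token in VERBALIZER.items()}
    peak = max(scores.values())
    exps = {label: math.exp(score - peak) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


prompt = apply_pattern("Eating chocolate causes happiness.")

# Hypothetical logits a masked language model might assign at [MASK].
fake_logits = {"medium": 0.1, "estimated": -1.0, "cautious": 0.3,
               "distorted": 2.0, "banana": 5.0}
probs = label_probabilities(fake_logits)
predicted = max(probs, key=probs.get)
print(prompt)
print(predicted, round(probs[predicted], 3))
```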
  • 41. MT-PET
    • T2 pattern (claim strength): "Eating chocolate causes happiness. The claim strength is [MASK]"; verbalizer tokens: medium, estimated, cautious, distorted
    • T1 pattern (exaggeration): "Scientists claim eating chocolate sometimes causes happiness. Reporters claim eating chocolate causes happiness. The reporters claims are [MASK]"; verbalizer tokens: preliminary, identical, naive
    • As in PET, soft labels on unlabelled data are used to train the final classifier with a KL-divergence loss
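The soft-label step can be illustrated with plain arithmetic. This is a simplified sketch: it averages the class probabilities of several pattern models and scores a student distribution against that average with KL divergence. The published PET procedure combines model outputs differently in its details, so the plain averaging here is an assumption, as are the function names and example numbers.

```python
import math
from typing import List


def average_soft_labels(predictions: List[List[float]]) -> List[float]:
    """Average class probabilities from several pattern models (simplification)."""
    count = len(predictions)
    return [sum(column) / count for column in zip(*predictions)]


def kl_divergence(target: List[float], predicted: List[float], eps: float = 1e-12) -> float:
    """KL(target || predicted) for two discrete distributions."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)


# Hypothetical predictions of three pattern models for one unlabelled pair
# over the T1 labels (downplays, same, exaggerates).
ensemble = [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.3, 0.2, 0.5]]
soft_label = average_soft_labels(ensemble)

uniform_student = [1 / 3, 1 / 3, 1 / 3]
print(soft_label)
print(kl_divergence(soft_label, uniform_student))
```

Training the final classifier amounts to minimising this divergence over the unlabelled examples, so the loss is zero only when the student reproduces the soft labels.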
  • 43. T1 (Exaggeration Detection) with MT-PET
    Method       P       R       F1
    Supervised   28.06   33.10   29.05
    PET          41.90   39.87   39.12
    MT-PET       47.80   47.99   47.35
    • Substantial improvements when using PET (10 points)
    • Further improvements with MT-PET (8 points)
    • Demonstrates transfer of knowledge from claim strength prediction to exaggeration prediction
  • 44. T2 (Claim Strength Prediction) with MT-PET
    200 samples from T2, 100 samples from T1:
    Method       P       R       F1
    Supervised   49.28   51.07   49.03
    PET          55.76   58.58   56.57
    MT-PET       56.68   60.13   57.44
    4500 samples from T2, 100 samples from T1:
    Supervised: P 58.20, R 59.99, F1 58.66; PET and MT-PET values as extracted from the chart: 58.53, 61.84, 60.45, 60.09, 61.11
    MT-PET outperforms PET in both scenarios
  • 45. T2 (Claim Strength Prediction) with MT-PET MT-PET with 200 samples from T2 (F1 57.44) approaches supervised performance with 4,500 samples from T2 (F1 58.66); both settings use 100 samples from T1
  • 46. Error Analysis • All models: • disproportionately misclassify pairs involving direct causal claims • do best for correlational claims from abstracts and for claims from press releases which are correlational or stronger • MT-PET: • helps the most for the most difficult category: causal claims
  • 47. Summary • We formalize the problem of scientific exaggeration detection, providing two task formulations for the problem • We curate a set of benchmark data to evaluate automatic methods for performing the task • We propose MT-PET, a few-shot learning method based on PET, which we demonstrate outperforms strong baselines
  • 51. Supporting the Life Cycle of Research Reviewing Support Citation Analysis Writing Assistance Information Discovery Conducting Experiments Paper Writing Peer Review Research Impact Tracking Information Extraction Summarisation Citation Prediction Credibility Detection (NEW) Reviewer Matching Review Score Prediction Citation Prediction Citation Trend Analysis
  • 52. Overall Take-Aways • Why scholarly document processing? • Supporting the life cycle of research, from information discovery to research impact tracking • Why credibility detection for scholarly communication? • Detect claims which should be backed up by evidence (cite-worthiness detection) • Detect inconsistencies between primary and secondary sources of information (exaggeration detection)
  • 53. Overall Take-Aways • Overarching challenges • Difficult NLP tasks (require understanding of pragmatics) • Domain effects and the importance of context pose further challenges • Not well-studied yet • Scarcity of available benchmarks • Many opportunities for future work • Explore a wider range of settings • Gather more datasets • Methods for domain adaptation & few-shot learning • Tools for journalists & authors
  • 55. Acknowledgements CopeNLU https://copenlu.github.io/ This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 801199. PhD student: Dustin Wright
  • 56. Presented Papers Isabelle Augenstein. Determining the Credibility of Science Communication. SDP workshop, 2021. Dustin Wright, Isabelle Augenstein. CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding. ACL Findings, 2021. Dustin Wright, Isabelle Augenstein. Semi-Supervised Exaggeration Detection of Health Science Press Releases. EMNLP, 2021.