Analyzing language in fake news and political fact checking

Analyzing Language in Fake News and Political Fact-Checking
&
A New Benchmark Dataset for Fake News Detection

IDS Lab.
Motivation
- Automatic fake news detection is a challenging problem in
deception detection, and it has tremendous real-world
political and social impacts.

Approaches for fake news detection
1. (Linguistic) characteristic based approach
2. Knowledge based approach
3. Epidemic model

Politifact.com
- PolitiFact is a site led by Tampa Bay Times journalists who actively
fact-check suspicious statements.

Politifact.com: 6-point scale
Rating | Description
TRUE | The statement is accurate and there’s nothing significant missing.
MOSTLY TRUE | The statement is accurate but needs clarification or additional information.
HALF TRUE | The statement is partially accurate but leaves out important details or takes things out of context.
MOSTLY FALSE | The statement contains an element of truth but ignores critical facts that would give a different impression.
FALSE | The statement is not accurate.
PANTS ON FIRE | The statement is not accurate and makes a ridiculous claim.

Politifact.com – Examples
•Fake news classification is hard to get right, and statements can easily mislead.
•For example:
• Do you think this sentence is true or false?
-> The statement is true as stated, though only because the
speaker hedged their meaning with the quantifier "just".

Preview
• This paper describes both fact-checking and fake
news classification.
• What's the difference between fact-checking and
fake news classification?

Predicting Truthfulness
(Model-based)

Experiments – Settings
1. Model setting
Model | Input
MaxEnt, Naïve Bayes | TF-IDF vectors, or TF-IDF vectors + LIWC* feature vectors
LSTM | GloVe, or GloVe + LIWC feature vectors
2. Data setting
- 4,366 labelled statements that are direct quotes by the original
speaker, from PolitiFact/PunditFact.
Class distribution:
6-class | True 20% | Mostly True 21% | Half True 21% | Mostly False 14% | False 17% | Pants-on-fire 7%
2-class | More True 62% (True, Mostly True, Half True) | More False 38% (Mostly False, False, Pants-on-fire)
*Linguistic Inquiry and Word Count (LIWC)
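
The 2-class setting above groups the six PolitiFact ratings into two coarse classes. A minimal sketch of that grouping follows, assuming the split implied by the percentages on the slide (the first three ratings sum to 62%, the last three to 38%). The names `SIX_CLASS_SHARE`, `MORE_TRUE`, `to_two_class`, and `two_class_share` are hypothetical.

```python
# Share of each rating in the 6-class setting, as listed on the slide (percent).
SIX_CLASS_SHARE = {
    "true": 20, "mostly true": 21, "half true": 21,
    "mostly false": 14, "false": 17, "pants-on-fire": 7,
}

# Assumed grouping, inferred from the 62% / 38% split on the slide.
MORE_TRUE = {"true", "mostly true", "half true"}


def to_two_class(rating: str) -> str:
    """Map a 6-point rating to the coarse 2-class label."""
    key = rating.lower()
    if key not in SIX_CLASS_SHARE:
        raise ValueError(f"unknown rating: {rating!r}")
    return "more true" if key in MORE_TRUE else "more false"


def two_class_share() -> dict:
    """Aggregate the 6-class percentages into 2-class percentages."""
    totals = {"more true": 0, "more false": 0}
    for rating, pct in SIX_CLASS_SHARE.items():
        totals[to_two_class(rating)] += pct
    return totals


print(two_class_share())  # {'more true': 62, 'more false': 38}
```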

Experiments – Results
Model | Feature | 2-CLASS | 6-CLASS
Majority Baseline | | 0.39 | 0.06
Naïve Bayes | Text+LIWC | 0.56 | 0.17
MaxEnt | Text+LIWC | 0.55 | 0.22
LSTM | Text+LIWC | 0.52 | 0.19
LSTM | Text | 0.56 | 0.20
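
The slide does not name the evaluation metric. As a sanity check, the majority-baseline scores are roughly what macro-averaged F1 would give for a classifier that always predicts the largest class. The sketch below works through that arithmetic under this assumption, using the class shares from the settings slide (which may differ slightly from the evaluation split). The function name `majority_macro_f1` is hypothetical.

```python
def majority_macro_f1(class_shares):
    """Macro-F1 of a classifier that always predicts the largest class.

    class_shares: fraction of examples in each class (sums to 1).
    The predicted class has recall 1 and precision equal to its share;
    every other class is never predicted, so its F1 is 0.
    """
    precision = max(class_shares)
    recall = 1.0
    f1_majority = 2 * precision * recall / (precision + recall)
    # Average over all classes; the non-majority classes contribute 0.
    return f1_majority / len(class_shares)


six = majority_macro_f1([0.20, 0.21, 0.21, 0.14, 0.17, 0.07])
two = majority_macro_f1([0.62, 0.38])
print(round(six, 2), round(two, 2))  # 0.06 0.38 (slide reports 0.06 and 0.39)
```

The small gap on the 2-class value is what one would expect if the evaluation split is slightly more skewed than the overall distribution, but that is a guess rather than something the slides confirm.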

Fake News Analysis
(Linguistic-based)

Fake News Analysis (Linguistic-based)
1. The different types of articles
Article type | Description
Satire | Mimics real news but still cues the reader that it is not meant to be taken seriously
Hoax | Convinces readers of the validity of a paranoia-fueled story
Propaganda | Misleads readers so that they believe a particular political/social agenda

Linguistic features in fake news.
(1) First-person and second-person pronouns are used more in less
reliable or deceptive news types.
- Editors at trustworthy sources are possibly more rigorous about
removing language that seems too personal.
Lexicon marker | Ratio | Text example | Max
Swear (LIWC) | 7.00 | ... Ms. Rand, who has been damned to eternal torment ... | S
2nd pers (You) | 6.73 | You would instinctively justify and rationalize your ... | P
Modal adverb | 2.63 | ... investigation of Hillary Clinton was inevitably linked ... | S
Action adverb | 2.18 | ... if one foolishly assumes the US State Department ... | S
1st pers singular (I) | 2.06 | I think its against the law of the land to finance riots ... | S
Manner adverb | 1.87 | ... consequences of deliberately engineering extinction. | S
Sexual | 1.80 | ... added that his daughter better not be pregnant. | S
See (LIWC) | 1.52 | New Yorkers ... can bask in the beautiful image ... | H
Max: the news type in which the marker is most prominent (S = satire, H = hoax, P = propaganda).

(2) Words that can be used to exaggerate (subjectives, superlatives,
and modal adverbs) are all used more by fake news.
In contrast, words used to offer concrete figures (comparatives,
money, and numbers) appear more in truthful news.
Lexicon marker | Ratio | Text example | Max
Strong subjective | 1.51 | He has one of the most brilliant minds in basketball. | H
Superlatives | 1.17 | Fresh water is the single most important natural ... | P
Number (LIWC) | 0.43 | ... 7 million foreign tourists coming to the country in 2010 | S
Money (LIWC) | 0.57 | He has proposed to lift the state sales tax on groceries | P
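
The Ratio column appears to compare how often a lexicon category occurs in unreliable text versus trusted text, so values above 1 lean unreliable and values below 1 lean trusted. A minimal sketch of such a ratio follows, assuming it is a quotient of per-token frequencies. The tiny word list and the two sample texts are made up for illustration; they are not the LIWC lexicon or the paper's data, and the function names are hypothetical.

```python
import re


def category_rate(text, lexicon):
    """Fraction of tokens in `text` that belong to `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def frequency_ratio(unreliable_text, trusted_text, lexicon):
    """Category rate in unreliable text divided by the rate in trusted text."""
    trusted_rate = category_rate(trusted_text, lexicon)
    if trusted_rate == 0:
        raise ValueError("category never occurs in the trusted text")
    return category_rate(unreliable_text, lexicon) / trusted_rate


# Hypothetical second-person lexicon and toy texts (illustration only).
SECOND_PERSON = {"you", "your", "yours"}
unreliable = "You would instinctively justify your choice if you could"
trusted = "The senator said your taxes would fall under the new plan"

ratio = frequency_ratio(unreliable, trusted, SECOND_PERSON)
print(ratio > 1)  # True: the category is more frequent in the unreliable toy text
```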

(3) One distinctive feature of satire compared to other types of
untrusted news is its prominent use of adverbs: the modal, action, and
manner adverb markers are all most prominent in satire (S).

News Reliability Prediction
- Classifying each news article into one of four reliability categories:
trusted, satire, hoax, or propaganda.
- Using a maximum-entropy (MaxEnt) classifier with L2 regularization on
n-gram TF-IDF feature vectors.
Data | Sources | Random | MaxEnt
Dev | In-domain | 0.26 | 0.91
Test | Out-of-domain | 0.26 | 0.65
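
The feature side of this setup can be sketched without any library. The block below builds word n-gram TF-IDF vectors for a toy corpus. The n-gram range, tokenization, and TF-IDF weighting used in the paper are not given on the slide, so the choices here (unigrams and bigrams, raw term counts, smoothed log IDF) are assumptions, the function names are hypothetical, and the classifier itself is left out.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from whitespace tokens."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf_vectors(docs, n_max=2):
    """Build a sorted vocabulary and one dense TF-IDF vector per document."""
    counts = [Counter(word_ngrams(doc, n_max)) for doc in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = {g: sum(g in c for c in counts) for g in vocab}
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in vocab}
    vectors = [[c[g] * idf[g] for g in vocab] for c in counts]
    return vocab, vectors


# Toy documents, made up for illustration.
docs = [
    "breaking news you will not believe",
    "the city council approved the budget",
]
vocab, vectors = tfidf_vectors(docs)
print(len(vocab), len(vectors[0]))
```

Vectors like these would then be fed to an L2-regularized classifier; any off-the-shelf logistic regression would be a reasonable stand-in for MaxEnt.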

Introduction to the fake news dataset
(LIAR)

Liar, Liar Pants on Fire
- Statistical approaches to combating fake news have been dramatically
limited by the lack of labeled benchmark datasets.
- The LIAR dataset consists of 12.8K manually labeled short statements,
collected over a decade in various contexts from POLITIFACT.COM.
- The LIAR dataset also includes a rich set of meta-data for each speaker
(party affiliations, current job, home state, and credit history).
EX) An excerpt from the LIAR dataset
Statement | “The last quarter, it was just announced, our gross domestic product was below zero. Who ever heard of this? Its never below zero.”
Speaker | Donald Trump
Context | presidential announcement speech
Label | Pants on Fire
Justification | According to Bureau of Economic Analysis and National Bureau of Economic Research, the growth in the gross domestic product has been below zero 42 times over 68 years. Thats a lot more than “never.” We rate his claim Pants on Fire!
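
One way to picture a LIAR record is as a small typed structure. The sketch below is illustrative only: the class name, field names, and label spellings are hypothetical and follow the slide's wording rather than the dataset's actual column names or label strings, and the meta-data fields are left unset because the slide does not show their values for this excerpt.

```python
from dataclasses import dataclass
from typing import Optional

# Label names taken from the 6-point scale on the slides (assumed spelling).
LABELS = (
    "true", "mostly true", "half true",
    "mostly false", "false", "pants on fire",
)


@dataclass
class LiarRecord:
    statement: str
    speaker: str
    context: str
    label: str
    justification: str = ""
    # Speaker meta-data mentioned on the slide; None when unknown.
    party: Optional[str] = None
    job: Optional[str] = None
    state: Optional[str] = None
    credit_history: Optional[dict] = None

    def __post_init__(self):
        # Normalize the label and reject anything outside the 6-point scale.
        self.label = self.label.lower()
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


example = LiarRecord(
    statement=(
        "The last quarter, it was just announced, our gross domestic "
        "product was below zero. Who ever heard of this? Its never below zero."
    ),
    speaker="Donald Trump",
    context="presidential announcement speech",
    label="Pants on Fire",
)
print(example.label)  # pants on fire
```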

Proposed Model
- Hybrid Convolutional Neural Networks framework for integrating
text and meta-data.
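
The slide gives no architectural detail beyond a hybrid CNN over text and meta-data. As a rough, simplified illustration of the general idea, the sketch below applies 1-D convolution filters with ReLU and max-over-time pooling to toy word embeddings, then concatenates the pooled text features with a meta-data vector. All sizes, weights, and helper names are made up. This is not the model evaluated in the paper, which would at least add learned embeddings, many filters, and a trained output layer.

```python
def conv1d_max_pool(embeddings, kernel):
    """Slide one filter over the token sequence and keep the max response.

    embeddings: list of token vectors (each a list of floats).
    kernel: list of `width` weight vectors, same dimension as the tokens.
    """
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("sequence shorter than the filter width")
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = sum(
            w * x
            for k_vec, tok in zip(kernel, window)
            for w, x in zip(k_vec, tok)
        )
        responses.append(max(0.0, total))  # ReLU
    return max(responses)  # max-over-time pooling


def hybrid_features(text_embeddings, kernels, meta_vector):
    """Concatenate pooled text features with a meta-data vector."""
    text_features = [conv1d_max_pool(text_embeddings, k) for k in kernels]
    return text_features + list(meta_vector)


# Toy inputs (illustration only): three 2-d token vectors, two filters of
# width 2, and a one-hot meta-data vector standing in for e.g. party.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
meta = [0.0, 1.0, 0.0]

features = hybrid_features(tokens, kernels, meta)
print(features)  # [2.0, 0.0, 0.0, 1.0, 0.0]
```

In a full model the concatenated vector would feed a classification layer over the six labels.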

Proposed Model – results
Model | Valid. | Test
Majority | 0.204 | 0.208
SVMs | 0.258 | 0.255
Logistic Regression | 0.257 | 0.247
Bi-LSTMs | 0.223 | 0.233
CNNs | 0.260 | 0.270
Hybrid CNNs:
Text + Subject | 0.263 | 0.235
Text + Speaker | 0.277 | 0.248
Text + Job | 0.270 | 0.258
Text + State | 0.246 | 0.256
Text + Party | 0.259 | 0.248
Text + Context | 0.251 | 0.243
Text + History | 0.246 | 0.241
Text + All | 0.247 | 0.274

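Conclusion
•Interest in fake news detection is steadily growing, but research using
machine learning (deep learning) is still scarce.
•Fake news detection and fact-checking tend to be confused with each other.
- They require different approaches.
•For fact-checking, a linguistic approach does not achieve high performance.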

A Korean LIAR dataset?
•http://factcheck.snu.ac.kr/v2/facts?part=all
