3. IDS Lab.
Motivation
- Automatic fake news detection is a challenging problem in
deception detection, and it has tremendous real-world
political and social impacts.
Approaches for fake news detection
1. Linguistic characteristic-based approach
2. Knowledge-based approach
3. Epidemic model
Politifact.com: 6-point scale
Rating          Description
TRUE            The statement is accurate and there’s nothing significant missing.
MOSTLY TRUE     The statement is accurate but needs clarification or additional information.
HALF TRUE       The statement is partially accurate but leaves out important details or takes things out of context.
MOSTLY FALSE    The statement contains an element of truth but ignores critical facts that would give a different impression.
FALSE           The statement is not accurate.
PANTS ON FIRE   The statement is not accurate and makes a ridiculous claim.
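A classifier needs these six ratings as ordered class labels. The sketch below is one possible encoding, not something the slides prescribe: the integer values, the `parse_rating` helper, and the binary collapse in `is_mostly_accurate` are all assumptions made for illustration.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Politifact's six ratings, ordered from least to most truthful."""
    PANTS_ON_FIRE = 0
    FALSE = 1
    MOSTLY_FALSE = 2
    HALF_TRUE = 3
    MOSTLY_TRUE = 4
    TRUE = 5


def parse_rating(text: str) -> Rating:
    """Map a rating string such as 'Mostly True' to its enum member."""
    key = text.strip().upper().replace(" ", "_")
    return Rating[key]


def is_mostly_accurate(rating: Rating) -> bool:
    """Hypothetical binary collapse: HALF TRUE and above count as accurate."""
    return rating >= Rating.HALF_TRUE
```

Keeping the order in the enum makes it possible to compare ratings or to treat the task as ordinal rather than purely categorical.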
Politifact.com – Examples
•Fake news classification is hard to resolve, and it is easy to be misled.
•For example:
• Do you think this sentence is true or false?
-> The statement is true as stated, though only because the
speaker hedged their meaning with the quantifier "just".
Preview
• This paper describes both fact-checking and fake
news classification.
• What's the difference between fact-checking and
fake news classification?
Fake News Analysis (Linguistic-based)
1. The different types of articles
Article type   Description
Satire         Mimics real news but still cues the reader that it is not meant to be taken seriously
Hoax           Convinces readers of the validity of a paranoia-fueled story
Propaganda     Misleads readers so that they believe a particular political/social agenda
Linguistic features in fake news.
(1) First-person and second-person pronouns are used more in less
reliable or deceptive news types.
- Editors at trustworthy sources are possibly more rigorous about
removing language that seems too personal.
Max: fake news type with the most extreme ratio (S = satire, H = hoax, P = propaganda).
Lexicon marker          Ratio   Text example                                                     Max
Swear (LIWC)            7.00    ... Ms. Rand, who has been damned to eternal torment ...         S
2nd pers (You)          6.73    You would instinctively justify and rationalize your ...         P
Modal adverb            2.63    ... investigation of Hillary Clinton was inevitably linked ...   S
Action adverb           2.18    ... if one foolishly assumes the US State Department ...         S
1st pers singular (I)   2.06    I think its against the law of the land to finance riots ...     S
Manner adverb           1.87    ... consequences of deliberately engineering extinction.         S
Sexual                  1.80    ... added that his daughter better not be pregnant.              S
See (LIWC)              1.52    New Yorkers ... can bask in the beautiful image ...              H
Linguistic features in fake news.
(2) Words that can be used to exaggerate (subjectives, superlatives,
and modal adverbs) are all used more by fake news.
In contrast, words used to offer concrete figures (comparatives,
money, and numbers) appear more in truthful news.
Lexicon marker      Ratio   Text example                                                     Max
Strong subjective   1.51    He has one of the most brilliant minds in basketball.            H
Superlatives        1.17    Fresh water is the single most important natural ...             P
Number (LIWC)       0.43    ... 7 million foreign tourists coming to the country in 2010     S
Money (LIWC)        0.57    He has proposed to lift the state sales tax on groceries         P
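The Ratio column can be read as how often a lexicon's words appear in untrusted news relative to trusted news, so values above 1 mark features of fake news and values below 1 mark features of truthful news. The sketch below shows one way such a ratio could be computed. The tiny `SECOND_PERSON` lexicon, the tokenizer, and the example sentences are invented for illustration; the slides rely on LIWC and other lexicons that are not reproduced here.

```python
import re
from statistics import mean

# Hypothetical toy lexicon standing in for a real one such as LIWC.
SECOND_PERSON = {"you", "your", "yours", "yourself"}


def lexicon_rate(text, lexicon):
    """Fraction of tokens in `text` that belong to `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def marker_ratio(untrusted_docs, trusted_docs, lexicon):
    """Average lexicon rate in untrusted docs divided by the rate in trusted docs."""
    untrusted = mean(lexicon_rate(d, lexicon) for d in untrusted_docs)
    trusted = mean(lexicon_rate(d, lexicon) for d in trusted_docs)
    # Raises ZeroDivisionError if the trusted docs never use the lexicon.
    return untrusted / trusted


# Invented example sentences, not taken from any corpus.
untrusted_docs = ["You would not believe what your leaders did", "They lied to you"]
trusted_docs = ["The senate passed the bill and you can read it online today"]
ratio = marker_ratio(untrusted_docs, trusted_docs, SECOND_PERSON)
```

On this toy input the ratio comes out at about 3, meaning second-person words are roughly three times as frequent in the untrusted sentences.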
Linguistic features in fake news.
(3) We found that one distinctive feature of satire compared to other
types of untrusted news is its prominent use of adverbs.
News Reliability Prediction
- Predicting the reliability of a news article as one of four categories:
trusted, satire, hoax, or propaganda.
- Using a Max-Entropy (MaxEnt) classifier with L2 regularization on n-gram
tf-idf feature vectors.
Data   Sources         Random   MaxEnt
Dev    In-domain       0.26     0.91
Test   Out-of-domain   0.26     0.65
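To make the pipeline concrete, here is a minimal pure-Python sketch of n-gram tf-idf features feeding a MaxEnt (multinomial logistic regression) classifier with an L2 penalty. It is not the authors' implementation: the word uni/bigram choice, the smoothed idf formula, the learning rate, the penalty strength, and the four training sentences are all assumptions chosen so the example runs in a fraction of a second.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Word n-grams from unigrams up to n_max-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in training."""
    df = Counter(g for d in docs for g in set(ngrams(d)))
    return {g: math.log((1 + len(docs)) / (1 + c)) + 1 for g, c in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalised tf-idf vector; unseen n-grams are dropped."""
    tf = Counter(g for g in ngrams(doc) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def predict_proba(w, x):
    """Softmax over per-label linear scores."""
    scores = {lab: sum(wl.get(g, 0.0) * v for g, v in x.items()) for lab, wl in w.items()}
    top = max(scores.values())
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    z = sum(exps.values())
    return {lab: e / z for lab, e in exps.items()}


def train_maxent(X, y, labels, l2=0.01, lr=1.0, epochs=200):
    """Batch gradient descent on cross-entropy plus an L2 penalty."""
    w = {lab: {} for lab in labels}
    for _ in range(epochs):
        grad = {lab: Counter() for lab in labels}
        for x, target in zip(X, y):
            p = predict_proba(w, x)
            for lab in labels:
                err = p[lab] - (lab == target)
                for g, v in x.items():
                    grad[lab][g] += err * v / len(X)
        for lab in labels:
            for g in set(w[lab]) | set(grad[lab]):
                old = w[lab].get(g, 0.0)
                w[lab][g] = old - lr * (grad[lab][g] + l2 * old)
    return w


def predict(w, x):
    p = predict_proba(w, x)
    return max(p, key=p.get)


# Invented toy corpus, one sentence per category.
train_docs = ["officials confirmed the budget figures on tuesday",
              "area man heroically eats entire pizza alone",
              "shocking secret they do not want you to know",
              "the elite globalists are destroying our great nation"]
train_labels = ["trusted", "satire", "hoax", "propaganda"]
idf = fit_idf(train_docs)
weights = train_maxent([tfidf(d, idf) for d in train_docs], train_labels,
                       sorted(set(train_labels)))
guess = predict(weights, tfidf("you do not want to know", idf))
```

A classifier like this leans on vocabulary seen during training, which is one plausible reading of the drop from 0.91 in-domain to 0.65 out-of-domain in the table.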
Liar, Liar Pants on Fire
- Statistical approaches to combating fake news have been dramatically
limited by the lack of labeled benchmark datasets.
- The LIAR dataset consists of 12.8K manually labeled short statements,
collected over a decade in various contexts from POLITIFACT.COM.
- The LIAR dataset also includes rich metadata for each speaker
(party affiliation, current job, home state, and credit history).
EX) A random excerpt from the LIAR dataset:
Statement: “The last quarter, it was just announced, our gross domestic product was below
zero. Who ever heard of this? Its never below zero.”
Speaker: Donald Trump
Context: presidential announcement speech
Label: Pants on Fire
Justification: According to Bureau of Economic Analysis and National Bureau of Economic
Research, the growth in the gross domestic product has been below zero 42
times over 68 years. Thats a lot more than “never.” We rate his claim Pants on
Fire!
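A record like the excerpt above could be held in a small data structure. The sketch below uses only the fields named on the slide; the class name `LiarStatement`, the field names, their types, and the empty defaults are assumptions, and the released dataset may lay its columns out differently.

```python
from dataclasses import dataclass, field


@dataclass
class LiarStatement:
    """One labeled statement plus the speaker metadata listed on the slide."""
    statement: str
    speaker: str
    context: str
    label: str
    justification: str = ""
    # Speaker metadata; names, types, and empty defaults are assumptions.
    party: str = ""
    job: str = ""
    state: str = ""
    credit_history: dict = field(default_factory=dict)


example = LiarStatement(
    statement=("The last quarter, it was just announced, our gross domestic "
               "product was below zero. Who ever heard of this? "
               "Its never below zero."),
    speaker="Donald Trump",
    context="presidential announcement speech",
    label="Pants on Fire",
)
```

Fields the slide does not fill in for this excerpt (party, job, state, credit history) are left empty rather than guessed.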
Proposed Model
- A hybrid convolutional neural network (CNN) framework that integrates
text and metadata.
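The idea of combining a text branch with a metadata branch can be sketched as a toy forward pass. This is a heavily simplified illustration with untrained random weights, not the proposed model: the layer sizes, the averaged metadata embedding, the `tanh` projection, and the `speaker:`/`context:` metadata strings are all assumptions, and the slide does not give the actual architecture details.

```python
import math
import random

random.seed(0)
EMB, FILTERS, WIDTH, META_DIM, CLASSES = 8, 4, 3, 5, 6  # hypothetical sizes


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def embed(token, table):
    """Look up (or lazily create) a random embedding for a token."""
    if token not in table:
        table[token] = [random.uniform(-0.5, 0.5) for _ in range(EMB)]
    return table[token]


def conv_max_pool(vectors, filters):
    """1-D convolution over word windows, ReLU, then max-over-time pooling."""
    pooled = []
    for f in filters:  # each filter holds WIDTH * EMB weights
        best = 0.0  # ReLU floor; also the result when the text is too short
        for i in range(len(vectors) - WIDTH + 1):
            window = [x for vec in vectors[i:i + WIDTH] for x in vec]
            best = max(best, sum(w * x for w, x in zip(f, window)))
        pooled.append(best)
    return pooled


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


word_table, meta_table = {}, {}
filters = rand_matrix(FILTERS, WIDTH * EMB)
meta_proj = rand_matrix(META_DIM, EMB)
out_layer = rand_matrix(CLASSES, FILTERS + META_DIM)


def forward(statement, metadata):
    """Return six class probabilities for a statement and its metadata tokens."""
    words = [embed(t, word_table) for t in statement.lower().split()]
    text_feat = conv_max_pool(words, filters)
    # Simplified metadata branch: average the embeddings, then project.
    metas = [embed(m, meta_table) for m in metadata]
    avg = [sum(col) / len(metas) for col in zip(*metas)]
    meta_feat = [math.tanh(sum(w * x for w, x in zip(row, avg))) for row in meta_proj]
    joint = text_feat + meta_feat  # concatenate the two branches
    return softmax([sum(w * x for w, x in zip(row, joint)) for row in out_layer])


probs = forward("our gross domestic product was below zero",
                ["speaker:donald-trump", "context:speech"])
```

The point of the sketch is the concatenation step: text features and metadata features are learned separately and only meet in the final classification layer.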
Proposed Model – results
Model                 Valid.   Test
Majority              0.204    0.208
SVMs                  0.258    0.255
Logistic Regression   0.257    0.247
Bi-LSTMs              0.223    0.233
CNNs                  0.260    0.270
Hybrid CNNs
  Text + Subject      0.263    0.235
  Text + Speaker      0.277    0.248
  Text + Job          0.270    0.258
  Text + State        0.246    0.256
  Text + Party        0.259    0.248
  Text + Context      0.251    0.243
  Text + History      0.246    0.241
  Text + All          0.247    0.274
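The Majority row is the reference point for every other model in the table. Assuming the scores are accuracies, which the slide does not state, a majority baseline can be sketched as follows; the label strings and the toy label lists are invented and are not LIAR data.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test item."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(label == majority_label for label in test_labels)
    return majority_label, hits / len(test_labels)


# Invented toy labels for illustration only.
train = ["half-true", "false", "half-true", "true", "mostly-true", "half-true"]
test = ["false", "half-true", "true", "half-true", "pants-fire"]
label, acc = majority_baseline_accuracy(train, test)
```

With six roughly balanced classes such a baseline sits near 0.2, so the reported gains of a few points over it show how hard the six-way task is.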
Conclusion
•Interest in fake news detection keeps growing, but research using machine
learning (deep learning) is still limited.
•Fake news detection and fact-checking tend to be confused with each other.
- They require different approaches.
• Linguistic approaches do not achieve high performance on fact-checking.
A Korean LIAR dataset?
•http://factcheck.snu.ac.kr/v2/facts?part=all
Editor's Notes
Linguistic characteristics
Knowledge-based approach
Epidemic model
Attach the example from the paper
Explain the site
Modeling of statements
Extraction from articles
Manner adverb: an adverb that expresses manner, method, and the like (e.g., carefully, fast, so, how).