Analyzing Language in Fake News and Political
Fact-Checking
&
A New Benchmark Dataset for Fake News
Detection
Intro
IDS Lab.
Motivation
- Automatic fake news detection is a challenging problem in
deception detection, and it has tremendous real-world
political and social impacts.
Approaches for fake news detection
1. (Linguistic) characteristic based approach
2. Knowledge based approach
3. Epidemic model
Politifact.com
- PolitiFact is a site led by Tampa Bay Times journalists who actively
fact-check suspicious statements.
Politifact.com: 6-point scale
Rating          Description
TRUE            The statement is accurate and there’s nothing significant missing.
MOSTLY TRUE     The statement is accurate but needs clarification or additional information.
HALF TRUE       The statement is partially accurate but leaves out important details or takes things out of context.
MOSTLY FALSE    The statement contains an element of truth but ignores critical facts that would give a different impression.
FALSE           The statement is not accurate.
PANTS ON FIRE   The statement is not accurate and makes a ridiculous claim.
Politifact.com – Examples
•Fake news classification is hard: statements are difficult to judge, and it is easy to be misled.
•For example:
• Do you think this sentence is true or false?
-> The statement is true as stated, though only because the
speaker hedged their meaning with the quantifier "just".
Preview
• This paper describes both fact-checking and fake
news classification.
• What's the difference between fact-checking and
fake news classification?
Predicting Truthfulness
(Model-based)
Experiments – Settings
1. Model setting
Model          Input
MaxEnt         TF-IDF vectors
Naïve Bayes    TF-IDF vectors + LIWC* feature vectors
LSTM           GloVe; GloVe + LIWC feature vectors
*Linguistic Inquiry and Word Count (LIWC)
2. Data setting
- 4,366 labelled statements that are direct quotes by the original
speaker, from PolitiFact/PunditFact.
Label distribution:
           More True                         More False
           True   Mostly True   Half True    Mostly False   False   Pants-on-fire
6-class    20%    21%           21%          14%            17%     7%
2-class    62%                               38%
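The relation between the 6-class and 2-class rows can be checked with a short sketch. It assumes, as the "More True" / "More False" grouping on the slide suggests, that the first three ratings collapse into one class and the last three into the other. The label strings and function names are hypothetical and chosen only for illustration.

```python
# Six PolitiFact ratings with their shares (in percent) as listed on the slide.
SIX_CLASS_SHARE = {
    "true": 20,
    "mostly-true": 21,
    "half-true": 21,
    "mostly-false": 14,
    "false": 17,
    "pants-on-fire": 7,
}

# Assumption: the first three ratings form "more true", the rest "more false".
MORE_TRUE = {"true", "mostly-true", "half-true"}


def to_two_class(rating: str) -> str:
    """Collapse a 6-point rating into the binary label."""
    if rating not in SIX_CLASS_SHARE:
        raise ValueError(f"unknown rating: {rating!r}")
    return "more-true" if rating in MORE_TRUE else "more-false"


def two_class_share(six_class_share: dict[str, int]) -> dict[str, int]:
    """Sum the 6-class percentages into the two coarse classes."""
    totals = {"more-true": 0, "more-false": 0}
    for rating, pct in six_class_share.items():
        totals[to_two_class(rating)] += pct
    return totals


print(two_class_share(SIX_CLASS_SHARE))  # {'more-true': 62, 'more-false': 38}
```

The sums reproduce the 62% / 38% split in the 2-class row, which supports reading that row as a simple merge of the six ratings.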
Experiments – Results
Model               Feature     2-CLASS   6-CLASS
Majority Baseline   -           0.39      0.06
Naïve Bayes         Text+LIWC   0.56      0.17
MaxEnt              Text+LIWC   0.55      0.22
LSTM                Text+LIWC   0.52      0.19
LSTM                Text        0.56      0.20
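The slide does not name the metric behind these scores. As a sanity check only, and under the assumption that they are macro-averaged F1, the sketch below computes what a majority-class predictor would score on the label shares from the settings slide. The function name is hypothetical.

```python
def majority_macro_f1(class_shares: list[float]) -> float:
    """Macro-averaged F1 when every item is predicted as the largest class.

    class_shares holds the fraction of items in each class.
    """
    precision = max(class_shares)   # share of predictions that are correct
    recall = 1.0                    # every majority-class item is recovered
    f1_majority = 2 * precision * recall / (precision + recall)
    # No other class is ever predicted, so each of them contributes F1 = 0.
    return f1_majority / len(class_shares)


two_class = [0.62, 0.38]
six_class = [0.20, 0.21, 0.21, 0.14, 0.17, 0.07]

print(round(majority_macro_f1(two_class), 2))  # 0.38
print(round(majority_macro_f1(six_class), 2))  # 0.06
```

Both values are close to the Majority Baseline row (0.39 and 0.06). The small gap in the 2-class case could come from rounded shares or from a different evaluation split, so this is a plausibility check and not a confirmation of the metric.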
Fake News Analysis
(Linguistic-based)
Fake News Analysis (Linguistic-based)
1. The different types of articles
Article type   Description
Satire         Mimics real news but still cues the reader that it is not meant to be taken seriously
Hoax           Convinces readers of the validity of a paranoia-fueled story
Propaganda     Misleads readers so that they believe a particular political/social agenda
Linguistic features in fake news.
(1) First-person and second-person pronouns are used more in less
reliable or deceptive news types.
- Editors at trustworthy sources are possibly more rigorous about
removing language that seems too personal.
(Max: the article type with the highest ratio; S = Satire, H = Hoax, P = Propaganda)
Lexicon markers         Ratio   Text example                                                     Max
Swear (LIWC)            7.00    ... Ms. Rand, who has been damned to eternal torment ...         S
2nd pers (You)          6.73    You would instinctively justify and rationalize your ...         P
Modal adverb            2.63    ... investigation of Hillary Clinton was inevitably linked ...   S
Action adverb           2.18    ... if one foolishly assumes the US State Department ...         S
1st pers singular (I)   2.06    I think its against the law of the land to finance riots ...     S
Manner adverb           1.87    ... consequences of deliberately engineering extinction.         S
Sexual                  1.80    ... added that his daughter better not be pregnant.              S
See (LIWC)              1.52    New Yorkers ... can bask in the beautiful image ...              H
Linguistic features in fake news.
(2) Words that can be used to exaggerate – subjectives, superlatives,
and modal adverbs – are all used more by fake news.
In contrast, words used to offer concrete figures – comparatives,
money, and numbers – appear more in truthful news.
Lexicon markers     Ratio   Text example                                                   Max
Strong subjective   1.51    He has one of the most brilliant minds in basketball.          H
Superlatives        1.17    Fresh water is the single most important natural ...           P
Number (LIWC)       0.43    ... 7 million foreign tourists coming to the country in 2010   S
Money (LIWC)        0.57    He has proposed to lift the state sales tax on groceries       P
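The slides do not define how the Ratio column is computed. A minimal sketch, assuming it is the average per-document rate of lexicon words in unreliable text divided by the same rate in trusted text, might look as follows. The tiny lexicon, the toy documents, and all function names are hypothetical; a real analysis would use the LIWC and related lexicons.

```python
import re


def lexicon_rate(docs: list[str], lexicon: set[str]) -> float:
    """Average fraction of tokens per document that belong to the lexicon."""
    rates = []
    for doc in docs:
        tokens = re.findall(r"[a-z']+", doc.lower())
        if tokens:
            hits = sum(1 for t in tokens if t in lexicon)
            rates.append(hits / len(tokens))
    return sum(rates) / len(rates) if rates else 0.0


def usage_ratio(unreliable: list[str], trusted: list[str], lexicon: set[str]) -> float:
    """Above 1: the marker is more frequent in unreliable text; below 1: in trusted text."""
    trusted_rate = lexicon_rate(trusted, lexicon)
    if trusted_rate == 0.0:
        raise ValueError("lexicon never occurs in the trusted documents")
    return lexicon_rate(unreliable, lexicon) / trusted_rate


# Toy data, invented for illustration only.
SECOND_PERSON = {"you", "your", "yours"}
unreliable_docs = ["You would justify your choice", "They want you to panic"]
trusted_docs = ["The senator said you can apply online today",
                "The budget rose by seven percent"]

ratio = usage_ratio(unreliable_docs, trusted_docs, SECOND_PERSON)
print(round(ratio, 2))  # 4.8
```

Under this reading, markers such as Number (0.43) and Money (0.57) fall below 1 because they occur more often in trusted news, while Swear (7.00) sits far above 1.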
Linguistic features in fake news.
(3) We found that one distinctive feature of satire compared to other
types of untrusted news is its prominent use of adverbs.
- In the lexicon-marker table under (1), the modal, action, and manner
adverb rows all have Max = S (satire).
News Reliability Prediction
- Classifying each news article into one of four reliability categories:
trusted, satire, hoax, or propaganda.
- Using a Max-Entropy classifier with L2 regularization on n-gram TF-IDF
feature vectors.
Data   Sources         Random   MaxEnt
Dev    In-domain       0.26     0.91
Test   Out-of-domain   0.26     0.65
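The classifier described above can be sketched with the standard library alone, so that the mechanics stay visible. This is not the authors' implementation: the tokenization, smoothing, learning rate, penalty strength, and the four toy documents (covering only two of the four categories) are all assumptions made for illustration, and in practice an established machine-learning library would be the usual choice.

```python
import math
from collections import Counter


def ngrams(text: str, n_max: int = 2) -> list[str]:
    """Word n-grams up to length n_max (unigrams and bigrams by default)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency for every n-gram seen in training."""
    df = Counter(g for d in docs for g in set(ngrams(d)))
    return {g: math.log((1 + len(docs)) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf(doc: str, idf: dict[str, float]) -> dict[str, float]:
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are dropped."""
    tf = Counter(g for g in ngrams(doc) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def predict_proba(w, labels, x):
    """Softmax over per-class linear scores (the MaxEnt model)."""
    scores = [sum(w[y].get(g, 0.0) * v for g, v in x.items()) for y in labels]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_maxent(X, y, labels, l2=0.01, lr=0.5, epochs=200):
    """Stochastic gradient descent on cross-entropy with an L2 penalty.

    The penalty is applied only to features active in the current example.
    """
    w = {lab: {} for lab in labels}
    for _ in range(epochs):
        for x, gold in zip(X, y):
            probs = predict_proba(w, labels, x)
            for lab, p in zip(labels, probs):
                err = p - (1.0 if lab == gold else 0.0)
                for g, v in x.items():
                    old = w[lab].get(g, 0.0)
                    w[lab][g] = old - lr * (err * v + l2 * old)
    return w


# Toy training set, invented for illustration (two of the four categories).
train_docs = [
    "officials confirmed the budget figures on tuesday",
    "the ministry published the quarterly report",
    "area man heroically eats entire pizza alone",
    "local cat elected mayor in landslide victory",
]
train_labels = ["trusted", "trusted", "satire", "satire"]
labels = ["trusted", "satire"]

idf = fit_idf(train_docs)
X = [tfidf(d, idf) for d in train_docs]
w = train_maxent(X, train_labels, labels)


def classify(doc: str) -> str:
    probs = predict_proba(w, labels, tfidf(doc, idf))
    return labels[probs.index(max(probs))]


print([classify(d) for d in train_docs])
```

Fitting the training documents says nothing about generalization. The drop from 0.91 in-domain to 0.65 out-of-domain in the table is a reminder that n-gram features can latch onto source-specific vocabulary.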
Introduction to the fake news dataset (LIAR)
Liar, Liar Pants on Fire
- Statistical approaches to combating fake news have been dramatically
limited by the lack of labeled benchmark datasets.
- The LIAR dataset consists of 12.8K manually labeled short statements,
collected over a decade in various contexts from POLITIFACT.COM.
- The LIAR dataset also includes a rich set of meta-data for each speaker
(party affiliations, current job, home state, and credit history).
EX) A random excerpt from the LIAR dataset
Statement       “The last quarter, it was just announced, our gross domestic product was below zero. Who ever heard of this? Its never below zero.”
Speaker         Donald Trump
Context         presidential announcement speech
Label           Pants on Fire
Justification   According to Bureau of Economic Analysis and National Bureau of Economic Research, the growth in the gross domestic product has been below zero 42 times over 68 years. Thats a lot more than “never.” We rate his claim Pants on Fire!
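One way to hold such a record in code is a small dataclass, sketched below. The field names follow the rows of the excerpt and the meta-data listed on the slide, but they are hypothetical and are not the column names of the released dataset. The shape of `credit_history` is also an assumption, since the slide does not describe it.

```python
from dataclasses import dataclass
from typing import Optional

# The six ratings from the PolitiFact scale shown earlier in the deck.
RATINGS = ("TRUE", "MOSTLY TRUE", "HALF TRUE",
           "MOSTLY FALSE", "FALSE", "PANTS ON FIRE")


@dataclass(frozen=True)
class LiarRecord:
    statement: str
    speaker: str
    context: str
    label: str
    justification: str = ""
    # Speaker meta-data named on the slide; the excerpt does not show values,
    # so they default to None here.
    party: Optional[str] = None
    job: Optional[str] = None
    home_state: Optional[str] = None
    credit_history: Optional[dict] = None  # assumed shape, not specified on the slide

    def __post_init__(self):
        # Reject labels outside the 6-point scale.
        if self.label.upper() not in RATINGS:
            raise ValueError(f"unknown label: {self.label!r}")


record = LiarRecord(
    statement=("The last quarter, it was just announced, our gross domestic "
               "product was below zero. Who ever heard of this? "
               "Its never below zero."),
    speaker="Donald Trump",
    context="presidential announcement speech",
    label="Pants on Fire",
)
print(record.label.upper() in RATINGS)  # True
```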
Proposed Model
- Hybrid Convolutional Neural Networks framework for integrating
text and meta-data.
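The slide gives no architectural detail beyond the name, so the following is only one plausible wiring, not the proposed model itself: a convolution with max-over-time pooling on the statement text, a second one on the meta-data tokens, and a softmax layer over the concatenated features. All sizes, the untrained random weights, and the example tokens are hypothetical.

```python
import math
import random

random.seed(0)          # fixed seed so the toy weights are reproducible
EMB = 4                 # embedding size (toy value)
WIDTH = 2               # convolution filter width, in tokens
N_FILTERS = 3           # filters per branch
N_CLASSES = 6           # the six PolitiFact ratings


def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


def embed(tokens, table):
    """Look up each token's embedding, creating a random one on first sight."""
    for t in tokens:
        if t not in table:
            table[t] = rand_vec(EMB)
    return [table[t] for t in tokens]


def conv_max_pool(seq, filters):
    """1-D convolution over token windows, ReLU, then max-over-time pooling."""
    pooled = []
    for f in filters:                       # each filter has WIDTH * EMB weights
        acts = []
        for i in range(len(seq) - WIDTH + 1):
            window = [x for vec in seq[i:i + WIDTH] for x in vec]
            acts.append(max(0.0, sum(a * b for a, b in zip(f, window))))
        pooled.append(max(acts) if acts else 0.0)
    return pooled


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(text_tokens, meta_tokens, params):
    """Text branch + meta-data branch, concatenated, then a softmax layer."""
    text_feat = conv_max_pool(embed(text_tokens, params["text_emb"]),
                              params["text_filters"])
    meta_feat = conv_max_pool(embed(meta_tokens, params["meta_emb"]),
                              params["meta_filters"])
    joint = text_feat + meta_feat           # list concatenation
    scores = [sum(w * x for w, x in zip(row, joint)) for row in params["out"]]
    return softmax(scores)


params = {
    "text_emb": {}, "meta_emb": {},
    "text_filters": [rand_vec(WIDTH * EMB) for _ in range(N_FILTERS)],
    "meta_filters": [rand_vec(WIDTH * EMB) for _ in range(N_FILTERS)],
    "out": [rand_vec(2 * N_FILTERS) for _ in range(N_CLASSES)],
}

probs = forward("our gross domestic product was below zero".split(),
                ["speaker:jane-doe", "party:independent", "state:ohio"],
                params)
print(len(probs))
```

Because the weights are random and untrained, the output probabilities carry no meaning. The sketch only shows how text and meta-data features could meet in one classifier.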
Proposed Model – results
Model                 Valid.   Test
Majority              0.204    0.208
SVMs                  0.258    0.255
Logistic Regression   0.257    0.247
Bi-LSTMs              0.223    0.233
CNNs                  0.260    0.270
Hybrid CNNs
  Text + Subject      0.263    0.235
  Text + Speaker      0.277    0.248
  Text + Job          0.270    0.258
  Text + State        0.246    0.256
  Text + Party        0.259    0.248
  Text + Context      0.251    0.243
  Text + History      0.246    0.241
  Text + All          0.247    0.274
Conclusion
• Interest in fake news detection keeps growing, but research using machine
learning (deep learning) is still limited.
• Fake news detection and fact-checking tend to be confused with each other.
- They call for different approaches.
• For fact-checking, a linguistic approach does not reach high performance.
A Korean LIAR dataset?
• http://factcheck.snu.ac.kr/v2/facts?part=all


Editor's Notes

  1. Approaches: linguistic characteristics, knowledge-based approach, epidemic model.
  2. Attach the example from the paper.
  3. Site description.
  4. Modeling of statements.
  5. Extraction over articles.
  6. Manner adverb: an adverb expressing manner or method (e.g. carefully, fast, so, how).