North Campus, University of Kashmir
Delina, Baramulla - 193103
FAKE NEWS DETECTION USING VARIOUS DEEP
LEARNING MODELS: A Comparative Analysis
Project Members:
Urain Ahmad Shah (03)
Abdul Wahid (37)
Waseem Raja (39)
Project Head:
Er. Khalid Hussain
Asst. Prof. Deptt. of CSE
MAJOR PROJECT
Contents
• Introduction to Fake News
• Problem Statement
• Motivation
• Objectives
• Datasets
• Introduction
• Project Framework
• Data Pre-processing
• Results and Performance Metrics
• Conclusion
• Future Work
• References
Introduction to Fake News
• Since the rise of social media, fake news has become a societal problem, on some occasions spreading further and faster than accurate information.
• Fake news is a major issue because it can make people believe things that are false or inaccurate. It consists of stories that have been completely made up to get people to believe something that is not true, and believing them may change how people act.
• Researchers estimate that at least $200 million will go toward fake news in the U.S. presidential election (Fake News Creates Real Losses | Institutional Investor).
• In 2013, the AP's Twitter account was hacked and posted an update claiming that then-US President Barack Obama had been injured in an explosion. The tweet wiped out around $130 billion in stock value in a matter of minutes (Extra, Extra: How Fake News Affects the Stock Market | Contracts-For-Difference.com).
Problem Statement
• Fake news detection is a critical topic within natural language processing (NLP), yet it has received comparatively little attention, whereas other NLP fields such as language translation, text summarization and speech recognition have reached a high level of maturity. Some research papers have now been published in this field, but they focus only on accuracy. They should also address the generalisability and stability of models when the context of the fake news changes.
• The main aim of our project is to ensure the generalisability of findings and to propose a best model for developing knowledge-based, AI-driven systems for fake news detection.
Motivation
• Trend Micro sees three major current motivations behind fake news: political influence, financial gain, and character assassination.
• Political propaganda is designed to get people to change their political beliefs or other opinions. Fake news has a similar intent, but uses falsehoods to manipulate public opinion faster and across a wider audience.
• The most obvious financial motivation is advertising. Social media manipulation can be used to drive traffic to a particular site, a practice known as clickbait. Trend Micro also sees a danger of fake news being used to manipulate share prices.
• Character assassination by fake news can have many targets. The most obvious is the politician, but private individuals are also at risk. For example, Mexican journalists are routinely harassed by Twitter bots under the control of drug cartels.
Objectives
• To design and compare various deep learning models for fake news detection on three different datasets, and to measure the effect of the one-hot feature extraction technique and the Word2Vec word embedding technique.
• To design and compare various advanced pre-trained models (Transformers) for fake news detection on three different datasets.
• To analyse the performance of both the deep learning and the advanced pre-trained models using various performance metrics, and to analyse how they differ in loss and accuracy when trained on datasets of different sizes.
Datasets
For this work, we collected three datasets of different sizes:
• The first dataset, publicly available on Kaggle, contains 20,800 news items. It is a collection of fake and real news propagated during the 2016 U.S. presidential election (Fake News | Kaggle).
• The second dataset, also publicly available on Kaggle, contains 44,898 news items from the United States of America from the year 2017 (Fake and real news dataset | Kaggle).
• The third dataset, also publicly available on Kaggle, contains 6,335 news items, including articles from credible journalism organizations such as the New York Times (Fake News | Kaggle).
• All three datasets are reasonably balanced.
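The balance claim for the datasets can be checked with a few lines of code. The sketch below is illustrative only: the label encoding (1 = fake, 0 = real) and the toy labels are assumptions, not values taken from the Kaggle files.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each class label as a fraction of all samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical toy labels (assumed encoding: 1 = fake, 0 = real).
toy_labels = [1, 0, 1, 0, 0, 1, 1, 0]
shares = class_balance(toy_labels)
print(shares)
```

A dataset is usually called balanced when these shares are close to each other, so that accuracy is not dominated by one class.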
Project Framework
Data Pre-processing
• Data pre-processing is an important step in natural language processing. It removes unhelpful parts of the data, or noise, by converting all characters to lower case, removing punctuation marks, and removing stop words.
• The first step was to drop all duplicate items and lowercase all the text.
• The next step was to remove stop words.
• After that, every text was split on white space and suffixes were removed from the words by stemming.
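The pre-processing steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the stop-word list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import string

# Tiny illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "to", "and"}
# Naive stand-in for a real stemming algorithm (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(texts):
    """Deduplicate, lowercase, strip punctuation, drop stop words, then stem."""
    seen, cleaned = set(), []
    for text in texts:
        if text in seen:  # drop exact duplicates
            continue
        seen.add(text)
        lowered = text.lower()
        no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
        tokens = [t for t in no_punct.split() if t not in STOP_WORDS]
        cleaned.append([naive_stem(t) for t in tokens])
    return cleaned


docs = [
    "The senators voted in Washington!",
    "The senators voted in Washington!",
    "Markets are falling.",
]
result = preprocess(docs)
print(result)
```

The crude stemmer turns "voted" into "vot", which shows why stemmed tokens are not always dictionary words; that is expected behaviour for stemming in general.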
Results and Performance Metrics
• The performance and accuracy metrics are defined in terms of the following quantities, with real news treated as the positive class:
1. True Positive (TP): Number of news items the model predicted to be real that are actually real. In other words, the number of correct predictions of real news.
2. True Negative (TN): Number of news items the model predicted to be fake that are actually fake. In other words, the number of correct predictions of fake news.
3. False Positive (FP): Number of news items the model predicted to be real that are actually fake. In other words, the number of incorrect predictions of real news.
4. False Negative (FN): Number of news items the model predicted to be fake that are actually real. In other words, the number of incorrect predictions of fake news.
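The four quantities above can be counted directly from true and predicted labels. The following is a minimal sketch with hypothetical toy labels; the function name and the string labels "real" and "fake" are illustrative choices, with "real" as the positive class as in the definitions.

```python
def confusion_counts(y_true, y_pred, positive="real"):
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted real, actually real
            else:
                fp += 1  # predicted real, actually fake
        else:
            if actual == positive:
                fn += 1  # predicted fake, actually real
            else:
                tn += 1  # predicted fake, actually fake
    return tp, tn, fp, fn


# Hypothetical toy labels, not project data.
y_true = ["real", "real", "fake", "fake", "real", "fake"]
y_pred = ["real", "fake", "fake", "real", "real", "fake"]
counts = confusion_counts(y_true, y_pred)
print(counts)
```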
The evaluation metrics used are defined as follows:
• Accuracy: The number of correct predictions divided by the total number of predictions. It measures the overall performance of the model.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
• Precision: The ratio of the number of real news items predicted correctly to the total number of news items predicted to be real.
$$\text{Precision} = \frac{TP}{TP + FP}$$
• Recall: The true positive rate (TPR), also called sensitivity. It is the ability of the model to correctly identify real news.
$$\text{Recall} = \frac{TP}{TP + FN}$$
• F1 Score: Also called F-Measure or F-Score. It is the harmonic mean of precision and recall, and is most useful when a balance between precision and recall is needed.
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
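As a worked example of these formulas, the sketch below computes all four metrics from a set of counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the zero-division guards are an added convention, not something stated in the slides.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the project's confusion matrices.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(m)
```

With these counts, accuracy is 85/100, precision is 40/45 and recall is 40/50, so the F1 score falls between precision and recall, closer to the lower of the two.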
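Performance evaluation of Deep Learning Models
A = accuracy, P = precision, R = recall, F1 = F1-score; D1 to D3 = Datasets 1 to 3.

| Model | D1 A | D1 P | D1 R | D1 F1 | D2 A | D2 P | D2 R | D2 F1 | D3 A | D3 P | D3 R | D3 F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RNN | .90 | .90 | .90 | .90 | .94 | .94 | .94 | .94 | .76 | .76 | .76 | .76 |
| LSTM | .91 | .91 | .91 | .91 | .94 | .94 | .94 | .94 | .77 | .77 | .77 | .77 |
| Stacked LSTM | .92 | .92 | .92 | .92 | .93 | .93 | .93 | .93 | .76 | .77 | .76 | .76 |
| Bi-LSTM | .91 | .91 | .91 | .91 | .93 | .93 | .93 | .93 | .78 | .78 | .78 | .78 |

Performance of deep learning models using one-hot technique for feature extraction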
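The slides do not say how the one-hot features were implemented, so the following is only a sketch of the textbook technique: each token becomes a vector with a single 1 at its vocabulary index. The function names and the toy corpus are hypothetical.

```python
def build_vocab(token_lists):
    """Assign each distinct token an integer index in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def one_hot_encode(tokens, vocab):
    """Encode each token as a vector with a single 1 at its vocabulary index."""
    vectors = []
    for tok in tokens:
        vec = [0] * len(vocab)
        if tok in vocab:  # unknown tokens stay all-zero
            vec[vocab[tok]] = 1
        vectors.append(vec)
    return vectors


# Hypothetical pre-processed corpus.
corpus = [["market", "fall"], ["senator", "vote", "market"]]
vocab = build_vocab(corpus)
encoded = one_hot_encode(["vote", "market"], vocab)
print(vocab)
print(encoded)
```

One-hot vectors grow with the vocabulary and carry no notion of similarity between words, which is the usual motivation for comparing them against dense embeddings such as Word2Vec.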
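A = accuracy, P = precision, R = recall, F1 = F1-score; D1 to D3 = Datasets 1 to 3.

| Model | D1 A | D1 P | D1 R | D1 F1 | D2 A | D2 P | D2 R | D2 F1 | D3 A | D3 P | D3 R | D3 F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RNN | .89 | .90 | .90 | .89 | .94 | .94 | .94 | .94 | .57 | .58 | .57 | .57 |
| LSTM | .89 | .89 | .89 | .89 | .96 | .96 | .96 | .96 | .56 | .58 | .56 | .54 |
| Stacked LSTM | .91 | .91 | .92 | .91 | .96 | .96 | .96 | .96 | .54 | .54 | .54 | .53 |
| Bi-LSTM | .90 | .90 | .91 | .90 | .96 | .96 | .96 | .96 | .55 | .56 | .55 | .54 |

Performance of deep learning models using Word2Vec technique for word embedding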
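Each cell gives the 2×2 confusion matrix as first row / second row.

| Dataset | Features | RNN | LSTM | Stacked LSTM | Bi-LSTM |
|---|---|---|---|---|---|
| Dataset 1 | One-hot | 3093 326 / 184 2432 | 3155 264 / 320 2296 | 3122 297 / 198 2418 | 3083 336 / 216 2400 |
| Dataset 1 | Word2Vec | 444 72 / 40 359 | 468 48 / 40 359 | 436 80 / 1 398 | 446 70 / 29 370 |
| Dataset 2 | One-hot | 6709 330 / 605 7173 | 6680 353 / 557 7221 | 6577 462 / 523 7255 | 6559 480 / 513 7265 |
| Dataset 2 | Word2Vec | 980 121 / 18 1126 | 1080 21 / 68 1076 | 1051 50 / 34 1110 | 1064 37 / 46 1098 |
| Dataset 3 | One-hot | 819 252 / 275 745 | 827 244 / 249 771 | 856 215 / 276 744 | 888 183 / 324 696 |
| Dataset 3 | Word2Vec | 69 89 / 50 109 | 86 72 / 61 98 | 59 99 / 41 118 | 77 81 / 63 96 |

Confusion Matrix of Deep learning models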
[Figure: Comparison of validation loss in all deep learning models with different dataset sizes.]
[Figure: ROC curves of all deep learning models using one-hot. Panels: Dataset 1, Dataset 2, Dataset 3.]
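For readers unfamiliar with ROC curves, the sketch below shows how the curve's points and the area under it (AUC) can be derived from model scores. It is a simplified illustration: the labels and scores are hypothetical, label 1 is assumed to be the positive class (real news), and all scores are assumed distinct, so ties are not handled.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, sweeping the decision threshold from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scores, not project outputs.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
area = auc(points)
print(points)
print(area)
```

An AUC of 1.0 corresponds to a perfect ranking of real above fake items, and 0.5 to a random one; the toy example here gives 8/9 because one fake item is scored above one real item.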
[Figure: ROC curves of all deep learning models using Word2Vec. Panels: Dataset 1, Dataset 2, Dataset 3.]
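Performance evaluation of advanced pre-trained Models
A = accuracy, P = precision, R = recall, F1 = F1-score; D1 to D3 = Datasets 1 to 3.

| Model | D1 A | D1 P | D1 R | D1 F1 | D2 A | D2 P | D2 R | D2 F1 | D3 A | D3 P | D3 R | D3 F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT | .97 | .97 | .97 | .97 | 1.0 | 1.0 | 1.0 | 1.0 | .88 | .88 | .88 | .88 |
| ELECTRA | .95 | .95 | .95 | .95 | 1.0 | 1.0 | 1.0 | 1.0 | .81 | .81 | .81 | .81 |
| ALBERT | .92 | .91 | .92 | .91 | 1.0 | 1.0 | 1.0 | 1.0 | .85 | .85 | .85 | .85 |
| DeBERTa | .93 | .92 | .93 | .92 | 1.0 | 1.0 | 1.0 | 1.0 | .90 | .90 | .90 | .90 |
| SqueezeBERT | .93 | .93 | .93 | .93 | 1.0 | 1.0 | 1.0 | 1.0 | .81 | .81 | .81 | .81 |

Performance of advanced pre-trained models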
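Each cell gives the 2×2 confusion matrix as first row / second row.

| Dataset | BERT | ELECTRA | ALBERT | DeBERTa | SqueezeBERT |
|---|---|---|---|---|---|
| Dataset 1 | 494 17 / 12 517 | 496 22 / 21 376 | 465 51 / 26 373 | 488 21 / 16 515 | 478 41 / 14 705 |
| Dataset 2 | 4282 8 / 5 4685 | 4287 2 / 9 4682 | 4263 3 / 3 4711 | 4286 1 / 4 4689 | 4331 6 / 2 4641 |
| Dataset 3 | 566 73 / 80 748 | 519 125 / 118 505 | 552 88 / 107 520 | 587 65 / 60 555 | 513 110 / 131 513 |

Confusion matrix of advanced pre-trained models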
[Figure: ROC curves of all advanced pre-trained models. Panels: Dataset 1, Dataset 2, Dataset 3.]
Conclusion
• In terms of accuracy, F1 score, precision, recall and ROC score, the advanced pre-trained models outperformed the deep learning models.
• Among the deep learning models, Word2Vec word embeddings performed better than one-hot feature extraction on Dataset 2, though not on Datasets 1 and 3. Stacked LSTM performed well in terms of both accuracy and validation loss.
• More importantly, the advanced pre-trained models were found to be robust to the size of the dataset and can perform significantly better on very small datasets. The performance of the deep learning models depends greatly on the size of the dataset.
• Compared with the advanced pre-trained models, the deep learning models have a higher probability of overfitting. Among the advanced pre-trained models, BERT showed better overall results than the others.
• Notably, the advanced pre-trained models were given raw data as input, yet they outperformed the deep learning models, which had been given cleaned and pre-processed data.
Future Work
• In future work, we will try training the advanced pre-trained models on one dataset and testing them on another dataset with a small difference in semantic meaning, and we will focus further on transfer learning in order to achieve the best performance.
• The real problem lies in the datasets: there is no proper dataset that covers all kinds of fake news. Proposing a general fake news model will require a benchmark dataset containing different types of fake news, so that the model can predict any type of fake news correctly.
References
1. Aman Agarwal, Mamta Mittal, Akshat Pathak and Lalit Mohan Goyal (2020) Fake News Detection Using a
Blend of Neural Networks: An Application of Deep Learning
2. Sebastian Kula, Rafal Kozik and Michal Choras (2021) Implementation of the BERT-derived architectures to
tackle disinformation challenges
3. Georgios Gravanis, Athena I. Vakali, Konstantinos Diamantaras, Panagiotis Karadais (2019) Behind the cues: A
benchmarking study for fake news detection
4. Shlok Gilda (2017) Notice of Violation of IEEE Publication Principles: Evaluating machine learning algorithms
for fake news detection
5. Ray Oshikawa, Jing Qian, William Yang Wang (2018) A Survey on Natural Language Processing for Fake
News Detection
THANK YOU

More Related Content

Similar to Comparative Analysis of Deep Learning and Advanced Pre-trained Models for Fake News Detection

Research on Haberman dataset also business required document
Research on Haberman dataset also business required documentResearch on Haberman dataset also business required document
Research on Haberman dataset also business required documentManjuYadav65
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGkevig
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGijnlc
 
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...cscpconf
 
A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...csandit
 
Module 1 - CaseFRAMEWORKS OF INFORMATION SECURITY MANAGEMENT.docx
Module 1 - CaseFRAMEWORKS OF INFORMATION SECURITY MANAGEMENT.docxModule 1 - CaseFRAMEWORKS OF INFORMATION SECURITY MANAGEMENT.docx
Module 1 - CaseFRAMEWORKS OF INFORMATION SECURITY MANAGEMENT.docxroushhsiu
 
IRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic RegressionIRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic RegressionIRJET Journal
 
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSTHE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSManishReddy706923
 
Detection of Fake Accounts in Instagram Using Machine Learning
Detection of Fake Accounts in Instagram Using Machine LearningDetection of Fake Accounts in Instagram Using Machine Learning
Detection of Fake Accounts in Instagram Using Machine LearningAIRCC Publishing Corporation
 
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGijcsit
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfDatacademy.ai
 
Machine Learning Final presentation
Machine Learning Final presentation Machine Learning Final presentation
Machine Learning Final presentation AyanaRukasar
 
IRJET - Fake News Detection: A Survey
IRJET -  	  Fake News Detection: A SurveyIRJET -  	  Fake News Detection: A Survey
IRJET - Fake News Detection: A SurveyIRJET Journal
 
Fake News Detector
Fake News DetectorFake News Detector
Fake News DetectorIrisYoon5
 
Fraud Detection using Data Mining Project
Fraud Detection using Data Mining ProjectFraud Detection using Data Mining Project
Fraud Detection using Data Mining ProjectAlbert Kennedy III
 
Parallel Programming Approaches for an Agent-based Simulation of Concurrent P...
Parallel Programming Approaches for an Agent-based Simulation of Concurrent P...Parallel Programming Approaches for an Agent-based Simulation of Concurrent P...
Parallel Programming Approaches for an Agent-based Simulation of Concurrent P...Subhajit Sahu
 
Evolution and Influence Measurement Association Information Network
Evolution and Influence Measurement Association Information NetworkEvolution and Influence Measurement Association Information Network
Evolution and Influence Measurement Association Information NetworkEditor IJCATR
 
Big data, big opportunities
Big data, big opportunitiesBig data, big opportunities
Big data, big opportunitiesChouaieb NEMRI
 

Similar to Comparative Analysis of Deep Learning and Advanced Pre-trained Models for Fake News Detection (20)

Research on Haberman dataset also business required document
Research on Haberman dataset also business required documentResearch on Haberman dataset also business required document
Research on Haberman dataset also business required document
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
 
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MININGFAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
 
wendi_ppt
wendi_pptwendi_ppt
wendi_ppt
 
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
 
A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...
 
Module 1 - CaseFRAMEWORKS OF INFORMATION SECURITY MANAGEMENT.docx
Module 1 - CaseFRAMEWORKS OF INFORMATION SECURITY MANAGEMENT.docxModule 1 - CaseFRAMEWORKS OF INFORMATION SECURITY MANAGEMENT.docx
Module 1 - CaseFRAMEWORKS OF INFORMATION SECURITY MANAGEMENT.docx
 
IRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic RegressionIRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic Regression
 
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSTHE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
 
Detection of Fake Accounts in Instagram Using Machine Learning
Detection of Fake Accounts in Instagram Using Machine LearningDetection of Fake Accounts in Instagram Using Machine Learning
Detection of Fake Accounts in Instagram Using Machine Learning
 
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdf
 
Machine Learning Final presentation
Machine Learning Final presentation Machine Learning Final presentation
Machine Learning Final presentation
 
IRJET - Fake News Detection: A Survey
IRJET -  	  Fake News Detection: A SurveyIRJET -  	  Fake News Detection: A Survey
IRJET - Fake News Detection: A Survey
 
Fake News Detector
Fake News DetectorFake News Detector
Fake News Detector
 
Fraud Detection using Data Mining Project
Fraud Detection using Data Mining ProjectFraud Detection using Data Mining Project
Fraud Detection using Data Mining Project
 
Fake News Detection
Fake News DetectionFake News Detection
Fake News Detection
 
Parallel Programming Approaches for an Agent-based Simulation of Concurrent P...
Parallel Programming Approaches for an Agent-based Simulation of Concurrent P...Parallel Programming Approaches for an Agent-based Simulation of Concurrent P...
Parallel Programming Approaches for an Agent-based Simulation of Concurrent P...
 
Evolution and Influence Measurement Association Information Network
Evolution and Influence Measurement Association Information NetworkEvolution and Influence Measurement Association Information Network
Evolution and Influence Measurement Association Information Network
 
Big data, big opportunities
Big data, big opportunitiesBig data, big opportunities
Big data, big opportunities
 

Recently uploaded

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 

Recently uploaded (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 

Comparative Analysis of Deep Learning and Advanced Pre-trained Models for Fake News Detection

  • 1. North Campus, University of Kashmir Delina, Baramulla - 193103 FAKE NEWS DETECTION USING VARIOUS DEEP LEARNING MODELS: A Comparative Analysis Project Members: Urain Ahmad Shah (03) Abdul Wahid (37) Waseem Raja (39) Project Head: Er. Khalid Hussain Asst. Prof. Deptt. of CSE MAJOR PROJECT 1
  • 2.  I N T R O D U C T I O N T O FA K E N E W S  P R O B L E M S TAT E M E N T  M O T I VAT I O N  O B J E C T I V E S  D ATA S E T S  I N T R O D U C T I O N  P R O J E C T F R A M E W O R K  D ATA P R E - P R O C E S S I N G  R E S U LT S A N D P E R F O R M A N C E M E T R I C S  C O N C L U S I O N  F U T U R E W O R K  R E F E R E N C E S 2 Contents
  • 3. 3 Introduction to Fake News  For some years, mostly since the rise of social media, fake news has become a society problem, in some occasion spreading more and faster than the true information.  The fake news has become a major issue which can make people believe things that are false or inaccurate. Stories, that have been completely made up to get people to believe something that is not true. Believing fake news may make people act in a certain way.  The researchers estimate that at least $200 million will go toward fake news in U.S. presidential election (Fake News Creates Real Losses | Institutional Investor).  Back in 2013, the AP’s Twitter account was hacked and published an update about then-US President Barack Obama being injured in an explosion. The tweet wiped out around $130 billion in stock value in a matter of minutes (Extra, Extra: How Fake News Affects the Stock Market | Contracts-For-Difference.com).
  • 4. Problem Statement 4  As we know fake news detection is a very critical topic which comes under natural language processing, but the problem is that no one has highlighted this topic that much. While, in the other fields of natural language processing like language translation, text summarization and speech recognition researchers have reached the top most level. Now there are some research papers published in this field, but the problem still lies as they are focusing only on accuracy. As they should focus on accuracy as well as generalisability and stability of models when the context of the fake news changes.  The main aim of our project is to ensure the generalisability of findings and proposing a best model in order to develop a knowledge-based systems based on artificial intelligence in the detection of fake news.
  • 5. Motivation
      - Trend Micro sees three current major motivations behind fake news: political propaganda, financial gain, and character assassination.
      - Political propaganda is designed to get people to change their mind about their political beliefs or some other opinion. Fake news has a similar intent, but uses falsehoods to manipulate public opinion faster and across a wider audience.
      - The most obvious financial motivation is advertising. Social media manipulation can be used to drive traffic to a particular site, a practice already known as clickbait. However, Trend Micro also sees a danger of fake news being used to manipulate share prices.
      - Character assassination by fake news could have many targets. The most obvious one is the politician, but private individuals are also at risk. For example, Mexican journalists are routinely harassed by Twitter bots under the control of drug cartels.
  • 6. Objectives
      - To design and compare various deep learning models for the fake news detection task using three different datasets, and to measure the effect of the one-hot feature extraction technique and the Word2Vec word embedding technique.
      - To design and compare various advanced pre-trained models (Transformers) for the fake news detection task using the same three datasets.
      - To analyse the performance of both the deep learning and the advanced pre-trained models with various performance metrics, and to analyse how they differ in terms of loss and accuracy when trained on datasets of different sizes.
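To make the one-hot technique named in the objectives concrete, here is a minimal sketch in plain Python. The slides do not say which implementation was used, so the vocabulary construction, the reserved padding index 0, and the function names `build_vocab`, `encode` and `one_hot` are all assumptions made for illustration. Word2Vec differs in that each token is mapped to a dense learned vector rather than a sparse indicator vector.

```python
def build_vocab(tokenized_docs):
    """Map each distinct token to an integer index; 0 is reserved for padding."""
    vocab = {}
    for tokens in tokenized_docs:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(tokens, vocab, max_len):
    """Turn tokens into indices, truncating or zero-padding to max_len.

    Unknown tokens are mapped to 0 here, the same value as padding
    (a simplification chosen for this sketch).
    """
    ids = [vocab.get(token, 0) for token in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


def one_hot(ids, vocab_size):
    """Expand each index into a vector with a single 1 (all zeros for padding)."""
    vectors = []
    for index in ids:
        row = [0] * vocab_size
        if index > 0:
            row[index - 1] = 1
        vectors.append(row)
    return vectors


# Hypothetical toy corpus, already tokenized.
docs = [["fake", "news", "spreads"], ["news", "travels", "fast"]]
vocab = build_vocab(docs)
ids = encode(["news", "fast"], vocab, max_len=4)
vectors = one_hot(ids, vocab_size=len(vocab))
```

The fixed `max_len` reflects the usual need to feed recurrent models sequences of equal length; the value 4 is arbitrary.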
  • 7. Datasets
    For this work, we collected three datasets of different sizes:
      - The first dataset, publicly available on Kaggle, contains 20,800 news items. It is a collection of fake and real news propagated during the 2016 U.S. presidential election (Fake News | Kaggle).
      - The second dataset, also publicly available on Kaggle, contains 44,898 news items from the United States of America for the year 2017 (Fake and real news dataset | Kaggle).
      - The third dataset, also publicly available on Kaggle, contains 6,335 news items, including articles from credible journalism organizations such as the New York Times (Fake News | Kaggle).
      - All three datasets are fairly balanced.
  • 10. Data Pre-processing
      - Data pre-processing is an important part of natural language processing. It removes unhelpful parts of the data, or noise, by converting all characters to lower case, removing punctuation marks, and removing stop words.
      - The first step was to drop all duplicate items and lowercase all the text.
      - The next step was to remove stop words.
      - After that, every text was split on white-space and suffixes were removed from the words by stemming them.
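The pre-processing steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer (for example a Porter stemmer), which the slides do not name.

```python
import string

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and"}

# Suffixes tried in order by the stand-in stemmer below.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, and stem each token."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()  # split on white-space
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]


def preprocess_corpus(texts):
    """Drop exact duplicates (keeping first occurrences), then clean each text."""
    unique_texts = list(dict.fromkeys(texts))
    return [preprocess(text) for text in unique_texts]


# Hypothetical example documents; the second is a duplicate of the first.
docs = [
    "The senator is visiting Ohio!",
    "The senator is visiting Ohio!",
    "Markets crashed.",
]
cleaned = preprocess_corpus(docs)
```

Duplicates are removed before cleaning here, matching the order given on the slide; removing them after cleaning would additionally merge texts that differ only in case or punctuation.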
  • 11. Results and Performance Metrics
    The performance and accuracy metrics are defined in terms of the following quantities, with real news as the positive class:
      1. True Positive (TP): the number of news items the model predicted to be real that are actually real, i.e. the number of correct predictions of real news.
      2. True Negative (TN): the number of news items the model predicted to be fake that are actually fake, i.e. the number of correct predictions of fake news.
      3. False Positive (FP): the number of news items the model predicted to be real that are actually fake, i.e. the number of incorrect predictions of real news.
      4. False Negative (FN): the number of news items the model predicted to be fake that are actually real, i.e. the number of incorrect predictions of fake news.
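As an illustration of the four definitions above, the following sketch counts TP, TN, FP and FN from paired lists of true and predicted labels. The encoding of real news as 1 and fake news as 0, and the function name `confusion_counts`, are assumptions; the slides do not state how labels were encoded.

```python
REAL, FAKE = 1, 0  # label encoding is an assumption, not taken from the slides


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 'real' treated as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == REAL and actual == REAL:
            tp += 1  # predicted real, actually real
        elif predicted == FAKE and actual == FAKE:
            tn += 1  # predicted fake, actually fake
        elif predicted == REAL and actual == FAKE:
            fp += 1  # predicted real, actually fake
        else:
            fn += 1  # predicted fake, actually real
    return tp, tn, fp, fn


# Hypothetical labels for six news items.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```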
  • 12. The evaluation metrics used are defined as follows:
      - Accuracy: the number of correct predictions divided by the total number of predictions. It measures the overall performance of the model.
        Accuracy = (TP + TN) / (TP + TN + FP + FN)
      - Precision: the ratio of the number of real news items predicted correctly to the total number of news items predicted as real.
        Precision = TP / (TP + FP)
      - Recall: the true positive rate (TPR), i.e. the ability of the model to correctly identify real news. It is also called sensitivity.
        Recall (Sensitivity) = TP / (TP + FN)
      - F1 Score: also called F-Measure or F-Score. It is the harmonic mean of precision and recall, and is a useful single measure when a balance between precision and recall is sought.
        F1 Score = 2 * Precision * Recall / (Precision + Recall)
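The four metrics can be computed directly from the confusion counts. The sketch below is illustrative only: the function name `classification_metrics` and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, not taken from the results tables.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

With these counts, accuracy is 175/200, precision is 90/105, recall is 90/100, and F1 equals 180/205, which agrees with the equivalent form 2TP / (2TP + FP + FN).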
  • 13. Performance evaluation of Deep Learning Models
    Performance of deep learning models using the one-hot technique for feature extraction (A = accuracy, P = precision, R = recall, F1 = F1-score):

                      Dataset 1             Dataset 2             Dataset 3
      Model           A    P    R    F1     A    P    R    F1     A    P    R    F1
      RNN             .90  .90  .90  .90    .94  .94  .94  .94    .76  .76  .76  .76
      LSTM            .91  .91  .91  .91    .94  .94  .94  .94    .77  .77  .77  .77
      Stacked LSTM    .92  .92  .92  .92    .93  .93  .93  .93    .76  .77  .76  .76
      Bi-LSTM         .91  .91  .91  .91    .93  .93  .93  .93    .78  .78  .78  .78
  • 14. Performance of deep learning models using the Word2Vec technique for word embedding (A = accuracy, P = precision, R = recall, F1 = F1-score):

                      Dataset 1             Dataset 2             Dataset 3
      Model           A    P    R    F1     A    P    R    F1     A    P    R    F1
      RNN             .89  .90  .90  .89    .94  .94  .94  .94    .57  .58  .57  .57
      LSTM            .89  .89  .89  .89    .96  .96  .96  .96    .56  .58  .56  .54
      Stacked LSTM    .91  .91  .92  .91    .96  .96  .96  .96    .54  .54  .54  .53
      Bi-LSTM         .90  .90  .91  .90    .96  .96  .96  .96    .55  .56  .55  .54
  • 15. Confusion matrices of deep learning models (the four cell counts of each matrix are listed in the order they appear on the slide):

      Dataset    Technique   RNN                  LSTM                 Stacked LSTM         Bi-LSTM
      Dataset 1  One-hot     3093 326 184 2432    3155 264 320 2296    3122 297 198 2418    3083 336 216 2400
      Dataset 1  Word2Vec    444 72 40 359        468 48 40 359        436 80 1 398         446 70 29 370
      Dataset 2  One-hot     6709 330 605 7173    6680 353 557 7221    6577 462 523 7255    6559 480 513 7265
      Dataset 2  Word2Vec    980 121 18 1126      1080 21 68 1076      1051 50 34 1110      1064 37 46 1098
      Dataset 3  One-hot     819 252 275 745      827 244 249 771      856 215 276 744      888 183 324 696
      Dataset 3  Word2Vec    69 89 50 109         86 72 61 98          59 99 41 118         77 81 63 96
  • 16. Figure: Comparison of validation loss in all deep learning models with different dataset sizes.
  • 17. Figure: ROC curves of all deep learning models using one-hot (panels: dataset 1, dataset 2, dataset 3).
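For readers unfamiliar with how an ROC curve is built, here is a minimal sketch in plain Python. It assumes real news is labelled 1 and that the model outputs a score where higher means "more likely real"; the function names `roc_points` and `auc`, and the example scores, are hypothetical and not taken from the project. Both classes must be present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Rank items from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for index, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item sharing this score (handles ties).
        is_last = index == len(ranked) - 1
        if is_last or ranked[index + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scores for six news items.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
area = auc(curve)
```

In this toy example 8 of the 9 (real, fake) pairs are ranked correctly, so the area is 8/9.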
  • 18. Figure: ROC curves of all deep learning models using Word2Vec (panels: dataset 1, dataset 2, dataset 3).
  • 19. Performance evaluation of advanced pre-trained Models
    Performance of advanced pre-trained models (A = accuracy, P = precision, R = recall, F1 = F1-score):

                      Dataset 1             Dataset 2             Dataset 3
      Model           A    P    R    F1     A    P    R    F1     A    P    R    F1
      BERT            .97  .97  .97  .97    1.0  1.0  1.0  1.0    .88  .88  .88  .88
      ELECTRA         .95  .95  .95  .95    1.0  1.0  1.0  1.0    .81  .81  .81  .81
      ALBERT          .92  .91  .92  .91    1.0  1.0  1.0  1.0    .85  .85  .85  .85
      DeBERTa         .93  .92  .93  .92    1.0  1.0  1.0  1.0    .90  .90  .90  .90
      SqueezeBERT     .93  .93  .93  .93    1.0  1.0  1.0  1.0    .81  .81  .81  .81
  • 20. Confusion matrices of advanced pre-trained models (the four cell counts of each matrix are listed in the order they appear on the slide):

      Model          Dataset 1          Dataset 2          Dataset 3
      BERT           494 17 12 517      4282 8 5 4685      566 73 80 748
      ELECTRA        496 22 21 376      4287 2 9 4682      519 125 118 505
      ALBERT         465 51 26 373      4263 3 3 4711      552 88 107 520
      DeBERTa        488 21 16 515      4286 1 4 4689      587 65 60 555
      SqueezeBERT    478 41 14 705      4331 6 2 4641      513 110 131 513
  • 21. Figure: ROC curves of all advanced pre-trained models (panels: dataset 1, dataset 2, dataset 3).
  • 22. Conclusion
      - Analysing these results shows that, in terms of accuracy, F1 score, precision, recall and ROC score, the advanced pre-trained models outperformed the deep learning models.
      - Among the deep learning models, Word2Vec word embeddings outperformed one-hot feature extraction on the largest dataset (Dataset 2), while one-hot scored higher on Datasets 1 and 3. Stacked LSTM performed particularly well in terms of both accuracy and validation loss.
      - More importantly, the advanced pre-trained models were found to be robust to the size of the dataset and performed significantly better on very small datasets, whereas the performance of the deep learning models depends greatly on the size of the dataset.
      - Compared with the advanced pre-trained models, the deep learning models have a higher probability of overfitting. Among the advanced pre-trained models, BERT showed better results than the other models.
      - Notably, the advanced pre-trained models were given raw data as input, yet they outperformed the deep learning models, which had been provided with clean, pre-processed data.
  • 23. Future Work
      - In future work, we will try training the advanced pre-trained models on one dataset and testing them on another dataset that differs slightly in semantic content, and will focus further on transfer learning in order to achieve the best performance.
      - The real problem, however, lies in the datasets: there is no proper dataset that covers all kinds of fake news. To propose a general fake news model, we will need a benchmark dataset containing different types of fake news, so that the model can predict any type of fake news correctly.
  • 24. References
      1. Aman Agarwal, Mamta Mittal, Akshat Pathak and Lalit Mohan Goyal (2020). Fake News Detection Using a Blend of Neural Networks: An Application of Deep Learning.
      2. Sebastian Kula, Rafal Kozik and Michal Choras (2021). Implementation of the BERT-derived architectures to tackle disinformation challenges.
      3. Georgios Gravanis, Athena I. Vakali, Konstantinos Diamantaras and Panagiotis Karadais (2019). Behind the cues: A benchmarking study for fake news detection.
      4. Shlok Gilda (2017). Notice of Violation of IEEE Publication Principles: Evaluating machine learning algorithms for fake news detection.
      5. Ray Oshikawa, Jing Qian and William Yang Wang (2018). A Survey on Natural Language Processing for Fake News Detection.