Comparative Analysis of Deep Learning and Advanced Pre-trained Models for Fake News Detection
1. North Campus, University of Kashmir
Delina, Baramulla - 193103
FAKE NEWS DETECTION USING VARIOUS DEEP
LEARNING MODELS: A Comparative Analysis
Project Members:
Urain Ahmad Shah (03)
Abdul Wahid (37)
Waseem Raja (39)
Project Head:
Er. Khalid Hussain
Asst. Prof., Dept. of CSE
MAJOR PROJECT
2. Contents
Introduction to Fake News
Problem Statement
Motivation
Objectives
Datasets
Introduction
Project Framework
Data Pre-processing
Results and Performance Metrics
Conclusion
Future Work
References
3. Introduction to Fake News
For some years, mostly since the rise of social media, fake news has become a societal problem, on occasion spreading
farther and faster than true information.
Fake news has become a major issue that can make people believe things that are false or inaccurate: stories fabricated
entirely to make people believe something that is not true. Believing fake news may lead people to act in particular ways.
Researchers estimate that at least $200 million will go toward fake news in the U.S. presidential election (Fake News
Creates Real Losses | Institutional Investor).
Back in 2013, the AP’s Twitter account was hacked and used to publish an update claiming that then-US President Barack
Obama had been injured in an explosion. The tweet wiped out around $130 billion in stock value in a matter of minutes
(Extra, Extra: How Fake News Affects the Stock Market | Contracts-For-Difference.com).
4. Problem Statement
Fake news detection is a critical topic within natural language processing, yet it has received relatively little
attention. In other areas of natural language processing, such as language translation, text summarization and speech
recognition, researchers have reached a very advanced level. Some research papers have now been published in this field,
but the problem remains that they focus only on accuracy, whereas they should also address the generalisability and
stability of models when the context of the fake news changes.
The main aim of our project is to ensure the generalisability of our findings and to propose the best model for
developing knowledge-based systems, based on artificial intelligence, for the detection of fake news.
5. Motivation
Trend Micro sees three current major motivations behind fake news: political, financial gain, and character assassination.
Political propaganda is designed to get people to change their mind about their political beliefs or some other opinion.
Fake news has a similar intent, but will use falsehoods to manipulate public opinion faster and across a wider audience.
The most obvious financial motivation is advertising. Social media manipulation can be used to drive traffic to a
particular site, a practice commonly called clickbait. However, Trend Micro also sees a danger of fake news being used
to manipulate share prices.
Character assassination by fake news could have many targets. The most obvious one is the politician. But private
individuals are also at risk. For example, Mexican journalists are routinely harassed by Twitter bots under the control of
drug cartels.
6. Objectives
To design and compare various deep learning models for the fake news detection task using three different
datasets, and to measure the effect of the one-hot feature extraction technique and the Word2Vec word
embedding technique.
To design and compare various advanced pre-trained models (Transformers) for the fake news detection task
using the same three datasets.
To analyse the performance of both the deep learning and advanced pre-trained models with various performance
metrics, and to analyse how they differ in loss and accuracy when trained on datasets of different sizes.
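The difference between the two text representations named in the first objective can be sketched in a few lines. This is a minimal, dependency-free illustration (the actual project would likely use a library such as Keras or gensim); the vocabulary and sentences are illustrative, not from the project datasets.

```python
def build_vocab(sentences):
    """Map each unique token to an integer index (0 is reserved for unknown words)."""
    vocab = {}
    for sent in sentences:
        for tok in sent.lower().split():
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def one_hot_encode(sentence, vocab):
    """One-hot: each token becomes a sparse vocabulary-sized vector with a single 1."""
    size = len(vocab) + 1
    vectors = []
    for tok in sentence.lower().split():
        vec = [0] * size
        vec[vocab.get(tok, 0)] = 1
        vectors.append(vec)
    return vectors

sentences = ["fake news spreads fast", "real news spreads slowly"]
vocab = build_vocab(sentences)
vecs = one_hot_encode("fake news", vocab)
print(len(vecs), len(vecs[0]))  # 2 tokens, each a vocabulary-sized sparse vector
```

Word2Vec, by contrast, maps each token to a short dense vector learned from co-occurrence statistics, so semantically similar words get similar vectors; that learned, dense representation is what the second set of experiments substitutes for the sparse one-hot vectors above.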
7. Datasets
For this work, we collected three datasets of different sizes:
The first dataset, publicly available on Kaggle, contains 20,800 news articles. It is a collection of fake and real
news propagated during the 2016 U.S. presidential election (Fake News | Kaggle).
The second dataset, also publicly available on Kaggle, contains 44,898 news articles from the United States from
the year 2017 (Fake and real news dataset | Kaggle).
The third dataset, also publicly available on Kaggle, contains 6,335 articles, including pieces from credible
journalism organizations such as the New York Times (Fake News | Kaggle).
All three datasets are fairly balanced between the two classes.
10. Data Pre-processing
Data pre-processing is an important step in natural language processing. It helps remove unhelpful parts of the
data, or noise, by converting all characters to lower case, removing punctuation marks, and removing stop words.
The first step was to drop all duplicate items and lowercase all the text.
The next step was to remove stop words.
After that, every text was split on whitespace and suffixes were removed from words by stemming.
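The pre-processing steps above can be sketched as a single function. This is a dependency-free sketch: the stop-word list and suffix rules below are small illustrative stand-ins for what a full library such as NLTK provides, not the exact resources used in the project.

```python
import string

# Illustrative stand-ins for NLTK-style stop words and a stemmer.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "in", "on", "of", "and", "to"}
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def preprocess(text):
    # 1. Lowercase and strip punctuation marks.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    # 2. Split on whitespace.
    tokens = text.split()
    # 3. Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 4. Crude suffix stripping in place of a real stemmer.
    stemmed = []
    for tok in tokens:
        for suf in SUFFIXES:
            if tok.endswith(suf) and len(tok) > len(suf) + 2:
                tok = tok[: -len(suf)]
                break
        stemmed.append(tok)
    return stemmed

print(preprocess("The Election was RIGGED, claimed the posting!"))
```

A real pipeline would also drop duplicate articles before this step, as described above.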
11. Results and Performance Metrics
• The performance and accuracy metrics are defined with the help of the following performance parameters:
1. True Positive (TP): number of news items the model predicted to be real that are actually real. In other words,
the number of correct predictions of real news.
2. True Negative (TN): number of news items the model predicted to be fake that are actually fake. In other words,
the number of correct predictions of fake news.
3. False Positive (FP): number of news items the model predicted to be real that are actually fake. In other words,
the number of incorrect predictions of real news.
4. False Negative (FN): number of news items the model predicted to be fake that are actually real. In other words,
the number of incorrect predictions of fake news.
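Counting these four outcomes is straightforward once we fix a convention. The sketch below takes "real" as the positive class (label 1) and "fake" as the negative class (label 0), matching the definitions above; the label vectors are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with real = 1 (positive) and fake = 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Illustrative ground truth and model predictions for six news items.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # → (2, 2, 1, 1)
```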
12.
The evaluation metrics used are defined as follows:
Accuracy: obtained by dividing the number of correct predictions by the total number of predictions. It gives the
overall performance of the model.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision: this metric gives the ratio of correctly predicted real news to the total number of news items predicted as real.
Precision = TP / (TP + FP)
Recall: it gives the true positive rate (TPR), the ability of the model to correctly identify the real news. It is also
called sensitivity.
Recall = TP / (TP + FN)
F1 Score: also called the F-Measure or F-Score, it is the harmonic mean of precision and recall. The F1 score is
highest when precision and recall are balanced.
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
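The four metrics follow directly from the confusion counts. A minimal sketch, with illustrative counts (not results from the project):

```python
def metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=90, tn=85, fp=10, fn=15)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

A production implementation would also guard against zero denominators (e.g. when a model predicts no item as real).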
13. Performance evaluation of Deep Learning Models
Model          Dataset 1               Dataset 2               Dataset 3
               A    P    R    F1       A    P    R    F1       A    P    R    F1
RNN            .90  .90  .90  .90      .94  .94  .94  .94      .76  .76  .76  .76
LSTM           .91  .91  .91  .91      .94  .94  .94  .94      .77  .77  .77  .77
Stacked LSTM   .92  .92  .92  .92      .93  .93  .93  .93      .76  .77  .76  .76
Bi-LSTM        .91  .91  .91  .91      .93  .93  .93  .93      .78  .78  .78  .78

Performance of deep learning models using the one-hot technique for feature extraction
14.
Model          Dataset 1               Dataset 2               Dataset 3
               A    P    R    F1       A    P    R    F1       A    P    R    F1
RNN            .89  .90  .90  .89      .94  .94  .94  .94      .57  .58  .57  .57
LSTM           .89  .89  .89  .89      .96  .96  .96  .96      .56  .58  .56  .54
Stacked LSTM   .91  .91  .92  .91      .96  .96  .96  .96      .54  .54  .54  .53
Bi-LSTM        .90  .90  .91  .90      .96  .96  .96  .96      .55  .56  .55  .54

Performance of deep learning models using the Word2Vec technique for word embedding
21.
[Figure: ROC curves of all advanced pre-trained model classifiers on dataset 1, dataset 2 and dataset 3.]
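An ROC curve plots the true positive rate against the false positive rate as the classification threshold varies, and is summarised by the area under the curve (AUC). A minimal sketch of the rank-based AUC computation, using illustrative labels and scores (not project results):

```python
def roc_auc(y_true, scores):
    """Rank-based AUC: the probability that a randomly chosen real item (label 1)
    receives a higher score than a randomly chosen fake item (label 0)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1 for p in pos for n in neg if p > n)
    ties = sum(1 for p in pos for n in neg if p == n)
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# A classifier that perfectly separates real from fake scores 1.0.
print(roc_auc([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]))  # → 1.0
```

In practice a library routine (e.g. scikit-learn's `roc_auc_score`) computes the same quantity more efficiently.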
22. Conclusion
Analysing these results, we conclude that in terms of accuracy, F1 score, precision, recall and ROC score, the
advanced pre-trained models outperformed the deep learning models.
Within deep learning, Word2Vec word embedding showed better performance than the one-hot feature extraction
technique. Among the deep learning models, Stacked LSTM performed extremely well in terms of both accuracy and
validation loss.
More importantly, it was found that advanced pre-trained models are robust to the size of the dataset and can
perform significantly better on very small datasets, whereas the performance of deep learning models depends
greatly on the size of the dataset.
Compared to the advanced pre-trained models, the deep learning models have a higher probability of overfitting.
Among the advanced pre-trained models, BERT showed better results than the other models.
Remarkably, the advanced pre-trained models were given raw data as input, and yet they outperformed the deep
learning models, which had been provided with clean, pre-processed data.
23. Future Work
In future work, we will try to implement advanced pre-trained models by training them on one dataset and testing
them on another dataset with a small difference in semantic meaning, and we will further focus on transfer
learning in order to achieve the best performance.
But the real problem lies in the datasets: there is no proper dataset that covers all kinds of fake news. To
propose a fake news model, we will need a benchmark dataset that contains different types of fake news, so that
our model can correctly predict any type of fake news.
24. References
1. Aman Agarwal, Mamta Mittal, Akshat Pathak and Lalit Mohan Goyal (2020) Fake News Detection Using a
Blend of Neural Networks: An Application of Deep Learning
2. Sebastian Kula, Rafal Kozik and Michal Choras (2021) Implementation of the BERT-derived architectures to
tackle disinformation challenges
3. Georgios Gravanis, Athena I. Vakali, Konstantinos Diamantaras, Panagiotis Karadais (2019) Behind the cues: A
benchmarking study for fake news detection
4. Shlok Gilda (2017) Notice of Violation of IEEE Publication Principles: Evaluating machine learning algorithms
for fake news detection
5. Ray Oshikawa, Jing Qian, William Yang Wang (2018) A Survey on Natural Language Processing for Fake
News Detection