Comparative Analysis of Deep Learning and Advanced Pre-trained Models for Fake News Detection
1. North Campus, University of Kashmir
Delina, Baramulla - 193103
FAKE NEWS DETECTION USING VARIOUS DEEP
LEARNING MODELS: A Comparative Analysis
Project Members:
Urain Ahmad Shah (03)
Abdul Wahid (37)
Waseem Raja (39)
Project Head:
Er. Khalid Hussain
Asst. Prof., Dept. of CSE
MAJOR PROJECT
2. Contents
Introduction to Fake News
Problem Statement
Motivation
Objectives
Datasets
Introduction
Project Framework
Data Pre-processing
Results and Performance Metrics
Conclusion
Future Work
References
3. Introduction to Fake News
For some years, mostly since the rise of social media, fake news has become a societal problem, on occasion spreading
farther and faster than true information.
Fake news has become a major issue that can make people believe things that are false or inaccurate: stories fabricated
entirely to make people believe something that is not true. Believing fake news may lead people to act in particular ways.
Researchers estimate that at least $200 million will go toward fake news in the U.S. presidential election (Fake News
Creates Real Losses | Institutional Investor).
Back in 2013, the AP’s Twitter account was hacked and used to publish an update claiming that then-US President Barack
Obama had been injured in an explosion. The tweet wiped out around $130 billion in stock value in a matter of minutes
(Extra, Extra: How Fake News Affects the Stock Market | Contracts-For-Difference.com).
4. Problem Statement
Fake news detection is a critical topic within natural language processing, yet it has received relatively little
attention. In other areas of natural language processing, such as language translation, text summarization and speech
recognition, researchers have reached a very advanced level. Some research papers have now been published in this field,
but the problem remains that they focus only on accuracy, whereas they should also address the generalisability and
stability of models when the context of the fake news changes.
The main aim of our project is to ensure the generalisability of our findings and to propose the best model for
developing knowledge-based systems, based on artificial intelligence, for the detection of fake news.
5. Motivation
Trend Micro sees three current major motivations behind fake news: political, financial gain, and character assassination.
Political propaganda is designed to get people to change their mind about their political beliefs or some other opinion.
Fake news has a similar intent, but will use falsehoods to manipulate public opinion faster and across a wider audience.
The most obvious financial motivation is advertising. Social media manipulation can be used to drive traffic to a
particular site, a practice commonly called clickbait. However, Trend Micro also sees a danger of fake news being used
to manipulate share prices.
Character assassination by fake news could have many targets. The most obvious one is the politician. But private
individuals are also at risk. For example, Mexican journalists are routinely harassed by Twitter bots under the control of
drug cartels.
6. Objectives
To design and compare various deep learning models for the fake news detection task using three different
datasets, and to measure the effect of the one-hot feature extraction technique and the Word2Vec word
embedding technique.
To design and compare various advanced pre-trained models (Transformers) for the fake news detection task
using the same three datasets.
To analyse the performance of both the deep learning and advanced pre-trained models with various performance
metrics, and to analyse how they differ in loss and accuracy when trained on datasets of different sizes.
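The difference between the two text representations named in the first objective can be sketched in a few lines. This is a minimal, dependency-free illustration (the actual project would likely use a library such as Keras or gensim); the vocabulary and sentences are illustrative, not from the project datasets.

```python
def build_vocab(sentences):
    """Map each unique token to an integer index (0 is reserved for unknown words)."""
    vocab = {}
    for sent in sentences:
        for tok in sent.lower().split():
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def one_hot_encode(sentence, vocab):
    """One-hot: each token becomes a sparse vocabulary-sized vector with a single 1."""
    size = len(vocab) + 1
    vectors = []
    for tok in sentence.lower().split():
        vec = [0] * size
        vec[vocab.get(tok, 0)] = 1
        vectors.append(vec)
    return vectors

sentences = ["fake news spreads fast", "real news spreads slowly"]
vocab = build_vocab(sentences)
vecs = one_hot_encode("fake news", vocab)
print(len(vecs), len(vecs[0]))  # 2 tokens, each a vocabulary-sized sparse vector
```

Word2Vec, by contrast, maps each token to a short dense vector learned from co-occurrence statistics, so semantically similar words get similar vectors; that learned, dense representation is what the second set of experiments substitutes for the sparse one-hot vectors above.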
7. Datasets
For this work, we collected three datasets of different sizes:
The first dataset, publicly available on Kaggle, contains 20,800 news articles. It is a collection of fake and real
news propagated during the 2016 U.S. presidential election (Fake News | Kaggle).
The second dataset, also publicly available on Kaggle, contains 44,898 news articles from the United States from
the year 2017 (Fake and real news dataset | Kaggle).
The third dataset, also publicly available on Kaggle, contains 6,335 articles, including pieces from credible
journalism organizations such as the New York Times (Fake News | Kaggle).
All three datasets are fairly balanced between the two classes.
10. Data Pre-processing
Data pre-processing is an important step in natural language processing. It helps remove unhelpful parts of the
data, or noise, by converting all characters to lower case, removing punctuation marks, and removing stop words.
The first step was to drop all duplicate items and lowercase all the text.
The next step was to remove stop words.
After that, every text was split on whitespace and suffixes were removed from words by stemming.
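The pre-processing steps above can be sketched as a single function. This is a dependency-free sketch: the stop-word list and suffix rules below are small illustrative stand-ins for what a full library such as NLTK provides, not the exact resources used in the project.

```python
import string

# Illustrative stand-ins for NLTK-style stop words and a stemmer.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "in", "on", "of", "and", "to"}
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def preprocess(text):
    # 1. Lowercase and strip punctuation marks.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    # 2. Split on whitespace.
    tokens = text.split()
    # 3. Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 4. Crude suffix stripping in place of a real stemmer.
    stemmed = []
    for tok in tokens:
        for suf in SUFFIXES:
            if tok.endswith(suf) and len(tok) > len(suf) + 2:
                tok = tok[: -len(suf)]
                break
        stemmed.append(tok)
    return stemmed

print(preprocess("The Election was RIGGED, claimed the posting!"))
```

A real pipeline would also drop duplicate articles before this step, as described above.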
11. Results and Performance Metrics
• The performance and accuracy metrics are defined with the help of the following performance parameters:
1. True Positive (TP): number of news items the model predicted to be real that are actually real. In other words,
the number of correct predictions of real news.
2. True Negative (TN): number of news items the model predicted to be fake that are actually fake. In other words,
the number of correct predictions of fake news.
3. False Positive (FP): number of news items the model predicted to be real that are actually fake. In other words,
the number of incorrect predictions of real news.
4. False Negative (FN): number of news items the model predicted to be fake that are actually real. In other words,
the number of incorrect predictions of fake news.
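Counting these four outcomes is straightforward once we fix a convention. The sketch below takes "real" as the positive class (label 1) and "fake" as the negative class (label 0), matching the definitions above; the label vectors are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with real = 1 (positive) and fake = 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Illustrative ground truth and model predictions for six news items.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # → (2, 2, 1, 1)
```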
12.
The evaluation metrics used are defined as follows:
Accuracy: obtained by dividing the number of correct predictions by the total number of predictions. It gives the
overall performance of the model.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision: this metric gives the ratio of correctly predicted real news to the total number of news items predicted as real.
Precision = TP / (TP + FP)
Recall: it gives the true positive rate (TPR), the ability of the model to correctly identify the real news. It is also
called sensitivity.
Recall = TP / (TP + FN)
F1 Score: also called the F-Measure or F-Score, it is the harmonic mean of precision and recall. The F1 score is
highest when precision and recall are balanced.
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
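The four metrics follow directly from the confusion counts. A minimal sketch, with illustrative counts (not results from the project):

```python
def metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=90, tn=85, fp=10, fn=15)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

A production implementation would also guard against zero denominators (e.g. when a model predicts no item as real).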
13. Performance evaluation of Deep Learning Models
Model          Dataset 1               Dataset 2               Dataset 3
               A    P    R    F1       A    P    R    F1       A    P    R    F1
RNN            .90  .90  .90  .90      .94  .94  .94  .94      .76  .76  .76  .76
LSTM           .91  .91  .91  .91      .94  .94  .94  .94      .77  .77  .77  .77
Stacked LSTM   .92  .92  .92  .92      .93  .93  .93  .93      .76  .77  .76  .76
Bi-LSTM        .91  .91  .91  .91      .93  .93  .93  .93      .78  .78  .78  .78

Performance of deep learning models using the one-hot technique for feature extraction
14.
Model          Dataset 1               Dataset 2               Dataset 3
               A    P    R    F1       A    P    R    F1       A    P    R    F1
RNN            .89  .90  .90  .89      .94  .94  .94  .94      .57  .58  .57  .57
LSTM           .89  .89  .89  .89      .96  .96  .96  .96      .56  .58  .56  .54
Stacked LSTM   .91  .91  .92  .91      .96  .96  .96  .96      .54  .54  .54  .53
Bi-LSTM        .90  .90  .91  .90      .96  .96  .96  .96      .55  .56  .55  .54

Performance of deep learning models using the Word2Vec technique for word embedding
21.
[Figure: ROC curves of all advanced pre-trained model classifiers on dataset 1, dataset 2 and dataset 3.]
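An ROC curve plots the true positive rate against the false positive rate as the classification threshold varies, and is summarised by the area under the curve (AUC). A minimal sketch of the rank-based AUC computation, using illustrative labels and scores (not project results):

```python
def roc_auc(y_true, scores):
    """Rank-based AUC: the probability that a randomly chosen real item (label 1)
    receives a higher score than a randomly chosen fake item (label 0)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1 for p in pos for n in neg if p > n)
    ties = sum(1 for p in pos for n in neg if p == n)
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# A classifier that perfectly separates real from fake scores 1.0.
print(roc_auc([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]))  # → 1.0
```

In practice a library routine (e.g. scikit-learn's `roc_auc_score`) computes the same quantity more efficiently.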
22. Conclusion
Analysing these results, we conclude that in terms of accuracy, F1 score, precision, recall and ROC score, the
advanced pre-trained models outperformed the deep learning models.
Within deep learning, Word2Vec word embedding showed better performance than the one-hot feature extraction
technique. Among the deep learning models, Stacked LSTM performed extremely well in terms of both accuracy and
validation loss.
More importantly, it was found that advanced pre-trained models are robust to the size of the dataset and can
perform significantly better on very small datasets, whereas the performance of deep learning models depends
greatly on the size of the dataset.
Compared to the advanced pre-trained models, the deep learning models have a higher probability of overfitting.
Among the advanced pre-trained models, BERT showed better results than the other models.
Remarkably, the advanced pre-trained models were given raw data as input, and yet they outperformed the deep
learning models, which had been provided with clean, pre-processed data.
23. Future Work
In future work, we will try to implement advanced pre-trained models by training them on one dataset and testing
them on another dataset with a small difference in semantic meaning, and we will further focus on transfer
learning in order to achieve the best performance.
But the real problem lies in the datasets: there is no proper dataset that covers all kinds of fake news. To
propose a fake news model, we will need a benchmark dataset that contains different types of fake news, so that
our model can correctly predict any type of fake news.
24. References
1. Aman Agarwal, Mamta Mittal, Akshat Pathak and Lalit Mohan Goyal (2020) Fake News Detection Using a
Blend of Neural Networks: An Application of Deep Learning
2. Sebastian Kula, Rafal Kozik and Michal Choras (2021) Implementation of the BERT-derived architectures to
tackle disinformation challenges
3. Georgios Gravanis, Athena I. Vakali, Konstantinos Diamantaras, Panagiotis Karadais (2019) Behind the cues: A
benchmarking study for fake news detection
4. Shlok Gilda (2017) Notice of Violation of IEEE Publication Principles: Evaluating machine learning algorithms
for fake news detection
5. Ray Oshikawa, Jing Qian, William Yang Wang (2018) A Survey on Natural Language Processing for Fake
News Detection