1. Base paper Title: Generating Fake News Detection Model Using A Two-Stage Evolutionary
Approach
Modified Title: Creating a Two-Stage Evolutionary Approach to Detecting Fake News
Abstract
While fake news is morally reprehensible, irresponsible parties intentionally use it to
achieve their goals by disseminating it to vulnerable and targeted groups. Machine learning
techniques have been researched extensively to detect fake news. On the other hand,
evolutionary-based algorithms are now gaining popularity in the research community. In this
study, a two-stage evolutionary approach is proposed to generate and optimize a mathematical
equation for fake news detection. In the first stage, a tree-based Genetic Programming (GP)
algorithm generates mathematical expressions that capture correlations between the
language-independent (Lang-IND) features extracted from the Fake.my-COVID19 dataset, a
newly curated fake news dataset in mixed Malay-English language. The uniqueness of the
proposed approach is that the mathematical expressions are formed either from basic arithmetic
operators alone or from an extended set that also includes complex operators, drawn from
addition, multiplication, subtraction, division, square, abs, log1p, sign, square root, and
exponential, with the Lang-IND features as the variables. Prior to the second stage of the
evolutionary approach, a sensitivity analysis is applied to shorten the best equation while
maintaining its F1-score. In the second stage, Adaptive Differential Evolution (ADE) is used
to fine-tune the mathematical model. The experimental results show that the proposed two-stage
evolutionary approach can be applied in fake news detection and that the model can learn to predict
using the Lang-IND features. Results from the first stage show that the equation from GP
achieves an F1-score of 83.23% on the Fake.my-COVID19 dataset using complex arithmetic
operators at a tree depth of 8. After the fine-tuning stage, the F1-score increases to
84.44%. The proposed two-stage evolutionary approach outperforms six commonly used machine
learning baselines, the best of which, Random Forest, achieves an F1-score of 84.07%. The
mathematical model is also tested separately on two other unseen datasets that differ in
domain topic or language and achieves acceptable F1-scores.
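To make the idea of an evolved equation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes "protected" versions of the operators named in the abstract (the guard values are illustrative), a hypothetical three-feature expression `example_equation`, and a hypothetical decision threshold of 0. It also shows how the F1-score reported above is computed from predictions.

```python
import math

# Protected operators: the guards are illustrative assumptions so that
# arbitrary evolved expressions never raise on zero or negative inputs.
def protected_div(a, b):
    return a / b if abs(b) > 1e-9 else 1.0

def protected_log1p(a):
    return math.log1p(abs(a))

def protected_sqrt(a):
    return math.sqrt(abs(a))

def sign(a):
    return (a > 0) - (a < 0)

def example_equation(x):
    """Hypothetical evolved expression over three Lang-IND features."""
    return (protected_sqrt(x[0])
            - protected_div(x[1], x[2])
            + protected_log1p(x[0] * x[1]))

def predict(x, threshold=0.0):
    """Label an item as fake (1) when the expression exceeds the threshold."""
    return 1 if example_equation(x) > threshold else 0

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the model is a single closed-form expression, prediction costs only a handful of arithmetic operations per news item.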
2. Existing System
As digital technology advances, people tend to spread unreliable or fake news to
their online contacts without verifying it. Fake news can be detrimental in some circumstances.
One study describes fake news as a phenomenon that directly affects how anxiety,
panic, despair, fear, exhaustion, psychological distress, and emotional overload develop in
people of all ages [1]. In the case of COVID-19, it may cause distrust in governments,
researchers, and health professionals, which indirectly affects public decisions such as
mandatory vaccination uptake [2]. In addition to its political and health implications, fake
news can have an impact on businesses, so appropriate marketing strategies should be
formulated in advance [3]. Governments all over the world have been battling the COVID-19
infodemic in addition to the COVID-19 pandemic itself. However, multilingual nations like
Malaysia may find this more difficult. Malaysian people speak 137 languages [4], which makes
fake news detection harder. To address this issue, the Malaysian Communications and
Multimedia Commission (MCMC) launched the sebenarnya.my portal to allow users to verify
unconfirmed news before spreading it on social media, instant messaging, blogs, websites, and
other online platforms. Through this portal, it is expected that Malaysia would be able to stop
the propagation of fake news successfully. To date, the majority of research publications on
fake news detection focus on the English language and relatively few focus on low-resource
languages. A language is considered low-resource when it lacks datasets and the Natural
Language Processing (NLP) tools designed to interpret them. With the explosion of fake news
created during the COVID-19 pandemic, exploring these low-resource languages in fake news
detection has become more urgent.
Drawbacks in Existing System
Limited Interpretability:
Evolutionary algorithms may produce models that are hard to interpret or explain.
This lack of transparency can be a significant drawback, especially in applications
where understanding the decision-making process is crucial, such as in legal or ethical
contexts.
Limited Transferability:
Models generated through an evolutionary approach may be highly tailored to the
specific training data and may not generalize well to different datasets or domains.
This lack of transferability can limit the model's practical use.
Dynamic Nature of Fake News:
Fake news is dynamic, with evolving tactics. An evolutionary approach may
struggle to adapt quickly to emerging trends or changes in the characteristics of fake
news.
Scalability Issues:
The approach may face scalability issues when dealing with large datasets or when
deployed in real-time applications. Ensuring the model's efficiency and
responsiveness becomes crucial in such scenarios.
Proposed System
The proposed two-stage evolutionary approach outperforms six commonly used machine
learning baselines, the best of which, Random Forest, achieves an F1-score of
84.07%.
A hybrid multi-thread metaheuristic approach runs three different swarm-based
metaheuristic algorithms, Grey Wolf Optimizer (GWO), Particle Swarm Optimization (PSO),
and Dragonfly Optimization, in parallel for fake news detection.
A deep learning fake news detection model uses Bidirectional LSTM (Bi-LSTM), a pair of
LSTMs in which one reads the input in the forward direction and the other in the
backward direction.
A two-stage evolutionary approach is proposed to detect fake news. Based on the
experimental results, tree-based GP can learn from the Fake.my-COVID19 dataset and
generate the best equation using complex operators, with an F1-score of 83.23%.
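How a tree-based GP represents and evaluates candidate equations can be sketched as follows. This is an illustrative sketch only: the nested-tuple encoding, the "grow" style of random generation, the terminal probabilities, and the protected operator guards are all assumptions, not details taken from the base paper. Only the operator names and the maximum tree depth of 8 come from the text above.

```python
import math
import random

# Operator set named in the abstract; the guards are illustrative assumptions.
UNARY = {
    "square": lambda a: a * a,
    "abs": abs,
    "log1p": lambda a: math.log1p(abs(a)),
    "sqrt": lambda a: math.sqrt(abs(a)),
    "sign": lambda a: (a > 0) - (a < 0),
    "exp": lambda a: math.exp(min(a, 50.0)),  # clamp to avoid overflow
}
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}

def random_tree(rng, n_features, max_depth):
    """Grow a random expression tree encoded as nested tuples."""
    if max_depth == 0 or rng.random() < 0.3:
        # Terminal node: a feature variable or a random constant.
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))
        return ("const", rng.uniform(-1.0, 1.0))
    if rng.random() < 0.5:
        op = rng.choice(sorted(UNARY))
        return (op, random_tree(rng, n_features, max_depth - 1))
    op = rng.choice(sorted(BINARY))
    return (op,
            random_tree(rng, n_features, max_depth - 1),
            random_tree(rng, n_features, max_depth - 1))

def evaluate(tree, x):
    """Evaluate a tree on one feature vector x."""
    op = tree[0]
    if op == "x":
        return x[tree[1]]
    if op == "const":
        return tree[1]
    if op in UNARY:
        return UNARY[op](evaluate(tree[1], x))
    return BINARY[op](evaluate(tree[1], x), evaluate(tree[2], x))

def depth(tree):
    """Number of operator levels above the deepest terminal."""
    if tree[0] in ("x", "const"):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

# Initial population of 20 random candidate equations, depth limited to 8.
rng = random.Random(0)
population = [random_tree(rng, n_features=3, max_depth=8) for _ in range(20)]
```

In a full GP run, each tree would be scored by its F1-score on the training data, and the fitter trees would be recombined and mutated to form the next generation.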
Algorithm
Initialization:
Begin by defining the initial population of candidate models. These models could
have varying architectures, hyperparameters, or feature representations.
Randomly generate a set of diverse individuals within the population.
Retraining:
Take the best-performing models from the first stage and retrain them on the
training dataset. This can involve further optimizing weights, biases, or other
parameters.
Monitoring and Updating:
Monitor the model's performance over time and update it as needed to adapt to
changes in fake news characteristics.
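The retraining step, in which the parameters of the best model are optimized further, can be illustrated with classic differential evolution (DE/rand/1/bin). Note that this sketch uses fixed values for the scale factor `f` and crossover rate `cr`; it is a simplification and not the adaptive variant (ADE) named in the abstract. The two-constant equation `c0*x0 - c1*x1`, the toy data `ROWS` and `LABELS`, and all function names are hypothetical.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Minimize objective over box bounds with DE/rand/1/bin (fixed f, cr)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # keep the value inside the bounds
                else:
                    v = pop[i][d]
                trial.append(v)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Toy data (hypothetical): two features per item, label 1 means fake.
ROWS = [(1.0, 1.0), (2.0, 5.0), (3.0, 1.0), (0.5, 2.0), (4.0, 3.0), (1.0, 3.0)]
LABELS = [1, 0, 1, 0, 1, 0]

def one_minus_f1(constants, rows, labels):
    """Loss for the hypothetical equation c0*x0 - c1*x1 > 0, as 1 - F1."""
    c0, c1 = constants
    preds = [1 if c0 * x0 - c1 * x1 > 0 else 0 for x0, x1 in rows]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return 1.0 - f1
```

A call such as `differential_evolution(lambda c: one_minus_f1(c, ROWS, LABELS), [(-5.0, 5.0)] * 2)` returns the tuned constants and the corresponding loss. Only the constants change during this stage; the structure of the equation found in the first stage stays fixed.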
Advantages
Global Optimization:
Evolutionary algorithms aim to find globally optimal solutions rather than getting
stuck in local optima. This can enhance the chances of discovering high-performing
models for fake news detection across different aspects of the problem space.
Adaptability:
Evolutionary algorithms can adapt to changes in the characteristics of fake news
over time. By evolving models in stages, the approach may be more resilient to
emerging trends or evolving tactics used in spreading fake news.
Reduced Human Bias:
The automated nature of the evolutionary process reduces the potential for human
bias in the model development phase. It allows the algorithm to objectively explore
and evaluate a wide range of possibilities without preconceived notions.
Dynamic Learning:
The model can evolve dynamically, adapting to changes in the prevalence and nature
of fake news. This is especially important given the constantly evolving landscape of
misinformation.
Hardware Specification
Processor : Intel Core i3
RAM : 4 GB
Hard disk : 500 GB
Software Specification
Operating System : Windows 10/11
Front End : Python
Back End : MySQL Server
IDE Tools : PyCharm