A Review on Deep-Learning-Based Cyberbullying Detection
Base paper Title: A Review on Deep-Learning-Based Cyberbullying Detection
Modified Title: A Review of Cyberbullying Detection Using Deep Learning
Abstract
Bullying is described as an undesirable behavior by others that harms an individual
physically, mentally, or socially. Cyberbullying is a virtual form (e.g., textual or image) of
bullying or harassment, also known as online bullying. Cyberbullying detection is a pressing
need in today’s world, as the prevalence of cyberbullying is continually growing, resulting in
mental health issues. Conventional machine learning models were previously used to identify
cyberbullying. However, current research demonstrates that deep learning surpasses traditional
machine learning algorithms in identifying cyberbullying for several reasons, including
handling extensive data, efficiently classifying text and images, extracting features
automatically through hidden layers, and many others. This paper reviews the existing surveys
and identifies the gaps in those studies. We also present a deep-learning-based defense
ecosystem for cyberbullying detection, including data representation techniques and different
deep-learning-based models and frameworks. We critically analyze the existing DL-based
cyberbullying detection techniques and identify their significant contributions and the
future research directions they present. We also summarize the datasets in use, along with
the DL architectures applied to each dataset and the tasks accomplished. Finally, we present
several challenges faced by researchers and the open issues to be addressed in the future.
Existing System
Bullying that occurs through the Internet is called cyberbullying, or cyber harassment [1].
It takes different forms today, for example, writing indecent textual content and sharing
inappropriate visual content such as memes. Social media platforms such as Facebook,
Instagram, and Twitter have made it easier to create content and to interact and connect
with others. However, the unfiltered exchange of messages and the lack of protection for
private information can lead to bullying on these platforms. Cyberbullying can appear as
flaming, vitriolic comments, offensive emails, humiliating pictures, mean remarks in
comments, and harassment through posts on blogs or social media. Bullying may have severe
consequences such as depression, which may even lead people to commit suicide. Detecting
cyberbullying is therefore important for curbing this problem. Detection of cyberbullying is a
difficult task due to the lack of identifiable parameters and the absence of a quantifiable
standard. Social media posts are short, noisy, and unstructured, with incorrect spelling and
symbols. Users sometimes intentionally obfuscate words or phrases (e.g., b***h, a**)
to evade automatic detection. Researchers have used traditional machine learning (ML)
algorithms to identify cyberbullying in text and images, and the majority of existing
solutions are based on supervised learning. Because bullying expressions are subjective,
traditional ML models detect cyber harassment less accurately than deep learning (DL)-based
approaches. A recent study shows that DL models outperform traditional ML algorithms in
cyberbullying identification. Deep neural networks such as the Recurrent Neural Network (RNN),
Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Bi-LSTM, among other DL
models, can be used to detect cyberbullying.
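The noisy and deliberately obfuscated text described above is usually cleaned before it reaches a model. The following is a minimal preprocessing sketch, not a method taken from the base paper. The word list LEXICON, the substitution table LEET_MAP, and the function names are hypothetical and chosen only for illustration; a real system would need a curated lexicon and would have to avoid rewriting legitimate numbers.

```python
import re

# Hypothetical toy lexicon; a real system would use a curated word list.
LEXICON = {"idiot", "loser", "stupid"}

# Assumed character substitutions for this sketch (not exhaustive).
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize_token(token):
    """Lowercase, undo simple substitutions, and collapse long letter runs."""
    token = token.lower().translate(LEET_MAP)
    # Collapse runs of 3+ identical letters ("looooser" -> "loser").
    # Asterisks are left alone so that masks such as "l***r" stay intact.
    return re.sub(r"([a-z])\1{2,}", r"\1", token)


def unmask(token, lexicon=LEXICON):
    """Resolve a masked token such as 'l***r' if exactly one lexicon word fits."""
    if "*" not in token:
        return token
    # Each '*' is assumed to hide exactly one character.
    pattern = re.compile(
        "".join("." if ch == "*" else re.escape(ch) for ch in token)
    )
    matches = [word for word in lexicon if pattern.fullmatch(word)]
    return matches[0] if len(matches) == 1 else token


def preprocess(text):
    """Tokenize, normalize, and unmask a message."""
    tokens = re.findall(r"[\w@$*]+", text)
    return [unmask(normalize_token(tok)) for tok in tokens]


tokens = preprocess("You are such a l***r, 1d10t!")
print(tokens)
```

A masked token is resolved only when a single lexicon entry fits, so ambiguous masks are passed through unchanged rather than guessed.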
Drawbacks in Existing System
Data Bias and Imbalance: Biased or imbalanced datasets can lead to biased model
outcomes, reducing the system's ability to accurately detect cyberbullying across
diverse contexts, languages, or demographics.
Complex Model Interpretability: Deep learning models, particularly complex neural
networks, often lack interpretability, making it challenging to understand how and why
a model classified specific content as cyberbullying. This lack of transparency can
hinder trust in the system's decisions.
Contextual Understanding: Understanding the context, sarcasm, slang, or subtle
nuances in language is challenging for deep learning models. They may misinterpret or
fail to detect subtle forms of cyberbullying, leading to false positives or false negatives.
Generalization Across Platforms and Languages: Models developed on specific
social media platforms or languages might not generalize well to others due to
differences in communication styles, user behavior, and platform-specific features.
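Two of the drawbacks above, class imbalance and false positives or false negatives, can be made concrete with a short sketch. It assumes a binary labeling ("bullying" versus "not_bullying") and uses inverse-frequency class weights, one common mitigation among several, together with precision, recall, and F1 for the positive class. The label names and sample data are invented for illustration.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def precision_recall_f1(y_true, y_pred, positive="bullying"):
    """Precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented, heavily imbalanced label set: 90 negative, 10 positive.
labels = ["not_bullying"] * 90 + ["bullying"] * 10
weights = class_weights(labels)

# Invented predictions: one hit, one false alarm, one miss.
y_true = ["bullying", "bullying", "not_bullying", "not_bullying", "not_bullying"]
y_pred = ["bullying", "not_bullying", "bullying", "not_bullying", "not_bullying"]
metrics = precision_recall_f1(y_true, y_pred)
print(weights, metrics)
```

With a 90/10 split, a model that always predicts "not_bullying" reaches 90% accuracy while detecting nothing, which is why per-class metrics are more informative here than accuracy alone.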
Proposed System
Data Collection and Annotation: Gather diverse and annotated datasets containing
examples of cyberbullying across multiple platforms, languages, and demographics.
Deep Learning Models: Develop deep neural network architectures (e.g., recurrent
neural networks, convolutional neural networks, transformers) for cyberbullying
detection.
Real-time Detection and Monitoring: Enable real-time monitoring and detection of
cyberbullying content by analyzing text inputs from users.
Privacy and Ethical Considerations: Ensure compliance with privacy regulations and
ethical guidelines when handling users' content and personal information.
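The monitoring and privacy components listed above could fit together roughly as follows. This is a sketch under stated assumptions, not the proposed system's implementation: keyword_score is a stand-in for a trained deep learning model, and the threshold, the salt, and all names are hypothetical. User IDs are replaced by a salted hash before an alert is stored, as one simple way to limit exposure of personal information.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Alert:
    user_ref: str  # pseudonymous reference, not the raw user ID
    score: float
    text: str


def pseudonymize(user_id, salt="demo-salt"):
    """Replace a user ID with a salted SHA-256 digest prefix (illustrative)."""
    digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
    return digest[:12]


def keyword_score(text):
    """Stand-in for a trained model: fraction of tokens in a toy word list."""
    toy_terms = {"idiot", "loser", "stupid"}
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(tok.strip(".,!?") in toy_terms for tok in tokens)
    return hits / len(tokens)


def monitor(messages, score_fn=keyword_score, threshold=0.3):
    """Yield an Alert for each (user_id, text) pair scoring at or above threshold."""
    for user_id, text in messages:
        score = score_fn(text)
        if score >= threshold:
            yield Alert(pseudonymize(user_id), score, text)


stream = [
    ("alice", "see you at lunch"),
    ("bob", "you are a stupid loser!"),
]
alerts = list(monitor(stream))
print(alerts)
```

Because monitor is a generator that accepts any scoring function, the keyword stand-in can be swapped for a model's prediction function without changing the surrounding loop.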
Algorithm
Convolutional Neural Networks (CNNs): CNNs are used for text classification tasks in
cyberbullying detection, where their convolutional filters capture local patterns in
text data.
Recurrent Neural Networks (RNNs): RNNs, especially the LSTM and GRU variants,
capture sequential dependencies and contextual information in cyberbullying
detection.
Transformers and Attention Mechanisms: Transformer architectures (e.g., BERT, GPT)
and attention mechanisms model context and semantics for cyberbullying
detection.
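The core operation behind each of the three model families above can be sketched in plain Python. These are simplified illustrations with hand-set values, not trained models or library implementations: the convolution uses a single filter with max-over-time pooling, the recurrence is a plain Elman-style update with a scalar hidden state (LSTM and GRU add gating on top of this), and the attention function omits the learned query, key, and value projections that transformers use. All function names and inputs are assumptions made for this example.

```python
import math


def conv1d_max_pool(seq, kernel):
    """Slide one filter over token vectors and keep the strongest response."""
    width = len(kernel)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the filter width")
    responses = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]  # a local n-gram of token vectors
        responses.append(
            sum(x * k for vec, kvec in zip(window, kernel)
                for x, k in zip(vec, kvec))
        )
    return max(responses)  # max-over-time pooling


def rnn_final_state(seq, w_in, w_rec):
    """Plain recurrence: h_t = tanh(w_in . x_t + w_rec * h_{t-1})."""
    h = 0.0
    for vec in seq:
        h = math.tanh(sum(a * b for a, b in zip(w_in, vec)) + w_rec * h)
    return h


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product attention with queries = keys = values = seq."""
    dim = len(seq[0])
    output = []
    for query in seq:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in seq
        ]
        weights = softmax(scores)
        # Each output vector is a weighted average of all token vectors.
        output.append(
            [sum(w * vec[j] for w, vec in zip(weights, seq)) for j in range(dim)]
        )
    return output


# Invented 2-dimensional "embeddings" for a three-token message.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # filter spanning two tokens

pooled = conv1d_max_pool(seq, kernel)
final_h = rnn_final_state(seq, w_in=[0.5, -0.5], w_rec=0.8)
attended = self_attention(seq)
print(pooled, final_h, attended)
```

The convolution sees only a fixed window of neighboring tokens, the recurrence carries information forward one step at a time, and attention lets every token weigh every other token directly, which is the structural difference the three descriptions above refer to.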
Advantages
Complex Pattern Recognition: Deep learning models excel in recognizing intricate
patterns within textual data, allowing them to identify nuanced and subtle forms of
cyberbullying that might evade traditional methods.
Semantic Understanding: Deep learning algorithms, especially those employing
transformer architectures, are better able to capture contextual meaning,
sarcasm, and the tone of text, which can improve detection accuracy by modeling
the intent behind messages.
Feature Learning: Deep learning models autonomously learn meaningful
representations from raw data, bypassing the need for manual feature engineering. This
aids in capturing intricate linguistic features crucial for cyberbullying detection.
Hardware Specification
Processor : Intel Core i3 processor
RAM : 4 GB
Hard disk : 500 GB
Software Specification
Operating System : Windows 10/11
Front End : Python
Back End : MySQL Server
IDE Tools : PyCharm