1. DEEP LEARNING FOR
DETECTING
CYBERBULLYING ACROSS
MULTIPLE SOCIAL MEDIA
PLATFORMS
Sweta Agrawal, Amit Awekar
Indian Institute of Technology,
Guwahati
19 Jan 2018
Kamel Ben Kmala, 10/12/2018
3. INTRODUCTION
•What is cyberbullying?
•Between 10% and 40% of internet users
are victims of cyberbullying
• Definition of what constitutes
cyberbullying is quite subjective
4. Past Works VS This Work
•Past works target only one social media platform; this work targets three different types of social networks (Formspring, Twitter, Wikipedia).
•Past works cover only one topic of cyberbullying; this work covers three topics (personal attack, racism, and sexism).
•Past works rely on handcrafted features; this work requires no feature engineering.
11. Conclusion
DNN models can detect cyberbullying on various topics across multiple SMPs, as demonstrated with three datasets and four DNN models.
Cyberbullying detection models can be further improved to take a variety of actions depending on the perceived seriousness of the posts.
Cyberbullying has been defined by the National Crime Prevention Council as the use of the Internet, cell phones or other devices to send or post text or images intended to hurt or embarrass another person.
Detection of cyberbullying in social media is a challenging task. The definition of what constitutes cyberbullying is quite subjective. For example, frequent use of swear words might be considered bullying by the general population. However, on teen-oriented social media platforms such as Formspring, this does not necessarily indicate bullying.
This work avoids any feature engineering by developing deep learning based models combined with transfer learning.
The Twitter dataset contains examples of racism and sexism, and the Wikipedia dataset contains examples of personal attack. The Formspring dataset, however, is not specific to any single topic.
……………………………………………………………………………………………
- Formspring: the dataset includes 12K annotated question-and-answer pairs. Among these pairs, 825 were labeled as containing cyberbullying.
- Twitter: the dataset includes 16K annotated tweets. 3117 are labeled as sexist, 1937 as racist, and the remaining are marked as neither sexist nor racist.
- Wikipedia: the dataset includes over 100K labeled discussion comments from English Wikipedia's talk pages. In total, 13590 comments are labeled as personal attack.
Four DNN based models were evaluated for cyberbullying detection: CNN, LSTM, BLSTM, and BLSTM with attention.
These models are listed in increasing order of the complexity of their neural architecture and the amount of information they use.
-CNNs have recently been used for sentence classification tasks such as sentiment classification.
-Long Short Term Memory networks are a special kind of RNN, capable of learning long-term dependencies.
-Bidirectional LSTMs further increase the amount of input information available to the network by encoding information in both forward and backward direction.
-Attention mechanisms allow for a more direct dependence between the states of the model at different points in time.
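The attention idea above can be sketched in numpy: score each timestep's hidden state, normalize the scores with a softmax, and pool the states into a single weighted sum. This is an illustrative toy (the function name `attention_pool` and the scoring vector `w` are assumptions, not the paper's exact formulation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(hidden_states, w):
    """Weighted sum of BLSTM hidden states.

    hidden_states: (timesteps, dim) matrix of per-word states.
    w: (dim,) scoring vector (in a real model this is learned).
    """
    scores = hidden_states @ w        # one relevance score per timestep
    alphas = softmax(scores)          # normalize into attention weights
    context = alphas @ hidden_states  # (dim,) sentence representation
    return context, alphas

# Toy example: 5 words, 4-dimensional hidden states.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
w = rng.normal(size=4)
context, alphas = attention_pool(h, w)
```

The attention weights sum to one, so the pooled context vector is a convex combination of the per-word states, letting the classifier focus on the most bullying-indicative words.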
This is the general architecture used across all four models. The models differ only in the neural architecture layer; the remaining layers are identical.
……………………………………………
-The embedding layer processes a fixed-size sequence of words. Each word is represented as a real-valued vector, also known as a word embedding. Three methods were tried for initializing word embeddings: random, GloVe, and SSWE.
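A common way to mix pretrained and random initialization is to build one embedding matrix row per vocabulary word, copying the pretrained vector when one exists and falling back to a small random vector otherwise. A minimal sketch (the helper `build_embedding_matrix` and the tiny two-word "GloVe" table are hypothetical):

```python
import numpy as np

def build_embedding_matrix(vocab, pretrained, dim=50, seed=0):
    """Initialize one row per vocabulary word.

    vocab: dict mapping word -> integer index (0 reserved for padding).
    pretrained: dict mapping word -> vector (e.g. loaded from GloVe);
                words not found fall back to small random vectors.
    """
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab) + 1, dim))  # row 0 stays zero for padding
    for word, idx in vocab.items():
        vec = pretrained.get(word)
        matrix[idx] = vec if vec is not None else rng.normal(scale=0.1, size=dim)
    return matrix

# Tiny illustration with a fake two-word pretrained table.
vocab = {"you": 1, "never": 2, "unseenword": 3}
pretrained = {"you": np.ones(50), "never": -np.ones(50)}
emb = build_embedding_matrix(vocab, pretrained)
```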
-To avoid overfitting, two dropout layers were used, one before the neural architecture layer and one after, with dropout rates of 0.25 and 0.5 respectively.
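Putting the layers together, the BLSTM variant of the shared architecture could be sketched in Keras as below. The sequence length, vocabulary size, and LSTM width are assumptions for illustration; only the layer order and the 0.25/0.5 dropout rates come from the description above:

```python
from tensorflow.keras import layers, models

MAX_LEN = 50     # fixed input sequence length (assumed)
VOCAB = 10000    # vocabulary size (assumed)
EMB_DIM = 50     # word embedding dimension (assumed)

def build_blstm(vocab=VOCAB, max_len=MAX_LEN, dim=EMB_DIM):
    """Embedding -> dropout(0.25) -> BLSTM -> dropout(0.5) -> sigmoid."""
    m = models.Sequential([
        layers.Input(shape=(max_len,)),
        layers.Embedding(vocab, dim),            # rows may be preloaded from GloVe/SSWE
        layers.Dropout(0.25),                    # before the neural architecture layer
        layers.Bidirectional(layers.LSTM(64)),   # swap this layer for the CNN/LSTM variants
        layers.Dropout(0.5),                     # after the neural architecture layer
        layers.Dense(1, activation="sigmoid"),   # bullying vs. non-bullying
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy")
    return m

model = build_blstm()
```

Swapping only the `Bidirectional(LSTM(...))` line for a convolutional or plain LSTM layer yields the other model variants while keeping the rest of the stack identical.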
This table shows the results (F1 score) for traditional ML models.
Four models were tried: logistic regression (LR), support vector machine (SVM), random forest (RF), and naive Bayes (NB), as these were used in previous works.
Compared to the DNN models, the performance of all four traditional machine learning models was significantly lower.
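Such a traditional baseline is typically a bag-of-words classifier. A minimal sketch, assuming a TF-IDF representation feeding logistic regression (the toy posts and labels here are invented; the real experiments use the three SMP datasets):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled posts (hypothetical), 1 = bullying.
posts = ["you are awesome", "great job today", "you are an idiot",
         "nobody likes you", "nice work friend", "go away loser"]
labels = [0, 0, 1, 1, 0, 1]

# TF-IDF features over word unigrams and bigrams, then a linear classifier.
baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression())
baseline.fit(posts, labels)
preds = baseline.predict(posts)
```

Because the features are fixed n-gram counts, such models cannot adapt their representation to the task, which is one plausible reason they trail the DNN models here.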
The training datasets had a major class imbalance problem, with posts marked as bullying in the minority. As a result, all models were biased towards labeling posts as non-bullying. To remove this bias, the data from the bullying class was oversampled thrice; that is, bullying posts were replicated thrice in the training data.
Oversampling particularly helps the smallest dataset, Formspring, where the number of training instances for the bullying class is quite small (825).
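The oversampling step is simple to implement: walk the training set and emit each bullying post multiple times. A minimal sketch (the helper name and the interpretation of "thrice" as three total copies per bullying post are assumptions):

```python
def oversample_bullying(posts, labels, factor=3):
    """Replicate minority-class (bullying, label 1) posts `factor` times."""
    out_posts, out_labels = [], []
    for post, label in zip(posts, labels):
        reps = factor if label == 1 else 1
        out_posts.extend([post] * reps)
        out_labels.extend([label] * reps)
    return out_posts, out_labels

posts = ["hi there", "you idiot", "nice day"]
labels = [0, 1, 0]
op, ol = oversample_bullying(posts, labels)
```

Duplicating minority examples this way raises their weight in the loss without touching the majority class, at the cost of repeated samples in each epoch.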
– Datasets: F (Formspring), T (Twitter), W (Wikipedia)
– Datasets with oversampling of bullying posts: F+ (Formspring), T+ (Twitter), W+ (Wikipedia)
– Evaluation measures: P (Precision), R (Recall), F1 (F1 score)
Oversampling significantly improved the performance of all DNN models, with a major leap in all three evaluation measures. This table shows the effect of oversampling for a variety of word embedding methods, with BLSTM with attention as the detection model.