This project builds a multi-label classification model to detect different types of toxicity in Wikipedia comments. The dataset is from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge.
Many models built by Kaggle participants center on word embeddings and deep learning. We would like to observe whether complex approaches that couple word embeddings with deep learning offer any significant improvement in accuracy over simpler approaches.
Hence, we explored traditional vectorization approaches, Bag-of-Words (BoW) and Term Frequency–Inverse Document Frequency (TF-IDF), as well as newer approaches such as word embeddings.
We deployed a range of classification methods, from simpler ones such as Naïve Bayes, Logistic Regression, and Support Vector Machines to more complex Neural Networks and CNNs.
3. Problem Statement
Background
Online discussion is an integral aspect of social media platforms, but it has become an avenue for abuse and harassment.
[Figure: an example comment ("Hi! I am back again! Last warning! Stop undoing my edits or die!") mapped against the 6 toxicity classes: Toxic, Severe Toxic, Obscene, Threat, Insult, Identity Hate]
Objective
Build a multi-label classification model to detect different types of toxicity for each comment.
4. Dataset
• Kaggle competition dataset
• 150k comments from Wikipedia Talk pages
• 6 classes of toxicity:
  • Toxic
  • Severe Toxic
  • Obscene
  • Threat
  • Insult
  • Identity Hate
Dataset can be downloaded from: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/overview
5. Dataset: Sample Records

Each comment is labelled against all 6 classes (1 = label applies, 0 = it does not):

| id | comment_text | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|----|--------------|-------|--------------|---------|--------|--------|---------------|
| 1 | Hi! I am back again! Last warning! Stop undoing my edits or die! | 1 | 0 | 0 | 1 | 0 | 0 |
| 2 | Gays are disgusting. It's just my opinion but gays are disgusting. | 1 | 0 | 0 | 0 | 1 | 1 |
| 3 | IM GOING TO KILL YOU ALL!! I'm serious, you are all mean nasty idiots who deserve to die and I'm going to throw every single one of you in a shredder!!!!!!!!! | 1 | 1 | 0 | 1 | 1 | 0 |
6. EDA

Class distribution (number of comments per label):

| Label | Count |
|-------|-------|
| Toxic | 15,294 |
| Obscene | 8,449 |
| Insult | 7,877 |
| Severe Toxic | 1,595 |
| Identity Hate | 1,405 |
| Threat | 478 |

• About 90% of comments are clean.
• Word clouds for all toxic labels look very similar to each other.
[Figures: word cloud for toxic comments; word cloud for clean comments]
8. Initial Approach: Results

Results of 6-label classification (with a subset of ~50k records): high average accuracy (96%) but very low micro-recall and micro-F1 score. Why? Let's look at the confusion matrix…

The metrics are defined as:
avg_accuracy = sum(accuracy) / no_of_labels
micro_precision = sum(TP) / (sum(TP) + sum(FP))
micro_recall = sum(TP) / (sum(TP) + sum(FN))
micro_f1_score = (2 * micro_precision * micro_recall) / (micro_precision + micro_recall)
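As a minimal sketch of how these metrics can be computed, assuming y_true and y_pred are binary indicator arrays of shape (n_samples, n_labels); the function and variable names are illustrative, not from the project code:

```python
import numpy as np

def multilabel_metrics(y_true: np.ndarray, y_pred: np.ndarray):
    """Average accuracy and micro-averaged precision/recall/F1,
    following the formulas above (TP/FP/FN summed over all labels)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    # Mean over the whole indicator matrix equals sum(accuracy) / no_of_labels,
    # since every label is evaluated on the same number of samples.
    avg_accuracy = np.mean(y_true == y_pred)
    micro_precision = tp / (tp + fp)
    micro_recall = tp / (tp + fn)
    micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
    return avg_accuracy, micro_precision, micro_recall, micro_f1
```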
9. Initial Approach: Challenges

High accuracy but high false negatives (Embeddings + Logistic Regression and Embeddings + Neural Network):
• A high FN count is an issue; we want it to be low.
• The high accuracy is driven by the imbalanced data (90% clean).
How do we deal with these issues?
11. Revised Approach

1. Down-sampling (to a 60:40 clean-to-toxic ratio) to deal with the imbalanced data.

| Label | Before Down-Sampling | After Down-Sampling |
|-------|----------------------|---------------------|
| Clean | 90% | 59% |
| Toxic | 10% | 39% |
| Obscene | 5% | 22% |
| Insult | 5% | 21% |
| Identity Hate | 1% | 4% |
| Severe Toxic | 1% | 4% |
| Threat | 0% | 1% |
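A minimal sketch of the down-sampling step, assuming the data sits in a pandas DataFrame df with the six label columns from the dataset; the helper and its parameters are illustrative, not the original code:

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def downsample(df: pd.DataFrame, clean_ratio: float = 0.6, seed: int = 42) -> pd.DataFrame:
    """Keep every comment with at least one toxic label and sample just
    enough clean comments for a clean_ratio : (1 - clean_ratio) split."""
    is_toxic = df[LABELS].any(axis=1)
    toxic_df = df[is_toxic]
    n_clean = int(len(toxic_df) * clean_ratio / (1 - clean_ratio))
    clean_df = df[~is_toxic].sample(n=n_clean, random_state=seed)
    # Shuffle so clean and toxic comments are interleaved
    return pd.concat([toxic_df, clean_df]).sample(frac=1, random_state=seed)
```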
12. Revised Approach

2. Conditional probabilities of each class (probability matrix):
P(Toxic | Severe Toxic) = 100%
P(Toxic | Obscene) = 95%
P(Toxic | Insult) = 95%
P(Toxic | Identity Hate) = 94%
P(Toxic | Threat) = 95%

The other 5 labels are highly dependent on Toxic. Therefore, this leads us to use a 2-level classification approach.
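These conditional probabilities can be estimated directly from the label columns; a minimal sketch, assuming the same df as above:

```python
# P(Toxic | label) = count(toxic AND label) / count(label)
for label in ["severe_toxic", "obscene", "insult", "identity_hate", "threat"]:
    p = (df["toxic"] & df[label]).sum() / df[label].sum()
    print(f"P(toxic | {label}) = {p:.0%}")
```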
13. Revised Approach: Flow Chart

2-level classification approach:
1. Down-sampling and feature engineering.
2. 1st level: a binary classifier separates Toxic from Clean comments.
3. Only comments predicted as toxic are passed on to the 2nd level.
4. 2nd level: a multi-label classifier assigns the remaining 5 labels: Severe Toxic, Threat, Obscene, Insult, Identity Hate.
A sketch of the pipeline is shown below.
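A minimal sketch of the 2-level wiring, assuming vectorized features X_train and a label DataFrame y_train; the specific classifiers are stand-ins for whichever level-1 and level-2 models are selected, and training level 2 on the actually-toxic comments is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

SUB_LABELS = ["severe_toxic", "threat", "obscene", "insult", "identity_hate"]

# 1st level: binary Toxic vs. Clean classifier on all (down-sampled) comments
level1 = LinearSVC()
level1.fit(X_train, y_train["toxic"])

# 2nd level: multi-label classifier trained on the toxic comments only
toxic_mask = (y_train["toxic"] == 1).to_numpy()
level2 = OneVsRestClassifier(LogisticRegression(max_iter=1000))
level2.fit(X_train[toxic_mask], y_train.loc[toxic_mask, SUB_LABELS])

def predict(X):
    """Only comments predicted as toxic by level 1 reach level 2."""
    is_toxic = level1.predict(X).astype(bool)
    sub_labels = np.zeros((X.shape[0], len(SUB_LABELS)), dtype=int)
    if is_toxic.any():
        sub_labels[is_toxic] = level2.predict(X[is_toxic])
    return is_toxic.astype(int), sub_labels
```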
16. Revised Approach: Feature Engineering

Word-level Features
Pre-processing steps (see the sketch below):
• Retained case structure (lower case, upper case)
• Removed URLs, HTML tags, and stop words
• Removed punctuation, except for "!"
• Lemmatization
Vectorization methods:
• Bag of Words (BoW)
• Term Frequency–Inverse Document Frequency (TF-IDF)
• Binary representation of each word
Word embeddings:
• Pre-trained GloVe word embeddings
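A minimal sketch of the pre-processing steps, assuming NLTK for stop words and lemmatization; the regexes and library choice are illustrative, not necessarily the original implementation:

```python
import re
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.stem import WordNetLemmatizer    # requires nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()
STOP_WORDS = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = re.sub(r"[^\w\s!]", " ", text)       # remove punctuation except "!"
    # Case structure is retained: lower-casing is used only for the stop-word check
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)
```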
Comment-level Features (computed as in the sketch below):
• Word count
• Character count
• Word density (word count / character count)
• Total length
• Number of capitals
• Proportion of capitals
• Number of exclamation marks
• Number of unique words
• Proportion of unique words
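A minimal sketch of these comment-level features with pandas; treating "total length" as raw character length including whitespace and "character count" as excluding it is an assumption, since the original does not spell out the difference:

```python
import pandas as pd

def add_comment_features(df: pd.DataFrame) -> pd.DataFrame:
    text = df["comment_text"]
    df["total_length"] = text.str.len()
    # Assumption: "character count" excludes whitespace, "total length" does not
    df["char_count"] = text.str.replace(r"\s", "", regex=True).str.len()
    df["word_count"] = text.str.split().str.len()
    df["word_density"] = df["word_count"] / df["char_count"]
    df["capitals"] = text.apply(lambda s: sum(c.isupper() for c in s))
    df["prop_capitals"] = df["capitals"] / df["total_length"]
    df["num_exclamation"] = text.str.count("!")
    df["num_unique_words"] = text.apply(lambda s: len(set(s.split())))
    df["prop_unique_words"] = df["num_unique_words"] / df["word_count"]
    return df
```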
18. Revised Approach: Model Results
1st-level Binary Classification

| Model | Accuracy | Precision | Recall | F1-Score |
|-------|----------|-----------|--------|----------|
| BoW + Naïve Bayes | 87.53% | 87.76% | 87.53% | 87.59% |
| TF-IDF + SVM | 88.74% | 88.70% | 88.74% | 88.70% |
| Binary + Logistic Regression | 88.35% | 88.31% | 88.35% | 88.30% |
| TF-IDF + NN | 88.86% | 88.81% | 88.86% | 88.81% |
| TF-IDF + CNN | 88.06% | 88.01% | 88.06% | 88.02% |
| Embeddings + CNN | 89.20% | 89.20% | 89.10% | 89.10% |

Best performing model: Embeddings + CNN
Model selected: TF-IDF + SVM
Reasons:
• Less complex model, at a cost of < 0.5% reduction in F1-score
• Simpler features (TF-IDF vs. embeddings)
• Greater interpretability: feature importance can easily be extracted from an SVM, unlike a NN/CNN (see the sketch at the end of this slide)
Comparing the initial and final approaches (Level 1: binary label):
• Reduction in false negatives (predicted as Clean when actually Toxic)
• Improvement in true positives (predicted as Toxic when actually Toxic)

Initial approach: Embeddings + Logistic Regression

| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 9203 | 8 | 9211 |
| 1 | 777 | 204 | 981 |

Final approach: TF-IDF + SVM

| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 6173 | 556 | 6729 |
| 1 | 680 | 3565 | 4245 |

(Note that the final approach is evaluated on the down-sampled data, so the class proportions in the two matrices differ.)
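A minimal sketch of the selected level-1 model, including the coefficient-based feature importance cited in the reasons above; the vectorizer settings and the variable names train_texts and y_train_toxic are illustrative, not the tuned configuration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

vectorizer = TfidfVectorizer(max_features=50_000)
X_train = vectorizer.fit_transform(train_texts)   # pre-processed comments
clf = LinearSVC()
clf.fit(X_train, y_train_toxic)                   # 1 = toxic, 0 = clean

# Interpretability: the largest positive coefficients mark the terms
# that push a comment most strongly toward the Toxic class.
feature_names = np.array(vectorizer.get_feature_names_out())
top_terms = feature_names[np.argsort(clf.coef_[0])[-10:]]
print("Most toxic-indicative terms:", top_terms)
```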
19. Revised Approach: Model Results
2nd-level Multi-label Classification

| Model | Average Accuracy | Micro-Precision | Micro-Recall | Micro-F1-Score |
|-------|------------------|-----------------|--------------|----------------|
| BoW + Naïve Bayes | 82.76% | 63.15% | 72.14% | 67.35% |
| TF-IDF + SVM | 87.11% | 78.21% | 66.13% | 71.66% |
| Binary + Logistic Regression | 87.33% | 77.29% | 68.79% | 72.79% |
| TF-IDF + NN | 85.38% | 74.53% | 61.78% | 67.56% |
| TF-IDF + CNN | 86.21% | 73.99% | 67.95% | 70.84% |
| Embeddings + CNN | 86.90% | 72.70% | 74.70% | 73.40% |

Best performing model: Embeddings + CNN
Model selected: Binary + Logistic Regression
Reasons:
• Less complex model, at a cost of < 1% reduction in micro-F1-score
• Simpler features (binary word representation vs. embeddings)
• Greater interpretability: feature importance can easily be extracted from logistic regression.
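A minimal sketch of the selected level-2 model: binary word presence via CountVectorizer(binary=True), wrapped in one-vs-rest logistic regression; the settings and the variable names toxic_texts and y_toxic are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

SUB_LABELS = ["severe_toxic", "threat", "obscene", "insult", "identity_hate"]

vectorizer = CountVectorizer(binary=True, max_features=50_000)  # 1 if word present
X_toxic = vectorizer.fit_transform(toxic_texts)   # comments passed on by level 1
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_toxic, y_toxic[SUB_LABELS])             # one binary model per label
```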
20. Revised Approach: Model Results
2nd-level Multi-label Classification

Comments:
• Multi-label classification produced low precision and recall for certain labels because those labels have significantly fewer records.
• For these 3 labels, false negatives are high, i.e. comments are predicted as Clean when they are actually Severe Toxic / Threat / Identity Hate.

Per-label results (class 1):

| Label | Precision | Recall | F1-score | No. of Records |
|-------|-----------|--------|----------|----------------|
| Severe Toxic | 0.59 | 0.23 | 0.33 | 396 |
| Threat | 0.59 | 0.35 | 0.44 | 117 |
| Identity Hate | 0.67 | 0.44 | 0.53 | 383 |
| Insult | 0.75 | 0.72 | 0.73 | 2118 |
| Obscene | 0.83 | 0.80 | 0.81 | 2256 |

Severe Toxic
| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 3818 | 63 | 3881 |
| 1 | 306 | 90 | 396 |

Threat
| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 4131 | 29 | 4160 |
| 1 | 76 | 41 | 117 |

Identity Hate
| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 3811 | 83 | 3894 |
| 1 | 213 | 170 | 383 |
26. Revised Approach: Examples

| Comment | True Labels | Predicted Labels |
|---------|-------------|------------------|
| Hello, you disgusting transsexual. You are a grotesque blight upon the earth. Please kill yourself. | Toxic, Obscene, Threat, Identity Hate, Insult | Toxic, Obscene, Threat, Identity Hate, Insult |
| Wtf, Why You deleting Maashel? Bad things will happen to you now. | Toxic, Threat | Toxic |
| am going to shoot you in the head and laugh as your brains splatter onto the ground. | Toxic, Severe Toxic, Threat | Toxic, Threat |