This project builds a multi-label classification model to detect different types of toxicity in Wikipedia comments. The dataset is from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge.
Many models built by Kaggle participants center on word embeddings and deep learning. We would like to observe whether complex approaches that couple word embeddings with deep learning offer any significant improvement in accuracy over simpler approaches.
Hence, we explored traditional vectorization approaches, Bag-of-Words (BoW) and Term Frequency–Inverse Document Frequency (TF-IDF), as well as newer approaches such as word embeddings.
We deployed a range of classification methods, from simpler ones such as Naïve Bayes, Logistic Regression, and Support Vector Machines to more complex Neural Networks and CNNs.
3. Problem Statement
Background
Online discussion is an integral aspect of social media platforms, but it has become an avenue for abuse and harassment.
[Figure: an example comment ("Hi! I am back again! Last warning! Stop undoing my edits or die!") mapped against the 6 toxicity classes: Toxic, Severe Toxic, Obscene, Threat, Insult, Identity Hate]
Objective
Build a multi-label classification model to detect different types of toxicity for each comment.
4. Dataset
• Kaggle competition dataset
• 150k comments from Wikipedia Talk pages
• 6 classes of toxicity:
  • Toxic
  • Severe Toxic
  • Obscene
  • Threat
  • Insult
  • Identity Hate
Dataset can be downloaded from: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/overview
5. Dataset: Sample Records

Each comment is labelled against all 6 classes (1 = label applies, 0 = it does not):

| id | comment_text | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|----|--------------|-------|--------------|---------|--------|--------|---------------|
| 1 | Hi! I am back again! Last warning! Stop undoing my edits or die! | 1 | 0 | 0 | 1 | 0 | 0 |
| 2 | Gays are disgusting. It's just my opinion but gays are disgusting. | 1 | 0 | 0 | 0 | 1 | 1 |
| 3 | IM GOING TO KILL YOU ALL!! I'm serious, you are all mean nasty idiots who deserve to die and I'm going to throw every single one of you in a shredder!!!!!!!!! | 1 | 1 | 0 | 1 | 1 | 0 |
6. EDA

Class distribution (number of comments per label):

| Label | Count |
|-------|-------|
| Toxic | 15,294 |
| Obscene | 8,449 |
| Insult | 7,877 |
| Severe Toxic | 1,595 |
| Identity Hate | 1,405 |
| Threat | 478 |

• About 90% of comments are clean.
• Word clouds for all toxic labels look very similar to each other.
[Figures: word cloud for toxic comments; word cloud for clean comments]
8. Initial Approach: Results

Results of 6-label classification (with a subset of ~50k records): high average accuracy (96%) but very low micro-recall and micro-F1 score. Why? Let's look at the confusion matrix…

The metrics are defined as:
avg_accuracy = sum(accuracy) / no_of_labels
micro_precision = sum(TP) / (sum(TP) + sum(FP))
micro_recall = sum(TP) / (sum(TP) + sum(FN))
micro_f1_score = (2 * micro_precision * micro_recall) / (micro_precision + micro_recall)
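As a minimal sketch of how these metrics can be computed, assuming y_true and y_pred are binary indicator arrays of shape (n_samples, n_labels); the function and variable names are illustrative, not from the project code:

```python
import numpy as np

def multilabel_metrics(y_true: np.ndarray, y_pred: np.ndarray):
    """Average accuracy and micro-averaged precision/recall/F1,
    following the formulas above (TP/FP/FN summed over all labels)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    # Mean over the whole indicator matrix equals sum(accuracy) / no_of_labels,
    # since every label is evaluated on the same number of samples.
    avg_accuracy = np.mean(y_true == y_pred)
    micro_precision = tp / (tp + fp)
    micro_recall = tp / (tp + fn)
    micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
    return avg_accuracy, micro_precision, micro_recall, micro_f1
```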
9. Initial Approach: Challenges

High accuracy but high false negatives (Embeddings + Logistic Regression and Embeddings + Neural Network):
• A high FN count is an issue; we want it to be low.
• The high accuracy is driven by the imbalanced data (90% clean).
How do we deal with these issues?
11. Revised Approach

1. Down-sampling (to a 60:40 clean-to-toxic ratio) to deal with the imbalanced data.

| Label | Before Down-Sampling | After Down-Sampling |
|-------|----------------------|---------------------|
| Clean | 90% | 59% |
| Toxic | 10% | 39% |
| Obscene | 5% | 22% |
| Insult | 5% | 21% |
| Identity Hate | 1% | 4% |
| Severe Toxic | 1% | 4% |
| Threat | 0% | 1% |
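A minimal sketch of the down-sampling step, assuming the data sits in a pandas DataFrame df with the six label columns from the dataset; the helper and its parameters are illustrative, not the original code:

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def downsample(df: pd.DataFrame, clean_ratio: float = 0.6, seed: int = 42) -> pd.DataFrame:
    """Keep every comment with at least one toxic label and sample just
    enough clean comments for a clean_ratio : (1 - clean_ratio) split."""
    is_toxic = df[LABELS].any(axis=1)
    toxic_df = df[is_toxic]
    n_clean = int(len(toxic_df) * clean_ratio / (1 - clean_ratio))
    clean_df = df[~is_toxic].sample(n=n_clean, random_state=seed)
    # Shuffle so clean and toxic comments are interleaved
    return pd.concat([toxic_df, clean_df]).sample(frac=1, random_state=seed)
```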
12. Revised Approach

2. Conditional probabilities of each class (probability matrix):
P(Toxic | Severe Toxic) = 100%
P(Toxic | Obscene) = 95%
P(Toxic | Insult) = 95%
P(Toxic | Identity Hate) = 94%
P(Toxic | Threat) = 95%

The other 5 labels are highly dependent on Toxic. Therefore, this leads us to use a 2-level classification approach.
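These conditional probabilities can be estimated directly from the label columns; a minimal sketch, assuming the same df as above:

```python
# P(Toxic | label) = count(toxic AND label) / count(label)
for label in ["severe_toxic", "obscene", "insult", "identity_hate", "threat"]:
    p = (df["toxic"] & df[label]).sum() / df[label].sum()
    print(f"P(toxic | {label}) = {p:.0%}")
```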
13. Revised Approach: Flow Chart

2-level classification approach:
1. Down-sampling and feature engineering.
2. 1st level: a binary classifier separates Toxic from Clean comments.
3. Only comments predicted as toxic are passed on to the 2nd level.
4. 2nd level: a multi-label classifier assigns the remaining 5 labels: Severe Toxic, Threat, Obscene, Insult, Identity Hate.
A sketch of the pipeline is shown below.
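A minimal sketch of the 2-level wiring, assuming vectorized features X_train and a label DataFrame y_train; the specific classifiers are stand-ins for whichever level-1 and level-2 models are selected, and training level 2 on the actually-toxic comments is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

SUB_LABELS = ["severe_toxic", "threat", "obscene", "insult", "identity_hate"]

# 1st level: binary Toxic vs. Clean classifier on all (down-sampled) comments
level1 = LinearSVC()
level1.fit(X_train, y_train["toxic"])

# 2nd level: multi-label classifier trained on the toxic comments only
toxic_mask = (y_train["toxic"] == 1).to_numpy()
level2 = OneVsRestClassifier(LogisticRegression(max_iter=1000))
level2.fit(X_train[toxic_mask], y_train.loc[toxic_mask, SUB_LABELS])

def predict(X):
    """Only comments predicted as toxic by level 1 reach level 2."""
    is_toxic = level1.predict(X).astype(bool)
    sub_labels = np.zeros((X.shape[0], len(SUB_LABELS)), dtype=int)
    if is_toxic.any():
        sub_labels[is_toxic] = level2.predict(X[is_toxic])
    return is_toxic.astype(int), sub_labels
```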
16. Revised Approach: Feature Engineering

Word-level Features
Pre-processing steps (see the sketch below):
• Retained case structure (lower case, upper case)
• Removed URLs, HTML tags, and stop words
• Removed punctuation, except for "!"
• Lemmatization
Vectorization methods:
• Bag of Words (BoW)
• Term Frequency–Inverse Document Frequency (TF-IDF)
• Binary representation of each word
Word embeddings:
• Pre-trained GloVe word embeddings
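A minimal sketch of the pre-processing steps, assuming NLTK for stop words and lemmatization; the regexes and library choice are illustrative, not necessarily the original implementation:

```python
import re
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.stem import WordNetLemmatizer    # requires nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()
STOP_WORDS = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = re.sub(r"[^\w\s!]", " ", text)       # remove punctuation except "!"
    # Case structure is retained: lower-casing is used only for the stop-word check
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)
```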
Comment-level Features (computed as in the sketch below):
• Word count
• Character count
• Word density (word count / character count)
• Total length
• Number of capitals
• Proportion of capitals
• Number of exclamation marks
• Number of unique words
• Proportion of unique words
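A minimal sketch of these comment-level features with pandas; treating "total length" as raw character length including whitespace and "character count" as excluding it is an assumption, since the original does not spell out the difference:

```python
import pandas as pd

def add_comment_features(df: pd.DataFrame) -> pd.DataFrame:
    text = df["comment_text"]
    df["total_length"] = text.str.len()
    # Assumption: "character count" excludes whitespace, "total length" does not
    df["char_count"] = text.str.replace(r"\s", "", regex=True).str.len()
    df["word_count"] = text.str.split().str.len()
    df["word_density"] = df["word_count"] / df["char_count"]
    df["capitals"] = text.apply(lambda s: sum(c.isupper() for c in s))
    df["prop_capitals"] = df["capitals"] / df["total_length"]
    df["num_exclamation"] = text.str.count("!")
    df["num_unique_words"] = text.apply(lambda s: len(set(s.split())))
    df["prop_unique_words"] = df["num_unique_words"] / df["word_count"]
    return df
```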
18. Revised Approach: Model Results
1st-level Binary Classification

| Model | Accuracy | Precision | Recall | F1-Score |
|-------|----------|-----------|--------|----------|
| BoW + Naïve Bayes | 87.53% | 87.76% | 87.53% | 87.59% |
| TF-IDF + SVM | 88.74% | 88.70% | 88.74% | 88.70% |
| Binary + Logistic Regression | 88.35% | 88.31% | 88.35% | 88.30% |
| TF-IDF + NN | 88.86% | 88.81% | 88.86% | 88.81% |
| TF-IDF + CNN | 88.06% | 88.01% | 88.06% | 88.02% |
| Embeddings + CNN | 89.20% | 89.20% | 89.10% | 89.10% |

Best performing model: Embeddings + CNN
Model selected: TF-IDF + SVM
Reasons:
• Less complex model, at a cost of < 0.5% reduction in F1-score
• Simpler features (TF-IDF vs. embeddings)
• Greater interpretability: feature importance can easily be extracted from an SVM, unlike a NN/CNN (see the sketch at the end of this slide)
Comparing the initial and final approaches (Level 1: binary label):
• Reduction in false negatives (predicted as Clean when actually Toxic)
• Improvement in true positives (predicted as Toxic when actually Toxic)

Initial approach: Embeddings + Logistic Regression

| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 9203 | 8 | 9211 |
| 1 | 777 | 204 | 981 |

Final approach: TF-IDF + SVM

| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 6173 | 556 | 6729 |
| 1 | 680 | 3565 | 4245 |

(Note that the final approach is evaluated on the down-sampled data, so the class proportions in the two matrices differ.)
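A minimal sketch of the selected level-1 model, including the coefficient-based feature importance cited in the reasons above; the vectorizer settings and the variable names train_texts and y_train_toxic are illustrative, not the tuned configuration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

vectorizer = TfidfVectorizer(max_features=50_000)
X_train = vectorizer.fit_transform(train_texts)   # pre-processed comments
clf = LinearSVC()
clf.fit(X_train, y_train_toxic)                   # 1 = toxic, 0 = clean

# Interpretability: the largest positive coefficients mark the terms
# that push a comment most strongly toward the Toxic class.
feature_names = np.array(vectorizer.get_feature_names_out())
top_terms = feature_names[np.argsort(clf.coef_[0])[-10:]]
print("Most toxic-indicative terms:", top_terms)
```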
19. Revised Approach: Model Results
2nd-level Multi-label Classification

| Model | Average Accuracy | Micro-Precision | Micro-Recall | Micro-F1-Score |
|-------|------------------|-----------------|--------------|----------------|
| BoW + Naïve Bayes | 82.76% | 63.15% | 72.14% | 67.35% |
| TF-IDF + SVM | 87.11% | 78.21% | 66.13% | 71.66% |
| Binary + Logistic Regression | 87.33% | 77.29% | 68.79% | 72.79% |
| TF-IDF + NN | 85.38% | 74.53% | 61.78% | 67.56% |
| TF-IDF + CNN | 86.21% | 73.99% | 67.95% | 70.84% |
| Embeddings + CNN | 86.90% | 72.70% | 74.70% | 73.40% |

Best performing model: Embeddings + CNN
Model selected: Binary + Logistic Regression
Reasons:
• Less complex model, at a cost of < 1% reduction in micro-F1-score
• Simpler features (binary word representation vs. embeddings)
• Greater interpretability: feature importance can easily be extracted from logistic regression.
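A minimal sketch of the selected level-2 model: binary word presence via CountVectorizer(binary=True), wrapped in one-vs-rest logistic regression; the settings and the variable names toxic_texts and y_toxic are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

SUB_LABELS = ["severe_toxic", "threat", "obscene", "insult", "identity_hate"]

vectorizer = CountVectorizer(binary=True, max_features=50_000)  # 1 if word present
X_toxic = vectorizer.fit_transform(toxic_texts)   # comments passed on by level 1
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_toxic, y_toxic[SUB_LABELS])             # one binary model per label
```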
20. Revised Approach: Model Results
2nd-level Multi-label Classification

Comments:
• Multi-label classification produced low precision and recall for certain labels because those labels have significantly fewer records.
• For these 3 labels, false negatives are high, i.e. comments are predicted as Clean when they are actually Severe Toxic / Threat / Identity Hate.

Per-label results (class 1):

| Label | Precision | Recall | F1-score | No. of Records |
|-------|-----------|--------|----------|----------------|
| Severe Toxic | 0.59 | 0.23 | 0.33 | 396 |
| Threat | 0.59 | 0.35 | 0.44 | 117 |
| Identity Hate | 0.67 | 0.44 | 0.53 | 383 |
| Insult | 0.75 | 0.72 | 0.73 | 2118 |
| Obscene | 0.83 | 0.80 | 0.81 | 2256 |

Severe Toxic
| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 3818 | 63 | 3881 |
| 1 | 306 | 90 | 396 |

Threat
| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 4131 | 29 | 4160 |
| 1 | 76 | 41 | 117 |

Identity Hate
| Actual \ Predicted | 0 | 1 | Total |
|--------------------|---|---|-------|
| 0 | 3811 | 83 | 3894 |
| 1 | 213 | 170 | 383 |
26. Revised Approach: Examples

| Comment | True Labels | Predicted Labels |
|---------|-------------|------------------|
| Hello, you disgusting transsexual. You are a grotesque blight upon the earth. Please kill yourself. | Toxic, Obscene, Threat, Identity Hate, Insult | Toxic, Obscene, Threat, Identity Hate, Insult |
| Wtf, Why You deleting Maashel? Bad things will happen to you now. | Toxic, Threat | Toxic |
| am going to shoot you in the head and laugh as your brains splatter onto the ground. | Toxic, Severe Toxic, Threat | Toxic, Threat |