Do Characters Abuse More Than Words?
By: Yashar Mehdad & Joel Tetreault
(Yahoo! Research)
Outline
1. Introduction
2. Related Work
3. Methodology
4. Evaluation
5. Conclusion
6. Review
1. INTRODUCTION
Introduction
• Rise of online communities
• Connecting people across different communities
• Ease of communication
“Hurl insults, bully & threaten through the use of profanity & hate speech”
Abusive language
“Kill all Jews, they are a headache”
• Straightforward methods can handle this: blacklists & regular expressions
“Kill yrself a$$hole”
• Tokenization & normalization issues
• More complex methods needed
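To make the normalization issue concrete, here is a minimal sketch (not from the original work) comparing how the slide's "yrself" relates to "yourself" as a whole token versus as a set of character trigrams. The trigram size n = 3 and the Jaccard similarity are illustrative choices.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of `text` (spaces count as characters)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


canonical, obfuscated = "yourself", "yrself"

# As whole tokens, the two spellings share nothing.
token_overlap = {canonical} & {obfuscated}

# As character trigrams, they share everything drawn from "rself".
char_overlap = char_ngrams(canonical) & char_ngrams(obfuscated)

print(token_overlap)         # set()
print(sorted(char_overlap))  # ['elf', 'rse', 'sel']
print(round(jaccard(char_ngrams(canonical), char_ngrams(obfuscated)), 3))  # 0.429
```

A token-level feature for "yrself" learns nothing from training comments that contain "yourself", whereas three of its four trigrams are shared.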
2. RELATED WORK
2. Related Work
• Profanity detection: Sood et al. 2012
• Hate speech detection: Warner & Hirschberg 2012
• Cyberbullying: Dadvar et al. 2013
• General abusive language detection: Chen et al. 2012; Djuric et al. 2015
• Most prior work focused on supervised classification with canonical NLP features:
  - Token n-gram features
  - Hand-crafted regular expression & blacklist features
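A minimal sketch of such canonical features, assuming lower-cased whitespace tokenization. The blacklist entries, the regular expression, and the feature names are hypothetical, not those of any cited system. Note how the obfuscated "id!ot" slips past the blacklist and is caught only by the hand-crafted pattern.

```python
import re
from collections import Counter

# Hypothetical blacklist and obfuscation pattern, for illustration only.
BLACKLIST = {"idiot", "moron"}
# A run of letters interrupted by symbols, e.g. "id!ot".
OBFUSCATED = re.compile(r"[a-z]+[$@!*]+[a-z]+")


def token_ngrams(tokens, n_max=2):
    """Count token n-grams for n = 1..n_max."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats["tok:" + " ".join(tokens[i:i + n])] += 1
    return feats


def canonical_features(comment, n_max=2):
    """Token n-grams plus blacklist and regular-expression match counts."""
    tokens = comment.lower().split()
    feats = token_ngrams(tokens, n_max)
    feats["blacklist_hits"] = sum(t in BLACKLIST for t in tokens)
    feats["obfuscated_tokens"] = sum(bool(OBFUSCATED.fullmatch(t)) for t in tokens)
    return feats


print(canonical_features("you idiot")["blacklist_hits"])     # 1
print(canonical_features("you id!ot")["blacklist_hits"])     # 0: the blacklist misses the variant
print(canonical_features("you id!ot")["obfuscated_tokens"])  # 1: the pattern catches it
```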
• Features that model the user’s past behavior: Dadvar et al. 2013
• Semi-supervised LDA approach: Xiang et al. 2012
• Paragraph2vec approach: Djuric et al. 2015
• Combination of many feature types: Nobata et al. 2016
3. METHODOLOGY
3. Methodology
• Supervised classification with lexical & morphological features that measure various aspects of user comments
• Hybrid method based on discriminative & generative classifiers
• Binary classification task (abusive or not)
• Feature classes:
  - Tokens
  - Characters
  - Distributional semantics
• Methods:
  - Distributional Representation of Comments (C2V)
  - Recurrent Neural Network Language Model (RNNLM)
  - Support Vector Machine with Naïve Bayes features (NBSVM)
3.1. Distributional Representation of Comments (C2V)
• Models lexical semantics with a vector space model (Mikolov et al. 2013)
• Represents each comment as a vector (comment embedding)
• A skip-bigram model trains the embeddings of the words in the comments
C2V
[Figure: paragraph vector model (Mikolov et al. 2014)]
C2V
• Window sizes 5 & 10
• Low-dimensional models (100 & 300 dimensions): d300w10, d300w5, d100w10, d100w5 (d = dimensions, w = window size)
• 10 iterations
• Logistic regression classifier (multi-core LIBLINEAR library)
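The window size controls which word pairs the skip-gram objective sees. A minimal sketch, assuming whitespace tokenization, that enumerates the (center, context) pairs for one comment and reproduces the configuration names above. It shows only pair generation, not the embedding training or how the comment vector itself is learned; the function name is hypothetical.

```python
def skipgram_pairs(tokens, window):
    """(center, context) training pairs: every word within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


# Configuration names: d = embedding dimensions, w = window size.
CONFIGS = [f"d{d}w{w}" for d in (300, 100) for w in (10, 5)]

comment = "go away you troll".split()
print(len(skipgram_pairs(comment, window=1)))  # 6
print(len(skipgram_pairs(comment, window=5)))  # 12: the window covers the whole comment
print(CONFIGS)  # ['d300w10', 'd300w5', 'd100w10', 'd100w5']
```

For short comments, a window of 5 or 10 often spans the whole comment, so the two settings mainly differ on longer texts.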
3.2. Recurrent Neural Network Language Model (RNNLM)
• RNNs can potentially represent more advanced patterns
• A separate model is trained for each class (abusive & clean)
• Token n-grams, where n = 1, 2, 3, 4, 5
• Character n-grams, where n = 1, 2, 3, 4, 5
• The space is treated as a character, to investigate the character vs. word claim
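Treating the space as a character means the character models see word boundaries as ordinary symbols. A minimal sketch of preparing word-level and character-level input, assuming a language-model toolkit that reads one sequence per line with whitespace-separated symbols; the `<sp>` symbol and the function names are hypothetical.

```python
SPACE = "<sp>"  # hypothetical symbol standing in for the space character


def to_token_sequence(comment):
    """Word-level input: lower-cased whitespace tokens."""
    return comment.lower().split()


def to_char_sequence(comment):
    """Character-level input: one symbol per character, with spaces kept as SPACE."""
    normalized = " ".join(comment.lower().split())  # collapse runs of whitespace
    return [SPACE if ch == " " else ch for ch in normalized]


def to_training_line(symbols):
    """One comment per line, symbols separated by single spaces."""
    return " ".join(symbols)


print(to_training_line(to_token_sequence("Go  away")))  # go away
print(to_training_line(to_char_sequence("Go  away")))   # g o <sp> a w a y
```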
RNNLM
• Testing: estimate the ratio of the probabilities of the comment belonging to each class via Bayes' rule
• If the probability of a comment under the abusive language model is higher than under the non-abusive language model, the comment is classified as abusive
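The decision rule can be sketched as below. To keep the example small and runnable, it swaps the RNNLM for an add-one-smoothed character bigram model (a different, much weaker model than the one in the slides). The class and function names, the toy comments, and the optional `log_prior_ratio` parameter are hypothetical. With `log_prior_ratio = 0` the rule reduces to comparing the comment's probability under the abusive and clean models.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


class CharBigramLM:
    """Add-one-smoothed character bigram LM: a simple stand-in for an RNNLM."""

    def __init__(self, texts, vocab_size):
        self.vocab_size = vocab_size  # number of distinct symbols the model can predict
        self.bigrams = Counter()
        self.context = Counter()      # how often each symbol appears as the previous symbol
        for text in texts:
            symbols = [BOS, *text, EOS]
            for prev, cur in zip(symbols, symbols[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1

    def log_prob(self, text):
        symbols = [BOS, *text, EOS]
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.context[p] + self.vocab_size))
            for p, c in zip(symbols, symbols[1:])
        )


def log_posterior_ratio(comment, abusive_lm, clean_lm, log_prior_ratio=0.0):
    """log P(abusive | c) - log P(clean | c), by Bayes' rule."""
    return abusive_lm.log_prob(comment) - clean_lm.log_prob(comment) + log_prior_ratio


def classify(comment, abusive_lm, clean_lm, log_prior_ratio=0.0):
    score = log_posterior_ratio(comment, abusive_lm, clean_lm, log_prior_ratio)
    return "abusive" if score > 0 else "clean"


# Toy training data, invented for illustration only.
abusive = ["you id!ot", "go away troll", "shut up id!ot"]
clean = ["thanks for sharing", "great article", "i agree with you"]

# Shared output vocabulary: every training character, plus EOS and one slot for unseen characters.
V = len(set("".join(abusive + clean))) + 2
abusive_lm, clean_lm = CharBigramLM(abusive, V), CharBigramLM(clean, V)

print(classify("what an id!ot", abusive_lm, clean_lm))           # abusive
print(classify("thanks for the article", abusive_lm, clean_lm))  # clean
```

Both models share one vocabulary size so that their probabilities are comparable; the log ratio itself is the score that can be ranked for AUC.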
RNNLM
• The ratios are used to calculate the AUC metric
• RNNLM toolkit (Mikolov 2011)
• One word model (“word”)
• Two character models (“char1” & “char2”)
• BPTT: number of steps to propagate the error back in time

                     word   char1   char2
Hidden layer size      50      50     200
BPTT steps              4       4      10
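Since the log ratios are continuous scores, AUC can be computed from them directly, without choosing a threshold. A minimal sketch using the pairwise (Mann-Whitney) formulation; the label encoding 1 = abusive, 0 = clean and the toy scores are assumptions, and the quadratic loop is chosen for clarity rather than speed (a sort-based version runs in O(n log n)).

```python
def auc(scores, labels):
    """ROC AUC as the probability that a random abusive comment (label 1) outscores
    a random clean one (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [3.2, 0.4, -1.0, -2.5, 0.1]  # toy log-probability ratios, invented
labels = [1, 0, 1, 0, 0]              # 1 = abusive, 0 = clean
print(round(auc(scores, labels), 3))  # 0.667
```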
3.3. Support Vector Machine with Naïve Bayes Features (NBSVM)
• SVM + NB
• Compute the log-ratio vector between the average character n-gram counts of abusive & non-abusive comments
• Input to the SVM: the log-ratio vector multiplied element-wise by the comment's binary pattern (presence or absence of each character n-gram)
• Multi-core LIBLINEAR library
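A minimal sketch of the NB feature scaling, following the usual NBSVM recipe: sum the binarized n-gram vectors of each class, add a smoothing constant alpha, L1-normalize, and take the log ratio. L1 normalization stands in for the per-comment average here; in log space the two differ only by a constant offset shared by all features. The n-gram orders (1 to 3), alpha = 1, the sparse-dict representation, and the toy comments are assumptions. Training the linear SVM on the scaled vectors (done with LIBLINEAR in the slides) is not shown.

```python
import math
from collections import Counter


def char_ngram_set(text, n_values=(1, 2, 3)):
    """Binarized features: the set of character n-grams present in `text`."""
    return {text[i:i + n] for n in n_values for i in range(len(text) - n + 1)}


def log_count_ratio(abusive_docs, clean_docs, alpha=1.0):
    """r = log((p / ||p||_1) / (q / ||q||_1)), where p and q are alpha-smoothed
    sums of binarized n-gram vectors over abusive and clean comments."""
    p, q = Counter(), Counter()
    for doc in abusive_docs:
        p.update(char_ngram_set(doc))
    for doc in clean_docs:
        q.update(char_ngram_set(doc))
    vocab = set(p) | set(q)
    p_norm = sum(p[f] + alpha for f in vocab)
    q_norm = sum(q[f] + alpha for f in vocab)
    return {f: math.log(((p[f] + alpha) / p_norm) / ((q[f] + alpha) / q_norm)) for f in vocab}


def nb_scaled_features(text, r):
    """SVM input as a sparse dict: r times the comment's binary n-gram vector.
    N-grams never seen in training are dropped (they have no weight in r)."""
    return {f: r[f] for f in char_ngram_set(text) if f in r}


abusive = ["you id!ot", "shut up id!ot"]  # toy comments, invented for illustration
clean = ["great article", "i agree"]
r = log_count_ratio(abusive, clean)
x = nb_scaled_features("what an id!ot", r)
print(r["!"] > 0, r["gr"] < 0)  # True True
print("wh" in x)                # False: "wh" never occurs in training
```

Positive weights mark n-grams that lean abusive, negative ones lean clean, so the SVM starts from features that already carry the Naïve Bayes evidence.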
4. EVALUATION
4. Evaluation
• Datasets: those used in Djuric et al. 2015 & Nobata et al. 2016
• Labels: from a combination of in-house raters, users reactively flagging bad comments, & abusive language pattern detectors
• 5-fold cross-validation; AUC reported
• Recall, precision & F1 score
• Comparison system: a token n-gram classifier using logistic regression
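A minimal sketch of the evaluation bookkeeping: a shuffled 5-fold split and precision, recall, and F1 for the abusive class. The plain (non-stratified) split, the seed, and the label encoding 1 = abusive are assumptions; the slides do not say how folds were drawn. AUC on each test fold would be computed from the classifier's scores.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists for k folds over a shuffled range(n_items)."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the `positive` class (1 = abusive here)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


for fold, (train, test) in enumerate(k_fold_indices(12, k=5)):
    print(fold, len(train), len(test))  # every item lands in exactly one test fold

print(tuple(round(v, 3) for v in precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])))
# (0.667, 0.667, 0.667)
```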
Results
5. CONCLUSION
5. Conclusion
• Character-based approaches fared best on comments with irregular normalization or obfuscated words
• The work showed the superiority of simple character-based approaches over the previous state of the art, as well as over token-based approaches & two deep learning approaches
THANK YOU
Editor's Notes

  1. The rise of online communities over the last ten years, in various forms such as message boards, Twitter, and discussion forums, has allowed people from disparate backgrounds to connect in a way that would not have been possible before. However, the ease of communication online has made it possible for both anonymous and non-anonymous posters to hurl insults, bully, and threaten through the use of profanity and hate speech.
  2. Straightforward methods can handle abusive language like …………………….. They are conscious bastardizations of words in an effort to evade blacklists, so characters often play an important role in the language of these comments.
  3. In this model, every paragraph is mapped to a unique vector, represented by a column in a matrix, and every word is also mapped to a unique vector, represented by a column in another matrix. The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context.
  4. After being trained, the paragraph vectors can be used as features for the paragraph, and these features can be fed directly to conventional machine learning techniques such as logistic regression, SVMs, or k-means. Advantages: they can work for tasks that do not have enough labeled data, and they take word order into consideration, at least in a small context, in the same way that an n-gram model with a large n would.
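The averaging variant described in note 3 can be sketched as a single forward step: average the comment's vector with its context word vectors, then turn dot products with per-word output vectors into a softmax distribution over the next word. This is a minimal, untrained illustration with hypothetical names and hand-picked 2-dimensional vectors; real implementations learn all of these vectors by gradient descent and typically replace the full softmax with hierarchical softmax or negative sampling.

```python
import math


def pv_dm_next_word(doc_vec, context_vecs, output_vecs):
    """Return P(next word | comment, context) for every word in output_vecs."""
    inputs = [doc_vec, *context_vecs]
    # Average the comment vector and the context word vectors dimension by dimension.
    hidden = [sum(v[d] for v in inputs) / len(inputs) for d in range(len(doc_vec))]
    logits = {w: sum(h * u for h, u in zip(hidden, vec)) for w, vec in output_vecs.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {w: math.exp(z - top) for w, z in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


probs = pv_dm_next_word(
    doc_vec=[1.0, 0.0],
    context_vecs=[[0.0, 1.0]],
    output_vecs={"troll": [1.0, 1.0], "article": [0.0, 0.0]},
)
print(round(probs["troll"], 3))  # 0.731
```

After training, the learned comment vector (here `doc_vec`) is the feature vector that would be passed to a classifier such as logistic regression.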