EVALUATING SEMANTIC FEATURE REPRESENTATIONS TO
EFFICIENTLY DETECT HATE INTENT ON SOCIAL MEDIA
Yasas Senarath (@wayasas), Hemant Purohit (@hemant_pt)
2020 IEEE International Conference on Semantic Computing
(IEEE-ICSC ’20)
San Diego, California
Feb 04, 2020
Outline
¨ Introduction
¤ Social Media
¤ Malicious Intent
¤ Background
¨ Problem and Contribution
¤ Data and Task
¤ Contributions
¨ Methodology
¤ Hybrid Feature Representation Framework
¤ Features
¨ Results and Discussion
¨ Conclusion
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent on Social Media, IEEE-ICSC ‘20
Motivation: Diverse Intent behind Social Media Sharing
Social Media is an integral part of many of our daily lives!
Motivation: Malicious Intent on Social Media
¨ Social Media
¤ Malicious intent has become highly prevalent in recent years
¨ Challenges
¤ Distinguishing intent: hate speech vs. sarcasm vs. angry rant
¤ Inefficiency in formalizing and representing the context
Example tweets and their labels:
“members of nontraditional religions r all subhuman trash” • Hateful
“you sure u ain’t colored?” • Hateful
“such a sucker 4some Oreos.” • Normal
Image source:
https://www.deviantart.com/ryujin2490/art/ANGRY-TWITTER-BIRD-252230315
Background
¨ Levels of Hate Speech
¤ Presence of Hate Speech [2, 6]
¤ Type of Hate Speech: offensive, abusive, hateful speech, aggressive,
and cyberbullying [2, 4]
¨ Classifiers [1]
¤ Naive Bayes
¤ Logistic Regression [3, 8]
¤ Random Forest [3, 7]
¤ Support Vector Machine* [6-9]
¤ Deep Learning [5]
Background: Features [1]
¨ Surface-level features
¤ Bag-of-Words, TFIDF
¨ Lexical Resources
¤ Hate Speech Lexicons
¤ Sentiment Lexicons
¨ Linguistic Features
¤ POS tags
¨ Knowledge-based Features
¤ ConceptNet (with custom rules)
¨ Meta-information
¤ User-related information
¨ Transfer Learning
¤ Sentiment Analysis
Task and Data
¨ Task:
¤ Given a social media post, detect whether it has hateful intent
¨ Datasets:
¤ DWMW17 [3]
▪ ~25k tweets
▪ collected by querying for words in Hatebase
▪ labels: Hate, Offensive, and Neither
¤ FDCL18 [4]
▪ ~60k tweets
▪ randomly sampled from the Twitter stream
▪ labels: Normal, Spam, Abusive, Hateful
[Figure: Label Distribution (0% to 100%) for DWMW17 (24,783 tweets) and FDCL18 (60,227 tweets); legend: Normal/Spam/Neither vs. Hate/Offensive/Abusive]
#BendersRule
English language tweets
Contribution
¨ Proposed a set of diverse features capturing a variety of data
semantics for learning a hate speech classification model
¨ Validated the significance of proposed features on each dataset
¨ Evaluated prediction performance on each dataset based on models
trained on the other (cross-prediction performance)
Methodology: Pipeline
¨ Classical Data Mining Pipeline
¨ Preprocess
¤ Normalize (Usernames and
URLs)
¤ Tokenization
¨ Features*
¨ Classifier
¤ Linear SVM
[Diagram: Tweet Text → Preprocess → Feature Extractor → Classifier → Label (Hate Speech / Normal)]
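The preprocessing step of this pipeline could look roughly like the sketch below. The slides only state that usernames and URLs are normalized and the text is tokenized; the placeholder tokens `<user>` and `<url>`, the lowercasing, and the regular expressions are assumptions made for illustration.

```python
import re

# Assumed placeholders; the slides do not specify the replacement tokens.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
# Keep placeholders whole, keep hashtags and contractions, split punctuation.
TOKEN_RE = re.compile(r"<url>|<user>|#?\w+(?:'\w+)?|[^\w\s]")


def normalize(text):
    """Replace URLs and @mentions with placeholder tokens, then lowercase."""
    text = URL_RE.sub("<url>", text)
    text = USER_RE.sub("<user>", text)
    return text.lower()


def tokenize(text):
    """Normalize a tweet and split it into tokens."""
    return TOKEN_RE.findall(normalize(text))


print(tokenize("@alice check https://example.com NOW!!"))
# ['<user>', 'check', '<url>', 'now', '!', '!']
```

Collapsing every mention and link to a single token keeps the vocabulary small, so rare handles and URLs cannot act as spurious features.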
Methodology: Feature Extractor
¨ Corpus-based semantic features
¤ TFIDF
¤ N-gram for N=[1, 2, 3]
¨ Distributional semantics-based
features
¤ Average of word embeddings
¨ Declarative knowledge-based
semantic features
¤ Hatebase
¤ FrameNet
[Diagram: Tweet Text → Preprocess → Feature Extractor (Corpus Based Features, Distributional Semantic Features, Knowledge Based Features) → Classifier → Label (Hate Speech / Normal)]
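The corpus-based features (TFIDF over n-grams with N = 1, 2, 3) can be sketched in plain Python as follows. The smoothed IDF formula and the L2 normalization are assumptions borrowed from common practice, not details given in the slides; function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n_values=(1, 2, 3)):
    """Return all word n-grams of the requested sizes as space-joined strings."""
    out = []
    for n in n_values:
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def fit_idf(docs):
    """Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(ngrams(tokens)))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens, idf):
    """Sparse, L2-normalized TF-IDF vector; n-grams unseen at fit time are dropped."""
    tf = Counter(g for g in ngrams(tokens) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


docs = [["you", "are", "great"], ["you", "are", "awful"]]
idf = fit_idf(docs)
vec = tfidf(docs[0], idf)
print(sorted(vec, key=vec.get, reverse=True)[:3])
```

N-grams that occur in only one document (such as "great") receive a higher weight than those shared by both ("you"), which is the property that makes these features discriminative.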
Methodology: Hatebase
¨ Let f_j be a function mapping a word to a feature vector based on one or more parameters in our KB
$$F_{KB}(\text{Tweet}) = \frac{1}{n} \sum_{i=1}^{n} f_j(w_i)$$
where w_1, ..., w_n are the words of the tweet and f_j is the word-level feature function.
[Diagram: Tweet + Knowledge Base → F_KB(Tweet); within the Feature Extractor, the Knowledge Based Features branch comprises Hatebase and FrameNet]
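The tweet-level knowledge-base feature is the average of a word-level function over the tweet's words. A minimal sketch follows; the toy knowledge base, the function `f_unambiguous`, and the choice to map out-of-KB words to a zero vector are all hypothetical.

```python
def kb_features(tokens, f_j, dim):
    """F_KB(tweet) = (1/n) * sum_i f_j(w_i) for a word-level function f_j."""
    total = [0.0] * dim
    for word in tokens:
        for k, value in enumerate(f_j(word)):
            total[k] += value
    n = len(tokens)
    return [t / n for t in total] if n else total


# Hypothetical toy knowledge base: term -> "unambiguous" flag.
TOY_KB = {"slur_a": True, "slur_b": False}


def f_unambiguous(word):
    """Assumed word-level function: 1-D vector, zero for words outside the KB."""
    return [1.0 if TOY_KB.get(word, False) else 0.0]


print(kb_features(["you", "slur_a", "slur_b", "slur_a"], f_unambiguous, dim=1))
# [0.5]
```

Dividing by the tweet length n makes the feature comparable across short and long tweets.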
Methodology: FKB(Tweet) | KB = Hatebase
Each Hatebase attribute below defines one word-level function f_j.
¨ Offensiveness
¤ Discretized value
¤ Bins chosen with the Freedman-Diaconis estimator
¨ Unambiguous
¤ 1-D vector with a Boolean value
¨ Hateful Meaning
¤ Bag-of-words vector of the hateful definition
¨ Non-hateful Meaning
¤ Bag-of-words vector of the non-hateful definition
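For the discretized offensiveness value, the Freedman-Diaconis rule sets the bin width to h = 2 · IQR / n^(1/3). The sketch below derives equal-width bin edges from that rule and one-hot encodes a score. The one-hot encoding and the sample scores are assumptions; the slides name only the estimator.

```python
import math
import statistics


def fd_bin_edges(values):
    """Equal-width bin edges using the Freedman-Diaconis width 2*IQR/n^(1/3).

    Needs at least two values (a requirement of statistics.quantiles).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    width = 2.0 * (q3 - q1) / len(values) ** (1.0 / 3.0)
    lo, hi = min(values), max(values)
    if width <= 0:
        return [lo, hi]  # degenerate spread: fall back to a single bin
    n_bins = max(1, math.ceil((hi - lo) / width))
    step = (hi - lo) / n_bins
    return [lo + k * step for k in range(n_bins + 1)]


def one_hot_bin(value, edges):
    """One-hot vector marking the bin of `value`; the last bin includes its upper edge."""
    n_bins = len(edges) - 1
    index = n_bins - 1
    for k in range(n_bins):
        if value < edges[k + 1]:
            index = k
            break
    vec = [0.0] * n_bins
    vec[index] = 1.0
    return vec


# Hypothetical offensiveness scores for illustration only.
scores = [3.0, 12.0, 25.0, 31.0, 47.0, 58.0, 64.0, 90.0]
edges = fd_bin_edges(scores)
print(edges, one_hot_bin(47.0, edges))
```

Because the width depends on the interquartile range rather than the full range, a few extreme scores do not inflate the bins.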
Methodology: FrameNet Features
[Diagram: Tweet → SLING → Frames (PropBank) → Mapping → Frames (FrameNet) → Bag of Frames Features; these feed the Knowledge Based Features branch (Hatebase, FrameNet) of the Feature Extractor]
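The last two steps of this flow (mapping PropBank frames to FrameNet frames, then counting them) might be sketched as below. The parser output is taken as given, and the mapping table `PB_TO_FN` is a small hypothetical stand-in for whatever mapping resource the authors used.

```python
from collections import Counter

# Hypothetical PropBank roleset -> FrameNet frame mapping (illustrative entries).
PB_TO_FN = {
    "kill.01": "Killing",
    "hate.01": "Experiencer_focus",
    "say.01": "Statement",
}
# Fixed feature order so every tweet yields a vector of the same length.
FRAME_VOCAB = sorted(set(PB_TO_FN.values()))


def bag_of_frames(propbank_frames):
    """Count FrameNet frames for a tweet; unmapped PropBank frames are skipped."""
    counts = Counter(PB_TO_FN[f] for f in propbank_frames if f in PB_TO_FN)
    return [counts[frame] for frame in FRAME_VOCAB]


# Assume the parser produced these PropBank frames for one tweet.
print(FRAME_VOCAB)
print(bag_of_frames(["hate.01", "say.01", "hate.01", "run.02"]))
# ['Experiencer_focus', 'Killing', 'Statement']
# [2, 0, 1]
```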
Results
¨ Five-fold cross validation performance
*baseline
Model   Features
M1*     TFIDF
M2      + Offensiveness (Hatebase feature)
M3      + Unambiguous (Hatebase feature)
M4      + Hateful Meaning (Hatebase feature)
M5      + Non-Hateful Meaning (Hatebase feature)
M6      + FrameNet Features
M7      + Mean Embedding
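Five-fold cross validation partitions the data into five folds and tests on each fold once while training on the other four. The sketch below uses a stratified split so each fold keeps roughly the same label ratio; stratification and the fixed seed are assumptions, since the slides state only that five folds were used.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with similar label ratios per fold."""
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal each class round-robin across the folds.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


labels = [0] * 10 + [1] * 5
splits = list(stratified_kfold(labels))
for train, test in splits:
    print(len(train), len(test))
```

With imbalanced labels, as in both datasets here, a plain random split can leave a fold with very few positive examples, which is why the stratified variant is shown.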
Cross-Prediction Performance
[Figure: F1 score (0 to 100) of models M1 and M7 for each Train/Test dataset pair: DWMW17/FDCL18 and FDCL18/DWMW17]
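Cross-prediction requires both datasets to share one label space before a model trained on one is scored on the other. The sketch below collapses the labels to binary and computes F1 for the positive class. The grouping follows the Label Distribution legend (Normal/Spam/Neither vs. Hate/Offensive/Abusive); treating FDCL18's "Hateful" as positive and reporting positive-class F1 are assumptions.

```python
# Assumed positive-class labels across DWMW17 and FDCL18.
HATEFUL_LABELS = {"hate", "hateful", "offensive", "abusive"}


def to_binary(label):
    """Map a dataset-specific label to 1 (hateful intent) or 0 (normal)."""
    return 1 if label.lower() in HATEFUL_LABELS else 0


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [to_binary(l) for l in ["Hate", "Offensive", "Neither", "Spam"]]
y_pred = [1, 0, 1, 0]  # hypothetical predictions from a model trained on the other dataset
print(f1_score(y_true, y_pred))
# 0.5
```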
Discussion
¨ TFIDF features are highly predictive
¤ However, they do not help the models generalize
¨ Knowledge base features enhance precision
¨ A larger word-embedding vocabulary improves performance
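One plausible reading of the vocabulary effect: with averaged word embeddings, tokens missing from the embedding vocabulary contribute nothing, so a larger vocabulary lets more of each tweet shape the feature vector. A minimal sketch with hypothetical 2-D toy vectors; skipping out-of-vocabulary tokens is an assumption.

```python
def mean_embedding(tokens, vectors, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def coverage(tokens, vectors):
    """Fraction of tokens that have an embedding."""
    return sum(t in vectors for t in tokens) / len(tokens) if tokens else 0.0


# Hypothetical toy embeddings for illustration only.
TOY_VECTORS = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}
tokens = ["good", "bad", "zzz"]
print(mean_embedding(tokens, TOY_VECTORS, dim=2))  # [0.5, 0.5]
print(coverage(tokens, TOY_VECTORS))
```

Tweets are full of slang and creative spellings, so coverage can be well below 1 for embeddings trained on edited text.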
Conclusion
¨ Limitations and Future Work
¤ Polysemy: words with multiple meanings can hinder correct interpretation of the text
¤ Multilingual social media posts
¨ Conclusions
¤ Novel empirical study of diverse semantic feature representations
for hate speech detection on social media
¤ Absolute gain in F1 score up to 3.0% for the models with hybrid
feature representation
References
[1] A. Schmidt and M. Wiegand, “A survey on hate speech detection using natural language processing,” in Proc.
of the Fifth Int’l Workshop on Natural Language Processing for Social Media, 2017, pp. 1–10.
[2] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, and R. Kumar, “Semeval-2019 task 6: Identifying
and categorizing offensive language in social media (offenseval),” in SemEval, 2019, pp. 75–86.
[3] T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated hate speech detection and the problem of
offensive language,” in ICWSM, 2017.
[4] A. M. Founta, C. Djouvas, D. Chatzakou, I. Leontiadis, J. Blackburn, G. Stringhini, A. Vakali, M. Sirivianos, and
N. Kourtellis, “Large scale crowdsourcing and characterization of twitter abusive behavior,” in ICWSM, 2018.
[5] K. Dinakar, B. Jones, C. Havasi, H. Lieberman, and R. Picard, “Common sense reasoning for detection, prevention, and
mitigation of cyberbullying,” ACM Trans. on Interactive Intelligent Systems, vol. 2, no. 3, p. 18, 2012.
[6] P. Burnap and M. L. Williams, “Cyber hate speech on twitter: An application of machine classification and
statistical modeling for policy and decision making,” Policy & Internet, vol. 7, no. 2, pp. 223–242, 2015.
[7] Y. Chen, Y. Zhou, S. Zhu, and H. Xu, “Detecting offensive language in social media to protect adolescent
online safety,” in PASSAT-SOCIALCOM. IEEE, 2012, pp. 71–80.
[8] Y. Mehdad and J. Tetreault, “Do characters abuse more than words?” in Proc. of the 17th Annual Meeting of
the Special Interest Group on Discourse and Dialogue, 2016, pp. 299–303.
[9] G. Xiang, B. Fan, L. Wang, J. Hong, and C. Rose, “Detecting offensive tweets via topical feature discovery
over a large scale twitter corpus,” in CIKM. ACM, 2012, pp. 1980–1984.
Acknowledgement Resources
Questions?
Thank you
IIS #657379
@wayasas
ywijesu@gmu.edu
More Questions?
https://git.gmu.edu/ysenarath/public/hate-intent-detection

More Related Content

Similar to Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent on Social Media

Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
dbpublications
 
Mining and Comparing Engagement Dynamics Across Multiple Social Media Platfor...
Mining and Comparing Engagement Dynamics Across Multiple Social Media Platfor...Mining and Comparing Engagement Dynamics Across Multiple Social Media Platfor...
Mining and Comparing Engagement Dynamics Across Multiple Social Media Platfor...
The Open University
 
Profiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic WebProfiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic Web
Fabrizio Orlandi
 
Dev8D Presentation Pascal Belouin
Dev8D Presentation Pascal BelouinDev8D Presentation Pascal Belouin
Dev8D Presentation Pascal Belouin
Pascal Belouin
 
Dev8 D Presentation
Dev8 D PresentationDev8 D Presentation
Dev8 D Presentation
Pascal Belouin
 
Proposal.docx
Proposal.docxProposal.docx
Proposal.docx
AmitabhSrivastava23
 
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Amit Sheth
 
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Symeon Papadopoulos
 
Doctoral seminar (DBIS RWTH Aachen)
Doctoral seminar  (DBIS RWTH Aachen)Doctoral seminar  (DBIS RWTH Aachen)
Doctoral seminar (DBIS RWTH Aachen)
Zina Petrushyna
 
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
REVEAL - Social Media Verification
 
IRJET - Profanity Statistical Analyzer
 IRJET -  	  Profanity Statistical Analyzer IRJET -  	  Profanity Statistical Analyzer
IRJET - Profanity Statistical Analyzer
IRJET Journal
 
Approaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social MediaApproaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social Media
Janna Joceli Omena
 
clay-fink-resume-current
clay-fink-resume-currentclay-fink-resume-current
clay-fink-resume-current
clayfink
 
Strategic perspectives 3
Strategic perspectives 3Strategic perspectives 3
Strategic perspectives 3
archiejones4
 
Live Social Semantics @ ISWC2009
Live Social Semantics @ ISWC2009Live Social Semantics @ ISWC2009
Live Social Semantics @ ISWC2009
Martin Szomszor
 
Community detection in complex social networks
Community detection in complex social networksCommunity detection in complex social networks
Community detection in complex social networks
Aboul Ella Hassanien
 
The Semiotic Inspection Method - Overview, Analysis and Critique
The Semiotic Inspection Method - Overview, Analysis and CritiqueThe Semiotic Inspection Method - Overview, Analysis and Critique
The Semiotic Inspection Method - Overview, Analysis and Critique
Omar Sosa-Tzec
 
Applications of Big Data Techniques for Social Analytics
Applications of Big Data Techniques for Social Analytics Applications of Big Data Techniques for Social Analytics
Applications of Big Data Techniques for Social Analytics
Dickson Lukose
 
A hybrid approach based on personality traits for hate speech detection in Ar...
A hybrid approach based on personality traits for hate speech detection in Ar...A hybrid approach based on personality traits for hate speech detection in Ar...
A hybrid approach based on personality traits for hate speech detection in Ar...
IJECEIAES
 
Analyzing Emoji in Text
Analyzing Emoji in TextAnalyzing Emoji in Text
Analyzing Emoji in Text
Sanjaya Wijeratne
 

Similar to Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent on Social Media (20)

Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
 
Mining and Comparing Engagement Dynamics Across Multiple Social Media Platfor...
Mining and Comparing Engagement Dynamics Across Multiple Social Media Platfor...Mining and Comparing Engagement Dynamics Across Multiple Social Media Platfor...
Mining and Comparing Engagement Dynamics Across Multiple Social Media Platfor...
 
Profiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic WebProfiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic Web
 
Dev8D Presentation Pascal Belouin
Dev8D Presentation Pascal BelouinDev8D Presentation Pascal Belouin
Dev8D Presentation Pascal Belouin
 
Dev8 D Presentation
Dev8 D PresentationDev8 D Presentation
Dev8 D Presentation
 
Proposal.docx
Proposal.docxProposal.docx
Proposal.docx
 
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
 
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
 
Doctoral seminar (DBIS RWTH Aachen)
Doctoral seminar  (DBIS RWTH Aachen)Doctoral seminar  (DBIS RWTH Aachen)
Doctoral seminar (DBIS RWTH Aachen)
 
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
 
IRJET - Profanity Statistical Analyzer
 IRJET -  	  Profanity Statistical Analyzer IRJET -  	  Profanity Statistical Analyzer
IRJET - Profanity Statistical Analyzer
 
Approaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social MediaApproaches of Data Analysis: Networks generated through Social Media
Approaches of Data Analysis: Networks generated through Social Media
 
clay-fink-resume-current
clay-fink-resume-currentclay-fink-resume-current
clay-fink-resume-current
 
Strategic perspectives 3
Strategic perspectives 3Strategic perspectives 3
Strategic perspectives 3
 
Live Social Semantics @ ISWC2009
Live Social Semantics @ ISWC2009Live Social Semantics @ ISWC2009
Live Social Semantics @ ISWC2009
 
Community detection in complex social networks
Community detection in complex social networksCommunity detection in complex social networks
Community detection in complex social networks
 
The Semiotic Inspection Method - Overview, Analysis and Critique
The Semiotic Inspection Method - Overview, Analysis and CritiqueThe Semiotic Inspection Method - Overview, Analysis and Critique
The Semiotic Inspection Method - Overview, Analysis and Critique
 
Applications of Big Data Techniques for Social Analytics
Applications of Big Data Techniques for Social Analytics Applications of Big Data Techniques for Social Analytics
Applications of Big Data Techniques for Social Analytics
 
A hybrid approach based on personality traits for hate speech detection in Ar...
A hybrid approach based on personality traits for hate speech detection in Ar...A hybrid approach based on personality traits for hate speech detection in Ar...
A hybrid approach based on personality traits for hate speech detection in Ar...
 
Analyzing Emoji in Text
Analyzing Emoji in TextAnalyzing Emoji in Text
Analyzing Emoji in Text
 

More from Yasas Senarath

Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment Analysis
Yasas Senarath
 
Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data
Yasas Senarath
 
Solr workshop
Solr workshopSolr workshop
Solr workshop
Yasas Senarath
 
Affect Level Opinion Mining
Affect Level Opinion MiningAffect Level Opinion Mining
Affect Level Opinion Mining
Yasas Senarath
 
Data science / Big Data
Data science / Big DataData science / Big Data
Data science / Big Data
Yasas Senarath
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep Learning
Yasas Senarath
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysis
Yasas Senarath
 

More from Yasas Senarath (7)

Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment Analysis
 
Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data Forecasting covid 19 by states with mobility data
Forecasting covid 19 by states with mobility data
 
Solr workshop
Solr workshopSolr workshop
Solr workshop
 
Affect Level Opinion Mining
Affect Level Opinion MiningAffect Level Opinion Mining
Affect Level Opinion Mining
 
Data science / Big Data
Data science / Big DataData science / Big Data
Data science / Big Data
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep Learning
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysis
 

Recently uploaded

Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 

Recently uploaded (20)

Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 

Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent on Social Media

  • 1. EVALUATING SEMANTIC FEATURE REPRESENTATIONS TO EFFICIENTLY DETECT HATE INTENT ON SOCIAL MEDIA @wayasas @hemant_pt Yasas Senarath Hemant Purohit 2020 IEEE International Conference on Semantic Computing (IEEE-ICSC ’20) San Diego, California Feb 04, 2020
  • 2. Outline 2 ¨ Introduction ¤ Social Media ¤ Malicious Intent ¤ Background ¨ Problem and Contribution ¤ Data and Task ¤ Contributions ¨ Methodology ¤ Hybrid Feature Representation Framework ¤ Features ¨ Results and Discussion ¨ Conclusion Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent on Social Media, IEEE-ICSC ‘20
  • 4. Motivation: Diverse Intent behind Social Media Sharing. Social media is an integral part of many of our daily lives!
  • 5. Motivation: Malicious Intent on Social Media. Social media: malicious intent has become highly prominent in recent years. Challenges: (1) distinguishing intent (hate speech vs. sarcasm vs. angry rant); (2) inefficiency in formalizing and representing the context. Labeled example posts: “members of nontraditional religions r all subhuman trash” (hateful); “you sure u ain’t colored?” (hateful); “such a sucker 4some Oreos.” (normal). Image source: https://www.deviantart.com/ryujin2490/art/ANGRY-TWITTER-BIRD-252230315
  • 6. Background. Levels of hate speech: presence of hate speech [2, 6]; type of hate speech (offensive, abusive, hateful speech, aggressive, and cyberbullying) [2, 4]. Classifiers [1]: Naive Bayes; Logistic Regression [3, 8]; Random Forest [3, 7]; Support Vector Machine* [6-9]; Deep Learning [5].
  • 7. Background: Features [1]. Surface-level features: Bag-of-Words, TFIDF. Lexical resources: hate speech lexicons, sentiment lexicons. Linguistic features: POS tags. Knowledge-based features: ConceptNet (with custom rules). Meta-information: user-relevant information. Transfer learning: sentiment analysis.
  • 9. Task and Data. Task: given a social media post, detect whether it has hateful intent. Datasets: DWMW17 [3] (~25k tweets; collected by querying for words in Hatebase; labels: Hate, Offensive, and Neither) and FDCL18 [4] (~60k tweets; randomly sampled from the Twitter stream; labels: Normal, Spam, Abusive, Hateful). #BendersRule: English-language tweets. [Figure: Label Distribution, 0% to 100%, for DWMW17 (24,783 tweets) and FDCL18 (60,227 tweets); classes grouped as Normal/Spam/Neither vs. Hate/Offense/Abusive.]
  • 10. Contribution. (1) Proposed a set of diverse features capturing a variety of data semantics for learning a hate speech classification model. (2) Validated the significance of the proposed features on each dataset. (3) Evaluated prediction performance on each dataset using models trained on the other (cross-prediction performance).
  • 12. Methodology: Pipeline. A classical data mining pipeline. Preprocess: normalize (usernames and URLs), then tokenize. Features*. Classifier: Linear SVM. [Diagram: Tweet Text → Preprocess → Feature Extractor → Classifier → Label (Hate Speech / Normal)]
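The preprocessing step of the pipeline slide (normalize usernames and URLs, then tokenize) could be sketched roughly as follows. The placeholder tokens `<user>` and `<url>`, the lowercasing, and the regular expressions are assumptions made for illustration; the slides do not specify the authors' exact normalization rules or tokenizer.

```python
import re

# Hypothetical patterns; the slides only say usernames and URLs are normalized.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
# Placeholders first, then words (optionally hashtagged), then single punctuation marks.
TOKEN_RE = re.compile(r"<url>|<user>|#?\w+(?:'\w+)?|[^\w\s]")


def normalize(text):
    """Replace URLs and @usernames with placeholder tokens, then lowercase."""
    text = URL_RE.sub("<url>", text)    # URLs first, so an '@' inside a URL is not touched
    text = USER_RE.sub("<user>", text)
    return text.lower()


def tokenize(text):
    """Normalize a tweet and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(normalize(text))


if __name__ == "__main__":
    print(tokenize("@bob check https://t.co/abc NOW!"))
```

The resulting token list is what a downstream feature extractor and classifier would consume.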
  • 13. Methodology: Feature Extractor. Corpus-based semantic features: TFIDF over N-grams for N = [1, 2, 3]. Distributional semantics-based features: average of word embeddings. Declarative knowledge-based semantic features: Hatebase, FrameNet. [Diagram: the Feature Extractor combines Corpus Based Features, Distributional Semantic Features, and Knowledge Based Features.]
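Two of the feature families on the feature extractor slide are easy to illustrate: word N-grams for N = 1, 2, 3 (the units that a TFIDF weighting would then score) and the average of word embeddings. This is a minimal sketch; the toy embedding table, the skipping of out-of-vocabulary words, and the zero-vector fallback are assumptions, not details taken from the slides.

```python
def ngrams(tokens, n_values=(1, 2, 3)):
    """Return all word n-grams of the requested orders, joined by spaces."""
    grams = []
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of the tokens found in `embeddings`.

    Assumption: out-of-vocabulary tokens are skipped, and a tweet with no
    known tokens maps to the zero vector.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


if __name__ == "__main__":
    toy_embeddings = {"good": [1.0, 0.0], "day": [0.0, 1.0]}  # hypothetical 2-d vectors
    print(ngrams(["have", "a", "good", "day"]))
    print(mean_embedding(["good", "day", "unknown"], toy_embeddings, dim=2))
```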
  • 14. Methodology: Hatebase. Let $f_j$ be a function mapping a word to a feature vector based on some parameter(s) in our knowledge base (KB). The tweet-level feature averages it over the $n$ words $w_i$ of the tweet: $F_{KB}(\text{Tweet}) = \frac{1}{n}\sum_{i=1}^{n} f_j(w_i)$. [Diagram: Knowledge Base + Tweet → FKB(Tweet)]
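The averaging of word-level knowledge-base vectors described on the Hatebase slide can be sketched as below. The dictionary-backed knowledge base, the zero vector for words absent from it, and the placeholder term are assumptions for illustration only.

```python
def kb_features(tokens, kb, dim):
    """Average a word-level KB feature over all n tokens of a tweet.

    `kb` maps a word to its feature vector f_j(word). Assumption: a word
    that is not in the knowledge base contributes the zero vector, and an
    empty tweet maps to the zero vector.
    """
    n = len(tokens)
    total = [0.0] * dim
    if n == 0:
        return total
    for word in tokens:
        vec = kb.get(word)
        if vec is not None:
            for k in range(dim):
                total[k] += vec[k]
    return [x / n for x in total]  # divide by the tweet length n


if __name__ == "__main__":
    toy_kb = {"term_a": [1.0, 0.0]}  # hypothetical KB entry
    print(kb_features(["you", "term_a", "are", "here"], toy_kb, dim=2))
```

Dividing by the full tweet length rather than by the number of matched words means a tweet with one lexicon hit among many words receives a smaller value than a short tweet with the same hit.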
  • 15. Methodology: FKB(Tweet) | KB = Hatebase. Four word-level feature functions are used. Offensiveness: a discretized value, with bins chosen by the Freedman-Diaconis estimator. Unambiguous: a 1-D Boolean-valued vector. Hateful-Meaning: a bag-of-words vector of the hateful definition. Non-hateful-Meaning: a bag-of-words vector of the non-hateful definition.
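For the discretized offensiveness value, the Freedman-Diaconis rule sets the bin width to h = 2 · IQR / n^(1/3). The sketch below applies that rule to a list of scores and one-hot encodes the bin of a new score. The score range, the equal-width edges, and the one-hot encoding are assumptions; the slides only name the estimator.

```python
import math
import statistics


def freedman_diaconis_width(values):
    """Bin width h = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 2.0 * (q3 - q1) / (len(values) ** (1.0 / 3.0))


def make_bin_edges(values):
    """Equal-width bin edges covering [min, max], sized by the rule above."""
    lo, hi = min(values), max(values)
    width = freedman_diaconis_width(values)
    if width <= 0:  # degenerate data: fall back to a single bin
        return [lo, hi]
    n_bins = max(1, math.ceil((hi - lo) / width))
    step = (hi - lo) / n_bins
    return [lo + k * step for k in range(n_bins + 1)]


def one_hot_bin(value, edges):
    """One-hot vector marking the bin that contains `value`."""
    n_bins = len(edges) - 1
    index = n_bins - 1  # values at or above the last edge go in the last bin
    for k in range(n_bins):
        if value < edges[k + 1]:
            index = k
            break
    vec = [0] * n_bins
    vec[index] = 1
    return vec


if __name__ == "__main__":
    scores = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # hypothetical scores
    edges = make_bin_edges(scores)
    print(edges, one_hot_bin(50, edges))
```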
  • 16. Methodology: FrameNet Features. [Diagram: Tweet → SLING → Frames (PropBank) → Mapping → Frames (FrameNet) → Bag of Frames Features]
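The last two stages of the FrameNet slide (mapping PropBank frames to FrameNet frames, then building a bag-of-frames vector) might look like the sketch below. It starts from frames that a parser such as SLING has already produced; the mapping table, the frame identifiers, and the fixed frame vocabulary are hypothetical placeholders, not the authors' resources.

```python
from collections import Counter


def bag_of_frames(propbank_frames, pb_to_fn, frame_vocab):
    """Count FrameNet frames evoked by a tweet, in a fixed vocabulary order.

    `propbank_frames` is the parser output for one tweet, `pb_to_fn` maps a
    PropBank frame to zero or more FrameNet frames, and `frame_vocab` fixes
    the order of the output dimensions. Unmapped frames are ignored.
    """
    counts = Counter()
    for pb_frame in propbank_frames:
        for fn_frame in pb_to_fn.get(pb_frame, ()):
            counts[fn_frame] += 1
    return [counts[frame] for frame in frame_vocab]


if __name__ == "__main__":
    # Hypothetical mapping and vocabulary, for illustration only.
    mapping = {"pb_frame_1": ["FN_Frame_A"], "pb_frame_2": ["FN_Frame_A", "FN_Frame_B"]}
    vocab = ["FN_Frame_A", "FN_Frame_B", "FN_Frame_C"]
    print(bag_of_frames(["pb_frame_1", "pb_frame_2", "pb_unmapped"], mapping, vocab))
```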
  • 18. Results. Five-fold cross-validation performance (* = baseline). Feature sets: M1* TFIDF; M2 + Offensiveness; M3 + Unambiguous; M4 + Hateful Meaning; M5 + Non-Hateful Meaning; M6 + FrameNet Features; M7 + Mean Embedding. M2 to M5 are the Hatebase features.
  • 19. Cross-Prediction Performance. [Figure: bar chart of F1 score (0 to 100) by train/test dataset pair, DWMW17/FDCL18 and FDCL18/DWMW17, comparing M1 and M7.] Feature sets: M1* TFIDF; M2 + Offensiveness; M3 + Unambiguous; M4 + Hateful Meaning; M5 + Non-Hateful Meaning; M6 + FrameNet Features; M7 + Mean Embedding. M2 to M5 are the Hatebase features.
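Cross-prediction, as used on this slide, means training on one dataset and scoring F1 on the other. The sketch below shows that evaluation loop with a plain binary F1. The `train` and `predict` callables, the dataset dictionaries, and the choice of binary F1 on the hateful class (rather than a macro or weighted average) are assumptions; the slides do not state which F1 variant was reported.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class; 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_predict(train, predict, source, target):
    """Train on `source`, predict on `target`, and return F1 on `target`.

    `train(texts, labels)` returns a model; `predict(model, text)` returns a label.
    Both are supplied by the caller (hypothetical interface).
    """
    model = train(source["texts"], source["labels"])
    predictions = [predict(model, text) for text in target["texts"]]
    return f1_score(target["labels"], predictions)


if __name__ == "__main__":
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Calling `cross_predict` twice with the two datasets swapped gives the two train/test pairs shown in the chart.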
  • 20. Discussion. TFIDF features are highly predictive; however, they do not help the models generalize. Knowledge base features enhance precision. A larger word-embedding vocabulary improves performance.
  • 22. Conclusion. Limitations and future work: polysemy (words with multiple meanings) can hinder correct interpretation of the text; multilingual social media posts. Conclusions: a novel empirical study of diverse semantic feature representations for hate speech detection on social media; an absolute gain in F1 score of up to 3.0% for the models with hybrid feature representation.
  • 23. References. [1] A. Schmidt and M. Wiegand, “A survey on hate speech detection using natural language processing,” in Proc. of the Fifth Int’l Workshop on Natural Language Processing for Social Media, 2017, pp. 1–10. [2] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, and R. Kumar, “Semeval-2019 task 6: Identifying and categorizing offensive language in social media (offenseval),” in SemEval, 2019, pp. 75–86. [3] T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated hate speech detection and the problem of offensive language,” in ICWSM, 2017. [4] A. M. Founta, C. Djouvas, D. Chatzakou, I. Leontiadis, J. Blackburn, G. Stringhini, A. Vakali, M. Sirivianos, and N. Kourtellis, “Large scale crowdsourcing and characterization of twitter abusive behavior,” in ICWSM, 2018. [5] K. Dinakar, B. Jones, C. Havasi, H. Lieberman, and R. Picard, “Common sense reasoning for detection, prevention, and mitigation of cyberbullying,” ACM Tran. on Interactive Intelligent Systems, vol. 2, no. 3, p. 18, 2012. [6] P. Burnap and M. L. Williams, “Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making,” Policy & Internet, vol. 7, no. 2, pp. 223–242, 2015. [7] Y. Chen, Y. Zhou, S. Zhu, and H. Xu, “Detecting offensive language in social media to protect adolescent online safety,” in PASSAT-SOCIALCOM. IEEE, 2012, pp. 71–80. [8] Y. Mehdad and J. Tetreault, “Do characters abuse more than words?” in Proc. of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2016, pp. 299–303. [9] G. Xiang, B. Fan, L. Wang, J. Hong, and C. Rose, “Detecting offensive tweets via topical feature discovery over a large scale twitter corpus,” in CIKM. ACM, 2012, pp. 1980–1984.
  • 24. Acknowledgement, Resources, Questions? Thank you. IIS #657379; @wayasas; ywijesu@gmu.edu. More questions? https://git.gmu.edu/ysenarath/public/hate-intent-detection