SlideShare a Scribd company logo
1 of 15
Detecting Offensive Language in Social Media to Protect
Adolescent Online Safety
Written by
ying chen, et al.
2012 ASE /IEEE International Conference on Social Computing and 2012 ASE/IEEE
International Conference on Privacy, Security,
Risk and Trust
Presented by
ziar khan
Abstract
Since the textual contents on online social media are highly unstructured, informal, and
often misspelled, existing research on message level offensive language detection cannot
accurately detect offensive contents.
In this paper, they designed a framework called Lexical Syntactic Feature (LSF)
architecture to detect offensive contents and identify potential offensive users in social
media.
They distinguished the contribution of pejoratives/profanities and obscenities in
determining offensive content, and introduce hand authoring syntactic rules in identifying
name calling harassments.
In particular, they incorporated a user’s writing style, structure and specific cyberbullying
content as features to predict the users capability to send out offensive content.
Results from experiments showed that their LSF framework performed significantly better
than existing methods in offensive content detection.
Introduction
With the rapid growth of social media, users especially adolescents are spending significant
amount of time on various social networking sites to connect with others, to share
information, and to pursue common interests.
It has been found that 70% of teens use social media sites on a daily basis and nearly one
in four teens hit their favorite social media sites 10 or more times a day.
19% of teens report that someone has written or posted mean or embarrassing things about
them on social networking sites.
As adolescents are more likely to be negatively affected by biased and harmful contents
than adults, detecting online offensive contents to protect adolescents online safety
becomes an urgent task.
To address concerns on children access to offensive contents over Internet, administrators
of social media often manually review online contents to detect and delete offensive
materials, however, manually reviewing are labour intensive and time consuming.
Introduction
Some automatic contents filtering software packages, such as Appen, eblaster,
iambigbrother, Internet Security Suite etc, have been developed to detect and filter online
offensive contents. Most of them simply blocked web pages and paragraphs that contained
dirty words.
These word based approaches not only affect the readability and usability of web sites, but
also fail to identify subtle offensive messages.
To address these limitations, in this paper, they proposed Lexical Syntactic Feature based
(LSF) language model to effectively detect offensive language in social media and to
protect adolescents.
LSF not only examines messages, but also the person who posts the messages and his/her
patterns of posting.
LSF can be implemented as a client side application for individuals and groups who are
concerned about adolescents online safety.
Users are also allowed to adjust the threshold of acceptable level of offensive contents.
Related Work
A. Offensiveness Content Filtering Methods in Social Media : Popular online social
networking sites apply several mechanisms to screen offensive contents.
For example, Youtube’s safety mode, once activated, can hide all comments containing
offensive languages from users. But pre-screened content will still appear. The
pejoratives replaced by asterisks, if users simply click "Text Comments."
On Facebook, users can add comma separated keywords to the "Moderation Blacklist."
When people include blacklisted keywords in a post and/or a comment on a page, the
content will be automatically identified as spam and thus be screened.
Twitter client, “Tweetie 1.3,” was rejected by Apple Company for allowing foul
languages to appear in users’ tweets.
In general, the majority of popular social media use simple lexicon based approach to
filter offensive contents. Their lexicons are either predefined (such as Youtube) or
composed by the users themselves (such as Facebook).
For adolescents who often lack cognitive awareness of risks, these approaches are
hardly effective to prevent them from being exposed to offensive contents. Therefore,
parents need more sophisticate software and techniques to efficiently detect offensive
contents to protect their adolescents from potential exposure to vulgar, pornographic
and hateful languages.
Related Work
B. Using Text Mining Techniques to Detect Online Offensive Contents : Offensive
language identification in social media is a difficult task because the textual contents in
such environment is often unstructured, informal, and even misspelled.
Researchers have studied intelligent ways to identify offensive contents using text
mining approach. Implementing text mining techniques to analyze online data requires
the following phases:
1) Data acquisition and preprocess,
2) Feature extraction, and
3) Classification.
The major challenges of using text mining to detect offensive contents lie on the feature
selection phase, which will be elaborated in the following sections.
a) Message level Feature Extraction: Most offensive content detection research extracts
two kinds of features: lexical and syntactic features.
Lexical features treat each word and phrase as an entity. Word patterns such as
appearance of certain keywords and their frequencies are often used to represent the
language model. For example Bag of Words(BoW) and N-grams.
Related Work
Syntactic features: Although lexical features perform well in detecting offensive
entities, without considering the syntactical structure of the whole sentence, they fail to
distinguish sentences offensiveness which contain same words but in different orders.
Therefore, to consider syntactical features in sentences, natural language parsers are
introduced to parse sentences on grammatical structures before feature selection.
b) User level Offensiveness Detection: Most contemporary research on detecting online
offensive languages only focus on sentence level and message level constructs. Since no
detection technique is 100% accurate. However, user level detection is a more
challenging task. Some efforts at user level are:
Kontostathis et al. propose a rule based communication model to track and categorize
online predators.
Pendar, uses lexical features with machine learning classifiers to differentiate victims
from predators in online chatting environment.
Pazienza and Tudorache, propose utilizing user profiling features such as online
presence and conversations to detect aggressive discussions.
However, in the above research user information such as users’writing styles or posting
trends or reputations have not been included to improve the detection rate.
Research Questions
They identify the following research questions to prevent adolescents from exposing to
offensive textual content:
• How to design an effective framework that incorporates both message level and user
level features to detect and prevent offensive content in social media?
• What strategy is effective in detecting and evaluating level of offensiveness in a
message? Will advanced linguistic analysis improve the accuracy and reduce false
positives in detecting message level offensiveness?
• What strategy is effective in detecting and predicting user level offensiveness? Besides
using information from message-level offensiveness, could user profile information
further improve the performance?
• Is the proposed framework efficient enough to be deployed on real time social media?
Design Framework
In order to tackle these challenges, they proposed a Lexical Syntactic Feature (LSF)
based framework to detect offensive contents and identify offensive users in social
media. Their Framework include two phases of offensiveness detection.
Phase 1 aims to detect the offensiveness on the sentence level.
Phase 2 derives offensiveness on the user level.
Design Framework
The system consists of pre-processing and two major components: sentence
offensiveness prediction and user offensiveness estimation.
During the pre-processing stage, users conversation history is chunked into posts, and
then into sentences.
During sentence offensiveness prediction, each sentence’s offensiveness can be derived
from two features: its words’ offensiveness using lexical features and its context
through syntactical features. i.e. grammatically parse sentences into dependency sets to
capture all dependency types between a word and other words in the same sentence, and
mark some of its related words as intensifiers.
During user offensiveness estimation stage, sentence offensiveness and users language
patterns are helped to predict users’ likelihood of being offensive.
Sentence Offensiveness Calculation
They proposed a new method of sentence level analysis based on offensive word
lexicons and sentence syntactic structures. Firstly, they construct two offensive word
dictionaries based on different strengths of offensiveness. Secondly, the concept of
syntactic intensifier is introduced to adjust words’ offensiveness levels based on their
context. Lastly, for each sentence, an offensiveness value is generated by aggregating its
words’ offensiveness. Mathematically
For each offensive word, w, in sentence, s, its offensiveness
a1 if w is a strongly offensive word
Ow = a2 if w is a weakly offensive word
0 otherwise
The value of intensifier Iw , for offensive word, w, can be calculated as ∑k
j=1 dj
Consequently, the offensiveness value of sentence, s, becomes a determined linear
combination of words’ offensiveness Os = ∑ Ow Iw
b1 if user identifier
dj = b2 if is an offensive word
0 otherwise
User offensiveness estimation
Given a user, u, they retrieve his/her conversation history which contains several posts
{p1, …, pm} and each post contains sentences {s1, …. , sn}. Sentence offensiveness
values are denoted as {Os1, …. , Osn}. The original offensiveness value of post p,
Op = ∑Os . The offensiveness value of modified posts can be presented as, Op
s . So the final post offensiveness Op
` of post p can be calculated as
Op
` = max(Op, Ops). Hence, the offensiveness value of Ou, can be presented
as Ou = 1/m∑ Op
`
Additional Features Extracted From Users’ Language Profiles
they also considered additional features such as:
Sentence styles. Users may use punctuation and words with all uppercase letters to
indicate feelings or speaking volume. Punctuation, such as exclamation marks, can
emphasize offensiveness of posts. (i.e. Both “He is stupid!” and “He is STUPID.” are
stronger than “He is stupid.”).
Experiment 1: Sentence Offensiveness Calculation
According to the above Fig, none of the baseline approaches provides recall rate higher
than 70%, because many of the offensive sentences are imperatives, which omit all user
identifiers. Among the baseline approaches, the BoW approach has the highest recall
rate 66%. However, BoW generates a high false positive rate.
Experiment 2: User Offensiveness Estimation-with presence
of strongly offensive words
Machine learning techniques—Naïve Bayes (NB) and SVM—are used to perform the
classification, and 10-fold cross validation was conducted in this experiment. To fully
evaluate the effectiveness of users’ sentence offensiveness value (LSF), style features,
structure features and content specific features for user offensiveness estimation, data
fed sequentially into the classifiers, and get the result in Fig.
According to Fig., offensive words and user language features are not compensating to
each other to improve the detection rate, which means they are not independent. In
contrast, incorporating with user language features, the classifiers have better detection
rate than just adopting LSF.
Conclusion
In this study, they investigate existing text mining methods in detecting offensive
contents for protecting adolescent online safety. Proposed Lexical Syntactical Feature
(LSF) approach to identify offensive contents in social media, and predict a user’s
potentiality to send out offensive contents.
Their research has several contributions. First, practically conceptualize the notion of
online offensive contents, and further distinguish the contribution of pejoratives/
profanities and obscenities in determining offensive contents, and introduce hand
authoring syntactic rules in identifying name calling harassment.
Second, improved the traditional machine learning methods by not only using lexical
features to detect offensive languages, but also incorporating style features, structure
features and context-specific features to better predict a user’s potentiality to send out
offensive content in social media.
Experimental result shows that the LSF sentence offensiveness prediction and user
offensiveness estimate algorithms outperform traditional learning based approaches in
terms of precision, recall and f-score. It also achieves high processing speed for
effective deployment in social media.

More Related Content

What's hot

Cyber crime ppt
Cyber crime pptCyber crime ppt
Cyber crime ppt
MOE515253
 
Presentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social NetworksPresentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social Networks
Ashish Arora
 
Graphical password authentication
Graphical password authenticationGraphical password authentication
Graphical password authentication
Asim Kumar Pathak
 
Best topics for seminar
Best topics for seminarBest topics for seminar
Best topics for seminar
shilpi nagpal
 
PHISHING PROJECT REPORT
PHISHING PROJECT REPORTPHISHING PROJECT REPORT
PHISHING PROJECT REPORT
vineetkathan
 
Face recognition ppt
Face recognition pptFace recognition ppt
Face recognition ppt
Santosh Kumar
 

What's hot (20)

final presentation fake news detection.pptx
final presentation fake news detection.pptxfinal presentation fake news detection.pptx
final presentation fake news detection.pptx
 
Phishing ppt
Phishing pptPhishing ppt
Phishing ppt
 
Fake news detection project
Fake news detection projectFake news detection project
Fake news detection project
 
Cyber crime ppt
Cyber crime pptCyber crime ppt
Cyber crime ppt
 
Password management
Password managementPassword management
Password management
 
7 Cyber Crimes on Social Media Against Women [India]
7 Cyber Crimes on Social Media Against Women [India]7 Cyber Crimes on Social Media Against Women [India]
7 Cyber Crimes on Social Media Against Women [India]
 
Computer science seminar topics
Computer science seminar topicsComputer science seminar topics
Computer science seminar topics
 
Presentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social NetworksPresentation-Detecting Spammers on Social Networks
Presentation-Detecting Spammers on Social Networks
 
Cybercrime presentation
Cybercrime presentationCybercrime presentation
Cybercrime presentation
 
Fake News detection.pptx
Fake News detection.pptxFake News detection.pptx
Fake News detection.pptx
 
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
 
Face detection and recognition with pi
Face detection and recognition with piFace detection and recognition with pi
Face detection and recognition with pi
 
Three Level Security System Using Image Based Aunthentication
Three Level Security System Using Image Based AunthenticationThree Level Security System Using Image Based Aunthentication
Three Level Security System Using Image Based Aunthentication
 
Graphical password authentication
Graphical password authenticationGraphical password authentication
Graphical password authentication
 
Cyber crime in India PPT .pptx
Cyber crime in India PPT .pptxCyber crime in India PPT .pptx
Cyber crime in India PPT .pptx
 
Cyber security and current trends
Cyber security and current trendsCyber security and current trends
Cyber security and current trends
 
Cyber crime-140128140443-phpapp02 (1)
Cyber crime-140128140443-phpapp02 (1)Cyber crime-140128140443-phpapp02 (1)
Cyber crime-140128140443-phpapp02 (1)
 
Best topics for seminar
Best topics for seminarBest topics for seminar
Best topics for seminar
 
PHISHING PROJECT REPORT
PHISHING PROJECT REPORTPHISHING PROJECT REPORT
PHISHING PROJECT REPORT
 
Face recognition ppt
Face recognition pptFace recognition ppt
Face recognition ppt
 

Viewers also liked

Cyber bullying powerpoint
Cyber bullying powerpointCyber bullying powerpoint
Cyber bullying powerpoint
shannonmf
 
*Cyber bullying presentation*
*Cyber bullying presentation**Cyber bullying presentation*
*Cyber bullying presentation*
Amber Dee
 
Senior graduation project
Senior graduation projectSenior graduation project
Senior graduation project
meghantheisen
 
Bullying
BullyingBullying
Bullying
mpwbs
 
Cyber bullying slide share
Cyber bullying slide shareCyber bullying slide share
Cyber bullying slide share
br03wood
 
Fast detection of transformed data leaks[mithun_p_c]
Fast detection of transformed data leaks[mithun_p_c]Fast detection of transformed data leaks[mithun_p_c]
Fast detection of transformed data leaks[mithun_p_c]
MithunPChandra
 
Cyber Bullying Questionnaire Results
Cyber Bullying Questionnaire ResultsCyber Bullying Questionnaire Results
Cyber Bullying Questionnaire Results
a2columnd12
 

Viewers also liked (18)

Cyber bullying powerpoint
Cyber bullying powerpointCyber bullying powerpoint
Cyber bullying powerpoint
 
*Cyber bullying presentation*
*Cyber bullying presentation**Cyber bullying presentation*
*Cyber bullying presentation*
 
Cyberbullying Presentation
Cyberbullying PresentationCyberbullying Presentation
Cyberbullying Presentation
 
Cyberbullying powerpoint
Cyberbullying powerpointCyberbullying powerpoint
Cyberbullying powerpoint
 
Senior graduation project
Senior graduation projectSenior graduation project
Senior graduation project
 
Cyberbullying for Teachers
Cyberbullying for TeachersCyberbullying for Teachers
Cyberbullying for Teachers
 
Cyberbullying project
Cyberbullying projectCyberbullying project
Cyberbullying project
 
Bullying
BullyingBullying
Bullying
 
Cyber bullying slide share
Cyber bullying slide shareCyber bullying slide share
Cyber bullying slide share
 
Bullying & cyber bullying
Bullying & cyber bullyingBullying & cyber bullying
Bullying & cyber bullying
 
Fast detection of transformed data leaks[mithun_p_c]
Fast detection of transformed data leaks[mithun_p_c]Fast detection of transformed data leaks[mithun_p_c]
Fast detection of transformed data leaks[mithun_p_c]
 
Cyber Bullying Questionnaire Results
Cyber Bullying Questionnaire ResultsCyber Bullying Questionnaire Results
Cyber Bullying Questionnaire Results
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detection
 
Bahagi ng pamanahong papel (Para sa mga Baby Thesis)
Bahagi ng pamanahong papel (Para sa mga Baby Thesis)Bahagi ng pamanahong papel (Para sa mga Baby Thesis)
Bahagi ng pamanahong papel (Para sa mga Baby Thesis)
 
Pamanahong Papel o Pananaliksik (Research Paper)
Pamanahong Papel o Pananaliksik (Research Paper)Pamanahong Papel o Pananaliksik (Research Paper)
Pamanahong Papel o Pananaliksik (Research Paper)
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Similar to Detection of cyber-bullying

Dictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewDictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A Review
INFOGAIN PUBLICATION
 
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docxPRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
ResearchWap
 

Similar to Detection of cyber-bullying (20)

Smart detection of offensive words in social media using the soundex algorith...
Smart detection of offensive words in social media using the soundex algorith...Smart detection of offensive words in social media using the soundex algorith...
Smart detection of offensive words in social media using the soundex algorith...
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Insult detection using a partitional CNN-LSTM model
Insult detection using a partitional CNN-LSTM modelInsult detection using a partitional CNN-LSTM model
Insult detection using a partitional CNN-LSTM model
 
Semiotics contributions to accessible interface design
Semiotics contributions to accessible interface designSemiotics contributions to accessible interface design
Semiotics contributions to accessible interface design
 
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
 
Automatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature ReviewAutomatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature Review
 
From Social Care Professional To App Developer: there isn't an App for that!
From Social Care Professional To App Developer: there isn't an App for that! From Social Care Professional To App Developer: there isn't an App for that!
From Social Care Professional To App Developer: there isn't an App for that!
 
Dictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewDictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A Review
 
Research on language and identity
Research on language and identityResearch on language and identity
Research on language and identity
 
Ieee format 5th nccci_a study on factors influencing as a best practice for...
Ieee format 5th nccci_a study on factors influencing as  a  best practice for...Ieee format 5th nccci_a study on factors influencing as  a  best practice for...
Ieee format 5th nccci_a study on factors influencing as a best practice for...
 
Hate Speech Recognition System through NLP and Deep Learning
Hate Speech Recognition System through NLP and Deep LearningHate Speech Recognition System through NLP and Deep Learning
Hate Speech Recognition System through NLP and Deep Learning
 
Cb4301449454
Cb4301449454Cb4301449454
Cb4301449454
 
G1803024452
G1803024452G1803024452
G1803024452
 
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...
Dialectal Arabic sentiment analysis based on tree-based pipeline  optimizatio...Dialectal Arabic sentiment analysis based on tree-based pipeline  optimizatio...
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...
 
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
 
identifying malevolent facebook requests
identifying malevolent facebook requestsidentifying malevolent facebook requests
identifying malevolent facebook requests
 
Hate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine LearningHate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine Learning
 
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
 
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docxPRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
 
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docxPRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Detection of cyber-bullying

  • 1. Detecting Offensive Language in Social Media to Protect Adolescent Online Safety Written by ying chen, et al. 2012 ASE /IEEE International Conference on Social Computing and 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust Presented by ziar khan
  • 2. Abstract Since the textual contents on online social media are highly unstructured, informal, and often misspelled, existing research on message level offensive language detection cannot accurately detect offensive contents. In this paper, they designed a framework called Lexical Syntactic Feature (LSF) architecture to detect offensive contents and identify potential offensive users in social media. They distinguished the contribution of pejoratives/profanities and obscenities in determining offensive content, and introduce hand authoring syntactic rules in identifying name calling harassments. In particular, they incorporated a user’s writing style, structure and specific cyberbullying content as features to predict the users capability to send out offensive content. Results from experiments showed that their LSF framework performed significantly better than existing methods in offensive content detection.
  • 3. Introduction With the rapid growth of social media, users especially adolescents are spending significant amount of time on various social networking sites to connect with others, to share information, and to pursue common interests. It has been found that 70% of teens use social media sites on a daily basis and nearly one in four teens hit their favorite social media sites 10 or more times a day. 19% of teens report that someone has written or posted mean or embarrassing things about them on social networking sites. As adolescents are more likely to be negatively affected by biased and harmful contents than adults, detecting online offensive contents to protect adolescents online safety becomes an urgent task. To address concerns on children access to offensive contents over Internet, administrators of social media often manually review online contents to detect and delete offensive materials, however, manually reviewing are labour intensive and time consuming.
  • 4. Introduction Some automatic contents filtering software packages, such as Appen, eblaster, iambigbrother, Internet Security Suite etc, have been developed to detect and filter online offensive contents. Most of them simply blocked web pages and paragraphs that contained dirty words. These word based approaches not only affect the readability and usability of web sites, but also fail to identify subtle offensive messages. To address these limitations, in this paper, they proposed Lexical Syntactic Feature based (LSF) language model to effectively detect offensive language in social media and to protect adolescents. LSF not only examines messages, but also the person who posts the messages and his/her patterns of posting. LSF can be implemented as a client side application for individuals and groups who are concerned about adolescents online safety. Users are also allowed to adjust the threshold of acceptable level of offensive contents.
  • 5. Related Work A. Offensiveness Content Filtering Methods in Social Media : Popular online social networking sites apply several mechanisms to screen offensive contents. For example, Youtube’s safety mode, once activated, can hide all comments containing offensive languages from users. But pre-screened content will still appear. The pejoratives replaced by asterisks, if users simply click "Text Comments." On Facebook, users can add comma separated keywords to the "Moderation Blacklist." When people include blacklisted keywords in a post and/or a comment on a page, the content will be automatically identified as spam and thus be screened. Twitter client, “Tweetie 1.3,” was rejected by Apple Company for allowing foul languages to appear in users’ tweets. In general, the majority of popular social media use simple lexicon based approach to filter offensive contents. Their lexicons are either predefined (such as Youtube) or composed by the users themselves (such as Facebook). For adolescents who often lack cognitive awareness of risks, these approaches are hardly effective to prevent them from being exposed to offensive contents. Therefore, parents need more sophisticate software and techniques to efficiently detect offensive contents to protect their adolescents from potential exposure to vulgar, pornographic and hateful languages.
  • 6. Related Work B. Using Text Mining Techniques to Detect Online Offensive Contents : Offensive language identification in social media is a difficult task because the textual contents in such environment is often unstructured, informal, and even misspelled. Researchers have studied intelligent ways to identify offensive contents using text mining approach. Implementing text mining techniques to analyze online data requires the following phases: 1) Data acquisition and preprocess, 2) Feature extraction, and 3) Classification. The major challenges of using text mining to detect offensive contents lie on the feature selection phase, which will be elaborated in the following sections. a) Message level Feature Extraction: Most offensive content detection research extracts two kinds of features: lexical and syntactic features. Lexical features treat each word and phrase as an entity. Word patterns such as appearance of certain keywords and their frequencies are often used to represent the language model. For example Bag of Words(BoW) and N-grams.
  • 7. Related Work Syntactic features: Although lexical features perform well in detecting offensive entities, without considering the syntactical structure of the whole sentence, they fail to distinguish sentences offensiveness which contain same words but in different orders. Therefore, to consider syntactical features in sentences, natural language parsers are introduced to parse sentences on grammatical structures before feature selection. b) User level Offensiveness Detection: Most contemporary research on detecting online offensive languages only focus on sentence level and message level constructs. Since no detection technique is 100% accurate. However, user level detection is a more challenging task. Some efforts at user level are: Kontostathis et al. propose a rule based communication model to track and categorize online predators. Pendar, uses lexical features with machine learning classifiers to differentiate victims from predators in online chatting environment. Pazienza and Tudorache, propose utilizing user profiling features such as online presence and conversations to detect aggressive discussions. However, in the above research user information such as users’writing styles or posting trends or reputations have not been included to improve the detection rate.
  • 8. Research Questions They identify the following research questions to prevent adolescents from exposing to offensive textual content: • How to design an effective framework that incorporates both message level and user level features to detect and prevent offensive content in social media? • What strategy is effective in detecting and evaluating level of offensiveness in a message? Will advanced linguistic analysis improve the accuracy and reduce false positives in detecting message level offensiveness? • What strategy is effective in detecting and predicting user level offensiveness? Besides using information from message-level offensiveness, could user profile information further improve the performance? • Is the proposed framework efficient enough to be deployed on real time social media?
  • 9. Design Framework In order to tackle these challenges, they proposed a Lexical Syntactic Feature (LSF) based framework to detect offensive contents and identify offensive users in social media. Their Framework include two phases of offensiveness detection. Phase 1 aims to detect the offensiveness on the sentence level. Phase 2 derives offensiveness on the user level.
  • 10. Design Framework The system consists of pre-processing and two major components: sentence offensiveness prediction and user offensiveness estimation. During the pre-processing stage, users conversation history is chunked into posts, and then into sentences. During sentence offensiveness prediction, each sentence’s offensiveness can be derived from two features: its words’ offensiveness using lexical features and its context through syntactical features. i.e. grammatically parse sentences into dependency sets to capture all dependency types between a word and other words in the same sentence, and mark some of its related words as intensifiers. During user offensiveness estimation stage, sentence offensiveness and users language patterns are helped to predict users’ likelihood of being offensive.
  • 11. Sentence Offensiveness Calculation They proposed a new method of sentence level analysis based on offensive word lexicons and sentence syntactic structures. Firstly, they construct two offensive word dictionaries based on different strengths of offensiveness. Secondly, the concept of syntactic intensifier is introduced to adjust words’ offensiveness levels based on their context. Lastly, for each sentence, an offensiveness value is generated by aggregating its words’ offensiveness. Mathematically For each offensive word, w, in sentence, s, its offensiveness a1 if w is a strongly offensive word Ow = a2 if w is a weakly offensive word 0 otherwise The value of intensifier Iw , for offensive word, w, can be calculated as ∑k j=1 dj Consequently, the offensiveness value of sentence, s, becomes a determined linear combination of words’ offensiveness Os = ∑ Ow Iw b1 if user identifier dj = b2 if is an offensive word 0 otherwise
  • 12. User offensiveness estimation Given a user, u, they retrieve his/her conversation history which contains several posts {p1, …, pm} and each post contains sentences {s1, …. , sn}. Sentence offensiveness values are denoted as {Os1, …. , Osn}. The original offensiveness value of post p, Op = ∑Os . The offensiveness value of modified posts can be presented as, Op s . So the final post offensiveness Op ` of post p can be calculated as Op ` = max(Op, Ops). Hence, the offensiveness value of Ou, can be presented as Ou = 1/m∑ Op ` Additional Features Extracted From Users’ Language Profiles they also considered additional features such as: Sentence styles. Users may use punctuation and words with all uppercase letters to indicate feelings or speaking volume. Punctuation, such as exclamation marks, can emphasize offensiveness of posts. (i.e. Both “He is stupid!” and “He is STUPID.” are stronger than “He is stupid.”).
  • 13. Experiment 1: Sentence Offensiveness Calculation According to the above Fig, none of the baseline approaches provides recall rate higher than 70%, because many of the offensive sentences are imperatives, which omit all user identifiers. Among the baseline approaches, the BoW approach has the highest recall rate 66%. However, BoW generates a high false positive rate.
  • 14. Experiment 2: User Offensiveness Estimation-with presence of strongly offensive words Machine learning techniques—Naïve Bayes (NB) and SVM—are used to perform the classification, and 10-fold cross validation was conducted in this experiment. To fully evaluate the effectiveness of users’ sentence offensiveness value (LSF), style features, structure features and content specific features for user offensiveness estimation, data fed sequentially into the classifiers, and get the result in Fig. According to Fig., offensive words and user language features are not compensating to each other to improve the detection rate, which means they are not independent. In contrast, incorporating with user language features, the classifiers have better detection rate than just adopting LSF.
  • 15. Conclusion In this study, they investigate existing text mining methods in detecting offensive contents for protecting adolescent online safety. Proposed Lexical Syntactical Feature (LSF) approach to identify offensive contents in social media, and predict a user’s potentiality to send out offensive contents. Their research has several contributions. First, practically conceptualize the notion of online offensive contents, and further distinguish the contribution of pejoratives/ profanities and obscenities in determining offensive contents, and introduce hand authoring syntactic rules in identifying name calling harassment. Second, improved the traditional machine learning methods by not only using lexical features to detect offensive languages, but also incorporating style features, structure features and context-specific features to better predict a user’s potentiality to send out offensive content in social media. Experimental result shows that the LSF sentence offensiveness prediction and user offensiveness estimate algorithms outperform traditional learning based approaches in terms of precision, recall and f-score. It also achieves high processing speed for effective deployment in social media.