SlideShare a Scribd company logo
Detecting Offensive Language in Social Media to Protect
Adolescent Online Safety
Written by
ying chen, et al.
2012 ASE /IEEE International Conference on Social Computing and 2012 ASE/IEEE
International Conference on Privacy, Security,
Risk and Trust
Presented by
ziar khan
Abstract
Since the textual contents on online social media are highly unstructured, informal, and
often misspelled, existing research on message level offensive language detection cannot
accurately detect offensive contents.
In this paper, they designed a framework called Lexical Syntactic Feature (LSF)
architecture to detect offensive contents and identify potential offensive users in social
media.
They distinguished the contribution of pejoratives/profanities and obscenities in
determining offensive content, and introduce hand authoring syntactic rules in identifying
name calling harassments.
In particular, they incorporated a user’s writing style, structure and specific cyberbullying
content as features to predict the users capability to send out offensive content.
Results from experiments showed that their LSF framework performed significantly better
than existing methods in offensive content detection.
Introduction
With the rapid growth of social media, users especially adolescents are spending significant
amount of time on various social networking sites to connect with others, to share
information, and to pursue common interests.
It has been found that 70% of teens use social media sites on a daily basis and nearly one
in four teens hit their favorite social media sites 10 or more times a day.
19% of teens report that someone has written or posted mean or embarrassing things about
them on social networking sites.
As adolescents are more likely to be negatively affected by biased and harmful contents
than adults, detecting online offensive contents to protect adolescents online safety
becomes an urgent task.
To address concerns on children access to offensive contents over Internet, administrators
of social media often manually review online contents to detect and delete offensive
materials, however, manually reviewing are labour intensive and time consuming.
Introduction
Some automatic contents filtering software packages, such as Appen, eblaster,
iambigbrother, Internet Security Suite etc, have been developed to detect and filter online
offensive contents. Most of them simply blocked web pages and paragraphs that contained
dirty words.
These word based approaches not only affect the readability and usability of web sites, but
also fail to identify subtle offensive messages.
To address these limitations, in this paper, they proposed Lexical Syntactic Feature based
(LSF) language model to effectively detect offensive language in social media and to
protect adolescents.
LSF not only examines messages, but also the person who posts the messages and his/her
patterns of posting.
LSF can be implemented as a client side application for individuals and groups who are
concerned about adolescents online safety.
Users are also allowed to adjust the threshold of acceptable level of offensive contents.
Related Work
A. Offensiveness Content Filtering Methods in Social Media : Popular online social
networking sites apply several mechanisms to screen offensive contents.
For example, Youtube’s safety mode, once activated, can hide all comments containing
offensive languages from users. But pre-screened content will still appear. The
pejoratives replaced by asterisks, if users simply click "Text Comments."
On Facebook, users can add comma separated keywords to the "Moderation Blacklist."
When people include blacklisted keywords in a post and/or a comment on a page, the
content will be automatically identified as spam and thus be screened.
Twitter client, “Tweetie 1.3,” was rejected by Apple Company for allowing foul
languages to appear in users’ tweets.
In general, the majority of popular social media use simple lexicon based approach to
filter offensive contents. Their lexicons are either predefined (such as Youtube) or
composed by the users themselves (such as Facebook).
For adolescents who often lack cognitive awareness of risks, these approaches are
hardly effective to prevent them from being exposed to offensive contents. Therefore,
parents need more sophisticate software and techniques to efficiently detect offensive
contents to protect their adolescents from potential exposure to vulgar, pornographic
and hateful languages.
Related Work
B. Using Text Mining Techniques to Detect Online Offensive Contents : Offensive
language identification in social media is a difficult task because the textual contents in
such environment is often unstructured, informal, and even misspelled.
Researchers have studied intelligent ways to identify offensive contents using text
mining approach. Implementing text mining techniques to analyze online data requires
the following phases:
1) Data acquisition and preprocess,
2) Feature extraction, and
3) Classification.
The major challenges of using text mining to detect offensive contents lie on the feature
selection phase, which will be elaborated in the following sections.
a) Message level Feature Extraction: Most offensive content detection research extracts
two kinds of features: lexical and syntactic features.
Lexical features treat each word and phrase as an entity. Word patterns such as
appearance of certain keywords and their frequencies are often used to represent the
language model. For example Bag of Words(BoW) and N-grams.
Related Work
Syntactic features: Although lexical features perform well in detecting offensive
entities, without considering the syntactical structure of the whole sentence, they fail to
distinguish sentences offensiveness which contain same words but in different orders.
Therefore, to consider syntactical features in sentences, natural language parsers are
introduced to parse sentences on grammatical structures before feature selection.
b) User level Offensiveness Detection: Most contemporary research on detecting online
offensive languages only focus on sentence level and message level constructs. Since no
detection technique is 100% accurate. However, user level detection is a more
challenging task. Some efforts at user level are:
Kontostathis et al. propose a rule based communication model to track and categorize
online predators.
Pendar, uses lexical features with machine learning classifiers to differentiate victims
from predators in online chatting environment.
Pazienza and Tudorache, propose utilizing user profiling features such as online
presence and conversations to detect aggressive discussions.
However, in the above research user information such as users’writing styles or posting
trends or reputations have not been included to improve the detection rate.
Research Questions
They identify the following research questions to prevent adolescents from exposing to
offensive textual content:
• How to design an effective framework that incorporates both message level and user
level features to detect and prevent offensive content in social media?
• What strategy is effective in detecting and evaluating level of offensiveness in a
message? Will advanced linguistic analysis improve the accuracy and reduce false
positives in detecting message level offensiveness?
• What strategy is effective in detecting and predicting user level offensiveness? Besides
using information from message-level offensiveness, could user profile information
further improve the performance?
• Is the proposed framework efficient enough to be deployed on real time social media?
Design Framework
In order to tackle these challenges, they proposed a Lexical Syntactic Feature (LSF)
based framework to detect offensive contents and identify offensive users in social
media. Their Framework include two phases of offensiveness detection.
Phase 1 aims to detect the offensiveness on the sentence level.
Phase 2 derives offensiveness on the user level.
Design Framework
The system consists of pre-processing and two major components: sentence
offensiveness prediction and user offensiveness estimation.
During the pre-processing stage, users conversation history is chunked into posts, and
then into sentences.
During sentence offensiveness prediction, each sentence’s offensiveness can be derived
from two features: its words’ offensiveness using lexical features and its context
through syntactical features. i.e. grammatically parse sentences into dependency sets to
capture all dependency types between a word and other words in the same sentence, and
mark some of its related words as intensifiers.
During user offensiveness estimation stage, sentence offensiveness and users language
patterns are helped to predict users’ likelihood of being offensive.
Sentence Offensiveness Calculation
They proposed a new method of sentence level analysis based on offensive word
lexicons and sentence syntactic structures. Firstly, they construct two offensive word
dictionaries based on different strengths of offensiveness. Secondly, the concept of
syntactic intensifier is introduced to adjust words’ offensiveness levels based on their
context. Lastly, for each sentence, an offensiveness value is generated by aggregating its
words’ offensiveness. Mathematically
For each offensive word, w, in sentence, s, its offensiveness
a1 if w is a strongly offensive word
Ow = a2 if w is a weakly offensive word
0 otherwise
The value of intensifier Iw , for offensive word, w, can be calculated as ∑k
j=1 dj
Consequently, the offensiveness value of sentence, s, becomes a determined linear
combination of words’ offensiveness Os = ∑ Ow Iw
b1 if user identifier
dj = b2 if is an offensive word
0 otherwise
User offensiveness estimation
Given a user, u, they retrieve his/her conversation history which contains several posts
{p1, …, pm} and each post contains sentences {s1, …. , sn}. Sentence offensiveness
values are denoted as {Os1, …. , Osn}. The original offensiveness value of post p,
Op = ∑Os . The offensiveness value of modified posts can be presented as, Op
s . So the final post offensiveness Op
` of post p can be calculated as
Op
` = max(Op, Ops). Hence, the offensiveness value of Ou, can be presented
as Ou = 1/m∑ Op
`
Additional Features Extracted From Users’ Language Profiles
they also considered additional features such as:
Sentence styles. Users may use punctuation and words with all uppercase letters to
indicate feelings or speaking volume. Punctuation, such as exclamation marks, can
emphasize offensiveness of posts. (i.e. Both “He is stupid!” and “He is STUPID.” are
stronger than “He is stupid.”).
Experiment 1: Sentence Offensiveness Calculation
According to the above Fig, none of the baseline approaches provides recall rate higher
than 70%, because many of the offensive sentences are imperatives, which omit all user
identifiers. Among the baseline approaches, the BoW approach has the highest recall
rate 66%. However, BoW generates a high false positive rate.
Experiment 2: User Offensiveness Estimation-with presence
of strongly offensive words
Machine learning techniques—Naïve Bayes (NB) and SVM—are used to perform the
classification, and 10-fold cross validation was conducted in this experiment. To fully
evaluate the effectiveness of users’ sentence offensiveness value (LSF), style features,
structure features and content specific features for user offensiveness estimation, data
fed sequentially into the classifiers, and get the result in Fig.
According to Fig., offensive words and user language features are not compensating to
each other to improve the detection rate, which means they are not independent. In
contrast, incorporating with user language features, the classifiers have better detection
rate than just adopting LSF.
Conclusion
In this study, they investigate existing text mining methods in detecting offensive
contents for protecting adolescent online safety. Proposed Lexical Syntactical Feature
(LSF) approach to identify offensive contents in social media, and predict a user’s
potentiality to send out offensive contents.
Their research has several contributions. First, practically conceptualize the notion of
online offensive contents, and further distinguish the contribution of pejoratives/
profanities and obscenities in determining offensive contents, and introduce hand
authoring syntactic rules in identifying name calling harassment.
Second, improved the traditional machine learning methods by not only using lexical
features to detect offensive languages, but also incorporating style features, structure
features and context-specific features to better predict a user’s potentiality to send out
offensive content in social media.
Experimental result shows that the LSF sentence offensiveness prediction and user
offensiveness estimate algorithms outperform traditional learning based approaches in
terms of precision, recall and f-score. It also achieves high processing speed for
effective deployment in social media.

More Related Content

What's hot

Credit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning AlgorithmsCredit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning Algorithms
ankit panigrahy
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learning
dataalcott
 
Phishing ppt
Phishing pptPhishing ppt
Phishing ppt
Sanjay Kumar
 
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdfCYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
KumbidiGaming
 
Detection of phishing websites
Detection of phishing websitesDetection of phishing websites
Detection of phishing websites
m srikanth
 
Digital image forgery detection
Digital image forgery detectionDigital image forgery detection
Digital image forgery detection
AB Rizvi
 
Face recognition ppt
Face recognition pptFace recognition ppt
Face recognition pptSantosh Kumar
 
ppt of gesture recognition
ppt of gesture recognitionppt of gesture recognition
ppt of gesture recognitionAayush Agrawal
 
Attendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan SikdarAttendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan Sikdar
raihansikdar
 
Diabetes prediction using machine learning
Diabetes prediction using machine learningDiabetes prediction using machine learning
Diabetes prediction using machine learning
dataalcott
 
Computer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and PythonComputer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and Python
Akash Satamkar
 
Traffic sign recognition
Traffic sign recognitionTraffic sign recognition
Traffic sign recognition
AKR Education
 
Final spam-e-mail-detection
Final  spam-e-mail-detectionFinal  spam-e-mail-detection
Final spam-e-mail-detection
Partnered Health
 
CYBER TERRORISM
     CYBER TERRORISM     CYBER TERRORISM
CYBER TERRORISM
Tejesh Dhaypule
 
Spam email detection using machine learning PPT.pptx
Spam email detection using machine learning PPT.pptxSpam email detection using machine learning PPT.pptx
Spam email detection using machine learning PPT.pptx
Kunal Kalamkar
 
Prediction of heart disease using machine learning.pptx
Prediction of heart disease using machine learning.pptxPrediction of heart disease using machine learning.pptx
Prediction of heart disease using machine learning.pptx
kumari36
 
Hate speech on the internet
Hate speech on the internetHate speech on the internet
Hate speech on the internet
ioannis iglezakis
 
Image recognition
Image recognitionImage recognition
Image recognition
Aseed Usmani
 
Cyber security with ai
Cyber security with aiCyber security with ai
Cyber security with ai
Burhan Ahmed
 
PPT on Phishing
PPT on PhishingPPT on Phishing
PPT on Phishing
Pankaj Yadav
 

What's hot (20)

Credit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning AlgorithmsCredit card fraud detection using machine learning Algorithms
Credit card fraud detection using machine learning Algorithms
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learning
 
Phishing ppt
Phishing pptPhishing ppt
Phishing ppt
 
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdfCYBERBULLYING DETECTION USING              MACHINE LEARNING-1 (1).pdf
CYBERBULLYING DETECTION USING MACHINE LEARNING-1 (1).pdf
 
Detection of phishing websites
Detection of phishing websitesDetection of phishing websites
Detection of phishing websites
 
Digital image forgery detection
Digital image forgery detectionDigital image forgery detection
Digital image forgery detection
 
Face recognition ppt
Face recognition pptFace recognition ppt
Face recognition ppt
 
ppt of gesture recognition
ppt of gesture recognitionppt of gesture recognition
ppt of gesture recognition
 
Attendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan SikdarAttendance system based on face recognition using python by Raihan Sikdar
Attendance system based on face recognition using python by Raihan Sikdar
 
Diabetes prediction using machine learning
Diabetes prediction using machine learningDiabetes prediction using machine learning
Diabetes prediction using machine learning
 
Computer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and PythonComputer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and Python
 
Traffic sign recognition
Traffic sign recognitionTraffic sign recognition
Traffic sign recognition
 
Final spam-e-mail-detection
Final  spam-e-mail-detectionFinal  spam-e-mail-detection
Final spam-e-mail-detection
 
CYBER TERRORISM
     CYBER TERRORISM     CYBER TERRORISM
CYBER TERRORISM
 
Spam email detection using machine learning PPT.pptx
Spam email detection using machine learning PPT.pptxSpam email detection using machine learning PPT.pptx
Spam email detection using machine learning PPT.pptx
 
Prediction of heart disease using machine learning.pptx
Prediction of heart disease using machine learning.pptxPrediction of heart disease using machine learning.pptx
Prediction of heart disease using machine learning.pptx
 
Hate speech on the internet
Hate speech on the internetHate speech on the internet
Hate speech on the internet
 
Image recognition
Image recognitionImage recognition
Image recognition
 
Cyber security with ai
Cyber security with aiCyber security with ai
Cyber security with ai
 
PPT on Phishing
PPT on PhishingPPT on Phishing
PPT on Phishing
 

Viewers also liked

Cyber bullying powerpoint
Cyber bullying powerpointCyber bullying powerpoint
Cyber bullying powerpointshannonmf
 
*Cyber bullying presentation*
*Cyber bullying presentation**Cyber bullying presentation*
*Cyber bullying presentation*Amber Dee
 
Cyberbullying Presentation
Cyberbullying PresentationCyberbullying Presentation
Cyberbullying Presentation
Mary Alice Osborne
 
Cyberbullying powerpoint
Cyberbullying powerpointCyberbullying powerpoint
Cyberbullying powerpoint
josiebrookeday
 
Senior graduation project
Senior graduation projectSenior graduation project
Senior graduation projectmeghantheisen
 
Cyberbullying project
Cyberbullying projectCyberbullying project
Cyberbullying project
JoannaNieves
 
Bullying
BullyingBullying
Bullyingmpwbs
 
Cyber bullying slide share
Cyber bullying slide shareCyber bullying slide share
Cyber bullying slide share
br03wood
 
Bullying & cyber bullying
Bullying & cyber bullyingBullying & cyber bullying
Bullying & cyber bullying
Ziar Khan
 
Fast detection of transformed data leaks[mithun_p_c]
Fast detection of transformed data leaks[mithun_p_c]Fast detection of transformed data leaks[mithun_p_c]
Fast detection of transformed data leaks[mithun_p_c]
MithunPChandra
 
Cyber Bullying Questionnaire Results
Cyber Bullying Questionnaire ResultsCyber Bullying Questionnaire Results
Cyber Bullying Questionnaire Resultsa2columnd12
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detection
Vikrant Arya
 
Bahagi ng pamanahong papel (Para sa mga Baby Thesis)
Bahagi ng pamanahong papel (Para sa mga Baby Thesis)Bahagi ng pamanahong papel (Para sa mga Baby Thesis)
Bahagi ng pamanahong papel (Para sa mga Baby Thesis)
Avigail Gabaleo Maximo
 
Pamanahong Papel o Pananaliksik (Research Paper)
Pamanahong Papel o Pananaliksik (Research Paper)Pamanahong Papel o Pananaliksik (Research Paper)
Pamanahong Papel o Pananaliksik (Research Paper)
Merland Mabait
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
Sumit Raj
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
Yanchang Zhao
 

Viewers also liked (18)

Cyber bullying powerpoint
Cyber bullying powerpointCyber bullying powerpoint
Cyber bullying powerpoint
 
*Cyber bullying presentation*
*Cyber bullying presentation**Cyber bullying presentation*
*Cyber bullying presentation*
 
Cyberbullying Presentation
Cyberbullying PresentationCyberbullying Presentation
Cyberbullying Presentation
 
Cyberbullying powerpoint
Cyberbullying powerpointCyberbullying powerpoint
Cyberbullying powerpoint
 
Senior graduation project
Senior graduation projectSenior graduation project
Senior graduation project
 
Cyberbullying for Teachers
Cyberbullying for TeachersCyberbullying for Teachers
Cyberbullying for Teachers
 
Cyberbullying project
Cyberbullying projectCyberbullying project
Cyberbullying project
 
Bullying
BullyingBullying
Bullying
 
Cyber bullying slide share
Cyber bullying slide shareCyber bullying slide share
Cyber bullying slide share
 
Bullying & cyber bullying
Bullying & cyber bullyingBullying & cyber bullying
Bullying & cyber bullying
 
Fast detection of transformed data leaks[mithun_p_c]
Fast detection of transformed data leaks[mithun_p_c]Fast detection of transformed data leaks[mithun_p_c]
Fast detection of transformed data leaks[mithun_p_c]
 
Cyber Bullying Questionnaire Results
Cyber Bullying Questionnaire ResultsCyber Bullying Questionnaire Results
Cyber Bullying Questionnaire Results
 
Data leakage detection
Data leakage detectionData leakage detection
Data leakage detection
 
Bahagi ng pamanahong papel (Para sa mga Baby Thesis)
Bahagi ng pamanahong papel (Para sa mga Baby Thesis)Bahagi ng pamanahong papel (Para sa mga Baby Thesis)
Bahagi ng pamanahong papel (Para sa mga Baby Thesis)
 
Pamanahong Papel o Pananaliksik (Research Paper)
Pamanahong Papel o Pananaliksik (Research Paper)Pamanahong Papel o Pananaliksik (Research Paper)
Pamanahong Papel o Pananaliksik (Research Paper)
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Similar to Detection of cyber-bullying

Smart detection of offensive words in social media using the soundex algorith...
Smart detection of offensive words in social media using the soundex algorith...Smart detection of offensive words in social media using the soundex algorith...
Smart detection of offensive words in social media using the soundex algorith...
IJECEIAES
 
Insult detection using a partitional CNN-LSTM model
Insult detection using a partitional CNN-LSTM modelInsult detection using a partitional CNN-LSTM model
Insult detection using a partitional CNN-LSTM model
CSITiaesprime
 
Semiotics contributions to accessible interface design
Semiotics contributions to accessible interface designSemiotics contributions to accessible interface design
Semiotics contributions to accessible interface design
Inés Laitano
 
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
dbpublications
 
Automatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature ReviewAutomatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature Review
Dr. Amarjeet Singh
 
From Social Care Professional To App Developer: there isn't an App for that!
From Social Care Professional To App Developer: there isn't an App for that! From Social Care Professional To App Developer: there isn't an App for that!
From Social Care Professional To App Developer: there isn't an App for that!
Miriam O' Sullivan
 
Dictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewDictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A Review
INFOGAIN PUBLICATION
 
Research on language and identity
Research on language and identityResearch on language and identity
Research on language and identity
Azmi Latiff
 
Ieee format 5th nccci_a study on factors influencing as a best practice for...
Ieee format 5th nccci_a study on factors influencing as  a  best practice for...Ieee format 5th nccci_a study on factors influencing as  a  best practice for...
Ieee format 5th nccci_a study on factors influencing as a best practice for...
International Journal of Advance Research and Innovative Ideas in Education
 
Hate Speech Recognition System through NLP and Deep Learning
Hate Speech Recognition System through NLP and Deep LearningHate Speech Recognition System through NLP and Deep Learning
Hate Speech Recognition System through NLP and Deep Learning
IRJET Journal
 
Cb4301449454
Cb4301449454Cb4301449454
Cb4301449454
IJERA Editor
 
G1803024452
G1803024452G1803024452
G1803024452
IOSR Journals
 
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...
Dialectal Arabic sentiment analysis based on tree-based pipeline  optimizatio...Dialectal Arabic sentiment analysis based on tree-based pipeline  optimizatio...
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...
IJECEIAES
 
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Sara Alvarez
 
identifying malevolent facebook requests
identifying malevolent facebook requestsidentifying malevolent facebook requests
identifying malevolent facebook requests
INFOGAIN PUBLICATION
 
Hate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine LearningHate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine Learning
IRJET Journal
 
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
IJECEIAES
 
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docxPRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
ResearchWap
 
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docxPRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
ResearchWap
 

Similar to Detection of cyber-bullying (20)

Smart detection of offensive words in social media using the soundex algorith...
Smart detection of offensive words in social media using the soundex algorith...Smart detection of offensive words in social media using the soundex algorith...
Smart detection of offensive words in social media using the soundex algorith...
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Insult detection using a partitional CNN-LSTM model
Insult detection using a partitional CNN-LSTM modelInsult detection using a partitional CNN-LSTM model
Insult detection using a partitional CNN-LSTM model
 
Semiotics contributions to accessible interface design
Semiotics contributions to accessible interface designSemiotics contributions to accessible interface design
Semiotics contributions to accessible interface design
 
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
Cyber bullying Detection based on Semantic-Enhanced Marginalized Denoising Au...
 
Automatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature ReviewAutomatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature Review
 
From Social Care Professional To App Developer: there isn't an App for that!
From Social Care Professional To App Developer: there isn't an App for that! From Social Care Professional To App Developer: there isn't an App for that!
From Social Care Professional To App Developer: there isn't an App for that!
 
Dictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewDictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A Review
 
Research on language and identity
Research on language and identityResearch on language and identity
Research on language and identity
 
Ieee format 5th nccci_a study on factors influencing as a best practice for...
Ieee format 5th nccci_a study on factors influencing as  a  best practice for...Ieee format 5th nccci_a study on factors influencing as  a  best practice for...
Ieee format 5th nccci_a study on factors influencing as a best practice for...
 
Hate Speech Recognition System through NLP and Deep Learning
Hate Speech Recognition System through NLP and Deep LearningHate Speech Recognition System through NLP and Deep Learning
Hate Speech Recognition System through NLP and Deep Learning
 
Cb4301449454
Cb4301449454Cb4301449454
Cb4301449454
 
G1803024452
G1803024452G1803024452
G1803024452
 
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...
Dialectal Arabic sentiment analysis based on tree-based pipeline  optimizatio...Dialectal Arabic sentiment analysis based on tree-based pipeline  optimizatio...
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...
 
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
 
identifying malevolent facebook requests
identifying malevolent facebook requestsidentifying malevolent facebook requests
identifying malevolent facebook requests
 
Hate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine LearningHate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine Learning
 
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
 
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docxPRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
 
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docxPRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
PRAGMATIC ANALYSIS OF WHATSAPP CHATS.docx
 

Recently uploaded

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Detection of cyber-bullying

  • 1. Detecting Offensive Language in Social Media to Protect Adolescent Online Safety Written by ying chen, et al. 2012 ASE /IEEE International Conference on Social Computing and 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust Presented by ziar khan
  • 2. Abstract Since the textual contents on online social media are highly unstructured, informal, and often misspelled, existing research on message level offensive language detection cannot accurately detect offensive contents. In this paper, they designed a framework called Lexical Syntactic Feature (LSF) architecture to detect offensive contents and identify potential offensive users in social media. They distinguished the contribution of pejoratives/profanities and obscenities in determining offensive content, and introduce hand authoring syntactic rules in identifying name calling harassments. In particular, they incorporated a user’s writing style, structure and specific cyberbullying content as features to predict the users capability to send out offensive content. Results from experiments showed that their LSF framework performed significantly better than existing methods in offensive content detection.
  • 3. Introduction With the rapid growth of social media, users especially adolescents are spending significant amount of time on various social networking sites to connect with others, to share information, and to pursue common interests. It has been found that 70% of teens use social media sites on a daily basis and nearly one in four teens hit their favorite social media sites 10 or more times a day. 19% of teens report that someone has written or posted mean or embarrassing things about them on social networking sites. As adolescents are more likely to be negatively affected by biased and harmful contents than adults, detecting online offensive contents to protect adolescents online safety becomes an urgent task. To address concerns on children access to offensive contents over Internet, administrators of social media often manually review online contents to detect and delete offensive materials, however, manually reviewing are labour intensive and time consuming.
  • 4. Introduction Some automatic contents filtering software packages, such as Appen, eblaster, iambigbrother, Internet Security Suite etc, have been developed to detect and filter online offensive contents. Most of them simply blocked web pages and paragraphs that contained dirty words. These word based approaches not only affect the readability and usability of web sites, but also fail to identify subtle offensive messages. To address these limitations, in this paper, they proposed Lexical Syntactic Feature based (LSF) language model to effectively detect offensive language in social media and to protect adolescents. LSF not only examines messages, but also the person who posts the messages and his/her patterns of posting. LSF can be implemented as a client side application for individuals and groups who are concerned about adolescents online safety. Users are also allowed to adjust the threshold of acceptable level of offensive contents.
  • 5. Related Work A. Offensiveness Content Filtering Methods in Social Media : Popular online social networking sites apply several mechanisms to screen offensive contents. For example, Youtube’s safety mode, once activated, can hide all comments containing offensive languages from users. But pre-screened content will still appear. The pejoratives replaced by asterisks, if users simply click "Text Comments." On Facebook, users can add comma separated keywords to the "Moderation Blacklist." When people include blacklisted keywords in a post and/or a comment on a page, the content will be automatically identified as spam and thus be screened. Twitter client, “Tweetie 1.3,” was rejected by Apple Company for allowing foul languages to appear in users’ tweets. In general, the majority of popular social media use simple lexicon based approach to filter offensive contents. Their lexicons are either predefined (such as Youtube) or composed by the users themselves (such as Facebook). For adolescents who often lack cognitive awareness of risks, these approaches are hardly effective to prevent them from being exposed to offensive contents. Therefore, parents need more sophisticate software and techniques to efficiently detect offensive contents to protect their adolescents from potential exposure to vulgar, pornographic and hateful languages.
  • 6. Related Work B. Using Text Mining Techniques to Detect Online Offensive Contents : Offensive language identification in social media is a difficult task because the textual contents in such environment is often unstructured, informal, and even misspelled. Researchers have studied intelligent ways to identify offensive contents using text mining approach. Implementing text mining techniques to analyze online data requires the following phases: 1) Data acquisition and preprocess, 2) Feature extraction, and 3) Classification. The major challenges of using text mining to detect offensive contents lie on the feature selection phase, which will be elaborated in the following sections. a) Message level Feature Extraction: Most offensive content detection research extracts two kinds of features: lexical and syntactic features. Lexical features treat each word and phrase as an entity. Word patterns such as appearance of certain keywords and their frequencies are often used to represent the language model. For example Bag of Words(BoW) and N-grams.
  • 7. Related Work Syntactic features: Although lexical features perform well in detecting offensive entities, without considering the syntactical structure of the whole sentence, they fail to distinguish sentences offensiveness which contain same words but in different orders. Therefore, to consider syntactical features in sentences, natural language parsers are introduced to parse sentences on grammatical structures before feature selection. b) User level Offensiveness Detection: Most contemporary research on detecting online offensive languages only focus on sentence level and message level constructs. Since no detection technique is 100% accurate. However, user level detection is a more challenging task. Some efforts at user level are: Kontostathis et al. propose a rule based communication model to track and categorize online predators. Pendar, uses lexical features with machine learning classifiers to differentiate victims from predators in online chatting environment. Pazienza and Tudorache, propose utilizing user profiling features such as online presence and conversations to detect aggressive discussions. However, in the above research user information such as users’writing styles or posting trends or reputations have not been included to improve the detection rate.
  • 8. Research Questions They identify the following research questions to prevent adolescents from exposing to offensive textual content: • How to design an effective framework that incorporates both message level and user level features to detect and prevent offensive content in social media? • What strategy is effective in detecting and evaluating level of offensiveness in a message? Will advanced linguistic analysis improve the accuracy and reduce false positives in detecting message level offensiveness? • What strategy is effective in detecting and predicting user level offensiveness? Besides using information from message-level offensiveness, could user profile information further improve the performance? • Is the proposed framework efficient enough to be deployed on real time social media?
  • 9. Design Framework In order to tackle these challenges, they proposed a Lexical Syntactic Feature (LSF) based framework to detect offensive contents and identify offensive users in social media. Their Framework include two phases of offensiveness detection. Phase 1 aims to detect the offensiveness on the sentence level. Phase 2 derives offensiveness on the user level.
  • 10. Design Framework The system consists of pre-processing and two major components: sentence offensiveness prediction and user offensiveness estimation. During the pre-processing stage, users conversation history is chunked into posts, and then into sentences. During sentence offensiveness prediction, each sentence’s offensiveness can be derived from two features: its words’ offensiveness using lexical features and its context through syntactical features. i.e. grammatically parse sentences into dependency sets to capture all dependency types between a word and other words in the same sentence, and mark some of its related words as intensifiers. During user offensiveness estimation stage, sentence offensiveness and users language patterns are helped to predict users’ likelihood of being offensive.
  • 11. Sentence Offensiveness Calculation They proposed a new method of sentence level analysis based on offensive word lexicons and sentence syntactic structures. Firstly, they construct two offensive word dictionaries based on different strengths of offensiveness. Secondly, the concept of syntactic intensifier is introduced to adjust words’ offensiveness levels based on their context. Lastly, for each sentence, an offensiveness value is generated by aggregating its words’ offensiveness. Mathematically For each offensive word, w, in sentence, s, its offensiveness a1 if w is a strongly offensive word Ow = a2 if w is a weakly offensive word 0 otherwise The value of intensifier Iw , for offensive word, w, can be calculated as ∑k j=1 dj Consequently, the offensiveness value of sentence, s, becomes a determined linear combination of words’ offensiveness Os = ∑ Ow Iw b1 if user identifier dj = b2 if is an offensive word 0 otherwise
  • 12. User offensiveness estimation Given a user, u, they retrieve his/her conversation history which contains several posts {p1, …, pm} and each post contains sentences {s1, …. , sn}. Sentence offensiveness values are denoted as {Os1, …. , Osn}. The original offensiveness value of post p, Op = ∑Os . The offensiveness value of modified posts can be presented as, Op s . So the final post offensiveness Op ` of post p can be calculated as Op ` = max(Op, Ops). Hence, the offensiveness value of Ou, can be presented as Ou = 1/m∑ Op ` Additional Features Extracted From Users’ Language Profiles they also considered additional features such as: Sentence styles. Users may use punctuation and words with all uppercase letters to indicate feelings or speaking volume. Punctuation, such as exclamation marks, can emphasize offensiveness of posts. (i.e. Both “He is stupid!” and “He is STUPID.” are stronger than “He is stupid.”).
  • 13. Experiment 1: Sentence Offensiveness Calculation According to the above Fig, none of the baseline approaches provides recall rate higher than 70%, because many of the offensive sentences are imperatives, which omit all user identifiers. Among the baseline approaches, the BoW approach has the highest recall rate 66%. However, BoW generates a high false positive rate.
  • 14. Experiment 2: User Offensiveness Estimation-with presence of strongly offensive words Machine learning techniques—Naïve Bayes (NB) and SVM—are used to perform the classification, and 10-fold cross validation was conducted in this experiment. To fully evaluate the effectiveness of users’ sentence offensiveness value (LSF), style features, structure features and content specific features for user offensiveness estimation, data fed sequentially into the classifiers, and get the result in Fig. According to Fig., offensive words and user language features are not compensating to each other to improve the detection rate, which means they are not independent. In contrast, incorporating with user language features, the classifiers have better detection rate than just adopting LSF.
  • 15. Conclusion In this study, they investigate existing text mining methods in detecting offensive contents for protecting adolescent online safety. Proposed Lexical Syntactical Feature (LSF) approach to identify offensive contents in social media, and predict a user’s potentiality to send out offensive contents. Their research has several contributions. First, practically conceptualize the notion of online offensive contents, and further distinguish the contribution of pejoratives/ profanities and obscenities in determining offensive contents, and introduce hand authoring syntactic rules in identifying name calling harassment. Second, improved the traditional machine learning methods by not only using lexical features to detect offensive languages, but also incorporating style features, structure features and context-specific features to better predict a user’s potentiality to send out offensive content in social media. Experimental result shows that the LSF sentence offensiveness prediction and user offensiveness estimate algorithms outperform traditional learning based approaches in terms of precision, recall and f-score. It also achieves high processing speed for effective deployment in social media.