SlideShare a Scribd company logo
10 X October 2022
https://doi.org/10.22214/ijraset.2022.45847
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue X Oct 2022- Available at www.ijraset.com
782
© IJRASET: All Rights are Reserved | SJImpact Factor 7.538 | ISRA Journal Impact Factor 7.894 |
A Review Paper on Cyber Harassment Detection
Using Machine Learning Algorithm on Social
Networking Website
Miss. Sneha Gajanan Sambare1
, Dr. Sanjay. L. Haridas2
1
PG Student, (M. Tech. Electronics Engineering), JD College of Engineering & Management
2
Professor, Department of Electronics & Telecommunication Engineering, JD College of Engineering & Management
Abstract: Cyber bullying is nothing but bullying with the use of digital technology. Cyber bullying can take place on social media
platforms, messaging platforms, gaming platforms and mobile phones etc. Cyber bullying is a big issue that is encountered by the
individual on internet that affects teenagers as well as adults. It has lead to nuisances like suicide, mental health problems and
depression. Therefore regulation of content on Social media platforms has become a growing need. Also detecting Cyber bullying
detection at early stages can help to alleviate impacts on the victims. In the following research we make use of data from two
different data sets. The first one is tweets from Twitter and second one is comments based on personal attacks from Wikipedia
forums. The approach is to build a model based on detection of cyber bullying in text data using Natural Language Processing
and Machine learning. We try to build a model which will provide accuracy up to 90% for tweets and accuracy up to 80% for
Wikipedia forums. Cyber bullying detection will be done as a binary classification problem where we are detecting two major form
of cyber bullying: hate speech on Twitter and Personal attacks on Wikipedia and classifying them as containing Cyber bullying or
not. The end result of this proposed work may reduce negative impacts on the victims as a result of early detection.
Keywords: Cyber bullying, Social Media, Machine Learning,
I. INTRODUCTION
With the development of internet, social media is trending these days. Everyone jokes with everyone. But the problem arises when
someone or a group of people starts to joke at you instead of joking with you. More often than not online abuse turnout to be real life
threat to the victim. Cyber bullying is repeated behavior which aimed at scaring, angering or shaming an individual even after being
asked to stop by the aimed individual. Cyber bullying is common these days and can hurt the victim mentally, emotionally or
physically or all of them at once.
The feeling of being targeted can lead to people from sharing their problems and dealing with the problems. The victim can also turn
towards drugs alcohol to deal with their mental and physical pain. Bullying via textual messages or pictures or videos on social media
platforms has proven to be very harmful for adults. The term Bullying can be defined as an aggressive, intentional act that is carried
out by a group or an individual against a victim who cannot easily defend him or herself repeatedly and over time even after being
asked to stop.
The problems associated with cyber bullying makes it compulsory to detect it before it turns out t be a crime. Since until cyber
bullying is detected it can’t be stopped. Hence for bullying to stop, it needs to be identified first and then it must be reported. Early
detection will also helps in supporting the victim and to help him dealing with the problems arising due to bullying. In this project we
will be building a model that will be composed of two data sets one from twitter data set “Hate Speech Twitter Dataset by Waseem,
Zeerak and Hovy” that contains 17000 tweets labeled for sexism or racism and second one from Wikipedia Dataset The Wikipedia
dataset by Wulczyn, Thain and Dixon that contains 1M comments labeled for Personal attacks. For the analysis 40000 comments are
taken into consideration from the dataset from which 13000 comments are labeled as Cyber bullying due to personal attack. After
collecting data set we will be training the system in order to detect bullying.
This detection will be done as a binary classification problem. The algorithm used will be Natural Language Processing and Machine
learning. As soon as bullying is detected it needs to be reported in order to stop it. The end result will implicate positive results on the
victim’s mental and physical issues.
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue X Oct 2022- Available at www.ijraset.com
783
© IJRASET: All Rights are Reserved | SJImpact Factor 7.538 | ISRA Journal Impact Factor 7.894 |
II. BACKGROUND
Researches on Cyber bullying incidents show that 11.4% of 720 young people surveyed in the NCT DELHI were victims of cyber
bullying in a 2018 survey by Child Right and You, an NGO in India, and almost half of them did not even report it to their teachers,
parents or guardians as they were so terrified. 22.8% aged 13-18 who used the internet for around 3 hours a day were vulnerable to
cyber bullying while 28% of people who use internet more than 4 hours a day were victims of cyber bullying. There are so many
other reports suggested us that the impact of cyber bullying is affecting badly the people and children between ages of 13 to 20.
People are facing so many difficulties in terms of health, mental fitness and their decision making ability in work and other areas of
life. Researchers also suggest taking cyber bullying seriously on the national level and try to find solution to cyber bullying. Cyber
bullying should be considered as national level issue. As it is known to most of us that in the year 2016 an incident called Blue Whale
Challenge led to lots of child suicides in known as Blue Whale Challenge arises in Russia and other countries. The dangerous game
that spread over different social networking websites and it was a one to one relationship between an administrator and a participant.
For fifty days specific tasks were given to participants starting from easy ones and then leading to the fatal events. Initially they were
easy like waking up at 4:30 AM or watching a horror movie and then it finally ends up in suicide attempts. Hence it is quite necessary
that these kinds of incidents must be reported at prior stages so as to avoid the hazardous effect.
III. MOTIVATION
1) To eradicate the evils existing on social media with the use of proper technology.
2) To help in identifying the crimes on social media and also identifying the criminals before the situation aggravates
3) Use of machine learning algorithms to detect the bullying on social media.
4) Existing methods are not proven to be very efficient methods of detection of cyber bullying so a system is must for countering
cyber bullying.
IV. PROBLEM STATEMENT
1) Cyber bullying is a new type of bullying that follows an individual from the entrance of their work place and back to their homes.
2) Individuals are bullied from the start of the morning till they reach their homes at night and go to bed for sleep.
3) Existing methods are applied only when bullying becomes an act of crime not before that.
4) So it is very necessary to detect and monitor the anti-social elements before they do a crime.
V. LITERATURE REVIEW
1) Manuel F. LĂłpez-VizcaĂ­no, Francisco J. NĂłvoa, Victor Carneiro and Fidel Cacheda [16] proposed a two machine model which
was a learning model for early detection of cyberbullying. They did the experiments using real world data set and following a
time aware evaluation.
2) Syed Mahbub, Eric Pardede and A. S. M. Kayes [15] proposes analysis of the effects of predatory approach words in the
detection of cyberbullying and proposes a mechanism for generation of a dictionary of such approach words. This was systematic
approach design that takes the generation of keyword dictionary into consideration.
3) Patxi Galan-Garcia [2] proposed a hypothesis that a troll (cyber bully) on social networking sites under a fake profile always has
a real profile to check how others view the fake profile. He proposed a machine learning approach to determine such profiles.
The identification process studied some profiles that have some kind of close relationship to them. The method used was to use
ML to select profiles to study, retrieve information about tweets, select features to be used from profiles, and find the author of
the tweets. 1900 tweets from 19 different profiles were used. It had an accuracy of 68% for author identification. This was later
used in a case study at a school in Spain, where cyberbullying had to find the real owner of the profile among some of the
suspected students, and this method worked in the case. The following method still has some drawbacks. For example a case
where the trolling account doesn't have a real account to fool the system or experts who can change the writing style and behavior
so that no patterns are found. Changing writing style will require more efficient algorithms.
4) Kelly Reynolds, April Kontostathis and Lynne Edwards [6] proposed a Formspring (A forum for anonymous question answers)
dataset which gives recall of 78.5% accuracy using Machine learning Algorithms and oversampling due to imbalance in cyber
bullying posts. They used the labeled data, in conjunction with machine learning techniques which utilizes Weka tool kit, to train
a computer to recognize harassment contents.
5) Maral Dadvar and Kai Eckert [8] trained deep neural networks on Twitter, Wikipedia, Formspring datasets and used the model
on YouTube dataset and investigated the performance of their model in new social media platforms for the same and achieved F1
score of 0.97 using Bidirectional Long Short-Term Memory(BLSTM) model.
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue X Oct 2022- Available at www.ijraset.com
784
© IJRASET: All Rights are Reserved | SJImpact Factor 7.538 | ISRA Journal Impact Factor 7.894 |
6) Sweta Agrawal and Amit Awekar [9] used similar same datasets for training Deep Neural Networks they provided several useful
insights about cyberbullying detection. Their proposed system was basically the first of its kind that systematically analyzes
cyberbullying detection using deep learning based models and transfer learning. They resolved that how the vocabulary for such
models changes across various Social Media Platforms.
VI. PROPOSED SYSTEM
Figure: Proposed System
The proposed system consists of two datasets that we will be using to train our model.
A. Twitter Dataset: The Twitter Dataset is combined from two datasets containing hate speech: Hate Speech Twitter Dataset by
Waseem, Zeerak and Hovy, Dirk [11] which contains 17000 tweets labeled for sexism or racism.
The tweets are mined using the annotations. 5900 tweets are lost due to accounts being deactivated or tweet deleted and second
dataset of Hate Speech Language Dataset by Thomas Davidson, Dana Warmsley, Michael Macy and Ingmar Weber, [12]. It
contained 25000 tweets obtained by crowd sourcing. This gives total 35787 tweets for the task distribution. For the following dataset,
70 % (25,050) of this dataset is used as training data and 30 percent as testing data (10,737).
B.Wikipedia Dataset The Wikipedia dataset by Wulczyn, Thain and Dixon [13] contains 1M comments labeled for Personal attacks.
For our research analysis 40000 comments are used from the databanks from which 13000 comments are labeled as cyber bullying
due to smear campaign. These comments are extracted from conversations between editors of pages on Wikipedia labeled by 10
annotators via Crowd Flower. For this dataset the same split (70 percent i.e 28000 to training data and 30 percent i.e 12000 to testing
data) is used. Most of the existing system techniques are manual, which includes human intervention and decision making. Also
existing system looks for patterns that already exist in the data. To overcome these deficiencies we are making use of Support Vector
Machine (SVM). SVM is a computer algorithm that assigns the labels to object based on examples. For instance, an SVM can learn to
categorize fraudulent or no fraudulent credit card activities by examining hundreds or thousands of fraudulent and or no fraudulent
credit card activity reports. SVM is basically used to plot a hyper plane that creates a boundary between data points in number of
features (N)-dimensional space. Linear SVM is used in the following case which is optimum for linearly separable data. In case of 0
misclassification, i.e. the class of data point is accurately predicted by our model, we only have to change the gradient from the
regularization arguments. A random forest consists of different individual decision trees which individually predict a class for given
query points and the class with maximum votes will be the final result. Decision Tree is a building block for random forest which
provides a prediction by decision rules learned from feature vectors. A group of these uncorrelated trees provides a more corrective
decision for classification. First we will design a Python language based portal which will in which we will be training our system
using the algorithm with the help of data sets. Python is highly readable language also it has fewer syntactical constructions than other
languages. Using Python language based portal, the two data sets and SVM we will train our network and as the system is trained if
we enter the keywords or group of words then the system will show whether the entered content is bullying or not. A flag or warning
message will pop out to show whether the entered content is bullying or not.
VII. CONCLUSION
A larger part of cyberbullying detection aimed at routine detection of cyberbullying incidents on online platforms focuses on machine
learning models to detect bullying. Due to ever changing nature of online platforms detection of such events can have multiple ways.
Cyber bullying on the online platforms is very menacing and leads to unfortunate events like depression, mental and physical illness
and even suicide. Hence there is a need to control its scattering.
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue X Oct 2022- Available at www.ijraset.com
785
© IJRASET: All Rights are Reserved | SJImpact Factor 7.538 | ISRA Journal Impact Factor 7.894 |
This leads to important aspect of online activities that is cyber bullying detection on social media platforms. In this proposed system
we presented an idea for detection of cyber bullying and if integrated with real time applications it will help in restricting abusers
bullying. We demonstrated the build up for two types of data: Hate speech Data on Twitter and Personal attacks on Wikipedia. For
Hate speech Natural Language Processing techniques proved effective having accuracy of over 90% using basic Machine learning
algorithms. As a result it gives better results with BoW and TF-IDF models instead of Word2Vec models. However, personal attacks
were difficult to detect through the model because of the fact that the comments usually did not use any common sentiment that could
be used to train the network. Word2Vec models which utilizes context of features proved effective in both datasets giving like to like
results in comparatively less features after combining with Multi Layered Perceptrons. Additional perspective can also be taken into
consideration for more specific results in future works. The nature of attacks are required to be studied further for a more specific and
accurate detection.
REFERENCES
[1] H. Ting, W. S. Liou, D. Liberona, S. L. Wang, and G. M. T. Bermudez, “Towards the detection of cyberbullying based on social network mining techniques,” in
Proceedings of 4th International Conference on Behavioral, Economic, and SocioCultural Computing, BESC 2017, 2017.
[2] P. Galán-García, J. G. de la Puerta, C. L. Gómez, I. Santos, and P. G. Bringas, “Supervised machine learning for the detection of troll profiles in twitter social
network: Application to a real case of cyberbullying,” 2014.
[3] A. Mangaonkar, A. Hayrapetian, and R. Raje, “Collaborative detection of cyberbullying behavior in Twitter data,” 2015.
[4] R. Zhao, A. Zhou, and K. Mao, “Automatic detection of cyberbullying on social networks based on bullying features,” 2016.
[5] V. Banerjee, J. Telavane, P. Gaikwad, and P. Vartak, “Detection of Cyberbullying Using Deep Neural Network,” 2019.
[6] K. Reynolds, A. Kontostathis, and L. Edwards, “Using machine learning to detect cyberbullying,” 2011.
[7] J. Yadav, D. Kumar, and D. Chauhan, “Cyberbullying Detection using Pre-Trained BERT Model,” 2020.
[8] M. Dadvar and K. Eckert, “Cyberbullying Detection in Social Networks Using Deep Learning Based Models; A Reproducibility Study,” arXiv. 2018.
[9] S. Agrawal and A. Awekar, “Deep learning for detecting cyberbullying across multiple social media platforms,” arXiv. 2018.
[10] Y. N. Silva, C. Rich, and D. Hall, “BullyBlocker: Towards the identification of cyber bullying in social networking sites,” 2016.
[11] Z. Waseem and D. Hovy, “Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter,” 2016.
[12] T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated hate speech detection and the problem of offensive language,” 2017.
[13] E. Wulczyn, N. Thain, and L. Dixon, “Ex machina: Personal attacks seen at scale,” 2017.
[14] A. Yadav and D. K. Vishwakarma, “Sentiment analysis using deep learning architectures: a review,” Artif. Intell. Rev., vol. 53, no. 6, 2020, doi: 10.1007/s10462-
019-09794-5.
[15] “Detection of Harassment Type of Cyberbullying: A Dictionary of Approach Words and Its Impact” Syed Mahbub, Eric Pardede and A. S. M. Kayes
[16] “Early detection of cyberbullying on social media networks” Manuel F. López-Vizcaíno, Francisco J. Nóvoa, Victor Carneiro and Fidel Cacheda.
A Review Paper On Cyber Harassment Detection Using Machine Learning Algorithm On Social Networking Website

More Related Content

Similar to A Review Paper On Cyber Harassment Detection Using Machine Learning Algorithm On Social Networking Website

A review of cyberbullying and cyber threats in education 2
A review of cyberbullying and cyber threats in education 2A review of cyberbullying and cyber threats in education 2
A review of cyberbullying and cyber threats in education 2
IAEME Publication
 
Draft for prezi and poster
Draft for prezi and posterDraft for prezi and poster
Draft for prezi and poster
victoriahoo
 
Seminar Paper Heller
Seminar Paper HellerSeminar Paper Heller
Seminar Paper Heller
Kyle Heller
 
Bullyspoiler: Detect and Block Cyberbullying Tweet and Bully using Deep Learn...
Bullyspoiler: Detect and Block Cyberbullying Tweet and Bully using Deep Learn...Bullyspoiler: Detect and Block Cyberbullying Tweet and Bully using Deep Learn...
Bullyspoiler: Detect and Block Cyberbullying Tweet and Bully using Deep Learn...
Associate Professor in VSB Coimbatore
 

Similar to A Review Paper On Cyber Harassment Detection Using Machine Learning Algorithm On Social Networking Website (15)

NETS5003 Policy Primer: Cyber harassment
NETS5003 Policy Primer: Cyber harassmentNETS5003 Policy Primer: Cyber harassment
NETS5003 Policy Primer: Cyber harassment
 
A REVIEW OF CYBERBULLYING AND CYBER THREATS IN EDUCATION
A REVIEW OF CYBERBULLYING AND CYBER THREATS IN EDUCATIONA REVIEW OF CYBERBULLYING AND CYBER THREATS IN EDUCATION
A REVIEW OF CYBERBULLYING AND CYBER THREATS IN EDUCATION
 
A REVIEW OF CYBERBULLYING AND CYBER THREATS IN EDUCATION 2
A REVIEW OF CYBERBULLYING AND CYBER THREATS IN EDUCATION 2A REVIEW OF CYBERBULLYING AND CYBER THREATS IN EDUCATION 2
A REVIEW OF CYBERBULLYING AND CYBER THREATS IN EDUCATION 2
 
A review of cyberbullying and cyber threats in education 2
A review of cyberbullying and cyber threats in education 2A review of cyberbullying and cyber threats in education 2
A review of cyberbullying and cyber threats in education 2
 
Draft for prezi and poster
Draft for prezi and posterDraft for prezi and poster
Draft for prezi and poster
 
Seminar Paper Heller
Seminar Paper HellerSeminar Paper Heller
Seminar Paper Heller
 
6356152.pdf
6356152.pdf6356152.pdf
6356152.pdf
 
Cyber Crime: Youth in Social Decline
Cyber Crime: Youth in Social DeclineCyber Crime: Youth in Social Decline
Cyber Crime: Youth in Social Decline
 
Categorize balanced dataset for troll detection
Categorize balanced dataset for troll detectionCategorize balanced dataset for troll detection
Categorize balanced dataset for troll detection
 
Bullyspoiler: Detect and Block Cyberbullying Tweet and Bully using Deep Learn...
Bullyspoiler: Detect and Block Cyberbullying Tweet and Bully using Deep Learn...Bullyspoiler: Detect and Block Cyberbullying Tweet and Bully using Deep Learn...
Bullyspoiler: Detect and Block Cyberbullying Tweet and Bully using Deep Learn...
 
Defense Against The Digital Dark Arts: Navigating Online Spaces as a Journali...
Defense Against The Digital Dark Arts: Navigating Online Spaces as a Journali...Defense Against The Digital Dark Arts: Navigating Online Spaces as a Journali...
Defense Against The Digital Dark Arts: Navigating Online Spaces as a Journali...
 
Cyber crime
Cyber crimeCyber crime
Cyber crime
 
slide deck oral presentation.pdf
slide deck oral presentation.pdfslide deck oral presentation.pdf
slide deck oral presentation.pdf
 
Research Paper On Cybercrime
Research Paper On CybercrimeResearch Paper On Cybercrime
Research Paper On Cybercrime
 
Cyber crime modified
Cyber crime modifiedCyber crime modified
Cyber crime modified
 

More from Lisa Muthukumar

More from Lisa Muthukumar (20)

Analytical Essay Purdue Owl Apa Sample. Online assignment writing service.
Analytical Essay Purdue Owl Apa Sample. Online assignment writing service.Analytical Essay Purdue Owl Apa Sample. Online assignment writing service.
Analytical Essay Purdue Owl Apa Sample. Online assignment writing service.
 
Learn How To Tell The TIME Properly In English 7ESL
Learn How To Tell The TIME Properly In English 7ESLLearn How To Tell The TIME Properly In English 7ESL
Learn How To Tell The TIME Properly In English 7ESL
 
Where To Buy College Essay Of The Highest Quality
Where To Buy College Essay Of The Highest QualityWhere To Buy College Essay Of The Highest Quality
Where To Buy College Essay Of The Highest Quality
 
What Is A Reaction Essay. Reaction Essay. 2019-01-24
What Is A Reaction Essay. Reaction Essay. 2019-01-24What Is A Reaction Essay. Reaction Essay. 2019-01-24
What Is A Reaction Essay. Reaction Essay. 2019-01-24
 
Writing A Strong Essay. Paper Rater Writing A Strong Essay. 2022-11-05.pdf Wr...
Writing A Strong Essay. Paper Rater Writing A Strong Essay. 2022-11-05.pdf Wr...Writing A Strong Essay. Paper Rater Writing A Strong Essay. 2022-11-05.pdf Wr...
Writing A Strong Essay. Paper Rater Writing A Strong Essay. 2022-11-05.pdf Wr...
 
List Of Transitional Words For Writing Essays. Online assignment writing serv...
List Of Transitional Words For Writing Essays. Online assignment writing serv...List Of Transitional Words For Writing Essays. Online assignment writing serv...
List Of Transitional Words For Writing Essays. Online assignment writing serv...
 
38 Biography Templates - DOC, PDF, Excel Biograp
38 Biography Templates - DOC, PDF, Excel Biograp38 Biography Templates - DOC, PDF, Excel Biograp
38 Biography Templates - DOC, PDF, Excel Biograp
 
How To Start Writing Essay About Yourself. How To Sta
How To Start Writing Essay About Yourself. How To StaHow To Start Writing Essay About Yourself. How To Sta
How To Start Writing Essay About Yourself. How To Sta
 
Writing A Summary In 3 Steps Summary Writing, T
Writing A Summary In 3 Steps Summary Writing, TWriting A Summary In 3 Steps Summary Writing, T
Writing A Summary In 3 Steps Summary Writing, T
 
012 College Application Essay Examples About Yo
012 College Application Essay Examples About Yo012 College Application Essay Examples About Yo
012 College Application Essay Examples About Yo
 
Thanksgiving Writing Paper Free Printable - Printable F
Thanksgiving Writing Paper Free Printable - Printable FThanksgiving Writing Paper Free Printable - Printable F
Thanksgiving Writing Paper Free Printable - Printable F
 
PPT - English Essay Writing Help Online PowerPoint Presentation, Fre
PPT - English Essay Writing Help Online PowerPoint Presentation, FrePPT - English Essay Writing Help Online PowerPoint Presentation, Fre
PPT - English Essay Writing Help Online PowerPoint Presentation, Fre
 
Free Lined Writing Paper Printable - Printable Templ
Free Lined Writing Paper Printable - Printable TemplFree Lined Writing Paper Printable - Printable Templ
Free Lined Writing Paper Printable - Printable Templ
 
Descriptive Essay College Writing Tutor. Online assignment writing service.
Descriptive Essay College Writing Tutor. Online assignment writing service.Descriptive Essay College Writing Tutor. Online assignment writing service.
Descriptive Essay College Writing Tutor. Online assignment writing service.
 
9 Best Images Of Journal Writing Pap. Online assignment writing service.
9 Best Images Of Journal Writing Pap. Online assignment writing service.9 Best Images Of Journal Writing Pap. Online assignment writing service.
9 Best Images Of Journal Writing Pap. Online assignment writing service.
 
Divorce Agreement Template - Fill Out, Sign Online And
Divorce Agreement Template - Fill Out, Sign Online AndDivorce Agreement Template - Fill Out, Sign Online And
Divorce Agreement Template - Fill Out, Sign Online And
 
Diversity On Campus Essay - Mfawriting515.Web.Fc2.Com
Diversity On Campus Essay - Mfawriting515.Web.Fc2.ComDiversity On Campus Essay - Mfawriting515.Web.Fc2.Com
Diversity On Campus Essay - Mfawriting515.Web.Fc2.Com
 
What Is Narrative Form Writing - HISTORYZA
What Is Narrative Form Writing - HISTORYZAWhat Is Narrative Form Writing - HISTORYZA
What Is Narrative Form Writing - HISTORYZA
 
College Admission Essays That Worked - The Oscillati
College Admission Essays That Worked - The OscillatiCollege Admission Essays That Worked - The Oscillati
College Admission Essays That Worked - The Oscillati
 
5 Steps To Write A Good Essay REssaysondemand
5 Steps To Write A Good Essay REssaysondemand5 Steps To Write A Good Essay REssaysondemand
5 Steps To Write A Good Essay REssaysondemand
 

Recently uploaded

IATP How-to Foreign Travel May 2024.pdff
IATP How-to Foreign Travel May 2024.pdffIATP How-to Foreign Travel May 2024.pdff
IATP How-to Foreign Travel May 2024.pdff
17thcssbs2
 
ppt your views.ppt your views of your college in your eyes
ppt your views.ppt your views of your college in your eyesppt your views.ppt your views of your college in your eyes
ppt your views.ppt your views of your college in your eyes
ashishpaul799
 

Recently uploaded (20)

IATP How-to Foreign Travel May 2024.pdff
IATP How-to Foreign Travel May 2024.pdffIATP How-to Foreign Travel May 2024.pdff
IATP How-to Foreign Travel May 2024.pdff
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptxMatatag-Curriculum and the 21st Century Skills Presentation.pptx
Matatag-Curriculum and the 21st Century Skills Presentation.pptx
 
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptBasic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
 
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxJose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17
 
INU_CAPSTONEDESIGN_ᄇᅔᄆᅔᆯᄇᅄᆫ호486_á„‹á…„á†žá„…á…©á„ƒá…łá„‹á…­á†Œ 발표자료.pdf
INU_CAPSTONEDESIGN_ᄇᅔᄆᅔᆯᄇᅄᆫ호486_á„‹á…„á†žá„…á…©á„ƒá…łá„‹á…­á†Œ 발표자료.pdfINU_CAPSTONEDESIGN_ᄇᅔᄆᅔᆯᄇᅄᆫ호486_á„‹á…„á†žá„…á…©á„ƒá…łá„‹á…­á†Œ 발표자료.pdf
INU_CAPSTONEDESIGN_ᄇᅔᄆᅔᆯᄇᅄᆫ호486_á„‹á…„á†žá„…á…©á„ƒá…łá„‹á…­á†Œ 발표자료.pdf
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
 
Benefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational ResourcesBenefits and Challenges of Using Open Educational Resources
Benefits and Challenges of Using Open Educational Resources
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 2 STEPS Using Odoo 17
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Basic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & EngineeringBasic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
Basic Civil Engg Notes_Chapter-6_Environment Pollution & Engineering
 
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
Operations Management - Book1.p  - Dr. Abdulfatah A. SalemOperations Management - Book1.p  - Dr. Abdulfatah A. Salem
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
 
Gyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptxGyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptx
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Open Educational Resources Primer PowerPoint
Open Educational Resources Primer PowerPointOpen Educational Resources Primer PowerPoint
Open Educational Resources Primer PowerPoint
 
ppt your views.ppt your views of your college in your eyes
ppt your views.ppt your views of your college in your eyesppt your views.ppt your views of your college in your eyes
ppt your views.ppt your views of your college in your eyes
 

A Review Paper On Cyber Harassment Detection Using Machine Learning Algorithm On Social Networking Website

  • 1. 10 X October 2022 https://doi.org/10.22214/ijraset.2022.45847
  • 2. International Journal for Research in Applied Science & Engineering Technology (IJRASET) ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538 Volume 10 Issue X Oct 2022- Available at www.ijraset.com 782 © IJRASET: All Rights are Reserved | SJImpact Factor 7.538 | ISRA Journal Impact Factor 7.894 | A Review Paper on Cyber Harassment Detection Using Machine Learning Algorithm on Social Networking Website Miss. Sneha Gajanan Sambare1 , Dr. Sanjay. L. Haridas2 1 PG Student, (M. Tech. Electronics Engineering), JD College of Engineering & Management 2 Professor, Department of Electronics & Telecommunication Engineering, JD College of Engineering & Management Abstract: Cyber bullying is nothing but bullying with the use of digital technology. Cyber bullying can take place on social media platforms, messaging platforms, gaming platforms and mobile phones etc. Cyber bullying is a big issue that is encountered by the individual on internet that affects teenagers as well as adults. It has lead to nuisances like suicide, mental health problems and depression. Therefore regulation of content on Social media platforms has become a growing need. Also detecting Cyber bullying detection at early stages can help to alleviate impacts on the victims. In the following research we make use of data from two different data sets. The first one is tweets from Twitter and second one is comments based on personal attacks from Wikipedia forums. The approach is to build a model based on detection of cyber bullying in text data using Natural Language Processing and Machine learning. We try to build a model which will provide accuracy up to 90% for tweets and accuracy up to 80% for Wikipedia forums. Cyber bullying detection will be done as a binary classification problem where we are detecting two major form of cyber bullying: hate speech on Twitter and Personal attacks on Wikipedia and classifying them as containing Cyber bullying or not. The end result of this proposed work may reduce negative impacts on the victims as a result of early detection. Keywords: Cyber bullying, Social Media, Machine Learning, I. INTRODUCTION With the development of internet, social media is trending these days. Everyone jokes with everyone. But the problem arises when someone or a group of people starts to joke at you instead of joking with you. More often than not online abuse turnout to be real life threat to the victim. Cyber bullying is repeated behavior which aimed at scaring, angering or shaming an individual even after being asked to stop by the aimed individual. Cyber bullying is common these days and can hurt the victim mentally, emotionally or physically or all of them at once. The feeling of being targeted can lead to people from sharing their problems and dealing with the problems. The victim can also turn towards drugs alcohol to deal with their mental and physical pain. Bullying via textual messages or pictures or videos on social media platforms has proven to be very harmful for adults. The term Bullying can be defined as an aggressive, intentional act that is carried out by a group or an individual against a victim who cannot easily defend him or herself repeatedly and over time even after being asked to stop. The problems associated with cyber bullying makes it compulsory to detect it before it turns out t be a crime. Since until cyber bullying is detected it can’t be stopped. Hence for bullying to stop, it needs to be identified first and then it must be reported. Early detection will also helps in supporting the victim and to help him dealing with the problems arising due to bullying. In this project we will be building a model that will be composed of two data sets one from twitter data set “Hate Speech Twitter Dataset by Waseem, Zeerak and Hovy” that contains 17000 tweets labeled for sexism or racism and second one from Wikipedia Dataset The Wikipedia dataset by Wulczyn, Thain and Dixon that contains 1M comments labeled for Personal attacks. For the analysis 40000 comments are taken into consideration from the dataset from which 13000 comments are labeled as Cyber bullying due to personal attack. After collecting data set we will be training the system in order to detect bullying. This detection will be done as a binary classification problem. The algorithm used will be Natural Language Processing and Machine learning. As soon as bullying is detected it needs to be reported in order to stop it. The end result will implicate positive results on the victim’s mental and physical issues.
  • 3. International Journal for Research in Applied Science & Engineering Technology (IJRASET) ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538 Volume 10 Issue X Oct 2022- Available at www.ijraset.com 783 © IJRASET: All Rights are Reserved | SJImpact Factor 7.538 | ISRA Journal Impact Factor 7.894 | II. BACKGROUND Researches on Cyber bullying incidents show that 11.4% of 720 young people surveyed in the NCT DELHI were victims of cyber bullying in a 2018 survey by Child Right and You, an NGO in India, and almost half of them did not even report it to their teachers, parents or guardians as they were so terrified. 22.8% aged 13-18 who used the internet for around 3 hours a day were vulnerable to cyber bullying while 28% of people who use internet more than 4 hours a day were victims of cyber bullying. There are so many other reports suggested us that the impact of cyber bullying is affecting badly the people and children between ages of 13 to 20. People are facing so many difficulties in terms of health, mental fitness and their decision making ability in work and other areas of life. Researchers also suggest taking cyber bullying seriously on the national level and try to find solution to cyber bullying. Cyber bullying should be considered as national level issue. As it is known to most of us that in the year 2016 an incident called Blue Whale Challenge led to lots of child suicides in known as Blue Whale Challenge arises in Russia and other countries. The dangerous game that spread over different social networking websites and it was a one to one relationship between an administrator and a participant. For fifty days specific tasks were given to participants starting from easy ones and then leading to the fatal events. Initially they were easy like waking up at 4:30 AM or watching a horror movie and then it finally ends up in suicide attempts. Hence it is quite necessary that these kinds of incidents must be reported at prior stages so as to avoid the hazardous effect. III. MOTIVATION 1) To eradicate the evils existing on social media with the use of proper technology. 2) To help in identifying the crimes on social media and also identifying the criminals before the situation aggravates 3) Use of machine learning algorithms to detect the bullying on social media. 4) Existing methods are not proven to be very efficient methods of detection of cyber bullying so a system is must for countering cyber bullying. IV. PROBLEM STATEMENT 1) Cyber bullying is a new type of bullying that follows an individual from the entrance of their work place and back to their homes. 2) Individuals are bullied from the start of the morning till they reach their homes at night and go to bed for sleep. 3) Existing methods are applied only when bullying becomes an act of crime not before that. 4) So it is very necessary to detect and monitor the anti-social elements before they do a crime. V. LITERATURE REVIEW 1) Manuel F. LĂłpez-VizcaĂ­no, Francisco J. NĂłvoa, Victor Carneiro and Fidel Cacheda [16] proposed a two machine model which was a learning model for early detection of cyberbullying. They did the experiments using real world data set and following a time aware evaluation. 2) Syed Mahbub, Eric Pardede and A. S. M. Kayes [15] proposes analysis of the effects of predatory approach words in the detection of cyberbullying and proposes a mechanism for generation of a dictionary of such approach words. This was systematic approach design that takes the generation of keyword dictionary into consideration. 3) Patxi Galan-Garcia [2] proposed a hypothesis that a troll (cyber bully) on social networking sites under a fake profile always has a real profile to check how others view the fake profile. He proposed a machine learning approach to determine such profiles. The identification process studied some profiles that have some kind of close relationship to them. The method used was to use ML to select profiles to study, retrieve information about tweets, select features to be used from profiles, and find the author of the tweets. 1900 tweets from 19 different profiles were used. It had an accuracy of 68% for author identification. This was later used in a case study at a school in Spain, where cyberbullying had to find the real owner of the profile among some of the suspected students, and this method worked in the case. The following method still has some drawbacks. For example a case where the trolling account doesn't have a real account to fool the system or experts who can change the writing style and behavior so that no patterns are found. Changing writing style will require more efficient algorithms. 4) Kelly Reynolds, April Kontostathis and Lynne Edwards [6] proposed a Formspring (A forum for anonymous question answers) dataset which gives recall of 78.5% accuracy using Machine learning Algorithms and oversampling due to imbalance in cyber bullying posts. They used the labeled data, in conjunction with machine learning techniques which utilizes Weka tool kit, to train a computer to recognize harassment contents. 5) Maral Dadvar and Kai Eckert [8] trained deep neural networks on Twitter, Wikipedia, Formspring datasets and used the model on YouTube dataset and investigated the performance of their model in new social media platforms for the same and achieved F1 score of 0.97 using Bidirectional Long Short-Term Memory(BLSTM) model.
  • 4. International Journal for Research in Applied Science & Engineering Technology (IJRASET) ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538 Volume 10 Issue X Oct 2022- Available at www.ijraset.com 784 © IJRASET: All Rights are Reserved | SJImpact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 6) Sweta Agrawal and Amit Awekar [9] used similar same datasets for training Deep Neural Networks they provided several useful insights about cyberbullying detection. Their proposed system was basically the first of its kind that systematically analyzes cyberbullying detection using deep learning based models and transfer learning. They resolved that how the vocabulary for such models changes across various Social Media Platforms. VI. PROPOSED SYSTEM Figure: Proposed System The proposed system consists of two datasets that we will be using to train our model. A. Twitter Dataset: The Twitter Dataset is combined from two datasets containing hate speech: Hate Speech Twitter Dataset by Waseem, Zeerak and Hovy, Dirk [11] which contains 17000 tweets labeled for sexism or racism. The tweets are mined using the annotations. 5900 tweets are lost due to accounts being deactivated or tweet deleted and second dataset of Hate Speech Language Dataset by Thomas Davidson, Dana Warmsley, Michael Macy and Ingmar Weber, [12]. It contained 25000 tweets obtained by crowd sourcing. This gives total 35787 tweets for the task distribution. For the following dataset, 70 % (25,050) of this dataset is used as training data and 30 percent as testing data (10,737). B.Wikipedia Dataset The Wikipedia dataset by Wulczyn, Thain and Dixon [13] contains 1M comments labeled for Personal attacks. For our research analysis 40000 comments are used from the databanks from which 13000 comments are labeled as cyber bullying due to smear campaign. These comments are extracted from conversations between editors of pages on Wikipedia labeled by 10 annotators via Crowd Flower. For this dataset the same split (70 percent i.e 28000 to training data and 30 percent i.e 12000 to testing data) is used. Most of the existing system techniques are manual, which includes human intervention and decision making. Also existing system looks for patterns that already exist in the data. To overcome these deficiencies we are making use of Support Vector Machine (SVM). SVM is a computer algorithm that assigns the labels to object based on examples. For instance, an SVM can learn to categorize fraudulent or no fraudulent credit card activities by examining hundreds or thousands of fraudulent and or no fraudulent credit card activity reports. SVM is basically used to plot a hyper plane that creates a boundary between data points in number of features (N)-dimensional space. Linear SVM is used in the following case which is optimum for linearly separable data. In case of 0 misclassification, i.e. the class of data point is accurately predicted by our model, we only have to change the gradient from the regularization arguments. A random forest consists of different individual decision trees which individually predict a class for given query points and the class with maximum votes will be the final result. Decision Tree is a building block for random forest which provides a prediction by decision rules learned from feature vectors. A group of these uncorrelated trees provides a more corrective decision for classification. First we will design a Python language based portal which will in which we will be training our system using the algorithm with the help of data sets. Python is highly readable language also it has fewer syntactical constructions than other languages. Using Python language based portal, the two data sets and SVM we will train our network and as the system is trained if we enter the keywords or group of words then the system will show whether the entered content is bullying or not. A flag or warning message will pop out to show whether the entered content is bullying or not. VII. CONCLUSION A larger part of cyberbullying detection aimed at routine detection of cyberbullying incidents on online platforms focuses on machine learning models to detect bullying. Due to ever changing nature of online platforms detection of such events can have multiple ways. Cyber bullying on the online platforms is very menacing and leads to unfortunate events like depression, mental and physical illness and even suicide. Hence there is a need to control its scattering.
  • 5. International Journal for Research in Applied Science & Engineering Technology (IJRASET) ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538 Volume 10 Issue X Oct 2022- Available at www.ijraset.com 785 © IJRASET: All Rights are Reserved | SJImpact Factor 7.538 | ISRA Journal Impact Factor 7.894 | This leads to important aspect of online activities that is cyber bullying detection on social media platforms. In this proposed system we presented an idea for detection of cyber bullying and if integrated with real time applications it will help in restricting abusers bullying. We demonstrated the build up for two types of data: Hate speech Data on Twitter and Personal attacks on Wikipedia. For Hate speech Natural Language Processing techniques proved effective having accuracy of over 90% using basic Machine learning algorithms. As a result it gives better results with BoW and TF-IDF models instead of Word2Vec models. However, personal attacks were difficult to detect through the model because of the fact that the comments usually did not use any common sentiment that could be used to train the network. Word2Vec models which utilizes context of features proved effective in both datasets giving like to like results in comparatively less features after combining with Multi Layered Perceptrons. Additional perspective can also be taken into consideration for more specific results in future works. The nature of attacks are required to be studied further for a more specific and accurate detection. REFERENCES [1] H. Ting, W. S. Liou, D. Liberona, S. L. Wang, and G. M. T. Bermudez, “Towards the detection of cyberbullying based on social network mining techniques,” in Proceedings of 4th International Conference on Behavioral, Economic, and SocioCultural Computing, BESC 2017, 2017. [2] P. GalĂĄn-GarcĂ­a, J. G. de la Puerta, C. L. GĂłmez, I. Santos, and P. G. Bringas, “Supervised machine learning for the detection of troll profiles in twitter social network: Application to a real case of cyberbullying,” 2014. [3] A. Mangaonkar, A. Hayrapetian, and R. Raje, “Collaborative detection of cyberbullying behavior in Twitter data,” 2015. [4] R. Zhao, A. Zhou, and K. Mao, “Automatic detection of cyberbullying on social networks based on bullying features,” 2016. [5] V. Banerjee, J. Telavane, P. Gaikwad, and P. Vartak, “Detection of Cyberbullying Using Deep Neural Network,” 2019. [6] K. Reynolds, A. Kontostathis, and L. Edwards, “Using machine learning to detect cyberbullying,” 2011. [7] J. Yadav, D. Kumar, and D. Chauhan, “Cyberbullying Detection using Pre-Trained BERT Model,” 2020. [8] M. Dadvar and K. Eckert, “Cyberbullying Detection in Social Networks Using Deep Learning Based Models; A Reproducibility Study,” arXiv. 2018. [9] S. Agrawal and A. Awekar, “Deep learning for detecting cyberbullying across multiple social media platforms,” arXiv. 2018. [10] Y. N. Silva, C. Rich, and D. Hall, “BullyBlocker: Towards the identification of cyber bullying in social networking sites,” 2016. [11] Z. Waseem and D. Hovy, “Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter,” 2016. [12] T. Davidson, D. Warmsley, M. Macy, and I. Weber, “Automated hate speech detection and the problem of offensive language,” 2017. [13] E. Wulczyn, N. Thain, and L. Dixon, “Ex machina: Personal attacks seen at scale,” 2017. [14] A. Yadav and D. K. Vishwakarma, “Sentiment analysis using deep learning architectures: a review,” Artif. Intell. Rev., vol. 53, no. 6, 2020, doi: 10.1007/s10462- 019-09794-5. [15] “Detection of Harassment Type of Cyberbullying: A Dictionary of Approach Words and Its Impact” Syed Mahbub, Eric Pardede and A. S. M. Kayes [16] “Early detection of cyberbullying on social media networks” Manuel F. LĂłpez-VizcaĂ­no, Francisco J. NĂłvoa, Victor Carneiro and Fidel Cacheda.