2. INTRODUCTION
Cyberbullying is the use of electronic communication to
bully a person, typically by sending messages of an
intimidating or threatening nature.
Cyberbullying Detection applies our machine learning
algorithms to find negative comments among the
messages a user receives. The algorithm first assigns the
message a value and then, based on our pre-trained
data, decides whether the comment is harsh enough to
3. WHAT IS CYBERBULLYING?
Cyberbullying is the use of electronic media or
communication channels to bully a person, typically by
sending intimidating or threatening messages.
The technology is used to intentionally hurt or
embarrass another person.
Cyberbullying includes sending, posting, or sharing
negative, false, harmful or mean content about someone
else.
4. ISSUES RELATED TO CYBERBULLYING
Conversations must be classified as either normal
chat/text or as showing bullying attributes.
Cyberbullying is one of the most mentally damaging
problems on the internet.
The data needs to be categorized properly before using
any approach to stop the cyberbullying activity.
It has a catastrophic impact on the self and on personal
lives, especially those of students.
7. MACHINE LEARNING METHOD
This approach uses a machine learning method known as
Support Vector Machine (SVM) to detect inappropriate
entries.
It is designed to address malicious online entries,
especially on informal school websites.
The software is designed to detect cyberbullying cases
automatically.
These informal school websites contain information
about teachers and students.
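As a rough illustration of how an SVM separates harmful entries from harmless ones, the sketch below trains a linear SVM by subgradient descent on the hinge loss. The slides do not state which kernel, features, or training procedure were used, so the two features (count of lexicon words, count of exclamation marks), the toy data, and the hyperparameters are all assumptions made for this example.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks the weights on every step.
            w = [wi - learning_rate * reg * wi for wi in w]
            if margin < 1:
                # The sample is misclassified or inside the margin: move the boundary.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


def predict(w, b, x):
    """Return +1 (harmful) or -1 (harmless) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical features: [lexicon words in the entry, exclamation marks].
samples = [[2, 0], [3, 1], [2, 1], [0, 0], [0, 1], [0, 2]]
labels = [1, 1, 1, -1, -1, -1]  # +1 = harmful entry, -1 = harmless entry

w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

A production system would more likely rely on an established SVM library; the hand-written loop is only meant to make the idea of a margin-based classifier concrete.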
9. TRAINING PHASE
Checking school website.
Manually detecting cyberbullying entries.
Extraction of negative words and adding them to
lexicon.
Estimating word similarity with Levenshtein distance.
(It is a string metric for measuring the difference between two sequences.)
Training with the Support Vector Machine algorithm.
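The word-similarity step in the list above can be sketched as follows. This is a standard dynamic-programming implementation of Levenshtein distance; the helper `closest_lexicon_word`, the sample lexicon, and the distance threshold are hypothetical and only show how a misspelled or disguised word might be matched against the lexicon.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the stored row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def closest_lexicon_word(word, lexicon, max_distance=1):
    """Return the nearest lexicon entry, or None if none is within max_distance."""
    best = min(lexicon, key=lambda entry: levenshtein(word, entry))
    return best if levenshtein(word, best) <= max_distance else None


lexicon = ["stupid", "ugly"]  # hypothetical lexicon of negative words
match = closest_lexicon_word("stup1d", lexicon)
```

Here `match` is `"stupid"`, because replacing the digit `1` with `i` is a single substitution.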
10. TEST PHASE
Checking school websites.
Detecting cyberbullying entries by SVM model.
(SVM is a supervised machine learning method used to classify data.)
Part-of-speech analysis to detect harmful entries.
Estimating similarity with Levenshtein distance.
Marking and visualizing harmful entries.
11. SENTIMENT ANALYSIS
A sentiment classifier is used to sort text into negative
and positive categories using a machine learning
algorithm.
The aim is to detect bullying instances in social
media networks.
Twitter is used as the source of data.
12. TECHNOLOGY USED
LingPipe : A toolkit for processing text using
computational linguistics.
Tweet Extractor : To extract tweets from Twitter
continuously.
Gephi : Open source Graph Visualization and
manipulation software.
Amazon’s Mechanical Turk Service : It is a crowdsourcing
marketplace that makes it easier for individuals and
13. DATA COLLECTION AND PRE-PROCESSING
Around 5000 tweets were collected from different
sources.
The Bag-of-Words model is used. It takes
every word in a sentence as a feature, so
the whole sentence is represented by an
unordered collection of words.
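A minimal sketch of the Bag-of-Words representation described above, assuming simple lowercase tokenization with a regular expression. The tokenizer, the example sentence, and the vocabulary are illustrative choices, not details taken from the original work.

```python
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Map each word in the sentence to how many times it occurs.
    Word order is discarded."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return Counter(tokens)


def to_vector(bag: Counter, vocabulary):
    """Turn a bag into a fixed-length count vector over a given vocabulary."""
    return [bag[word] for word in vocabulary]


bag = bag_of_words("You are so so mean!")
vector = to_vector(bag, ["mean", "nice", "so"])
```

Because the bag is unordered, `bag_of_words("mean so are you so")` produces exactly the same result as the sentence above.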
14. RESULT
Amazon’s Mechanical Turk classified unlabelled data,
which was used to verify and validate the newly labelled
data provided by the machine learning algorithm.
Training 500 Tweets
                   Positive   Negative   Accuracy
Amazon’s Mturk     65.2%      74.0%      67.1%
15. SENTIMENT ANALYSIS CONCLUSION
This approach leverages the power of Sentiment
analysis.
The classifier is close to 70% accurate.
The result is not as good as expected because of
restrictions on accessing unlimited content from Twitter.
16. SOFTWARE TO DETECT CYBERBULLYING CONTENT
New types of internet-connected devices, such as
smartphones and tablets, have further exacerbated the
problem of cyberbullying.
An Android application automatically detects
possibly harmful content in a text.
The application uses a machine learning method to spot
any undesirable content.
17. APPLICATION
The application is built for devices running Android OS.
Java 8 and Android Studio are used.
It provides a user interface for detecting harmful content.
18. HARMFUL CONTENT DETECTION PROCESS
The application contains one activity responsible for
interacting with the user.
To check for harmful content, the application starts a
background thread.
The user can still use the device even if the checking
process takes a while.
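The background-thread idea above can be sketched in a language-neutral way. The real application is written in Java for Android, so this Python version only illustrates the pattern; `is_harmful` is a hypothetical stand-in for the actual classifier, and the function names are assumptions.

```python
import queue
import threading


def is_harmful(text: str) -> bool:
    """Hypothetical stand-in for the real classifier."""
    return "idiot" in text.lower()


def check_in_background(text: str, results: queue.Queue) -> threading.Thread:
    """Start the check on a worker thread and return immediately."""
    def worker():
        results.put((text, is_harmful(text)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


results = queue.Queue()
worker_thread = check_in_background("You are an idiot", results)
# The caller is free to do other work here while the check runs.
worker_thread.join()
text, flagged = results.get()
```

Handing the result back through a queue keeps the calling thread responsive, which is the same reason the application moves the check off the thread that serves the user.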
19. METHOD
The method classifies messages as harmful or not by
using a classifier trained with a language modelling
method based on a brute-force algorithm.
Brute Force : An algorithm using a combinatorial approach,
which usually generates a massive number of combinations
(potential answers to a given problem).
The algorithm is applied for automatic extraction of sentence
patterns.
All patterns used in classification were stored on the mobile
device.
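To show why a brute-force, combinatorial approach generates so many candidates, the sketch below enumerates every combination of words from a sentence, keeping the original word order. The slides do not say how patterns are formed, so treating a pattern as an ordered word combination is an assumption, and `extract_patterns` is a hypothetical name.

```python
from itertools import combinations


def extract_patterns(sentence: str, max_length=None):
    """Return every combination of the sentence's words, in original order,
    as a set of tuples."""
    words = sentence.split()
    if max_length is None:
        max_length = len(words)
    patterns = set()
    for size in range(1, max_length + 1):
        for combo in combinations(words, size):
            patterns.add(combo)
    return patterns


patterns = extract_patterns("you are mean")
```

A sentence of n distinct words yields 2^n - 1 patterns (7 for the three-word example), so the candidate set grows quickly with sentence length; the `max_length` parameter is one simple way to cap it.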
21. RESULT
Precision = 79%
Recall = 79%
PRECISION is the ratio of the number of relevant records
retrieved to the total number of irrelevant and relevant
records retrieved.
RECALL is the ratio of the number of relevant records
retrieved to the total number of relevant records in the
database.
Requires minimal human effort.
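The two definitions above can be computed from predicted and true labels as in this sketch. The label lists are made-up toy data for illustration only and are unrelated to the 79% figures reported here.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall, where True means 'harmful'."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)       # relevant and retrieved
    false_pos = sum(1 for p, a in pairs if p and not a)  # irrelevant but retrieved
    false_neg = sum(1 for p, a in pairs if not p and a)  # relevant but missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy labels, not taken from the reported experiment.
predicted = [True, True, True, False, False]
actual = [True, True, False, True, False]
precision, recall = precision_recall(predicted, actual)
```

In this toy case two of the three flagged messages are truly harmful and two of the three harmful messages are flagged, so both values equal 2/3.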
22. OTHER SOFTWARE FOR DETECTING CYBERBULLYING
1. FearNot!
2. Smartians Radar
3. ReThink
4. PocketGuardian
5. Cyber Buddy
23. CHALLENGES FACED
Preventing the removal of valuable messages when
attempting to filter data.
Privacy concerns.
Incidents should be reported as early as possible.
False reporting.
24. CONCLUSION
• The use of the internet and social media has clear
advantages for societies, but frequent use may also
have significant adverse consequences.
• These include unwanted sexual exposure, cybercrime
and cyberbullying.
• We developed a model for detecting cyberbullying
behavior and its severity on Twitter.