2. ABSTRACT
● With the widespread use of social media in this era,
cyberbullying has increased rapidly as a form of cybercrime.
● Cyberbullying is willful and repeated harm inflicted
through the use of computers, cell phones, and other electronic devices.
● The proposed system aims to detect cyberbullying by identifying abusive
comments and messages on social media platforms.
● The machine learning algorithm Naive Bayes is used to classify comments and
messages as bullying or non-bullying.
● The project ‘Cyberbullying Detection Using Machine Learning’ discusses and
implements a machine learning approach to address the threat of
cyberbullying, with the aim of making social media a safer place for its users.
3. EXISTING SYSTEM
● For several years, researchers have worked intensively on cyberbullying
detection to find ways to control or reduce cyberbullying on social media
platforms.
● In research at the Massachusetts Institute of Technology, a system was
developed to detect cyberbullying from the textual context of YouTube video
comments, but it produced less precise classifications and
more false positives.
● In general, most existing systems focus on the effects after a cyberbullying
incident, and there is no accurate system for online cyberbullying detection.
4. PROPOSED SYSTEM
● The proposed system employs machine learning to avoid human
intervention.
● A dataset containing bullying and non-bullying comments is used to
train the machine learning model using the scikit-learn (sklearn) library in Python.
● The Naive Bayes algorithm is used to detect abusive comments and
messages in social media.
● The Naive Bayes algorithm is based on Bayes' theorem, which states that:
P(A|B) = (P(B|A) P(A)) / P(B)
● The proposed system implements automated detection of bullying comments
in social media.
● The proposed system is platform independent: it can be implemented on
any operating system, and it is free to use.
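The training approach described above can be sketched in a few lines of scikit-learn. This is only an illustration under stated assumptions: the comments and labels are hypothetical toy data (not the project's dataset), and combining `TfidfVectorizer` with `MultinomialNB` in a single pipeline is one plausible arrangement, not necessarily the one used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; a real system would load a labeled dataset instead.
comments = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "shut up you idiot",
    "you are a worthless loser",
    "great photo, love it",
    "happy birthday my friend",
    "what a lovely video",
    "thanks for sharing this",
]
labels = ["bullying"] * 4 + ["non-bullying"] * 4

# Vectorize the text, then fit a Naive Bayes classifier on the vectors.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(comments, labels)

# Classify unseen comments.
prediction = model.predict(["you are a stupid loser"])[0]
friendly = model.predict(["lovely video, thanks"])[0]
print(prediction, friendly)
```

Wrapping both steps in one pipeline means a new comment passes through the same vectorizer that was fitted on the training data before it reaches the classifier.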
7. MODULE FUNCTIONALITIES
❏ USER MODULE
● Users can sign up to the web application by registering with details
such as user name and password.
● Registered users can sign in to their profile using their user ID and password.
● They can post videos, stories, and photos in the web application.
● Users can send friend requests to other users and can also chat with their
friends.
● Users can view, like, and comment on the videos and photos posted by their
friends in the web application.
❏ ADMIN MODULE
● The admin can manage and make changes to the web application.
● They can also view the requests from users.
● They can also view the comments that have been classified as bullying
or non-bullying.
● They can manage the notifications of users.
❏ MACHINE LEARNING MODULE
● The Machine Learning module is responsible for classifying
comments and messages as bullying or non-bullying.
● From a vast set of comments and messages, the Naive Bayes
algorithm is used to predict bullying comments and messages.
● This module includes the following steps:
➢ Data collection
➢ Data preprocessing
➢ Segmentation
➢ Feature extraction
➢ Training
➢ Testing
11. 1. DATA COLLECTION
● Collecting data for training the Machine Learning model is the basic step
in the machine learning pipeline.
● The predictions made by Machine Learning systems can only be as good as
the data on which they have been trained.
● In this system, the dataset contains both bullying and non-bullying
comments and messages.
● The dataset is downloaded from the Kaggle website.
● 80% of the dataset is used for training and the remaining 20% is used for
testing.
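An 80/20 split like the one described can be produced with scikit-learn's `train_test_split`. The sketch below uses placeholder comments and labels; `random_state=42` and `stratify=labels` are assumptions added here for reproducibility and class balance, not settings taken from the project.

```python
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the downloaded dataset.
comments = [f"comment {i}" for i in range(10)]
labels = [1, 0] * 5  # 1 = bullying, 0 = non-bullying

# test_size=0.2 holds out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.2, random_state=42, stratify=labels
)

print(len(X_train), len(X_test))  # 8 2
```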
12. 2. DATA PREPROCESSING
● Real-world raw data and images are often incomplete, inconsistent and lacking in
certain behaviors or trends. They are also likely to contain many errors. So, once
collected, they are pre-processed into a format the machine learning algorithm
can use for the model.
● Data preprocessing in Machine Learning is a crucial step that helps enhance the
quality of data to promote the extraction of meaningful insights from the data.
● The preprocessing step also includes the removal of stop words and special
characters, and the conversion of uppercase letters to lowercase.
● The lemmatization step converts inflected words to their root form. For
example, the word "running" is converted to its root word "run".
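The preprocessing steps listed above (lowercasing, removing special characters and stop words, lemmatization) can be sketched in plain Python. The stop-word set and the lemma lookup table below are tiny hypothetical stand-ins; a real system would more likely rely on a full stop-word list and a dictionary-based lemmatizer from an NLP library.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "you", "so", "and", "to"}
LEMMAS = {"running": "run", "ran": "run", "haters": "hater", "losers": "loser"}

def preprocess(text):
    """Lowercase, strip special characters, drop stop words, lemmatize."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]  # unknown words pass through

print(preprocess("You are RUNNING away, losers!!!"))
# ['run', 'away', 'loser']
```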
13. 3. SEGMENTATION
● Segmentation can be defined as the process of separating sentences
into different tokens.
● N-grams are used for grouping tokens.
● N-grams have a variety of uses, for example the auto-completion
of sentences.
● In this project, 2-grams (bigrams) are used to group tokens.
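As an illustration of grouping tokens into 2-grams, here is a minimal helper. The function name `ngrams` and the sample sentence are chosen for this sketch and are not taken from the project.

```python
def ngrams(tokens, n=2):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "nobody likes you loser".split()
print(ngrams(tokens))
# [('nobody', 'likes'), ('likes', 'you'), ('you', 'loser')]
```

A sentence with k tokens yields k - 1 bigrams, so a single-token comment produces none.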
14. 4. FEATURE EXTRACTION
● Feature extraction is the process of taking out a list of words from the text data
and then transforming them into a feature set which is usable by a classifier.
● In this system, a TF-IDF vectorizer is used for feature extraction.
● TF-IDF stands for term frequency-inverse document frequency. It is a
measure used to quantify the importance or relevance of string
representations (such as words) in a document, relative to a collection of documents.
● TF-IDF associates each word in a document with a number that represents how
relevant each word is in that document.
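To make the weighting concrete, the sketch below runs scikit-learn's `TfidfVectorizer` with its default settings on three hypothetical comments. It shows that a word appearing in fewer documents ("loser") receives a higher weight than a word shared across documents ("you"); the project's actual vectorizer settings are not stated, so the defaults here are an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus.
docs = [
    "you are a loser",
    "you are a great friend",
    "great video",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # sparse matrix: documents x terms

# Look up the weights of selected terms in the first document.
first_doc = tfidf[0].toarray().ravel()
index = vectorizer.vocabulary_  # maps each term to its column
w_loser = first_doc[index["loser"]]
w_you = first_doc[index["you"]]
w_video = first_doc[index["video"]]

print(tfidf.shape, round(w_loser, 3), round(w_you, 3), w_video)
```

Note that the default tokenizer drops single-character tokens such as "a", so the vocabulary has six terms.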
15. 5. TRAINING
● Model training is the key step in machine learning that results in a model ready
to be validated, tested, and deployed.
● The performance of the model determines the quality of the applications that
are built using it.
● The quality of the training data and the choice of training algorithm are
both important factors during the model training phase.
● Typically, the dataset is split into training and testing sets.
● All these aspects of model training make it both an involved and important
process in the overall machine learning development cycle.
16. 6. TESTING
● In machine learning, model testing is referred to as the process where
the performance of a fully trained model is evaluated on a testing set.
● The testing set, which consists of a set of testing samples, should be
kept separate from both the training and validation sets, but it should
follow the same probability distribution as the training set.
● Each testing sample has a known value of the target.
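Because every testing sample has a known target, predictions can be scored directly against the true labels. The sketch below computes accuracy, precision, and recall by hand so that each quantity is explicit; the label and prediction lists are hypothetical, and the choice of these three metrics is an assumption made for this example.

```python
# Hypothetical held-out labels (1 = bullying, 0 = non-bullying) and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # bullying, caught
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # harmless, flagged
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # bullying, missed
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # harmless, passed

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)  # share of flagged comments that are truly bullying
recall = tp / (tp + fn)     # share of bullying comments that were flagged

print(accuracy, precision, recall)  # 0.8 0.75 0.75
```

Precision is the metric most directly affected by false positives, so it is worth reporting alongside accuracy.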
17. DOMAIN THEORY
➔ Machine learning
● Machine learning (ML) is the study of computer algorithms that improve
automatically through experience.
● Machine learning involves computers discovering how they can perform tasks
without being explicitly programmed to do so.
● The Machine Learning process starts with inputting training data into the
selected algorithm.
● New input data is fed into the machine learning algorithm to test whether the
algorithm works correctly.
➔ NAIVE BAYES
● A Naive Bayes classifier is a probabilistic machine learning model
that is used for classification tasks.
● The classifier is based on Bayes' theorem.
Bayes' theorem:
P(A|B) = (P(B|A) P(A)) / P(B)
Here A is the class (bullying or non-bullying) and B is the observed text.
● This system uses the Multinomial Naive Bayes classifier.
● The features/predictors used by the classifier are the frequencies of
the words present in the document.
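The following from-scratch sketch shows how word frequencies feed into Bayes' theorem in a multinomial Naive Bayes classifier. The training comments are hypothetical, and Laplace smoothing with `alpha=1.0` is an assumption added so that unseen words do not produce zero probabilities. The denominator P(B) is omitted because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter

# Hypothetical tokenized training comments with labels.
train = [
    (["you", "stupid", "loser"], "bullying"),
    (["stupid", "idiot"], "bullying"),
    (["nice", "photo"], "non-bullying"),
    (["you", "look", "nice"], "non-bullying"),
]

classes = sorted({label for _, label in train})
vocab = {w for tokens, _ in train for w in tokens}
word_counts = {c: Counter() for c in classes}  # word frequencies per class
doc_counts = Counter()                         # number of comments per class
for tokens, label in train:
    word_counts[label].update(tokens)
    doc_counts[label] += 1

def log_posterior(tokens, c, alpha=1.0):
    """log P(c) + sum of log P(word | c), up to the shared constant log P(B)."""
    total = sum(word_counts[c].values())
    score = math.log(doc_counts[c] / len(train))  # prior P(c)
    for w in tokens:
        if w in vocab:  # words never seen in training are ignored
            likelihood = (word_counts[c][w] + alpha) / (total + alpha * len(vocab))
            score += math.log(likelihood)
    return score

def classify(tokens):
    return max(classes, key=lambda c: log_posterior(tokens, c))

print(classify(["you", "stupid"]))  # bullying
print(classify(["nice", "photo"]))  # non-bullying
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.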
34. COMPARISON BETWEEN MACHINE LEARNING AND TRANSFER LEARNING APPROACH
Machine Learning:
Machine learning is a subset of artificial intelligence that focuses on the development of
algorithms that allow computers to learn from and make predictions or decisions based on data. It involves
training models on labeled data to recognize patterns and make predictions without being explicitly
programmed.
Transfer Learning:
Transfer learning is a machine learning technique where a model trained on one task is reused or
adapted as the starting point for a model on a second related task.
Usage:
Machine Learning:
In traditional machine learning, models are trained from scratch on specific datasets for
particular tasks, such as image classification, text sentiment analysis, or predictive analytics.
Transfer Learning:
Transfer learning is commonly used in scenarios where data for a specific task is limited or
expensive to obtain. By leveraging pre-trained models, transfer learning can adapt those models to new tasks
with less data.
Training Process:
Machine Learning:
In machine learning, the training process involves feeding labeled data into an algorithm,
which learns to recognize patterns and make predictions based on that data through iterative adjustments
to its internal parameters.
Transfer Learning:
Transfer learning typically involves taking a pre-trained model, removing the last few layers
(which are task-specific), and then adding new layers tailored to the new task.
Data Requirements:
Machine Learning:
Traditional machine learning models require a large amount of labeled data specific to the
task at hand for training.
Transfer Learning:
Transfer learning can be effective with smaller datasets since it leverages knowledge learned
from a different but related task.
Applications:
Machine Learning:
Machine learning techniques are applied in a wide range of applications, including image and
speech recognition, natural language processing, recommendation systems, and more.
Transfer Learning:
Transfer learning is particularly useful in computer vision tasks like object detection and
image classification, as well as in natural language processing tasks such as sentiment analysis and text
36. CONCLUSION
The overall aim of the project “Cyberbullying Detection Using Machine
Learning” is to develop a system that automatically classifies comments
and messages as bullying or non-bullying and also removes the bullying
comments from the web application.