Akash Singh
Kumar Vaibhav
Moderation System for Comment Classification in Native Language
Background & Problem Statement
The internet has made it possible for anyone to address a mass audience.
Today, every person can have an online presence and use it to express
their views on social networking sites such as Twitter, YouTube, Facebook,
and Instagram. These sites offer easy access to their platforms, with
few to no checks on the user. Some people exploit this lack of checks and
use hard-to-trace identities to disturb others' peace. The comment
section of a post, which is meant for meaningful discussion of the
published content, now often contains toxic and offensive messages. Many
users are demanding that these features be removed because they offer
little to no value. A system that detects toxic text containing insults,
threats, etc. would be a great help in filtering these comment sections.
Sustainable Development Goals
Literature Review
Refer to the Literature Survey Doc.....
Aim & Objectives
The main aim of this research is to propose a moderation system that can filter out
offensive, harsh, and abusive comments from the comment sections of various social media
platforms. Such models already exist for resource-rich languages like English and French.
With the help of Natural Language Processing, similar models can be applied to comments
written in Hinglish (Hindi + English).
The research objectives, formulated from the aim of the study, are as follows:
• To classify offensive words commonly found in comments into classes such
as abusive, hateful, vulgar, insult, threat, etc.
• To apply various pre-processing techniques to the self-created dataset.
• To compare various predictive models and identify the most accurate one for classifying
comments into their correct classes.
• To evaluate the performance of the selected model on the created dataset.
• To integrate the evaluated model with platforms to help moderators filter out
classified comments.
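The model-comparison objective can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the four labelled examples are made up, the two "models" are toy stand-ins rather than the classifiers the project actually compares, and accuracy is used as the only metric. In practice the comparison would be run on a held-out split of the self-created dataset.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labelled examples (made up for illustration only).
LABELLED: List[Tuple[str, str]] = [
    ("great video, thanks", "clean"),
    ("you are an idiot", "insult"),
    ("bahut accha video", "clean"),
    ("tu bewakoof hai", "insult"),
]


def always_clean(text: str) -> str:
    """Toy baseline: labels every comment as clean."""
    return "clean"


def keyword_model(text: str) -> str:
    """Toy model: flags a comment if it contains a listed insult word."""
    insult_words = {"idiot", "bewakoof"}
    return "insult" if insult_words & set(text.lower().split()) else "clean"


def accuracy(model: Callable[[str], str], data: List[Tuple[str, str]]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for text, label in data if model(text) == label)
    return correct / len(data)


def most_accurate(models: Dict[str, Callable[[str], str]],
                  data: List[Tuple[str, str]]) -> Tuple[str, Dict[str, float]]:
    """Score every candidate model and return the best name plus all scores."""
    scores = {name: accuracy(model, data) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = most_accurate(
    {"always_clean": always_clean, "keyword": keyword_model}, LABELLED
)
print(best, scores)
```

The same `most_accurate` helper would work for any candidate exposed as a function from comment text to class label, so trained classifiers could be swapped in for the toy models.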
Significance of the Study
In the era of the internet, everyone lives two lives: physical and digital. On average, a
person is more active digitally than physically. For example, a person with no friends offline
can have 100 friends on social media platforms. These digital identities are hard to trace
because of various features that offer users security. Most people express their emotions
through posts, videos, tweets, etc. Many people view this content, show their support, and
appreciate the creator via its comment section. But some people use
these comment sections to spread negativity. Such comments contain disheartening
messages that can discourage the creator. They also disrupt meaningful discussions,
so many users stop taking part in them.
A moderator can remove these comments, but the volume of comments
posted makes it impossible for any moderator to go through all of them. Our work
will help moderators sort and filter these comments easily, without having to read
each one manually. In India, most people use Hinglish to chat and comment. Our work
would be the first in this language. It will also be able to filter English comments. Our work
will use a multilingual model to filter toxic comments and provide better performance.
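As a rough sketch of how sorting and filtering could look from a moderator's side, the snippet below builds a review queue from comments that already carry a toxicity score. The `ScoredComment` type, the `review_queue` function, the score range of 0 to 1, and the 0.5 threshold are all hypothetical choices made for illustration; the slides do not specify how the model's output is exposed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredComment:
    text: str
    toxicity: float  # hypothetical model score, assumed to lie in [0, 1]


def review_queue(comments: List[ScoredComment],
                 threshold: float = 0.5) -> List[ScoredComment]:
    """Keep comments at or above the threshold, most toxic first."""
    flagged = [c for c in comments if c.toxicity >= threshold]
    return sorted(flagged, key=lambda c: c.toxicity, reverse=True)


# Made-up scores standing in for real model output.
sample = [
    ScoredComment("nice explanation", 0.1),
    ScoredComment("comment B", 0.9),
    ScoredComment("comment C", 0.6),
]
queue = review_queue(sample)
print([c.text for c in queue])
```

With a queue like this, a moderator only reviews the flagged subset instead of every posted comment.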
Scope of the Study
In this project, we aim to develop a model that can classify
toxic comments written in English as well as in Hinglish.
Supporting other languages is too difficult for
this project. Detecting misspellings and identifying
substituted words used to disguise the original word are also out
of the scope of this project. However, the developed
moderation system can serve as a prototype for
identifying toxic comments in other languages.
The dataset is self-created using the Python package
youtube_comment_scrapper, which requires only a YouTube link as
input.
It returns a CSV file containing the columns
[Comments, Time, Like, UserLink, user].
Dropping a few columns that are not required
Columns like "Unnamed:0", "Likes", "Time", "UserLink", and "user" are not
needed when determining whether a comment is toxic.
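The column-dropping step could be sketched as follows, using only Python's standard csv module. The sample CSV text is a hypothetical stand-in for the scraper's output, with the header names taken from the slide above, and `drop_unneeded_columns` is an illustrative helper, not the authors' code.

```python
import csv
import io

# Columns the slide lists as unnecessary for toxicity labelling.
UNNEEDED_COLUMNS = {"Unnamed:0", "Likes", "Time", "UserLink", "user"}


def drop_unneeded_columns(csv_text, unneeded=UNNEEDED_COLUMNS):
    """Return the CSV rows as dicts with the unneeded columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: value for name, value in row.items() if name not in unneeded}
        for row in reader
    ]


# Hypothetical two-row sample standing in for the scraper's CSV file.
SAMPLE_CSV = (
    "Unnamed:0,Comments,Time,Likes,UserLink,user\n"
    "0,Nice video!,2 days ago,3,https://example.com/u/a,user_a\n"
    "1,Bahut accha explain kiya,1 day ago,0,https://example.com/u/b,user_b\n"
)

cleaned = drop_unneeded_columns(SAMPLE_CSV)
print(cleaned)
```

If the data is loaded with pandas instead, `DataFrame.drop(columns=[...])` performs the same step on the loaded frame.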
Research Methodology
Thank You
