CLICKBAIT
classifier
You Won’t Believe What This
ClickBait Classifier Does!
TABLE OF CONTENTS
01 INTRODUCTION
02 Data Preprocessing
03 Feature Engineering
04 Training The Model
Clickbait
A YouTuber by the name of Veritasium uploaded an
informative video demonstrating the Magnus effect
by dropping a basketball from the top of a dam. Titled
“Strange Applications of Magnus Effect”, it received
a few thousand views on YouTube. Later, the same
video was uploaded to a different website under the
title “Basketball dropped from a dam” and received
tens of millions of views! This simple example
illustrates how powerful clickbait titles can be,
and how heavily today’s fast-paced media world
relies on them to draw viewers or visitors to a
website.
What Is Clickbait?
01
Clickbait
Clickbait is a text or thumbnail link designed to attract attention and
entice users to follow it and read or view the linked piece of online
content. It is typically deceptive, sensationalized, or otherwise
misleading.
The teasing title aims to exploit the “curiosity gap” by providing just
enough information to make readers curious, but not enough to
satisfy their curiosity without clicking through to the linked content.
Clickbait headlines add an element of dishonesty, using enticements
that do not accurately reflect the content being delivered.
Data Collection
Non-clickbait headlines were scraped from multiple sources such as Twitter, Reuters, The
Washington Post, The Guardian, Bloomberg, The Hindu and WikiNews. These are treated as
trusted, reliable sources whose headlines largely report facts from around the world.
Clickbait headlines were collected from sources such as Buzzfeed, Examiner,
TheOdyssey, Thatscoop, Viralstories, PoliticalInsider, Upworthy, ViralNova and BoredPanda,
which tend to favor clickbait over factual reporting.
Both types of sources are used to train a classifier that can detect whether a title is
trustworthy. Each headline in the final dataset is labeled clickbait or not-clickbait
according to its source.
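The source-based labeling described on this slide can be sketched as follows. This is a minimal illustration, not the author's code: the source names come from the slide, while the record layout, the function name `label_by_source` and the example headlines are assumptions made up for the sketch.

```python
# Source lists taken from the slide; everything else here is hypothetical.
NON_CLICKBAIT_SOURCES = {
    "Twitter", "Reuters", "The Washington Post", "The Guardian",
    "Bloomberg", "The Hindu", "WikiNews",
}
CLICKBAIT_SOURCES = {
    "Buzzfeed", "Examiner", "TheOdyssey", "Thatscoop", "Viralstories",
    "PoliticalInsider", "Upworthy", "ViralNova", "BoredPanda",
}


def label_by_source(source):
    """Return 1 for a clickbait source and 0 for a non-clickbait source."""
    if source in CLICKBAIT_SOURCES:
        return 1
    if source in NON_CLICKBAIT_SOURCES:
        return 0
    raise ValueError(f"unknown source: {source!r}")


# Made-up (source, headline) rows, for illustration only.
scraped = [
    ("Reuters", "Central bank holds interest rates steady"),
    ("Buzzfeed", "21 Things You Need To See Before Summer Ends"),
]

# Each headline inherits the label of the site it was collected from.
labeled = [(headline, label_by_source(source)) for source, headline in scraped]
```

Labeling by source is cheap, but it assumes every headline from a given site belongs to the same class, so some label noise is to be expected.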
Data Preprocessing
The headlines contain punctuation and other characters that are neither letters
nor digits. These were removed using regular expressions, as they would not
contribute to training the model.
Stop words are removed using the NLTK library, as they add noise and take
the focus away from the keywords.
All letters are converted to lowercase. The text is tokenized into unigrams for
EDA, and later into unigrams and bigrams for modeling.
A word-frequency vector is created for visualization, for text
classification, and for understanding the data distribution.
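The preprocessing steps above can be sketched roughly as follows. The slides use NLTK's stop-word list; to keep the sketch self-contained, a tiny hand-written `STOP_WORDS` set stands in for it, and the function names `preprocess` and `ngrams` are assumptions.

```python
import re

# Tiny stand-in stop-word list (assumption); the slides use NLTK's English list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and",
              "this", "that", "you", "what"}


def preprocess(headline):
    """Lowercase, drop characters that are not letters or digits, remove stop words."""
    text = headline.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)  # strip punctuation and symbols
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) over a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("You Won't Believe What This Classifier Does!")
unigrams = ngrams(tokens, 1)  # used for EDA
bigrams = ngrams(tokens, 2)   # added for modeling
```

In this sketch the bigrams are built after stop-word removal, so they pair words that may not have been adjacent in the original headline; whether the original project did the same is not stated.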
Feature Engineering
Clickbait headlines tend to contain more exaggerated words (listed below),
along with numbers, exclamation marks and question marks. These features
help us classify headline text as clickbait or non-clickbait. To capture
the characteristics of the headlines we are dealing with, we assign a few
features, marking 1 if the headline has the feature and 0 if it
doesn’t, for the following:
● Starts with or contains exaggerated words
● Starts with or contains question words
● Ends with question mark
● Ends with exclamation mark
● Starts with number
● Headline word count (a count rather than a 0/1 flag)
Feature Engineering
Exaggerated words and phrases:
‘insane’, ‘awesome’, ‘amazing’, ‘won’t believe’,
‘must’, ‘secret’, ‘facts’, ‘ultimate guide’, ‘ways to
improve’, ‘list of the best’, ‘why we love’, ‘you’ll
never guess’, ‘strategies’, ‘ingredients’, ‘click
here to learn more’, ‘what happened next’,
‘see’, ‘live’, ‘you won’t believe’, ‘the last’, ‘you
can now’, ‘this is how’, ‘this is the’, ‘this is what’,
‘things you need’, ‘reasons why’
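The hand-crafted features listed on these slides could be computed along the following lines. This is a sketch under assumptions: `EXAGGERATED` is only a subset of the phrase list above, the `QUESTION_WORDS` tuple is not given in the slides, and the feature names are invented for illustration.

```python
import re

# Subset of the exaggerated phrases from the slide.
EXAGGERATED = ("insane", "awesome", "amazing", "won't believe", "secret",
               "you'll never guess", "what happened next", "reasons why")
# Assumed list of question words; the slides do not spell it out.
QUESTION_WORDS = ("who", "what", "when", "where", "why", "which", "how")


def extract_features(headline):
    """Return the 0/1 flags and the word count for one headline."""
    text = headline.strip()
    lower = text.lower().replace("’", "'")  # normalize curly apostrophes
    words = lower.split()
    has_exaggerated = any(
        re.search(r"\b" + re.escape(phrase) + r"\b", lower)
        for phrase in EXAGGERATED
    )
    has_question_word = any(
        w.strip("?!.,:;\"'") in QUESTION_WORDS for w in words
    )
    return {
        "has_exaggerated": int(has_exaggerated),
        "has_question_word": int(has_question_word),
        "ends_with_question": int(text.endswith("?")),
        "ends_with_exclamation": int(text.endswith("!")),
        "starts_with_number": int(bool(words) and words[0][0].isdigit()),
        "word_count": len(words),
    }


features = extract_features("10 Reasons Why You Won't Believe This!")
```

Note that the punctuation-based flags have to be computed on the raw headline, before the preprocessing step strips question and exclamation marks.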
Exploratory Data Analysis
We analyze word frequencies to find
patterns within clickbait and non-clickbait
headlines and visualize them using
word clouds. There is a clear
contrast in the type of words between
the two categories. The clickbait
word cloud features numbers and vague
words such as ‘actually’, ‘like’,
‘heres’, ‘need’ and ‘best’.
Exploratory Data Analysis
The non-clickbait word cloud
features news- and fact-related
words such as ‘president’, ‘election’,
‘coronavirus’ and ‘australian’. These
tend to be less catchy words.
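The per-class word frequencies behind such word clouds can be sketched with the standard library alone. The token lists below are made up for illustration and the function name `word_frequencies` is an assumption; the real analysis would run over the preprocessed headlines of each class.

```python
from collections import Counter


def word_frequencies(token_lists):
    """Count how often each token occurs across a list of tokenized headlines."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


# Made-up tokenized headlines, one list per headline.
clickbait_tokens = [["best", "things", "need"], ["heres", "best", "way"]]
news_tokens = [["president", "election"], ["election", "results"]]

clickbait_freq = word_frequencies(clickbait_tokens)
news_freq = word_frequencies(news_tokens)

# The most frequent words per class are what a word cloud draws largest.
top_clickbait = clickbait_freq.most_common(1)
top_news = news_freq.most_common(1)
```

A frequency mapping like this is the kind of input a word-cloud library can render directly, with font size proportional to the count.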
Exploratory Data Analysis
We then analyze the word count feature and find that clickbait headlines
tend to be longer than non-clickbait headlines.
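A comparison of headline lengths like the one described here can be sketched as below. The headlines are invented examples, not data from the project, and `average_word_count` is a hypothetical helper name.

```python
from statistics import mean


def average_word_count(headlines):
    """Mean number of whitespace-separated words per headline."""
    return mean(len(h.split()) for h in headlines)


# Made-up headlines, for illustration only.
clickbait = [
    "21 Things You Need To See Before Summer Ends",
    "You Won't Believe What Happened Next",
]
news = [
    "Central bank holds interest rates steady",
    "Election results announced",
]

clickbait_avg = average_word_count(clickbait)
news_avg = average_word_count(news)
```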
WORD FREQUENCY
Train the Model
Naive Bayes, Random Forest, SVM and Logistic Regression classifiers
are trained and tested, and accuracy and recall are measured for each
to evaluate performance.
Recall is given more weight in order to limit false negatives, that is,
clickbait headlines classified as non-clickbait (taking clickbait as the positive class).
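To make the training and evaluation step concrete, here is a small pure-Python multinomial Naive Bayes with accuracy and recall. It is a teaching sketch, not the project's code: the original work most likely used a library implementation, the class name `TinyNaiveBayes` is invented, the toy data is made up, and clickbait (label 1) is assumed to be the positive class.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over token lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.priors, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.priors[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(tok for d in class_docs for tok in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, doc):
        def score(c):
            denom = self.totals[c] + len(self.vocab)
            # Tokens never seen in training are ignored.
            return self.priors[c] + sum(
                math.log((self.counts[c][tok] + 1) / denom)
                for tok in doc if tok in self.vocab
            )
        return max(self.classes, key=score)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of truly positive items that the model also predicts as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up toy data: 1 = clickbait, 0 = non-clickbait.
train_docs = [["wont", "believe", "trick"], ["best", "things", "need"],
              ["president", "signs", "bill"], ["election", "results", "announced"]]
train_labels = [1, 1, 0, 0]
test_docs = [["best", "trick"], ["election", "bill"]]
test_labels = [1, 0]

model = TinyNaiveBayes().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in test_docs]
```

Recall only drops when a clickbait headline slips through as non-clickbait, which is why it is the metric to watch when missing clickbait is the costlier error.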
Train the Model
The tabulated results show
that Naive Bayes performs best
on this dataset in terms of both
accuracy and recall scores.
The other models perform nearly
the same. We choose
Naive Bayes because it also runs
faster than the other models,
which is especially
useful as the data scales up.
Train the Model
[Figure: the top 15 coefficients for clickbait]
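The slides do not say which model the coefficients come from or how they were extracted. As one possible stand-in, the sketch below ranks tokens by a Laplace-smoothed log-odds score between the two classes, which is closely related to what a Naive Bayes model learns. The function name `top_clickbait_tokens` and the toy data are assumptions.

```python
import math
from collections import Counter


def top_clickbait_tokens(clickbait_docs, news_docs, k=15):
    """Rank tokens by smoothed log-odds of appearing in clickbait vs. news."""
    cb = Counter(tok for d in clickbait_docs for tok in d)
    nw = Counter(tok for d in news_docs for tok in d)
    vocab = set(cb) | set(nw)
    cb_total = sum(cb.values()) + len(vocab)
    nw_total = sum(nw.values()) + len(vocab)
    scores = {
        tok: math.log((cb[tok] + 1) / cb_total)
        - math.log((nw[tok] + 1) / nw_total)
        for tok in vocab
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Made-up tokenized headlines, for illustration only.
clickbait_docs = [["best", "things", "need"], ["best", "reasons"]]
news_docs = [["president", "election"], ["election", "results"]]

top = top_clickbait_tokens(clickbait_docs, news_docs, k=3)
```

A positive score means the token is relatively more common in clickbait headlines; with a linear model such as Logistic Regression, the learned weights would play the same role.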
TAKEAWAY
Using machine learning algorithms, one can train a
model to detect clickbait. As online data
changes and grows, new data can be added
to the training dataset to build a better
classifier.
This proof of concept (POC) achieved 90–93% accuracy
and recall. Given that performance, it could
be applied to larger datasets to filter out
clickbait headlines, and the model could be deployed on a
web platform to help weed out misinformation.
CREDITS: This presentation template was created by
Slidesgo, including icons by Flaticon, infographics &
images by Freepik and illustrations by Storyset
THANK
You.
Ppt Presentation on Clickbait Classifier - Anupama Kurudi

  • 1. CLICKBAIT classifier You Won’t Believe What This ClickBait Classifier Does!
  • 3. Clickbait: The YouTuber Veritasium uploaded an informative video demonstrating the Magnus effect by dropping a basketball from the top of a dam. Titled “Strange Applications of Magnus Effect”, it received a few thousand views on YouTube. Later, the same video was uploaded to a different website under the title “Basketball dropped from a dam” and received tens of millions of views! This simple example illustrates how powerful clickbait titles can be, and why they have become so hard to avoid in today’s fast-paced media world for anyone trying to draw viewers or visitors to a website.
  • 5. Clickbait: Clickbait is a text or thumbnail link designed to attract attention and entice users to follow the link and read or view the linked online content, which is typically deceptive, sensationalized, or otherwise misleading. The teasing title aims to exploit the “curiosity gap” by providing just enough information to make readers curious, but not enough to satisfy their curiosity without clicking through to the linked content. Clickbait headlines add an element of dishonesty, using enticements that do not accurately reflect the content being delivered.
  • 6. Data Collection: Data was scraped from multiple sources. Headlines from Twitter, Reuters, The Washington Post, The Guardian, Bloomberg, The Hindu and WikiNews make up the non-clickbait class, as these are trusted sources that are known to be reliable and whose headlines largely report facts from around the world. Headlines were also collected from sources like Buzzfeed, Examiner, TheOdyssey, Thatscoop, Viralstories, PoliticalInsider, Upworthy, ViralNova and BoredPanda, which tend to be more clickbaity than factual. These two types of sources are used to train the model and build a classifier that can detect whether a title is trustworthy or not. The final data is labeled as clickbait or not-clickbait depending on the source.
  • 7. Data Preprocessing: The headline data contains punctuation and other characters that are neither letters nor digits; these were removed using regular expressions, as they would not contribute to training the model. Stop words are removed using the NLTK library, as they add noise and take the focus away from the keywords. All letters are converted to lowercase, and the text is tokenized initially into unigrams for EDA and later into unigrams and bigrams for modeling. A word-frequency vector is created for visualization, for text classification, and for understanding the data distribution.
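The preprocessing steps on this slide could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the presenter's code: the slides use NLTK's stop-word list, whereas `STOP_WORDS` here is a small hypothetical stand-in, and the function names `preprocess`, `ngrams`, `unigrams_and_bigrams` and `word_frequencies` are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
# The slides use NLTK's English stop-word list instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on",
              "and", "this", "that", "what", "you"}


def preprocess(headline):
    """Lowercase, drop characters that are not letters/digits, remove stop words."""
    text = headline.lower()
    # Keep only lowercase letters, digits and whitespace.
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unigrams_and_bigrams(headline):
    """Token representation for modeling: unigrams followed by bigrams."""
    tokens = preprocess(headline)
    return tokens + ngrams(tokens, 2)


def word_frequencies(headlines):
    """Unigram frequency counts across headlines, e.g. for EDA plots."""
    counts = Counter()
    for headline in headlines:
        counts.update(preprocess(headline))
    return counts
```

Lowercasing is done first here so that stop-word matching is case-insensitive; the order of the other steps follows the slide.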
  • 8. Feature Engineering: Clickbait headlines tend to contain more exaggerated words (listed on the next slide), along with numbers, exclamation marks and question marks. These features help classify headline text as clickbait or non-clickbait. To capture these characteristics, each headline is marked 1 if it has a given feature and 0 if it does not, and its word count is also recorded:
      ● Starts with or contains exaggerated words
      ● Starts with or contains question words
      ● Ends with question mark
      ● Ends with exclamation mark
      ● Starts with number
      ● Headline word count
  • 9. Feature Engineering: ‘Insane’, ‘awesome’, ‘amazing’, ‘won’t believe’, ‘must’, ‘secret’, ‘facts’, ‘ultimate guide’, ‘ways to improve’, ‘list of the best’, ‘why we love’, ‘you’ll never guess’, ‘strategies’, ‘ingredients’, ‘click here to learn more’, ‘what happened next’, ‘see’, ‘live’, ‘you won’t believe’, ‘the last’, ‘you can now’, ‘this is how’, ‘this is the’, ‘this is what’, ‘things you need’, ‘reasons why’
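As a rough sketch of how the features from the two Feature Engineering slides might be computed, the snippet below derives them from a raw headline. Several things here are assumptions rather than the presenter's implementation: `EXAGGERATED` is only a subset of the phrases listed on the slide, `QUESTION_WORDS` is a guessed list (the slides do not enumerate the question words), and the feature names are made up for this example.

```python
import re

# Subset of the exaggerated phrases from the slide, lowercased,
# with straight apostrophes so they match normalized text.
EXAGGERATED = ("insane", "awesome", "amazing", "won't believe", "secret",
               "you'll never guess", "what happened next", "reasons why")

# Assumed question words; the slides do not list them.
QUESTION_WORDS = ("who", "what", "when", "where", "why", "how", "which")


def extract_features(headline):
    """Return the binary flags and word count described on the slide."""
    text = headline.strip()
    # Normalize case and curly apostrophes before matching phrases.
    lower = text.lower().replace("\u2019", "'")
    words = re.findall(r"[a-z0-9']+", lower)
    has_exaggerated = any(
        re.search(r"\b" + re.escape(phrase) + r"\b", lower)
        for phrase in EXAGGERATED
    )
    return {
        "has_exaggerated": int(has_exaggerated),
        "has_question_word": int(any(w in QUESTION_WORDS for w in words)),
        "ends_with_question": int(text.endswith("?")),
        "ends_with_exclamation": int(text.endswith("!")),
        "starts_with_number": int(bool(re.match(r"\d", text))),
        "word_count": len(words),
    }
```

These features are computed from the raw headline rather than the preprocessed tokens, since the punctuation flags would otherwise be lost when punctuation is stripped.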
  • 10. Exploratory Data Analysis: We analyze word frequencies to find patterns within clickbait and non-clickbait headlines and visualize them using word clouds. There is a clear contrast in the type of words between the two categories. The word cloud for clickbait headlines contains numbers and vague words such as ‘actually’, ‘like’, ‘heres’, ‘need’ and ‘best’.
  • 11. Exploratory Data Analysis: The word cloud for non-clickbait headlines contains words related to news and facts, such as ‘president’, ‘election’, ‘coronavirus’ and ‘australian’. These tend to be less catchy words.
  • 12. Exploratory Data Analysis: We then analyze the word count feature and find that clickbait headlines tend to be longer than non-clickbait headlines.
  • 14. Train the model: Naive Bayes, Random Forest, SVM and Logistic Regression classifiers are trained and tested, and the accuracy and recall of each are measured to evaluate performance. To limit false negatives, recall is given more weight than accuracy. Note that a non-clickbait headline classified as clickbait counts as a false negative only when non-clickbait is treated as the positive class; with clickbait as the positive class it is a false positive, which precision rather than recall measures.
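To make the two metrics concrete, here is a small sketch of how accuracy and recall are computed from true and predicted labels. The function name `accuracy_and_recall` and the `positive` parameter are assumptions for this example; which class counts as "positive" is a choice the slides do not state explicitly.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Compute accuracy and recall for the chosen positive class.

    accuracy = correct predictions / all predictions
    recall   = true positives / (true positives + false negatives)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    correct = tp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        if truth == positive:
            if pred == positive:
                tp += 1  # positive example correctly found
            else:
                fn += 1  # positive example missed (false negative)
    accuracy = correct / len(y_true)
    # Recall is undefined when there are no positive examples; report 0.0.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall
```

Swapping the `positive` argument shows how the same predictions yield a different recall depending on which class is treated as positive.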
  • 15. Train the model: From the tabulated results above we can see that Naive Bayes performs the best for this dataset in terms of both accuracy and recall scores. The other models perform nearly as well, but we choose Naive Bayes because it also runs faster than the other models, which is especially handy as the data scales up.
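For readers unfamiliar with the chosen model, the following is a from-scratch sketch of a multinomial Naive Bayes text classifier with Laplace smoothing. It is not the presenter's implementation (the slides do not say which library or variant was used), and the class name `NaiveBayes`, the `alpha` parameter and the toy headlines are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial Naive Bayes over token lists (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.token_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / n_docs)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.token_counts[label][tok]
                score += math.log(
                    (count + self.alpha)
                    / (self.totals[label] + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical headlines, already tokenized).
train_docs = [
    ["you", "wont", "believe", "this"],
    ["10", "reasons", "why", "you", "need", "this"],
    ["president", "wins", "election"],
    ["bank", "raises", "interest", "rates"],
]
train_labels = ["clickbait", "clickbait", "news", "news"]
model = NaiveBayes().fit(train_docs, train_labels)
```

Training amounts to a single counting pass over the tokens, which is consistent with the slide's observation that Naive Bayes runs fast as the data grows.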
  • 17. Train the model: The top 15 coefficients for clickbait are as follows:
  • 18. TAKEAWAY: Using machine learning algorithms, one can train a model to detect clickbait. As the type of data online changes and grows, new data can be added to the training dataset in the future to build a better classifier. This POC achieved accuracy and recall in the range of 90–93%. Given this performance, it could be applied to a larger body of data to filter out clickbait headlines, and the model could be deployed on a web platform to help weed out misinformation.
  • 19. THANK You. CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, infographics & images by Freepik and illustrations by Storyset.