CLICKBAIT
classifier
You Won’t Believe What This
ClickBait Classifier Does!
TABLE OF CONTENTS
01 Introduction
02 Data Preprocessing
03 Feature Engineering
04 Training the Model
Clickbait
A YouTuber by the name of Veritasium uploaded an
informative video demonstrating the Magnus effect
by dropping a basketball from the top of a dam. Titled
“Strange Applications of Magnus Effect”, it received
a few thousand views on YouTube. Later, the same
video was uploaded to a different website under the
title “Basketball dropped from a dam” and received
tens of millions of views! This simple example
illustrates just how powerful clickbait titles can be,
and why they have become so hard to avoid in today’s
fast-paced media world for anyone trying to attract
viewers or visitors to a website.
What Is Clickbait?
01
Clickbait
Clickbait is a text or thumbnail link designed to attract attention and
entice users to follow the link and read or view the linked piece of online
content. It is typically deceptive, sensationalized, or otherwise misleading.
The teasing title aims to exploit the “curiosity gap” by providing just
enough information to make readers curious, but not enough to
satisfy their curiosity without clicking through to the linked content.
Clickbait headlines add an element of dishonesty, using enticements
that do not accurately reflect the content being delivered.
Data Collection
Headlines were scraped from multiple sources. Twitter, Reuters, The Washington Post, The
Guardian, Bloomberg, The Hindu and WikiNews supply all of the non-clickbait headlines,
as they are trusted sources that are known to be reliable and largely report facts
from around the world.
Headlines were also collected from sources such as Buzzfeed, Examiner,
TheOdyssey, Thatscoop, Viralstories, PoliticalInsider, Upworthy, ViralNova and BoredPanda,
which tend to be more clickbaity than factual.
These two types of sources are used to train the model and build a classifier that can detect
whether a title is trustworthy or not. Each headline in the final data is labeled clickbait or
not-clickbait depending on its source.
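The source-based labeling described on this slide can be sketched in a few lines of Python. This is only an illustration, not the presenter's code: the source names are copied from the slide, while the (headline, source) record format, the function names and the sample headlines are assumptions made for the example.

```python
# Source lists copied from the slide; the record format below is an assumption.
NON_CLICKBAIT_SOURCES = {
    "Twitter", "Reuters", "The Washington Post", "The Guardian",
    "Bloomberg", "The Hindu", "WikiNews",
}
CLICKBAIT_SOURCES = {
    "Buzzfeed", "Examiner", "TheOdyssey", "Thatscoop", "Viralstories",
    "PoliticalInsider", "Upworthy", "ViralNova", "BoredPanda",
}


def label_for_source(source):
    """Return 1 (clickbait) or 0 (not-clickbait) based only on the source."""
    if source in CLICKBAIT_SOURCES:
        return 1
    if source in NON_CLICKBAIT_SOURCES:
        return 0
    raise ValueError(f"unknown source: {source!r}")


def label_headlines(records):
    """Turn (headline, source) pairs into (headline, label) pairs."""
    return [(headline, label_for_source(source)) for headline, source in records]


# Placeholder headlines, invented for illustration.
sample = [
    ("Example headline from a news wire", "Reuters"),
    ("Example headline from a viral site", "Upworthy"),
]
labeled = label_headlines(sample)
```

Labeling by source is cheap, but it means every headline from a given outlet gets the same label, whether or not that particular headline is clickbait.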
Data Preprocessing
The headlines contain punctuation and other characters that are neither letters
nor digits. These were removed using regular expressions, as they would not
contribute to training the model.
Using the NLTK library, stop words are removed, as they add noise and take
the focus away from the keywords.
All letters are converted to lowercase, and the text is tokenized initially into unigrams for
EDA and later into unigrams and bigrams for modeling.
A word-frequency vector is created for visualization, for text
classification and for understanding the data distribution.
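A minimal sketch of these preprocessing steps, using only the Python standard library. The small STOP_WORDS set is a stand-in assumption (the slides use NLTK's English stop word list), and the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny stand-in stop list (assumption); the slides use NLTK's stop words instead.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "is", "and", "this", "you"}


def preprocess(headline):
    """Lowercase, strip characters that are not letters/digits/spaces, drop stop words."""
    lowered = headline.lower()
    cleaned = re.sub(r"[^a-z0-9\s]", "", lowered)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("You Won't Believe What This Classifier Does!")
bigrams = ngrams(tokens, 2)

# Word-frequency vector over unigrams and bigrams, as used for modeling.
frequencies = Counter(ngrams(tokens, 1) + bigrams)
```

Note that stripping punctuation this way turns "won't" into "wont", so any punctuation-based features have to be computed on the raw headline before this step.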
Feature Engineering
Clickbait headlines tend to have more exaggerated words (listed below),
along with numbers, exclamation marks and question marks. These features help us
classify headline text as clickbait or non-clickbait. To capture
the characteristics of the headlines that we are dealing with, we
assign a few features, marking 1 if the headline has the feature and 0 if it
doesn’t, for the following:
● Starts with or contains exaggerated words
● Starts with or contains question words
● Ends with question mark
● Ends with exclamation mark
● Starts with number
● Headline word count (a count rather than a 0/1 flag)
Feature Engineering
Exaggerated words and phrases:
‘Insane’, ‘awesome’, ‘amazing’, ‘won’t believe’,
‘must’, ‘secret’, ‘facts’, ‘ultimate guide’, ‘ways to
improve’, ‘list of the best’, ‘why we love’, ‘you’ll
never guess’, ‘strategies’, ‘ingredients’, ‘click
here to learn more’, ‘what happened next’,
‘see’, ‘live’, ‘you won’t believe’, ‘the last’, ‘you
can now’, ‘this is how’, ‘this is the’, ‘this is what’,
‘things you need’, ‘reasons why’
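The hand-crafted features could be computed roughly as follows. This is a sketch under assumptions: EXAGGERATED holds only a subset of the phrases from the slide, the QUESTION_WORDS list is not given in the slides and is assumed here, and the feature names are hypothetical.

```python
import re

# Subset of the exaggerated phrases listed on the slide.
EXAGGERATED = [
    "insane", "awesome", "amazing", "won't believe", "secret",
    "you'll never guess", "what happened next", "reasons why",
]
# Assumed question words; the slides do not list them.
QUESTION_WORDS = ["who", "what", "when", "where", "why", "how", "which"]


def contains_phrase(text, phrases):
    """True if any phrase occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)


def extract_features(headline):
    """Return the 0/1 flags plus the word count for one raw headline."""
    # Normalize case and curly apostrophes so "Won’t" matches "won't".
    text = headline.strip().lower().replace("’", "'")
    return {
        "has_exaggerated": int(contains_phrase(text, EXAGGERATED)),
        "has_question_word": int(contains_phrase(text, QUESTION_WORDS)),
        "ends_with_question": int(text.endswith("?")),
        "ends_with_exclamation": int(text.endswith("!")),
        "starts_with_number": int(bool(re.match(r"\d", text))),
        "word_count": len(text.split()),
    }


features = extract_features("17 Reasons Why You Won’t Believe This!")
plain = extract_features("President wins election")
```

A "contains" check also covers the "starts with" case, so each pair of conditions from the slide collapses into a single flag in this sketch.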
Exploratory Data Analysis
We analyze word frequencies to find
patterns within clickbait and non-clickbait
headlines and visualize them using
word clouds. There is a clear
contrast in the type of words between
the two categories. The clickbait
word cloud features numbers and vague
words such as ‘actually’, ‘like’,
‘heres’, ‘need’ and ‘best’.
Exploratory Data Analysis
The non-clickbait word cloud
features news- and fact-related
words such as ‘president’, ‘election’,
‘coronavirus’ and ‘australian’. These
tend to be less catchy words.
Exploratory Data Analysis
We then analyze the word count feature and find that clickbait headlines
tend to be longer than non-clickbait headlines.
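The two EDA comparisons, per-class word frequencies (the numbers behind a word cloud) and per-class headline length, might be computed as in the sketch below. The four toy headlines are made up for illustration and are not taken from the dataset; the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Made-up toy headlines for illustration only; 1 = clickbait, 0 = not-clickbait.
labeled = [
    ("10 things you actually need to know", 1),
    ("heres the best thing you need today", 1),
    ("president wins election", 0),
    ("election results announced", 0),
]


def word_frequencies(rows, label):
    """Count word occurrences over all headlines carrying the given label."""
    counts = Counter()
    for headline, y in rows:
        if y == label:
            counts.update(headline.split())
    return counts


def mean_word_count(rows, label):
    """Average number of words per headline for the given label."""
    return mean(len(headline.split()) for headline, y in rows if y == label)


clickbait_counts = word_frequencies(labeled, 1)
news_counts = word_frequencies(labeled, 0)
clickbait_length = mean_word_count(labeled, 1)
news_length = mean_word_count(labeled, 0)
```

The per-class counters are the kind of input a word cloud library would consume, with each word's size proportional to its count.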
WORD FREQUENCY
Train the model
Naive Bayes, Random Forest, SVM and Logistic Regression classifiers
are trained and tested, and the accuracy and recall of each are
measured to evaluate performance.
Recall is given more weight in order to avoid false negatives. With clickbait as the
positive class, a false negative is a clickbait headline classified as non-clickbait; a
non-clickbait headline classified as clickbait counts as a false negative only when
non-clickbait is treated as the positive class.
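To make the two metrics concrete, here is a small sketch that computes accuracy and recall from true and predicted labels. The label vectors are invented for illustration, and the `positive` parameter is an assumption added to show how the choice of positive class changes what counts as a false negative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) with respect to the chosen positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """tp / (tp + fn): the share of actual positives that were found."""
    tp, _, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented labels: 1 = clickbait, 0 = not-clickbait.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

acc = accuracy(y_true, y_pred)
recall_clickbait = recall(y_true, y_pred, positive=1)
recall_news = recall(y_true, y_pred, positive=0)
```

On these labels the same predictions give a different recall for each choice of positive class, which is why the positive class should be stated alongside any reported recall.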
Train the model
From the tabulated results
on this slide we can see that Naive
Bayes performs best on this
dataset in terms of both
accuracy and recall scores.
The other models perform nearly
as well, but we choose
Naive Bayes because it runs faster
than the other models,
which is especially
handy when the data scales up.
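For readers unfamiliar with Naive Bayes, the sketch below implements a tiny multinomial variant with add-one (Laplace) smoothing in plain Python. The slides do not say which Naive Bayes variant or library was used, so this is an assumed stand-in rather than the presenter's model; the class name and the toy token lists are hypothetical.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token lists with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.token_counts[y].update(doc)
        class_counts = Counter(labels)
        n_docs = len(labels)
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}
        self.total = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, tok, c):
        # Add-one smoothing keeps unseen (token, class) pairs from zeroing the score.
        return math.log(
            (self.token_counts[c][tok] + 1) / (self.total[c] + len(self.vocab))
        )

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            # Tokens outside the training vocabulary are ignored.
            scores[c] = self.log_prior[c] + sum(
                self._log_likelihood(tok, c) for tok in doc if tok in self.vocab
            )
        return max(scores, key=scores.get)


# Toy, already-tokenized headlines; 1 = clickbait, 0 = not-clickbait.
train_docs = [
    ["you", "wont", "believe", "this"],
    ["best", "things", "you", "need"],
    ["president", "wins", "election"],
    ["election", "results", "announced"],
]
train_labels = [1, 1, 0, 0]

model = TinyMultinomialNB().fit(train_docs, train_labels)
pred_clickbait = model.predict(["you", "need", "this"])
pred_news = model.predict(["election", "announced"])
```

Training amounts to a single counting pass over the tokens, which is consistent with the speed advantage noted on the slide.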
Train the model
The top 15 coefficients for clickbait are as follows:
TAKEAWAY
Using machine learning algorithms, one can train a
model to detect clickbait. As the type of data online
changes and grows, we can add new data
to the training dataset in the future to build a better
classifier.
This proof of concept (POC) achieved 90–93% accuracy
and recall. Given this performance, it could be
applied to larger volumes of data to filter out
clickbait headlines, and a model like this could be deployed
on a web platform to help weed out misleading headlines.
CREDITS: This presentation template was created by
Slidesgo, including icons by Flaticon, infographics &
images by Freepik and illustrations by Storyset
THANK YOU.

Ppt Presentation on Clickbait Classifier - Anupama Kurudi
