Bots and Spiders vs. Real Users
• You want to know how well the search engine bots are served
• You want to know what else is looking at your pages (competition?)
• You want to separate the good traffic from the bad for clean analytics
• You want to be alerted when suspicious traffic kicks in (fraud?)
• You want a clean and accurate basis for your marketing analytics
Best possible source for data = logs
Best possible analytics engine = New Relic
Get the Logs into New Relic
• Define New Relic as the endpoint for your logs in your CDN, or on your servers if no CDN is used
• Check that the logs arrive
[Screenshots: example endpoint configuration for Fastly; logs arrived in New Relic]
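Make sense of the data
• Identify the bots
Most spiders, bots and crawlers identify as such in their name, and most of them do not start their user agent with “Mozilla” the way browsers do. So we ask the data platform for a count per user agent, excluding everything that starts with Mozilla.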
SELECT count(*) from Log where request_user_agent not like 'Mozilla%' facet request_user_agent since 1 day ago
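The same idea can be tried offline on an exported list of user agents. This is only a minimal sketch in Python, not part of the original deck: the function name `count_non_mozilla` and the sample values are hypothetical, and it assumes the user agents are already available as plain strings.

```python
from collections import Counter

def count_non_mozilla(user_agents):
    """Count requests per user agent, skipping agents that start with 'Mozilla'."""
    return Counter(ua for ua in user_agents if not ua.startswith("Mozilla"))

# Hypothetical sample values, not taken from real logs.
sample = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "curl/8.4.0",
    "ExampleBot/1.0",
    "curl/8.4.0",
]
counts = count_non_mozilla(sample)
for agent, hits in counts.most_common():
    print(agent, hits)
```

Sorting by count, as `most_common()` does here, puts the busiest non-browser clients at the top, which is usually where the review starts.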
Make sense of the data
• Exclude the knowns (identify machines)
Most real user agents (browsers) identify as “Mozilla”, but so do some bots. So we also ask the data platform for a count of all user agents whose name contains Bot, Spider or crawler.
SELECT count(*) from Log where request_user_agent like '%Bot%' or request_user_agent like '%Spider%' or request_user_agent like '%crawler%' facet request_user_agent since 1 day ago
Make sense of the data (Real vs. Machine)
• Combine the learnings and check what’s going on (the check is narrowed to text/html content only)
SELECT filter(count(*),
where request_user_agent not like 'Mozilla%'
or request_user_agent like '%Crawler%'
or request_user_agent like '%bot%'
or request_user_agent like '%spider%')
as 'Machine Traffic',
filter(count(*),
where request_user_agent like 'Mozilla%'
and request_user_agent not like '%Crawler%'
and request_user_agent not like '%bot%'
and request_user_agent not like '%spider%')
as 'Real Traffic'
from Log
where type LIKE 'text/html%'
since 1 day ago timeseries
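To make the two filters easier to reason about, here is a rough Python sketch of the same split. It is an illustration under assumptions, not the deck's method: the helper names `is_machine` and `split_traffic` and the sample records are hypothetical, the records are assumed to be dicts with the `request_user_agent` and `type` fields used in the query, and the keyword match is made case-insensitive on purpose, which may differ from how the data platform matches patterns.

```python
KEYWORDS = ("crawler", "bot", "spider")

def is_machine(user_agent):
    """Machine traffic: not a Mozilla-style agent, or a bot keyword in the name."""
    lowered = user_agent.lower()  # assumption: ignore case when matching keywords
    has_keyword = any(word in lowered for word in KEYWORDS)
    return not user_agent.startswith("Mozilla") or has_keyword

def split_traffic(records):
    """Count text/html requests as either machine or real traffic."""
    totals = {"Machine Traffic": 0, "Real Traffic": 0}
    for record in records:
        if not record["type"].startswith("text/html"):
            continue  # only page views, as in the query's WHERE clause
        if is_machine(record["request_user_agent"]):
            totals["Machine Traffic"] += 1
        else:
            totals["Real Traffic"] += 1
    return totals

# Hypothetical sample records, not taken from real logs.
sample = [
    {"request_user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
     "type": "text/html"},
    {"request_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
     "type": "text/html; charset=utf-8"},
    {"request_user_agent": "curl/8.4.0", "type": "text/html"},
    {"request_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
     "type": "image/png"},
]
totals = split_traffic(sample)
print(totals)
```

The two conditions are exact complements of each other, so every text/html request lands in exactly one of the two buckets.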
Not every Real User is a Real User
• Not all synthetic monitoring engines identify as such, and neither do all competitors checking your content, so it is worth looking for suspicious user agents. A simple count helps, as does looking for traffic spikes or for a lot of flat traffic.
Linux with a Chrome browser is not really the most common combination. And the version looks rather old!
32k clicks from just one user agent? Hmm... suspicious.
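The "simple count" check can be sketched as follows. This is a hedged illustration only: the function name `suspicious_agents`, the thresholds `max_share` and `min_hits`, and the sample data are all hypothetical and would need tuning against your own traffic.

```python
from collections import Counter

def suspicious_agents(user_agents, max_share=0.5, min_hits=3):
    """Return (agent, hits) pairs for agents with an outsized share of all requests.

    max_share and min_hits are hypothetical thresholds, not recommended values.
    """
    counts = Counter(user_agents)
    total = sum(counts.values())
    return [
        (agent, hits)
        for agent, hits in counts.most_common()
        if hits >= min_hits and hits / total > max_share
    ]

# Hypothetical sample: one agent produces 8 of 10 requests.
sample = ["AgentA"] * 8 + ["AgentB", "AgentC"]
flagged = suspicious_agents(sample)
print(flagged)
```

A share-based threshold is only one option; a fixed request count per time window would also work and is closer to how an alert condition is usually defined.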
Worth alerting on fraud!
The stunning result of filtered traffic
Think further: bots help SEO! Let’s see
You just need to ask the right questions
