This document discusses how to analyze web traffic logs to distinguish real human users from non-human traffic such as bots, spiders, and crawlers. It recommends sending logs to New Relic for analysis and using queries to:
1) Identify bot and crawler traffic by searching for those terms in the user-agent field.
2) Isolate machine traffic by excluding user agents that start with "Mozilla", the token most real browsers send.
3) Combine both queries to compare machine traffic against real human traffic over time.
4) Watch for suspicious user agents with atypical traffic patterns that could indicate fraud.
Bots and spiders
2. Bots and Spiders vs. Real Users
• You want to know how well search engine bots are being served
• You want to know what else is looking at your pages (competitors?)
• You want to separate the good traffic from the bad for clean analytics
• You want to get alerted when suspicious traffic kicks in (fraud?)
• You want a clean and accurate basis for your marketing analytics
4. Get the Logs into New Relic
• Define New Relic as the endpoint for your logs in your CDN, or ship them directly from your servers (in case no CDN is used)
• Check that the logs arrive
(Screenshot: example configuration for Fastly) (Screenshot: logs arriving in New Relic)
5. Make sense of the data
• Identify the bots
Most spiders, bots, and crawlers identify themselves as such in the user-agent string, so we ask the data platform for a count of everything matching those names:
SELECT count(*) FROM Log WHERE request_user_agent LIKE '%Bot%' OR request_user_agent LIKE '%Spider%' OR request_user_agent LIKE '%crawler%' FACET request_user_agent SINCE 1 day ago
6. Make sense of the data
• Exclude the knowns (identify machines)
Most real user agents (browsers) identify as "Mozilla", so we ask the data platform for a count of everything that does not start with it:
SELECT count(*) FROM Log WHERE request_user_agent NOT LIKE 'Mozilla%' FACET request_user_agent SINCE 1 day ago
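The two substring filters can be mirrored outside New Relic, e.g. when spot-checking raw log lines locally. A minimal Python sketch (note it matches case-insensitively, which is slightly broader than the case-sensitive NRQL LIKE patterns):

```python
def names_itself_a_bot(user_agent: str) -> bool:
    """Mirror of the '%Bot%' / '%Spider%' / '%crawler%' filter."""
    ua = user_agent.lower()
    return any(token in ua for token in ("bot", "spider", "crawler"))

def lacks_browser_token(user_agent: str) -> bool:
    """Mirror of the NOT LIKE 'Mozilla%' filter: anything that does not
    start with the token real browsers send."""
    return not user_agent.startswith("Mozilla")

print(names_itself_a_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(lacks_browser_token("curl/8.4.0"))                              # True
```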
7. Make sense of the data (real vs. machine)
• Combine the learnings and check what's going on (the check is additionally narrowed to text/html content)
SELECT filter(count(*),
    WHERE request_user_agent NOT LIKE 'Mozilla%'
      OR request_user_agent LIKE '%Crawler%'
      OR request_user_agent LIKE '%bot%'
      OR request_user_agent LIKE '%spider%') AS 'Machine Traffic',
  filter(count(*),
    WHERE request_user_agent LIKE 'Mozilla%'
      AND request_user_agent NOT LIKE '%Crawler%'
      AND request_user_agent NOT LIKE '%bot%'
      AND request_user_agent NOT LIKE '%spider%') AS 'Real Traffic'
FROM Log
WHERE type LIKE 'text/html%'
SINCE 1 day ago TIMESERIES
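The combined split can be prototyped locally before building the dashboard. A sketch in Python, assuming log records carry the `request_user_agent` and `type` fields used above (the sample records are hypothetical):

```python
from collections import Counter

def classify(user_agent: str) -> str:
    """Machine if the agent does not start with 'Mozilla' OR names itself
    a bot/spider/crawler; real traffic otherwise."""
    named_bot = any(t in user_agent.lower() for t in ("bot", "spider", "crawler"))
    if not user_agent.startswith("Mozilla") or named_bot:
        return "Machine Traffic"
    return "Real Traffic"

# hypothetical log records
logs = [
    {"request_user_agent": "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0", "type": "text/html"},
    {"request_user_agent": "Mozilla/5.0 (compatible; bingbot/2.0)",      "type": "text/html"},
    {"request_user_agent": "curl/8.4.0",                                 "type": "text/html"},
    {"request_user_agent": "Mozilla/5.0 (X11; Linux) Chrome/66.0",       "type": "image/png"},
]

counts = Counter(
    classify(rec["request_user_agent"])
    for rec in logs
    if rec["type"].startswith("text/html")  # mirror of: WHERE type LIKE 'text/html%'
)
print(counts)
```

With the sample records, the image/png request is dropped and the remaining three split into two machine hits (bingbot, curl) and one real browser hit.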
8. Not every real user is a real user
• Not all synthetic-monitoring engines identify themselves as such, and neither do all competitors checking your content, so it is worth having a look at suspicious user agents. A simple count helps, but also look for traffic spikes, or for a lot of suspiciously flat traffic.
(Screenshot annotations:) Linux with a Chrome browser is not really the most common combination. Uuuh… and the version looks rather old!
32k clicks by just one user agent? Hmmm… suspicious.
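Both signals called out above, a single agent dominating the volume and perfectly flat hour-by-hour traffic, are easy to flag programmatically. A sketch, where the input format (user-agent/hour pairs) and the 10,000-request threshold are illustrative assumptions, not values from the slides:

```python
from collections import Counter

def suspicious_agents(requests, volume_threshold=10_000):
    """Flag agents with an atypically high request volume ("32k clicks by
    one agent?") or perfectly flat per-hour counts, a typical signature of
    synthetic monitors. `requests` is an iterable of (user_agent, hour) pairs."""
    totals = Counter(ua for ua, _ in requests)
    per_hour: dict[str, Counter] = {}
    for ua, hour in requests:
        per_hour.setdefault(ua, Counter())[hour] += 1

    flagged = {}
    for ua, total in totals.items():
        hourly = per_hour[ua].values()
        flat = len(set(hourly)) == 1 and len(per_hour[ua]) > 1
        if total >= volume_threshold:
            flagged[ua] = "volume spike"
        elif flat:
            flagged[ua] = "flat traffic"
    return flagged

# 32k requests from one old Chrome-on-Linux agent, plus a health checker
# hitting the site exactly 5 times every hour of the day
reqs = [("Mozilla/5.0 (X11; Linux) Chrome/66.0", h % 24) for h in range(32_000)]
reqs += [("HealthCheck/1.0", h) for h in range(24)] * 5
flagged = suspicious_agents(reqs)
print(flagged)
```

In New Relic itself the equivalent first step is the simple faceted count from the earlier slides, sorted by volume; the flat-traffic check corresponds to eyeballing the TIMESERIES chart per agent.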