A Guide to Log Analysis with Big Query
2009
God it’s bad.
-$1.5 Billion
Why hasn’t Google seen the changes on my page?
How should I prioritise errors in Search Console?
Are my canonicals being respected?
Does Google think this page is important?
PART 1: THE WHY
What can you do with logs?
PART 2: THE HOW
Getting logs
Analysing logs
Processing logs
What does a log look like?
123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
IP address: 123.65.150.10
Timestamp: [23/Aug/2010:03:50:59 +0000]
Request type: GET
Page requested: /my_homepage (the homepage)
Protocol: HTTP/1.1
Status code: 200
Size of the page (in bytes): 2262
User agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
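The field-by-field breakdown above can be turned into a parser. Below is a minimal Python sketch that splits a line like this into named fields. It assumes your server writes the common Apache/nginx "combined" format shown above. That format also includes a referrer field, which is the `"-"` in this line. The regex and the `parse_line` name are illustrative and do not come from the talk.

```python
import re
from datetime import datetime

# Combined log format: IP, identd, user, [time], "request", status, size, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Split one log line into named fields; return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # "-" means no body was sent; treat it as 0 bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


line = ('123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" '
        '200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
record = parse_line(line)
print(record["ip"], record["timestamp"], record["path"], record["status"], record["size"])
```

If your server uses a different log format, for example one with the hostname added, the regex is the part you would need to change.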
PART 1: THE WHY: What can you do with logs?
5 things
1. Diagnose crawling & indexation issues
[Chart: Number of requests for the five folders Googlebot crawled the most]
[Chart: % of organic sessions vs % of crawl budget]
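The sessions-versus-crawl-budget comparison comes down to two percentage shares per folder. Here is a minimal sketch. It assumes you have already counted organic sessions (from analytics) and Googlebot requests (from the logs) for each folder over the same period. The folder names and numbers are made up for illustration.

```python
def share_by_folder(counts):
    """Convert raw counts per folder into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {folder: 0.0 for folder in counts}
    return {folder: 100 * n / total for folder, n in counts.items()}


def compare_sessions_to_crawl(sessions, googlebot_requests):
    """Return (folder, % of organic sessions, % of crawl budget) rows,
    with the folders most over-crawled relative to their traffic first."""
    session_share = share_by_folder(sessions)
    crawl_share = share_by_folder(googlebot_requests)
    folders = set(session_share) | set(crawl_share)
    rows = [(f, session_share.get(f, 0.0), crawl_share.get(f, 0.0)) for f in folders]
    # Sort by (crawl share - session share), biggest gap first.
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows


# Hypothetical per-folder totals for the same period.
sessions = {"/jobs": 600, "/blog": 300, "/locations": 100}
googlebot_requests = {"/jobs": 2000, "/blog": 500, "/locations": 7500}

for folder, session_pct, crawl_pct in compare_sessions_to_crawl(sessions, googlebot_requests):
    print(f"{folder:<12}{session_pct:6.1f}% of sessions{crawl_pct:6.1f}% of crawl budget")
```

A folder near the top of this list takes a much larger share of Googlebot's requests than of your traffic. That is the pattern the chart is meant to surface.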
2. Prioritisation
[Diagram: prioritising URLs, in three steps]
example.com/article (with Full and Print versions)
example.com/article/full
example.com/article/print
example.com/article/pdf
3. Spot bugs & view site health
Search Console: errors are delayed, with a limit of 1,000
4. How important does Google think each part of your site is?
My SEO was as bad as my design
But at least my hair was better
teflsearch.com
teflsearch.com/job-results
teflsearch.com/job-results/country/china
teflsearch.com/job-advert3455
Average number of times Googlebot crawled a template:
1. teflsearch.com
2. teflsearch.com/job-results
3. teflsearch.com/job-results/country/china
4. teflsearch.com/job-advert3455
[Chart callout: teflsearch.com/job-results, 35%]
5. How fresh does Google think your content is?
bit.ly/moz-fresh
[Chart: Average number of times a page template is crawled by Googlebot]
● Improve our internal linking
● Build trust with the last modified date in the sitemap
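In the sitemap protocol, the last modified date goes in the `<lastmod>` element. Here is a minimal sketch using Python's standard library. It assumes you can supply an accurate modification date for each URL; the date only builds trust if it reflects real changes to the page. The URL below is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap <urlset> element from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
        # W3C Datetime format: a plain YYYY-MM-DD date is allowed.
        ET.SubElement(url_el, "lastmod").text = last_modified.isoformat()
    return urlset


# Hypothetical page; use the date its content really changed.
sitemap = build_sitemap([("https://www.example.com/jobs/123", date(2017, 3, 1))])
print(ET.tostring(sitemap, encoding="unicode"))
```

To write a real file with an XML declaration, you could use `ET.ElementTree(sitemap).write("sitemap.xml", encoding="utf-8", xml_declaration=True)`.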
PART 2: THE HOW: Getting logs
Talk to a developer and ask for information
Are all the logs in one place?
Hi x
I’m {x} from {y}, and we’ve been asked to do some log analysis to better understand how Google is behaving on the website. I was hoping you could help with some questions about the log set-up (as well as with getting the logs!).
What we’d ideally like is 3-6 months of historical logs for the website. Our goal is to look at all the different pages search engines are crawling on our website, discover where they’re spending their time, see the status code errors they’re finding, etc.
There are also some things that are really helpful for us to know when getting logs.
Do the logs have any personal information in them?
We’re only concerned with the various search engine crawlers, like Google and Bing; we don’t need any logs from users, so any logs with emails, telephone numbers, etc. can be removed.
Do you have any sort of caching which would create separate sets of logs?
Is there anything like Varnish running on the server, or a CDN, which might create logs in a different location from the rest of your server? If so, we will need those logs as well as those from the server. (Although we’re only concerned about a CDN if it’s caching pages or serving from the same hostname; if you’re just using Cloudflare, for example, to cache external images, then we don’t need it.)
Are there any parts of your site which log to a different place?
Have you got anything like an embedded WordPress blog which logs to a different location? If so, we’ll need those logs as well.
Do you log hostname?
It’s really useful for us to be able to see the hostname in the logs. By default, a lot of common server logging set-ups don’t log the hostname, so if it’s not turned on, it would be very useful to turn it on now for any future analysis.
Is there anything else we should know?
Best,
{x}
Email for a developer
So we might have something that looks like this
PART 2: THE HOW: Analysing logs
BigQuery
Google’s online database for data analysis.
1. Ask powerful questions
2. Repeatable
3. Scalable
4. Combine with crawl data
5. Easy to set-up
6. Easy to learn
What do we want from analysing our logs?
9,000,000 rows of data for 2 months.
400-800 queries
PART 2: THE HOW: Processing logs
Format the logs so we can import them into BigQuery
Separate the Googlebot logs from all the other logs
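If you write your own code, you can do both processing steps in one pass. Here is a minimal sketch. It assumes combined-format logs and a BigQuery table with the columns listed in `COLUMNS`; that schema is an assumption, not the talk's. The filter also discards all user traffic, which helps with the personal-information concern in the email. It trusts the user-agent string, but any client can claim to be Googlebot. For a stricter filter, Google recommends verifying the requesting IP with a reverse DNS lookup.

```python
import csv
import io
import re
from datetime import datetime, timezone

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)
# Hypothetical BigQuery schema: one CSV column per field.
COLUMNS = ["timestamp", "ip", "method", "path", "status", "size", "user_agent"]


def googlebot_rows(lines):
    """Yield one dict per line whose user agent claims to be Googlebot."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None or "Googlebot" not in match["user_agent"]:
            continue  # drop malformed lines and all non-Googlebot traffic
        row = match.groupdict()
        ts = datetime.strptime(row["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
        # UTC "YYYY-MM-DD HH:MM:SS" loads into a BigQuery TIMESTAMP column.
        row["timestamp"] = ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        if row["size"] == "-":
            row["size"] = ""  # an empty CSV field loads as NULL
        yield {col: row[col] for col in COLUMNS}


def write_csv(lines, out):
    """Write Googlebot rows as CSV to the file-like `out`; return the row count."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    count = 0
    for row in googlebot_rows(lines):
        writer.writerow(row)
        count += 1
    return count


sample = [
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.7 - - [23/Aug/2010:03:51:02 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
buffer = io.StringIO()
print(write_csv(sample, buffer), "Googlebot row(s) written")
print(buffer.getvalue())
```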
Two ways to do it:
Screaming Frog Log Analyser
Code something (bit.ly/logs-code)
PART 2: THE HOW: Analysing logs
Our data in BQ: we make sure we got what we wanted
THE QUESTION: What is the total number of requests Googlebot makes each day to our site?
Our first SQL query
-- BigQuery legacy SQL: [dataset.table] is the legacy table syntax.
-- Assumes the table holds only Googlebot requests (filtered during processing).
SELECT
  DATE(timestamp) AS date,
  COUNT(*) AS number_of_requests
FROM
  [mydata.log_analysis]
GROUP BY
  date
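You can check the shape of this query without a BigQuery project. Below, the same per-day count runs against an in-memory SQLite table from Python. This is a local stand-in, not BigQuery, and the syntax differs slightly: there are no square brackets around the table name. SQLite's `DATE()` truncates an ISO `YYYY-MM-DD HH:MM:SS` string to its date. The three rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status INTEGER)")
# Hypothetical Googlebot requests, already filtered during processing.
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [
        ("2017-03-01 03:50:59", "/my_homepage", 200),
        ("2017-03-01 11:02:13", "/jobs/123", 200),
        ("2017-03-02 08:15:40", "/my_homepage", 301),
    ],
)

# Same logic as the query above: truncate each timestamp to a date, then count per date.
rows = conn.execute(
    """
    SELECT DATE(timestamp) AS date, COUNT(*) AS number_of_requests
    FROM log_analysis
    GROUP BY date
    ORDER BY date
    """
).fetchall()
print(rows)  # [('2017-03-01', 2), ('2017-03-02', 1)]
```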
[Chart: Comparing logs to GSC crawl volume (number of requests)]
The loop:
Run queries
Find something weird
Go look at crawl & website
Back to our data in BQ
1. Diagnose crawling & indexation issues
2. Prioritisation
3. Spot bugs & view site health
4. How important does Google think each part of your site is?
5. How fresh does Google think your content is?
1. Diagnose crawling & indexation issues
4. How important does Google think each part of your site is?
What are the top 20 URLs crawled by Google over our logs?
Login is my top crawled page, and then search?
What are the top 20 page_path_1 folders crawled by Google over our logs?
Location folders are taking more than 70% of my budget
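Both "top 20" questions come down to counting and sorting. Here is a minimal Python sketch. It assumes you have a list of request paths from the Googlebot logs. In this sketch, `page_path_1` simply means the first folder of the path, and the sample paths are hypothetical. Query strings are kept, so `/search?q=a` and `/search?q=b` count as separate URLs.

```python
from collections import Counter
from urllib.parse import urlsplit


def page_path_1(path):
    """First folder of a path, e.g. '/locations/london?page=2' -> 'locations'."""
    segments = [s for s in urlsplit(path).path.split("/") if s]
    return segments[0] if segments else "(root)"


def top_urls(paths, n=20):
    """Most requested URLs (query strings kept)."""
    return Counter(paths).most_common(n)


def top_folders(paths, n=20):
    """(folder, requests, % of all Googlebot requests) for the n most crawled folders."""
    counts = Counter(page_path_1(p) for p in paths)
    total = sum(counts.values())
    return [(folder, hits, round(100 * hits / total, 1))
            for folder, hits in counts.most_common(n)]


# Hypothetical Googlebot request paths.
paths = ["/login", "/login", "/search?q=tefl", "/locations/london",
         "/locations/paris", "/locations/rome", "/locations/london"]
print(top_urls(paths, 2))
print(top_folders(paths))
```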
Getting data by the day
Page     Number of Googlebot requests
page1    200,000
page2    120,000
[Chart: Number of Googlebot requests day by day]
3. Spot bugs & view site health
How many of each status code does Google find per day over our logs?
[Chart: Number of Googlebot requests day by day]
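Counting status codes per day means grouping on two keys: the date and the status code. Here is a minimal sketch. It assumes each Googlebot request has already been parsed into a `(timestamp, status)` pair. The records are made up.

```python
from collections import Counter, defaultdict
from datetime import datetime


def status_codes_per_day(records):
    """Group (timestamp, status) pairs into {date: Counter({status: count})}."""
    per_day = defaultdict(Counter)
    for ts, status in records:
        per_day[ts.date()][status] += 1
    return dict(per_day)


# Hypothetical parsed Googlebot requests.
records = [
    (datetime(2017, 3, 1, 3, 50), 200),
    (datetime(2017, 3, 1, 9, 12), 404),
    (datetime(2017, 3, 1, 17, 30), 200),
    (datetime(2017, 3, 2, 8, 5), 301),
]
for day, codes in sorted(status_codes_per_day(records).items()):
    print(day, dict(codes))
```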
Which 404 URLs has Googlebot requested most over the past 30 days?
Boy, does it want that ad-tech snippet
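The 404 question applies the same kind of count, restricted to a date window. Here is a minimal sketch. It assumes parsed `(timestamp, path, status)` records, and it takes an explicit `now` so the 30-day window is reproducible. The paths are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_404s(records, now, days=30, n=20):
    """Most requested 404 paths among (timestamp, path, status) records in the last `days` days."""
    cutoff = now - timedelta(days=days)
    counts = Counter(
        path for ts, path, status in records if status == 404 and cutoff <= ts <= now
    )
    return counts.most_common(n)


# Hypothetical parsed Googlebot requests.
records = [
    (datetime(2017, 3, 30), "/js/ad-tech-snippet.js", 404),
    (datetime(2017, 3, 29), "/js/ad-tech-snippet.js", 404),
    (datetime(2017, 3, 20), "/jobs/old-advert", 404),
    (datetime(2017, 3, 20), "/jobs/live-advert", 200),
    (datetime(2017, 1, 5), "/jobs/ancient-advert", 404),  # outside the 30-day window
]
print(top_404s(records, now=datetime(2017, 3, 31)))
```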
5. How fresh does Google think your content is?
How many times on average is each page in a page template crawled a day?
[Chart: Average number of times a page template is crawled by Googlebot]
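Averaging crawls per page per template needs two inputs. The first is Googlebot hits, classified into templates. The second is how many pages each template contains, which you could take from your own site crawl. Here is a minimal sketch. The URL rules are hypothetical and loosely modelled on the teflsearch.com examples earlier. The page counts and hits are invented.

```python
import re
from collections import Counter

# Hypothetical URL rules mapping paths to page templates. The first match wins,
# so the more specific country rule sits above the general results rule.
TEMPLATE_RULES = [
    ("homepage", re.compile(r"^/$")),
    ("country results", re.compile(r"^/job-results/country/")),
    ("job results", re.compile(r"^/job-results")),
    ("job advert", re.compile(r"^/job-?advert\d+")),
]


def template_for(path):
    """Return the name of the first template whose rule matches the path."""
    for name, pattern in TEMPLATE_RULES:
        if pattern.match(path):
            return name
    return "other"


def avg_daily_crawls_per_page(paths, pages_in_template, days):
    """Googlebot hits per page per day, for each template with a known page count."""
    hits = Counter(template_for(p) for p in paths)
    return {
        template: hits[template] / (page_count * days)
        for template, page_count in pages_in_template.items()
        if page_count > 0
    }


# Hypothetical two days of Googlebot paths, and template sizes from a site crawl.
paths = (["/"] * 4 + ["/job-results"] * 6 + ["/job-results/country/china"] * 2
         + ["/job-advert3455", "/job-advert12"])
pages_in_template = {"homepage": 1, "job results": 1, "country results": 50, "job advert": 1000}
for template, avg in avg_daily_crawls_per_page(paths, pages_in_template, days=2).items():
    print(f"{template:<16}{avg:.3f}")
```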
How long does it take for a page to be discovered after being published?
What are the top 20 combinations of page_path_1 & page_path_2 folders crawled by Google over the time period of our logs?
Which pages requested by Googlebot don’t appear in our crawl?
What are the top non-canonical pages being crawled?
Which are the most crawled parameters on the website?
How often are the most visited parameters crawled each day?
Which directories have the most 301 & 404 status codes?
Which pages are crawled both with and without parameters?
Which pages are only partly downloaded?
How many hits does each section get, when the sections are classified in an external dataset?
What percentage of a directory was crawled over the past 30 days?
What is the total number of requests across two different time periods?
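Many of these questions become a join between the logs and another dataset. The sketch below covers two of them: the first question (publish date vs. Googlebot's first request) and the logs-versus-crawl comparison, done as a set difference. It assumes you have a publish date for each path. It also assumes crawl URLs have been normalised to the same path format as the logs. All names and data are hypothetical.

```python
from datetime import datetime, timedelta


def discovery_lag(published, googlebot_hits):
    """Time from publishing to Googlebot's first request, per path (None if not yet crawled).

    published: {path: publish datetime}
    googlebot_hits: iterable of (timestamp, path) from the logs
    """
    first_seen = {}
    for ts, path in googlebot_hits:
        if path in published and ts >= published[path]:
            if path not in first_seen or ts < first_seen[path]:
                first_seen[path] = ts
    return {
        path: first_seen[path] - pub if path in first_seen else None
        for path, pub in published.items()
    }


def requested_but_not_crawled(googlebot_paths, crawl_paths):
    """Paths Googlebot requested that your own crawl never found."""
    return sorted(set(googlebot_paths) - set(crawl_paths))


# Hypothetical publish dates, log hits and crawl results.
published = {"/jobs/1": datetime(2017, 3, 1, 9), "/jobs/2": datetime(2017, 3, 2, 9)}
hits = [
    (datetime(2017, 3, 3, 9), "/jobs/1"),
    (datetime(2017, 3, 1, 15), "/jobs/1"),
    (datetime(2017, 2, 28, 12), "/old-page"),
]
print(discovery_lag(published, hits))
print(requested_but_not_crawled([path for _, path in hits], ["/jobs/1", "/jobs/2"]))
```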
That’s a lot of questions
bit.ly/logs-resource
In Summary
This is the thing you’re probably not doing
bit.ly/logs-resource
@dom_woodman

More Related Content

What's hot

Improving Crawling and Indexing using Real-Time Log File Insights
Improving Crawling and Indexing using Real-Time Log File InsightsImproving Crawling and Indexing using Real-Time Log File Insights
Improving Crawling and Indexing using Real-Time Log File InsightsSteven van Vessum
 
Martin McGarry - SEO strategy c/o England manager Gareth Southgate
Martin McGarry - SEO strategy c/o England manager Gareth SouthgateMartin McGarry - SEO strategy c/o England manager Gareth Southgate
Martin McGarry - SEO strategy c/o England manager Gareth SouthgateMartin McGarry
 
Crawl Budget: Everything you Need to Know
Crawl Budget: Everything you Need to KnowCrawl Budget: Everything you Need to Know
Crawl Budget: Everything you Need to KnowSallyR7
 
BrightonSEO - NLP for SEOs - How to optimise your content for BERT.pptx
BrightonSEO - NLP for SEOs - How to optimise your content for BERT.pptxBrightonSEO - NLP for SEOs - How to optimise your content for BERT.pptx
BrightonSEO - NLP for SEOs - How to optimise your content for BERT.pptxJosephineHaagen
 
SEO Tool Overload😱... Google Data Studio to the rescue
SEO Tool Overload😱... Google Data Studio to the rescueSEO Tool Overload😱... Google Data Studio to the rescue
SEO Tool Overload😱... Google Data Studio to the rescueNils De Moor
 
How to get more traffic with less content - BrightonSEO
How to get more traffic with less content - BrightonSEOHow to get more traffic with less content - BrightonSEO
How to get more traffic with less content - BrightonSEOAnna Gregory-Hall
 
How to leverage indexation tracking to monitor issues and improve performance
How to leverage indexation tracking to monitor issues and improve performanceHow to leverage indexation tracking to monitor issues and improve performance
How to leverage indexation tracking to monitor issues and improve performanceSimon Lesser
 
Assessing Remote Talent to Scale Up SEO Success
Assessing Remote Talent to Scale Up SEO Success Assessing Remote Talent to Scale Up SEO Success
Assessing Remote Talent to Scale Up SEO Success Begum Kaya
 
Google Sheets For SEO - Tom Pool - London SEO Meetup XL
Google Sheets For SEO - Tom Pool - London SEO Meetup XLGoogle Sheets For SEO - Tom Pool - London SEO Meetup XL
Google Sheets For SEO - Tom Pool - London SEO Meetup XLTom Pool
 
I Am A Donut - How To Avoid International SEO Mistakes
I Am A Donut - How To Avoid International SEO MistakesI Am A Donut - How To Avoid International SEO Mistakes
I Am A Donut - How To Avoid International SEO MistakesTom Brennan
 
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina StoyData Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina StoyLazarinaStoyanova
 
Product, service and category page links (and how to get them) - Rebecca Moss...
Product, service and category page links (and how to get them) - Rebecca Moss...Product, service and category page links (and how to get them) - Rebecca Moss...
Product, service and category page links (and how to get them) - Rebecca Moss...Rebecca Moss
 
Why your tech optimisations are still sat in the backlog
Why your tech optimisations are still sat in the backlogWhy your tech optimisations are still sat in the backlog
Why your tech optimisations are still sat in the backlogVicky481083
 
How to do User Research on a shoestring budget
How to do User Research on a shoestring budgetHow to do User Research on a shoestring budget
How to do User Research on a shoestring budgetAngus Carbarns
 
Goodbye SEO fck ups! Learn to set an SEO Quality Assurance Framework
Goodbye SEO fck ups! Learn to set an SEO Quality Assurance FrameworkGoodbye SEO fck ups! Learn to set an SEO Quality Assurance Framework
Goodbye SEO fck ups! Learn to set an SEO Quality Assurance FrameworkAleyda Solís
 
Canonicalization for SEO BrightonSEO April 2023 Patrick Stox
Canonicalization for SEO BrightonSEO April 2023 Patrick StoxCanonicalization for SEO BrightonSEO April 2023 Patrick Stox
Canonicalization for SEO BrightonSEO April 2023 Patrick StoxAhrefs
 
HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO Apri...
HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO  Apri...HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO  Apri...
HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO Apri...Jessica Maloney
 
Data-driven SEO & content strategy to reduce your customer acquisition costs
Data-driven SEO & content strategy to reduce your customer acquisition costsData-driven SEO & content strategy to reduce your customer acquisition costs
Data-driven SEO & content strategy to reduce your customer acquisition costsadlift
 
How to Incorporate ML in your SERP Analysis, Lazarina Stoy -BrightonSEO Oct, ...
How to Incorporate ML in your SERP Analysis, Lazarina Stoy -BrightonSEO Oct, ...How to Incorporate ML in your SERP Analysis, Lazarina Stoy -BrightonSEO Oct, ...
How to Incorporate ML in your SERP Analysis, Lazarina Stoy -BrightonSEO Oct, ...LazarinaStoyanova
 
EAT: Have We Been Looking At It Backwards
EAT: Have We Been Looking At It BackwardsEAT: Have We Been Looking At It Backwards
EAT: Have We Been Looking At It BackwardsEdwardZiubrzynski1
 

What's hot (20)

Improving Crawling and Indexing using Real-Time Log File Insights
Improving Crawling and Indexing using Real-Time Log File InsightsImproving Crawling and Indexing using Real-Time Log File Insights
Improving Crawling and Indexing using Real-Time Log File Insights
 
Martin McGarry - SEO strategy c/o England manager Gareth Southgate
Martin McGarry - SEO strategy c/o England manager Gareth SouthgateMartin McGarry - SEO strategy c/o England manager Gareth Southgate
Martin McGarry - SEO strategy c/o England manager Gareth Southgate
 
Crawl Budget: Everything you Need to Know
Crawl Budget: Everything you Need to KnowCrawl Budget: Everything you Need to Know
Crawl Budget: Everything you Need to Know
 
BrightonSEO - NLP for SEOs - How to optimise your content for BERT.pptx
BrightonSEO - NLP for SEOs - How to optimise your content for BERT.pptxBrightonSEO - NLP for SEOs - How to optimise your content for BERT.pptx
BrightonSEO - NLP for SEOs - How to optimise your content for BERT.pptx
 
SEO Tool Overload😱... Google Data Studio to the rescue
SEO Tool Overload😱... Google Data Studio to the rescueSEO Tool Overload😱... Google Data Studio to the rescue
SEO Tool Overload😱... Google Data Studio to the rescue
 
How to get more traffic with less content - BrightonSEO
How to get more traffic with less content - BrightonSEOHow to get more traffic with less content - BrightonSEO
How to get more traffic with less content - BrightonSEO
 
How to leverage indexation tracking to monitor issues and improve performance
How to leverage indexation tracking to monitor issues and improve performanceHow to leverage indexation tracking to monitor issues and improve performance
How to leverage indexation tracking to monitor issues and improve performance
 
Assessing Remote Talent to Scale Up SEO Success
Assessing Remote Talent to Scale Up SEO Success Assessing Remote Talent to Scale Up SEO Success
Assessing Remote Talent to Scale Up SEO Success
 
Google Sheets For SEO - Tom Pool - London SEO Meetup XL
Google Sheets For SEO - Tom Pool - London SEO Meetup XLGoogle Sheets For SEO - Tom Pool - London SEO Meetup XL
Google Sheets For SEO - Tom Pool - London SEO Meetup XL
 
I Am A Donut - How To Avoid International SEO Mistakes
I Am A Donut - How To Avoid International SEO MistakesI Am A Donut - How To Avoid International SEO Mistakes
I Am A Donut - How To Avoid International SEO Mistakes
 
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina StoyData Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy
 
Product, service and category page links (and how to get them) - Rebecca Moss...
Product, service and category page links (and how to get them) - Rebecca Moss...Product, service and category page links (and how to get them) - Rebecca Moss...
Product, service and category page links (and how to get them) - Rebecca Moss...
 
Why your tech optimisations are still sat in the backlog
Why your tech optimisations are still sat in the backlogWhy your tech optimisations are still sat in the backlog
Why your tech optimisations are still sat in the backlog
 
How to do User Research on a shoestring budget
How to do User Research on a shoestring budgetHow to do User Research on a shoestring budget
How to do User Research on a shoestring budget
 
Goodbye SEO fck ups! Learn to set an SEO Quality Assurance Framework
Goodbye SEO fck ups! Learn to set an SEO Quality Assurance FrameworkGoodbye SEO fck ups! Learn to set an SEO Quality Assurance Framework
Goodbye SEO fck ups! Learn to set an SEO Quality Assurance Framework
 
Canonicalization for SEO BrightonSEO April 2023 Patrick Stox
Canonicalization for SEO BrightonSEO April 2023 Patrick StoxCanonicalization for SEO BrightonSEO April 2023 Patrick Stox
Canonicalization for SEO BrightonSEO April 2023 Patrick Stox
 
HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO Apri...
HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO  Apri...HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO  Apri...
HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO Apri...
 
Data-driven SEO & content strategy to reduce your customer acquisition costs
Data-driven SEO & content strategy to reduce your customer acquisition costsData-driven SEO & content strategy to reduce your customer acquisition costs
Data-driven SEO & content strategy to reduce your customer acquisition costs
 
How to Incorporate ML in your SERP Analysis, Lazarina Stoy -BrightonSEO Oct, ...
How to Incorporate ML in your SERP Analysis, Lazarina Stoy -BrightonSEO Oct, ...How to Incorporate ML in your SERP Analysis, Lazarina Stoy -BrightonSEO Oct, ...
How to Incorporate ML in your SERP Analysis, Lazarina Stoy -BrightonSEO Oct, ...
 
EAT: Have We Been Looking At It Backwards
EAT: Have We Been Looking At It BackwardsEAT: Have We Been Looking At It Backwards
EAT: Have We Been Looking At It Backwards
 

Similar to A Guide to Log Analysis with Big Query

SearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your Logs
SearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your LogsSearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your Logs
SearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your LogsDistilled
 
SEO for Large/Enterprise Websites - Data & Tech Side
SEO for Large/Enterprise Websites - Data & Tech SideSEO for Large/Enterprise Websites - Data & Tech Side
SEO for Large/Enterprise Websites - Data & Tech SideDominic Woodman
 
Log analysis and pro use cases for search marketers online version (1)
Log analysis and pro use cases for search marketers online version (1)Log analysis and pro use cases for search marketers online version (1)
Log analysis and pro use cases for search marketers online version (1)David Sottimano
 
Jamie Alberico — How to Leverage Insights from Your Site’s Server Logs | 5 Ho...
Jamie Alberico — How to Leverage Insights from Your Site’s Server Logs | 5 Ho...Jamie Alberico — How to Leverage Insights from Your Site’s Server Logs | 5 Ho...
Jamie Alberico — How to Leverage Insights from Your Site’s Server Logs | 5 Ho...Semrush
 
Deep crawl the chaotic landscape of JavaScript
Deep crawl the chaotic landscape of JavaScript Deep crawl the chaotic landscape of JavaScript
Deep crawl the chaotic landscape of JavaScript Onely
 
Google Tag Manager for Ecommerce
Google Tag Manager for EcommerceGoogle Tag Manager for Ecommerce
Google Tag Manager for EcommerceDaytodayebay
 
Technical SEO: Crawl Space Management - SEOZone Istanbul 2014
Technical SEO: Crawl Space Management - SEOZone Istanbul 2014Technical SEO: Crawl Space Management - SEOZone Istanbul 2014
Technical SEO: Crawl Space Management - SEOZone Istanbul 2014Bastian Grimm
 
02.03.21 Collaborator.pro Webinar Решение 10 главных задач технической оптими...
02.03.21 Collaborator.pro Webinar Решение 10 главных задач технической оптими...02.03.21 Collaborator.pro Webinar Решение 10 главных задач технической оптими...
02.03.21 Collaborator.pro Webinar Решение 10 главных задач технической оптими...Vladislav Morgun
 
Website Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic PantaleonWebsite Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic PantaleonJacque Doring
 
SEARCH Y : Benjamin Bussière - Javascript and seo misconceptions, misunders...
SEARCH Y :  Benjamin Bussière - Javascript and seo  misconceptions, misunders...SEARCH Y :  Benjamin Bussière - Javascript and seo  misconceptions, misunders...
SEARCH Y : Benjamin Bussière - Javascript and seo misconceptions, misunders...SEARCH Y - Philippe Yonnet Evénements
 
TFM - Using Google Tag Manager for ecom
TFM - Using Google Tag Manager for ecom TFM - Using Google Tag Manager for ecom
TFM - Using Google Tag Manager for ecom Gerry White
 
Analysis report didm
Analysis report didmAnalysis report didm
Analysis report didmriyabansal29
 
WordPress SEO in 2014 - WordCamp Baltimore 2014
WordPress SEO in 2014 - WordCamp Baltimore 2014WordPress SEO in 2014 - WordCamp Baltimore 2014
WordPress SEO in 2014 - WordCamp Baltimore 2014Arsham Mirshah
 
Javascript SEO - Leicester Digital May 2018
Javascript SEO - Leicester Digital May 2018Javascript SEO - Leicester Digital May 2018
Javascript SEO - Leicester Digital May 2018Kieran Headley
 
Demand Quest SEO training session 2
Demand Quest SEO training session 2Demand Quest SEO training session 2
Demand Quest SEO training session 2Nate Plaunt
 
BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016
BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016
BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016Mark Thomas
 
Demand Quest SEO Training - Session 2
Demand Quest SEO Training - Session 2Demand Quest SEO Training - Session 2
Demand Quest SEO Training - Session 2Nate Plaunt
 
Seo for Engineers
Seo for EngineersSeo for Engineers
Seo for EngineersCort Tafoya
 

Similar to A Guide to Log Analysis with Big Query (20)

SearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your Logs
SearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your LogsSearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your Logs
SearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your Logs
 
SEO for Large/Enterprise Websites - Data & Tech Side
SEO for Large/Enterprise Websites - Data & Tech SideSEO for Large/Enterprise Websites - Data & Tech Side
SEO for Large/Enterprise Websites - Data & Tech Side
 
Log analysis and pro use cases for search marketers online version (1)
Log analysis and pro use cases for search marketers online version (1)Log analysis and pro use cases for search marketers online version (1)
Log analysis and pro use cases for search marketers online version (1)
 
Jamie Alberico — How to Leverage Insights from Your Site’s Server Logs | 5 Ho...
Jamie Alberico — How to Leverage Insights from Your Site’s Server Logs | 5 Ho...Jamie Alberico — How to Leverage Insights from Your Site’s Server Logs | 5 Ho...
Jamie Alberico — How to Leverage Insights from Your Site’s Server Logs | 5 Ho...
 
Deep crawl the chaotic landscape of JavaScript
Deep crawl the chaotic landscape of JavaScript Deep crawl the chaotic landscape of JavaScript
Deep crawl the chaotic landscape of JavaScript
 
Google Tag Manager for Ecommerce
Google Tag Manager for EcommerceGoogle Tag Manager for Ecommerce
Google Tag Manager for Ecommerce
 
SEO for Large Websites
SEO for Large WebsitesSEO for Large Websites
SEO for Large Websites
 
Technical SEO: Crawl Space Management - SEOZone Istanbul 2014
More from Dominic Woodman

How to Succeed in B2B SEO
19 Lessons I learned from a year of SEO split testing
Information Architecture for SEOs - Matching intent to pages & internal linki...
Debugging SEO - Language & Breaking Down
How a year of SEO split testing changed how I thought SEO worked
Matching Keywords to Pages - Information Architecture
Split Testing for SEO - 9 Months of Learning
What is AMP and do I care?

A Guide to Log Analysis with Big Query

  • 18. Why hasn’t Google seen the changes on my page?
  • 19. How should I prioritise errors in Search Console?
  • 20. Are my canonicals being respected?
  • 21. Does Google think this page is important?
  • 28. PART 1: THE WHY (What can you do with logs?) | PART 2: THE HOW (Getting logs, Analysing logs, Processing logs)
  • 31-38. What does a log look like?
      123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
      IP address: 123.65.150.10
      Timestamp: 23/Aug/2010:03:50:59 +0000
      Request type: GET
      Page requested (here, the homepage): /my_homepage
      Protocol: HTTP/1.1
      Status code: 200
      Size of the page (in bytes): 2262
      User agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
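The example line follows the common Apache/Nginx "combined" log format. Here is a minimal Python sketch that splits such a line into the fields labelled on slides 31-38. It assumes your server logs in that format (a custom format, for example one that adds the hostname, needs a different pattern); the regex and the `parse_line` name are illustrative, not from the talk.

```python
import re
from datetime import datetime

# Assumption: the server writes the "combined" log format shown in the example line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Split one log line into named fields, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # "23/Aug/2010:03:50:59 +0000" -> timezone-aware datetime
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A "-" size means no response body was sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

line = ('123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" '
        '200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
record = parse_line(line)
print(record["path"], record["status"], record["size"])  # /my_homepage 200 2262
```

Lines that don't match (for example malformed requests) come back as None, so they can be counted separately rather than silently dropped.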
  • 39. PART 1: THE WHY (What can you do with logs?) | PART 2: THE HOW (Getting logs, Analysing logs, Processing logs)
  • 40. 5 things
  • 41. 1. Diagnose crawling & indexation issues
  • 44. Chart: the five folders Googlebot crawled the most (number of requests)
  • 46. Chart: % of organic sessions vs. % of crawl budget
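Slide 46 compares each folder's share of organic sessions with its share of Googlebot requests. A minimal sketch of that arithmetic in Python; the folder names and numbers are made up, and in practice the sessions would come from an analytics export and the requests from the Googlebot logs.

```python
def share(counts):
    """Convert raw counts per folder into percentages of the total."""
    total = sum(counts.values())
    return {folder: 100 * n / total for folder, n in counts.items()}

# Hypothetical numbers: sessions from analytics, requests from Googlebot log lines
organic_sessions = {"jobs": 6000, "location": 1000, "blog": 3000}
googlebot_requests = {"jobs": 2000, "location": 7000, "blog": 1000}

session_pct = share(organic_sessions)
crawl_pct = share(googlebot_requests)

for folder in sorted(set(session_pct) | set(crawl_pct)):
    s, c = session_pct.get(folder, 0.0), crawl_pct.get(folder, 0.0)
    # A positive gap means the folder takes more crawl budget than its share of organic traffic
    print(f"{folder}: {s:.0f}% of sessions, {c:.0f}% of crawl budget ({c - s:+.0f} points)")
```

With these made-up numbers, "location" gets 70% of the crawl for 10% of the sessions, which is the kind of mismatch the chart is meant to surface.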
  • 57. 3. Spot bugs & view site health
  • 58. Delayed errors with a limit of 1000
  • 59.
  • 60. 4. How important does Google think each part of your site is?
  • 61. My SEO was as bad as my design
  • 62. But at least my hair was better
  • 67. Average number of times Googlebot crawled a template
  • 68.
      1. teflsearch.com
      2. teflsearch.com/job-results
      3. teflsearch.com/job-results/country/china
      4. teflsearch.com/job-advert3455
  • 71. Average number of times Googlebot crawled a template 35%
  • 72. 5. How fresh does Google think your content is?
  • 74. Average number of times a page template is crawled by Googlebot
  • 75. Improve our internal linking; build trust with the last modified date in the sitemap
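One way to act on the last-modified suggestion is to generate the sitemap from the dates your content actually changed. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming you already have each URL's last-modified date; the example.com URLs and the `build_sitemap` name are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of (absolute URL, date the content last changed)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # YYYY-MM-DD is a valid W3C Datetime value for <lastmod>
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap([
    ("https://www.example.com/job-results", date(2024, 1, 15)),
    ("https://www.example.com/job-advert3455", date(2024, 1, 10)),
])
print(sitemap_xml)
```

The idea, as we read the slide, is that a lastmod which only changes when the page does is more likely to be trusted than one stamped with today's date on every build.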
  • 77. PART 1: THE WHY (What can you do with logs?) | PART 2: THE HOW (Getting logs, Analysing logs, Processing logs)
  • 81. Talk to a developer and ask for information
  • 82. Are all the logs in one place?
  • 83. Email for a developer:
      Hi x,
      I’m {x} from {y}, and we’ve been asked to do some log analysis to better understand how Google behaves on the website. I was hoping you could help with some questions about the log set-up (as well as with getting the logs!).
      What we’d ideally like is 3-6 months of historical logs for the website. Our goal is to look at all the different pages search engines are crawling on our website, discover where they’re spending their time, the status code errors they’re finding, etc.
      There are also some things that are really helpful for us to know when getting logs.
      Do the logs contain any personal information? We’re only concerned with search engine crawlers like Google and Bing; we don’t need any logs from users, so any log lines with emails, telephone numbers, etc. can be removed.
      Do you have any sort of caching that would create separate sets of logs? Is there anything like Varnish running on the server, or a CDN that might create logs in a different location from the rest of your server? If so, we will need those logs as well as the ones from the server. (We’re only concerned about a CDN if it’s caching pages or serving from the same hostname; if you’re just using Cloudflare to cache external images, for example, we don’t need it.)
      Are there any sub-parts of your site that log to a different place? Have you got anything like an embedded WordPress blog that logs to a different location? If so, we’ll need those logs as well.
      Do you log hostname? It’s really useful for us to be able to see the hostname in the logs. By default, a lot of common server logging set-ups don’t log the hostname, so if it’s not turned on, it would be very useful to turn it on now for any future analysis.
      Is there anything else we should know?
      Best,
      {x}
  • 84. So we might have something that looks like this
  • 85. PART 1: THE WHY (What can you do with logs?) | PART 2: THE HOW (Getting logs, Analysing logs, Processing logs)
  • 92. BigQuery: Google’s online database for data analysis.
  • 93. What do we want from analysing our logs?
      1. Ask powerful questions
      2. Repeatable
      3. Scalable
      4. Combine with crawl data
      5. Easy to set up
      6. Easy to learn
  • 99. 9,000,000 rows of data for 2 months; 400-800 queries
  • 101. PART 1: THE WHY (What can you do with logs?) | PART 2: THE HOW (Getting logs, Analysing logs, Processing logs)
  • 102. Format the logs so we can import them into BigQuery; separate the Googlebot logs from all the other logs
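Both processing steps can be sketched in a few lines of Python: parse each line, keep the ones whose user agent mentions Googlebot, and write a CSV that can be uploaded to BigQuery. This assumes the combined log format from the example line; the column names and function names are illustrative (slide 104 names Screaming Frog Log Analyser as a tool for this stage).

```python
import csv
import io
import re
from datetime import datetime, timezone

# Assumption: combined log format; adjust the pattern for custom formats.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<user_agent>[^"]*)"'
)
COLUMNS = ["timestamp", "ip", "method", "path", "status", "size", "user_agent"]

def googlebot_rows(lines):
    """Yield one dict per line whose user agent claims to be Googlebot."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None or "Googlebot" not in match["user_agent"]:
            continue  # unparseable line, or not Googlebot
        row = match.groupdict()
        ts = datetime.strptime(row["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
        # UTC "YYYY-MM-DD HH:MM:SS" is a timestamp layout BigQuery's CSV loader accepts
        row["timestamp"] = ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        row["size"] = "0" if row["size"] == "-" else row["size"]
        yield row

def write_csv(rows, out):
    """Write rows with a header line, ready to upload as a CSV file."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

sample = [
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.7 - - [23/Aug/2010:04:02:11 +0000] "GET /login HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
buffer = io.StringIO()
write_csv(googlebot_rows(sample), buffer)
print(buffer.getvalue())
```

Matching on the user agent is only a first pass, since anyone can send a Googlebot user agent. Google documents verifying Googlebot with a reverse DNS lookup on the IP followed by a forward lookup, and the editor's notes also mention reverse DNS and ASN lookups.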
  • 104. Screaming Frog Log Analyser
  • 108. PART 1: THE WHY (What can you do with logs?) | PART 2: THE HOW (Getting logs, Analysing logs, Processing logs)
  • 109. Our data in BQ
  • 110. We make sure we got what we wanted
  • 111. THE QUESTION: What is the total number of requests Googlebot makes each day to our site?
  • 112. Our first SQL query SELECT timestamp FROM [mydata.log_analysis]
  • 114. Our first SQL query SELECT DATE(timestamp) FROM [mydata.log_analysis]
  • 116. Our first SQL query SELECT DATE(timestamp) as date FROM [mydata.log_analysis]
  • 118. Our first SQL query SELECT DATE(timestamp) as date, count(*) FROM [mydata.log_analysis]
  • 119. Our first SQL query SELECT DATE(timestamp) as date, count(*) FROM [mydata.log_analysis] GROUP BY date
  • 120. Our first SQL query SELECT DATE(timestamp) as date, count(*) as number_of_requests FROM [mydata.log_analysis] GROUP BY date
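The finished query can be tried locally before touching BigQuery. This sketch uses Python's built-in sqlite3 as a stand-in; only the timestamp column comes from the slides, while path, status and the sample rows are hypothetical. SQLite's DATE() on text timestamps behaves like BigQuery's DATE(timestamp) for this purpose, but the two dialects are not identical.

```python
import sqlite3

# Local stand-in for the BigQuery table; path and status are hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [
        ("2010-08-23 03:50:59", "/my_homepage", 200),
        ("2010-08-23 11:12:00", "/login", 200),
        ("2010-08-24 09:30:15", "/my_homepage", 200),
    ],
)

# Same shape as the slide's query: reduce each timestamp to a date, then count rows per date.
query = """
SELECT DATE(timestamp) AS date, COUNT(*) AS number_of_requests
FROM log_analysis
GROUP BY date
ORDER BY date
"""
daily = conn.execute(query).fetchall()
print(daily)  # [('2010-08-23', 2), ('2010-08-24', 1)]
```

The slides use BigQuery legacy SQL, where a table is written [mydata.log_analysis]; in standard SQL the same table is written `mydata.log_analysis` in backticks, and grouping by the date alias works there as well.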
  • 122. Comparing logs to GSC crawl volume Number of requests
  • 123. Run queries → Find something weird → Go look at the crawl & website
  • 124. Our data in BQ
  • 125. 1. Diagnose crawling & indexation issues
  • 127. 3. Spot bugs & view site health
  • 128. 4. How important does Google think each part of your site is?
  • 129. 5. How fresh does Google think your content is?
  • 130. 1. Diagnose crawling & indexation issues; 4. How important does Google think each part of your site is?
  • 131. What are the top 20 URLs crawled by Google over our logs?
  • 132. Login is my top crawled page and then search?
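Slide 131's question is a count per URL, sorted and limited to 20. A sketch using Python's sqlite3 as a local stand-in for BigQuery; the path and status columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [
        ("2010-08-23 03:50:59", "/login", 200),
        ("2010-08-23 04:10:00", "/login", 200),
        ("2010-08-23 05:00:00", "/search?q=tefl", 200),
        ("2010-08-23 06:00:00", "/my_homepage", 200),
        ("2010-08-24 07:00:00", "/login", 200),
        ("2010-08-24 08:00:00", "/search?q=tefl", 200),
    ],
)

# Count Googlebot requests per URL; ties are broken alphabetically so the output is stable.
top_urls = conn.execute("""
SELECT path, COUNT(*) AS number_of_requests
FROM log_analysis
GROUP BY path
ORDER BY number_of_requests DESC, path
LIMIT 20
""").fetchall()
print(top_urls)  # [('/login', 3), ('/search?q=tefl', 2), ('/my_homepage', 1)]
```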
  • 133. What are the top 20 page_path_1 folders crawled by Google over our logs?
  • 134. Location folders are taking more than 70% of my budget
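Slide 133 groups by page_path_1, the first folder in the URL. How that column was built isn't shown, so this Python sketch assumes it is simply the first path segment (query string ignored), and also works out each folder's share of all requests, the kind of figure behind "more than 70% of my budget". The sample paths are made up.

```python
from collections import Counter
from urllib.parse import urlsplit

def page_path_1(path):
    """Assumed definition: first folder of a URL path ('/jobs/china/123' -> 'jobs', '/' -> '')."""
    segments = [s for s in urlsplit(path).path.split("/") if s]
    return segments[0] if segments else ""

def folder_share(paths, top_n=20):
    """(folder, requests, % of all requests) for the top_n most requested folders."""
    counts = Counter(page_path_1(p) for p in paths)
    total = sum(counts.values())
    return [(folder, n, round(100 * n / total, 1)) for folder, n in counts.most_common(top_n)]

googlebot_paths = ["/location/london", "/location/paris", "/location/berlin",
                   "/jobs/123", "/", "/location/rome?page=2"]
for folder, n, pct in folder_share(googlebot_paths):
    print(f"{folder or '(root)'}: {n} requests, {pct}% of crawl")
```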
  • 135. Getting data by the day
      Page  | Number of Googlebot requests
      page1 | 200,000
      page2 | 120,000
  • 136. Number of Googlebot requests day by day
  • 137. 3. Spot bugs & view site health
  • 138. How many of each status code does Google find per day over our logs?
  • 139. Number of Googlebot requests day by day
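Counting status codes per day only adds a second grouping column to the daily-requests query. A sketch with Python's sqlite3 standing in for BigQuery; the path and status columns and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [
        ("2010-08-23 03:50:59", "/my_homepage", 200),
        ("2010-08-23 04:10:00", "/old-page", 404),
        ("2010-08-23 05:00:00", "/job-results", 200),
        ("2010-08-24 06:00:00", "/jobs", 301),
        ("2010-08-24 07:00:00", "/my_homepage", 200),
    ],
)

# One row per (day, status code) pair, e.g. to chart 200s vs 301s vs 404s over time.
status_by_day = conn.execute("""
SELECT DATE(timestamp) AS date, status, COUNT(*) AS number_of_requests
FROM log_analysis
GROUP BY date, status
ORDER BY date, status
""").fetchall()
for row in status_by_day:
    print(row)
```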
  • 140. Which 404 URLs has Googlebot requested most over the past 30 days?
  • 141. Boy does it want that ad-tech snippet
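The 404 question filters on status and on a 30-day window before counting. In this sqlite3 sketch the "today" date is a parameter so the result is reproducible; the rows, including the ad script path, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [
        ("2010-09-10 08:00:00", "/ads/tracking-snippet.js", 404),
        ("2010-09-20 09:15:00", "/ads/tracking-snippet.js", 404),
        ("2010-09-29 23:59:00", "/ads/tracking-snippet.js", 404),
        ("2010-09-15 12:00:00", "/old-page", 404),
        ("2010-08-01 10:00:00", "/ancient-page", 404),  # older than 30 days: excluded
        ("2010-09-12 07:30:00", "/my_homepage", 200),   # not a 404: excluded
    ],
)

as_of = "2010-09-30"  # fixed "today" for a reproducible example
top_404s = conn.execute(
    """
    SELECT path, COUNT(*) AS number_of_404s
    FROM log_analysis
    WHERE status = 404
      AND timestamp >= datetime(?, '-30 days')
    GROUP BY path
    ORDER BY number_of_404s DESC, path
    LIMIT 20
    """,
    (as_of,),
).fetchall()
print(top_404s)  # [('/ads/tracking-snippet.js', 3), ('/old-page', 1)]
```

In BigQuery standard SQL the date filter would typically be written as timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY).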
  • 142. 5. How fresh does Google think your content is?
  • 143. How many times on average is each page in a page template crawled a day?
  • 144. Average number of times a page template is crawled by Googlebot
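How the talk assigns pages to templates isn't shown. This Python sketch assumes simple ordered regex rules modelled on the teflsearch.com URLs from slide 68, and defines the average as Googlebot requests ÷ distinct pages in the template ÷ days in the log period. The rules, names and sample paths are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical template rules, checked in order (most specific first).
TEMPLATES = [
    ("job advert", re.compile(r"^/job-advert\d+$")),
    ("country results", re.compile(r"^/job-results/country/[^/]+$")),
    ("job results", re.compile(r"^/job-results$")),
    ("homepage", re.compile(r"^/$")),
]

def template_for(path):
    """Return the first template whose pattern matches, or 'other'."""
    for name, pattern in TEMPLATES:
        if pattern.match(path):
            return name
    return "other"

def avg_daily_crawls_per_page(googlebot_paths, days_in_period):
    """Average Googlebot requests per distinct page per day, for each template."""
    hits = defaultdict(int)
    pages = defaultdict(set)
    for path in googlebot_paths:
        name = template_for(path)
        hits[name] += 1
        pages[name].add(path)
    return {name: hits[name] / len(pages[name]) / days_in_period for name in hits}

paths = ["/", "/", "/", "/", "/job-results", "/job-results",
         "/job-results/country/china", "/job-advert3455", "/job-advert9999"]
print(avg_daily_crawls_per_page(paths, days_in_period=2))
```

Pages Googlebot never requested don't appear in the logs, so they are missing from the denominator here; joining against a site crawl would give a truer per-template page count.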
  • 145. How long does it take for a page to be discovered after being published?
  • 155.
      How long does it take for a page to be discovered after being published?
      What are the top 20 combinations of page_path_1 & page_path_2 folders crawled by Google over the time period of our logs?
      Which pages get requests from Googlebot but don’t appear in our crawl?
      What are the top non-canonical pages being crawled?
      Which are the most crawled parameters on the website?
      How often are the most visited parameters crawled each day?
      Which directories have the most 301 & 404 status codes?
      Which pages are crawled both with and without parameters?
      Which pages are only partly downloaded?
      How many hits does each section get, when the sections are classified in an external dataset?
      What percentage of a directory was crawled over the past 30 days?
      What is the total number of requests across two different time periods?
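As one example from that list, "Which pages get requests from Googlebot but don't appear in our crawl?" is a join between the logs and a crawl export. A sketch using Python's sqlite3 as a stand-in for BigQuery; the crawl table and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status INTEGER)")
conn.execute("CREATE TABLE crawl (path TEXT)")  # hypothetical crawler export: one row per URL found

logged_paths = ["/my_homepage", "/my_homepage", "/job-results",
                "/orphan-page", "/orphan-page", "/orphan-page", "/old-promo"]
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [("2010-08-23 00:00:00", path, 200) for path in logged_paths],
)
conn.executemany("INSERT INTO crawl VALUES (?)",
                 [("/my_homepage",), ("/job-results",), ("/about",)])

# LEFT JOIN keeps every logged request; requests with no matching crawl row have c.path NULL.
not_in_crawl = conn.execute("""
SELECT l.path, COUNT(*) AS googlebot_requests
FROM log_analysis AS l
LEFT JOIN crawl AS c ON c.path = l.path
WHERE c.path IS NULL
GROUP BY l.path
ORDER BY googlebot_requests DESC, l.path
""").fetchall()
print(not_in_crawl)  # [('/orphan-page', 3), ('/old-promo', 1)]
```

Because only unmatched rows survive the WHERE clause, duplicate URLs in the crawl export can't inflate these counts.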
  • 156. That’s a lot of questions
  • 162. This is the thing you’re probably not doing

Editor's Notes

  1. Walmart listened, but it didn’t go and look at what its customers were doing.
  2. https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-notes-september-9th-2016/
  3. Start as an actual story: "Can I have the house salad, please?" "Greek or lentils?" "Olives or no olives?" "Green or black?" "Stones or no stones?" "Vinaigrette?" "Balsamic or Caesar?" "Balsamic." "Do you want rocket?" "I would like a salad."
  4. Ask for PII to be removed; how many logs; the dates?
  5. The good: you can customise it for more complicated logging formats; you can use reverse DNS lookup and ASN lookup; you can work with log datasets that are too large to download to your computer.
  6. This is the summation of years’ worth of work. I can’t fit it into a 40-minute presentation, so I put the resources here. Don’t worry if you get lost; it’s all here.