Checking Google Index status at scale with Node.js
Jose Luis Hernando
@jlhernando #BrightonSEO
Senior Technical SEO Consultant
Today’s agenda
1. Why it’s important to know your website’s indexing status
2. The challenge of extracting this data
3. Getting the data with Node.js – Live Demo!
4. Using this data for your SEO strategy
Why is it important?
Reason #1
Not in the Index => Not in the SERPs
Icons from Google, Flaticon & Sitecheckerpro
Why is it important?
Reason #2
Google evaluates site quality based on indexed pages
High Quality Pages
• Category Pages
• Editorial Pages
• Canonical Product Pages
+
Low Quality Pages
• Uncontrolled Faceted Navigation URLs
• Unsupervised User Generated Content
• Indexable Non-Canonical URLs
Sources:
• Google Only Can Judge Site Quality Based On Pages They Index – Barry Schwartz (Search Engine Roundtable)
• English Google Webmaster Central office-hours hangout – Google Webmasters YouTube Channel
Why is it important?
Reason #3
Inefficient use of Google’s resources
https://website.com/category-one/
HTML, CSS, JS
• /category-one/?color=red
• /category-one/?color=blue
• /category-one/?color=red&blue
• … (∞)
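To get a feel for how quickly faceted URLs multiply, here is a minimal sketch that counts the parameter combinations a crawler could request for one category page. It assumes each facet is either absent or set to a single value. The facet names and values are hypothetical, apart from color=red and color=blue, which come from the slide.

```javascript
// Count the distinct parameter combinations a crawler could request
// for one category page. Each facet is either absent or set to one value.
// The facets below are hypothetical, apart from color=red / color=blue.
function countFacetUrls(facets) {
  // Multiply (values + 1) per facet: the "+ 1" is the "facet not applied" case.
  const combos = Object.values(facets).reduce(
    (total, values) => total * (values.length + 1),
    1
  );
  return combos - 1; // exclude the bare category URL with no parameters
}

const facets = {
  color: ["red", "blue", "green"],
  size: ["s", "m", "l", "xl"],
  brand: ["a", "b", "c", "d", "e"],
  sort: ["price", "newest"],
};

console.log(countFacetUrls(facets)); // 359
```

Multi-select values (such as ?color=red&blue) and different parameter orderings would push the count higher still, which is why the slide ends on ∞.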
How much of your site is Googlebot crawling?
Site size    Avg. Crawl Ratio (%)    Avg. Active Ratio (%)
1-10k        71.7                    45.3
10k-100k     54.3                    30.2
100k-1M      41.7                    15.1
1M+          34.4                    10.1
Crawl Ratio: percentage of pages crawled by Google in 30 days.
Active Ratio: percentage of pages that have generated at least one organic visit in 30 days.
Source: How Does Google Crawl the Web? – Annabelle Bouard & Dimitri Brunel (Botify)
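As a rough illustration of the two definitions, the sketch below computes both ratios for a list of pages over a 30-day window. The record shape (googlebotHits, organicVisits) and the sample data are assumptions for the example, not the format of any particular log or analytics export.

```javascript
// Crawl ratio and active ratio as defined on the slide, over a 30-day window.
// Input shape is an assumption: one record per known URL.
function crawlAndActiveRatio(pages) {
  const total = pages.length;
  if (total === 0) return { crawlRatio: 0, activeRatio: 0 };
  const crawled = pages.filter((p) => p.googlebotHits > 0).length;
  const active = pages.filter((p) => p.organicVisits > 0).length;
  const pct = (n) => Math.round((n / total) * 1000) / 10; // one decimal place
  return { crawlRatio: pct(crawled), activeRatio: pct(active) };
}

// Hypothetical sample: four known URLs with 30-day totals.
const sample = [
  { url: "/a", googlebotHits: 12, organicVisits: 40 },
  { url: "/b", googlebotHits: 3, organicVisits: 0 },
  { url: "/c", googlebotHits: 0, organicVisits: 0 },
  { url: "/d", googlebotHits: 1, organicVisits: 2 },
];

console.log(crawlAndActiveRatio(sample)); // { crawlRatio: 75, activeRatio: 50 }
```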
The challenge: extracting this data
• Googlebot’s crawling behaviour doesn’t determine indexing status
• You rely on partial and sometimes inaccurate data points:
  • site: & inurl: operators
  • GSC Indexing reports:
    • URL Inspection Tool (< 200 URLs / day)
    • Coverage Reports (< 1,000 rows / report)
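A quick back-of-the-envelope sketch shows why a daily cap makes manual inspection impractical at scale. The default of 200 mirrors the "< 200 URLs / day" figure above, and 111,772 is the sitemap size used later in this deck. The function name is made up for the example.

```javascript
// Days needed to check every URL when a tool caps daily lookups.
// dailyLimit = 200 mirrors the "< 200 URLs / day" figure on the slide.
function daysToInspect(totalUrls, dailyLimit = 200) {
  return Math.ceil(totalUrls / dailyLimit);
}

console.log(daysToInspect(111772)); // 559
console.log(daysToInspect(1000000)); // 5000
```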
Proxy metrics != Accurate data
If you can’t find it, build it
{Live demo}
bit.ly/google-index-checker-script
Quick FYI
Using the following method goes against Google’s Terms of Service,
as it automatically sends search queries to Google Search
Our script outperforms every other method available
How can you use Google index data?
• Identify inefficient use of crawl budget
• Error prioritisation
• Identify holes in your architecture

• Check for pages from your site that should be indexed but are not.
• Find pages that should not be indexed but are indexed.
• Detect pages that used to exist and now return an error (4xx) but are still indexed.
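The three checks above can be sketched as a small classifier over your own URL list. This is only an illustration: the field names (statusCode, shouldBeIndexed, isIndexed), the bucket labels and the sample rows are all assumptions, and isIndexed stands for whatever index-status result you have collected.

```javascript
// Bucket each URL by comparing what you want indexed with what is indexed.
// The record fields (statusCode, shouldBeIndexed, isIndexed) are assumed names.
function classifyUrl({ statusCode, shouldBeIndexed, isIndexed }) {
  const isClientError = statusCode >= 400 && statusCode < 500;
  if (isClientError && isIndexed) return "error-still-indexed";
  if (shouldBeIndexed && !isIndexed) return "should-be-indexed-but-is-not";
  if (!shouldBeIndexed && isIndexed) return "should-not-be-indexed-but-is";
  return "as-expected";
}

// Hypothetical rows combining a crawl export with index-status results.
const rows = [
  { url: "/category-one/", statusCode: 200, shouldBeIndexed: true, isIndexed: false },
  { url: "/category-one/?color=red", statusCode: 200, shouldBeIndexed: false, isIndexed: true },
  { url: "/old-product/", statusCode: 404, shouldBeIndexed: false, isIndexed: true },
  { url: "/editorial/", statusCode: 200, shouldBeIndexed: true, isIndexed: true },
];

for (const row of rows) {
  console.log(`${row.url} -> ${classifyUrl(row)}`);
}
```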
Use case #1
Sitemap Health Check
How many URLs from your XML sitemap are indexed?
Sitemaps = 111,772 URLs
Google Index Status of URLs from Sitemap
Status code    URLs      Indexed    Not Indexed    % Indexed
200 (2xx)      81,688    74,223     7,465          80%
404 (4xx)      29,969    6,268      23,701         21%
301 (3xx)      365       16         349            4%
Inspired by Data Secrets of the Index Coverage Report – AJ Kohn
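For readers who want to reproduce the percentages, here is a minimal sketch that derives the indexed rate per status-code group from the indexed / not indexed counts shown in the charts. The variable and function names are made up for the example. The 4xx and 3xx results match the 21% and 4% labels; the 2xx counts give roughly 91%, which differs from the slide's 80% label.

```javascript
// Indexed / not indexed counts per status-code group, taken from the slides.
const sitemapResults = [
  { status: "2xx", indexed: 74223, notIndexed: 7465 },
  { status: "4xx", indexed: 6268, notIndexed: 23701 },
  { status: "3xx", indexed: 16, notIndexed: 349 },
];

// Share of URLs in a group that are indexed, as a whole-number percentage.
function indexedRate({ indexed, notIndexed }) {
  const total = indexed + notIndexed;
  return total === 0 ? 0 : Math.round((indexed / total) * 100);
}

for (const row of sitemapResults) {
  const total = row.indexed + row.notIndexed;
  console.log(`${row.status}: ${indexedRate(row)}% of ${total} URLs indexed`);
}
```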
Sitemap Health Check
Next Steps
1) Identify if these URLs are important to your site’s bottom line
2) Check if a pool of these URLs has issues in GSC’s Index Coverage Report
3) Choose a tactic to improve the visibility of these URLs
4) Isolate the relevant URLs and modify the existing sitemap or create a new-sitemap.xml to monitor progress
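Step 4 could look something like the sketch below, which turns a list of isolated URLs into sitemap XML following the sitemaps.org 0.9 schema. The helper names and the example URLs are hypothetical. In a real run you would write the returned string to new-sitemap.xml and keep each file within the protocol's limit of 50,000 URLs.

```javascript
// Escape the characters that are not allowed verbatim inside XML text.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a minimal sitemap document (loc entries only) from a list of URLs.
function buildSitemap(urls) {
  const entries = urls
    .map((u) => `  <url><loc>${escapeXml(u)}</loc></url>`)
    .join("\n");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    `${entries}\n` +
    "</urlset>\n"
  );
}

// Hypothetical URLs that were found to be not indexed.
const notIndexedUrls = [
  "https://website.com/category-one/",
  "https://website.com/category-one/?color=red&size=m",
];

console.log(buildSitemap(notIndexedUrls));
```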
Use case #2
Log File Analysis+
How many URLs with Googlebot hits are indexed?
• ~160k Googlebot hits to non-canonical URLs (/Uppercase/ vs /lowercase/)
• Identified if non-canonical URLs were indexed
• Identified if the referenced canonical URLs were indexed
Chart: Indexed Non-Canonical URLs Requested by Googlebot (Indexed vs Not Indexed; 35.8% and 64.2%), undisclosed client
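One way to prepare this comparison is to pair each mixed-case URL found in the logs with its lowercase counterpart, so both can be checked for index status. The sketch below assumes the lowercase path is the canonical version, as the /Uppercase/ vs /lowercase/ note suggests. The function name and sample URLs are hypothetical.

```javascript
// Map each requested URL with uppercase characters in its path to the
// lowercase version of the same URL (assumed here to be the canonical).
function findCaseVariants(requestedUrls) {
  const pairs = new Map();
  for (const requested of requestedUrls) {
    const url = new URL(requested);
    const lowerPath = url.pathname.toLowerCase();
    if (url.pathname !== lowerPath) {
      url.pathname = lowerPath; // query string is left untouched
      pairs.set(requested, url.toString());
    }
  }
  return pairs;
}

// Hypothetical URLs extracted from Googlebot log entries.
const loggedUrls = [
  "https://website.com/Category-One/",
  "https://website.com/category-one/",
  "https://website.com/Category-One/?color=red",
];

for (const [nonCanonical, canonical] of findCaseVariants(loggedUrls)) {
  console.log(`${nonCanonical} -> ${canonical}`);
}
```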
Log File Analysis+
Next Steps
1) Identify if the canonical tag is correctly placed
2) Identify if the root cause is internal linking, external linking or other
3) Consider redirecting non-canonical URLs to canonical URLs
4) Create a new-sitemap.xml with problematic URLs to encourage Googlebot to revisit those URLs and for monitoring purposes
Other use cases
Inform your SEO strategy
• Check Real-time indexing (News sites, Offer sites, Job Boards)
• Check uncontrolled faceted navigation (Crawl budget optimisation)
• Check inactive product/category URLs (Site architecture improvements)
• Check old 4xx that are live now & haven't been deindexed yet (Recover organic opportunities)
Further reading
https://bit.ly/google-index-checks
https://bit.ly/gsc-index-coverage
The Google Index Checker script has opened a door to useful, actionable data at scale for your sites.
Use it, and act on it.
Thank you.
builtvisible.com
Jose Luis Hernando
Senior Technical SEO Consultant
@jlhernando
Sources & additional reading
• How Does Google Crawl the Web? – Annabelle Bouard & Dimitri Brunel (Botify)
• English Google Webmaster Central office-hours hangout – Google Webmasters YouTube Channel
• Google Only Can Judge Site Quality Based On Pages They Index – Barry Schwartz (Search Engine Roundtable)
• Data Secrets of the Index Coverage Report – AJ Kohn (Blind Five Year Old)
• How Google Search Works – Google Documentation
• How Search organises information – Google Documentation
• Our new search index: Caffeine – Carrie Grimes
• When indexing goes wrong: how Google Search recovered from indexing issues & lessons learned since – Vincent Courson, Google Search Outreach
• How Search Engines Work: Crawling, Indexing & Ranking – Moz
• (Please) Stop Using Unsafe Characters in URLs – Jeff Starr
