4. The bottom line
● Updated guidelines recognise medium and problematic sites
● Increasing the crawl rate to better fit your site
● Reducing crawl bloat
6. Page is created ➡️ site requests that the page is crawled ➡️ Google crawls the page ➡️ indexing: does the page contain authoritative, high-quality content that matches user intent? Possible outcomes: page is indexed, crawled but not indexed, or discovered but not indexed.
7. Crawl Demand vs Crawl Capacity
Crawl demand:
● Eligible indexable URLs
● How often content is created/updated
● How much traffic you get to folders/pages on the site
Crawl capacity:
● How quickly content loads
● How many errors are on the site
● JS rendering
12. Uncover the coverage report
● Discovered not indexed
● Crawled not indexed
● Server errors
● Large number of canonical tags
● Large number of redirects
● Indexed, though blocked by robots.txt
24. Use internal links to prioritise
● Capitalise on link equity
● Review link hierarchy regularly
● Strengthen content hubs
● Be mindful of where links are pointing
25. Dynamically populate sitemaps
● Include only canonical URLs that return a 200 status (see the sketch after this list)
● Break your sitemaps up
● Don’t forget images and videos
● Minimise manual management
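A minimal sketch of a split sitemap index, generated by the CMS rather than maintained by hand (example.com and the file names are illustrative):
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- sitemap_index.xml: break a large sitemap into smaller, dynamically generated files -->
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap><loc>https://www.example.com/sitemap-products-1.xml</loc></sitemap>
      <sitemap><loc>https://www.example.com/sitemap-categories.xml</loc></sitemap>
      <sitemap><loc>https://www.example.com/sitemap-images.xml</loc></sitemap>
      <sitemap><loc>https://www.example.com/sitemap-videos.xml</loc></sitemap>
    </sitemapindex>
Each child sitemap should list only canonical URLs that return a 200 status.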
26. Use robots.txt
● Prevent Google from crawling specific taxonomies
● Highlight top directories
● Link to sitemaps
● Remember: blocked URLs can still be indexed (see the example below)
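An illustrative robots.txt along those lines (the paths and domain are placeholders):
    # Keep crawlers away from low-value taxonomy and filter URLs
    User-agent: *
    Disallow: /tag/
    Disallow: /*?sort=
    # Top directories stay crawlable (allowed by default; listed here for emphasis)
    Allow: /products/

    # Point crawlers at the sitemap index
    Sitemap: https://www.example.com/sitemap_index.xml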
28. Three Questions to ask yourself
1. What are our KPIs?
2. How do we define user intent?
3. What is our web structure?
Credit: https://www.screamingfrog.co.uk/site-architecture-crawl-visualisations/
29. Low-quality (low-traffic) content
If it doesn’t work, can we cut it?
● Are they high intent pages?
● Do they have quality backlinks?
If the answer is no, what is the value in holding on to pages that aren’t working?
30. Duplication & Cannibalisation
● Canonicals ➡️ Do we need both sets of content?
● Bad taxonomy ➡️ Clean up the site structure
● Internal competition ➡️ Use GSC & keyword mapping
31. Faceted navigation & parameters
● Do not deindex all parameters as a rule
● Are top level parameters valuable?
● Are static alternatives available?
● Is link equity and information architecture protected?
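One common pattern, sketched with hypothetical URLs: keep genuinely useful top-level facets crawlable, and canonicalise low-value parameter combinations to a static alternative.
    <!-- On /shoes?colour=red&sort=price-asc, point to the equivalent static page -->
    <link rel="canonical" href="https://www.example.com/shoes/red/">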
32. Redirects & 404s
● Redirects used for out-of-stock (OOS) products
● Historical build-up of redirects or 404s
● Embedded links will continue to be crawled
● Optimise the user journey, not just site health
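For example, an out-of-stock product can be 301-redirected to its parent category rather than left to 404 (an nginx sketch; the URLs are illustrative):
    # Permanently redirect a discontinued product to its parent category
    location = /products/blue-widget {
        return 301 /products/widgets/;
    }
Point embedded internal links at the final destination too, so Google isn’t crawling the old hop indefinitely.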
34. Noindex directive - if the page serves a purpose in the user journey, the noindex meta tag is probably the best method, but it does not take effect immediately.
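The tag itself is a single line in the page head (an equivalent X-Robots-Tag HTTP response header works for non-HTML files); Google only drops the page once it has been recrawled:
    <meta name="robots" content="noindex">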
35. Blocking via robots.txt - this is a great way of keeping down crawl requests, but it doesn’t stop indexation.
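A sketch of the pitfall (the path is hypothetical): a disallowed URL can still be indexed from external links, and any noindex tag on it will never be seen because the page is no longer crawled.
    User-agent: *
    # Stops crawling of /filters/ URLs - it does not stop them being indexed
    Disallow: /filters/
    # A noindex tag on these pages cannot be read once crawling is blocked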
36. 410 - Removing URLs that do not play a part in the user journey or provide value is a great way of dealing with crawl bloat.
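A minimal server-side sketch (nginx; the path is illustrative):
    # Serve 410 Gone for a retired section that no longer provides value
    location /discontinued/ {
        return 410;
    }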
38. Ensure Content is Accessible
If you have a large JS web app, ensure that Google does not have to allocate more server resources than essential to render important content (see the sketch below).
Credit: https://nextjs.org/learn/basics/data-fetching/two-forms
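A minimal sketch using Next.js static generation, one of the two data-fetching forms in the credited guide (getProducts() is a stand-in for a real CMS or API call): the HTML is pre-rendered at build time, so Googlebot receives the content without a heavy client-side rendering step.
    // pages/products.js - Static Generation: the page is rendered to HTML at build time
    async function getProducts() {
      // Hypothetical data source; in practice this would call a CMS or internal API
      return [
        { id: 1, name: 'Blue widget' },
        { id: 2, name: 'Red widget' },
      ];
    }

    export async function getStaticProps() {
      return { props: { products: await getProducts() } };
    }

    export default function Products({ products }) {
      return (
        <ul>
          {products.map((p) => (
            <li key={p.id}>{p.name}</li>
          ))}
        </ul>
      );
    }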
39. Updating & Creating New Content Regularly
Creating fresh content that meets Google’s E-E-A-T quality rater guidelines will help increase crawl demand for your site.
40. Improving Page Speed and Performance
● Invest in servers
● Improve page performance metrics
● Limit errors such as redirect chains and loops (quick check below)
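A quick way to spot redirect chains and loops (curl is standard; the URL is a placeholder) - follow every hop and check how many responses come back before the final 200:
    # Show each hop in the redirect chain for a single URL
    curl -sIL https://www.example.com/old-page | grep -iE "^(HTTP|location)"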
41. Crawl Budget: A Happy Ending
1. Identifying crawl rate
2. Identifying crawl issues
3. Optimising the crawl budget
42. THANK YOU
Slideshare.Net/SallyRaymer2
@salraym /sally-r-seo/
Some great resources:
Google Documentation: Managing Crawl Budget
Verifying Google Bot for log file analysis
Screaming Frog’s guide to log file analysis
How to identify keyword cannibalisation
SEJ internal link optimisation checklist for enterprise sites
McKinsey Consumer Decision Journey
McKinsey customer touchpoints