Slides from my talk at SAScon 2016 in which I outline my approach to technical SEO audits, showing my process and tools and highlighting the aspects I pay most attention to.
8.
@badams #SAScon
What is Crawl Optimisation?
Ensuring search engine spiders waste as little time as
possible on the wrong URLs, so they crawl the right URLs on your site.
If you waste crawl budget, the right pages are unlikely to
be crawled & indexed.
9.
Google’s Crawl Sources
• Site crawl
• XML Sitemaps
• Inbound links
• DNS records
• Domain registrations
• Browsing data
13.
Crawl Waste
• Check URLs in XML Sitemap
14.
Optimise XML Sitemaps
• Check if the sitemap contains final URLs only
• Identify 301-redirects or other non-200 status codes
• Check usage of multiple sitemaps
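The sitemap checks above can be sketched in a few lines of Python. This is only an illustration: `get_status` is a hypothetical callable (for example a wrapper around your crawler's export or an HTTP client) that returns the status code for a URL, and the function names are my own.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> List[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def non_final_urls(
    xml_text: str, get_status: Callable[[str], Optional[int]]
) -> List[Tuple[str, Optional[int]]]:
    """List sitemap URLs whose status is anything other than 200 OK.

    get_status is an assumption: supply your own lookup or HTTP request.
    """
    flagged = []
    for url in sitemap_urls(xml_text):
        status = get_status(url)
        if status != 200:
            flagged.append((url, status))
    return flagged
```

Anything this returns (301s, 404s and so on) is a candidate for removal from the sitemap or replacement with the final URL.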
16.
Optimisation of Paginated Listings
• Check number of items on a single page
• Check implementation of rel=prev/next pagination link tags
• Check blocking of sorting parameters in robots.txt
Disallow: /*?sort=*
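As a reference point when checking the rel=prev/next implementation, here is a minimal sketch of what each page in a paginated series would be expected to output. The `?page=N` URL scheme and the function name are assumptions for illustration; the site being audited may paginate differently.

```python
from typing import List


def pagination_links(base_url: str, page: int, total_pages: int) -> List[str]:
    """Build the rel=prev/next link tags expected on a given page of a series."""

    def url_for(n: int) -> str:
        # Assumption: page 1 lives on the bare URL, later pages use ?page=N.
        return base_url if n == 1 else f"{base_url}?page={n}"

    links = []
    if page > 1:
        links.append(f'<link rel="prev" href="{url_for(page - 1)}">')
    if page < total_pages:
        links.append(f'<link rel="next" href="{url_for(page + 1)}">')
    return links
```

The first page should only carry rel=next and the last page only rel=prev; anything else is worth flagging.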
17.
Optimisation of Faceted Navigation
• Decide which facets have SEO value
Recommend creating static pages for these
• All other facets:
robots.txt disallow
‘rel=nofollow’ on facet links
18.
Check Configuration of URL Parameters
19.
Crawl Waste
• Check crawling/indexing of internal site search results
20.
Block Internal Site Search Pages
• Block in robots.txt
User-agent: *
Disallow: /SearchResults.aspx
Disallow: /*query=*
Disallow: /*s=*
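Before deploying wildcard rules like the ones above, it helps to test which URLs they actually catch. The sketch below is a simplified matcher, assuming Google-style wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end); it ignores Allow rules and rule precedence, and the function names are my own.

```python
import re
from typing import Iterable


def robots_pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: Iterable[str]) -> bool:
    """True if the path (including query string) matches any Disallow pattern."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)
```

A short pattern such as `/*s=*` matches any URL containing `s=`, so `is_blocked("/boots?colours=red", ["/*s=*"])` is also true. Run a sample of real URLs through a check like this to make sure the rules do not block pages you want crawled.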
22.
Minimise Internal Redirects
• Find redirects with Screaming Frog
• Internal links should all be 200 OK
• Flat site structure
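Once you have crawl data, redirect chains can be traced with a small helper. This is a sketch under assumptions: `responses` is a hypothetical dictionary mapping each URL to its status code and redirect target, built from whatever your crawler exports; it is not a Screaming Frog file format.

```python
from typing import Dict, List, Optional, Tuple

# (status code, redirect target or None)
Response = Tuple[int, Optional[str]]

REDIRECT_CODES = (301, 302, 307, 308)


def redirect_chain(
    url: str, responses: Dict[str, Response], max_hops: int = 10
) -> List[Tuple[str, int]]:
    """Follow redirects through recorded crawl data and return each hop."""
    chain: List[Tuple[str, int]] = []
    seen = set()
    # Stop on unknown URLs, loops, or once max_hops entries are recorded.
    while url in responses and url not in seen and len(chain) < max_hops:
        seen.add(url)
        status, location = responses[url]
        chain.append((url, status))
        if status not in REDIRECT_CODES or location is None:
            break
        url = location
    return chain
```

Any internal link whose chain has more than one entry is not pointing at a 200 OK URL directly and should be updated to the final destination.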
23.
Check HTTP Status Codes
The most important ones:
200 OK: everything is fine, here is your content
404 Not Found: the page you are trying to view doesn’t exist
301 Permanent Redirect: the page you are trying to view has moved permanently, here’s the new URL
302 Temporary Redirect: the page you are trying to view has moved temporarily, here is the new URL
500 Server Error: there’s been a massive fuck up, I can’t serve you this page
24.
Check HTTP Status Codes
Less common:
410 Gone: this page is gone and there is no alternative version. Useful for getting a page out of Google’s index quickly
503 Service Unavailable: the site is temporarily down. Use this when your server is having short-lived issues, as a brief 503 will not impact rankings; a prolonged 503 can still lead to pages being dropped
25.
Check Soft 404s
When Google detects a Not Found error
page but the HTTP status code is 200 OK.
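A rough way to spot likely soft 404s in your own crawl data is to look for error wording on pages that return 200 OK. The phrase list below is an assumption for illustration, not how Google detects them; tune it to the site's actual error template.

```python
# Assumed phrases; replace with the wording used on the site's error pages.
NOT_FOUND_PHRASES = ("page not found", "page cannot be found", "no longer exists")


def looks_like_soft_404(status: int, body: str) -> bool:
    """Heuristic: a 200 OK response whose content reads like an error page."""
    if status != 200:
        return False
    text = body.lower()
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)
```

Pages flagged this way should be reviewed by hand and, if they really are error pages, changed to return a proper 404 or 410.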
28.
Use Canonicals Wisely
• “rel=canonical” is primarily for index issues
It is not a fix for crawl waste
Search engines need to see the canonical tag before they can act on it
Ergo, pages need to be crawled before rel=canonical has any effect
Ditto with meta noindex tags
30.
OK to use Canonicals for…
• Separate mobile URLs
• Session-specific URL parameters
• Content syndication
• Unavoidable content duplication
34.
Index Optimisation
• Ensure Google indexes & ranks the right pages
• Minimise indexing of zero-value pages
• Optimise all technical relevancy factors
36.
Human-Readable URLs
Bad URL:
http://domain.com/default.aspx?p=43351&s=abx&ref=ps-2301-g&…
Good URL:
http://domain.com/safety-boots/caterpillar/steel-toe-safety-boots.html
• Don’t overdo it – no keyword stuffing
• Use a logical structure that makes sense to humans
38.
Always use Canonicals
• Duplicate URLs can originate from various sources…
http://www.website.com/page1.html
http://www.website.com/page1.html?utm_source=buffer&utm_medium=social&utm_campaign=seo
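To illustrate how the two URLs above collapse to one canonical, here is a minimal sketch that strips tracking parameters from a URL. The `utm_` prefix list and the function name are assumptions; a real site may need to keep or drop other parameters.

```python
from typing import Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking_params(url: str, prefixes: Tuple[str, ...] = ("utm_",)) -> str:
    """Return the URL without tracking parameters, e.g. for a canonical tag."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(prefixes)
    ]
    # Rebuild the URL; the fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Both example URLs then resolve to the same value, which is what the canonical tag on each variant should point to.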
39.
Use Full URLs in Canonicals
Bad:
<link rel="canonical" href="/page1.html">
<link rel="canonical" href="www.website.com/page1.html">
Good:
<link rel="canonical" href="https://www.website.com/page1.html">
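When auditing canonicals in bulk, a quick check for full URLs might look like the sketch below. The function name is my own, and it only tests that a scheme and hostname are present, not that the URL resolves.

```python
from urllib.parse import urlsplit


def is_full_canonical(href: str) -> bool:
    """True if the canonical href is an absolute http(s) URL with a hostname."""
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Note that `www.website.com/page1.html` fails the check: without a scheme it is parsed as a relative path, not as a hostname.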
40.
Meta Robots Tag
<meta name="robots" content="…">
• ‘noindex’: don’t index this page
• ‘nofollow’: don’t follow any links on this page
• ‘nosnippet’: don’t show a search snippet for this page
• ‘noodp’: don’t use the ODP/DMOZ description for this page
• ‘noarchive’: don’t show a Cached link for this page
• ‘unavailable_after:[date]’: stop crawling and indexing of this page after this date
• ‘noimageindex’: don’t use this page as the referring page for an image that appears in Google search results
• ‘none’: same as ‘noindex, nofollow’
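To check which directives a page actually serves, the meta robots tag can be read with Python's standard HTML parser. This is a simplified sketch: the class and function names are my own, and it splits the content on commas, so a date value that itself contains a comma would need extra handling.

```python
from html.parser import HTMLParser
from typing import Set


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token == "none":
                # 'none' is shorthand for 'noindex, nofollow'.
                self.directives.update({"noindex", "nofollow"})
            elif token:
                self.directives.add(token)


def robots_directives(html: str) -> Set[str]:
    """Return the set of meta robots directives found in an HTML document."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives
```

For example, a page with `content="none"` is reported as `{"noindex", "nofollow"}`, which makes it easy to list every page that is excluded from the index.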
46.
Structured Data Testing Tool
https://search.google.com/structured-data/testing-tool/
47.
Expired Pages
• Google’s advice: serve 404 Not Found
Downside: potential loss of link value
Source: https://www.youtube.com/watch?v=9tz7Eexwp_A
48.
My Advice
• Keep the page up
• Recommend alternative
products
49.
High-churn Listings Sites
• Pages with a limited lifespan, potentially thousands of new
pages every week
Online auctions / ‘… for sale’ classified sites / Job listings / etc…
301-redirect old URL to most relevant new URL
Minimum 180 days
Serve 410 (or 404) on old URL after 180 days
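The lifecycle above can be written as a simple rule. This is a sketch of the logic only: the function name is hypothetical, and serving 410 straight away when no relevant replacement exists is my assumption, not something stated on the slide.

```python
from datetime import date
from typing import Optional, Tuple

REDIRECT_WINDOW_DAYS = 180  # minimum redirect period from the slide


def expired_listing_response(
    expired_on: date, today: date, replacement_url: Optional[str]
) -> Tuple[int, Optional[str]]:
    """Decide what an expired listing URL should return.

    Returns (status code, redirect target or None).
    """
    age_in_days = (today - expired_on).days
    if age_in_days < REDIRECT_WINDOW_DAYS and replacement_url:
        return 301, replacement_url
    # Assumption: with no relevant replacement, go straight to 410.
    return 410, None
```

A listing that expired 60 days ago redirects to its replacement; after the 180-day window the same URL returns 410.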
51.
International Domains
• Check if they’re using the right TLD:
Generic TLDs: .com, .org, .net, .info, …
ccTLDs: .co.uk, .ie, .de, .fr, .it, .nl, …
• Generic domains can be geo-targeted with Google Search
Console
• Country-code domains will be assumed to target that country
It’s almost impossible to get a .ie website to rank in google.co.uk
53.
Website Structure
• Subdirectories:
website.com/gb
website.com/it
• Subdomains:
gb.website.com
it.website.com
Verify separately in
Google Search Console
and set the geo-target
54.
Country & Language
www.website.com/be-fr/
www.website.com/be-nl/
www.website.com/be-de/
Use official ISO country
& language codes where
possible
55.
HTML Language Tag
• Use the ‘lang’ attribute:
• Don’t forget to change when you launch your international
version!
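To verify the ‘lang’ attribute across international versions, a small check with Python's standard HTML parser could look like this. The class and function names are my own, and the sample values in the usage note are illustrative.

```python
from html.parser import HTMLParser
from typing import Optional


class HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def page_lang(html: str) -> Optional[str]:
    """Return the declared page language, or None if it is missing."""
    parser = HtmlLangParser()
    parser.feed(html)
    return parser.lang
```

Running `page_lang` over the home page of each international version quickly shows any site still declaring the original language, for instance a Dutch-language Belgian site that should return `nl-BE` but still reports `en-GB`.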
58.
IP Redirects
• Google primarily crawls from
US-based IP addresses
• If a site uses IP address
redirects, make an exception
for all ‘Googlebot’ user-agents
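The exception described above amounts to one extra condition in the redirect logic. In this sketch the function name and the country arguments are hypothetical; how the visitor's country is derived from the IP address is left to whatever geo-IP lookup the site already uses. Bear in mind that user-agent strings can be spoofed.

```python
def should_geo_redirect(user_agent: str, visitor_country: str, site_country: str) -> bool:
    """Decide whether to apply an IP-based redirect to this request."""
    # Never redirect Googlebot (including its image, news and mobile variants),
    # so it can crawl every country version from its US-based IP addresses.
    if "googlebot" in user_agent.lower():
        return False
    return visitor_country.upper() != site_country.upper()
```

With this in place a US visitor to the UK site is still redirected, while Googlebot crawling from the US is served the UK pages it requested.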
69.
Create Your Report
• Make it actionable
Give examples
• Explain WHY something needs to be changed
Sometimes devs can come up with a more elegant solution
• Prioritise
Provide a recommended timeline of changes