Overcoming Common Crawl Optimisation Issues
@DeepCrawl
Contents
• Importance of Technical SEO
• Crawl Space & Budget
• Benchmarking
• Authority, Speed & Efficiency
• The Future
• Summary & Questions
How important is Technical SEO?
"Even a basic understanding of what to
look for in technical SEO can get you far.
So many people today focus too heavily
on off-page SEO, but if a site is
technically flawed, it won't matter how
many links you have or how good your
content is.”
@DeepCrawl
Erin Everhart, SEO Manager, The Home Depot
“Infinite Crawl Space”
Google, 2008
[Chart: Number of websites vs internet users (bn), 1993–2014. Source: Internet Live Stats, W3C. *Estimate based on figures from June 2014.]
Crawl Budget
“The amount of time Googlebot spends crawling your site and indexing pages”
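Crawl budget isn't abstract: it shows up directly in your server logs. As a way to see how yours is being spent, here is a minimal sketch in Python (standard library only) that counts Googlebot requests per day, assuming combined-format access logs at a hypothetical path:

```python
import re
from collections import Counter
from datetime import datetime

# Combined log format, e.g.:
# 66.249.66.1 - - [10/Oct/2015:13:55:36 +0000] "GET /page HTTP/1.1" 200 5123 ...
DATE = re.compile(r"\[(\d{1,2}/\w{3}/\d{4}):")

hits_per_day = Counter()
with open("access.log") as log:  # hypothetical log path
    for line in log:
        if "Googlebot" in line:  # crude filter; verify with reverse DNS for rigour
            match = DATE.search(line)
            if match:
                hits_per_day[match.group(1)] += 1

# Print chronologically so trends in crawl activity are easy to spot.
for day in sorted(hits_per_day, key=lambda d: datetime.strptime(d, "%d/%b/%Y")):
    print(day, hits_per_day[day])
```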
Common eCommerce Issues
Unique 39% | Duplicate 8% | Paginated 1% | Non-Indexable 40% | Non-200 11% | Failed 1%
*Data taken from 157 sample crawls (10,000 URLs each) of UK ecommerce websites
What can you do?
Ease of Access & Valuable Pages
Ensure that your pages are easy to reach
Deeply buried content is hard for users to find and unlikely to be crawled by Googlebot
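One rough way to audit reachability is to measure click depth from the homepage. A minimal sketch, assuming a small site, a hypothetical start URL, and the third-party requests and beautifulsoup4 packages:

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START = "https://www.example.com/"  # hypothetical homepage

def click_depths(start_url, max_pages=500):
    """Breadth-first crawl of internal links; returns {url: clicks from homepage}."""
    host = urlparse(start_url).netloc
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue and len(depths) < max_pages:
        url = queue.popleft()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == host and link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths

# Deepest pages first: prime candidates for better internal linking.
for url, depth in sorted(click_depths(START).items(), key=lambda kv: -kv[1])[:20]:
    print(depth, url)
```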
Use visitor data to identify low value pages
Use visitor data to identify high value pages
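To act on both of these, one approach is to join a crawl's URL list against an analytics export. A sketch using pandas, assuming hypothetical crawl.csv (a url column) and analytics.csv (url and sessions columns) files:

```python
import pandas as pd

crawl = pd.read_csv("crawl.csv")        # hypothetical export: url
traffic = pd.read_csv("analytics.csv")  # hypothetical export: url, sessions

merged = crawl.merge(traffic, on="url", how="left").fillna({"sessions": 0})

# Low-value candidates: crawlable pages that attract no visits.
low_value = merged[merged["sessions"] == 0]
# High-value pages: prioritise these in internal linking and sitemaps.
high_value = merged.sort_values("sessions", ascending=False).head(50)

print(f"{len(low_value)} crawlable URLs with zero sessions")
print(high_value)
```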
Site Speed & Redirects
Ensure your pages can be crawled quickly
https://www.youtube.com/watch?v=opUfIzuzJSw&feature=youtu.be&t=1010
Keep redirects to a minimum
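A quick way to audit this with the third-party requests package (hypothetical URL; crawling tools report the same thing at scale):

```python
import requests

def redirect_chain(url):
    """Return every hop (status, URL) a crawler must follow to reach the final page."""
    resp = requests.get(url, timeout=10, allow_redirects=True)
    return [(r.status_code, r.url) for r in resp.history] + [(resp.status_code, resp.url)]

# More than one hop is a chain worth collapsing into a single 301.
for status, url in redirect_chain("http://example.com/old-page"):  # hypothetical URL
    print(status, url)
```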
Considerations for improving Time on Site & Site Speed
• Put your most important pages first
• Minimise the number of redirect loops/chains
• Cache content that changes infrequently and/or move static content to a Content Delivery Network
• Enable compression (gzip)
• Improve server response times (the last two points are spot-checked in the sketch below)
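A sketch for those last two checks, assuming the requests package and a hypothetical URL list:

```python
import requests

URLS = ["https://www.example.com/"]  # hypothetical pages to spot-check

for url in URLS:
    resp = requests.get(url, headers={"Accept-Encoding": "gzip"}, timeout=10)
    encoding = resp.headers.get("Content-Encoding", "none")  # expect "gzip" if enabled
    elapsed = resp.elapsed.total_seconds()  # time until response headers arrived
    print(f"{url}: HTTP {resp.status_code}, encoding={encoding}, {elapsed:.2f}s")
```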
Making the most of Googlebot’s time
Remove all Duplicate Pages
• Duplicates make Googlebot work twice as hard
• They weaken the authority of the primary page you want indexed
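As a first pass at finding exact duplicates, a sketch (requests package, hypothetical URL list) that groups pages by a hash of their response body; near-duplicates need fuzzier techniques such as shingling:

```python
import hashlib
from collections import defaultdict

import requests

URLS = [  # hypothetical: your crawled URL list
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?sessionid=123",
]

pages_by_hash = defaultdict(list)
for url in URLS:
    body = requests.get(url, timeout=10).text
    pages_by_hash[hashlib.sha256(body.encode()).hexdigest()].append(url)

for urls in pages_by_hash.values():
    if len(urls) > 1:
        print("Exact duplicates:", urls)  # consolidate, redirect, or remove
```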
Non-Indexable does not mean Non-Crawlable
“Before implementing a rel=canonical tag, you have to ask yourself whether it actually addresses the underlying issue, or whether it’s a slap-dash fix that serves as a mere cosmetic cover-up for the problem.”
Barry Adams, SEO Consultant & State of Digital Editor
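This is why non-indexable pages still cost crawl budget: Googlebot has to fetch a page before it can read a noindex or canonical hint. A hedged sketch (requests and beautifulsoup4 packages, hypothetical URL) that reports the signals a crawler only discovers after fetching:

```python
import requests
from bs4 import BeautifulSoup

def indexability_signals(url):
    """Fetch a page and report the indexing hints found only after the crawl."""
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    robots_meta = soup.find("meta", attrs={"name": "robots"})
    canonical = soup.find("link", rel="canonical")
    return {
        "status": resp.status_code,
        "meta_robots": robots_meta["content"] if robots_meta else None,
        "canonical": canonical["href"] if canonical else None,
    }

# The crawl cost is paid even when the verdict is "don't index this".
print(indexability_signals("https://www.example.com/some-page"))  # hypothetical URL
```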
Utilise robots.txt
• When used correctly, robots.txt is incredibly effective at freeing crawl budget for the pages that matter
• Always test any changes, as it’s very easy to make mistakes
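In that spirit of testing, a minimal sketch using only Python's standard-library urllib.robotparser to check a hypothetical draft robots.txt against sample URLs before deploying it:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules: block crawl traps, keep real pages open.
# (The standard-library parser doesn't understand Googlebot's wildcard
# extensions, so stick to plain path prefixes here.)
DRAFT = """\
User-agent: *
Disallow: /search
Disallow: /basket
"""

rp = RobotFileParser()
rp.parse(DRAFT.splitlines())

for url in [
    "https://www.example.com/products/blue-widget",  # expect True (crawlable)
    "https://www.example.com/search?q=widgets",      # expect False (blocked)
]:
    print(rp.can_fetch("Googlebot", url), url)
```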
Manage URL Parameters
Considerations for improving Efficiency
• Instruct Google (in Google Search Console) to ignore parameters
• Utilise the robots.txt file where appropriate (TEST THIS)
• Remember that non-indexable doesn’t mean it’s not crawled
• Ensure there is no duplication
• Use self-referencing canonical tags (see the sketch below)
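As one concrete angle on parameter duplication, a sketch using Python's standard-library urllib.parse to strip a hypothetical allowlist of tracking parameters; the cleaned URL is what a self-referencing canonical tag would point at:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical policy: these parameters never change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonical_url(url):
    """Drop tracking parameters so duplicate variants collapse to one URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))

print(canonical_url("https://www.example.com/shoes?colour=red&utm_source=email"))
# -> https://www.example.com/shoes?colour=red
```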
What can be achieved?
https://www.deepcrawl.com/case-studies/modanisa-sales-up-400-thanks-to-seo-audit/
Where are we heading?
The Future of Technical SEO
• Organic searches have been made secure, making keyword research increasingly difficult – all the more reason to get Technical SEO right
• Google will allow you to purchase additional crawl budget if you feel you aren’t being crawled enough
• Competition to be indexed and rank continues to increase, with 40,000 Google search queries per second and ever more websites – ensure sitemaps are accurate to direct Google to the pages you prioritise
• Mobile will continue to have great significance in the world of search, with more than half of searches already happening on mobile – the principles of getting Technical SEO right will still apply
How to get ahead of the game
The first step to being found in search results is being crawled and indexed by Googlebot
Canonicalising pages removes the duplication issue but can create crawl efficiency issues
Make life as easy as possible for search engines to crawl and index your site(s)
Big doesn’t necessarily mean better – often, less is more
Test, test and test changes again and again
Stay up to date with the latest developments – watch Webmaster Hangouts (or read the summaries in our newsletter!)
All these changes are of benefit to the user
Thank You!
@DeepCrawl
@David_BrownUK
