
Using Crawl Data to Inform Site Architecture by Brian Weiss

From the SMX West Conference in San Jose, CA, March 13-15, 2018. SESSION: The Latest In Advanced Technical SEO. PRESENTATION: Using Crawl Data to Inform Site Architecture, given by Brian Weiss, Managing Consultant at Stone Temple. #SMX #32A



  1. 1. #SMX #32a Brian Weiss You Can’t Fix What You Can’t Find Using Crawl Data to Inform Site Architecture
  2. 2. #SMX #32a Brian Weiss Brian Weiss • Over 10 years in SEO; 7 as a consultant, 3 in-house • Managing Consultant at Stone Temple • Columnist at searchengineland.com & Stone Temple’s blog, Digital Marketing Excellence • Have analyzed crawls for clients of all sizes, including two of the top 50 trafficked websites in the USA • Largest crawls run: 200+ million pages and 3+ billion URLs discovered
  3. 3. #SMX #32a Brian Weiss Improved visibility leads to greater insight: Why Run a Crawl?
  4. 4. #SMX #32a Brian Weiss Improved visibility leads to greater insight: Why Run a Crawl? • You can’t fix what you can’t find.
  5. 5. #SMX #32a Brian Weiss Improved visibility leads to greater insight: Why Run a Crawl? • You can’t fix what you can’t find. • You can’t describe what you can’t measure.
  6. 6. #SMX #32a Brian Weiss Improved visibility leads to greater insight: Why Run a Crawl? • Crawling helps us to zoom in to get a clearer picture when we zoom out. • You can’t fix what you can’t find. • You can’t describe what you can’t measure.
  7. 7. #SMX #32a Brian Weiss Why Run a Crawl? 1. Rapidly identify site architecture and SEO problems
  8. 8. #SMX #32a Brian Weiss Why Run a Crawl? 2. Validate mobile vs. desktop parity
  9. 9. #SMX #32a Brian Weiss Why Run a Crawl? 3. Precisely measure changes from one crawl to the next – not "We have too many duplicate pages," but "We have 352,000 duplicate pages, which account for 17% of our total URLs." 4. Gather data to support resource requests
  10. 10. #SMX #32a Brian Weiss Why Run a Crawl? 5. Make sure we're not missing something
  11. 11. #SMX #32a Brian Weiss Why Run a Crawl? 6. Site is too large to visit every page
  12. 12. #SMX #32a Brian Weiss Choosing a crawler – which features do I need? 1. If the link graph is dependent on JavaScript rendering, you will need a headless crawler
  13. 13. #SMX #32a Brian Weiss Choosing a crawler – which features do I need? 2. Capable of obeying robots.txt instructions
  14. 14. #SMX #32a Brian Weiss Choosing a crawler – which features do I need? 3. Ability to change user agent – mobile vs. desktop
  15. 15. #SMX #32a Brian Weiss Choosing a crawler – which features do I need? 4. Companion log file analysis
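     A minimal sketch of points 2 and 3 above (obeying robots.txt and switching user agent) using only Python's standard library; the site URL and user-agent strings are placeholders, not anything from the deck:

        # Fetch a URL only if robots.txt allows it, once as mobile and once as desktop.
        from urllib import robotparser, request

        SITE = "https://www.example.com"   # hypothetical site
        MOBILE_UA = "Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 Mobile Safari/537.36"
        DESKTOP_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Safari/537.36"

        # Read the robots.txt rules once before fetching anything.
        robots = robotparser.RobotFileParser()
        robots.set_url(SITE + "/robots.txt")
        robots.read()

        def fetch(url, user_agent):
            """Fetch a URL with the given user agent, but only if robots.txt allows it."""
            if not robots.can_fetch(user_agent, url):
                return None  # respect the disallow rule
            req = request.Request(url, headers={"User-Agent": user_agent})
            with request.urlopen(req, timeout=10) as resp:
                return resp.status, resp.read()

        # Crawl the same URL as mobile and as desktop so the two responses can be compared later.
        for ua in (MOBILE_UA, DESKTOP_UA):
            result = fetch(SITE + "/", ua)
            print(ua.split(")")[0], "->", "blocked by robots.txt" if result is None else result[0])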
  16. 16. #SMX #32a Brian Weiss Popular Crawling Solutions Include:
  17. 17. #SMX #32a Brian Weiss What data should a crawler collect? Basics: • URL • Meta robots/noindex • Canonical • <Title> • Meta description • Rel=alt tags • Rel=next/prev • Amphtml • HTTP response code
  18. 18. #SMX #32a Brian Weiss What data should a crawler collect? Nice To Haves: • Link depth • Link source(s) • PageRank flow approximation • Redirect chain capture • Capture of URLs blocked by robots.txt
  19. 19. #SMX #32a Brian Weiss What data should a crawler collect? Critical • Ability to aggregate data by folder/page type
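     A minimal sketch of the "aggregate data by folder/page type" idea, assuming a CSV crawl export with hypothetical columns named url and indexable (adjust the names to whatever your crawler actually produces):

        # Aggregate crawl rows by top-level folder; filename and column names are assumptions.
        import csv
        from collections import Counter
        from urllib.parse import urlparse

        pages_per_folder = Counter()
        indexable_per_folder = Counter()

        with open("crawl_export.csv", newline="") as f:   # hypothetical filename
            for row in csv.DictReader(f):
                path = urlparse(row["url"]).path
                # "/widgets/blue-widget" -> "/widgets/"; root-level pages fall under "/"
                folder = "/" + path.strip("/").split("/")[0] + "/" if path.strip("/") else "/"
                pages_per_folder[folder] += 1
                if row.get("indexable", "").lower() == "true":
                    indexable_per_folder[folder] += 1

        for folder, total in pages_per_folder.most_common():
            print(f"{folder:<25} {total:>8} pages, {indexable_per_folder[folder]:>8} indexable")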
  20. 20. #SMX #32a Brian Weiss It is helpful to have a baseline understanding BEFORE you crawl… Where are the important URLs? (by folder/subdomain/parameter/etc.)
  21. 21. #SMX #32a Brian Weiss It is helpful to have a baseline understanding BEFORE you crawl… What do I EXPECT to see? (based on page creation logic) (chart: Expected vs. Observed)
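     A small sketch of the expected-vs.-observed comparison; the folder names and counts are made up for illustration:

        # Compare expected page counts (from page-creation logic or database counts)
        # with observed counts from the crawl. All numbers below are illustrative.
        expected = {"/category/": 120, "/product/": 3000, "/blog/": 350}
        observed = {"/category/": 118, "/product/": 9500, "/blog/": 350}   # e.g. from the folder aggregation above

        for folder in sorted(set(expected) | set(observed)):
            exp, obs = expected.get(folder, 0), observed.get(folder, 0)
            flag = "OK" if exp == obs else ("duplication/parameter bloat?" if obs > exp else "missing crawl paths?")
            print(f"{folder:<15} expected {exp:>7} observed {obs:>7}  {flag}")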
  22. 22. #SMX #32a Brian Weiss Big Questions to Answer: 1. Do my important pages have an efficient crawl path? Your link graph tells Google how YOU are prioritizing pages
  23. 23. #SMX #32a Brian Weiss Big Questions to Answer: 2. Am I feeding Google thin content?
  24. 24. #SMX #32a Brian Weiss Big Questions to Answer: 3. Am I using my crawl budget well?
  25. 25. #SMX #32a Brian Weiss Big Questions to Answer: 4. Do I have problems with titles, headlines, or tags? (chart: Duplicate Tag Counts)
  26. 26. #SMX #32a Brian Weiss Case Study Examples: Fortune 500 retailer with 3X more category pages than product pages. Traffic increased 23% in T2.
  27. 27. #SMX #32a Brian Weiss Case Study Examples: Small B2C company had article pages that were pushed deeper on mobile (bad sign for mobile-first). (chart: Desktop vs. Mobile link depth – Desktop 6, Mobile 216)
  28. 28. #SMX #32a Brian Weiss Case Study Examples: A national e-commerce site restructured their site and increased the % of internal PageRank pointing to indexable pages by 100% (from 28% to 56%). Traffic increased 33% in T2.
  29. 29. #SMX #32a Brian Weiss Systematic Checks to Run: 1. Handshake tag agreement • Mobile & AMP URLs • Hreflang • Rel=next/prev (also verify links on page) • Canonical; target should be self-canonical – no chains or canonicals to noindex or blocked pages
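     A minimal sketch of the canonical handshake checks from this slide; the dict of declared canonicals and the noindex/robots sets are hypothetical stand-ins for fields a crawler would export:

        # Flag canonical chains and canonicals pointing at noindexed or robots-blocked URLs.
        canonical_of = {
            "https://example.com/a?color=red": "https://example.com/a",
            "https://example.com/a": "https://example.com/a",   # self-canonical: good
            "https://example.com/b": "https://example.com/c",
            "https://example.com/c": "https://example.com/d",   # chain: b -> c -> d
        }
        noindexed = {"https://example.com/d"}
        blocked_by_robots = set()

        for url, target in canonical_of.items():
            if target == url:
                continue  # self-canonical, nothing to check
            target_canonical = canonical_of.get(target)
            if target_canonical is not None and target_canonical != target:
                print(f"CHAIN: {url} -> {target} -> {target_canonical}")
            if target in noindexed:
                print(f"CANONICAL TO NOINDEX: {url} -> {target}")
            if target in blocked_by_robots:
                print(f"CANONICAL TO BLOCKED URL: {url} -> {target}")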
  30. 30. #SMX #32a Brian Weiss Systematic Checks to Run: 2. Duplication – Title, H1, Meta Description (among unique canonical pages)
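     A minimal sketch of the duplication check, counting titles, H1s, and meta descriptions only across unique canonical pages; the sample records are hypothetical:

        # Report any title/H1/description value shared by more than one canonical page.
        from collections import defaultdict

        pages = [
            {"url": "https://example.com/a", "canonical": "https://example.com/a",
             "title": "Blue Widgets", "h1": "Blue Widgets", "description": "Buy blue widgets."},
            {"url": "https://example.com/b", "canonical": "https://example.com/b",
             "title": "Blue Widgets", "h1": "Red Widgets", "description": "Buy red widgets."},
        ]

        for field in ("title", "h1", "description"):
            seen = defaultdict(list)
            for page in pages:
                if page["canonical"] == page["url"]:   # only unique canonical pages
                    seen[page[field].strip().lower()].append(page["url"])
            for value, urls in seen.items():
                if len(urls) > 1:
                    print(f"Duplicate {field} ({value!r}) on {len(urls)} pages: {urls}")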
  31. 31. #SMX #32a Brian Weiss Systematic Checks to Run: 3. Crawl errors – 300s, 400s, 500s
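     A tiny sketch of bucketing crawl responses into 3xx/4xx/5xx groups; the URL and status pairs are illustrative:

        # Count responses per status-code class.
        from collections import Counter

        responses = [("https://example.com/", 200), ("https://example.com/old", 301),
                     ("https://example.com/gone", 404), ("https://example.com/api", 500)]

        buckets = Counter(f"{status // 100}xx" for _, status in responses)
        print(buckets)   # e.g. Counter({'2xx': 1, '3xx': 1, '4xx': 1, '5xx': 1})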
  32. 32. #SMX #32a Brian Weiss Crawl Diagnostics Related to Indexation & Ranking: Which pages in my XML Sitemaps do not appear in my crawl?
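     A minimal sketch of the sitemap-vs.-crawl comparison, assuming a standard <urlset> sitemap file and a plain-text list of crawled URLs (both filenames are hypothetical):

        # Which URLs in the XML sitemap never showed up in the crawl?
        import xml.etree.ElementTree as ET

        NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
        sitemap_urls = {
            loc.text.strip()
            for loc in ET.parse("sitemap.xml").getroot().findall("sm:url/sm:loc", NS)
        }

        with open("crawled_urls.txt") as f:
            crawled_urls = {line.strip() for line in f if line.strip()}

        orphaned = sitemap_urls - crawled_urls   # in the sitemap but not reachable via internal links
        for url in sorted(orphaned):
            print("In sitemap but not found in crawl:", url)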
  33. 33. #SMX #32a Brian Weiss Crawl Diagnostics Related to Indexation & Ranking: Link depth of indexed vs. non-indexed pages
        Link Depth    Indexed    Non-Indexed
        1                   1              1
        2                  70             10
        3               2,000            450
        4              10,000          5,000
        5               5,000         10,000
        6               1,500         12,000
        7               1,000         15,000
        8+              2,500        100,000
  34. 34. #SMX #32a Brian Weiss Crawl Diagnostics Related to Indexation & Ranking: Average link depth of important page groupings
        Page Type      Average Link Depth
        Category       2
        Subcategory    3.5
        Product        5.8
        Blog           32.8
        Reviews        4
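     One way to produce link-depth numbers like the two tables above is a breadth-first walk over the crawler's exported link graph; this is a minimal sketch with a made-up edge list and a simple first-path-segment page-type rule:

        # Compute minimum click depth from the homepage, then average it per page grouping.
        from collections import deque, defaultdict

        edges = {   # source URL -> URLs it links to (hypothetical export)
            "/": ["/category/shoes", "/blog/"],
            "/category/shoes": ["/product/red-shoe", "/product/blue-shoe"],
            "/blog/": ["/blog/post-1"],
        }

        # Breadth-first search from the homepage gives the minimum link depth of each URL.
        depth = {"/": 1}
        queue = deque(["/"])
        while queue:
            url = queue.popleft()
            for target in edges.get(url, []):
                if target not in depth:
                    depth[target] = depth[url] + 1
                    queue.append(target)

        # Average depth per page type, here defined by the first path segment.
        by_type = defaultdict(list)
        for url, d in depth.items():
            page_type = "/" + url.strip("/").split("/")[0] if url != "/" else "home"
            by_type[page_type].append(d)

        for page_type, depths in sorted(by_type.items()):
            print(f"{page_type:<12} average link depth {sum(depths) / len(depths):.1f}")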
  35. 35. #SMX #32a Brian Weiss Crawl Diagnostics Related to Indexation & Ranking: # of pages of each type vs. # of INDEXABLE pages of each type
        Page Type        # of Pages In Crawl    # of Indexable Pages
        /productpage     3,000                  500
        /categorypage    120                    100
        /articlepage     200                    200
        /blogpage        350                    325
        If indexable pages are less than 50% of URLs, you may be using too many band-aid fixes.
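     A short sketch of the 50% rule of thumb from this slide, applied to the counts in the table above:

        # Flag page types where less than half of the crawled URLs are indexable.
        counts = {
            "/productpage":  (3000, 500),
            "/categorypage": (120, 100),
            "/articlepage":  (200, 200),
            "/blogpage":     (350, 325),
        }

        for page_type, (total, indexable) in counts.items():
            share = indexable / total
            warning = "  <-- under 50%: too many band-aid fixes?" if share < 0.5 else ""
            print(f"{page_type:<15} {indexable:>5}/{total:<5} indexable ({share:.0%}){warning}")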
  36. 36. #SMX #32a Brian Weiss Every Band-Aid Has a Cost:
        • Canonical – Consolidates PageRank, BUT requires crawl and reconciliation of 2 pages (less so over time) and acceptance by Google
        • Noindex – Passes PageRank, consumes crawl budget (less over time)
        • Robots.txt – PageRank sinkhole, preserves crawl budget
        • Nofollow – same but worse (don't use on internal links)
  37. 37. #SMX #32a Brian Weiss How to Choose a Band-Aid: Robots.txt – Do I have a large number of pages to which I'm passing a relatively small amount of PageRank?
  38. 38. #SMX #32a Brian Weiss How to Choose a Band-Aid: Robots.txt – Do I have a large number of pages to which I'm passing a relatively small amount of PageRank? Canonical – Do I have a good canonical target and a method for applying it?
  39. 39. #SMX #32a Brian Weiss How to Choose a Band-Aid: Robots.txt – Do I have a large number of pages to which I'm passing a relatively small amount of PageRank? Canonical – Do I have a good canonical target and a method for applying it? Noindex – Is neither of the above true, and I have low-quality pages that I don't want indexed?
  40. 40. #SMX #32a Brian Weiss How to Choose a Band-Aid: Non-band-aid solution – Stop linking to pages you don't want indexed!
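     The band-aid decision flow from the last few slides, expressed as a small helper function; the question wording follows the deck, but the function itself is only an illustration:

        # Decide which band-aid fits, mirroring the three questions on the slides above.
        def choose_band_aid(low_pagerank_bulk: bool,
                            has_good_canonical_target: bool,
                            low_quality_dont_want_indexed: bool) -> str:
            if low_pagerank_bulk:
                # Large number of pages receiving relatively little PageRank.
                return "robots.txt"
            if has_good_canonical_target:
                # A good canonical target and a workable way to apply it.
                return "canonical"
            if low_quality_dont_want_indexed:
                # Neither of the above, but low-quality pages you don't want indexed.
                return "noindex"
            return "no band-aid needed -- or stop linking to pages you don't want indexed"

        print(choose_band_aid(False, True, False))   # -> "canonical"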
  41. 41. #SMX #32a Brian Weiss Diving into Google Crawl (GSC): 1. What is my average daily crawl from Google? (# of pages – GSC)
  42. 42. #SMX #32a Brian Weiss Diving into Google Crawl (GSC): 2. How many days would it take Google to crawl all the URLs on my site?
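     The two GSC questions above reduce to simple division; both numbers in this sketch are placeholders:

        # Rough "days for Google to touch every URL once" estimate.
        total_urls_on_site = 2_500_000      # e.g. URLs discovered by your crawler
        avg_daily_google_crawl = 60_000     # e.g. pages/day from GSC crawl stats

        days_to_full_crawl = total_urls_on_site / avg_daily_google_crawl
        print(f"~{days_to_full_crawl:.0f} days for Google to crawl every URL once")   # ~42 days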
  43. 43. #SMX #32a Brian Weiss Diving into Google Crawl (GSC): 2. What do non-indexed pages have in common compared to my pages that are indexed?
  44. 44. #SMX #32a Brian Weiss Diving into Google Crawl (GSC): 3. How does the distribution of pages crawled by Google compare to the distribution of pages found in my crawl? (charts: Your Crawl Distribution vs. Google's Crawl Distribution)
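     A minimal sketch of the distribution comparison, assuming you can count Googlebot requests per folder from log files; both Counters hold made-up numbers:

        # Compare the share of each folder in your crawl vs. in Googlebot's crawl (from logs).
        from collections import Counter

        your_crawl = Counter({"/product/": 10000, "/category/": 1200, "/blog/": 800})
        googlebot_log = Counter({"/product/": 3000, "/category/": 2500, "/blog/": 200})

        your_total = sum(your_crawl.values())
        google_total = sum(googlebot_log.values())

        for folder in sorted(set(your_crawl) | set(googlebot_log)):
            yours = your_crawl[folder] / your_total
            google = googlebot_log[folder] / google_total
            print(f"{folder:<12} your crawl {yours:6.1%}   Googlebot {google:6.1%}")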
  45. 45. #SMX #32a Brian Weiss Desktop vs. Mobile Crawl Analysis: • Link parity • Page distribution • Content parity • Title, H1, meta description, tag agreement • Custom crawl config for unique elements • Consistency by user agent
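     A minimal sketch of a desktop-vs.-mobile parity check, assuming per-URL data (outgoing link counts, titles) exported from two crawls run with different user agents; the sample data is hypothetical:

        # Compare a desktop crawl and a mobile crawl of the same URLs for link parity and tag agreement.
        desktop = {"https://example.com/a": {"links_out": 45, "title": "Blue Widgets"},
                   "https://example.com/b": {"links_out": 30, "title": "Red Widgets"}}
        mobile  = {"https://example.com/a": {"links_out": 12, "title": "Blue Widgets"},
                   "https://example.com/b": {"links_out": 30, "title": "Red Widgets | Shop"}}

        for url in sorted(set(desktop) & set(mobile)):
            d, m = desktop[url], mobile[url]
            if m["links_out"] < d["links_out"]:
                print(f"LINK PARITY: {url} has {d['links_out']} desktop links but only {m['links_out']} on mobile")
            if d["title"] != m["title"]:
                print(f"TITLE MISMATCH: {url}: {d['title']!r} vs {m['title']!r}")

        missing_on_mobile = set(desktop) - set(mobile)
        if missing_on_mobile:
            print("Found on desktop but not mobile:", sorted(missing_on_mobile))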
  46. 46. #SMX #32a Brian Weiss Share these #SMXInsights on your social channels! #SMXInsights: • You can’t fix what you can’t find. Crawling gives you visibility you can’t otherwise get. • Every band-aid fix has a cost. • Your link graph tells Google which pages YOU are prioritizing. Make sure you know what it says.
  47. 47. #SMX #32a Brian Weiss LEARN MORE: UPCOMING @SMX EVENTS Brian Weiss bweiss@stonetemple.com www.stonetemple.com Thank You!
