Infinite Loops Dirty Architecture And Too Many Indexed URLs

Dawn Anderson's Brighton SEO deck from April 2014. It looks at crawlability issues on large sites, in particular infinite URLs / infinite loops, dirty architecture and too many indexed URLs.

I also wrote a blog post / article for the Brighton SEO newspaper that covers the material in this deck in much more detail. It is here:

http://bit.ly/Ss6Lf1

  1. Infinite loops, crawl rank & dirty architecture (Dawn Anderson)
  2. Came to this industry via a different route
  3. I decided to add an additional dimension to the site to ‘explode’ natural search traffic
  4. 1.5 million URLs
  5. Crawl rate going down, indexation levels going up
  6. Google only crawling 0.1% of our pages per day
  7. Infinite loop definition: "An infinite loop is a sequence of instructions in a computer program which loops endlessly, either due to the loop having no terminating condition, having one that can never be met, or one that causes the loop to start over."
  8. Penguin & Panda updates came along
  9. Too many URLs = SEO death. ‘We’re all doomed’
  10. Crawl budget: roughly proportionate to PageRank. Pages with a lot of links get crawled more. This still applies in the current search landscape.
  11. Crawl rank: a ranking metric for ‘no’ to ‘low’ PageRank pages?? Pages crawled more often rank higher. Get ‘low’ to ‘no’ PageRank pages crawled more than your competitors = you win.
  12. Crawl optimisation: find out where Googlebot goes, and keep watching
  13. Check & monitor for over-indexation: a 500-page website with 50,000 URLs in Google? You may have dodgy code.
  14. Check thoroughly; name & categorise XML sitemaps: yoursite.sitemap.xml, Shoes.sitemap.xml, Dresses.sitemap.xml, tshirts.sitemap.xml
  15. Don’t be afraid of hard 404s. Use 410s where you can. Avoid soft 404s.
  16. Ensure that dynamic variables / parameters are validated. Don’t render just any old thing with a ‘200 OK’ response code, or return a soft 404. How will you know if there’s a problem? You won’t.
  17. Avoid a ‘jumble sale’, BUT...
  18. ‘Herd’ Googlebot: use robots.txt, nofollows, sitemaps, nav paths & cross-module internal linking
  19. Get those low-level pages crawled, often, whichever way you can. Pass equity to siblings as well as children.
  20. Visit the internal links section in GWT (Google Webmaster Tools): Most Important Page 1, Most Important Page 2, Most Important Page 3. Is this your blog?? Hope not.
  21. Canonicalisation: "In web search and search engine optimization (SEO), URL canonicalization deals with web content that has more than one possible URL. Having multiple URLs for the same web content can cause problems for search engines, specifically in determining which URL should be shown in search results.[2]" Example:
     • http://wikipedia.com
     • http://www.wikipedia.com
     • http://www.wikipedia.com/
     • http://www.wikipedia.com/?source=asdf
     All of these URLs point to the homepage of Wikipedia, but a search engine will only consider one of them to be the canonical form of the URL. (Source: Wikipedia)
  22. Deal well with duplicate & near-duplicate content via canonicalization, 301s & content build-out
  23. Stop lying & ‘get fresh’: genuine ‘last modified’ dates are ALL-important. Forget priority.
  24. "It's not that Google will penalize you, it's the opportunity cost for dirty architecture based on a finite crawl budget." (A.J. Kohn, Blind Five Year Old) Remember this.
  25. Me: @dawnieando
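
Slide 7 defines an infinite loop in programming terms. On a website, the equivalent is a URL space with no terminating condition, such as links that keep appending the same path segments. The following is a minimal heuristic sketch in Python, not a method taken from the deck. It assumes you already have a list of crawled URLs. The function name `looks_like_crawler_trap` and its `max_depth` and `max_repeats` thresholds are hypothetical, and the thresholds would need tuning for each site.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_crawler_trap(url: str, max_depth: int = 8, max_repeats: int = 2) -> bool:
    """Flag URLs whose path looks like an endless crawl loop.

    Hypothetical heuristic: the path is deeper than max_depth segments,
    or any single segment occurs more than max_repeats times.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    return any(count > max_repeats for count in Counter(segments).values())


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/shoes/red/",
        "http://www.example.com/shoes/red/shoes/red/shoes/red/",
    ]
    for url in crawled:
        # False for the first URL, True for the one that cycles back on itself.
        print(url, looks_like_crawler_trap(url))
```

This is deliberately crude. It catches paths that cycle through the same sections, but it will also flag any legitimately deep or repetitive structure, so treat its output as a list to inspect rather than a verdict.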
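
Slide 13's over-indexation check comes down to two steps. First, compare the number of distinct indexed URLs with the number of real pages. Second, find which URL pattern produces the surplus. Below is a minimal Python sketch of both steps. It assumes you have exported a list of indexed or crawled URLs from whatever source you use. `url_pattern` and `over_indexation_report` are hypothetical helper names. Grouping by path plus parameter names, while ignoring parameter values, is one assumed way of spotting the dodgy code that mints extra URLs.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def url_pattern(url: str) -> str:
    """Reduce a URL to its path plus sorted parameter names, dropping values."""
    parts = urlsplit(url)
    names = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return parts.path + ("?" + "&".join(names) if names else "")


def over_indexation_report(indexed_urls, real_page_count: int):
    """Return (indexed-to-real ratio, patterns sorted by how many URLs each produces)."""
    unique_urls = set(indexed_urls)
    ratio = len(unique_urls) / real_page_count
    patterns = Counter(url_pattern(u) for u in unique_urls)
    return ratio, patterns.most_common()


if __name__ == "__main__":
    indexed = [
        "http://www.example.com/shoes/",
        "http://www.example.com/shoes/?colour=red",
        "http://www.example.com/shoes/?colour=blue",
        "http://www.example.com/shoes/?colour=blue&size=5",
    ]
    ratio, patterns = over_indexation_report(indexed, real_page_count=2)
    print(f"{ratio:.1f} indexed URLs per real page")
    for pattern, count in patterns:
        print(count, pattern)
```

A ratio well above 1 says that something is minting URLs. The pattern list then shows which parameter combination to look at first.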
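
Slides 14 and 23 combine two pieces of advice: split XML sitemaps by category, and give each URL a genuine last-modified date while forgetting priority. Here is a minimal Python sketch of both, assuming each page arrives as a `(url, date)` pair. The date must be when the content actually changed, for example taken from your CMS, and not the time the sitemap was generated. Using the first path segment as the category, and `home` as the fallback name, are hypothetical choices. `build_sitemap` and `sitemaps_by_category` are illustrative names only.

```python
import xml.etree.ElementTree as ET
from datetime import date
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """pages: iterable of (url, last_modified) pairs, last_modified a datetime.date.

    Emits <loc> and <lastmod> only; no <priority> element is written.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


def sitemaps_by_category(pages) -> dict:
    """Group pages by first path segment into '<category>.sitemap.xml' files (assumed scheme)."""
    groups = {}
    for url, last_modified in pages:
        category = urlsplit(url).path.strip("/").split("/")[0] or "home"
        groups.setdefault(f"{category}.sitemap.xml", []).append((url, last_modified))
    return {name: build_sitemap(items) for name, items in groups.items()}


if __name__ == "__main__":
    pages = [
        ("http://www.example.com/shoes/boots/", date(2014, 3, 28)),
        ("http://www.example.com/dresses/maxi/", date(2014, 4, 2)),
    ]
    for filename, xml in sitemaps_by_category(pages).items():
        print(filename)
        print(xml)
```

Per-category files make it easier to see which section of the site is under-indexed.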
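
Slides 15 and 16 say to validate parameters before rendering. A URL that asks for something nonexistent should get a real 404, or a 410 if the content was removed on purpose. It should not get a '200 OK' page or a soft 404. Below is a minimal, framework-agnostic Python sketch of that decision. The URL shape (a category slug plus a `page` parameter), the `CATEGORIES` and `RETIRED_CATEGORIES` data, and the `status_for` name are all hypothetical.

```python
import re
from http import HTTPStatus

# Hypothetical catalogue: category slug -> number of paginated listing pages.
CATEGORIES = {"shoes": 40, "dresses": 25}
# Hypothetical: categories removed for good, which should answer 410 Gone.
RETIRED_CATEGORIES = {"tshirts"}

# Positive integers only; leading zeros are rejected here
# (301-redirecting them to the clean number would be an alternative).
PAGE_NUMBER = re.compile(r"[1-9][0-9]*")


def status_for(category: str, page: str) -> int:
    """Return the HTTP status a listing URL should get before anything is rendered."""
    if category in RETIRED_CATEGORIES:
        return HTTPStatus.GONE.value  # 410: removed on purpose
    if category not in CATEGORIES:
        return HTTPStatus.NOT_FOUND.value  # 404: never existed
    if not PAGE_NUMBER.fullmatch(page):
        return HTTPStatus.NOT_FOUND.value  # e.g. "abc", "0", "007"
    if int(page) > CATEGORIES[category]:
        return HTTPStatus.NOT_FOUND.value  # e.g. page 999 of a 40-page listing
    return HTTPStatus.OK.value


if __name__ == "__main__":
    for category, page in [("shoes", "3"), ("shoes", "999"), ("shoes", "abc"), ("tshirts", "1")]:
        print(category, page, status_for(category, page))
```

Deciding the status first and only then rendering keeps invalid combinations out of the crawl. It also makes problems visible in crawl-error reports instead of hiding them behind '200 OK'.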
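
Slide 18 lists robots.txt as one way to herd Googlebot away from endless URL spaces. Before deploying a change, you can check locally what a rule set blocks with Python's standard `urllib.robotparser`, as sketched below. The rules and URLs are hypothetical examples. One limitation matters: `urllib.robotparser` only does simple prefix matching and does not understand the `*` and `$` wildcards that Googlebot supports, so this check only works for plain prefix rules. Also note that robots.txt controls crawling, not indexing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks two infinite URL spaces.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /calendar/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in [
        "http://www.example.com/search?q=red+shoes",
        "http://www.example.com/calendar/2014/04/",
        "http://www.example.com/shoes/",
    ]:
        print(url, parser.can_fetch("Googlebot", url))
```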
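
Slides 21 and 22 describe canonicalization: the same content reachable at many URLs should resolve to one canonical form. Below is a minimal Python sketch that maps the four Wikipedia variants from slide 21 to a single URL. Two assumptions drive it. First, the site prefers the `www.` host, and the naive prefixing would also rewrite other subdomains, so it suits single-host sites only. Second, the parameters in the hypothetical `IGNORED_PARAMS` set never change page content. The function name `canonicalize` is illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the site has chosen the www host as its preferred form.
PREFERRED_PREFIX = "www."
# Hypothetical tracking parameters that never change page content.
IGNORED_PARAMS = {"source", "utm_source", "utm_medium", "utm_campaign"}


def canonicalize(url: str) -> str:
    """Normalise host, empty path, ignorable parameters and parameter order; drop fragments."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith(PREFERRED_PREFIX):
        host = PREFERRED_PREFIX + host
    path = parts.path or "/"
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


if __name__ == "__main__":
    for url in [
        "http://wikipedia.com",
        "http://www.wikipedia.com",
        "http://www.wikipedia.com/",
        "http://www.wikipedia.com/?source=asdf",
    ]:
        print(url, "->", canonicalize(url))
```

When a requested URL differs from its canonical form, one option is to 301 to the canonical form. Another is to serve the page and reference the canonical form in a `rel="canonical"` link element.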
