Bot Herding
presented by Stephan Spencer, Founder & President, Netconcepts



Google Bot Herding, PageRank Sculpting and Manipulation


Presentation from Stephan Spencer, Founder & President of Netconcepts, about Google Bot Herding, PageRank Sculpting and Manipulation.

Published in: Technology, Design

Transcript of "Google Bot Herding, PageRank Sculpting and Manipulation"

  1. Bot Herding, presented by Stephan Spencer, Founder & President, Netconcepts. © 2008 Stephan M Spencer, Netconcepts, www.netconcepts.com, sspencer@netconcepts.com
  2. Duplicate Content Mitigation
     • Dup content is rampant on blogs. Herd bots to the permalink URL, and lead in everywhere else (Archives by Date pages, Category pages, Tag pages, Home page, etc.) with a paraphrased “Optional Excerpt”
       – Not just the first couple of paragraphs, i.e. the <!--more--> tag!
       – Requires you to revise your Main Index Template theme file:
           if (empty($post->post_excerpt) || is_single() || is_page()) {
               the_content();
           } else {
               the_excerpt();
               echo "<a href='";
               the_permalink();
               echo "' rel='nofollow'>Continue reading &raquo;</a>";
           }
  3. Duplicate Content Mitigation
     • Include a sig line (& headshot photo!) at the bottom of each post/article. Link to the original article/post permalink URL!
       – http://www.naturalsearchblog.com/archives/2008/06/03/syndicating-your-articles/
       – http://www.businessblogconsulting.com/2008/05/brand-yourself-with-photo-sig-line
  4. Duplicate Content Mitigation
     • On ecommerce sites, dup content is also rampant:
       – Manufacturer-provided product descriptions, inconsistent order of query string parameters, “guided navigation”, pagination within categories, tracking parameters
     • Selectively append tracking codes for humans with “white hat cloaking”, or use JavaScript to append the codes
       – REI.com used to append a "vcat" parameter on all brand links on their Shop By Brand page (see http://web.archive.org/web/20060823085548/www.rei.com/rei/sales_and_events/brands.html)
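A minimal Python sketch of how inconsistent parameter order and tracking parameters could be collapsed into one canonical URL. This is an illustration only, not something from the slides: the function name `canonicalize` and the example URL are hypothetical, and the only tracking parameter listed is the "vcat" one mentioned for REI.com.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: "vcat" (from the REI.com example) is the only tracking parameter.
TRACKING_PARAMS = {"vcat"}


def canonicalize(url):
    """Return the URL with tracking params removed and the rest sorted by name."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    params.sort()  # a fixed order, so a=1&b=2 and b=2&a=1 become the same URL
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


print(canonicalize("http://www.example.com/shop?size=m&brand=acme&vcat=REI_SSHP"))
```

Two URLs that differ only in parameter order or in the tracking code produce the same string, which is the URL bots would ideally be herded to.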
  5. Pagination
     • Pagination not only creates many pages that share the same keyword theme; in very large categories with thousands of products, it also leaves hundreds of pages of product listings uncrawled, which lowers product page indexation.
     • Herd bots through keyword-rich subcat links, a “View All” link, or both? How to display page number links? Optimal # of products to display/link per page? Test!
  6. PageRank Leakage?
     • If you’re using Robots.txt Disallow, you’re probably leaking PageRank
     • Robots.txt Disallow & Meta Robots Noindex both accumulate and pass PageRank
       – Meta Noindex tag on a Master sitemap page will de-index the page but still pass PageRank to linked sub-sitemap pages
     • Meta Robots Nofollow blocks the flow of PageRank
       – http://www.stonetemple.com/articles/interview-matt-cutts.shtml
  7. Rewriting Spider-Unfriendly URLs
     • 3 approaches:
       1) Use a “URL rewriting” server module / plugin, such as mod_rewrite for Apache or ISAPI_Rewrite for IIS Server
       2) Recode your scripts to extract variables out of the “path_info” part of the URL instead of the “query_string”
       3) Or, if IT department involvement must be minimized, use a proxy server based solution (e.g. Netconcepts' GravityStream)
       – With (1) and (2), replace all occurrences of your old URLs in links on your site with your new search-friendly URLs. 301 redirect the old URLs to the new ones too, so no link juice is lost.
  8. mod_rewrite – the Foundation for URL Rewriting, Remapping & Redirecting
     • Works with Apache and IBM HTTP Server
     • Place “rules” within .htaccess or your Apache config file (e.g. httpd.conf, sites_conf/…)
       – RewriteEngine on
       – RewriteBase /
       – RewriteRule ^products/([0-9]+)/?$ /get_product.php?id=$1 [L]
       – RewriteRule ^([^/]+)/([^/]+)\.htm$ /webapp/wcs/stores/servlet/ProductDisplay?storeId=10001&catalogId=10001&langId=-1&categoryID=$1&productID=$2 [QSA,P,L]
  9. Regular Expressions
     • The magic of regular expressions / pattern matching
       – * means 0 or more of the immediately preceding character
       – + means 1 or more of the immediately preceding character
       – ? means 0 or 1 occurrence of the immediately preceding character
       – ^ means the beginning of the string, $ means the end of it
       – . means any character (i.e. wildcard)
       – \ “escapes” the character that follows, e.g. \. means dot
       – [ ] is for character ranges, e.g. [A-Za-z]
       – ^ inside [ ] brackets means “not”, e.g. [^/]
  10. Regular Expressions
     – ( ) puts whatever is wrapped within it into memory
     – Access what’s in memory with $1 (what’s in the first set of parens), $2 (what’s in the second set of parens), and so on
     • Gotchas to beware of:
       – “Greedy” expressions. Use a negated character class (e.g. [^/]) instead of .*
       – .* can match on nothing. Use .+ instead
       – Unintentional substring matches because ^ or $ wasn’t specified
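The capture groups and the three gotchas above can be tried out in Python's `re` module, which behaves the same way as mod_rewrite's regex engine for these basic constructs. This is a sketch for experimentation only: the sample paths are made up, and Python exposes captures as `group(1)`, `group(2)` rather than `$1`, `$2`.

```python
import re


def captures(pattern, text):
    """Return the captured groups as a tuple, or None when there is no match."""
    match = re.search(pattern, text)
    return match.groups() if match else None


# ( ) captures: first set of parens, then second set of parens.
print(captures(r"^([^/]+)/([^/]+)\.htm$", "shoes/12345.htm"))

# Greedy gotcha: .* grabs as much as it can, [^/]+ stops at the first slash.
print(captures(r"^(.*)/(.*)$", "a/b/c"))
print(captures(r"^([^/]+)/(.*)$", "a/b/c"))

# .* can match on nothing, .+ needs at least one character.
print(captures(r"^products/(.*)$", "products/"))
print(captures(r"^products/(.+)$", "products/"))

# Without ^ and $ the pattern matches as a substring of a longer path.
print(bool(re.search(r"blah\.html", "/old/blah.html.bak")))
print(bool(re.search(r"^/blah\.html$", "/old/blah.html.bak")))
```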
  11. mod_rewrite Specifics
     • Proxy a page using the [P] flag
       – RewriteRule /blah\.html$ http://www.google.com/ [P]
     • [QSA] flag is for when you don’t want query string params dropped (like when you want a tracking param preserved)
     • [L] flag saves on server processing
     • Got a huge pile of rewrites? Use RewriteMap and have a lookup table as a text file
  12. IIS? ISAPI_Rewrite!
     • What if your site is running Microsoft IIS Server?
     • ISAPI_Rewrite plugin! Not that different from mod_rewrite
     • In httpd.ini:
       – [ISAPI_Rewrite]
         RewriteRule ^/category/([0-9]+)\.htm$ /index.asp?PageAction=VIEWCATS&Category=$1 [L]
       – Will rewrite a URL like http://www.example.com/index.asp?PageAction=VIEWCATS&Category=207 to something like http://www.example.com/category/207.htm
  13. Implementing 301 Redirects Using Redirect Directives
     • In .htaccess (or httpd.conf), you can redirect individual URLs, the contents of directories, entire domains…:
       – Redirect 301 /old_url.htm http://www.example.com/new_url.htm
       – Redirect 301 /old_dir/ http://www.example.com/new_dir/
       – Redirect 301 / http://www.example.com
     • Pattern matching can be done with RedirectMatch 301
       – RedirectMatch 301 ^/(.+)/index\.html$ http://www.example.com/$1/
  14. Implementing 301 Redirects Using Rewrite Rules
     • Or use a rewrite rule with the [R=301] flag
       – RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
       – RewriteRule ^(.*)$ http://www.example.com/$1 [L,QSA,R=301]
     • [NC] flag makes the rewrite condition case-insensitive
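To make the decision behind that host check explicit, here is a small Python sketch of the same logic. It is not how Apache implements it; the function name `canonical_host_redirect` is hypothetical, and it assumes the Host header carries no port number.

```python
def canonical_host_redirect(host, path, query="", canonical="www.example.com"):
    """Return (301, location) for a non-canonical host, or None to serve the page."""
    # Case-insensitive host comparison, like the [NC] flag.
    if host.lower() == canonical:
        return None
    location = "http://%s/%s" % (canonical, path.lstrip("/"))
    # Keep the query string on the redirect target.
    if query:
        location += "?" + query
    return 301, location


print(canonical_host_redirect("example.com", "/products/7", "ref=x"))
print(canonical_host_redirect("WWW.Example.com", "/"))
```

Every hostname other than the canonical one gets a single 301 to the same path on the canonical host, so link juice pointing at the non-www variant is consolidated.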
  15. Conditional Redirects
     • Conditional 301 for bots: great for capturing the link juice from inbound affiliate links
     • Only works if you manage your own affiliate program
     • Most are outsourced and use 302s (e.g. C.J.)
     • If you outsource your affiliate marketing, none of your deep affiliate links count
     • If Amazon’s doing it, why can’t you?
       – (Credit to Brian Klais for hypothesizing Amazon was doing this)
       – http://tinyurl.com/5ubc28
  16. Status Code 200 for humans
  17. 301 for all bots. Muahaha!!
  18. Implementing Conditional Redirects Using Rewrite Rules
     • Selectively redirect bots that request URLs with session IDs to the URL sans session ID:
         RewriteCond %{QUERY_STRING} PHPSESSID
         RewriteCond %{HTTP_USER_AGENT} Googlebot.* [OR]
         RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR]
         RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
         RewriteCond %{HTTP_USER_AGENT} Ask\ Jeeves
         # The trailing ? drops the query string, and the session ID with it
         RewriteRule ^/(.*)$ /$1? [R=301,L]
     • Utilize browscap.ini instead of having to keep up with each spider’s name and version changes
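The same conditional logic, sketched in Python for readers who want to test it outside Apache. This is an assumption-laden illustration: `bot_redirect` is a hypothetical helper, the bot names are the four from the slide, and this sketch removes only the session parameter while keeping any other query parameters.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Spider names taken from the slide; real user-agent lists change over time.
BOT_PATTERN = re.compile(r"Googlebot|msnbot|Slurp|Ask Jeeves")


def bot_redirect(url, user_agent, session_param="PHPSESSID"):
    """Return the session-free URL to 301 a bot to, or None to serve a 200."""
    if not BOT_PATTERN.search(user_agent):
        return None  # humans keep their session ID
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == session_param for key, _ in params):
        return None  # nothing to strip, so no redirect
    kept = [(key, value) for key, value in params if key != session_param]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(bot_redirect("http://www.example.com/p.php?id=7&PHPSESSID=abc", googlebot))
print(bot_redirect("http://www.example.com/p.php?id=7&PHPSESSID=abc", "Mozilla/5.0 Firefox/3.0"))
```

Returning None when there is no session parameter matters: a bot that has already been redirected must not be redirected again, or the rule loops.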
  19. Error Pages
  20. Error Pages
     • The traditional approach is to serve up a 404, which drops the error page with the obsolete or wrong URL out of the search indexes. This squanders the link juice to that page.
     • But what if you return a 200 status code instead, so that the spiders follow the links? Then include a meta robots noindex so the error page itself doesn’t get indexed.
     • Or do a 301 redirect to something valuable (e.g. your home page) and dynamically include a small error notice
     • (Credit to Francois Planque for this clever approach.)
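A minimal WSGI sketch of the "200 plus noindex" variant described above. Everything here is assumed for illustration: the `KNOWN_PATHS` table, the page markup, and the link on the error page are hypothetical stand-ins for a real site.

```python
import html

# Hypothetical content table standing in for the real site.
KNOWN_PATHS = {"/": "<h1>Home</h1>"}


def app(environ, start_response):
    """Serve known pages normally; answer unknown URLs with a 200 noindex page."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PATHS:
        body = "<html><head><title>Home</title></head><body>%s</body></html>" % KNOWN_PATHS[path]
    else:
        # 200 status so spiders follow the links; noindex keeps this page out of the index.
        body = (
            "<html><head><title>Page not found</title>"
            '<meta name="robots" content="noindex, follow"></head>'
            "<body><p>Sorry, %s was not found.</p>"
            '<p><a href="/">Go to the home page</a></p></body></html>'
        ) % html.escape(path)
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

The app can be served locally with `wsgiref.simple_server.make_server("", 8000, app)` to inspect the headers with a tool such as livehttpheaders.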
  21. URL Stability
     • An annually recurring feature, like a Holiday Gift Buying Guide, should have a stable, date-unspecified URL
       – No need for any 301s
       – When the current edition is to be retired and replaced with a new edition, assign a new URL to the archived edition
     • Otherwise link juice earned over time is not carried over to future years’ editions
  22. URL Testing
     • URL affects searcher clickthrough rates
     • Short URLs get clicked on twice as often as long URLs (Source: MarketingSherpa, used with permission)
  23. URL Testing
     • Further, long URLs appear to act as a deterrent to clicking, drawing attention away from their own listing and directing it to the listing below, which then gets clicked 2.5x more frequently.
       – http://searchengineland.com/080515-084124.php
     • Don’t be complacent with search-friendly URLs. Test and optimize.
     • Make iterative improvements to URLs, but don’t lose link juice to previous URLs. 301 previous URLs to the latest. No chains of 301s.
     • WordPress handles 301s automatically when renaming post slugs
     • Mass editing URLs (post slugs) in WordPress: announcement tomorrow in Give It Up session
  24. Yank Competitor’s Grouped Results from Google page 1 SERPs
     • Knock out your competitor’s second indented (grouped) listing by directing link juice to other non-competitive listings (e.g. on page 2 SERPs, or directly below the indented result’s true position)
     • First, find the true position of their indented result by appending &num=9 to the URL and seeing if the indented listing drops off. If not, append &num=8. Rinse and repeat until the indented listing falls away. The worse its true position, the more susceptible the indented listing is.
  25. This isn’t really #3
  26. Nope, not yet
  27. Gone! Its true position was #9
  28. SEO the title of #12 to bump it up to page 1 – it will be grouped to #2. Then link to #11 and bump it up to page 1 to knock #4 to page 2
  29. More Things I Wish I Had Time to Cover
     • Robots.txt gotchas
     • Webmaster Central tools (www vs no www, crawl rate, robots.txt builder, Sitemaps, etc.)
     • Yahoo's Dynamic URLs tab in Site Explorer
     • <div class="robots-nocontent">
     • If-Modified-Since
     • Status codes 404, 401, 500 etc.
     • PageRank transfer from PDFs, RSS feeds, Word docs etc.
     • Diagnostic tools (e.g. livehttpheaders, User Agent Switcher)
  30. Thanks!
     • This PowerPoint can be downloaded from www.netconcepts.com/learn/bot-herding.ppt
     • For a 180-minute screencast (including 90 minutes of Q&A) on SEO for large dynamic websites, taught by Chris Smith and me and including transcripts, email seo@netconcepts.com
     • Questions after the show? Email me at stephan@netconcepts.com