Getting Rid of Duplicate Content Issues Once and For All
PubCon, Las Vegas, November 13, 2008
Ben D’Angelo, Software Engineer
  2. What are “duplicate content issues”?
      • Multiple disjoint situations!
      • Duplicate content within your site or sites
          • Multiple URLs pointing to the same page, similar pages
          • Different countries (same language)
      • Duplicate content across other sites
          • Syndicated content
          • Scraped content
  3. Guiding principle
      • One URL for one piece of content
      • Why?
      • Users don’t like duplicates in results
      • Saves resources in our index: more room for other pages from your site!
      • Saves resources on your server
  4. Sources of duplicates within your sites
      • Multiple URLs pointing to the same page
          • www vs non-www
          • Session ids, URL parameters
          • Printable versions of pages
          • CNAMEs
      • Similar content on different pages
      • Manufacturer’s databases
      • Different countries
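To make the "multiple URLs pointing to the same page" case concrete, here is a minimal Python sketch that maps www/non-www and session-id variants onto one URL. The hostnames and the parameter names in `IGNORED_PARAMS` are assumptions chosen for illustration, not something the slides specify.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed values for illustration only; substitute your site's real ones.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
IGNORED_PARAMS = {"sid", "sessionid", "print"}  # hypothetical parameter names


def canonical_url(url: str) -> str:
    """Collapse common duplicate-URL variants onto a single form."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    netloc = host + (":%d" % parts.port if parts.port else "")
    # Drop parameters that do not change the page, then sort the rest
    # so that parameter order does not create another variant.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, netloc, parts.path or "/", urlencode(query), ""))
```

With these assumed settings, `http://example.com/widgets?sid=abc123&color=red` and `http://www.example.com/widgets?color=red&print=1` both reduce to the same URL.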
  5. How does Google handle this?
      • Many systems for de-duping URLs at various stages in our crawl/index pipeline
          • General idea: cluster pages, choose the “best” representative
      • Different filters are used for different types of duplicate content
      • Goal: serve one version of the content in search results
      • Generally just a filter: it will not destroy your site
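The "cluster pages, choose the best representative" idea can be illustrated with a toy Python sketch. This is not Google's pipeline: the exact-match fingerprint and the "shortest URL wins" rule are assumptions made purely for the example, and real systems also deal with near-duplicates.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_representatives(pages: dict) -> dict:
    """Group URLs by content fingerprint and keep one URL per group.

    `pages` maps URL -> page text. The result maps each chosen URL to the
    list of duplicate URLs it stands in for.
    """
    clusters = defaultdict(list)
    for url, body in pages.items():
        clusters[fingerprint(body)].append(url)

    result = {}
    for urls in clusters.values():
        # Assumed heuristic: prefer the shortest URL, ties broken alphabetically.
        best = min(urls, key=lambda u: (len(u), u))
        result[best] = sorted(u for u in urls if u != best)
    return result
```

Only the chosen URL from each cluster would be shown; the others are filtered out rather than penalized, which matches the "just a filter" point on the slide.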
  6. What can you do about your site?
      • For exact dupes: 301
          • Tracking URLs
          • www vs non-www (also Google Webmaster Tools)
      • Near duplicates: noindex / robots.txt
          • Printable pages
          • Clones of other sites
      • Domains by country
          • Content in different languages is not duplicate content
          • Use unique content specific to the country
          • Use different TLDs (also Google Webmaster Tools) for geo-targeting
      • URL parameters
          • Put data that does not affect the substance of a page in a cookie
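As a rough illustration of the "301 for exact dupes" advice, the following Python WSGI sketch permanently redirects non-preferred hostnames and tracking-parameter variants to one URL. The preferred host and the names in `TRACKING_PARAMS` are assumptions for the example; in practice this is often done in the web server configuration instead.

```python
from urllib.parse import parse_qsl, urlencode

PREFERRED_HOST = "www.example.com"  # assumed preferred hostname
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}  # hypothetical


def redirect_target(scheme, host, path, query):
    """Return the URL to 301 to, or None if the request is already canonical."""
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k not in TRACKING_PARAMS]
    if host == PREFERRED_HOST and len(kept) == len(pairs):
        return None
    url = "%s://%s%s" % (scheme, PREFERRED_HOST, path)
    return url + ("?" + urlencode(kept) if kept else "")


def app(environ, start_response):
    """Minimal WSGI app: redirect duplicates, serve the canonical URL normally."""
    target = redirect_target(
        environ.get("wsgi.url_scheme", "http"),
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page\n"]
```

The sketch assumes `HTTP_HOST` carries no port number. For near duplicates such as printable pages, a redirect would hide the page from users, which is why the slide suggests noindex or robots.txt for those instead.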
  7. What can you do about your site? Choose www or non-www as preferred
  8. What can you do about your site?
  9. What can you do about another site?
      • Include original absolute URL in syndicated content
      • Syndicate different content
      • If you use syndicated content, manage your expectations
      • Don’t worry about scrapers or proxies too much; they generally don’t affect your rankings
          • If you are concerned, file a
              • DMCA request ( http://www.google.com/dmca.html )
              • Spam report ( https://www.google.com/webmasters/tools/spamreport )
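One possible reading of "include original absolute URL in syndicated content" is to append a link back to the original article before handing the text to partners. The Python sketch below assumes that reading; the function name, the wording of the attribution line, and the example URLs are all hypothetical.

```python
from html import escape
from urllib.parse import urljoin


def with_attribution(article_html, site_root, article_path, title):
    """Append a link to the original article, using an absolute URL."""
    # urljoin turns a site-relative path into a full URL, so the link still
    # points at the original site when the content is hosted elsewhere.
    original_url = urljoin(site_root, article_path)
    link = '<p>Originally published at <a href="{0}">{1}</a>.</p>'.format(
        escape(original_url, quote=True), escape(title)
    )
    return article_html + "\n" + link
```

A relative link such as `/articles/dupes` would resolve against the partner's domain once syndicated, which is the situation the absolute URL avoids.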
  10. Best practices for Google
      • Avoid duplicate URLs / sites
      • Generate unique, compelling content for users
      • Don’t be overly concerned with duplicate content
      • Let us know about any issues at the Webmaster Help Forum
  11. Useful links
      • Webmaster Central: http://google.com/webmasters/
          • Webmaster Central Blog: http://googlewebmastercentral.blogspot.com/
          • Webmaster Help Center: http://www.google.com/support/webmasters/
          • Webmaster Discussion Group: http://groups.google.com/group/Google_Webmaster_Help
  12. Thank You!