Published in Technology, News & Politics
  • 1. Getting Rid of Duplicate Content Issues Once and For All
      • PubCon, Las Vegas, November 13, 2008
      • Ben D’Angelo, Software Engineer
  • 2. What are “duplicate content issues”?
      • Multiple disjoint situations!
      • Duplicate content within your site or sites
        • Multiple URLs pointing to the same page, similar pages
        • Different countries (same language)
      • Duplicate content across other sites
        • Syndicated content
        • Scraped content
  • 3. Guiding principle
      • One URL for one piece of content
      • Why?
      • Users don’t like duplicates in results
      • Saves resources in our index—more room for other pages from your site!
      • Saves resources on your server
  • 4. Sources of duplicates within your sites
      • Multiple URLs pointing to the same page
        • www vs non-www
        • Session ids, URL parameters
        • Printable versions of pages
        • CNAMEs
      • Similar content on different pages
      • Manufacturer’s databases
      • Different countries
  • 5. How does Google handle this?
      • Many systems for de-duping URLs at various stages in our crawl/index pipeline
        • General idea: cluster pages, choose the “best” representative
      • Different filters are used for different types of duplicate content
      • Goal: serve one version of the content in search results
      • Generally just a filter: it will not destroy your site
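
The "cluster pages, choose the best representative" idea can be illustrated with a toy sketch. This is not Google's pipeline: the exact-match hashing, the whitespace normalization, and the "shortest URL wins" rule are all assumptions chosen only to make the idea concrete, and the example.com URLs are placeholders.

```python
import hashlib
from collections import defaultdict


def cluster_exact_duplicates(pages):
    """Group URLs whose body text is identical after simple normalization.

    `pages` maps URL -> body text. Only exact duplicates are caught here;
    near-duplicate detection would need a different technique.
    """
    clusters = defaultdict(list)
    for url, body in pages.items():
        # Collapse whitespace and case so trivial differences do not matter.
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        clusters[digest].append(url)
    return list(clusters.values())


def choose_representative(urls):
    """Hypothetical heuristic: shortest URL, ties broken alphabetically."""
    return min(urls, key=lambda u: (len(u), u))


pages = {
    "http://www.example.com/widgets": "Blue widgets for sale.",
    "http://www.example.com/widgets?sessionid=abc123": "Blue  widgets for sale.",
    "http://www.example.com/about": "About our company.",
}
representatives = sorted(
    choose_representative(cluster) for cluster in cluster_exact_duplicates(pages)
)
print(representatives)
```

Here the two widget URLs fall into one cluster and only the parameter-free URL is kept, which mirrors the goal of serving one version of the content.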
  • 6. What can you do about your site?
      • For exact dupes: 301
        • Tracking URLs
        • www vs non-www (also Google Webmaster Tools)
      • Near duplicates: noindex / robots.txt
        • Printable pages
        • Clones of other sites
      • Domains by country
        • Content in different languages is not duplicate content
        • Use unique content specific to the country
        • Use different TLDs (also Google Webmaster Tools) for geo-targeting
      • URL parameters
        • Put data which does not affect the substance of a page in a cookie
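
For the exact-duplicate cases above (tracking URLs, www vs non-www), a server can compute one preferred URL and answer every other variant with a 301. The following is a minimal sketch of that decision, assuming a hypothetical site at example.com that prefers the www host; the parameter names in `NON_CONTENT_PARAMS` are illustrative assumptions, not a list from the talk.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BARE_HOST = "example.com"           # assumption: placeholder domain
PREFERRED_HOST = "www.example.com"  # assumption: the site prefers www
# Assumption: parameters that do not change the substance of the page.
NON_CONTENT_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign", "ref"}


def canonical_url(url):
    """Return the single preferred URL for a piece of content."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST
    # Keep only the query parameters that affect what the page shows.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit((parts.scheme, host, parts.path or "/", urlencode(kept), ""))


def redirect_for(url):
    """Return (301, target) if the URL is a duplicate variant, else None."""
    target = canonical_url(url)
    return (301, target) if target != url else None


print(redirect_for("http://example.com/shoes?sessionid=42&color=red"))
print(redirect_for("http://www.example.com/shoes?color=red"))
```

Session or tracking data dropped from the URL this way would then live in a cookie, as the last bullet suggests, so that the page content stays reachable under one address.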
  • 7. What can you do about your site? Choose www or non-www as preferred
  • 8. What can you do about your site?
  • 9. What can you do about another site?
      • Include original absolute URL in syndicated content
      • Syndicate different content
      • If you use syndicated content, manage your expectations
      • Don’t worry about scrapers or proxies too much; they generally don’t affect your rankings
        • If you are concerned, file a:
          • DMCA request
          • Spam report
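
One way to follow the "include original absolute URL in syndicated content" advice is to append an attribution link when generating the syndicated copy. The sketch below assumes a hypothetical helper name, `syndication_footer`, a placeholder site at example.com, and an attribution wording of my own; the talk does not prescribe any particular markup.

```python
from html import escape
from urllib.parse import urljoin


def syndication_footer(base_url, article_path, title):
    """Build an HTML footer that links back to the original article.

    The link is made absolute so it still points at the source site
    after the content is republished on another domain.
    """
    original = urljoin(base_url, article_path)
    return '<p>Originally published at <a href="{0}">{1}</a>.</p>'.format(
        escape(original, quote=True), escape(title)
    )


footer = syndication_footer(
    "http://www.example.com/", "/articles/dupes.html", "Fixing Dupes"
)
print(footer)
```

A relative link such as `/articles/dupes.html` would resolve against the syndication partner's domain, which is why the absolute form matters here.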
  • 10. Best practices for Google
      • Avoid duplicate URLs / sites
      • Generate unique, compelling content for users
      • Don’t be overly concerned with duplicate content
      • Let us know about any issues at the Webmaster Help Forum
  • 11. Useful links
    • Webmaster Central 
      • Webmaster Central Blog
      • Webmaster Help Center
      • Webmaster Discussion Group
  • 12. Thank You!