Duplicate Content Filters, Penalties and other Content Minefields, 27th March 2012

Duplicate content presentation March 2012

Some insights into why duplicate content on your website is a problem for Google, with suggested work-arounds and solutions. Please let us know your thoughts.

  1. Duplicate Content Filters, Penalties and other Content Minefields, 27th March 2012
  2. Search Quality – the Duplicate Content Headache
     Google can't afford a SERP like this:
     1) Search engine optimization
        Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines...
     2) Search engine optimization
        Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines...
     3) Search engine optimization
        Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines...
     4) Search engine optimization
        Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines...
  3. Resource – the Duplicate Content Headache
     Duplicate content has consequences for search engines:
     • Wastes crawler resources: there is a finite number of crawlers
     • Wastes bandwidth: how often can you crawl 1 trillion documents and keep your index fresh?
     • Increases query CPU time: how do you search 1 trillion documents as quickly as possible?
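To make the resource argument concrete: a crawler can avoid storing and searching the same text twice by fingerprinting each page and keeping one copy per fingerprint. The sketch below is an illustration under stated assumptions, not a description of how Google works: `TinyIndex` and `fingerprint` are hypothetical names, and "the same text" is assumed to mean identical after collapsing whitespace and lower-casing.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the text after collapsing whitespace and lower-casing."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class TinyIndex:
    """Hypothetical index that keeps one copy of each distinct document body."""

    def __init__(self) -> None:
        self.first_url_for: dict[str, str] = {}  # fingerprint -> first URL stored
        self.duplicate_of: dict[str, str] = {}   # later URL -> URL it duplicates

    def add(self, url: str, text: str) -> bool:
        """Store the document and return True, or record it as a duplicate and return False."""
        fp = fingerprint(text)
        if fp in self.first_url_for:
            self.duplicate_of[url] = self.first_url_for[fp]
            return False
        self.first_url_for[fp] = url
        return True


if __name__ == "__main__":
    index = TinyIndex()
    snippet = "Search engine optimization (SEO) is the process of improving the visibility of a website"
    print(index.add("https://a.example/seo", snippet))          # True: first copy is stored
    print(index.add("https://b.example/seo", snippet.upper()))  # False: same text after normalisation
    print(index.duplicate_of)
```

A cryptographic hash only catches exact copies (after normalisation); near-duplicates need a similarity measure instead.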
  4. Document Importance – the Duplicate Content Headache
     Duplicate content can be a signal of an important document:
     • Song lyrics
     • Scholarly texts and historical documents, e.g. the Bible (1,000 pages)
     • The Linux manual (2,000 pages)
     • Breaking news: Associated Press, Reuters etc.
  5. Types of Duplicate Content
     Duplicate content comes in many forms:
     • Intentional vs non-intentional
     • On-site vs off-site
  6. On-Site Duplicate Content (Impacts Quality Score)
     Intentional:
     • Printer-friendly pages
     • Different font sizes
     • PDF documents
     • Archive (non-graphics versions)
     • Shopping filters (sort by and pagination)
     • RSS feeds
     Non-intentional:
     • Affiliate URLs, e.g. www.example.com/?btag=123
     • AdWords campaigns, e.g. www.example.com/?utc=google
     • Search results
     • www vs non-www URLs
     • https vs http
     • Stubs/plugins
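Most of the non-intentional cases above are one page reachable under several URLs. A minimal sketch, assuming a site whose preferred form is `https://www.example.com` and a hypothetical list of tracking parameters, shows how those variants can be collapsed to a single URL, for example when auditing logs or choosing the target of a canonical tag or redirect:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters that only track the visitor and never change the content.
# "btag" and "utc" come from the slide's examples, "affid" from a later slide.
TRACKING_PARAMS = {"btag", "utc", "affid"}
PREFERRED_HOST = "www.example.com"  # hypothetical preferred hostname


def normalise_url(url: str) -> str:
    """Map duplicate URL variants (http/https, www/non-www, tracking parameters) to one form."""
    parts = urlsplit(url)
    # .hostname is already lower-cased and drops any :port.
    host = (parts.hostname or "").lower()
    # Treat the bare domain as an alias of the preferred www host.
    if host == PREFERRED_HOST.removeprefix("www."):
        host = PREFERRED_HOST
    # Keep every query parameter except the tracking ones, preserving their order.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key.lower() not in TRACKING_PARAMS]
    # Assume the whole site is served over https; drop any #fragment.
    return urlunsplit(("https", host, parts.path or "/", urlencode(query), ""))


if __name__ == "__main__":
    for variant in ("http://example.com/?btag=123",
                    "https://www.example.com/?utc=google",
                    "http://www.example.com/"):
        print(normalise_url(variant))  # each prints https://www.example.com/
```

Parameters that do change the content (such as `page`) are deliberately kept, so paginated URLs stay distinct.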
  7. On-Site Duplicate Content (Impacts Quality Score)
     A worst-case example with 10,000s of stub pages. This was 2 weeks after Andy had removed the duplicate links from the search pages on our advice, e.g.:
     http://www.motors.co.uk/Ford-Escort-0-9999999---2
     http://www.motors.co.uk/Ford-Escort-0-9999999--U-2-
     http://www.motors.co.uk/Ford-Escort-0-9999999---2%20-
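Stub pages like these usually serve the same, or nearly the same, body under many URLs. One standard way to find them is to compare word shingles (overlapping k-word sequences) using Jaccard similarity. In this sketch the page bodies and the last URL are invented for illustration, k = 3 and the 0.9 threshold are assumptions, and the all-pairs loop is only practical for small crawls; large-scale systems typically approximate the same measure with MinHash.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = text.lower().split()
    if len(words) <= k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicate_pairs(pages: dict[str, str], threshold: float = 0.9):
    """Return (url_a, url_b, similarity) for every pair of pages at or above the threshold."""
    sets = {url: shingles(body) for url, body in pages.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical page bodies: the three stub URLs serve the same "no results" page.
    stub_body = "Used Ford Escort for sale. Sorry, no cars match your search."
    pages = {
        "http://www.motors.co.uk/Ford-Escort-0-9999999---2": stub_body,
        "http://www.motors.co.uk/Ford-Escort-0-9999999--U-2-": stub_body,
        "http://www.motors.co.uk/Ford-Escort-0-9999999---2%20-": stub_body,
        "http://www.motors.co.uk/hypothetical-listing":
            "A 1996 Ford Escort estate with full service history and one owner.",
    }
    for a, b, score in near_duplicate_pairs(pages):
        print(f"{score:.2f}  {a}  {b}")
```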
  8. Off-Site Duplicate Content (Filters and Penalties)
     Intentional vs non-intentional is somewhat grey:
     • Domain branding, e.g. .com, .co.za
     • (Mobile website)
     • Content syndication
     • Content theft
     • Staging websites: a common problem!
     Quality signals are often used to filter off-site duplicates!
  9. How Does Google Filter Off-Site Duplicate Content?
     Authors feel they have a right to rank for their own content – Google's loyalty is to its users!
     Google doesn't necessarily reward the source or original, but assesses:
     • Relevance (e.g. is an article in context?)
     • Domain authority & links (e.g. Google Knol, Facebook)
     • Fresh content boost
     • Site quality signals (e.g. internal duplicate content!)
  10. Examples of Off-Site Duplicate Content and Quality
      • Client with a .com.au and a .com with https duplicates
      • Casino client with a lot of stub pages (pre-Panda)
      • Casino site: severe health issues
  11. How to Diagnose (On-Site) Duplicate Content
      Link building will exacerbate duplicate content indexing.
      • Keep an eye on indexed pages (weekly) and look for spikes in Google indexing (and Yahoo and Bing)
      • Look for duplicates using site:example.com
      • Use Xenu link checker
      • Heed any Webmaster Tools warnings
      • Check your crawling and cache dates: frequent updates but stale cache dates = dupe content issues
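The weekly check for spikes in indexed pages is easy to automate once the counts are recorded somewhere. A sketch, assuming you log one number per week (for example from a site: query, whose counts are only rough estimates) and treating a 50% week-on-week jump as a spike; `find_spikes` is a hypothetical helper and the `ratio` threshold is an arbitrary assumption to tune:

```python
def find_spikes(weekly_counts: list[int], ratio: float = 1.5) -> list[int]:
    """Indices of weeks whose indexed-page count is at least `ratio` times the previous week's."""
    spikes = []
    for week in range(1, len(weekly_counts)):
        previous = weekly_counts[week - 1]
        # Skip weeks following a zero count, where a ratio is meaningless.
        if previous > 0 and weekly_counts[week] >= ratio * previous:
            spikes.append(week)
    return spikes


if __name__ == "__main__":
    # Hypothetical weekly figures noted down from a site:example.com query.
    counts = [1200, 1250, 1230, 4800, 5100]
    print(find_spikes(counts))  # [3]: the jump from 1,230 to 4,800 pages
```

A spike is a prompt to look for new parameterised or stub URLs, not proof of a problem on its own.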
  12. How to Address On-Site and Off-Site Duplicate Content
      You have a whole armoury of potential tools, including:
      • Robots.txt exclusion
      • Robots meta tag
      • Canonical tag
      • Webmaster Tools URL exclusion
      • Password protection
      • (301 redirects)
      • (File a DMCA against serial content thieves?)
      A lot of well-meaning people give bad advice, though.
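Of the tools listed, the 301 redirect is the one that removes a duplicate URL outright. A minimal sketch as Python WSGI middleware, assuming the hypothetical preferred host `www.example.com` served over https (on Apache the same rule would normally live in .htaccess instead); `redirect_to_canonical` and `hello_app` are hypothetical names:

```python
CANONICAL_HOST = "www.example.com"  # hypothetical preferred host


def redirect_to_canonical(app):
    """WSGI middleware: 301-redirect any http:// or non-www request to https://www.example.com."""
    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        scheme = environ.get("wsgi.url_scheme", "http")
        if host == CANONICAL_HOST and scheme == "https":
            return app(environ, start_response)  # already canonical: serve normally
        # Rebuild the same path and query string on the canonical origin.
        path = environ.get("PATH_INFO", "") or "/"
        query = environ.get("QUERY_STRING", "")
        location = f"https://{CANONICAL_HOST}{path}" + (f"?{query}" if query else "")
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello"]


application = redirect_to_canonical(hello_app)
```

For a quick local test, `wsgiref.simple_server.make_server` from the standard library can serve `application`. If TLS is terminated at a load balancer, `wsgi.url_scheme` may report `http` for every request, so the scheme check would need to read the proxy's forwarded header instead, or the middleware will redirect in a loop.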
  13. Google Engineers Can't Agree
  14. Adam Lasnik – "Deftly Dealing with Duplicate Content" (2006)
      Probably the authoritative guide to duplicate content:
      • What is duplicate content?
      • What isn't duplicate content?
      • Why does Google care about duplicate content?
      • What does Google do about it?
      • How can webmasters proactively address duplicate content issues?
  15. Deftly Dealing with... – Our advice/experience
      Robots.txt
      • Routinely ignored by Google, probably because of malware, e.g.:
            User-agent: *
            Allow: /the-good-stuff/
            Disallow: /the-malware/
      • Robots.txt is ignored unless combined with an emergency Webmaster Tools URL removal (3 months)
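For reference, this is how a standards-following parser reads the robots.txt rules on the slide, using Python's standard `urllib.robotparser`. Robots.txt only governs crawling: a disallowed URL can still show up in results if other pages link to it, which may be part of why it appears to be ignored. Python's parser applies the first matching rule in file order, whereas Google documents longest-match precedence; the two agree here because the paths don't overlap. `may_crawl` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the slide.
ROBOTS_TXT = """\
User-agent: *
Allow: /the-good-stuff/
Disallow: /the-malware/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, user_agent: str = "Googlebot") -> bool:
    """True if the parsed rules let `user_agent` fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/the-good-stuff/page.html", "/the-malware/setup.exe", "/anything-else/"):
        print(path, may_crawl("https://www.example.com" + path))
```

Paths matched by no rule, such as `/anything-else/`, are allowed by default.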
  16. Our advice/experience
      Canonical tag
      • Works great for cross-domain duplicate content
      • Largely ineffective for pagination, e.g. shopping sites
      • Totally ineffective unless the pages sharing a canonical URL are VERY similar, if not identical
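When auditing canonical tags, it helps to pull the declared canonical out of each duplicate page and confirm they all point at the same URL. A sketch using the standard-library `html.parser`; `CanonicalFinder` and `find_canonical` are hypothetical names, and only the first `<link rel="canonical">` on a page is reported:

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attribute values may be None.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        if "canonical" in (attributes.get("rel") or "").lower().split():
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the page's declared canonical URL, or None if it has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


if __name__ == "__main__":
    page = '<html><head><link rel="canonical" href="https://www.example.com/cars" /></head></html>'
    print(find_canonical(page))
```

Self-closing `<link ... />` tags are handled too, because `HTMLParser` routes them through `handle_starttag` by default.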
  17. Our advice/experience
      Robots meta tag
      • Noindex,follow: 100% obeyed by Google, and passes PageRank too
      • Very effective for pagination, e.g. shopping sites
      • Works well for tracking links too (www.example.com/?affid=123456)
      • Doesn't work when combined with a robots.txt block
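A sketch of how a shop might decide which URLs get `noindex,follow`, assuming hypothetical rules: page 2 onwards of a paginated listing, plus any URL carrying an `affid` tracking parameter or a `sort` filter. The tag only has an effect if the crawler is allowed to fetch the page; a URL blocked in robots.txt is never downloaded, so its meta tag is never read, which is consistent with the slide's last point.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameters that mark a URL as a tracking or re-sorted copy of another page.
DUPLICATE_PARAMS = {"affid", "sort"}


def robots_meta_for(url: str) -> str:
    """Return the robots meta tag for a URL: noindex,follow for page 2+ and tracking/sort URLs."""
    query = parse_qs(urlsplit(url).query)  # blank values are dropped by default
    page = query.get("page", ["1"])[0]
    is_later_page = page.isdigit() and int(page) > 1
    if is_later_page or any(param in query for param in DUPLICATE_PARAMS):
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'


if __name__ == "__main__":
    for url in ("https://www.example.com/cars",
                "https://www.example.com/cars?page=3",
                "https://www.example.com/?affid=123456"):
        print(url, robots_meta_for(url))
```

The `follow` half keeps links on the suppressed pages crawlable, so products reachable only through page 3 of a listing can still be discovered.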
  18. Our advice/experience
      Password protection / .htaccess 403 Forbidden
      • Works great for staging sites
      • Stubs: the problem is that it generates Webmaster Tools errors
      • Our feeling: best avoided on your main domain
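For staging sites, the 403 approach can also be applied in the application itself. A minimal sketch as WSGI middleware, assuming a hypothetical staging hostname and an allow-listed office IP (203.0.113.10 is from a documentation-only range); `forbid_staging` and `site_app` are hypothetical names, and in practice .htaccess or HTTP authentication in the web server is the more common place for this:

```python
STAGING_HOSTS = {"staging.example.com"}  # hypothetical staging hostname
ALLOWED_IPS = {"203.0.113.10"}           # hypothetical office IP (documentation range)


def forbid_staging(app):
    """WSGI middleware: answer 403 Forbidden on staging hosts unless the client IP is allow-listed."""
    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host in STAGING_HOSTS and environ.get("REMOTE_ADDR") not in ALLOWED_IPS:
            body = b"Forbidden"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)  # production host or allowed visitor
    return middleware


def site_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"staging build"]


application = forbid_staging(site_app)
```

Because crawlers never receive the staging pages' content, there is nothing for them to index as an off-site duplicate of the live site.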
  19. Extreme Techniques to Avoid Dupe Content
      • Make all your backend .exe with .htaccess
  20. Summary
      • Duplicate content is a minefield!
      • Filters usually apply; penalties are very rare
      • You have the answer in your own hands
      • Stay on top of your site's health, especially internal duplicate content
  21. Thank you for your attention!
      Thanks to: Anton Groeneveldt, Carla dos Santos
