
Duplicate Content Issues


Duplicate Content Issues, diagnosis & causes by Kristjan Mar Hauksson Nordic eMarketing

  • DC issues - Diagnosis & causes
  • For example, a blog entry may appear on the home page of a blog, in an archive page, and in a page of other entries with the same label.
  • Not forgetting print-friendly pages, tracking and sorting URL parameters, www and non-www, etc.
  • Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools.
  • "Boiler plate" originally referred to the maker's label used to identify the builder of steam boilers. In the field of printing, the term dates back to the early 1900s. From the 1890s onwards, printing plates of text for widespread reproduction such as advertisements or syndicated columns were cast or stamped in steel (instead of the much softer and less durable lead alloys used otherwise) ready for the printing press and distributed to newspapers around the United States. They came to be known as 'boilerplates'. Until the 1950s, thousands of newspapers received and used this kind of boilerplate from the nation's largest supplier, the Western Newspaper Union. Some companies also sent out press releases as boilerplate so that they had to be printed as written. The modern equivalent is the press release boilerplate, or "boiler," a paragraph or two that describes the company and its products.
  • Interesting reads:
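
As a rough illustration of the duplicate sources named in the notes above (print-friendly pages, tracking and sorting parameters, www versus non-www), the following Python sketch collapses URL variants to one preferred form. That preferred form is the kind of URL you might then declare with rel="canonical" or use as a 301 redirect target. The host www.example.com, the parameter names in IGNORED_PARAMS, and the default-page names are assumptions for illustration, not taken from the presentation; adapt them to what your own CMS actually emits.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of parameters that do not change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                  "sessionid", "sort", "print"}
# Assumed default-page names that usually mirror the directory URL.
DEFAULT_PAGES = ("index.html", "index.php", "default.aspx")


def canonical_url(url, preferred_host="www.example.com"):
    """Map a URL variant to one preferred form (hypothetical rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    # Fold the www and non-www variants onto the preferred host.
    if host in (bare, "www." + bare):
        host = preferred_host
    # Treat /index.html and friends as the directory itself.
    path = parts.path or "/"
    for page in DEFAULT_PAGES:
        if path.endswith("/" + page):
            path = path[: -len(page)]
            break
    # Drop non-content parameters and sort the rest for a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # The fragment is discarded because it is not sent to the server.
    return urlunsplit((parts.scheme, host, path, urlencode(query), ""))


def duplicate_groups(urls):
    """Return {canonical URL: [variants]} for URLs that collapse together."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return {canon: found for canon, found in groups.items() if len(found) > 1}
```

Feeding the URL list from a crawl into duplicate_groups gives a quick view of which addresses are likely to be serving the same page. Whether two variants really return identical content still has to be checked by hand or by comparing the page bodies.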
  • Transcript

    • 1. Double trouble: DC issues - Diagnosis & causes. Kristjan Mar Hauksson, Nordic eMarketing, Director Internet Marketing, @optimizeyourweb. London | 18–21 February
    • 2. London | 18–21 February 2013 | #SESLON. “They ALL have some degree of duplicate content problems. Every single site I have ever analyzed does!” Mikkel DeMib. @optimizeyourweb
    • 3. “Duplicate content is in most cases due to the way CMSs are set up ... or we might have a team of lazy content writers on our hands.”
    • 4. “Understand your content management system: Make sure you're familiar with how content is displayed on your website. Blogs, forums, and related systems often show the same content in multiple formats.”
    • 5. Diagnosis & Causes
    • 6. A couple of easy-to-use “tools”
      • Xenu
      • Zoom Search Engine
      • Google (Search, Webmaster Tools, etc.)
      • Manual testing
      • Screaming Frog
      More on:
    • 7. Using the site: command
      • This should show you how Google crawls your site and what it finds
      • Does this site have 46,800 products and categories?
    • 8. Another simple way to identify DC is to search
      • Look at the content you have on your site, take something like a news headline and Google it
      • This will in most cases show you how Google is crawling your site and what it finds
    • 9. Sample content leak: .dk, .se, .no, .fr
    • 12. Using Xenu
      • If the site allows crawling, you can crawl it with Xenu and then look at the information that comes out of it
      • Sort the output and behold ...
    • 13. Using Copyscape
      • Copyscape was originally created to find “stolen” copy but works great when it comes to DC
    • 14. Content ownership
      • Websites are often developed on a DEV URL that is in many cases left open, although it is only meant for collaboration between developers and site owners. Then somebody uses Google Mail to share it, or it is sniffed out by a subdomain finder. After that, content ownership can be an issue ... for a long time.
    • 15. Image Plagiarism
      • A search for ‘ritzy bryan’ gives 895,000 results
      • When you click Images, 5 of the top 9 are the photographer's
      • But the top two are not on his website
      • Click on the image
      • Click ‘Image details’ and you get lots of similar images
      • Scroll down and you get lots of plagiarizing websites
    • 16. Diagnosis & Causes
    • 17. Frequent causes when starting a new site
      • First, make sure that your dev server is under lock and key. Close it when you are done
      • If you are using something like a news or a product module over multiple sites, make sure that the ownership is clear
      • Not all duplicate content originates on your own site: scrapers can give you hell!
      • Report plagiarism to Google as soon as you find it. Take ownership.
    • 18. 301, 404: default or not default and ...
      • 404s that are not 404s: on large e-commerce sites, for example, things can go a bit crazy if error pages are not set up properly
      • WWW, non-WWW & default pages
      • Query strings and session IDs
      • Template content
      • Boilerplate repetition, publishing stubs & similar content
      • User-generated duplicate (replica) content
    • 19. The mother of all checklists ;-)
      • Take everything that is a likely cause, create a checklist, and go through the items one by one to make sure they are in order
      • This is all common-sense stuff and there is so much information online. You should not have to make the same mistakes as those before you.
      • Know your CMS before you start implementing it!
    • 20. Thank you
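
The diagnosis slides above rely on tools such as Xenu and Copyscape to spot pages that repeat the same copy. As a minimal sketch of the underlying idea, and not a description of how either tool works internally, the Python below compares page texts by overlapping word sequences ("shingles") and reports pairs whose Jaccard similarity passes a threshold. The shingle size of 3, the 0.8 threshold, and the sample paths are assumptions chosen for illustration.

```python
import re
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two pages have in common, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.8):
    """pages maps URL -> extracted text; return (url_a, url_b, score) tuples."""
    sets = {url: shingles(text) for url, text in pages.items()}
    found = []
    # Compare every pair once; fine for small sites, slow for very large ones.
    for (url_a, set_a), (url_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            found.append((url_a, url_b, score))
    return found


# Hypothetical sample: the same blog entry reachable under two paths.
pages = {
    "/news/1": "Duplicate content is often caused by the CMS setup",
    "/archive/1": "Duplicate content is often caused by the CMS setup",
    "/about": "We are a marketing agency based in the Nordics",
}
report = near_duplicates(pages)
```

Pairs in the report are candidates for the checklist: each one either needs a canonical URL, a redirect, or rewritten copy. Shared template and boilerplate text will raise the scores, so it is worth stripping navigation and footers from the page text before comparing.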