Duplicate Content SES NY 2009

Sasi Parthasarathy, Program Manager for Live Search at Microsoft talks about duplicate content & multiple site issues at SES NY.


Published in: Technology, Design


Transcript

  • 1. Duplicate Content & Multiple Site Issues. Sasi Parthasarathy, Program Manager, Microsoft
  • 2. Topics covered
      • Duplicate content
          – Internal content -> URL canonicalization
          – External content -> spam, geo-targeting
      • Content syndication
      • Good practices
      • Examples, examples, examples
  • 3. URL canonicalization
      • Less is more: expose only one URL per piece of content, pretty please
      • The practice of consolidating all versions of a page under one URL is referred to as "canonicalization"
      • Helps the search engine and, at the same time, does not split your rank juice
      • Having too many duplicate URLs will waste crawl time: the crawler might spend time indexing duplicate URLs and miss good content
      • 4 ways to get to microsoft.com, but we need only one
          1. microsoft.com
          2. www.microsoft.com
          3. www.microsoft.com/en/us/default.aspx
          4. www.microsoft.com/en/us/
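As a rough illustration of the slide above, the following Python sketch collapses the four microsoft.com variants onto one URL. The choice of `www.microsoft.com/en/us/` as the canonical form, the `http` scheme, and the list of default filenames are assumptions made for the example; the talk only says that one form should be chosen.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: which host and path count as canonical
# is a decision for the site owner, not something stated in the talk.
PREFERRED_HOST = "www.microsoft.com"
HOST_ALIASES = {"microsoft.com", "www.microsoft.com"}
DEFAULT_FILENAMES = {"default.aspx", "index.html", "index.htm"}
DEFAULT_PATH = "/en/us/"  # assumed landing path served for the bare host


def canonicalize(url: str) -> str:
    """Map known variants of a URL onto one canonical form."""
    if "://" not in url:
        url = "http://" + url  # the slide lists URLs without a scheme
    scheme, host, path, query, _fragment = urlsplit(url)
    host = host.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    # Trim a trailing default filename, keeping the directory slash.
    head, _, tail = path.rpartition("/")
    if tail.lower() in DEFAULT_FILENAMES:
        path = head + "/"
    # Treat the bare host as equivalent to the assumed landing path.
    if path in ("", "/"):
        path = DEFAULT_PATH
    return urlunsplit((scheme, host, path, query, ""))


VARIANTS = [
    "microsoft.com",
    "www.microsoft.com",
    "www.microsoft.com/en/us/default.aspx",
    "www.microsoft.com/en/us/",
]
CANONICAL = {canonicalize(u) for u in VARIANTS}
```

With these assumptions, `CANONICAL` ends up holding a single URL, which is the outcome the slide asks for. On a live site the same mapping would normally be enforced with server-side redirects rather than computed after the fact.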
  • 4. A few recommendations for canonicalization
      • Select WWW or non-WWW, then redirect the other option to your preferred version
      • Remove the default filename from the end of your URLs
          – Web servers generally let you select one or more default filenames to serve when the browser requests a directory. Check whether the default filename appears at the end of the URL and, if so, trim it off
      • Link internally to the canonical form of your URL
          – Make sure you always link to the proper canonical form of your URLs from within your site
      • Remove query string variables or rewrite to readable URLs
          – http://www.mysite.com/downloads/details.aspx?FamilyID=ab99&displaylang=en to http://www.mysite.com/downloads/en/family/ab99
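The query-string rewrite in the last recommendation can be sketched in Python as follows. The function name `readable_download_url` is hypothetical, and the sketch only handles the one URL pattern shown on the slide; a real site would more likely do this in its web server's rewrite rules.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def readable_download_url(url: str) -> str:
    """Rewrite /downloads/details.aspx?FamilyID=X&displaylang=Y
    to /downloads/Y/family/X. Any other URL is returned unchanged."""
    parts = urlsplit(url)
    if parts.path.lower() != "/downloads/details.aspx":
        return url
    params = parse_qs(parts.query)
    family = params.get("FamilyID")
    lang = params.get("displaylang")
    if not family or not lang:
        return url  # required parameters missing, leave the URL alone
    path = f"/downloads/{lang[0]}/family/{family[0]}"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


OLD = "http://www.mysite.com/downloads/details.aspx?FamilyID=ab99&displaylang=en"
NEW = readable_download_url(OLD)
```

Here `NEW` is `http://www.mysite.com/downloads/en/family/ab99`, matching the slide's example. The old URL should still redirect to the new one so that existing links keep working.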
  • 5. Why duplicate content?
      • Your intention is the key
      • If your intent is to manipulate the search engine, you will be penalized
          – Example 1: Multiple domains with very little or no difference in content and no clear reason why these domains exist
          – Example 2: Trying to falsely promote original content as your own (please report any issues with copied content to Live Search support)
  • 6. Going international: help search engines
      You may have similar pages for various regions. Problems for search engines with geo-targeting:
      • No standardized way to tell a search engine which region or language your content is targeted for
      • Top-level domains may not indicate the intended audience. For example, http://ma.tt/ is an English site, and Orange.com is a French telecom site hosted in France.
      • Use of search-unfriendly redirection techniques
  • 7. A few indicators: help Live Search while geo-targeting
      • Country code top-level domain (ccTLD). For example, .ca specifically targets users in Canada
      • Set all your domains in Live Search webmaster tools and make the region explicit for each
      These indicators will help us show the correct page for the correct market
  • 8. Content syndication
      • Syndicate with caution (for sites that syndicate their content on other sites)
      • From our perspective, we always want to show the version we think is appropriate to the user. This may not be the version you want or prefer.
      • Tip: Ask your partner to use robots.txt to stop us from indexing the syndicated material
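To make the robots.txt tip concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/syndicated/` directory, the partner host name, and the crawler name passed to `can_fetch` are assumptions for illustration; the talk does not say how a partner lays out syndicated material.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a syndication partner might publish, assuming
# all syndicated copies live under /syndicated/ on the partner's site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /syndicated/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, agent: str = "msnbot") -> bool:
    """Return True if the parsed rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


SYNDICATED_OK = may_crawl("http://partner.example/syndicated/article-1.html")
ORIGINAL_OK = may_crawl("http://partner.example/news/article-2.html")
```

Under these rules the syndicated copy is blocked for crawlers while the partner's own pages remain fetchable, which leaves your original as the version available to be shown.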
  • 9. General tips to help the search engine
      • Dynamic URLs: if the content is not changing, don't have too many parameters
      • A 301 redirect is your best friend: use one whenever you can
      • No 302 hijack!!
      • When you do a site update, don't leave links to expired pages
      • Use robots.txt for anything you don't want crawlers to crawl
      • Consistent naming convention: easy for search engines to understand
      • Follow standard URL formation practices
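The 301 advice above can be sketched as a small WSGI application in Python. The `MOVED` table and its paths are hypothetical; the point is only that a permanently moved page answers with status 301 and a `Location` header rather than a temporary 302.

```python
from http import HTTPStatus

# Hypothetical map of retired paths to their permanent replacements.
MOVED = {
    "/old-page.html": "/new-page",
    "/downloads/details.aspx": "/downloads/",
}


def app(environ, start_response):
    """Minimal WSGI app: permanent redirects for moved paths, 200 otherwise."""
    path = environ.get("PATH_INFO", "/")
    target = MOVED.get(path)
    if target is not None:
        status = HTTPStatus.MOVED_PERMANENTLY  # 301, not a temporary 302
        start_response(f"{status.value} {status.phrase}", [("Location", target)])
        return [b""]
    body = b"ok\n"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]
```

The app can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library to inspect the responses.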