Dealing with Duplicate Content
Ehren Reilly
May 8, 2012
Why focus on Answers?
Agenda
• What is duplicate content? Why is it bad?
• Examples of duplicate content on our site and other notable sites
• Techniques
– robots.txt disallow
– Meta robots tag
– 301 redirect dupe URL to primary URL
– Canonical URL tag
– Prevent duplicate page from being created in the first place
• Related topics
– rel=“alternate” for language/regional support
– Expired and no-longer-relevant content, and strategic content deletion
• Suggested reading
What is duplicate content?
• The same content often appears at more than one URL.
• Intentional duplication
– Quotation
– Re-use
• E.g., Ask.com Wikipedia
– Content syndication
• Inadvertent duplication
– Separate mobile-optimized or printer-optimized versions of a page
– Separate regional design or branding
• E.g., uk.ask.com/wiki/Rihanna vs www.ask.com/wiki/Rihanna
– Dynamic content where different queries return the same results
• E.g., ask.com/questions-about/iPhones vs ask.com/questions-about/iphone vs ask.com/questions-about/iPhone
– Extra junk in the URL that does not substantively change the content
• E.g., www.ask.com/wiki/Symbolics?qsrc=3044 vs www.ask.com/wiki/Symbolics
– Pagination and filtering
• E.g.,
• Manipulative duplication
– Scraper sites
– Blatant copyright infringement / plagiarism
– SEO spam
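Several of the inadvertent cases above (case variants and tracking parameters such as `qsrc`) can be collapsed by normalizing URLs before comparing them. The following is a minimal Python sketch, not Ask.com's actual routing: it assumes `qsrc` is the only parameter that never changes content and that paths on this site are served case-insensitively. Singular/plural variants like iPhones vs iphone are not URL-level duplicates and need a content-level check instead.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: query parameters that never change page content (hypothetical list).
IGNORED_PARAMS = {"qsrc"}

def normalize_url(url: str) -> str:
    """Collapse trivially different URLs into one comparison key."""
    parts = urlsplit(url)
    # Scheme and host names are case-insensitive per the URL spec.
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Assumption: this site serves paths case-insensitively, so fold case.
    path = parts.path.lower()
    # Drop content-neutral parameters; sort the rest so order does not matter.
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS
    )
    return urlunsplit((scheme, host, path, urlencode(kept), ""))
```

Two URLs that produce the same key are candidates for a 301 or a canonical tag; the key itself is only for comparison, not for display.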
Why is duplicate content a problem?
• Fragmentation of link equity, authority & anchor text
– If there are 100 links to “iphone” and 50 links to “iPhones”:
• Do I treat this as a single page with 150 links?
• Do I treat both pages as separate and important? If so, “iphone” has 100 links’ worth of importance, and “iPhones” has 50 links’ worth.
• Lower confidence in a single, definitive source
– If there are many versions, which version is the definitive one?
– Which URL has the most relevant/reliable copy for a given search query?
– “I know ask.com is a good source for X, but I can’t figure out which of these URLs is ask.com’s definitive page on X.”
• Penalties for manipulative and non-user-friendly duplication
– Posting the exact same content on multiple different sites
– Panda penalty for “thin content”
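The 100 + 50 arithmetic above can be made concrete. Counted per URL, the links split across two pages; grouped under one canonical key, a single page gets all 150. This Python sketch uses the slide's hypothetical link counts and a hypothetical variant-to-canonical table (choosing “iPhone” as the canonical form is an assumption); real ranking is of course not a simple link count.

```python
from collections import Counter

# Hypothetical inbound-link counts per URL path (numbers from the slide).
inbound_links = {
    "/questions-about/iphone": 100,
    "/questions-about/iPhones": 50,
}

# Hypothetical table: case-folded variant -> the one canonical path.
CANONICAL = {
    "/questions-about/iphone": "/questions-about/iPhone",
    "/questions-about/iphones": "/questions-about/iPhone",
}

def consolidate(links: dict[str, int]) -> Counter:
    """Sum link counts under each variant's canonical path."""
    totals = Counter()
    for path, count in links.items():
        # Case-fold before lookup; unknown paths stay as their own key.
        key = CANONICAL.get(path.lower(), path)
        totals[key] += count
    return totals

print(dict(consolidate(inbound_links)))  # {'/questions-about/iPhone': 150}
```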
Examples from Ask.com
• Case-insensitive URLs
– /q/ vs /Q/
• www.ask.com/q/What-Causes-Sepsis
• www.ask.com/Q/What-Causes-Sepsis
– Questions About page paths
• ask.com/questions-about/t-rex
• ask.com/questions-about/T-Rex
• Duplicate questions in Ask.com Community (e.g.)
• US vs UK Ask.com wiki
– uk.ask.com/wiki/Rihanna vs www.ask.com/wiki/Rihanna
• Accidentally indexable weird subdomains
– replyask.lc.iad.www.ask.com
Examples from Other Sites
Same people’s bios used verbatim for two different brands’ websites.
http://www.google.com/search?q=pangea+media+snapapp+management+team
http://www.google.com/search?q=snapapp+management+team
Google will never show you both versions for a single search query.
Examples from Other Sites
• Facebook has massive duplicate content issues, and as a result its deep pages do not rank well in Google search results.
– Five different versions of NYC Ballet’s “Videos” page.
– None of them is on the first page in Google for “New York City Ballet Videos.”
• Instead, the main facebook.com/nycballet page is #8 in Google.
• In this heavily re-blogged post from Google’s blog:
– Many people quoted this passage.
– The original source shows up first in the SERP.
Techniques for Managing Duplicate Content
• There are many ways duplicate content can arise, and many techniques to manage it.
• Different techniques are better suited to different situations.
• Things to consider about each method:
– Prevents penalties?
– Allows for alternate styling?
– Speed/effectiveness?
– Propagates link equity to all outbound links?
– Consolidates link equity from all inbound links?
Robots.txt “Disallow”
• What it is: A file on the site that tells bots how to crawl various sections of your site. Each subdomain has its own.
• Message to bots: “Don’t crawl this content, don’t put it in your index, and disregard any links that point here. Go away.”
• What it’s good for:
– Sections of the site that have no SEO value.
– Secret stuff that you don’t want crawled (though robots.txt itself is publicly readable, so it is not access control).
• What it’s bad for:
– An inelegant, brute-force way of dealing with duplicate content.
– Disallow only blocks crawling: a blocked URL can still appear in the index (without its content) if other pages link to it.
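To check what a Disallow rule actually blocks, Python's standard `urllib.robotparser` can evaluate a robots.txt file against URLs. The file contents below are hypothetical: the `/print/` and `/mobile/` paths are illustrative stand-ins for printer- and mobile-optimized duplicates, not Ask.com's real rules. Note that this only answers “may this bot crawl the URL?”, which is not the same question as “will the URL be kept out of the index?”.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for one subdomain; the paths are illustrative.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /mobile/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Printer-friendly duplicate: crawling is disallowed.
print(parser.can_fetch("Googlebot", "http://www.ask.com/print/wiki/Rihanna"))  # False
# Primary page: crawling is allowed.
print(parser.can_fetch("Googlebot", "http://www.ask.com/wiki/Rihanna"))  # True
```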
Meta Robots
• What it is: A meta tag on an individual page; a more targeted version of robots.txt.
• Message to bots: Has two separate parameters.
– index/noindex: Should this page be included in the bot’s index? (The bot must be allowed to crawl the page in order to see this tag.)
– follow/nofollow: Should links out from this page be allowed to propagate link equity?
☞ Usually, if you’re trying to block a page from the index but it has links to other indexed pages, you want <meta name="robots" content="noindex,follow">
• What it’s good for:
– More targeted version of robots.txt
– Allows you to block a page from the index but still propagate link equity.
– Great for deep pages of paginated/listed content.
• What it’s bad for:
– Alternate versions of content that users might actually want to find from search.
• Suggested use: eHow content pages that they won’t let us use for SEO.
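When auditing pages, it helps to confirm which ones actually carry `noindex,follow`. Below is a minimal sketch using Python's standard `html.parser`; the helper names are hypothetical, and it only reads the generic `name="robots"` tag.

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def robots_directives(html: str) -> set[str]:
    """Return the set of meta robots directives found in a page."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives

page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
d = robots_directives(page)
print("noindex" in d, "follow" in d)  # True True
```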
301 Redirect
• What it is: Permanent redirection of a duplicate/old URL to the primary/new URL.
• Message to bots: “Don’t go to that old URL, go to this new one. Remove the old one from the index, and forward all link equity to the new one.”
• What it’s good for:
– Consolidating content that exists in unnecessary variations.
– Preserving the value of links, no matter which URL they link to.
• What it’s bad for:
– Not possible to maintain alternate versions of content, since both users and bots are redirected to a different URL.
• Suggested use: Content deleted from Community because it is redundant/duplicate.
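A 301 is just a status code plus a `Location` header. Here is a minimal WSGI sketch of the two cases named above: a Community question deleted as a duplicate, and the `/Q/` vs `/q/` variant. The `DELETED_DUPLICATES` table, the `-2` question path, and `CANONICAL_HOST` are hypothetical; a real site would look these up from its own data.

```python
# Assumption: the single host that should appear in search results.
CANONICAL_HOST = "http://www.ask.com"

# Hypothetical: Community questions deleted as duplicates -> surviving question.
DELETED_DUPLICATES = {
    "/q/What-Causes-Sepsis-2": "/q/What-Causes-Sepsis",
}

def redirect_target(path):
    """Return the canonical path for a duplicate URL, or None if path is primary."""
    if path in DELETED_DUPLICATES:
        return DELETED_DUPLICATES[path]
    if path.startswith("/Q/"):  # /Q/ and /q/ serve the same page
        return "/q/" + path[len("/Q/"):]
    return None

def app(environ, start_response):
    """Minimal WSGI app: 301 known duplicates, serve everything else."""
    path = environ.get("PATH_INFO", "/")
    target = redirect_target(path)
    if target is not None:
        # Permanent redirect: bots drop the old URL and credit links to the target.
        start_response("301 Moved Permanently",
                       [("Location", CANONICAL_HOST + target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [("Page: " + path).encode("utf-8")]
```

For a local trial, the app can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.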
Canonical URL Tag
• What it is: A <link> tag in the page’s <head> that tells bots which instance of the page to index. If there are multiple instances of the same content, they are consolidated into a single URL.
– E.g., <link rel="canonical" href="http://www.ask.com/questions-about/T-Rex"/>
• Message to bots: “For purposes of search listings, this content belongs to such-and-such URL.”
• What it’s good for:
– Consolidating link equity among various versions of the same content.
– Allows you to maintain different versions without incurring a penalty or forfeiting any link equity.
– Prevents accidental indexing of trivially/accidentally different URLs.
• What it’s bad for:
– Slow to work.
– Officially just a “suggestion”, not a “rule”.
– Not 100% effective at keeping pages out of the index.
• Suggested use: Any page that gets listed in search engines. Especially:
– Pages with lots of meaningless URL parameters
– Pages with case-insensitive URLs (choose a single, canonical capitalization format)
– Pages that can be accessed on multiple weird subdomains
• Warning: Do not just dynamically insert the current page URL here.
– Make sure it is actually the canonical version of the URL.
– It should not vary based on capitalization.
– It should not dynamically insert the domain; it should specify the correct domain for the search index.
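The warning above is the subtle part: a canonical tag built from the incoming request simply echoes whatever variant was requested. This Python sketch builds the tag from fixed data instead. The host is hard-coded, and capitalization comes from a hypothetical lookup table (chosen to match the slide's T-Rex example). Query strings, subdomains, and request casing therefore never leak into the tag.

```python
from html import escape
from typing import Optional

# Fixed, never taken from the request's Host header.
CANONICAL_HOST = "http://www.ask.com"
PREFIX = "/questions-about/"

# Hypothetical table: case-folded topic -> the one capitalization chosen as canonical.
CANONICAL_TOPICS = {
    "t-rex": "T-Rex",
    "iphone": "iPhone",
}

def canonical_link(request_path: str) -> Optional[str]:
    """Build a rel="canonical" tag for a Questions About page.

    Every variant (t-rex vs T-Rex, ?qsrc=..., any subdomain) maps to one URL.
    """
    if not request_path.lower().startswith(PREFIX):
        return None
    # Strip any query string, then look up the canonical capitalization.
    topic = request_path[len(PREFIX):].split("?", 1)[0]
    canonical_topic = CANONICAL_TOPICS.get(topic.lower())
    if canonical_topic is None:
        return None  # unknown topic: better no tag than a wrong one
    href = CANONICAL_HOST + PREFIX + canonical_topic
    return f'<link rel="canonical" href="{escape(href)}"/>'
```

For example, both `/questions-about/t-rex` and `/Questions-About/T-REX?qsrc=3044` yield the tag shown on the slide.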
Prevention
• Nothing is more effective than abstinence.
• Foresee potential duplicate content issues, and build technologies to prevent them.
– For user-generated pages, suggest an already-created page on the same topic rather than creating a new one. (E.g., Quora)
– For automatically-created pages, do programmatic de-duplication before pages are created.
• Does this query return exactly the same content as some other query?
• Don’t include words/characters in the URL that don’t affect the query results.
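One way to ask “does this query return the same content as some other query?” before creating a page is to fingerprint the set of results it would display. This Python sketch is hypothetical throughout: the registry class, the result IDs, and the slug scheme (lowercase, spaces to hyphens) are illustrative assumptions. It only catches identical result sets; near-duplicates would need a similarity measure instead.

```python
import hashlib

def result_fingerprint(result_ids):
    """Order-insensitive fingerprint of the content a query page would show."""
    joined = "\n".join(sorted(result_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

class TopicPageRegistry:
    """Hypothetical registry deciding whether a new auto-generated page is needed."""

    def __init__(self):
        self._by_fingerprint = {}  # fingerprint -> slug of the existing page

    def slug_for(self, query, result_ids):
        """Reuse an existing page if it shows the same results; else register one."""
        fp = result_fingerprint(result_ids)
        existing = self._by_fingerprint.get(fp)
        if existing is not None:
            return existing  # same content: no duplicate page is created
        # Assumed slug scheme: case does not change results, so leave it out.
        slug = query.strip().lower().replace(" ", "-")
        self._by_fingerprint[fp] = slug
        return slug

registry = TopicPageRegistry()
print(registry.slug_for("iPhone", ["q1", "q2", "q3"]))   # iphone
print(registry.slug_for("iPhones", ["q3", "q1", "q2"]))  # iphone (reused)
```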
Technique Comparison Chart
| Technique | Prevents penalties | Allows for alternate styling | Fast & effective removal from index | Propagates link equity to all outbound links | Consolidates link equity from all inbound links |
|---|---|---|---|---|---|
| Robots.txt | ✔ | ✔ | ✔ | ✗ | ✗ |
| Meta Robots “noindex,nofollow” | ✔ | ✔ | ✔ | ✗ | ✗ |
| Meta Robots “noindex,follow” | ✔ | ✔ | ✔ | ✔ | ✗ |
| 301 Redirect | ✔ | ✗ | ✔ | ✔ | ✔ |
| Canonical URL Tag | ✔ | ✔ | ✗ | ✔ | ✔ |
| Prevention | ✔ | ✗ | ✔ | ✔ | ✔ |
Related: International versions with rel=“alternate”
• Used in combination with rel=“canonical”.
• Tells Google if and when there are country/language-specific versions of the page (via the hreflang attribute).
• Different versions share link equity and other ranking signals.
• Google’s SERP links to the appropriate country-specific version for each user.
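In markup, each regional version lists every version, including itself, as a `<link rel="alternate" hreflang="...">` tag. This Python sketch generates that set for the deck's Rihanna example. Mapping `www` to `en-us` and `uk` to `en-gb` is an assumption about how the two hosts are targeted.

```python
# Hypothetical regional versions of one wiki article (hosts from the deck's example).
REGIONAL_VERSIONS = {
    "en-us": "http://www.ask.com/wiki/Rihanna",
    "en-gb": "http://uk.ask.com/wiki/Rihanna",
}

def alternate_links(versions: dict[str, str]) -> list[str]:
    """One <link rel="alternate" hreflang=...> tag per regional version.

    The same full list (including a self-reference) goes on every version,
    so the annotations confirm each other in both directions.
    """
    return [
        f'<link rel="alternate" hreflang="{lang}" href="{url}"/>'
        for lang, url in sorted(versions.items())
    ]

for tag in alternate_links(REGIONAL_VERSIONS):
    print(tag)
```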
Related: Deleted and Expired Content
• Sometimes content gets intentionally deleted:
– Community Terms violation.
– Legal/copyright issues.
– Terminated partnerships.
– Expired or no longer valuable.
• User experience options for deleted/empty pages:
a) 301 redirect to another relevant page
b) Replace with a “content deleted” message and links to other relevant pages
c) Generic error message
• HTTP/robots treatment of deleted/empty pages:
a) 301
b) 404
c) 200 with meta robots “noindex,follow”
d) 200 that can be indexed
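The slide lists the options without prescribing which UX choice goes with which HTTP treatment. This Python sketch shows one plausible pairing, and the pairing itself is an assumption: a relevant replacement gets a 301, a “content deleted” page with related links gets 200 plus `noindex,follow`, and anything else gets a 404. Option (d) is deliberately unused, on the reasoning that an indexable empty page is arguably the “thin content” the deck warns about. The `Response` type and function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Response:
    """Hypothetical response description: status, headers, meta robots value."""
    status: int
    headers: dict = field(default_factory=dict)
    meta_robots: Optional[str] = None  # value for <meta name="robots">, if any

def deleted_page_response(replacement_url: Optional[str],
                          show_related_links: bool) -> Response:
    """Assumed policy pairing UX options (a-c) with HTTP/robots treatments."""
    if replacement_url:
        # (a) a relevant page exists: redirect permanently, passing link equity.
        return Response(301, {"Location": replacement_url})
    if show_related_links:
        # (b) "content deleted" page: keep it out of the index, let links count.
        return Response(200, meta_robots="noindex,follow")
    # (c) nothing useful to show: a plain 404.
    return Response(404)
```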
Suggested Reading
• http://www.seomoz.org/learn-seo/duplicate-content
• http://searchengineland.com/8-canonicalization-best-practices-in-plain-english-44475
• http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
• http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps
