Duplicate Content Filters, Penalties and Other Content Minefields

27th March 2012
Search Quality – the Duplicate Content Headache

Google can't afford a SERP like this:

1) Search engine optimization
   Search engine optimization (SEO) is the process of improving the
   visibility of a website or a web page in search engines...
2) Search engine optimization
   Search engine optimization (SEO) is the process of improving the
   visibility of a website or a web page in search engines...
3) Search engine optimization
   Search engine optimization (SEO) is the process of improving the
   visibility of a website or a web page in search engines...
4) Search engine optimization
   Search engine optimization (SEO) is the process of improving the
   visibility of a website or a web page in search engines...
Resource – the Duplicate Content Headache
Duplicate content has resource consequences for search engines:

Wastes Crawler resources - finite number of crawlers

Wastes Bandwidth – how often can you crawl 1 trillion documents and
keep your index fresh?

Increases Query CPU time – how do you search 1 trillion documents as
quickly as possible?




Document importance – Duplicate Content Headache
Duplicate content can be a signal of an important document:

 • Song lyrics
 • Scholarly texts and historical documents, e.g. the Bible (1,000 pages)
 • The Linux manual (2,000 pages)
 • Breaking news – Associated Press, Reuters
 • etc.




Types of Duplicate Content
Duplicate content comes in many forms:

Intentional vs non-intentional

On-site vs off-site




On-Site Duplicate Content (Impacts Quality Score)
Intentional
• Printer-friendly pages
• Different font sizes
• PDF documents
• Archive (non-graphics versions)
• Shopping filters (sort-by and pagination)
• RSS feeds

Non-intentional
• Affiliate URLs - www.example.com/?btag=123
• AdWords campaigns - www.example.com/?utc=google
• Search results
• www vs non-www URLs
• https vs http
• Stubs/plugins
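
Most of the non-intentional variants listed here differ only in scheme, host or tracking parameters, so they can be collapsed to one URL before comparing. A minimal Python sketch, assuming http://www.example.com is the preferred version and that btag, utc and affid (the parameter names used in these slides) carry no content. The function name and constants are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only: adjust to your own site.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
TRACKING_PARAMS = {"btag", "utc", "affid"}


def normalise_url(url):
    """Collapse scheme, www/non-www and tracking-parameter variants."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    # Keep only query parameters that are not known tracking parameters.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    path = parts.path or "/"
    return urlunsplit((PREFERRED_SCHEME, host, path, urlencode(query), ""))


if __name__ == "__main__":
    for variant in (
        "https://example.com/?btag=123",
        "http://www.example.com/?utc=google",
        "http://www.example.com/shoes?sort=price&affid=123456",
    ):
        print(variant, "->", normalise_url(variant))
```

If two crawled URLs normalise to the same string, they are candidates for the same canonical page.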

On-Site Duplicate Content (Impacts Quality Score)
Worst-case scenario example: tens of thousands of stub pages.

This was 2 weeks after Andy had removed the duplicate links from the search pages on our advice, e.g.:
  http://www.motors.co.uk/Ford-Escort-0-9999999---2
  http://www.motors.co.uk/Ford-Escort-0-9999999--U-2-
  http://www.motors.co.uk/Ford-Escort-0-9999999---2%20-

Off-Site Duplicate Content (Filters and Penalties)
The line between intentional and non-intentional is somewhat grey:

Domain branding, e.g. .com, .co.za
(Mobile website)
Content syndication
Content theft
Staging websites (a common problem!)

Quality signals are often used to filter off-site duplicates!




How Does Google Filter Off-site Duplicate Content
Authors feel they have a right to rank for their own content, but
Google's loyalty is to its users!

Google doesn't necessarily reward the source or original, but assesses:

• Relevance (e.g. is an article in context?)
• Domain authority & links (e.g. Google Knol, Facebook)
• Fresh content boost
• Site quality signals (e.g. internal duplicate content!)




Examples of Off-site Duplicate Content and Quality
Client with a .com.au and a .com with https duplicates

Casino client with a lot of stub pages (pre-Panda)

Casino site – severe health issues




How to Diagnose (on-site) Duplicate Content
Link building will exacerbate duplicate content indexing

Keep an eye on indexed pages (weekly) and look for spikes in Google
indexing (and in Yahoo and Bing)

Look for duplicates with site:example.com

Use Xenu link checker

Heed any Webmaster Tools warnings

Check your crawling and cache dates
        Frequent updates but stale cache dates = dupe content issues
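
The weekly indexed-pages check can be automated once the counts are recorded somewhere. A rough Python sketch: the counts, the 25% threshold and the function name are made-up assumptions, and how the counts are collected (e.g. from site: queries) is left out:

```python
def find_spikes(weekly_counts, threshold=0.25):
    """Return the positions of weeks whose indexed-page count grew by more
    than `threshold` (as a fraction) compared with the previous week."""
    spikes = []
    for week in range(1, len(weekly_counts)):
        previous, current = weekly_counts[week - 1], weekly_counts[week]
        if previous > 0 and (current - previous) / previous > threshold:
            spikes.append(week)
    return spikes


if __name__ == "__main__":
    # Hypothetical weekly counts of indexed pages for one site.
    counts = [12000, 12100, 12050, 19500, 19800]
    print(find_spikes(counts))  # [3]
```

A flagged week is only a prompt to go and look at what was indexed; it does not prove the new pages are duplicates.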

How to address on-site and off-site duplicate content
You have a whole armoury of potential tools, including:

Robots.txt exclusion
Robots meta tag
Canonical tag
Webmaster URL exclusion
Password protection
(301 redirects)

(File a DMCA against serial content thieves?)

A lot of well-meaning people give bad advice, though




Google Engineers Can’t Agree
Adam Lasnik – “Deftly Dealing with Duplicate Content” 2006

  Probably the authoritative guide to duplicate content:

  • What is duplicate content?
  • What isn't duplicate content?
  • Why does Google care about duplicate content?
  • What does Google do about it?
  • How can Webmasters proactively address duplicate content issues?


Deftly Dealing with... - Our advice/experience

Robots.txt

Routinely ignored by Google, probably because of malware

User-agent: *
Allow: /the-good-stuff/
Disallow: /the-malware/

Robots.txt is ignored unless combined with an emergency Webmaster Tools URL removal (3 months)
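
To see what the rules above are meant to do, Python's standard urllib.robotparser can evaluate them. This only shows how a standards-following parser reads the file; it says nothing about how Google behaves in practice. The example.com URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The rules from the slide, one directive per line.
ROBOTS_TXT = [
    "User-agent: *",
    "Allow: /the-good-stuff/",
    "Disallow: /the-malware/",
]


def is_allowed(url, user_agent="*"):
    """Return True if the rules above let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/the-good-stuff/page.html",
        "http://www.example.com/the-malware/page.html",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```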




Our advice/experience

Canonical tag

Works great for cross-domain duplicate content

Largely ineffective for pagination eg shopping sites

Totally ineffective unless canonical URLs are VERY similar if not identical
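
When auditing, it helps to check which canonical URL a page actually declares. A small Python sketch using only the standard library's html.parser; the class and function names and the sample markup are illustrative assumptions, not part of any tool mentioned in this deck:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    page = (
        '<html><head><link rel="canonical" '
        'href="http://www.example.com/shoes"></head><body></body></html>'
    )
    print(find_canonical(page))  # http://www.example.com/shoes
```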




Our advice/experience

Robots Meta Tag

Noindex,Follow - 100% obeyed by Google and passes PageRank too

Very effective for pagination eg shopping sites

Works well for tracking links too (www.example.com/?affid=123456)

Doesn’t work when used with blocking robots.txt
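
The last point follows from crawl order: a crawler that is blocked by robots.txt never fetches the page, so it never sees the noindex tag. A hedged Python sketch that lists noindexed URLs a robots.txt would hide from crawlers; the function name, the rules and the URLs are hypothetical:

```python
from urllib.robotparser import RobotFileParser


def hidden_noindex_pages(robots_lines, noindex_urls, user_agent="*"):
    """Return the URLs whose noindex tag can never be read because
    robots.txt stops `user_agent` from fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt and a list of pages known to carry noindex,follow.
    rules = ["User-agent: *", "Disallow: /search/"]
    pages = [
        "http://www.example.com/search/?page=2",
        "http://www.example.com/shoes/?page=2",
    ]
    print(hidden_noindex_pages(rules, pages))
```

Any URL this returns needs either the robots.txt block or the meta tag removed, depending on which behaviour is wanted.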




Our advice/experience

Password Protect/htaccess 403 Forbidden

Works great for staging sites

Stubs - the problem is that it generates Webmaster Tools errors

We feel it is best avoided on your main domain




Extreme Techniques to Avoid Dupe Content
Make all your backend .exe with htaccess

Summary

 Duplicate content is a minefield!

 Filters usually apply, penalties are very rare

 You have the answer in your own hands

 Stay on top of your site’s health – especially internal duplicate content
Thank you for your attention!

Thanks to:
Anton Groeneveldt
Carla dos Santos

More Related Content

What's hot

Optimizing Your Web Site for Discovery: A Workshop
Optimizing Your Web Site for Discovery: A WorkshopOptimizing Your Web Site for Discovery: A Workshop
Optimizing Your Web Site for Discovery: A Workshop
OReillyTOC
 
A Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
A Technical Look at Content - PUBCON SFIMA 2017 - Patrick StoxA Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
A Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
patrickstox
 
New Search Strategies
New Search StrategiesNew Search Strategies
New Search Strategies
notess
 
Google algorithim’s
Google  algorithim’sGoogle  algorithim’s
Google algorithim’s
Veom Infotech LLC
 
Data Feed SEO for Affiliates by Will Critchlow
Data Feed SEO for Affiliates by Will CritchlowData Feed SEO for Affiliates by Will Critchlow
Data Feed SEO for Affiliates by Will Critchlowauexpo Conference
 
React JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
React JS and Search Engines - Patrick Stox at Triangle ReactJS MeetupReact JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
React JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
patrickstox
 
How Google Search Algorithm Works ??
How Google Search Algorithm Works ??How Google Search Algorithm Works ??
How Google Search Algorithm Works ??
viralshahb
 
SEO Fundamentals
SEO FundamentalsSEO Fundamentals
SEO Fundamentals
Gaurav Kakade
 
BlueGlassX - Big Site SEO Triage by Dr. Pete Meyers
BlueGlassX - Big Site SEO Triage by Dr. Pete MeyersBlueGlassX - Big Site SEO Triage by Dr. Pete Meyers
BlueGlassX - Big Site SEO Triage by Dr. Pete MeyersBlueGlass Interactive, Inc.
 
How to disrupt established markets with SEO in 2015 - LOGIN 2015
How to disrupt established markets with SEO in 2015 - LOGIN 2015How to disrupt established markets with SEO in 2015 - LOGIN 2015
How to disrupt established markets with SEO in 2015 - LOGIN 2015
Yannis Karagiannidis
 
Digifoot 2012 ppt
Digifoot 2012 pptDigifoot 2012 ppt
Digifoot 2012 ppttpoelzer
 
Comparing Search Engines
Comparing Search EnginesComparing Search Engines
Comparing Search Engines
Melissa Brisbin
 
Internet research skills
Internet research skillsInternet research skills
Internet research skillssouth learning
 
Searching the internet - what patent searchers should know
Searching the internet - what patent searchers should knowSearching the internet - what patent searchers should know
Searching the internet - what patent searchers should knowEric Sieverts
 
Internet search skills
Internet search skillsInternet search skills
Internet search skillsSarahTS
 
FSU SLIS InfoSvcs Wk 3 - Web Search & Evaluation
FSU SLIS InfoSvcs Wk 3 - Web Search & EvaluationFSU SLIS InfoSvcs Wk 3 - Web Search & Evaluation
FSU SLIS InfoSvcs Wk 3 - Web Search & Evaluation
Lorri Mon
 
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick StoxSMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
patrickstox
 
Proactive Measures for Good Site Health - Brighton SEO 2014
Proactive Measures for Good Site Health - Brighton SEO 2014Proactive Measures for Good Site Health - Brighton SEO 2014
Proactive Measures for Good Site Health - Brighton SEO 2014
Thomas Whittam
 

What's hot (20)

Optimizing Your Web Site for Discovery: A Workshop
Optimizing Your Web Site for Discovery: A WorkshopOptimizing Your Web Site for Discovery: A Workshop
Optimizing Your Web Site for Discovery: A Workshop
 
A Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
A Technical Look at Content - PUBCON SFIMA 2017 - Patrick StoxA Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
A Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
 
New Search Strategies
New Search StrategiesNew Search Strategies
New Search Strategies
 
Google algorithim’s
Google  algorithim’sGoogle  algorithim’s
Google algorithim’s
 
Data Feed SEO for Affiliates by Will Critchlow
Data Feed SEO for Affiliates by Will CritchlowData Feed SEO for Affiliates by Will Critchlow
Data Feed SEO for Affiliates by Will Critchlow
 
React JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
React JS and Search Engines - Patrick Stox at Triangle ReactJS MeetupReact JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
React JS and Search Engines - Patrick Stox at Triangle ReactJS Meetup
 
Google
GoogleGoogle
Google
 
How Google Search Algorithm Works ??
How Google Search Algorithm Works ??How Google Search Algorithm Works ??
How Google Search Algorithm Works ??
 
SEO Fundamentals
SEO FundamentalsSEO Fundamentals
SEO Fundamentals
 
Deep web
Deep webDeep web
Deep web
 
BlueGlassX - Big Site SEO Triage by Dr. Pete Meyers
BlueGlassX - Big Site SEO Triage by Dr. Pete MeyersBlueGlassX - Big Site SEO Triage by Dr. Pete Meyers
BlueGlassX - Big Site SEO Triage by Dr. Pete Meyers
 
How to disrupt established markets with SEO in 2015 - LOGIN 2015
How to disrupt established markets with SEO in 2015 - LOGIN 2015How to disrupt established markets with SEO in 2015 - LOGIN 2015
How to disrupt established markets with SEO in 2015 - LOGIN 2015
 
Digifoot 2012 ppt
Digifoot 2012 pptDigifoot 2012 ppt
Digifoot 2012 ppt
 
Comparing Search Engines
Comparing Search EnginesComparing Search Engines
Comparing Search Engines
 
Internet research skills
Internet research skillsInternet research skills
Internet research skills
 
Searching the internet - what patent searchers should know
Searching the internet - what patent searchers should knowSearching the internet - what patent searchers should know
Searching the internet - what patent searchers should know
 
Internet search skills
Internet search skillsInternet search skills
Internet search skills
 
FSU SLIS InfoSvcs Wk 3 - Web Search & Evaluation
FSU SLIS InfoSvcs Wk 3 - Web Search & EvaluationFSU SLIS InfoSvcs Wk 3 - Web Search & Evaluation
FSU SLIS InfoSvcs Wk 3 - Web Search & Evaluation
 
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick StoxSMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
 
Proactive Measures for Good Site Health - Brighton SEO 2014
Proactive Measures for Good Site Health - Brighton SEO 2014Proactive Measures for Good Site Health - Brighton SEO 2014
Proactive Measures for Good Site Health - Brighton SEO 2014
 

Similar to Duplicate content presentation March 2012

Technical SEO | Joomla Day Chicago 2012
Technical SEO | Joomla Day Chicago 2012 Technical SEO | Joomla Day Chicago 2012
Technical SEO | Joomla Day Chicago 2012
Jessica Dunbar
 
Search Engine Optimization (SEO)
Search Engine Optimization (SEO)Search Engine Optimization (SEO)
Search Engine Optimization (SEO)
Christopher Mbinda
 
CATOLICO LUCHADOR - Tutorial: Google for Webmasters
CATOLICO LUCHADOR - Tutorial: Google for WebmastersCATOLICO LUCHADOR - Tutorial: Google for Webmasters
CATOLICO LUCHADOR - Tutorial: Google for WebmastersPedro Briceño
 
getting_rid_of_duplicate_content_iss-priyank_garg.ppt
getting_rid_of_duplicate_content_iss-priyank_garg.pptgetting_rid_of_duplicate_content_iss-priyank_garg.ppt
getting_rid_of_duplicate_content_iss-priyank_garg.pptzachbrowne
 
ваш сантехник в Питере - Tutorial: Google for Webmasters
ваш сантехник в Питере - Tutorial: Google for Webmastersваш сантехник в Питере - Tutorial: Google for Webmasters
ваш сантехник в Питере - Tutorial: Google for Webmastersкрылов сергей
 
ваш сантехник в Питере - Tutorial: Google for Webmasters
ваш сантехник в Питере - Tutorial: Google for Webmastersваш сантехник в Питере - Tutorial: Google for Webmasters
ваш сантехник в Питере - Tutorial: Google for Webmastersкрылов сергей
 
Online Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and MuseumsOnline Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and Museums
mherbison
 
Demand Quest SEO Training - Session 2
Demand Quest SEO Training - Session 2Demand Quest SEO Training - Session 2
Demand Quest SEO Training - Session 2
Nate Plaunt
 
Google for webmasters
Google for webmastersGoogle for webmasters
Google for webmastersMK-D Activo
 
Tutorial Google For Webmasters
Tutorial Google For WebmastersTutorial Google For Webmasters
Tutorial Google For Webmastersmamos
 
Content Audit Webinar with Everett & URL Profiler
Content Audit Webinar with Everett & URL ProfilerContent Audit Webinar with Everett & URL Profiler
Content Audit Webinar with Everett & URL Profiler
GoInflow
 
SEO tips 2013
SEO tips 2013SEO tips 2013
SEO tips 2013
Krisztián Száraz
 
Chewy Trewella - Google Searchtips
Chewy Trewella - Google SearchtipsChewy Trewella - Google Searchtips
Chewy Trewella - Google Searchtipssounddelivery
 
Demand Quest SEO training session 2
Demand Quest SEO training session 2Demand Quest SEO training session 2
Demand Quest SEO training session 2
Nate Plaunt
 
SEO in Orbit - Duplicate Content by OnCrawl
SEO in Orbit - Duplicate Content by OnCrawlSEO in Orbit - Duplicate Content by OnCrawl
SEO in Orbit - Duplicate Content by OnCrawl
Alexis Sanders
 
Search engine optimization
Search engine optimizationSearch engine optimization
Search engine optimization
Tommi Forsström
 
Search Engine Optimization (Seo) for Developers
Search Engine Optimization (Seo) for DevelopersSearch Engine Optimization (Seo) for Developers
Search Engine Optimization (Seo) for Developers
Matthew Robinson
 
Website Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic PantaleonWebsite Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic Pantaleon
Jacque Doring
 

Similar to Duplicate content presentation March 2012 (20)

Technical SEO | Joomla Day Chicago 2012
Technical SEO | Joomla Day Chicago 2012 Technical SEO | Joomla Day Chicago 2012
Technical SEO | Joomla Day Chicago 2012
 
Search Engine Optimization (SEO)
Search Engine Optimization (SEO)Search Engine Optimization (SEO)
Search Engine Optimization (SEO)
 
CATOLICO LUCHADOR - Tutorial: Google for Webmasters
CATOLICO LUCHADOR - Tutorial: Google for WebmastersCATOLICO LUCHADOR - Tutorial: Google for Webmasters
CATOLICO LUCHADOR - Tutorial: Google for Webmasters
 
getting_rid_of_duplicate_content_iss-priyank_garg.ppt
getting_rid_of_duplicate_content_iss-priyank_garg.pptgetting_rid_of_duplicate_content_iss-priyank_garg.ppt
getting_rid_of_duplicate_content_iss-priyank_garg.ppt
 
ваш сантехник в Питере - Tutorial: Google for Webmasters
ваш сантехник в Питере - Tutorial: Google for Webmastersваш сантехник в Питере - Tutorial: Google for Webmasters
ваш сантехник в Питере - Tutorial: Google for Webmasters
 
ваш сантехник в Питере - Tutorial: Google for Webmasters
ваш сантехник в Питере - Tutorial: Google for Webmastersваш сантехник в Питере - Tutorial: Google for Webmasters
ваш сантехник в Питере - Tutorial: Google for Webmasters
 
Online Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and MuseumsOnline Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and Museums
 
Demand Quest SEO Training - Session 2
Demand Quest SEO Training - Session 2Demand Quest SEO Training - Session 2
Demand Quest SEO Training - Session 2
 
Google for webmasters
Google for webmastersGoogle for webmasters
Google for webmasters
 
Tutorial Google For Webmasters
Tutorial Google For WebmastersTutorial Google For Webmasters
Tutorial Google For Webmasters
 
Content Audit Webinar with Everett & URL Profiler
Content Audit Webinar with Everett & URL ProfilerContent Audit Webinar with Everett & URL Profiler
Content Audit Webinar with Everett & URL Profiler
 
SEO tips 2013
SEO tips 2013SEO tips 2013
SEO tips 2013
 
Chewy Trewella - Google Searchtips
Chewy Trewella - Google SearchtipsChewy Trewella - Google Searchtips
Chewy Trewella - Google Searchtips
 
Demand Quest SEO training session 2
Demand Quest SEO training session 2Demand Quest SEO training session 2
Demand Quest SEO training session 2
 
SEO in Orbit - Duplicate Content by OnCrawl
SEO in Orbit - Duplicate Content by OnCrawlSEO in Orbit - Duplicate Content by OnCrawl
SEO in Orbit - Duplicate Content by OnCrawl
 
Search engine optimization
Search engine optimizationSearch engine optimization
Search engine optimization
 
Seo Made Easy
Seo Made EasySeo Made Easy
Seo Made Easy
 
Search Engine Optimization (Seo) for Developers
Search Engine Optimization (Seo) for DevelopersSearch Engine Optimization (Seo) for Developers
Search Engine Optimization (Seo) for Developers
 
Website Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic PantaleonWebsite Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic Pantaleon
 
Seo
SeoSeo
Seo
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 

Duplicate content presentation March 2012

  • 1. Duplicate Content Filters, Penalties and other Content Minefields 27th March 2012
  • 2. Search Quality – the Duplicate Content Headache Google can’t afford a SERPs of; 4)Search engine optimization Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines........ 2) Search engine optimization Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines........ 3) Search engine optimization Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines........ 4) Search engine optimization Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines........ 2
  • 3. Resource – the Duplicate Content Headache Duplicate content has consequences for SE in; Wastes Crawler resources - finite number of crawlers Wastes Bandwidth – how often can you crawl 1 trillion documents and keep your index fresh? Increases Query CPU time – how do you search 1 trillion documents as quickly as possible? 3
  • 4. Document importance – Duplicate Content Headache Duplicate content can be a signal of an important document; • Song lyrics • Scholarly texts and historical documents, eg the Bible (1,000 pages) • The Linux manual (2,000 pages) • Breaking News – Associated Press, Reuters etc. 4
  • 5. Types of Duplicate Content. Duplicate content comes in many forms: intentional vs non-intentional; on-site vs off-site.
  • 6. On-Site Duplicate Content (Impacts Quality Score). Intentional: printer-friendly pages; different font sizes; PDF documents; archive (non-graphics versions); shopping filters (sort by and pagination); RSS feeds. Non-intentional: affiliate URLs (www.example.com/?btag=123); AdWords campaigns (www.example.com/?utc=google); search results; www vs non-www URLs; https vs http; stubs/plugins.
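The non-intentional variants on slide 6 (tracking parameters, www vs non-www, https vs http) all point at the same page. A minimal sketch of collapsing them to one URL might look like the following. The choice of `https` and the `www.` host as the preferred form, and the `TRACKING_PARAMS` set, are assumptions for illustration and not something the slides prescribe.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these parameters only carry tracking data and never change the page.
TRACKING_PARAMS = {"btag", "utc"}


def normalise(url):
    """Collapse scheme, host and tracking-parameter variants into one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: the www. host is the preferred version of the site.
    if not host.startswith("www."):
        host = "www." + host
    # Keep only the query parameters that are not tracking parameters.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in TRACKING_PARAMS]
    path = parts.path or "/"
    # Assumption: https is the preferred scheme; fragments are dropped.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


variants = [
    "http://www.example.com/?btag=123",
    "https://example.com/?utc=google",
    "http://example.com/",
    "https://www.example.com",
]
unique = {normalise(u) for u in variants}
```

Here all four variants reduce to a single URL, which is the form a 301 redirect or canonical tag would then point at.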
  • 7. On-Site Duplicate Content (Impacts Quality Score). Tens of thousands of stub pages, a worst-case example. This was 2 weeks after Andy had removed the duplicate links from the search pages on our advice, e.g.: http://www.motors.co.uk/Ford-Escort-0-9999999---2 ; http://www.motors.co.uk/Ford-Escort-0-9999999--U-2- ; http://www.motors.co.uk/Ford-Escort-0-9999999---2%20-
  • 8. Off-Site Duplicate Content (Filters and Penalties). Intentional vs non-intentional is somewhat grey: domain branding, e.g. .com, .co.za (mobile website); content syndication; content theft; staging websites (a common problem!). Quality signals are often used to filter off-site duplicates!
  • 9. How Does Google Filter Off-site Duplicate Content? Authors feel they have a right to rank for their own content, but Google’s loyalty is to its users! Google doesn’t necessarily reward a source or original but assesses: relevance (e.g. is an article in context); domain authority and links (e.g. Google Knol, Facebook); fresh content boost; site quality signals (e.g. internal duplicate content!).
  • 10. Examples of Off-site Duplicate Content and Quality: a client with a .com.au and a .com with https duplicates; a casino client with a lot of stub pages (pre-Panda); a casino site with severe health issues.
  • 11. How to Diagnose (on-site) Duplicate Content. Link building will exacerbate duplicate content indexing. Keep an eye on indexed pages (weekly) and look for spikes in Google indexing (and Yahoo and Bing). Look for site:example.com duplicates. Use Xenu link checker. Heed any Webmaster Tools warnings. Check your crawling and cache dates: frequent updates but stale cache dates = dupe content issues.
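Alongside the checks on slide 11, exact duplicates can be spotted by fingerprinting page text once it has been crawled. The sketch below is an illustrative addition, not a tool named in the slides: it assumes the page text is already available in a dictionary keyed by URL, and the URLs and text shown are hypothetical. It only catches pages that are identical after whitespace and case are normalised, not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash page text after collapsing whitespace and lowercasing."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def duplicate_groups(pages):
    """Return lists of URLs whose normalised text is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl output: URL -> extracted body text.
pages = {
    "http://www.example.com/cars?sort=price": "Ford Escort  for sale",
    "http://www.example.com/cars?sort=date": "ford escort for sale\n",
    "http://www.example.com/vans": "Ford Transit for sale",
}
groups = duplicate_groups(pages)
```

Each returned group is a candidate for a canonical tag, a noindex tag or a redirect, depending on which of the remedies in the later slides fits.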
  • 12. How to address on-site and off-site duplicate content. You have a whole armoury of potential tools, including: robots.txt exclusion; robots meta tag; canonical tag; Webmaster Tools URL exclusion; password protection; (301 redirects); (file a DMCA against serial content thieves?). A lot of well-meaning people give bad advice, though.
  • 14. Adam Lasnik – “Deftly Dealing with Duplicate Content” 2006. Probably the authoritative guide to duplicate content: What is duplicate content? What isn't duplicate content? Why does Google care about duplicate content? What does Google do about it? How can Webmasters proactively address duplicate content issues?
  • 15. Deftly Dealing with... – Our advice/experience: Robots.txt. Routinely ignored by Google, probably because of malware (example rules: User-agent: *, Allow: /the-good-stuff/, Disallow: /the-malware/). Robots.txt is ignored unless combined with emergency Webmaster Tools URL removal (3 months).
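To see what the example rules on slide 15 ask of a compliant crawler, they can be run through Python's standard `urllib.robotparser`. This is only a sketch of what the rules say. It does not model Google's behaviour, and robots.txt governs crawling rather than indexing, which may partly explain the experience the slide describes. The host `www.example.com` and the helper `is_allowed` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted on the slide.
ROBOTS_TXT = """\
User-agent: *
Allow: /the-good-stuff/
Disallow: /the-malware/
"""


def is_allowed(path, agent="Googlebot"):
    """Report whether the rules permit `agent` to crawl `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # Hypothetical host; only the path matters for these rules.
    return parser.can_fetch(agent, "http://www.example.com" + path)


good = is_allowed("/the-good-stuff/page.html")
bad = is_allowed("/the-malware/payload.html")
```

With these rules `good` is `True` and `bad` is `False`; any path matched by neither rule is allowed by default.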
  • 16. Our advice/experience: Canonical tag. Works great for cross-domain duplicate content. Largely ineffective for pagination, e.g. shopping sites. Totally ineffective unless the canonical URLs are VERY similar, if not identical.
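When auditing canonical tags as discussed on slide 16, it helps to read back what each page actually declares. A minimal sketch using the standard `html.parser` module is below; the class name `CanonicalFinder`, the helper `canonical_of` and the sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_of(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical paginated shop page pointing at its first page.
page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/shoes"></head><body></body></html>')
declared = canonical_of(page)
```

Comparing the declared URL with the page's own URL across a crawl shows which pages defer to another URL and which have no canonical tag at all.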
  • 17. Our advice/experience: Robots Meta Tag. Noindex,Follow is 100% obeyed by Google and passes PageRank too. Very effective for pagination, e.g. shopping sites. Works well for tracking links too (www.example.com/?affid=123456). Doesn’t work when used with a blocking robots.txt.
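The last point on slide 17 follows from how crawling works: a crawler that is barred from a page by robots.txt never fetches it, so it never sees the noindex tag. A rough way to flag that conflict is sketched below. The function name `noindex_is_hidden`, the sample rules and the URL are hypothetical, and the check assumes a crawler that follows robots.txt strictly.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def noindex_is_hidden(url, html, robots_txt, agent="Googlebot"):
    """True when a page carries noindex but robots.txt stops it being crawled."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return "noindex" in finder.directives and not rules.can_fetch(agent, url)


page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
blocking_rules = "User-agent: *\nDisallow: /search/"
open_rules = "User-agent: *\nDisallow:"
url = "http://www.example.com/search/results.html"
conflict = noindex_is_hidden(url, page, blocking_rules)
no_conflict = noindex_is_hidden(url, page, open_rules)
```

A page flagged this way needs one remedy or the other: either lift the robots.txt block so the noindex tag can be read, or rely on the block alone.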
  • 18. Our advice/experience: Password protect/htaccess (403 Forbidden). Works great for staging sites. Stubs: a problem, in that it generates Webmaster Tools errors. Our feeling: best to avoid on your main domain.
  • 19. Extreme Techniques to Avoid Dupe Content Make all your backend .exe with htaccess
  • 20. Summary. Duplicate content is a minefield! Filters usually apply; penalties are very rare. You have the answer in your own hands. Stay on top of your site’s health, especially internal duplicate content.
  • 21. Thank you for your attention! Thanks to: Anton Groeneveldt, Carla dos Santos