New perspectives on duplicate content
Alexis Sanders
Senior SEO Manager at Merkle
Omi Sido
Technical SEO at Canon Europe
#OnCrawlinOrbit
Why should SEOs care about duplicate content?
#OnCrawlinOrbit
There is no manual penalty for duplicate content.
Source: October 2015 Google Hangout
Source: 10 Things I Hate About You
the website you don’t want to be
What a user sees vs. what a bot sees:
“Umm, I think I like the white shirt better…”
Source: Introduction to Information Retrieval (ch. 19)
“by some estimates, as many as 40% of the pages on the web are duplicates of other pages”
1. Indexing Challenges
2. Lower Link Impact
3. Internal Competition
4. Poor Crawl Bandwidth
Common sources of duplication:
Content:
• Repetitive pages
• Doorway pages
• Inventory control
• Syndicated content
• PR releases
• Republishing
• Plagiarism
• Non-unique copy
• Localized content
• Thin content

Technical (many of these are simply URL variants of one page; see the sketch below):
• Staging sites
• HTTP vs. HTTPS
• Subdomains
• URL cases
• File extensions
• Trailing slashes
• Index pages
• Parameters
• Pagination
• Mobile configuration
• Internal site search
• Facets
• Sorts
• Image-only pages
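A minimal normalization sketch in Python for the URL-variant sources above, assuming the canonical form is lowercase HTTPS with no trailing slash, index file, or tracking parameters; every rule here is a per-site policy choice, not a universal standard:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking/session parameters to strip; adjust per site.
STRIP_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def normalize(url: str) -> str:
    """Collapse common technical URL variants into one canonical form."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    scheme = "https"                                   # HTTP vs. HTTPS
    netloc = netloc.lower()                            # host casing
    path = path.lower()                                # URL cases
    if path.endswith(("/index.html", "/index.php")):   # index pages
        path = path[: path.rfind("/")]
    path = path.rstrip("/") or "/"                     # trailing slashes
    query = urlencode(sorted((k, v) for k, v in parse_qsl(query)
                             if k not in STRIP_PARAMS))  # parameters
    return urlunsplit((scheme, netloc, path, query, ""))

print(normalize("HTTP://Example.com/Shoes/index.html?utm_source=x"))
# -> https://example.com/shoes
```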
How can SEOs find and identify duplicate content?
#OnCrawlinOrbit
1. Know your user journey.
2. Create a strong hierarchical URL taxonomy.
3. Prioritize the duplicate content issues that are affecting performance.
4. If the pages are 100% duplicates, consolidate with a 301 redirect (see the sketch after this list).
5. Leverage appropriate signaling.
6. Strategically consolidate, create, delete, and optimize.
7. If your content is stolen: request a canonical tag, or file a DMCA request.
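For step 4, a minimal sketch that verifies duplicates 301 to their canonical target, using the requests library (the URL map and example.com are placeholders; this is not the speakers' tooling):

```python
import requests

# Hypothetical duplicate -> canonical map.
REDIRECT_MAP = {
    "http://example.com/shoes/": "https://example.com/shoes",
    "https://example.com/SHOES": "https://example.com/shoes",
}

for src, target in REDIRECT_MAP.items():
    r = requests.get(src, allow_redirects=True, timeout=10)
    first_hop = r.history[0].status_code if r.history else None
    ok = first_hop == 301 and r.url == target
    print(f"{src} -> {r.url} (first hop: {first_hop}) {'OK' if ok else 'CHECK'}")
```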
oncrawl > Duplicate Content
oncrawl > Duplicate Content > By Group
Google:
• Direct quotes in Google
• Searching via site: queries
• site:
• site: + inurl:
• intitle:
• filetype: (for file extensions)
(Example queries below.)
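For illustration, the operators above in assumed example queries (example.com and the quoted text are placeholders):

```
"an exact sentence lifted from your page" -site:example.com
site:example.com intitle:"product title"
site:example.com inurl:sessionid
site:example.com filetype:pdf
```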
GSC > Coverage > Duplicate …
Plagiarism Tools:
• Quetext
• Noplag
• PaperRater
• Grammarly
• CopyScape

Keyword Density
Resolving duplicate content:
a memorable case
#OnCrawlinOrbit
+64.2% in sessions Y/Y from Google organic, from adding unique content.
+28.7% in sessions Y/Y from Bing organic, from moving from serving only an H1 to server-side rendering (SSR) the full UX.
HTTP vs. HTTPS: an accidental HTTP canonical caused an estimated 5-10% loss. Once fixed, clicks returned to the normal range on HTTPS within 3-4 days.
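A spot-check like the following would catch this class of issue early. A minimal sketch using requests and BeautifulSoup (URLs are placeholders; a real audit would run over a full crawl list):

```python
import requests
from bs4 import BeautifulSoup

def canonical_of(url: str):
    """Return the href of the page's <link rel="canonical">, if any."""
    html = requests.get(url, timeout=10).text
    link = BeautifulSoup(html, "html.parser").find("link", rel="canonical")
    return link.get("href") if link else None

for url in ("https://example.com/", "https://example.com/shoes"):
    canon = canonical_of(url)
    if canon and canon.startswith("http://"):
        print(f"WARNING: {url} canonicalizes to HTTP: {canon}")
```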
What is new for duplicate content
in the past year and a half?
#OnCrawlinOrbit
What will duplicate content management
look like in the future?
#OnCrawlinOrbit
Alexis’ hopes for the future:
• Less technically driven duplicate content (as CMSs wise up)
• More automation (unit testing and external testing)
• Automatic detection of highly similar pages and page types for writers and content managers (see the sketch after this list)
• Google continuing to improve its existing systems and detection
• Perhaps an alert system to escalate cases where Google is not using the right canonical
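On detecting highly similar pages: a minimal word-shingle Jaccard sketch (k=5 and the 0.8 threshold are assumptions to tune, not cited standards; production systems typically use MinHash or simhash at scale):

```python
def shingles(text: str, k: int = 5) -> set:
    """Overlapping k-word windows of the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a: str, b: str) -> float:
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

page_a = "red trail running shoes with waterproof lining and grippy soles"
page_b = "red trail running shoes with waterproof lining and durable soles"
score = jaccard(page_a, page_b)
print(f"{score:.2f}", "near-duplicate candidates" if score > 0.8 else "distinct")
```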
Do you have a favorite technical trick?
#OnCrawlinOrbit
Alexis’ tech SEO tricks:
• An EC2 remote computer instance
• Checking the mobile-first testing tool
• Switching the user agent to Googlebot (see the sketch below)
• Using TechnicalSEO.com’s robots.txt tool
• Screaming Frog’s log file analyzer
• Made with Love’s htaccess checker
• Using Google Data Studio to report on changes (syncing Sheets with updates, filtering each page by relevant updates)
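Minimal sketches of two of these tricks, assuming the requests library and example.com as a placeholder. The UA string is Google's published desktop Googlebot token, and Python's stdlib robotparser is only a rough local stand-in for Google's own robots.txt parser:

```python
import requests
from urllib import robotparser

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Trick: compare what the site serves a browser vs. Googlebot.
for label, ua in (("browser", "Mozilla/5.0"), ("googlebot", GOOGLEBOT_UA)):
    r = requests.get("https://example.com/", headers={"User-Agent": ua}, timeout=10)
    print(f"{label}: {r.status_code}, {len(r.content)} bytes")

# Trick: rough local robots.txt check (not Google's exact parser).
rp = robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()
print(rp.can_fetch("Googlebot", "https://example.com/some-page"))
```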
Do you have a least favorite technical SEO question?
#OnCrawlinOrbit
Do you have a favorite Googlebot?
#OnCrawlinOrbit
Alexis: I like the idea that Googlebot is tired and overworked (from crawling 130 trillion URLs).
Do you have a favorite planet?
#OnCrawlinOrbit
Launching the best SEO tips into space
Next up on June 27th from Bordeaux, France
FULL AGENDA AT WWW.ONCRAWL.COM/SEOINORBIT

Editor's Notes

  • #4 October 2015 Google Hangout
  • #5 Joey: [holding up headshots] “Which one do you like better?” Bianca: “Umm, I think I like the white shirt better.” Joey: “Yeah, it’s-it’s more…” Bianca: “Pensive?” Joey: “Damn, I was going for thoughtful.”
• #9 Roll in like Star Wars? https://www.shapechef.com/blog/star-wars-intro-crawl-in-powerpoint-2013
• #19 I have two clients that see a lot of this, one in real estate and one in health. A lot of it is trying to add unique content and improve information architecture.
  • #20 From Google
• #22 Incorrect canonicals from ~2/1/19; fixed 4/18/19.
• #23 Same issues as in the past: finding ways to make content unique, especially when a site is massive; machine-generated content.
• #26 Alexis: check the mobile-first testing tool, switch the user agent to Googlebot, use technicalseo.com’s robots.txt tool, Screaming Frog’s log analyzer, and an EC2 remote computer. I’ll try to think of something better.
  • #28 Robots.txt
  • #29 Alexis – the newest, evergreen one.
  • #31 Alexis: Jupiter