When multiple URLs are near-duplicates, Google will choose one and discard the others. This is to ensure that users are not annoyed by multiple URLs which have the same, or very nearly the same, content and which meet a search query in the same context and with the same intent equally well. Many ecommerce sites fall foul of cannibalisation (cannibalization) of their own SEO success because of mistakes within the site in areas such as canonicalization, inconsistent internal link signals, semantic misalignment of intent across site categories and sections, incorrect implementation of hreflang internationalization, and other 'clues' which search engines use to identify 'the best' version or source of information for a search query. Here we identify some of the symptoms of SEO cannibalization and look at some solutions which can help to increase the visibility of target pages in websites.
2. Dawn Anderson
Director - Move It Marketing
Associate Lecturer – Digital marketing & Search
From Manchester, UK
International SEO Consultant – 10+ Yrs in SEO
Pomeranian Pooch Lover – Meet Bert
Googlebot Hunter (Practice & Academia)
Search Awards Judge
Contributor
Researcher
Twitter Chatterer @dawnando
4. EXACT DUPLICATES ARE FILTERED OUT IN THE SEARCH ENGINE SYSTEM TO ARRIVE AT A ‘SINGLE URL / CONTENT FINGERPRINT’
5. BUT… ‘NEAR-DUPLICATE CONTENT’ CAN BE A ‘BALL AND CHAIN’ TO YOUR WEBSITE
IT’S NOT ALWAYS THE ONE YOU WOULD PREFER THAT GETS CHOSEN AS ‘THE ONE’ AND INDEXED
MAKING GOOGLE PLAY SPOT THE DIFFERENCE
7. A LOT OF THE CONTENT IS ‘KIND OF THE SAME’
“There’s a needle in here somewhere”
“It’s an important needle too”
8. AS ONE PAGE GOES UP… OTHER ‘TOO SIMILAR’ PAGES GO DOWN… OR REPLACE EACH OTHER IN THE INDEX
GOOGLE CAN GET CONFUSED AS TO WHICH PAGE IT SHOULD RANK FROM YOUR SITE FROM ‘NEAR-DUPLICATES’
WRONG URLS RANKING
9. ARE MANY URLs FIGHTING WITH EACH OTHER FOR THE SAME QUERY?
CAVEAT: CONTEXT OF QUERY WILL ALSO COME INTO PLAY BUT GENERALLY YOU DON’T WANT SEVERAL URLS IN THE LIST
SOME SYMPTOMS: SHARED IMPRESSIONS & POOR CTR
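One way to look for the shared-impressions symptom is to group a Search Console performance export by query and list the queries that more than one URL is served for. The sketch below assumes rows with the keys `query`, `page`, `impressions` and `clicks`; these names and the sample data are illustrative assumptions, so adjust them to whatever your export actually contains.

```python
from collections import defaultdict


def find_shared_queries(rows, min_urls=2):
    """Return {query: [(page, impressions, ctr), ...]} for queries served by several URLs."""
    by_query = defaultdict(list)
    for row in rows:
        impressions = int(row["impressions"])
        clicks = int(row["clicks"])
        ctr = clicks / impressions if impressions else 0.0
        by_query[row["query"]].append((row["page"], impressions, ctr))
    # Keep only queries with at least `min_urls` distinct pages,
    # sorted so the page with the most impressions comes first.
    return {
        query: sorted(pages, key=lambda p: p[1], reverse=True)
        for query, pages in by_query.items()
        if len({page for page, _, _ in pages}) >= min_urls
    }


# Hypothetical sample rows (not real data).
sample_rows = [
    {"query": "red shoes", "page": "/shoes/red", "impressions": "120", "clicks": "6"},
    {"query": "red shoes", "page": "/blog/red-shoes", "impressions": "80", "clicks": "1"},
    {"query": "blue hats", "page": "/hats/blue", "impressions": "50", "clicks": "5"},
]
shared = find_shared_queries(sample_rows)
```

Queries that appear in the result are only candidates for review: as the caveat on this slide says, query context matters, so several URLs for one query is not always a problem.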
10. DO A SITE:YOURDOMAIN.COM SEARCH IN GOOGLE SERPS
DO A SITE:www.YOURDOMAIN.COM SEARCH IN GOOGLE SERPS
CHECK THE DIFFERENCE
REPEAT WITH HTTPS
URLS INDEXED FROM DIFFERENT VERSIONS OF YOUR SITE
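As a small checklist helper, the four scheme and host combinations of a site can be generated from a bare domain so that none is forgotten when checking what is indexed. This is only a sketch; the function name is hypothetical and `YOURDOMAIN.COM` is the placeholder from the slide.

```python
def site_variants(domain):
    """Return the four http/https and www/non-www root URLs for a domain."""
    bare = domain.lower().strip("/")
    if bare.startswith("www."):
        bare = bare[len("www."):]
    hosts = [bare, "www." + bare]
    return [f"{scheme}://{host}/" for scheme in ("http", "https") for host in hosts]


variants = site_variants("www.YOURDOMAIN.COM")
for url in variants:
    print(url)
```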
11. CHECK YOUR PAGE SIMILARITY %
http://www.webconfs.com/similar-page-checker.php
IDENTIFY CULPRITS
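For a rough local equivalent of a similarity percentage, two pieces of body text can be compared by the overlap of their word shingles (Jaccard similarity). This is a generic sketch and not the method used by the tool linked above, which is not documented here; the shingle size of 3 is an arbitrary assumption.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ('shingles') in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, as a percentage."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 100.0
    return 100.0 * len(a & b) / len(a | b)


score = similarity_percent(
    "red leather shoes for women",
    "red leather shoes for men",
)
```

Pairs of pages with a high score are the likely culprits to review first; what counts as "too similar" is a judgment call rather than a fixed threshold.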
13. DEEP CRAWL & ON-PAGE.ORG
DUPLICATE & WRONG VERSION CHECKER
14. INCONSISTENT ANCHOR TEXT INTERNAL LINKING
BE CONSISTENT IN INTERNAL ANCHOR LINKS
MAKE SURE CONTEXT AND LINK IS USEFUL TO HUMAN & SEARCH ENGINE VISITOR SITE TRAVERSAL
IF YOUR ‘INTERNAL ANCHOR CLOUD’ SCREAMS ‘SPAM ON MONEY TERMS’… NOOO
AVOID BEING ‘FORMULAIC’
‘CLICK HERE’ IS NEVER, EVER, EVER USEFUL
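Anchor consistency can be audited from a crawl export of internal links. The sketch below assumes each link is a `(source_url, target_url, anchor_text)` tuple, flags anchor texts that point at more than one target, and collects generic anchors. The input shape and the list of generic anchors are assumptions for illustration.

```python
from collections import defaultdict

# Assumed examples of anchors that carry no context; extend as needed.
GENERIC_ANCHORS = {"click here", "here", "read more"}


def audit_anchors(links):
    """Return (conflicting, generic) from (source, target, anchor) tuples.

    conflicting: {anchor_text: [targets]} where one anchor points at several URLs.
    generic: the links whose anchor text is in GENERIC_ANCHORS.
    """
    targets_by_anchor = defaultdict(set)
    generic = []
    for source, target, anchor in links:
        text = " ".join(anchor.lower().split())  # normalise case and whitespace
        if text in GENERIC_ANCHORS:
            generic.append((source, target, anchor))
        else:
            targets_by_anchor[text].add(target)
    conflicting = {
        text: sorted(targets)
        for text, targets in targets_by_anchor.items()
        if len(targets) > 1
    }
    return conflicting, generic


# Hypothetical crawl data.
sample_links = [
    ("/", "/shoes/red", "Red shoes"),
    ("/blog/1", "/blog/red-shoes", "red  shoes"),
    ("/about", "/shoes/red", "Click here"),
]
conflicting, generic = audit_anchors(sample_links)
```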
15. BAD VERY SIMILAR CONTENT CAN BE GENERATED VIA ‘WRONG PARAMETER’ PICKUP
SOMETIMES GOOGLEBOT PICKS UP ON THE WRONG PARAMETER FIELD FROM DYNAMIC URLS AND HEADS OFF INTO THE MAZE. THE IMPORTANT URL IS LOST
CHECK LOG FILES FOR STRANGE URLS
THE IMPORTANT URL LOSES IMPORTANCE WHEN GOOGLEBOT ENCOUNTERS DYNAMICALLY GENERATED RANDOM BUT ‘LOGICAL’ CONTENT FROM PARAMETERS
REVIEW URL PARAMETERS
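A log check of this kind could be sketched as follows: filter access-log lines whose user agent mentions Googlebot and count which query parameter names appear in the requested URLs. The sketch assumes a combined-style access log where the request appears as `"GET /path HTTP/1.1"`; the sample lines are invented, and matching on the user-agent string alone does not prove the request really came from Google, since user agents can be spoofed.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# Matches the quoted request part of a log line, e.g. "GET /shoes?x=1 HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+"')


def googlebot_parameter_counts(log_lines):
    """Count query parameter names in URLs requested by a Googlebot user agent."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if not match:
            continue
        query = urlsplit(match.group("path")).query
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


GOOGLEBOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
sample_log = [
    '66.249.66.1 - - [10/Oct/2016:13:55:36 +0000] '
    '"GET /shoes?colour=red&sessionid=abc HTTP/1.1" 200 2326 "-" ' + GOOGLEBOT_UA,
    '66.249.66.1 - - [10/Oct/2016:13:55:40 +0000] '
    '"GET /shoes?sessionid=def&sort= HTTP/1.1" 200 2326 "-" ' + GOOGLEBOT_UA,
    '203.0.113.9 - - [10/Oct/2016:13:56:01 +0000] '
    '"GET /shoes?colour=blue HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
]
parameter_counts = googlebot_parameter_counts(sample_log)
```

Parameter names that are crawled often but do not change the page content in a meaningful way (a session ID, for instance) are the ones to review first.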
16. SKEWED ‘IMPORTANCE’ VIA INTERNAL LINKING
STOP VOTING FOR THE WRONG TARGETS FROM WITHIN YOUR OWN SITE
THE MOST IMPORTANT PAGES SHOULD BE TOWARD THE TOP OF THE LIST… NOT YOUR BLOG OR BLOG CATEGORIES IDEALLY
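To see which pages a site is 'voting' for, inbound internal links can be counted per target from a crawl export. This sketch assumes `(source_url, target_url)` pairs and counts each linking page once per target; it is a simple tally, not a reproduction of any particular tool's internal links report.

```python
from collections import Counter


def inbound_link_counts(links):
    """Return [(target, distinct_linking_pages), ...], most-linked first."""
    unique_links = set(links)  # one vote per (source, target) pair
    counts = Counter(
        target for source, target in unique_links if source != target
    )
    # Sort by count descending, then URL, so the output is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical crawl data.
sample_links = [
    ("/", "/blog"), ("/a", "/blog"), ("/b", "/blog"),
    ("/", "/shoes"), ("/", "/shoes"), ("/shoes", "/shoes"),
]
ranking = inbound_link_counts(sample_links)
```

In this invented example the blog outranks the product section, which is the kind of skew the slide warns about.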
17. BREADCRUMBS EMPHASISE IMPORTANCE & SILO RELEVANCE IN SITES
Image credit: https://www.smashingmagazine.com/2009/03/breadcrumbs-in-web-design-examples-and-best-practices/
HOME > SITE SECTION > CATEGORY > PRODUCT
MOST > FEWER > FEWER > SINGLE
TEXT OUTPUT ONLY BREADCRUMB
18. MERGE ‘TOO SIMILARS’ FOR CONCENTRATION
1) Variants
2) Stemming
3) Synonyms
4) Associated keywords
5) Theme level, section and page level
6) Silos
DO A CONTENT AUDIT
Find variants for a keyword in Google Search Console in content keywords
They’re potentially the same thing
You need to merge pages if they can’t stand on their own two feet alone
CONCENTRATION NOT DILUTION
‘BULK UP’, DON’T ‘STEAL’ THE POWER
19. INTENT TO CONTENT URL MAPPING
BOTH HUMANS AND SEARCH ENGINES NEED LOGICAL STRUCTURES AND SIGNS TO UNDERSTAND CONTENT / CONTEXT MATCHING QUERY INTENT. THIS HELPS AVOID CANNIBALISATION
HERO
• Transactional
• Money makers
• Ecommerce
• Categories
HUB
• The Buzz
• Community creators
• Engagement
HELP
• Guiding
• Supporting
• Advising
• Teaching
21. NOTES ON CANONICALIZATION
1) Duplicate or ‘very near duplicate’ content should be canonicalized
2) You can’t no-index a URL and then canonicalize it to something else
3) With and without a trailing slash is different
4) Don’t forget to switch your canonicals when moving to HTTPS
5) Self referencing canonicalization can help if your content gets scraped
6) Avoid canonicalizing URLs to another if it’s too different
7) Don’t canonicalize paginated URLs to the first page of the set; use next/prev
8) Remember that Google is looking for a single version of a URL to index; canonicalization is key to this
9) Don’t canonicalize just for rankings. Canonicalize to reduce duplication
10) Consider whether a 301 redirect would be the best fit
11) Review ‘URL parameters’ on dynamic sites to help choose canonicals
SOME CANONICALS MAY BE IGNORED IF GOOGLEBOT / THE SEARCH ENGINE THINKS YOU MADE A MISTAKE (e.g. too different a URL)
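A few of these notes (2, 3 and 4) can be checked mechanically from a page's HTML. The sketch below uses Python's standard `html.parser` to read the `rel="canonical"` link and the robots meta tag, then reports possible conflicts. The warning wording and the `example.com` URLs are assumptions for illustration, and the checks are deliberately simple string comparisons rather than full URL normalisation.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the canonical href and whether a robots meta tag says noindex."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.noindex = "noindex" in (attrs.get("content") or "").lower()


def canonical_warnings(page_url, html):
    """Return a list of human-readable warnings for one page."""
    signals = HeadSignals()
    signals.feed(html)
    canonical = signals.canonical
    if canonical is None:
        return ["no canonical tag found"]
    warnings = []
    if signals.noindex and canonical != page_url:
        warnings.append("noindex page canonicalized to another URL")
    if page_url.startswith("https://") and canonical.startswith("http://"):
        warnings.append("canonical still points at HTTP")
    if canonical != page_url and canonical.rstrip("/") == page_url.rstrip("/"):
        warnings.append("canonical differs only by trailing slash")
    return warnings


sample_html = (
    '<html><head><meta name="robots" content="noindex,follow">'
    '<link rel="canonical" href="http://example.com/shoes/"></head></html>'
)
found = canonical_warnings("https://example.com/shoes", sample_html)
```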
22. 21 WAYS TO AVOID SITE CANNIBALISATION
1) Utilise internal linking connecting related items & sections, navigation, breadcrumbs and some contextual (BUT DO NOT BE SPAMMY)
2) Anchor text consistency (BUT DO NOT BE SPAMMY)
3) Canonicalization of near similar content
4) Page Title Differentiation
5) H1, H2, H3 tag differentiation
6) Intent to content mapping & topical permanent hub pages
7) If you can’t improve (for now), noindex (temporarily) and then improve
8) Exclusion of near duplicate non-preferred version from XML sitemap
9) Exclusion of 301 redirects from XML sitemap
10) Verify all possible versions of your site in Google Search Console (http/https/www/non-www)
11) Choose 1 version of your site (HTTPS/HTTP/WWW/NONWWW) and 301 redirect other versions
12) Review your pages for duplicate body content
13) Check for multiple similar URLs sharing impressions for queries in GSC
14) Keep boilerplate areas to a minimum
15) Avoid ‘filler’ or ‘placeholder’ content just for the sake of having content
16) Avoid ‘spinning content around’ in different parts of your site without having decent unique content
16) Use breadcrumbs to include site section to provide further context
17) Review parameters and check to see that Googlebot is not crawling the wrong ones
18) Fold up thin content and merge to make ‘great content’ which stands out on its own
19) Review internal links and chop out any redirect chains to end target
20) Check any hreflang tags to make sure right content is ranking in right language if internationalised
21) Check server logs on larger sites to check for abnormalities & Googlebot visiting strange places
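Chopping out redirect chains and keeping 301s out of the XML sitemap can both be sketched offline, given a map of known redirects. The code below assumes a `{source_url: destination_url}` dictionary built from a crawl or server configuration; the function names and URLs are hypothetical, and nothing here makes network requests.

```python
def resolve_redirect(url, redirects, max_hops=10):
    """Follow a {source: destination} map; return (final_url, hops)."""
    hops = 0
    seen = {url}
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or overly long chain at {url}")
        seen.add(url)
    return url, hops


def links_to_update(internal_links, redirects):
    """Return {linked_url: final_url} for internal links that hit a redirect."""
    fixes = {}
    for url in internal_links:
        final, hops = resolve_redirect(url, redirects)
        if hops:
            fixes[url] = final
    return fixes


def sitemap_urls_to_remove(sitemap_urls, redirects):
    """Sitemap entries that redirect elsewhere and so should not be listed."""
    return [url for url in sitemap_urls if url in redirects]


# Hypothetical redirect map: a two-hop chain to the preferred version.
sample_redirects = {
    "http://example.com/a": "https://example.com/a",
    "https://example.com/a": "https://www.example.com/a",
}
fixes = links_to_update(
    ["http://example.com/a", "https://www.example.com/b"], sample_redirects
)
stale = sitemap_urls_to_remove(
    ["https://example.com/a", "https://www.example.com/a"], sample_redirects
)
```

Pointing each internal link straight at the final URL in `fixes` removes the intermediate hops.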