Duplicate Content Myths Types and Ways To Make It Work For You

Duplicate content continues to confuse many of us. Part of the problem is that there are different types of duplicate content, which search engines may treat differently, and there are different ways to deal with each type in your SEO and content marketing strategy. Be careful when removing content so that you don't shoot yourself in the foot. Instead of removing, try to improve or regroup content that is triggered for the same query class / cluster and may be causing dilution. Change the emphasis and make something of added value for users. Consider the difference between query- and category-agnostic filtering and content that is evaluated in the query run-time auction in search results.

Published in: Marketing

  1. @dawnieando from @MoveItMarketing. DUPLICATE CONTENT: MYTHS, TYPES & WAYS TO MAKE IT WORK FOR YOU
  2. Duplicate Content Penalty ‘Myth’… It Just Won’t Die. Query refinement suggestion: next probable queries on “near duplicate urls can cause” (2017)
  3. At least 30% of the web is a duplicate of other pages on the web
  4. “And that’s OK”
  5. The Duplicate Content ‘Penalty’ Myth: ‘Real’ duplicates (matching content checksum) are filtered and not indexed. “Each content filter sends the retrieved web pages to Dupserver to determine if they are duplicates of other web pages” http://www.google.ch/patents/US20120317089
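
The idea of filtering 'real' duplicates by a matching content checksum can be illustrated with a minimal sketch. This is not the system described in the patent: the choice of SHA-256, the function names and the example URLs are all assumptions made for illustration.

```python
import hashlib


def content_checksum(text: str) -> str:
    """Return a hex digest of the page text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def filter_exact_duplicates(pages: dict[str, str]) -> dict[str, str]:
    """Keep the first URL seen for each distinct checksum; drop later exact copies."""
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for url, text in pages.items():
        digest = content_checksum(text)
        if digest not in seen:
            seen.add(digest)
            kept[url] = text
    return kept


# Hypothetical pages: the second URL serves byte-identical text to the first.
pages = {
    "http://www.example.com/dress": "Red dress, size 10, long sleeves.",
    "http://www.example.com/dress?ref=home": "Red dress, size 10, long sleeves.",
    "http://www.example.com/coat": "Blue coat, size 12.",
}
unique_pages = filter_exact_duplicates(pages)
```

Only byte-identical text collides here, which is why near-duplicates need a different technique.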
  6. Filters
  7. Handling Near-Duplicate Content Attracted Lots of Research. Some notable ‘spot the difference’ researchers: Dennis Fetterly, Marc Najork, Mark Manasse, Ziv Bar-Yossef, Monica Henzinger, William Pugh, Andrei Broder. DETECTING DUPLICATES & NEAR-DUPLICATES EARLY SAVES ON RESOURCES / EFFICIENCY
  8. Because… Near Duplicate Content is More Difficult to Detect than Exact Duplicates. IT’S AN ONGOING REAL WORLD CHALLENGE. ‘Detecting Duplicate and Near Duplicate Files’ (Henzinger / Pugh, 2003, 2009, 2011, 2012, 2011, 2016). These Google patents in the series keep being ‘tweaked’ (A is not the same as B)
  9. A lot of busy Googlebots & potential for duplicates: the web doubled in size 2010 – 2012; another 1/3 by 2015; finite search engine resources; processes automated for scale. “I just never have any ‘me-time’ any more”
  10. Near Duplicates Do Not Change Often. SO… WHY WASTE RESOURCES CRAWLING THEM?
  11. DENNIS FETTERLY
  12. The Slow Page Evolution of Near Duplicates: “Clusters of near-duplicate documents are fairly stable: Two documents that are near-duplicates of one another are very likely to still be near-duplicates 10 weeks later” (Fetterly & Najork, 2003)
  13. … The Raters Guidelines still ask raters to catch ‘dupes’. In fact… there’s a whole section of the guidelines dedicated to them (2017)
  14. Mostly Stable For Years… But… The Web is Always Changing (2017)
  15. Near-Dupes are still doing strange things (John Mu at International Search Summit, 2017): nearly the same but not the same still causes confusion; particularly problematic on internationalization; but applies to all sites with pages not the same but ‘nearly-the-same’
  16. BUT… Different types of ‘duplicate content’
  17. Full duplication; partial duplication; document inclusion; in-document duplication; (local duplication (in-same-site))
  18. All types may not be treated the same
  19. PERFECT DUPLICATES
  20. Filtered before indexing
  21. D.U.S.T. (DIFFERENT URL, SIMILAR TEXT)
  22. DUSTBUSTER - Do Not Crawl in The Dust… (Ziv Bar-Yossef, 2003). Reduce crawling and wasted resources to low importance pages: builds crawling ‘rules’; detects duplicate content URL patterns; from small ‘sampling’ visits; swerves ‘DUST’; saves crawling resources; potentially popular CMS configurations’ URL parameters detect ‘DUST’. CAVEAT: IT IS NOT KNOWN WHETHER THIS IS BEING USED AT ALL. RESEARCH AND THEORY
  23. Filtered before indexing
  24. Cookie Cutter Sites
  27. Filtered before indexing
  28. Query & Category Agnostic
  29. Never hit query run-time auction. Because… They’re not indexed… Filtered
  30. Tripping Flags: NEAR DUP == TRUE; NEAR DUP == FALSE
  31. Query Agnostic Nature of Near-Duplicate Clustering
  32. Single URL Content Fingerprint
  33. But… The Single URL Fingerprint May Not Be The One You Choose
  34. BOILERPLATE ISSUES
  35. Filtered before indexing
  36. TOKENS, VECTORS & SHINGLING
  37. (w) Shingling: A rose is a rose is a rose
  38. Shingling: A rose is a rose is a rose. N-Gram (where ‘n’ is no. words (tokens) in snapshot): [A rose is a] [rose is a rose] [is a rose is] (4)
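
The shingling example on the slide can be reproduced with a short sketch. Lower-casing and whitespace tokenisation are simplifying assumptions, and the function name is illustrative only.

```python
def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Return the set of distinct w-word shingles (word n-grams) in the text."""
    tokens = text.lower().split()  # assumed tokenisation: lower-case, split on whitespace
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


rose = shingles("A rose is a rose is a rose", w=4)
for shingle in sorted(rose):
    print(" ".join(shingle))
```

The sentence has eight tokens and therefore five 4-word windows, but two of them repeat, which leaves the three distinct shingles listed on the slide.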
  39. Shingles, Supershingles & Megashingles: SHINGLE VECTORS; SUPERSHINGLE; MEGASHINGLE. WORD == TOKEN
  40. http://corpus.tools/wiki/Onion
  41. http://corpus.tools/wiki/Onion N-gram length (word string). POTENTIAL EXAMPLE
  42. http://corpus.tools/wiki/Onion Dup content threshold e.g. 0.5 (50%). POTENTIAL EXAMPLE
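
One way to picture how an n-gram length and a duplicate-content threshold work together is the shingle-set resemblance (Jaccard similarity) in the style of Broder et al. (1997). The sketch below is an assumption-laden illustration, not Onion's actual algorithm or any search engine's: the function names, the 4-word shingle length and the way the 0.5 threshold is applied are all chosen for the example.

```python
def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Distinct w-word shingles of the text (lower-cased, whitespace tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(doc_a: str, doc_b: str, w: int = 4) -> float:
    """Jaccard similarity of the two shingle sets: |A & B| / |A | B|."""
    a, b = shingles(doc_a, w), shingles(doc_b, w)
    if not a and not b:
        return 1.0  # two texts too short to shingle are treated as identical
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, w: int = 4, threshold: float = 0.5) -> bool:
    """Flag a pair as near-duplicate when resemblance reaches the threshold."""
    return resemblance(doc_a, doc_b, w) >= threshold


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the lazy cat"
doc_c = "a rose is a rose is a rose"

score_ab = resemblance(doc_a, doc_b)  # 5 shared shingles out of 7 distinct
score_ac = resemblance(doc_a, doc_c)  # no shared shingles
```

Changing one word at the end of a nine-word sentence still leaves five of seven distinct shingles shared, so the pair stays above a 0.5 threshold.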
  43. Broder, A.Z., Glassman, S.C., Manasse, M.S. and Zweig, G., 1997. Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8-13), pp.1157-1166.
  44. “We have developed an efficient way to determine the syntactic similarity of files and have applied it to every document on the World Wide Web” (Broder et al, 1997)
  45. Documents grouped together to meet similar queries equally (in a cluster)
  46. Multiple Title Candidates For A Query. DYNAMIC, CONTEXTUAL SEARCH
  47. Not? Query Agnostic
  48. Quilting Web Pages
  49. DUPLICATE CONTENT TYPE – NEAR DUPE (QUILTING). Diagram labels: UNIQUE PARAGRAPH; EXTERNAL SYNDICATED; EXTERNAL SYNDICATED; EXTERNAL SYNDICATED; HEADER - TEMPLATE; FOOTER - TEMPLATE; UNIQUE PARAGRAPH; ASIDE
  50. CONTENT INCLUSION CMS’
  51. UNIQUE MAIN CONTENT BUT ITS CONTENT IS INCLUDED ELSEWHERE. TEASER ‘INCLUDED’ ELSEWHERE
  52. May… or May NOT be Filtered before indexing?
  53. Pages that look very different but meet the same user information need equally
  54. Cross Over, Query Class & Semantic Collisions
  55. Possible Treatment of Near-Duplicate Query Candidates: “If more than one candidate is determined to be part of a ’search query cluster’, the most important one based on factors such as relevance, freshness, importance is returned. The others are eliminated.” (Henzinger / Pugh, 2012, 2016) Last updated 2016
  56. May… or May NOT be Filtered before indexing?
  57. URL Parameter-driven Ecommerce Platforms
  58. How Are Choosing Strategies Catered For in Ecommerce? CHOICE-ASSISTING FUNCTIONALITY HEURISTICS: FACETED NAVIGATION & WEBSITE FILTERS == allows for ‘Elimination by Aspects’; PAGINATION == reduces ‘Too Much Choice’ effects; SORTING == caters for ‘FIRST / BEST’ choosing strategies
  59. And with these choice-assisting functionalities come… “Exponentially multiplicative URLs”
  60. Exponentially Multiplicative URLs From Faceted Navigation… 100 DRESSES, 5 COLOURS, 10 SIZES, 2 LENGTHS, 4 SUPPLIERS: 100 x 5 x 10 x 2 x 4 = 40,000 URLs
  61. And that’s without HTTPS, WWW/non or internationalization: 100 DRESSES, 5 COLOURS, 10 SIZES, 2 LENGTHS, 4 SUPPLIERS: 100 x 5 x 10 x 2 x 4 = 40,000 URLs; x 2 BECAUSE… HTTPS VERSION = 80,000 URLs; x 2 BECAUSE… WWW / NON-WWW VERSION = 160,000 URLs; x 5 BECAUSE… EN / FR / ES / DE / IT (e.g.) = 800,000 URLs
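
The arithmetic on the two slides above can be checked in a few lines. The dictionary keys are just labels for the figures quoted on the slides.

```python
from math import prod

# Facet counts as quoted on the slides.
facets = {"dresses": 100, "colours": 5, "sizes": 10, "lengths": 2, "suppliers": 4}

faceted_urls = prod(facets.values())   # every combination of facet values
with_https = faceted_urls * 2          # HTTP and HTTPS versions
with_www = with_https * 2              # WWW and non-WWW versions
with_languages = with_www * 5          # EN / FR / ES / DE / IT

print(faceted_urls, with_https, with_www, with_languages)
```

Each independent variant multiplies the total rather than adding to it, which is why 100 dresses end up behind 800,000 URLs.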
  62. May… or May NOT be Filtered before indexing?
  63. THAT’S A LOT OF URLs FOR 100 DRESSES. Bored Googlebot (unrelated to speed)
  64. When You Stop Boring Googlebot
  65. When You Stop Boring Googlebot: NOT SPEED RELATED
  66. CANONICALIZATION
  67. The Canonical Tag - Otherwise Known As… ‘THE CANONICAL LINK RELATION’, RFC 6596 (2012)
  68. ‘The Canonical Link Relation’ – RFC6596 Is Still Adhered To (2017)
  69. 50% OF SEO’S: “SEARCH ENGINES HAVE IGNORED CANONICAL TAGS THEY HAD IMPLEMENTED” (2017)
  70. A lot can go wrong with mixed signals…
  71. There Are Many Signals To Consider In Canonicalization. Signals: 404 & 410; 301; 302, 303, 307; valid canonical from ‘context’ URL to valid target; fall back to default pre ‘Canonical Link Relation’ duplicate handling signals; valid href lang (if present and applicable); manual action; HTTPS (Google specific). Strength labels shown: SUPER STRONG; STRONG - DIRECTIVE (x3); STRONG - HINT (x2); DEFAULT. ALL NEED TO BE IN UNISON
  72. “REL=NEXT / REL=PREV” IS NOT A FORM OF CANONICALIZATION (2017)
  73. PAGINATION & SORTING
  74. rel=“next”, rel=“prev”: RFC 5988 (Web Linking)
  75. 2011… -> We’re Still Unclear
  76. ‘View-all’ Search Experience
  77. COMMON CANONICAL MISTAKES: If a canonical is not deemed to be valid, there is a likelihood that the pre-RFC6596 Canonical Link Relation treatment of duplicates and near-duplicates will be applied, such as ‘internal links’
  78. “301s AND 302s ARE BOTH A FORM OF CANONICALIZATION” (2017)
  79. COMMON CANONICAL MISTAKES: Don’t canonicalize from an “index” to a “noindex” or vice-versa because this means the pages are NOT the same. The canonical will likely be ignored. If “href lang” references an alternative which does not match a canonical link, the canonical will likely be ignored
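
The two common mistakes above lend themselves to an automated sanity check over crawl data. The sketch below is hypothetical throughout: the `Page` record, its field names and the reading of the href lang rule (an alternate should be its own canonical) are assumptions for illustration, not how any search engine evaluates these signals.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical crawl record for one URL."""
    url: str
    canonical: str                  # target of the page's rel="canonical"
    indexable: bool                 # False if the page carries a noindex
    hreflang_alternates: list[str] = field(default_factory=list)


def canonical_warnings(page: Page, site: dict[str, Page]) -> list[str]:
    """Return warnings for mixed signals that may cause a canonical to be ignored."""
    warnings: list[str] = []

    # Mistake 1: canonicalizing between an indexable page and a noindex page.
    target = site.get(page.canonical)
    if target is not None and target.indexable != page.indexable:
        warnings.append(f"{page.url}: index/noindex mismatch with canonical {target.url}")

    # Mistake 2 (assumed reading): an href lang alternate that is not its own canonical.
    for alt_url in page.hreflang_alternates:
        alt = site.get(alt_url)
        if alt is not None and alt.canonical != alt.url:
            warnings.append(f"{page.url}: hreflang alternate {alt_url} canonicalizes elsewhere")

    return warnings


site = {
    "/en/dress": Page("/en/dress", "/en/dress", True, ["/fr/robe"]),
    "/fr/robe": Page("/fr/robe", "/fr/robe-rouge", True),
    "/fr/robe-rouge": Page("/fr/robe-rouge", "/fr/robe-rouge", True),
    "/en/dress?sort=price": Page("/en/dress?sort=price", "/en/dress", False),
}
dress_warnings = canonical_warnings(site["/en/dress"], site)
sorted_warnings = canonical_warnings(site["/en/dress?sort=price"], site)
```

A check like this only surfaces conflicting signals; whether a canonical is actually honoured remains the search engine's decision.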
  80. Fall Back
  81. HOW TO RESOLVE?
  82. Instead of ‘remove’ consider ‘degroup’
  83. DUPLICATE CONTENT TYPE – NEAR DUPE (ADDING VALUE)
  84. Hubs & Authorities. BOWTIE OF THE WEB. Build Strongly Connected Components
  85. SORT OUT YOUR LIBRARY SYSTEM & QUERY CLASSES
  86. SEARCH ENGINES LOVE CATEGORIZATION
  87. Focused Crawling: CRAWLING CONTENT ON A SPECIFIC TOPIC FOR EFFICIENCY
  88. The ‘Mere’ Categorization Effect (Phenomenon) FTW: Simply labelling products / items as being part of a category, regardless of the label, appears to increase perception of variety & positive experience (Mogilner et al, 2008). HUMANS LOVE CATEGORIES TOO… IT IS A PHENOMENON
  89. Homonyms contribute to need for query refinement. HOMONYMS – WORDS THAT ARE SPELT OR PRONOUNCED THE SAME BUT HAVE DIFFERENT MEANINGS: ROSE, EVENING, WATCH, SINK, BACK, ARMS, BOW, CHECK. STRENGTHEN DIFFERENTIAL CONTEXT
  90. INTELLIGENT INTERNAL LINKING
  91. LOCAL NAVIGATION RELEVANCE: TABLE OF CONTENTS STYLE IN-PAGE NAVIGATIONAL HEURISTIC FOR SEARCH ENGINE AND HUMAN; PAGINATED TAB THROUGH ON SECTIONS OF REVIEW. GRANULAR RELEVANCE
  92. Parameter Handling: “IS PARAMETER-HANDLING A WAY TO HELP GOOGLE BUILD A SET OF ‘DUSTBUSTER CRAWLING RULES’ EARLY?” MAKE THE RULES
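
A parameter-handling rule can be pictured as a URL normalisation step: parameters assumed not to change the content are dropped, so that ‘DUST’ URLs collapse to one form. The sketch below uses Python's standard `urllib.parse`; the list of ignored parameters and the example URLs are hypothetical, and nothing here is claimed to be how Google or DustBuster builds its rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rule set: parameters assumed not to change the page content.
IGNORED_PARAMS = frozenset({"sort", "sessionid", "utm_source"})


def normalize_url(url: str, ignored: frozenset[str] = IGNORED_PARAMS) -> str:
    """Drop ignored parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


first = normalize_url("http://www.example.com/dresses?sort=price&colour=red&sessionid=abc")
second = normalize_url("http://www.example.com/dresses?colour=red&sort=newest")
plain = normalize_url("http://www.example.com/dresses?sort=price")
```

Here `first` and `second` normalise to the same URL, so only one of them needs to be crawled under this assumed rule set.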
  93. ADD VALUE TO NEAR DUPES (INFORMATIONAL VIEWS (INFORMATION ARCHITECTURE))
  94. INFORMATION VIEWS ADDING VALUE AND PASSING STRENGTH TO CANONICAL TARGETS
  95. Internal Links Are The Dogs
  96. Sitemaps Below Surface
  97. BUILD STRONG SECTIONS: REVIEWS; BLOG; BUYING GUIDES; COST CALCULATORS; COMMERCE; UGC. MAIN SITE THEME (ONTOLOGY). SEMANTICS RULE
  98. Power Mapper
  99. Related Content Mostly Adds Value To Other Content: That content is ‘stitched’ from elsewhere, but it is VERY useful overall & helps with searcher ‘foraging’ to create context for what it links out to
  100. NOT Filtered before indexing: Doc IDs meeting contextual information needs - 1, or 2 pages (max) chosen at query run-time
  101. NOT Filtered before indexing: Fighting with each other to be ‘THE’ result. Seems like ‘dilution’
  102. BE VERY CAREFUL WITH ‘PRUNING’
  103. CAN YOU ‘IMPROVE’, ‘DE-GROUP’ OR ‘REMORPH’ … RATHER THAN ‘REMOVE’?
  104. THAT’S A LOT OF URLs FOR 100 DRESSES. Is The Difference Substantively Different To Queries?
  105. Does The Repurposed or Collated Content Add ‘Additional’ Value?
  106. Content meeting informational needs equally is treated differently TO DUPLICATES
  107. GOTCHAS
  108. Gotchas – Velvet Blues Update (SOME) URLs (WP PLUGIN)
  109. BETTER SEARCH REPLACE PLUGIN REVIEWS
  110. WP For AMP Internal Linking Canonical Issues
  111. WP For AMP Internal Linking Canonical Issues
  112. MAGENTO GOTCHA WITH CANONICALS
  113. Understand The Canonical Link Relation Rules – RFC6596: The target (canonical) IRI MUST identify content that is either duplicative or a superset of the content at the context (referring) IRI.
  114. Google’s Maile Ohye on ‘How To Hire An SEO’
  115. Without &filter=0 Appended to end of Query: https://www.google.co.uk/search?q=red+dresses+size+10+long+sleeves&oq=red+dresses+size+10+long+sleeves&aqs=chrome.0.69i59.12570j0j7&sourceid=chrome&ie=UTF-8 NOBODY HAS MORE THAN ONE LISTING
  116. With filter=0 Appended to end of Query: https://www.google.co.uk/search?q=red+size+10+dresses+long+sleeves&oq=red+size+10+dresses+long+sleeves&aqs=chrome..69i57.13605j0j7&sourceid=chrome&ie=UTF-8&filter=0 ALL SITES HAVE AT LEAST 2 LISTINGS. MISSED OPPORTUNITIES
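
To repeat the comparison on the two slides above for other queries, the `filter=0` parameter can be appended programmatically. This sketch only rewrites the URL with Python's standard `urllib.parse`; the helper name is hypothetical, and the effect of `filter=0` on results is as described on the slides rather than something the code verifies.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_filter_off(search_url: str) -> str:
    """Return the search URL with filter=0 set, replacing any existing filter value."""
    parts = urlsplit(search_url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


unfiltered_url = with_filter_off(
    "https://www.google.co.uk/search?q=red+size+10+dresses+long+sleeves"
)
print(unfiltered_url)
```

Comparing the listings per site with and without the parameter shows which similar pages are being filtered at query time.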
  118. ASOS.com 18%
  119. Quadruple Listings
  120. Similar Content – Query Refinement SERPs: NOT FILTERED; NOT NEAR-DUPES. Does the searcher want ‘gas engineers, heating engineers, central heating’?
  121. Maybe a lot of people are confused by duplicates? (2017) CONFUSING DUPLICATE, NEAR-DUPLICATE (DUST) AND SIMILAR CONTENT COULD COST YOU DEARLY: be careful about canonicalizing when unnecessary; true duplicate content & near-dupes are query and category agnostic; similar is not duplicate; you may still have the answers to different queries based on a small important difference; AT LEAST 4 TYPES OF DUPLICATE CONTENT
  122. Thank You
  123. APPENDIX
  124. Problems With The Many ‘Faces’ of Faceted Navigation: https://webmasters.googleblog.com/2014/02/faceted-navigation-best-and-5-of-worst.html - Wednesday, February 12, 2014. Example of faceted navigation: http://www.example.com/category.php?category=gummy-candies&price=5-10&price=over-10 Facet means ‘little faces’ (USEFUL TRIVIA)
  125. Relation Links – ‘Web Linking’: Web Linking – RFC 5988, https://tools.ietf.org/html/rfc5988 (INTERNET ENGINEERING TASK FORCE)
  126. Internationalization – An Additional Layer of Complexity: ‘TAGS FOR IDENTIFYING LANGUAGES’ – RFC 5646, https://tools.ietf.org/html/rfc5646 (INTERNET ENGINEERING TASK FORCE)
  127. A Solution - The Introduction of Href Lang. Wikipedia page on href lang. Rules on href lang: https://support.google.com/webmasters/answer/182192?hl=en&ref_topic=2370587 - MULTINATIONAL & MULTILINGUAL SITES AND HREF LANG; https://support.google.com/webmasters/topic/2370587?hl=en&ref_topic=4598733 - HREF LANG Google; https://support.google.com/webmasters/answer/2620865?hl=en&ref_topic=2370587 - USE A SITEMAP FOR HREF LANG; https://support.google.com/webmasters/answer/6144055?hl=en&ref_topic=2370587 - LOCALE AWARE WITH GOOGLEBOT CRAWLING
  128. INTERNATIONALIZED RESOURCE IDENTIFIER (IRI): Internationalized Resource Identifiers (IRIs), RFC 3987, https://tools.ietf.org/html/rfc3987 (INTERNET ENGINEERING TASK FORCE)
  129. REFERENCES
  130. References & Sources: Fetterly, D., Manasse, M. and Najork, M., 2003. On the evolution of clusters of near-duplicate web pages. Journal of Web Engineering, 2(4), pp.228-246. Broder, A.Z., Glassman, S.C., Manasse, M.S. and Zweig, G., 1997. Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8-13), pp.1157-1166. Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A. and Wiener, J., 2000. Graph structure in the web. Computer Networks, 33(1), pp.309-320. Mogilner, C., Rudnick, T. and Iyengar, S.S., 2008. The mere categorization effect: How the presence of categories increases choosers' perceptions of assortment variety and outcome satisfaction. Journal of Consumer Research, 35(2), pp.202-215.
  131. References & Sources: http://www.seobythesea.com/2008/02/new-google-process-for-detecting-near-duplicate-content/ Pugh, W. and Henzinger, M.H., Google Inc., 2016. Detecting duplicate and near-duplicate files. U.S. Patent 9,275,143. Alonso, O., Fetterly, D. and Manasse, M., 2013, December. Duplicate news story detection revisited. In Asia Information Retrieval Symposium (pp. 203-214). Springer Berlin Heidelberg. RFC 5988 – Web Linking - https://tools.ietf.org/html/rfc5988
  132. References & Sources: Najork, M., 2012, August. Detecting quilted web pages at scale. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (pp. 385-394). ACM.
