Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical Debt in SEO - Big Digital Adelaide 2017


You have a shiny new site and your brand is looking for a fresh start with its offering. It may be one of many migrations, protocol switches and redirections you've undertaken over the years. But then you find that things didn't go quite as you expected. You never really got back to where you wanted to be in organic search. Part of this is because 'Gone is never Gone'. Every URL that was ever known on your site is listed in the history logs of Google's crawling system, and those history logs are used to determine how much crawling time your site is apportioned. You inherited technical SEO debt and generational cruft, where everything gets blurred for Google in understanding which is the target URL for a particular term. This can be particularly prevalent when you migrate from one ecommerce platform to another, because the crawling rules developed for your old site no longer apply but are still in the history and in the crawl patterns discovered.


  1. Generational Cruft & Technical Debt in SEO: GONE IS NEVER GONE. Dawn Anderson @dawnieando #BigDigitalADL
  2. A New Beginning
      • “A new website will solve ALL our problems”
      • “Let’s start again”
      • “We’ll just migrate… and redirect everything”
  3. 404 Page Not Found
      • “Of course, we won’t redirect everything…”
      • “Not everything will be worth redirecting”
  4. 410 Gone
      • “Some, we’ll just kill off with a 410…”
      • “Then the URLs will be gone”
  5. But in Reality… it Still Exists
  6. Because… Web Crawler System History Logs: SEARCH ENGINES HAVE A BIG MEMORY AND A LOT OF STORAGE
  7. Web Crawler System History Logs
  8. Web Crawler System: GOOGLE NEVER FORGETS. The history logs play a role in deciding when every URL that was EVER discovered gets visited again.
  9. History Log Records Include:
      • URL fingerprint
      • Timestamp (last crawl or download attempt)
      • Crawl status (success or error; response code)
      • Content checksum (binary code)
      • Source ID (accessed from cache or downloaded)
      • Segment identifier (crawl segment assigned to??)
      • Page importance (a measure of importance assigned to the URL). May be calculated by identifying historical importance scores based on the past X number of crawls.
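The record fields listed on this slide can be pictured as a simple data structure. The following is only a sketch: the field names and types are illustrative, and averaging the last X importance scores is an assumption, since the slide says only that importance "may be calculated" from past crawls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryLogRecord:
    """One crawl attempt for one URL (hypothetical field names)."""
    url_fingerprint: str      # hash identifying the URL
    timestamp: float          # last crawl or download attempt (Unix time)
    crawl_status: int         # HTTP response code
    content_checksum: str     # checksum of the downloaded content
    source_id: str            # "cache" or "download"
    segment_id: int           # crawl segment the URL is assigned to
    page_importance: float    # importance score at the time of this crawl


def historical_importance(records: List[HistoryLogRecord], last_n: int) -> float:
    """Mean importance over the most recent `last_n` crawls (assumed formula)."""
    recent = sorted(records, key=lambda r: r.timestamp)[-last_n:]
    if not recent:
        return 0.0
    return sum(r.page_importance for r in recent) / len(recent)
```

A URL that returned 410 on its last crawl still has records like these, which is why it can be scheduled for another visit.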
  10. Gone Is Never Gone: “We knew there was content there at some point so we just swing by every now and then to see if anything came back” (John Mueller, 2016)
  11. Generational Cruft… NOT ‘Crufts’
  12. The Generational ‘Snail Trail’, leaving its slithery footprint:
      • Old XML sitemaps
      • Redirects drop away on old site .htaccess
      • DNS issues
      • People link to old site but wrong protocol
      • Old sites not verified in GSC
      • Not all protocols redirecting
  13. The Generational ‘Snail Trail’: all eating away at Googlebot’s attention on your server’s IP. WATCH OUT FOR THE SNAIL TRAIL & GENERATIONAL CRUFT
  14. The Slow Page Evolution of Near Duplicates: In a study over 11 weeks, Dennis Fetterly and Marc Najork found that near-duplicate pages rarely change and that they are still near-duplicates of each other 10 weeks later. Therefore, once identified, their download priority may be reduced so that resources may be used more efficiently and productively elsewhere (Fetterly & Najork, 2003).
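To make "near-duplicate" concrete, here is a minimal sketch that compares two pages by the overlap of their word shingles (Jaccard similarity). This is a common textbook approach and not necessarily the method used in the study; the shingle size `k` and the `threshold` are illustrative assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in `text` (lower-cased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the shingle sets of two texts (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.8):
    """Treat two texts as near-duplicates above an assumed threshold."""
    return jaccard(a, b) >= threshold
```

Product listings that differ only in sort order or a tracking banner would tend to score high on a measure like this.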
  15. ‘Transitive’??: Transitive means that if A == B and B == C, then A == C. THEORY ALERT! Maybe this applies to some types of content more than others, e.g. ecommerce/directories but not news.
  16. DUSTBUSTER & DUST CRAWLING RULES: DO NOT CRAWL IN THE DUST. BUILDS ‘HINTS’ ON WHAT NOT TO CRAWL
  17. ‘Sampling’ in Crawling for Efficiency: ‘SMALL TEST VISITS TO A SITE TO UNDERSTAND WHETHER IT IS WORTH CRAWLING’
  18. Every Site Will Have Its Own Crawling Rules: DUSTBUSTER CRAWLING RULES (UNSURE AS TO WHETHER THIS IS BEING USED)
  19. Popular CMS ‘Rule Patterns’ (URL Parameters): ALL WILL HAVE COMMON CANONICALIZATION PATTERNS WHICH CAN BE LEARNED
  20. CRAWLING RULES BUILT OVER TIME: Crawl frequency patterns. No two sites will have the same crawl schedules or rules built. Moving from one CMS to another may mean that different parameters are created. New parameters = new rules.
  21. Every Version of Your Past Ecommerce Sites had potential to spew… at some point… “Exponentially multiplicative URLs”
  22. URLs Take Their Place in Crawling Queues: The Queue Gets Long & Congested
  23. YOU INHERITED SEO TECHNICAL DEBT
      • Previous content / link manual actions
      • Previous algorithmic suppressions
      • Past infinite loops
      • “We’ll SEO it after launch”
      • “SEO is dead… so we won’t optimise”
      • Dodgy URL parameters
      • Misconfigured URL parameters
      • Old URL crawling ‘rules / hints’
  24. TECHNICAL DEBT
  25. IT WASN’T ME: PASSING THE BUCK
  26. … with Interest: AT SOME POINT IT MUST BE REPAID
  27. GENERATIONAL CRUFT: EVERY SINGLE TIME YOU MIGRATE, CHANGE DESIGN, REDIRECT, REINVENT A SITE / URL. Diagram labels: A CLEAN START; REDIRECTIONS; ANOTHER STRUCTURE; FIRST SITE STRUCTURE; NEW CRAWLING ‘RULES’ BUILT; CRAWLING ‘RULES’ BUILT; EVERYTHING IS ‘200 OK’; MORE URLs; MIXED RESPONSE CODES; REDIRECTIONS; ‘FUZZINESS’ IS EMERGING; NEW CRAWLING ‘RULES’ BUILT; MORE URLs; REDIRECT CHAINS & MIXED RESPONSE CODES; NEW SEOs DON’T KNOW THE ‘HISTORY’; TARGET URLs NOW ‘VERY FUZZY’
  28. Aged ‘Patchwork Quilt’ Sites: A LITTLE BIT OF THIS CMS AND A LITTLE BIT OF THAT CMS. MANY HISTORICAL PARAMETERS CREATED
  29. ‘Fuzzy’ URL Targets with Each Site Generation: EVERYTHING GETS A BIT BLURRED. ‘Which is the target URL again?’
  30. Time Seems To Fly… The Older You Get: Your new site URL is just one of very many historical URLs on your IP to be visited periodically. A tiny fish in a very big URL pond queue.
  31. The Great 302s Pass PageRank Debate
  32. SOLUTION - THE BELOVED CANONICAL, in ‘ALL’ its forms:
      • 30X redirects
      • Canonical tag
      • Hreflang
      • HTTPS protocol
      • Global canonicalization rules
  33. Chocolate Boxes Research
  34. Advanced Technical SEO? 50% of SEOs surveyed considered that “CANONICALIZATION IS ADVANCED TECHNICAL SEO”
  35. Oh Yeah, Canonicalization is Easy: 76% of SEOs surveyed considered that “CANONICALIZATION IS AN EASY CONCEPT TO UNDERSTAND”
  36. REL NEXT REL PREV is NOT Canonicalization: 47% of SEOs categorizing themselves as ‘TECHNICAL SEOs’ considered that “REL=NEXT / REL=PREV” IS A FORM OF CANONICALIZATION
  37. On Redirections as Canonicalization Forms: Lots were unsure about “301s and 302s are BOTH forms of canonicalization”
  38. On Hreflang as Canonicalization: Only 64% of ‘Technical SEOs’ thought hreflang was a form of canonicalization. IT IS.
  39. URL Parameter Handling is Your Friend: Help Google build ‘crawling rules’ for your site rather than wasting time on ‘sampling’ and giving a bad impression. GIVE HELP AND GUIDANCE WITH THE CRAWL RULE AND HINT BUILDING
  40. SOLUTION - Understand URL Parameters
      • ACTIVE PARAMETERS == CHANGE THE CONTENT ON YOUR PAGE (e.g. sort, filter, translate, paginate, specify)
      • PASSIVE PARAMETERS == DO NOT CHANGE THE CONTENT ON YOUR PAGE (often used for tracking) (ALIAS: REPRESENTATIVE)
  41. ACTIVE Parameters (CHANGE CONTENT)
      • SORT == Sorts dynamic items and reorders in descending / ascending price / popularity / added
      • NARROWING == Filters dynamically added items down to include only features & attributes in a chosen consideration set
      • SPECIFYING == Identifies a particular dynamically variable populated content set within a site section (e.g. store=women)
      • TRANSLATING == Indicates a language driven translation URL (e.g. lang=fr)
      • PAGINATING == Indicates a paginated display of long content (e.g. page=2)
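Since passive parameters do not change the page content, URLs that differ only in passive parameters can be collapsed to one representative URL. The sketch below shows one way to do that; the names in `PASSIVE_PARAMS` are assumptions for illustration, and a real site would need its own audited list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of passive (tracking-only) parameters for this example.
PASSIVE_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def strip_passive_params(url):
    """Drop passive parameters and sort the active ones into a stable order."""
    parts = urlsplit(url)
    active = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in PASSIVE_PARAMS
    ]
    active.sort()
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(active), ""))
```

The output of a function like this is a candidate for the canonical tag target, so that tracking variants all point at one URL.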
  42. Understand How URLs with Multiple Parameters Are Handled: The most restrictive parameter setting overrules less restrictive ones.
  43. Examples of Multiple Parameter Handling: KNOW THE RULES
      • http://www.example.com?shopping-category=DVD-movies&sort-by=production-year&sort-order=asc WILL BE CRAWLED
      • http://www.example.com?shopping-category=shoes&sort-by=size&sort-order=asc WILL NOT BE CRAWLED (the production-year restriction on sort-by blocks it)
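The "most restrictive wins" rule can be sketched as a check that every parameter in a URL passes its own rule. The `RULES` table below is an assumed configuration chosen to reproduce the two example URLs on the slide; the rule names ("every", "only", "none") are illustrative and not an actual Google setting format.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed per-parameter rules: (mode, allowed value).
RULES = {
    "shopping-category": ("every", None),         # crawl every value
    "sort-by": ("only", "production-year"),       # crawl only this value
    "sort-order": ("only", "asc"),                # crawl only this value
}


def will_be_crawled(url, rules=RULES):
    """A URL is crawled only if every one of its parameters allows it."""
    query = urlsplit(url).query
    for name, value in parse_qsl(query, keep_blank_values=True):
        mode, allowed = rules.get(name, ("every", None))
        if mode == "none":
            return False
        if mode == "only" and value != allowed:
            return False
    return True
```

With these rules, the `sort-by=size` URL is rejected even though its `shopping-category` and `sort-order` values are allowed.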
  44. Help Googlebot Get Round its Shopping List
      • OPEN MORE CHECKOUTS
      • WIDEN THE AISLES
      • MAKE THINGS EASY TO FIND
      • DON’T CONFUSE GOOGLEBOT
      • HELP FILL THE TROLLEY QUICKLY
      • SPEED, SPEED, SPEED
  45. SOLUTION - XML Sitemaps Are Your Friend… (Strong Foundations): ‘The foundations’ underneath a site. They help to pass ‘importance’ signals within a site. But… never leave them to just autogenerate without periodically checking.
  46. Validate & Retain in GSC ALL Past Domains & Past Site Versions (Protocols: HTTPS / HTTP): THERE MAY STILL BE UNDETECTED ACTIVITY GOING ON THERE
  47. Server Log File Analysis is Your Friend… You’ll be surprised by what you find. Find out what Googlebot is visiting and when (how often) and whether it should be visiting it at all.
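A first pass at log file analysis can be as simple as counting Googlebot requests per URL and response code. This sketch assumes the common Apache/Nginx "combined" log format and matches on the user-agent string only; user agents can be spoofed, so a real audit would also verify the requesting IPs.

```python
import re
from collections import Counter

# Matches the request, status and final quoted field (user agent)
# of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_hits(lines):
    """Count hits per (path, status) for requests claiming to be Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), int(match.group("status")))] += 1
    return hits
```

Sorting the result with `hits.most_common()` shows where crawl attention is going, including any long-retired URLs that are still being requested.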
  48. SOLUTION - Save & Grow The URLs: Not EVERYTHING is worthy of its own URL
      • VARIANTS
      • STEMMINGS
      • PLURALS
      • RANDOM TAGS
      • LONG, LONG, LONG TAIL PARAMETERS
  49. SOLUTION - Save & Grow The URLs: A URL is like a fine wine, maturing over time
  50. Pass Strong Clues - Highly Relevant New Structure: STRONG SEMANTICS
  51. Wiki Page Redirects: Wikipedia redirects (https://dbpedia.org/sparql), thesaurus.com, OR A GOOD OLD FASHIONED THESAURUS
  52. Increase ‘Importance’ quickly of target URLs
      • Internal link optimization
      • Canonicalise to (if relevant)
      • Strengthen up importance signals
      • Inclusion in front facing and XML sitemaps
      • Improve the content & keep it updated
      • 301 redirect to (if relevant redundant content)
  53. Reduce ‘Importance’ quickly of old URLs
      • Internal link unoptimization
      • 410
      • Dig out URLs with links to them
      • Orphan URLs
      • Canonicals to HTTPS
      • Exclusion from XML sitemaps (even old ones on the server)
      • Strip out content
  54. 304 IF MODIFIED HEADERS: ONLY DOWNLOAD IF THE CONTENT CHECKSUM HAS CHANGED
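As a rough illustration of the idea, the sketch below answers a conditional request with 304 and an empty body when the content checksum is unchanged. It uses an `ETag` with `If-None-Match`, which is an assumption made here because the slide talks about checksums; `If-Modified-Since` with `Last-Modified` is the date-based equivalent. The `respond` function is a hypothetical handler, not part of any framework.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from a checksum of the response body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, body) for a possibly conditional request."""
    etag = make_etag(body)
    if if_none_match == etag:
        # Content unchanged since the client's copy: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A crawler that stored the `ETag` from its previous visit sends it back in `If-None-Match` and only downloads the page again when the checksum differs.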
  55. CHOP BACK CHAINS
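Chopping back chains means pointing every old URL straight at its final destination instead of hopping through earlier generations of redirects. A minimal sketch, assuming the redirects have already been exported as a simple source-to-target mapping (the function name and input shape are illustrative):

```python
def flatten_redirects(redirects):
    """Map each source URL directly to the final target of its chain.

    `redirects` is a dict of {source: target}. Raises ValueError on a loop.
    """
    flattened = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until the target is no longer redirected itself.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened
```

For example, `{"/a": "/b", "/b": "/c"}` becomes `{"/a": "/c", "/b": "/c"}`, so each old URL needs only one hop.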
  56. SOLUTION - Be Careful About Creating New Dynamic Parameters: QUEUEING… AGAIN. Waiting for good URLs to be visited… AGAIN
  57. REVISIT PAST .HTACCESS FILES: Can you rewrite the rules to be more efficient or cut out some old rules still firing unnecessarily?
  58. SOLUTION - NEVER TRY TO ‘OUTRUN’ GOOGLEBOT
  59. You Have a Shiny New Site… So What? You may still have… GENERATIONAL CRUFT & TECHNICAL DEBT TO PAY OFF. GONE IS NEVER GONE
  60. REMEMBER: “Gone is Never Gone”. “Google Has A Big Memory”. THANK YOU. Dawn Anderson @dawnieando
  61. Sources & References
      • https://patentimages.storage.googleapis.com/US8042112B1/US08042112-20111018-D00000.png
      • Randall, K.H., Google Inc., 2010. Scheduler for search engine crawler. U.S. Patent 7,725,452.
