@dawnieando from @MoveItMarketing
Dawn Anderson @dawnieando
CRUFT
The Great 302s Pass PageRank Debate
GENERATIONAL CRUFT
MULTIPLE GENERATIONS OF A WEBSITE
NOT ‘Crufts’ – THE WORLD’S LARGEST DOG SHOW
ERIC
CONTENT CRUFT
https://moz.com/blog/clean-site-cruft-before-it-causes-ranking-problems-whiteboard-friday
THIS TYPE OF CRUFT IS NOT THE SAME AS CONTENT CRUFT
SOFTWARE CRUFT
‘URL CRUFT’ IS A THING
“characters relevant or meaningful only to the people who created the site, such as implementation details of the computer system which serves the page. Examples of URL cruft include filename extensions such as .php or .html, and internal organizational details such as /public/ or /Users/john/work/drafts/.” (Wikipedia Definition)
ALL THE RANDOM CRAP PEOPLE ADD TO QUERY STRINGS, PARAMETERS, DIRECTORY FOLDERS AND URL STRUCTURES
CODE & URL CRUFT MAKES CRAWLING SLUGGISH
“COOL URIs DON’T CHANGE”
Sir Tim Berners-Lee (Inventor of the World Wide Web)
https://www.w3.org/Provider/Style/URI
Attribution: By Uldis Bojārs (Flickr) [CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons
A Clean Slate
LET’S START WITH A CLEAN SLATE
Websites (AND URLs) are not disposable
SEARCH ENGINES NEVER FORGET
Search engines have a long memory and a lot of storage
404 NOT FOUND & 410 GONE
§ “Of course, we won’t redirect everything…”
§ “Not everything will be worth redirecting”
410 Gone
§ “Some, we’ll just kill off with a 410…”
§ “Then the URLs will be gone”
https://twitter.com/JohnMu/status/903904602617204738
302 == Default; 301 == Intentional
404 == Default; 410 == Intentional
“The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.” (RFC 7231)
https://tools.ietf.org/html/rfc7231#section-6.5.9
ARE YOU SURE?
MAYBE YES
https://www.youtube.com/watch?v=xp5Nf8ANfOw
THE DIFFERENCE BETWEEN HOW GOOGLE TREATS 404s VERSUS 410s
DO NOT THINK 410s WON’T BE RECRAWLED AGAIN
Source: https://www.docsplace.org/4578/09/410-gone-stops-crawling-dead-urls/
“We knew there was content there at some point so we just swing by every now and then to see if anything came back” (John Mueller, 2016)
In Reality… Gone Is Never Gone
ZOMBIES ARE NEVER GONE
NO URLS ARE EVER GONE. ONLY THE RESOURCE THERE IS GONE
https://www.seroundtable.com/google-410-indexing-22584.html
5 YEARS LATER
HOW ABOUT 14 YEARS LATER?
https://www.webmasterworld.com/google/4864613.htm
2 HOURS ALIVE… 14 YEARS LATER
YOU END UP WITH A CONGA LINE OF LEGACY URLS, SUBDOMAINS & VARIOUS SITE PROTOCOLS
“Forever,
And ever,
And ever,
And ever…
You’ll be a
URL”
GOOGLEBOT GETS WHERE WATER COULDN’T
https://petermeadit.com/blog/block-web-crawlers/
EVEN YOUR STAGING & DEV SITES
Found with a very simple wildcard * site: query
THE CHALLENGE IS NOT IN INDEXING… BUT IN KEEPING EVERYTHING INDEXED UP TO DATE
INCREMENTAL CRAWLING NEVER ENDS
“Crawling method based on crawl frequency based on URL historical change & importance rate”
Crawling Which Never Ends: Ongoing
The Crawling ‘Frontier’ (THE URL QUEUE)
‘TO BE EXPLORED’ (OR REVISITED)
URLs Take Their Place in The Frontier Queue (New & Revisit)
The Queue Gets Long & Congested
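A minimal sketch of a frontier queue, to make the idea concrete. It assumes each URL carries an importance score and a historical change rate between 0 and 1; the scoring formula and all names here are hypothetical illustrations, not how any search engine actually ranks its queue.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class FrontierEntry:
    # heapq pops the smallest item first, so the score is stored negated.
    sort_key: float
    url: str = field(compare=False)


def revisit_priority(importance: float, change_rate: float) -> float:
    """Hypothetical score: important, frequently changing URLs come first."""
    return importance * change_rate


def build_frontier(urls: dict[str, tuple[float, float]]) -> list[FrontierEntry]:
    """urls maps URL -> (importance, historical change rate), both in [0, 1]."""
    frontier: list[FrontierEntry] = []
    for url, (importance, change_rate) in urls.items():
        score = revisit_priority(importance, change_rate)
        heapq.heappush(frontier, FrontierEntry(-score, url))
    return frontier


def crawl_order(frontier: list[FrontierEntry]) -> list[str]:
    """Drain the queue from highest to lowest priority."""
    ordered = []
    while frontier:
        ordered.append(heapq.heappop(frontier).url)
    return ordered
```

Every crufty URL still takes a slot in this queue, even when its score puts it at the back.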
EVEN THE RANDOM CRAP
PAST DATA ON CHANGE IS A GREAT PREDICTOR OF FUTURE DATA
PREDICTION BASED PRIORITY SCHEDULING
… WHEN THERE IS CONSISTENCY
“past changes to a page are a good predictor of future changes. This result has practical implications for incremental web crawlers that seek to maximize the freshness of a web page collection or index.”
BASED ON ROLLING AVERAGES OF PAST CRAWL VISITS
IMPORTANCE TIERING FOR SCALE (EFFICIENCY)
A NEW URL HAS NO
BUT YOUR OLD ONES HAVE LOTS
Stored in Search Engine History Logs
TO BUILD PROBABILITY & PREDICTABILITY MODELS
History Log Records Include:
• URL fingerprint
• Timestamp (last crawl or download attempt)
• Crawl status (success or error) (Response code)
• Content checksum (binary code)
• Source ID (accessed from cache or downloaded)
• Segment identifier (Crawl segment assigned to??)
• Page importance (a measure of importance assigned to the URL)
“The URL page importance score can be retrieved from the … URL history log … or it can be obtained by obtaining the historical page importance score for the URL for a predefined number of prior crawls and then performing a predefined filtering function on those values to obtain the URL page importance score.”
Scheduler for Search Engine Crawler
https://www.google.com/patents/US8042112
IMPORTANCE RECORD PER CRAWL
DOC ID    CRAWL 1  CRAWL 2  CRAWL 3  CRAWL 4  CRAWL 5  CRAWL 6
DOC ID 1  1        0.8      0.6      0.4      0.2      0
DOC ID 2  0        0.2      0.4      0.6      0.8      1
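The patent quote mentions a "predefined filtering function" over prior crawls without saying what it is. As an illustration only, the sketch below assumes a plain rolling average over the last three crawls and applies it to the two example documents from the table; the function name and window size are hypothetical.

```python
def smoothed_importance(history: list[float], window: int = 3) -> float:
    """Average the most recent `window` importance scores (oldest first).

    Assumption: a simple rolling mean stands in for the unspecified
    'predefined filtering function'.
    """
    if not history:
        return 0.0  # a brand-new URL has no history to draw on
    recent = history[-window:]
    return sum(recent) / len(recent)


doc_1 = [1, 0.8, 0.6, 0.4, 0.2, 0]  # importance falling crawl after crawl
doc_2 = [0, 0.2, 0.4, 0.6, 0.8, 1]  # importance rising crawl after crawl

for name, history in (("DOC ID 1", doc_1), ("DOC ID 2", doc_2)):
    print(name, round(smoothed_importance(history), 2))
```

Under this assumption DOC ID 1 lands at roughly 0.2 and DOC ID 2 at roughly 0.8, so the rising document would be scheduled ahead of the one that once scored 1.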
URL_SEEN TEST
YOU CAN’T JUST KEEP TRYING TO JUMP THE INDEXING QUEUE EITHER
PUSH INDEXING: E.G. FETCH AS GOOGLEBOT & SUBMIT TO INDEX
PULL INDEXING: VISITS BY NATURAL CRAWLING & DISCOVERY OF URLS / URL VISIT SCHEDULING / REVISITS
‘Sampling’ in Crawling for Efficiency
‘SMALL TEST VISITS TO A SITE TO UNDERSTAND WHETHER IT IS WORTH CRAWLING & UNDERSTAND URL PATTERNS & RESOURCES THERE’
Popular CMS ‘Rule Patterns’ (URL Parameters)
ALL WILL HAVE COMMON CANONICALIZATION PATTERNS WHICH CAN BE LEARNED
DUSTBUSTER & DUST CRAWLING RULES
DO NOT CRAWL IN THE DUST
BUILDS ‘HINTS’ ON WHAT NOT TO CRAWL
EVERY SITE WILL HAVE ITS OWN CRAWLING RULES
Aged ‘Patchwork Quilt’ Sites
A LITTLE BIT OF THIS CMS AND A LITTLE BIT OF THAT CMS
MANY HISTORICAL PARAMETERS CREATED & CRAWLING SAMPLE PATTERNS
Every Version of Your Past Ecommerce Sites
“Exponentially multiplicative URLs”
Had potential to spew… at some point…
DIFFERENT PARAMETERS & URL PATTERNS WHICH ARE LEARNED BY CRAWLERS… AND REMEMBERED… FOREVER
‘Transitive’??
Transitive: if A == B and B == C then A == C
For some types of content more than others – e.g. ecommerce/directories but not news
SAMPLING
EFFICIENCY
IS NOT JUST ABOUT URL SCHEDULING. IT IS ABOUT NEAR MEMORY STORAGE (e.g. CACHING) TOO
REUSING
PRE-EMPTING (PARTICULARLY POPULAR DOCUMENTS / QUERIES) & REUSING WHAT WAS ALREADY IN NEARBY (MEMORY V DISC) STORAGE
REUSE: LOW IMPORTANCE and / or DOESN’T CHANGE OFTEN
REUSE IF NOT MODIFIED SINCE: LIKELY TO CHANGE BY X DATE (SINCE DATE)
DOWNLOAD: CHANGES FREQUENTLY WITH IMPORTANT CHANGE OR IS AN IMPORTANT DOCUMENT
REUSE IF NOT MODIFIED SINCE
https://www.google.com/patents/US8042112
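The three reuse types can be read as a small decision rule. The sketch below is one hedged interpretation of the slide: the thresholds (0.7 and 0.5) and the function name are assumptions made for illustration and do not come from the patent.

```python
from datetime import date
from enum import Enum
from typing import Optional


class ReuseType(Enum):
    REUSE = "REUSE"
    REUSE_IF_NOT_MODIFIED_SINCE = "REUSE IF NOT MODIFIED SINCE"
    DOWNLOAD = "DOWNLOAD"


def choose_reuse_type(
    importance: float,
    change_rate: float,
    expected_change: Optional[date] = None,
) -> ReuseType:
    """Pick a reuse type for a URL from its importance and change history."""
    # Illustrative thresholds only: important or fast-changing pages get fetched.
    if importance >= 0.7 or change_rate >= 0.5:
        return ReuseType.DOWNLOAD
    # A page expected to change by a known date is reused unless modified since.
    if expected_change is not None:
        return ReuseType.REUSE_IF_NOT_MODIFIED_SINCE
    # Low importance and rarely changing: serve the stored copy.
    return ReuseType.REUSE
```

For example, `choose_reuse_type(0.2, 0.1, date(2004, 2, 5))` falls into the middle category, while the same page without an expected change date is simply reused.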
CRAWL SAMPLES ALSO HELP WITH MODELLING TO MAP DOCS TO TOPIC RELEVANCE
YOU BROKE YOUR SILO STRUCTURE
Image credit: https://www.slideshare.net/patrickstox/nlp-sitemap-smx-2016-patrick-stox-latest-in-advanced-technical-seo
SEMANTIC LOSS
‘CONCEPT DRIFT’ IS A THING
fuzzy: difficult to perceive; indistinct or vague.
synonyms: blurry, blurred, indistinct, unclear, bleary, misty, distorted, out of focus, unfocused, lacking definition, low resolution, nebulous, ill-defined, indefinite, vague, hazy, imprecise, inexact, loose, woolly
"a fuzzy picture"
https://en.wikipedia.org/wiki/Concept_drift
AI ALERT
BOOLEAN LOGIC – EXTREME CASES OF TRUTH (TRUE (1) OR FALSE (0))
‘FUZZY LOGIC’ – DEGREES OF TRUTH
SEMANTIC LOSS
BIG TOPICAL URL FISH IN A SMALL TOPICAL POND
SMALL TOPICAL URL FISH IN A BIG TOPICAL POND
SEMANTIC LOSS
‘Fuzzy’ URL Targets with Each Site Generation
EVERYTHING GETS A BIT BLURRED
‘Which is the target URL again?’
GENERATIONAL CRUFT CAN SNOWBALL
• Past infinite loops
• Dodgy URL parameters
• Misconfigured URL parameters
• Old URL crawling ‘rules / hints’
• Old ‘importance / quality’ scores
• Filtered dupes & near-dupes
• Mixed messaging canonicals
• 410s still being revisited
• Internal links to old sites / protocols
WRONG URL RANKING
‘SWAPPING OUT’ (Especially multiple child nodes)
SHARP & VOLATILE RANKING FLUX
SOME SYMPTOMS
A LOT OF WRONG TARGETS RANKING POST MIGRATION
SOME SYMPTOMS
MIXED CONTENT & MULTIPLE SITE VERSIONS
http://www.itv.com/news/
BOTH HTTP & HTTPS FIGHTING EACH OTHER
PEOPLE CHURN
INTERNAL TEAM CHURN
EXTERNAL AGENCY CHURN
FIND SITES ON THE SAME SERVER
DIAGNOSE: Validate & Retain in GSC ALL Past Domains & Past Site Versions (Protocols: HTTPS / HTTP)
THERE MAY STILL BE UNDETECTED ACTIVITY GOING ON THERE
URL Parameter Handling is Your Friend
Help Google Build ‘Crawling Rules’ for your site rather than wasting time on ‘sampling’ and giving a bad impression
GIVE HELP AND GUIDANCE WITH THE CRAWL RULE AND HINT BUILDING
BE VERY CAREFUL
PEOPLE CANONICALIZE WRONG
ON MULTIPLE GENERATIONS OF SITES
47% of TECHNICAL SEOs thought: “REL=NEXT / REL=PREV” IS A FORM OF CANONICALIZATION
Lots OF SEOs were unaware that: “301s and 302s are BOTH forms of canonicalization”
Only 64% of ‘Technical SEOs’ realised hreflang is a form of Canonicalization (Internationalization)
REVIEW & UNDERSTAND - THE CANONICAL LINK RELATION
§ 30X redirects
§ Canonical tag
§ hreflang
§ HTTPS protocol
§ Global canonicalization rules
§ URL normalization
In ‘ALL’ its forms
PEOPLE APPEND (ADD TO FILES) - SOMETIMES IT’S FEAR OF DEPENDENCIES
YOU NEED TO KNOW WHAT’S ON THAT SERVER
DIAGNOSE: HEAD BACK TO THE SERVER
DIAGNOSE: SERVER LOG FILE ANALYSIS
BUT WATCH OUT FOR OTHER TOOLS EMULATING GOOGLEBOT AND FILTER THEM OUT
ANALYSE THE LOGS FOR ‘ALL’ YOUR SITES AND ‘ALL’ PROTOCOLS TO SEE THE PATTERNS EMERGE
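A rough sketch of that log analysis step. It assumes the Apache/Nginx "combined" log format and filters out impostors with a reverse DNS lookup followed by a forward confirmation; the regex, function names, and the decision to count by (path, status) are illustrative assumptions.

```python
import re
import socket
from collections import Counter

# Assumes the "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def is_google_hostname(hostname: str) -> bool:
    """True when a reverse-DNS hostname sits under a Google crawler domain."""
    return hostname.endswith((".googlebot.com", ".google.com"))


def is_verified_googlebot(ip: str) -> bool:
    """Reverse lookup, domain check, then forward-confirm. Needs network access."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
        return is_google_hostname(hostname) and ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False


def googlebot_status_counts(lines, verify=is_verified_googlebot) -> Counter:
    """Count (path, status) pairs for hits that claim to be Googlebot and verify."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "Googlebot" in m["agent"] and verify(m["ip"]):
            counts[(m["path"], m["status"])] += 1
    return counts
```

Running this per legacy domain and per protocol makes it easier to spot old URLs that are still being revisited. The `verify` argument can be swapped for a cached lookup when the log is large.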
When analysing logs you’re often viewing URLs from ‘A LONNNNGGGG Time Ago’
LOOKING AT LEGACY
REVISIT ALL PAST .HTACCESS FILES
Can you rewrite the rules to be more efficient with regex or cut out some old rules still firing unnecessarily? (CREATE SHORTCUTS)
REMEMBER .HTACCESS RULES RUN IN ORDER OF THEIR APPEARANCE IN THE FILE. CAN YOU USE WILDCARDS TO OPTIMIZE OR SKIP STEPS?
.HTACCESS SITE 1
.HTACCESS SITE 2
.HTACCESS SITE 3
CHOP BACK REDIRECT CHAINS
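One way to chop chains back is to flatten the redirect map so every legacy URL points straight at its final destination. The sketch below assumes the redirects have already been exported into a simple source-to-target dictionary; the function name is hypothetical.

```python
def collapse_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Point every source URL straight at its final destination."""
    collapsed = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a URL that no longer redirects.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        collapsed[source] = target
    return collapsed
```

For a chain such as `/a` to `/b` to `/c` to `/d`, all three sources end up mapped directly to `/d`, which is one hop for the crawler instead of three.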
Help Googlebot Get Round its Shopping List
OPEN MORE CHECKOUTS
WIDEN THE AISLES
MAKE THINGS EASY TO FIND
DON’T CONFUSE GOOGLEBOT
HELP FILL THE TROLLEY QUICKLY
SPEED, SPEED, SPEED
XML Sitemaps Are Your Friend… (Strong Foundations)
They help to pass ‘importance’ signals to URLs
But… never leave them to just autogenerate without periodically checking
‘The foundations’ underneath a site
EXTERNALLY HOSTED XML SITEMAPS
• Take back control
• Jump the dev queue
• Allows for custom configuration of optimal canonical click paths
• Allows for consistent signals of importance to included URLs
• Forget about setting priority
• Forget about last modified
• Even a simple list of URLs FTW will do
• Keep them organised for granular analysis of problem site sections
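As a sketch of the "simple list of URLs" approach, the snippet below builds a bare sitemap with only `<loc>` entries, leaving out priority and last modified as the slide suggests. The function name and the example URLs in the usage note are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal XML sitemap containing one <loc> per unique URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # drop duplicates, keep the given order
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

Calling `build_sitemap(["https://www.example.com/", "https://www.example.com/category/"])` gives one file per site section if the lists are kept separate, which supports the granular analysis mentioned above.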
INSTEAD OF REMOVE… CONSIDER… DISTRACT & ITERATIVELY IMPROVE
STRATEGIC USE OF INTERNAL LINK POPULARITY
REDUCE IMPORTANCE SIGNALS TO DIFFERENT PAGES
INCLUDE IMPORTANT PAGES IN XML SITEMAPS
INCLUDE IMPORTANT PAGES IN HTML SITEMAPS
BUILD WELL CATEGORIZED AND CONCEPTUALLY STRUCTURED SITEMAPS
https://www.slideshare.net/patrickstox/nlp-sitemap-smx-2016-patrick-stox-latest-in-advanced-technical-seo
SOLUTION: Quickly Increase ‘Importance’ of target URLs
• Internal link optimization
• Canonicalise to (if relevant)
• Strengthen up importance signals
• Inclusion in front facing HTML and XML sitemaps
• Improve the content & keep it updated
• 301 redirect to (if relevant redundant content)
• Topical hubs and strong information views to navigate users & add relevance
SOLUTION: Quickly Reduce ‘Importance’ of old URLs
• Internal link UNOPTIMIZATION
• 410
• Dig out URLs with links to them
• Orphan URLs
• Canonicals to HTTPS
• EXCLUSION from XML sitemaps (even old ones on the server)
• Archiving of content
CONTENT CRUFT
https://moz.com/blog/clean-site-cruft-before-it-causes-ranking-problems-whiteboard-friday
IT’S VERY IMPORTANT… YOU STAY OUT OF SERVER ERROR STATUS
500
‘Try again’ intervals likely extended between each failed connection attempt
Consistency is
REMEMBER ‘ROLLING AVERAGES’
APPENDIX
410s Likely Get Deindexed Quicker
https://plus.google.com/+JohnMueller/posts/NEsqE7Sr4Z4
“Usually seeing it (410) 1-2 times is enough for us to drop those URLs from the index” John M on Google+ (https://plus.google.com/u/0/+JohnMueller/posts/NEsqE7Sr4Z4)
LEGACY ISSUES VIA CANONICALS OR REDIRECTION (COMMON MISTAKES)
• PAGE CANONICALIZED TO IS NOT A SUPERSET OR DUPLICATIVE (IT IS NOT RELEVANT ENOUGH)
• 301s TO IRRELEVANT PAGES BECOME SOFT 404s
• FOLDING UP PRODUCT PAGES TO CATEGORIES (PEOPLE WERE LOOKING FOR A SPECIFIC PRODUCT)
• CANONICALIZATION TO PAGES WHICH IN THE FUTURE 301 REDIRECT TO ANOTHER URL, THEREFORE NEGATING THE PAGES CANONICALIZING TO THEM
• CONFLICTS BETWEEN HREFLANG AND CANONICALIZATION
MORE CAUSES
SEARCH ENGINES ARE CRAWLING MORE CODE THAN YOU MIGHT HAVE INTENDED IN THE FIRST PLACE
JAVASCRIPT ERRORS FROM LEGACY CODE & LIBRARIES
LEGACY 302s FROM REDIRECTED LEGACY DOMAINS WHICH CONFUSE INTERMEDIATE SIGNALS BETWEEN 301s (WHICH ARE INTENDED DEFINITE REDIRECTIONS)
ABANDONED URLS
AJAX URLS (NOT THE SAME AS THE NAMED ANCHOR) – DEPRECATION OF AJAX CRAWLING (ASYNCHRONOUS JAVASCRIPT & XML)
“If “change” means “any change”, then about 40% of all web pages change weekly [12]. Even if we consider only pages that change by a third or more, about 7% of all web pages change weekly [17].” (Broder, A.Z., Najork, M. and Wiener, J.L., 2003)
EVEN AS FAR BACK AS 2003
40% of ALL web pages changed weekly
___________________
7% of web pages changed a third of their page content or more weekly
HOW MUCH BIGGER & MORE DYNAMIC IS THE WEB NOW IN 2017?
http://www.internetlivestats.com/total-number-of-websites/
FUZZY LOGIC
• Rule based logic
• Been around for 20+ years
• Is within a subset of AI
THESE THINGS ADD UP
THEY ALSO STILL NEED TO BE DISCOVERED WHICH REQUIRES INITIAL CRAWLING
https://twitter.com/dawnieando/status/906465965029969920
“404 vs 410 doesn't affect the recrawl rate: we'll still occasionally check to see if these pages are still gone, especially when we spot a new link to them”
John Mueller, Google+ 2015
https://plus.google.com/u/0/+JohnMueller/posts/NEsqE7Sr4Z4
ESPECIALLY IF THERE ARE LINKS TO IT
Pass Strong Clues - Highly Relevant New Conceptual Structures
STRONG SEMANTICS & CONCEPTUALLY CO-OCCURRING TERMS
THINK CAREFULLY ABOUT URL CREATION
Not EVERYTHING is worthy of its own URL
VARIANTS
STEMMINGS
PLURALS
RANDOM TAGS
LONG, LONG, LONG TAIL PARAMETERS
ONLY DOWNLOAD IF THERE IS SUBSTANTIVE CHANGE
TAKE SOME CONTROL WITH 304 & EXPIRES AFTER HEADERS ON LESS IMPORTANT PAGES
https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching
VALID REPRESENTATION
THE URL WILL STILL BE VISITED BUT 0 (ZERO) WILL BE DOWNLOADED SO IT IS STRAIGHT ON TO THE NEXT URL VERY QUICKLY
https://webmasters.googleblog.com/2006/09/better-details-about-when-googlebot.html
https://tools.ietf.org/html/rfc7232#section-4.1
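To illustrate the 304 behaviour, here is a small server-side sketch that decides between 200 and 304 from an `If-Modified-Since` request header. It is framework-agnostic and simplified: the function name is hypothetical, and a real server would also consider `ETag` and `If-None-Match`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 when the crawler's copy is still current, else 200.

    `last_modified` must be timezone-aware (UTC is assumed in this sketch).
    """
    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable header: ignore it and send the full page
        if client_time.tzinfo is None:
            client_time = client_time.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= client_time:
            return 304
    return 200
```

A 304 response carries no body, so the URL is still visited but nothing is downloaded.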
A URI is like a fine wine
Maturing over time
“COOL URIs DON’T CHANGE”
Sir Tim Berners-Lee (Inventor of the World Wide Web)
https://www.w3.org/Provider/Style/URI
A LONG, LONG TIME AGO
• You need to go right back to the beginning
• What domains did the organisation EVER register?
• Where do they redirect to?
• Is it via 301, 302 or are they merely parked domains?
• Who would know? Who is responsible?
• Verify them all in Google Search Console
• Some of these may EVEN HAVE PENALTIES HISTORICALLY
• If there are links to any there is likely still crawling activity there
• Analyse logs across multiple subdomains & protocols
QUESTIONS TO ASK
HOW MANY MICRO-SITES HAVE YOU HAD?
HOW MANY SUBDOMAINS?
HOW MANY OTHER DOMAINS?
WHO IS RESPONSIBLE FOR DOMAIN REGISTRATION?
WHO KNOWS WITHIN THE ORGANISATION?
WHO REGISTERED THE DOMAINS?
WHO CAN UPDATE DNS RECORDS?
ARE THESE SITES STILL ON SERVERS?
HAVE ANY OF THESE SITES HAD MANUAL ACTIONS?
HOW ARE THESE SITES REDIRECTED?
ARE THEY PARKED DOMAINS?
DATA FROM HISTORY LOGS CONTRIBUTE TO WHEN TO REVISIT URIs ON THE WEB
SOLUTION: REVISITING BLOATED APPENDED .HTACCESS FILES ON ALL LEGACY SITES (IF NOT REDIRECTING AT A DNS LEVEL)
NOT JUST THE .HTACCESS FILE ON THE EXISTING SITE EITHER.
GOOGLEBOT MAY HIT .HTACCESS ON PAST SITES SO THEY MAY ALSO NEED OPTIMIZING
.HTACCESS RULES RUN IN ORDER SO PROVIDE OPPORTUNITY FOR SHORT CUTS
SOME TYPES OF URL CRUFT
• INCORRECTLY APPLIED CANONICAL TAGS
• CONFLICTING HREFLANG & CANONICAL TAGS
• MIXED CONTENT
• URL SHORTENERS
• SESSION IDS
• UTM TAGGING
• OLD AJAX FRAGMENTS
• PARAMETERS FROM MULTI FACET DROP DOWN CHOICES
• .html, .php, .index.html, .aspx
• LEGACY URL REWRITING & PARAMETERS IN .HTACCESS FILES
• LEGACY FOLDERS WHICH CONTRIBUTE NO MEANING TO SITE ONTOLOGY
UNCRUFTY
www.myeasyurlwillmakeyouwonder.com/resume
CRUFTY
www.myeasyurlwillmakeyouwonder.com/resume.html
CRUFTY
http://nymag.com/scienceofus/2015/07/how-to-recover-from-an-all-nighter.html?om_rid=AAENcg&om_mid=_BTtFa0B869PyJp&utm_content=buffer8fdd1&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
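When grouping log entries or auditing internal links, it can help to strip tracking cruft so that variants of the same page collapse together. The sketch below is an assumption-laden example: the prefix and parameter lists are illustrative guesses based on the crufty URL above, and the function also drops the fragment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; extend these for the site in question.
TRACKING_PREFIXES = ("utm_", "om_")
TRACKING_NAMES = {"sessionid", "sid", "phpsessid"}


def strip_tracking(url: str) -> str:
    """Remove tracking/session parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
        and name.lower() not in TRACKING_NAMES
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Applied to the long example above, this leaves only the article path ending in `.html`. Parameters that actually change the content, such as a facet or page number, are kept.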
INDEX TIERING
Presented by B Cambazoglu at European Summer School Information Retrieval 2017 – (Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced topics in information retrieval (pp. 27-50). Springer Berlin Heidelberg.)
FIND SITES ON THE SAME SERVER
TWO-PHASE RANKING IN A SEARCH NODE
Presented by B Cambazoglu at European Summer School Information Retrieval 2017 – (Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced topics in information retrieval (pp. 27-50). Springer Berlin Heidelberg.)
FUZZY LOGIC – DEGREES OF TRUTH
0.8 Doc ID likely to be a correct URI to choose from term / query cluster
EVERY SINGLE TIME YOU MIGRATE, CHANGE DESIGN, REDIRECT, REINVENT A SITE / URL
A CLEAN START
REDIRECTIONS
ANOTHER STRUCTURE
FIRST SITE STRUCTURE
NEW CRAWLING ‘RULES’ BUILT
CRAWLING ‘RULES’ BUILT
EVERYTHING IS ‘200 OK’
MORE URLs
MIXED RESPONSE CODES
REDIRECTIONS
‘FUZZINESS’ IS EMERGING
NEW CRAWLING ‘RULES’ BUILT
MORE URLs
REDIRECT CHAINS & MIXED RESPONSE CODES
NEW SEOs DON’T KNOW THE ‘HISTORY’
TARGET URLs NOW ‘VERY FUZZY’
BUT WHEN DATA IS INCONSISTENT FUZZY LOGIC MAY FAIL
‘DEGREES OF TRUTH’ MAY BECOME MORE BLURRED / VAGUE
SOLUTION: XML SITEMAPS
TERM-FREQUENCY INVERSE DOCUMENT FREQUENCY
Cruft can also skew term-frequency inverse document frequency
AND THE QUERY CLUSTERS DOCUMENTS BELONG TO
The Generational ‘Snail Trail’
• Old XML sitemaps
• Redirects drop away on old site .htaccess
• DNS issues
• People link to old site but wrong protocol
• Old sites not verified in GSC
• Not all protocols redirecting
Leaving its slithery footprint
URL NORMALIZATION
Can be problematic and ‘crufty’ too
https://en.wikipedia.org/wiki/URL_normalization
REDUCTION & REPOPULATION OF INTERNAL LINK POPULARITY (IBP) BETWEEN URL SCHEDULING
IT’S NOT ONLY THEIR ‘INTERNAL PAGE RANK’ BUT ALSO THE ANCHORS, INTER-CONNECTING CONCEPTUAL / TOPIC RELEVANCE IN CONTENT AND THE TEXT SURROUNDING INTERNAL LINK ANCHORS (AND PROBABLY OTHER THINGS TOO)
SEMANTIC ‘CLUES’ WERE LOST ALONG THE WAY
SEMANTIC ‘CONTEXT’ & IBP BUCKET IS LEAKING
SOLUTION: Wiki Page Redirects on Topics
https://dbpedia.org/sparql
Wikipedia Redirects
thesaurus.com
OR A GOOD OLD FASHIONED THESAURUS
Understand How URLs with Multiple Parameters Are Handled
The most restrictive parameter blocked overrules lesser restrictions
THE USE OF REUSE TABLES
TABLE I: Reuse Table Example
URL Record No.  URL Fingerprint (FP)  Reuse Type                   If Modified Since  ...
1               2123242               REUSE
2               2323232               REUSE IF NOT MODIFIED SINCE  Feb. 5, 2004
3               3343433               DOWNLOAD
...             ...                   ...                          ...
https://www.google.com/patents/US8042112
REMEMBER
“Gone is Never Gone”
“Search Engines Never Forget”
Dawn Anderson @dawnieando
REFERENCES
Sources & References
• Bar-Yossef, Z., Keidar, I. and Schonfeld, U., 2009. Do not crawl in the dust: different urls with similar text. ACM Transactions on the Web (TWEB), 3(1), p.3.
• Broder, A.Z., Najork, M. and Wiener, J.L., 2003, May. Efficient URL caching for world wide web crawling. In Proceedings of the 12th international conference on World Wide Web (pp. 679-689). ACM.
• Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced topics in information retrieval (pp. 27-50). Springer Berlin Heidelberg.
• Cho, J., Garcia-Molina, H. and Page, L., 1998. Efficient crawling through URL ordering. Computer Networks and ISDN Systems, 30(1), pp.161-172.
• Fetterly, D., Manasse, M., Najork, M. and Wiener, J., 2003, May. A large-scale study of the evolution of web pages. In Proceedings of the 12th international conference on World Wide Web (pp. 669-678). ACM.
@dawnieando from	
  @MoveItMarketing
Sources & References
• Olston,	
  C.	
  and	
  Najork,	
  M.,	
  2010.	
  Web	
  crawling. Foundations	
  and	
  Trends®	
  in	
  
Information	
  Retrieval, 4(3),	
  pp.175-­‐246.
• Pandey,	
  S.	
  and	
  Olston,	
  C.,	
  2008,	
  February.	
  Crawl	
  ordering	
  by	
  search	
  impact.	
  
In Proceedings	
  of	
  the	
  2008	
  International	
  Conference	
  on	
  Web	
  Search	
  and	
  Data	
  
Mining (pp.	
  3-­‐14).	
  ACM.
• Olston,	
  C.	
  and	
  Pandey,	
  S.,	
  2008,	
  April.	
  Recrawl scheduling	
  based	
  on	
  information	
  
longevity.	
  In Proceedings	
  of	
  the	
  17th	
  international	
  conference	
  on	
  World	
  Wide	
  
Web (pp.	
  437-­‐446).	
  ACM
• Pandey,	
  S.	
  and	
  Olston,	
  C.,	
  2005,	
  May.	
  User-­‐centric	
  web	
  crawling.	
  In Proceedings	
  of	
  
the	
  14th	
  international	
  conference	
  on	
  World	
  Wide	
  Web (pp.	
  401-­‐411).	
  ACM.
Sources & References
• https://patentimages.storage.googleapis.com/US8042112B1/US08042112-20111018-D00000.png
• Randall, K.H., Google Inc., 2010. Scheduler for search engine crawler. U.S. Patent 7,725,452.

More Related Content

What's hot

SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016
SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016
SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016
Dawn Anderson MSc DigM
 
How to Improve Your Website's Indexation - Sean Butcher Brighton SEO Present...
How to Improve Your Website's Indexation  - Sean Butcher Brighton SEO Present...How to Improve Your Website's Indexation  - Sean Butcher Brighton SEO Present...
How to Improve Your Website's Indexation - Sean Butcher Brighton SEO Present...
Sean Butcher
 
UK Top 5,000 Websites; Mobile Site Speed Benchmark - BrightonSEO
UK Top 5,000 Websites; Mobile Site Speed Benchmark - BrightonSEOUK Top 5,000 Websites; Mobile Site Speed Benchmark - BrightonSEO
UK Top 5,000 Websites; Mobile Site Speed Benchmark - BrightonSEO
Erudite
 
Debugging rendering problems at scale
Debugging rendering problems at scaleDebugging rendering problems at scale
Debugging rendering problems at scale
Giacomo Zecchini
 
Creating Commerce Reviews and Considering The Case For User Generated Reviews
Creating Commerce Reviews and Considering The Case For User Generated ReviewsCreating Commerce Reviews and Considering The Case For User Generated Reviews
Creating Commerce Reviews and Considering The Case For User Generated Reviews
Dawn Anderson MSc DigM
 
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
Angry Creative (UK)
 
Moving URLs: Structural Web changes 
without losing rankings #SearchLove
Moving URLs: Structural Web changes 
without losing rankings #SearchLoveMoving URLs: Structural Web changes 
without losing rankings #SearchLove
Moving URLs: Structural Web changes 
without losing rankings #SearchLove
Aleyda Solís
 
Mobile-First Preparedness- what we've learned from crawling the top 1 million...
Mobile-First Preparedness- what we've learned from crawling the top 1 million...Mobile-First Preparedness- what we've learned from crawling the top 1 million...
Mobile-First Preparedness- what we've learned from crawling the top 1 million...
Jon Myers
 
Pubcon florida 2018 logs dont lie dawn anderson
Pubcon florida 2018 logs dont lie dawn andersonPubcon florida 2018 logs dont lie dawn anderson
Pubcon florida 2018 logs dont lie dawn anderson
Dawn Anderson MSc DigM
 
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEO
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEORendering SEO Manifesto - Why we need to go beyond JavaScript SEO
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEO
Onely
 
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
Dawn Anderson MSc DigM
 
Digital Olympus Technical SEO Findings Whilst Taming An SEO Beast
Digital Olympus Technical SEO Findings Whilst Taming An SEO BeastDigital Olympus Technical SEO Findings Whilst Taming An SEO Beast
Digital Olympus Technical SEO Findings Whilst Taming An SEO Beast
Dawn Anderson MSc DigM
 
SearchLove San Diego 2018 | Will Critchlow | From the Horse’s Mouth: What We ...
SearchLove San Diego 2018 | Will Critchlow | From the Horse’s Mouth: What We ...SearchLove San Diego 2018 | Will Critchlow | From the Horse’s Mouth: What We ...
SearchLove San Diego 2018 | Will Critchlow | From the Horse’s Mouth: What We ...
Distilled
 
From Web Site to Web App: Fantastic Optimisations and Where To Find Them
From Web Site to Web App: Fantastic Optimisations and Where To Find ThemFrom Web Site to Web App: Fantastic Optimisations and Where To Find Them
From Web Site to Web App: Fantastic Optimisations and Where To Find Them
MobileMoxie
 
How to build simple web apps to automate your SEO tasks - BrightonSEO Spring ...
How to build simple web apps to automate your SEO tasks - BrightonSEO Spring ...How to build simple web apps to automate your SEO tasks - BrightonSEO Spring ...
How to build simple web apps to automate your SEO tasks - BrightonSEO Spring ...
Charly Wargnier
 
Modern SEO Players Guide
Modern SEO Players GuideModern SEO Players Guide
Modern SEO Players Guide
Michael King
 
SEO for Bloggers - WordCamp Seattle 2012
SEO for Bloggers - WordCamp Seattle 2012SEO for Bloggers - WordCamp Seattle 2012
SEO for Bloggers - WordCamp Seattle 2012
Justin Briggs
 
BrightonSEO Structured Data by Alexis Sanders
BrightonSEO Structured Data by Alexis SandersBrightonSEO Structured Data by Alexis Sanders
BrightonSEO Structured Data by Alexis Sanders
Alexis Sanders
 
BrightonSEO - How to use XPath with eCommerce Websites
BrightonSEO - How to use XPath with eCommerce WebsitesBrightonSEO - How to use XPath with eCommerce Websites
BrightonSEO - How to use XPath with eCommerce Websites
Janet Plumpton
 
Browser Changes That Will Impact SEO From 2019-2020
Browser Changes That Will Impact SEO From 2019-2020Browser Changes That Will Impact SEO From 2019-2020
Browser Changes That Will Impact SEO From 2019-2020
Tom Anthony
 

What's hot (20)

SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016
SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016
SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016
 
How to Improve Your Website's Indexation - Sean Butcher Brighton SEO Present...
How to Improve Your Website's Indexation  - Sean Butcher Brighton SEO Present...How to Improve Your Website's Indexation  - Sean Butcher Brighton SEO Present...
How to Improve Your Website's Indexation - Sean Butcher Brighton SEO Present...
 
UK Top 5,000 Websites; Mobile Site Speed Benchmark - BrightonSEO
UK Top 5,000 Websites; Mobile Site Speed Benchmark - BrightonSEOUK Top 5,000 Websites; Mobile Site Speed Benchmark - BrightonSEO
UK Top 5,000 Websites; Mobile Site Speed Benchmark - BrightonSEO
 
Debugging rendering problems at scale
Debugging rendering problems at scaleDebugging rendering problems at scale
Debugging rendering problems at scale
 
Creating Commerce Reviews and Considering The Case For User Generated Reviews
Creating Commerce Reviews and Considering The Case For User Generated ReviewsCreating Commerce Reviews and Considering The Case For User Generated Reviews
Creating Commerce Reviews and Considering The Case For User Generated Reviews
 
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
David Lockie 'Using Open Source to Speed Up your Roadmap' BrightonSEO 2017
 
Moving URLs: Structural Web changes 
without losing rankings #SearchLove
Moving URLs: Structural Web changes 
without losing rankings #SearchLoveMoving URLs: Structural Web changes 
without losing rankings #SearchLove
Moving URLs: Structural Web changes 
without losing rankings #SearchLove
 
Mobile-First Preparedness- what we've learned from crawling the top 1 million...
Mobile-First Preparedness- what we've learned from crawling the top 1 million...Mobile-First Preparedness- what we've learned from crawling the top 1 million...
Mobile-First Preparedness- what we've learned from crawling the top 1 million...
 
Pubcon florida 2018 logs dont lie dawn anderson
Pubcon florida 2018 logs dont lie dawn andersonPubcon florida 2018 logs dont lie dawn anderson
Pubcon florida 2018 logs dont lie dawn anderson
 
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEO
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEORendering SEO Manifesto - Why we need to go beyond JavaScript SEO
Rendering SEO Manifesto - Why we need to go beyond JavaScript SEO
 
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
Technical SEO - Gone is Never Gone - Fixing Generational Cruft and Technical ...
 
Digital Olympus Technical SEO Findings Whilst Taming An SEO Beast
Digital Olympus Technical SEO Findings Whilst Taming An SEO BeastDigital Olympus Technical SEO Findings Whilst Taming An SEO Beast
Digital Olympus Technical SEO Findings Whilst Taming An SEO Beast
 
SearchLove San Diego 2018 | Will Critchlow | From the Horse’s Mouth: What We ...
SearchLove San Diego 2018 | Will Critchlow | From the Horse’s Mouth: What We ...SearchLove San Diego 2018 | Will Critchlow | From the Horse’s Mouth: What We ...
SearchLove San Diego 2018 | Will Critchlow | From the Horse’s Mouth: What We ...
 
From Web Site to Web App: Fantastic Optimisations and Where To Find Them
From Web Site to Web App: Fantastic Optimisations and Where To Find ThemFrom Web Site to Web App: Fantastic Optimisations and Where To Find Them
From Web Site to Web App: Fantastic Optimisations and Where To Find Them
 
How to build simple web apps to automate your SEO tasks - BrightonSEO Spring ...
How to build simple web apps to automate your SEO tasks - BrightonSEO Spring ...How to build simple web apps to automate your SEO tasks - BrightonSEO Spring ...
How to build simple web apps to automate your SEO tasks - BrightonSEO Spring ...
 
Modern SEO Players Guide
Modern SEO Players GuideModern SEO Players Guide
Modern SEO Players Guide
 
SEO for Bloggers - WordCamp Seattle 2012
SEO for Bloggers - WordCamp Seattle 2012SEO for Bloggers - WordCamp Seattle 2012
SEO for Bloggers - WordCamp Seattle 2012
 
BrightonSEO Structured Data by Alexis Sanders
BrightonSEO Structured Data by Alexis SandersBrightonSEO Structured Data by Alexis Sanders
BrightonSEO Structured Data by Alexis Sanders
 
BrightonSEO - How to use XPath with eCommerce Websites
BrightonSEO - How to use XPath with eCommerce WebsitesBrightonSEO - How to use XPath with eCommerce Websites
BrightonSEO - How to use XPath with eCommerce Websites
 
Browser Changes That Will Impact SEO From 2019-2020
Browser Changes That Will Impact SEO From 2019-2020Browser Changes That Will Impact SEO From 2019-2020
Browser Changes That Will Impact SEO From 2019-2020
 

Viewers also liked

BrightonSEO Slides - Blogging advice that'll make your job easier - guaranteed!
BrightonSEO Slides - Blogging advice that'll make your job easier - guaranteed!BrightonSEO Slides - Blogging advice that'll make your job easier - guaranteed!
BrightonSEO Slides - Blogging advice that'll make your job easier - guaranteed!
Sam Charles
 
Product Feed Research: What we learned from indexing 500m SKUs
Product Feed Research: What we learned from indexing 500m SKUsProduct Feed Research: What we learned from indexing 500m SKUs
Product Feed Research: What we learned from indexing 500m SKUs
Ben Morgan
 
Brighton SEO - Getting a competitive advantage on ebay
Brighton SEO - Getting a competitive advantage on ebayBrighton SEO - Getting a competitive advantage on ebay
Brighton SEO - Getting a competitive advantage on ebay
Darren Ratcliffe
 
Quality PR Linkbuilding - With Terrible Budget (BrightonSEO, September 2017)
Quality PR Linkbuilding - With Terrible Budget (BrightonSEO, September 2017)Quality PR Linkbuilding - With Terrible Budget (BrightonSEO, September 2017)
Quality PR Linkbuilding - With Terrible Budget (BrightonSEO, September 2017)
Ben Harrow
 
How Google Tag Manager can save your seo ? - Talk for Brighton SEO 2017
How Google Tag Manager can save your seo ? - Talk for Brighton SEO 2017How Google Tag Manager can save your seo ? - Talk for Brighton SEO 2017
How Google Tag Manager can save your seo ? - Talk for Brighton SEO 2017
Woptimo
 
BrightonSEO 2017 - SEO quick wins from a technical check
BrightonSEO 2017  - SEO quick wins from a technical checkBrightonSEO 2017  - SEO quick wins from a technical check
BrightonSEO 2017 - SEO quick wins from a technical check
Chloe Bodard
 
Brighton SEO 2017: Six Kick Ass Content Strategies - Laura Hampton
Brighton SEO 2017: Six Kick Ass Content Strategies - Laura HamptonBrighton SEO 2017: Six Kick Ass Content Strategies - Laura Hampton
Brighton SEO 2017: Six Kick Ass Content Strategies - Laura Hampton
Laura Hampton
 
Kostas Voudouris - BrightonSEO - Perfromance-based optimisation using Google ...
Kostas Voudouris - BrightonSEO - Perfromance-based optimisation using Google ...Kostas Voudouris - BrightonSEO - Perfromance-based optimisation using Google ...
Kostas Voudouris - BrightonSEO - Perfromance-based optimisation using Google ...
kvonweb
 
BrightonSEO - Influencer Marketing - Allyson Griffiths iCrossing
BrightonSEO - Influencer Marketing - Allyson Griffiths iCrossingBrightonSEO - Influencer Marketing - Allyson Griffiths iCrossing
BrightonSEO - Influencer Marketing - Allyson Griffiths iCrossing
Allyson Griffiths
 
Better conversion with Intelligent Analytics
Better conversion with Intelligent AnalyticsBetter conversion with Intelligent Analytics
Better conversion with Intelligent Analytics
Tim Stewart
 
SPEAK EASY: THE RISE OF VOICE SEARCH (Mindshare Fast - Brighton SEO 2017)
SPEAK EASY: THE RISE OF VOICE SEARCH (Mindshare Fast - Brighton SEO 2017)SPEAK EASY: THE RISE OF VOICE SEARCH (Mindshare Fast - Brighton SEO 2017)
SPEAK EASY: THE RISE OF VOICE SEARCH (Mindshare Fast - Brighton SEO 2017)
Saeley-Ewan Johnson jnr
 
BrightonSEO 2017- Harnessing your Reputation to win New Customers
BrightonSEO 2017- Harnessing your Reputation to win New CustomersBrightonSEO 2017- Harnessing your Reputation to win New Customers
BrightonSEO 2017- Harnessing your Reputation to win New Customers
Myles Anderson
 
Creating more human experiences with chatbots
Creating more human experiences with chatbotsCreating more human experiences with chatbots
Creating more human experiences with chatbots
Jonathan Seal
 
How to Get Top Tier Links With No Budget
How to Get Top Tier Links With No BudgetHow to Get Top Tier Links With No Budget
How to Get Top Tier Links With No Budget
Bobbi Brant
 
The SEO's Guide To JavaScript - Ric Rodriguez, Brighton SEO 2017
The SEO's Guide To JavaScript - Ric Rodriguez, Brighton SEO 2017The SEO's Guide To JavaScript - Ric Rodriguez, Brighton SEO 2017
The SEO's Guide To JavaScript - Ric Rodriguez, Brighton SEO 2017
Ric Rodriguez
 
Shut up and Listen: Social Listening Beyond Your Brand
Shut up and Listen: Social Listening Beyond Your BrandShut up and Listen: Social Listening Beyond Your Brand
Shut up and Listen: Social Listening Beyond Your Brand
Jellyfish Online Marketing
 
Using Natural Language APIs in SEO
Using Natural Language APIs in SEOUsing Natural Language APIs in SEO
Using Natural Language APIs in SEO
Stephan Solomonidis
 
Matching Keywords to Pages - Information Architecture
Matching Keywords to Pages - Information ArchitectureMatching Keywords to Pages - Information Architecture
Matching Keywords to Pages - Information Architecture
Dominic Woodman
 
Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - B...
Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - B...Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - B...
Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - B...
Chris Green
 
Setting AMP for Success at #BrightonSEO
Setting AMP for Success at #BrightonSEOSetting AMP for Success at #BrightonSEO
Setting AMP for Success at #BrightonSEO
Aleyda Solís
 

Viewers also liked (20)

BrightonSEO Slides - Blogging advice that'll make your job easier - guaranteed!
BrightonSEO Slides - Blogging advice that'll make your job easier - guaranteed!BrightonSEO Slides - Blogging advice that'll make your job easier - guaranteed!
BrightonSEO Slides - Blogging advice that'll make your job easier - guaranteed!
 
Product Feed Research: What we learned from indexing 500m SKUs
Product Feed Research: What we learned from indexing 500m SKUsProduct Feed Research: What we learned from indexing 500m SKUs
Product Feed Research: What we learned from indexing 500m SKUs
 
Brighton SEO - Getting a competitive advantage on ebay
Brighton SEO - Getting a competitive advantage on ebayBrighton SEO - Getting a competitive advantage on ebay
Brighton SEO - Getting a competitive advantage on ebay
 
Quality PR Linkbuilding - With Terrible Budget (BrightonSEO, September 2017)
Quality PR Linkbuilding - With Terrible Budget (BrightonSEO, September 2017)Quality PR Linkbuilding - With Terrible Budget (BrightonSEO, September 2017)
Quality PR Linkbuilding - With Terrible Budget (BrightonSEO, September 2017)
 
How Google Tag Manager can save your seo ? - Talk for Brighton SEO 2017
How Google Tag Manager can save your seo ? - Talk for Brighton SEO 2017How Google Tag Manager can save your seo ? - Talk for Brighton SEO 2017
How Google Tag Manager can save your seo ? - Talk for Brighton SEO 2017
 
BrightonSEO 2017 - SEO quick wins from a technical check
BrightonSEO 2017  - SEO quick wins from a technical checkBrightonSEO 2017  - SEO quick wins from a technical check
BrightonSEO 2017 - SEO quick wins from a technical check
 
Brighton SEO 2017: Six Kick Ass Content Strategies - Laura Hampton
Brighton SEO 2017: Six Kick Ass Content Strategies - Laura HamptonBrighton SEO 2017: Six Kick Ass Content Strategies - Laura Hampton
Brighton SEO 2017: Six Kick Ass Content Strategies - Laura Hampton
 
Kostas Voudouris - BrightonSEO - Perfromance-based optimisation using Google ...
Kostas Voudouris - BrightonSEO - Perfromance-based optimisation using Google ...Kostas Voudouris - BrightonSEO - Perfromance-based optimisation using Google ...
Kostas Voudouris - BrightonSEO - Perfromance-based optimisation using Google ...
 
BrightonSEO - Influencer Marketing - Allyson Griffiths iCrossing
BrightonSEO - Influencer Marketing - Allyson Griffiths iCrossingBrightonSEO - Influencer Marketing - Allyson Griffiths iCrossing
BrightonSEO - Influencer Marketing - Allyson Griffiths iCrossing
 
Better conversion with Intelligent Analytics
Better conversion with Intelligent AnalyticsBetter conversion with Intelligent Analytics
Better conversion with Intelligent Analytics
 
SPEAK EASY: THE RISE OF VOICE SEARCH (Mindshare Fast - Brighton SEO 2017)
SPEAK EASY: THE RISE OF VOICE SEARCH (Mindshare Fast - Brighton SEO 2017)SPEAK EASY: THE RISE OF VOICE SEARCH (Mindshare Fast - Brighton SEO 2017)
SPEAK EASY: THE RISE OF VOICE SEARCH (Mindshare Fast - Brighton SEO 2017)
 
BrightonSEO 2017- Harnessing your Reputation to win New Customers
BrightonSEO 2017- Harnessing your Reputation to win New CustomersBrightonSEO 2017- Harnessing your Reputation to win New Customers
BrightonSEO 2017- Harnessing your Reputation to win New Customers
 
Creating more human experiences with chatbots
Creating more human experiences with chatbotsCreating more human experiences with chatbots
Creating more human experiences with chatbots
 
How to Get Top Tier Links With No Budget
How to Get Top Tier Links With No BudgetHow to Get Top Tier Links With No Budget
How to Get Top Tier Links With No Budget
 
The SEO's Guide To JavaScript - Ric Rodriguez, Brighton SEO 2017
The SEO's Guide To JavaScript - Ric Rodriguez, Brighton SEO 2017The SEO's Guide To JavaScript - Ric Rodriguez, Brighton SEO 2017
The SEO's Guide To JavaScript - Ric Rodriguez, Brighton SEO 2017
 
Shut up and Listen: Social Listening Beyond Your Brand
Shut up and Listen: Social Listening Beyond Your BrandShut up and Listen: Social Listening Beyond Your Brand
Shut up and Listen: Social Listening Beyond Your Brand
 
Using Natural Language APIs in SEO
Using Natural Language APIs in SEOUsing Natural Language APIs in SEO
Using Natural Language APIs in SEO
 
Matching Keywords to Pages - Information Architecture
Matching Keywords to Pages - Information ArchitectureMatching Keywords to Pages - Information Architecture
Matching Keywords to Pages - Information Architecture
 
Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - B...
Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - B...Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - B...
Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - B...
 
Setting AMP for Success at #BrightonSEO
Setting AMP for Success at #BrightonSEOSetting AMP for Success at #BrightonSEO
Setting AMP for Success at #BrightonSEO
 

Similar to Technical SEO - Generational cruft in SEO - there is never a new site when theres history - brighton seo concise deck

SearchLove Boston 2018 - Emily Grossman - The Marketer’s Guide to Performance...
SearchLove Boston 2018 - Emily Grossman - The Marketer’s Guide to Performance...SearchLove Boston 2018 - Emily Grossman - The Marketer’s Guide to Performance...
SearchLove Boston 2018 - Emily Grossman - The Marketer’s Guide to Performance...
Distilled
 
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Dawn Anderson MSc DigM
 
Five SEO Strategies Every Company Needs to Master
Five SEO Strategies Every Company Needs to MasterFive SEO Strategies Every Company Needs to Master
Five SEO Strategies Every Company Needs to Master
Act-On Software
 
How Search Works
How Search WorksHow Search Works
How Search Works
Ahrefs
 
Website Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic PantaleonWebsite Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic Pantaleon
Jacque Doring
 
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
BrightEdge Technologies
 
Google's Search Signals For Page Experience - SMX Advanced 2021 Patrick Stox
Google's Search Signals For Page Experience - SMX Advanced 2021 Patrick StoxGoogle's Search Signals For Page Experience - SMX Advanced 2021 Patrick Stox
Google's Search Signals For Page Experience - SMX Advanced 2021 Patrick Stox
Ahrefs
 
SEO Cannibalisation of Your Own SEO Success
SEO Cannibalisation of Your Own SEO SuccessSEO Cannibalisation of Your Own SEO Success
SEO Cannibalisation of Your Own SEO Success
Dawn Anderson MSc DigM
 
Website Migrations at SMX Munich 2019 - Patrick Stox
Website Migrations at SMX Munich 2019 - Patrick StoxWebsite Migrations at SMX Munich 2019 - Patrick Stox
Website Migrations at SMX Munich 2019 - Patrick Stox
patrickstox
 
SEOzone 2015 - Mark Thomas - 5 Actionable Technical SEO Tips
SEOzone 2015 - Mark Thomas - 5 Actionable Technical SEO TipsSEOzone 2015 - Mark Thomas - 5 Actionable Technical SEO Tips
SEOzone 2015 - Mark Thomas - 5 Actionable Technical SEO Tips
SEOzeo
 
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick StoxSMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
patrickstox
 
Page Experience Update TMC June 2021 Patrick Stox
Page Experience Update TMC June 2021 Patrick StoxPage Experience Update TMC June 2021 Patrick Stox
Page Experience Update TMC June 2021 Patrick Stox
patrickstox
 
Negotiating crawl budget with googlebots
Negotiating crawl budget with googlebotsNegotiating crawl budget with googlebots
Negotiating crawl budget with googlebots
Dawn Anderson MSc DigM
 
How to Win SEO in Complex Web Migrations Scenarios #YoastCon
How to Win SEO in Complex Web Migrations Scenarios #YoastConHow to Win SEO in Complex Web Migrations Scenarios #YoastCon
How to Win SEO in Complex Web Migrations Scenarios #YoastCon
Aleyda Solís
 
Stapling and patching the web of now - ForwardJS3, San Francisco
Stapling and patching the web of now - ForwardJS3, San FranciscoStapling and patching the web of now - ForwardJS3, San Francisco
Stapling and patching the web of now - ForwardJS3, San Francisco
Christian Heilmann
 
Technical SEO Audit
Technical SEO AuditTechnical SEO Audit
Technical SEO Audit
Outreach Digital
 
Build your own analytics power tools
Build your own analytics power toolsBuild your own analytics power tools
Build your own analytics power tools
Alban Gérôme
 
SEO and The Mobile-First Paradigm Shift
SEO and The Mobile-First Paradigm ShiftSEO and The Mobile-First Paradigm Shift
SEO and The Mobile-First Paradigm Shift
Dawn Anderson MSc DigM
 
SEO Audits & Anomalies: Fixing What's Broken By Kristine Schachinger
SEO Audits & Anomalies: Fixing What's Broken By Kristine SchachingerSEO Audits & Anomalies: Fixing What's Broken By Kristine Schachinger
SEO Audits & Anomalies: Fixing What's Broken By Kristine Schachinger
Search Marketing Expo - SMX
 
Seozone - 5 tips
Seozone  - 5 tips Seozone  - 5 tips
Seozone - 5 tips
Anna Morrison
 

Similar to Technical SEO - Generational cruft in SEO - there is never a new site when theres history - brighton seo concise deck (20)

SearchLove Boston 2018 - Emily Grossman - The Marketer’s Guide to Performance...
SearchLove Boston 2018 - Emily Grossman - The Marketer’s Guide to Performance...SearchLove Boston 2018 - Emily Grossman - The Marketer’s Guide to Performance...
SearchLove Boston 2018 - Emily Grossman - The Marketer’s Guide to Performance...
 
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
 
Five SEO Strategies Every Company Needs to Master
Five SEO Strategies Every Company Needs to MasterFive SEO Strategies Every Company Needs to Master
Five SEO Strategies Every Company Needs to Master
 
How Search Works
How Search WorksHow Search Works
How Search Works
 
Website Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic PantaleonWebsite Audit [On Page and Off Page] by Carl Benedic Pantaleon
Website Audit [On Page and Off Page] by Carl Benedic Pantaleon
 
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
BrightEdge Share15 - S302: Beyond the Algorithm – Advanced SEO & Technical Tr...
 
Google's Search Signals For Page Experience - SMX Advanced 2021 Patrick Stox
Google's Search Signals For Page Experience - SMX Advanced 2021 Patrick StoxGoogle's Search Signals For Page Experience - SMX Advanced 2021 Patrick Stox
Google's Search Signals For Page Experience - SMX Advanced 2021 Patrick Stox
 
SEO Cannibalisation of Your Own SEO Success
SEO Cannibalisation of Your Own SEO SuccessSEO Cannibalisation of Your Own SEO Success
SEO Cannibalisation of Your Own SEO Success
 
Website Migrations at SMX Munich 2019 - Patrick Stox
Website Migrations at SMX Munich 2019 - Patrick StoxWebsite Migrations at SMX Munich 2019 - Patrick Stox
Website Migrations at SMX Munich 2019 - Patrick Stox
 
SEOzone 2015 - Mark Thomas - 5 Actionable Technical SEO Tips
SEOzone 2015 - Mark Thomas - 5 Actionable Technical SEO TipsSEOzone 2015 - Mark Thomas - 5 Actionable Technical SEO Tips
SEOzone 2015 - Mark Thomas - 5 Actionable Technical SEO Tips
 
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick StoxSMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
SMX Advanced 2018 SEO for Javascript Frameworks by Patrick Stox
 
Page Experience Update TMC June 2021 Patrick Stox
Page Experience Update TMC June 2021 Patrick StoxPage Experience Update TMC June 2021 Patrick Stox
Page Experience Update TMC June 2021 Patrick Stox
 
Negotiating crawl budget with googlebots
Negotiating crawl budget with googlebotsNegotiating crawl budget with googlebots
Negotiating crawl budget with googlebots
 
How to Win SEO in Complex Web Migrations Scenarios #YoastCon
How to Win SEO in Complex Web Migrations Scenarios #YoastConHow to Win SEO in Complex Web Migrations Scenarios #YoastCon
How to Win SEO in Complex Web Migrations Scenarios #YoastCon
 
Stapling and patching the web of now - ForwardJS3, San Francisco
Stapling and patching the web of now - ForwardJS3, San FranciscoStapling and patching the web of now - ForwardJS3, San Francisco
Stapling and patching the web of now - ForwardJS3, San Francisco
 
Technical SEO Audit
Technical SEO AuditTechnical SEO Audit
Technical SEO Audit
 
Build your own analytics power tools
Build your own analytics power toolsBuild your own analytics power tools
Build your own analytics power tools
 
SEO and The Mobile-First Paradigm Shift
SEO and The Mobile-First Paradigm ShiftSEO and The Mobile-First Paradigm Shift
SEO and The Mobile-First Paradigm Shift
 
SEO Audits & Anomalies: Fixing What's Broken By Kristine Schachinger
SEO Audits & Anomalies: Fixing What's Broken By Kristine SchachingerSEO Audits & Anomalies: Fixing What's Broken By Kristine Schachinger
SEO Audits & Anomalies: Fixing What's Broken By Kristine Schachinger
 
Seozone - 5 tips
Seozone  - 5 tips Seozone  - 5 tips
Seozone - 5 tips
 

Technical SEO - Generational cruft in SEO - there is never a new site when there's history - BrightonSEO concise deck

  • 1. @dawnieando from @MoveItMarketing: Dawn Anderson (@dawnieando)
  • 3. The Great 302s Pass PageRank Debate
  • 4. GENERATIONAL CRUFT: MULTIPLE GENERATIONS OF A WEBSITE
  • 5. NOT ‘Crufts’ – THE WORLD’S LARGEST DOG SHOW ERIC
  • 6. CONTENT CRUFT https://moz.com/blog/clean-site-cruft-before-it-causes-ranking-problems-whiteboard-friday
  • 7. THIS TYPE OF CRUFT IS NOT THE SAME AS CONTENT CRUFT
  • 9. ‘URL CRUFT’ IS A THING: “characters relevant or meaningful only to the people who created the site, such as implementation details of the computer system which serves the page. Examples of URL cruft include filename extensions such as .php or .html, and internal organizational details such as /public/ or /Users/john/work/drafts/.” (Wikipedia definition)
  • 10. ALL THE RANDOM CRAP PEOPLE ADD TO QUERY STRINGS, PARAMETERS, DIRECTORY FOLDERS AND URL STRUCTURES
  • 11. CODE & URL CRUFT MAKES CRAWLING SLUGGISH
  • 12. “COOL URIs DON’T CHANGE” Sir Tim Berners-Lee (Inventor of the World Wide Web) https://www.w3.org/Provider/Style/URI Attribution: By Uldis Bojārs (Flickr) [CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons
  • 13. A Clean Slate: LET’S START WITH A CLEAN SLATE
  • 14. Websites (AND URLs) are not disposable
  • 15. SEARCH ENGINES NEVER FORGET: Search engines have a long memory and a lot of storage
  • 16. 404 NOT FOUND & 410 GONE § “Of course, we won’t redirect everything…” § “Not everything will be worth redirecting”
  • 17. 410 Gone § “Some, we’ll just kill off with a 410…” § “Then the URLs will be gone”
  • 19. 302 == Default, 301 == Intentional. 404 == Default, 410 == Intentional. “The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.” (RFC 7231) https://tools.ietf.org/html/rfc7231#section-6.5.9 ARE YOU SURE? MAYBE YES
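The default-versus-intentional distinction on this slide can be sketched as a small lookup. This is only an illustration: the paths, the `REDIRECTS` and `RETIRED` tables and the `respond` function are hypothetical and not from the talk.

```python
from http import HTTPStatus
from typing import Optional, Tuple

# Hypothetical legacy paths, for illustration only.
REDIRECTS = {"/old-shoes": "/shoes"}   # content deliberately moved for good
RETIRED = {"/summer-sale-2012"}        # content deliberately removed

def respond(path: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header or None) for a requested path."""
    if path in REDIRECTS:
        # Intentional: a permanent redirect to the replacement URL.
        return HTTPStatus.MOVED_PERMANENTLY.value, REDIRECTS[path]
    if path in RETIRED:
        # Intentional: the owner states the resource is gone.
        return HTTPStatus.GONE.value, None
    # Default: nothing is known about this path.
    return HTTPStatus.NOT_FOUND.value, None
```

The point of the sketch is that 301 and 410 only happen when someone has listed the URL on purpose; everything else falls through to the default.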
  • 20. https://www.youtube.com/watch?v=xp5Nf8ANfOw THE DIFFERENCE BETWEEN HOW GOOGLE TREATS 404s VERSUS 410s
  • 21. DO NOT THINK 410s WON’T BE RECRAWLED Source: https://www.docsplace.org/4578/09/410-gone-stops-crawling-dead-urls/
  • 22. “We knew there was content there at some point so we just swing by every now and then to see if anything came back” (John Mueller, 2016) In Reality… Gone Is Never Gone
  • 23. ZOMBIES ARE NEVER GONE. NO URLS ARE EVER GONE, ONLY THE RESOURCE THERE IS GONE https://www.seroundtable.com/google-410-indexing-22584.html 5 YEARS LATER
  • 24. HOW ABOUT 14 YEARS LATER? https://www.webmasterworld.com/google/4864613.htm 2 HOURS ALIVE… 14 YEARS LATER
  • 25. YOU END UP WITH A CONGA LINE OF LEGACY URLS, SUBDOMAINS & VARIOUS SITE PROTOCOLS
  • 26. “Forever, And ever, And ever, And ever… You’ll be a URL”
  • 27. GOOGLEBOT GETS WHERE WATER COULDN’T https://petermeadit.com/blog/block-web-crawlers/
  • 28. EVEN YOUR STAGING & DEV SITES: Found with a very simple wildcard * site: query
  • 29. THE CHALLENGE IS NOT IN INDEXING… BUT IN KEEPING EVERYTHING INDEXED UP TO DATE
  • 30. INCREMENTAL CRAWLING NEVER ENDS “Crawling method based on crawl frequency based on URL historical change & importance rate” Crawling Which Never Ends (Ongoing)
  • 31. The Crawling ‘Frontier’ (THE URL QUEUE): ‘TO BE EXPLORED’ (OR REVISITED)
  • 32. URLs Take Their Place in The Frontier Queue (New & Revisit): The Queue Gets Long & Congested
  • 34. PAST DATA ON CHANGE IS A GREAT PREDICTOR OF FUTURE DATA: PREDICTION BASED PRIORITY SCHEDULING … WHEN THERE IS CONSISTENCY “past changes to a page are a good predictor of future changes. This result has practical implications for incremental web crawlers that seek to maximize the freshness of a web page collection or index.”
  • 35. BASED ON ROLLING AVERAGES OF PAST CRAWL VISITS
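A rolling average of past crawl visits can be sketched as follows. This is a minimal illustration, not how any search engine actually schedules: the class name, the window size and the linear mapping from change rate to revisit interval are all assumptions.

```python
from collections import deque

class ChangeRateEstimator:
    """Rolling average of whether a URL had changed on recent crawl visits."""

    def __init__(self, window: int = 5):
        # Only the most recent `window` visits are kept (assumed window size).
        self.observations = deque(maxlen=window)

    def record_visit(self, changed: bool) -> None:
        self.observations.append(1.0 if changed else 0.0)

    def change_rate(self) -> float:
        # A URL with no history has no evidence of change.
        if not self.observations:
            return 0.0
        return sum(self.observations) / len(self.observations)

    def next_interval_days(self, min_days: float = 1.0, max_days: float = 30.0) -> float:
        # Assumed linear rule: the more often it changed, the sooner the revisit.
        return max_days - self.change_rate() * (max_days - min_days)
```

A URL that changed on three of its last four visits has a change rate of 0.75 and would be revisited far sooner than one that never changes, which sits at the maximum interval.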
  • 37. A NEW URL HAS NO HISTORY BUT YOUR OLD ONES HAVE LOTS
  • 38. Stored in Search Engine History Logs
  • 39. TO BUILD PROBABILITY & PREDICTABILITY MODELS
  • 40. History Log Records Include: • URL fingerprint • Timestamp (last crawl or download attempt) • Crawl status (success or error) (Response code) • Content checksum (binary code) • Source ID (accessed from cache or downloaded) • Segment identifier (Crawl segment assigned to??) • Page importance (a measure of importance assigned to the URL)
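The record fields listed on this slide can be pictured as a simple data structure. The field names and types below are assumptions chosen for illustration; the slide only names the kinds of information held.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class HistoryLogRecord:
    """One crawl of one URL, mirroring the fields listed on the slide."""
    url_fingerprint: str     # compact identifier for the URL
    timestamp: datetime      # last crawl or download attempt
    crawl_status: int        # response code (success or error)
    content_checksum: str    # checksum of the fetched content
    source_id: str           # e.g. "cache" or "download" (assumed values)
    segment_id: int          # crawl segment identifier
    page_importance: float   # importance assigned to the URL

def content_changed(previous: HistoryLogRecord, current: HistoryLogRecord) -> bool:
    """Compare checksums from two crawls of the same URL."""
    return previous.content_checksum != current.content_checksum
```

Comparing checksums between consecutive records is one simple way a history like this could feed a change prediction.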
  • 41. “The URL page importance score can be retrieved from the … URL history log …or it can be obtained by obtaining the historical page importance score for the URL for a predefined number of prior crawls and then performing a predefined filtering function on those values to obtain the URL page importance score.” Scheduler for Search Engine Crawler https://www.google.com/patents/US8042112
       Importance record per crawl:
       DOC ID     | CRAWL 1 | CRAWL 2 | CRAWL 3 | CRAWL 4 | CRAWL 5 | CRAWL 6
       DOC ID 1   | 1       | 0.8     | 0.6     | 0.4     | 0.2     | 0
       DOC ID 2   | 0       | 0.2     | 0.4     | 0.6     | 0.8     | 1
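The quoted passage does not say what the "predefined filtering function" is. As one possible reading, the sketch below averages the most recent crawls; the function name, the window of 3 and the choice of a plain mean are assumptions. The two histories are the rows from the slide's table.

```python
from typing import Sequence

def smoothed_importance(history: Sequence[float], window: int = 3) -> float:
    """Mean of the last `window` importance records (an assumed filter)."""
    recent = history[-window:]
    if not recent:
        raise ValueError("no importance history for this URL")
    return sum(recent) / len(recent)

# Importance records for crawls 1 to 6, taken from the slide's table.
doc_1 = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]   # importance falling over time
doc_2 = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]   # importance rising over time

score_1 = smoothed_importance(doc_1)      # roughly 0.2
score_2 = smoothed_importance(doc_2)      # roughly 0.8
```

Both documents average 0.5 across all six crawls, so the window matters: a short window separates the declining URL from the rising one.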
  • 42. URL_SEEN TEST: YOU CAN’T JUST KEEP TRYING TO JUMP THE INDEXING QUEUE EITHER. PUSH INDEXING: E.G. FETCH AS GOOGLEBOT & SUBMIT TO INDEX. PULL INDEXING: VISITS BY NATURAL CRAWLING & DISCOVERY OF URLS / URL VISIT SCHEDULING / REVISITS
  • 43. ‘Sampling’ in Crawling for Efficiency: ‘SMALL TEST VISITS TO A SITE TO UNDERSTAND WHETHER IT IS WORTH CRAWLING & UNDERSTAND URL PATTERNS & RESOURCES THERE’
  • 44. Popular CMS ‘Rule Patterns’ (URL Parameters): ALL WILL HAVE COMMON CANONICALIZATION PATTERNS WHICH CAN BE LEARNED
  • 45. DUSTBUSTER & DUST CRAWLING RULES: DO NOT CRAWL IN THE DUST. BUILDS ‘HINTS’ ON WHAT NOT TO CRAWL. EVERY SITE WILL HAVE ITS OWN CRAWLING RULES
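A per-site crawling rule of this kind can be pictured as a URL rewrite that collapses different URLs with similar text onto one form. The sketch below is a hand-written stand-in, not the DustBuster algorithm itself, which learns its rules. The parameter names in `IGNORED_PARAMS` and the example.com URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rule for one site: these parameters never change the content.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source"}

def apply_rules(url: str) -> str:
    """Drop ignorable parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in IGNORED_PARAMS]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))

def group_duplicates(urls):
    """Group URLs that the rules treat as the same resource."""
    groups = defaultdict(list)
    for url in urls:
        groups[apply_rules(url)].append(url)
    return dict(groups)
```

Only one URL per group needs crawling, which is the "hint on what not to crawl".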
  • 46. Aged ‘Patchwork Quilt’ Sites: A LITTLE BIT OF THIS CMS AND A LITTLE BIT OF THAT CMS. MANY HISTORICAL PARAMETERS CREATED & CRAWLING SAMPLE PATTERNS
  • 47. Every Version of Your Past Ecommerce Sites Had potential to spew “exponentially multiplicative URLs”… at some point… DIFFERENT PARAMETERS & URL PATTERNS WHICH ARE LEARNED BY CRAWLERS… AND REMEMBERED… FOREVER
  • 48. ‘Transitive’?? Transitive: if A == B and B == C then A == C. For some types of content more than others, e.g. ecommerce/directories but not news. SAMPLING
  • 49. EFFICIENCY IS NOT JUST ABOUT URL SCHEDULING. IT IS ABOUT NEAR MEMORY STORAGE (e.g. CACHING) TOO
  • 50. REUSING: PRE-EMPTING (PARTICULARLY POPULAR DOCUMENTS / QUERIES) & REUSING WHAT WAS ALREADY IN NEARBY (MEMORY V DISC) STORAGE
  • 51. REUSE: LOW IMPORTANCE and / or DOESN’T CHANGE OFTEN. REUSE IF NOT MODIFIED SINCE: LIKELY TO CHANGE BY X DATE (SINCE DATE). DOWNLOAD: CHANGES FREQUENTLY WITH IMPORTANT CHANGE OR IS AN IMPORTANT DOCUMENT. REUSE IF NOT MODIFIED SINCE https://www.google.com/patents/US8042112
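The three outcomes on this slide can be sketched as a decision function. The thresholds, the 0 to 1 scales and the order of the checks are assumptions made for illustration; the slide gives only the three categories and their rough criteria.

```python
from enum import Enum

class CrawlAction(Enum):
    REUSE = "reuse"                                   # serve the stored copy
    REUSE_IF_NOT_MODIFIED_SINCE = "conditional fetch" # check before refetching
    DOWNLOAD = "download"                             # always fetch again

def choose_action(importance: float, change_rate: float,
                  low: float = 0.3, high: float = 0.7) -> CrawlAction:
    """Pick an action from importance and change rate, both assumed in [0, 1]."""
    # Important documents, or ones that change often, are fetched again.
    if importance >= high or change_rate >= high:
        return CrawlAction.DOWNLOAD
    # Low importance and/or rarely changing: the stored copy is good enough.
    if importance <= low or change_rate <= low:
        return CrawlAction.REUSE
    # Everything in between: only refetch if the server reports a change.
    return CrawlAction.REUSE_IF_NOT_MODIFIED_SINCE
```

The middle case corresponds to a conditional HTTP request using the `If-Modified-Since` header, where an unchanged page costs the crawler almost nothing.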
  • 52. CRAWL SAMPLES ALSO HELP WITH MODELLING TO MAP DOCS TO TOPIC RELEVANCE
  • 53. YOU BROKE YOUR SILO STRUCTURE: SEMANTIC LOSS. Image credit: https://www.slideshare.net/patrickstox/nlp-sitemap-smx-2016-patrick-stox-latest-in-advanced-technical-seo
  • 54. ‘CONCEPT DRIFT’ IS A THING. fuzzy: difficult to perceive; indistinct or vague, as in "a fuzzy picture". synonyms: blurry, blurred, indistinct; unclear, bleary, misty, distorted, out of focus, unfocused, lacking definition, low resolution, nebulous; ill-defined, indefinite, vague, hazy, imprecise, inexact, loose, woolly. https://en.wikipedia.org/wiki/Concept_drift AI ALERT
  • 55. BOOLEAN LOGIC – EXTREME CASES OF TRUTH (TRUE (1) OR FALSE (0))
  • 56. ‘FUZZY LOGIC’ – DEGREES OF TRUTH. SEMANTIC LOSS
  • 57. BIG TOPICAL URL FISH IN A SMALL TOPICAL POND
  • 58. SMALL TOPICAL URL FISH IN A BIG TOPICAL POND. SEMANTIC LOSS
  • 59. ‘Fuzzy’ URL Targets with Each Site Generation: EVERYTHING GETS A BIT BLURRED. ‘Which is the target URL again?’
  • 60. GENERATIONAL CRUFT CAN SNOWBALL • Past infinite loops • Dodgy URL parameters • Misconfigured URL parameters • Old URL crawling ‘rules / hints’ • Old ‘importance / quality’ scores • Filtered dupes & near-dupes • Mixed messaging canonicals • 410s still being revisited • Internal links to old sites / protocols
  • 61. SOME SYMPTOMS: WRONG URL RANKING, ‘SWAPPING OUT’ (Especially multiple child nodes), SHARP & VOLATILE RANKING FLUX
  • 62. SOME SYMPTOMS: A LOT OF WRONG TARGETS RANKING POST MIGRATION
  • 63. MIXED CONTENT & MULTIPLE SITE VERSIONS http://www.itv.com/news/
  • 64. MIXED CONTENT & MULTIPLE SITE VERSIONS http://www.itv.com/news/ BOTH HTTP & HTTPS FIGHTING EACH OTHER
  • 65. PEOPLE CHURN: INTERNAL TEAM CHURN, EXTERNAL AGENCY CHURN
  • 66. FIND SITES ON THE SAME SERVER
  • 67. DIAGNOSE: Validate & Retain in GSC ALL Past Domains & Past Site Versions (Protocols (HTTPS / HTTP)). THERE MAY STILL BE UNDETECTED ACTIVITY GOING ON THERE
  • 68. URL Parameter Handling is Your Friend: Help Google Build ‘Crawling Rules’ for your site rather than wasting time on ‘sampling’ and giving a bad impression. GIVE HELP AND GUIDANCE WITH THE CRAWL RULE AND HINT BUILDING
  • 69. Help Google Build ‘Crawling Rules’ for your site rather than wasting time on ‘sampling’ and giving a bad impression. BE VERY CAREFUL
  • 70. PEOPLE CANONICALIZE WRONG ON MULTIPLE GENERATIONS OF SITES
  • 71. 47% of TECHNICAL SEOs thought: “REL=NEXT / REL=PREV” IS A FORM OF CANONICALIZATION
  • 72. Lots OF SEOs were unaware that: “301s and 302s are BOTH forms of canonicalization”
  • 73. Only 64% of ‘Technical SEOs’ realised hreflang is a form of Canonicalization (Internationalization)
  • 75. REVIEW & UNDERSTAND - THE CANONICAL LINK RELATION In ‘ALL’ its forms § 30X redirects § Canonical tag § hreflang § HTTPS protocol § Global canonicalization rules § URL normalization
  • 76. PEOPLE APPEND (ADD TO FILES) - SOMETIMES IT’S FEAR OF DEPENDENCIES
  • 77. DIAGNOSE: HEAD BACK TO THE SERVER. YOU NEED TO KNOW WHAT’S ON THAT SERVER
  • 78. DIAGNOSE: SERVER LOG FILE ANALYSIS. ANALYSE THE LOGS FOR ‘ALL’ YOUR SITES AND ‘ALL’ PROTOCOLS TO SEE THE PATTERNS EMERGE, BUT WATCH OUT FOR OTHER TOOLS EMULATING GOOGLEBOT AND FILTER THEM OUT
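A first pass over the logs can be as simple as counting which paths Googlebot requests and what status they return. The sketch assumes the common "combined" access log format; real logs may differ, and the regex and function name are illustrative only.

```python
import re
from collections import Counter

# Assumes the "combined" access log format; adjust for your server's format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count (path, status) pairs for requests whose user agent names Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts
```

The user agent string alone does not prove a request came from Google, since other tools can send the same string. Filtering those out needs an extra check on the IP address, such as a reverse DNS lookup, which is not shown here. Legacy URLs that still receive hits years later show up in the counts with their 301, 404 or 410 status.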
  • 79. When analysing logs you’re often viewing URLs from ‘A LONNNNGGGG Time Ago’. LOOKING AT LEGACY
  • 80. REVISIT ALL PAST .HTACCESS FILES (.HTACCESS SITE 1, .HTACCESS SITE 2, .HTACCESS SITE 3). Can you rewrite the rules to be more efficient with regex or cut out some old rules still firing unnecessarily? (CREATE SHORTCUTS) REMEMBER .HTACCESS RULES RUN IN ORDER OF THEIR APPEARANCE IN THE FILE. CAN YOU USE WILDCARDS TO OPTIMIZE OR SKIP STEPS?
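One way to "create shortcuts" is to collapse redirect chains left by successive site generations, so that every legacy URL points straight at its final destination. The sketch works on a plain source-to-target mapping extracted from the old rules; how that mapping is extracted, and the function name, are assumptions.

```python
def flatten_redirects(redirects):
    """Map every source URL directly to the end of its redirect chain."""
    flattened = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until the target is no longer itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop at {target}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened
```

With three generations of a site, a chain such as `/v1/shoes` to `/v2/shoes` to `/v3/shoes` becomes two single-hop redirects to `/v3/shoes`. The loop check also surfaces the "past infinite loops" that older rule files may still contain.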
  • 82. Help Googlebot Get Round its Shopping List: OPEN MORE CHECKOUTS; WIDEN THE AISLES; MAKE THINGS EASY TO FIND; DON’T CONFUSE GOOGLEBOT; HELP FILL THE TROLLEY QUICKLY; SPEED, SPEED, SPEED
  • 83. XML Sitemaps Are Your Friend… (Strong Foundations) ‘The foundations’ underneath a site. They help to pass ‘importance’ signals to URLs. But… never leave them to just autogenerate without periodically checking.
  • 84. EXTERNALLY HOSTED XML SITEMAPS • Take back control • Jump the dev queue • Allows for custom configuration of optimal canonical click paths • Allows for consistent signals of importance to included URLs • Forget about setting priority • Forget about last modified • Even a simple list of URLs FTW will do • Keep them organised for granular analysis of problem site sections
  • 85. INSTEAD OF REMOVE… CONSIDER… DISTRACT & ITERATIVELY IMPROVE: STRATEGIC USE OF INTERNAL LINK POPULARITY; REDUCE IMPORTANCE SIGNALS TO DIFFERENT PAGES; INCLUDE IMPORTANT PAGES IN XML SITEMAPS; INCLUDE IMPORTANT PAGES IN HTML SITEMAPS
  • 86. BUILD WELL CATEGORIZED AND CONCEPTUALLY STRUCTURED SITEMAPS https://www.slideshare.net/patrickstox/nlp-sitemap-smx-2016-patrick-stox-latest-in-advanced-technical-seo
  • 87. SOLUTION: Increase ‘Importance’ quickly of target URLs • Internal link optimization • Canonicalise to (if relevant) • Strengthen up importance signals • Inclusion in front facing HTML and XML sitemaps • Improve the content & keep it updated • 301 redirect to (if relevant redundant content) • Topical hubs and strong information views to navigate users & add relevance
  • 88. SOLUTION: Reduce ‘Importance’ quickly of old URLs • Internal link UNOPTIMIZATION • 410 • Dig out URLs with links to them • Orphan URLs • Canonicals to HTTPS • EXCLUSION from XML sitemaps (even old ones on the server) • Archiving of content
  • 89. CONTENT CRUFT https://moz.com/blog/clean-site-cruft-before-it-causes-ranking-problems-whiteboard-friday
  • 90. IT’S VERY IMPORTANT… YOU STAY OUT OF SERVER ERROR STATUS 500. ‘Try again’ intervals likely extended between each failed connection attempt.
  • 91. Consistency is REMEMBER ‘ROLLING AVERAGES’
  • 93. 410s Likely Get Deindexed Quicker: “Usually seeing it (410) 1-2 times is enough for us to drop those URLs from the index” John Mueller on Google+ (https://plus.google.com/+JohnMueller/posts/NEsqE7Sr4Z4)
  • 94. LEGACY ISSUES VIA CANONICALS OR REDIRECTION (COMMON MISTAKES) • PAGE CANONICALIZED TO IS NOT A SUPERSET OR DUPLICATIVE (IT IS NOT RELEVANT ENOUGH) • 301s TO IRRELEVANT PAGES BECOME SOFT 404s • FOLDING UP PRODUCT PAGES TO CATEGORIES (PEOPLE WERE LOOKING FOR A SPECIFIC PRODUCT) • CANONICALIZATION TO PAGES WHICH IN THE FUTURE 301 REDIRECT TO ANOTHER URL, THEREFORE NEGATING THE PAGES CANONICALIZING TO THEM • CONFLICTS BETWEEN HREFLANG AND CANONICALIZATION
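The fourth mistake above, canonicals that point at URLs which were later redirected, can be spotted by cross-checking two exports: the canonical target declared on each page and the current redirect map. The sketch below assumes both are already available as plain dictionaries; the function name and the example URLs are hypothetical.

```python
def canonicals_pointing_at_redirects(canonicals, redirects):
    """Return (page, canonical_target, redirect_destination) tuples for
    every page whose declared canonical URL is itself redirected."""
    conflicts = []
    for page, canonical in sorted(canonicals.items()):
        if canonical in redirects:
            conflicts.append((page, canonical, redirects[canonical]))
    return conflicts


# Hypothetical crawl export: page -> canonical URL declared on that page.
canonicals = {
    "/shoes?colour=red": "/shoes",
    "/boots?size=9": "/boots-2015",
}
# Hypothetical redirect map added during a later migration.
redirects = {"/boots-2015": "/boots"}

conflicts = canonicals_pointing_at_redirects(canonicals, redirects)
```

Each conflict is a candidate for updating the canonical to the redirect destination, so the two signals agree again.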
  • 95. MORE CAUSES • SEARCH ENGINES ARE CRAWLING MORE CODE THAN YOU MIGHT HAVE INTENDED IN THE FIRST PLACE • JAVASCRIPT ERRORS FROM LEGACY CODE & LIBRARIES • LEGACY 302s FROM REDIRECTED LEGACY DOMAINS WHICH CONFUSE INTERMEDIATE SIGNALS BETWEEN 301s (WHICH ARE INTENDED DEFINITE REDIRECTIONS) • ABANDONED URLS • AJAX URLS (NOT THE SAME AS THE NAMED ANCHOR) - DEPRECATION OF AJAX CRAWLING (ASYNCHRONOUS JAVASCRIPT & XML)
  • 96. “If “change” means “any change”, then about 40% of all web pages change weekly [12]. Even if we consider only pages that change by a third or more, about 7% of all web pages change weekly [17].” (Broder, A.Z., Najork, M. and Wiener, J.L., 2003) EVEN AS FAR BACK AS 2003: 40% of ALL web pages changed weekly; 7% of web pages changed a 1/3 of their page content or more weekly
  • 97. HOW MUCH BIGGER & MORE DYNAMIC IS THE WEB NOW IN 2017? http://www.internetlivestats.com/total-number-of-websites/
  • 98. FUZZY LOGIC • Rule based logic • Been around for 20+ years • Is within a subset of AI
  • 99. THESE THINGS ADD UP. THEY ALSO STILL NEED TO BE DISCOVERED, WHICH REQUIRES INITIAL CRAWLING https://twitter.com/dawnieando/status/906465965029969920
  • 100. “404 vs 410 doesn't affect the recrawl rate: we'll still occasionally check to see if these pages are still gone, especially when we spot a new link to them” John Mueller, Google+ 2015 https://plus.google.com/u/0/+JohnMueller/posts/NEsqE7Sr4Z4 ESPECIALLY IF THERE ARE LINKS TO IT
  • 101. Pass Strong Clues - Highly Relevant New Conceptual Structures: STRONG SEMANTICS & CONCEPTUALLY CO-OCCURRING TERMS
  • 102. THINK CAREFULLY ABOUT URL CREATION. Not EVERYTHING is worthy of its own URL: VARIANTS; STEMMINGS; PLURALS; RANDOM TAGS; LONG, LONG, LONG TAIL PARAMETERS
  • 103. ONLY DOWNLOAD IF THERE IS SUBSTANTIVE CHANGE. TAKE SOME CONTROL WITH 304 (NOT MODIFIED) & EXPIRES HEADERS ON LESS IMPORTANT PAGES https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching VALID REPRESENTATION: THE URL WILL STILL BE VISITED BUT 0 (ZERO) BYTES OF CONTENT WILL BE DOWNLOADED, SO IT IS STRAIGHT ON TO THE NEXT URL VERY QUICKLY https://webmasters.googleblog.com/2006/09/better-details-about-when-googlebot.html https://tools.ietf.org/html/rfc7232#section-4.1
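The 304 behaviour described on this slide comes down to comparing the page's last-modified time with the `If-Modified-Since` header sent by the crawler. The following is a minimal, framework-free sketch of that decision; the function name is hypothetical, `last_modified` is assumed to be a timezone-aware UTC datetime, and a real server would also consider `ETag` / `If-None-Match`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET request.

    Responds 304 with no body when the client's copy is still current,
    otherwise 200 so the full page is downloaded again.
    """
    # HTTP dates have one-second resolution.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable header: ignore it and serve the page.
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers
    return 200, headers


page_changed = datetime(2004, 2, 5, 12, 0, 0, tzinfo=timezone.utc)
status, headers = conditional_response(page_changed, "Thu, 05 Feb 2004 12:00:00 GMT")
```

With an unchanged page the crawler receives only headers and can move on to the next URL; any change after the supplied date falls through to a normal 200 response.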
  • 104. A URI is like a fine wine, maturing over time. “COOL URIs DON’T CHANGE” Sir Tim Berners-Lee (Inventor of the World Wide Web) https://www.w3.org/Provider/Style/URI
  • 105. A LONG, LONG TIME AGO • You need to go right back to the beginning • What domains did the organisation EVER register? • Where do they redirect to? • Is it via 301, 302 or are they merely parked domains? • Who would know? Who is responsible? • Verify them all in Google Search Console • Some of these may EVEN HAVE PENALTIES HISTORICALLY • If there are links to any there is likely still crawling activity there • Analyse logs across multiple subdomains & protocols
  • 106. QUESTIONS TO ASK • HOW MANY MICRO-SITES HAVE YOU HAD? • HOW MANY SUBDOMAINS? • HOW MANY OTHER DOMAINS? • WHO IS RESPONSIBLE FOR DOMAIN REG? • WHO KNOWS WITHIN THE ORGANISATION? • WHO REGISTERED THE DOMAINS? • WHO CAN UPDATE DNS RECORDS? • ARE THESE SITES STILL ON SERVERS? • HAVE ANY OF THESE SITES HAD MANUAL ACTIONS? • HOW ARE THESE SITES REDIRECTED? • ARE THEY PARKED DOMAINS?
  • 107. DATA FROM HISTORY LOGS CONTRIBUTE TO WHEN TO REVISIT URIs ON THE WEB
  • 108. SOLUTION: REVISITING BLOATED APPENDED .HTACCESS FILES ON ALL LEGACY SITES (IF NOT REDIRECTING AT A DNS LEVEL). NOT JUST THE .HTACCESS FILE ON THE EXISTING SITE EITHER. GOOGLEBOT MAY HIT .HTACCESS ON PAST SITES SO THEY MAY ALSO NEED OPTIMIZING. .HTACCESS RULES RUN IN ORDER SO PROVIDE OPPORTUNITY FOR SHORT CUTS.
  • 109. SOME TYPES OF URL CRUFT • INCORRECTLY APPLIED CANONICAL TAGS • CONFLICTING HREFLANG & CANONICAL TAGS • MIXED CONTENT • URL SHORTENERS • SESSION IDS • UTM TAGGING • OLD AJAX FRAGMENTS • PARAMETERS FROM MULTI FACET DROP DOWN CHOICES • .html, .php, index.html, .aspx • LEGACY URL REWRITING & PARAMETERS IN .HTACCESS FILES • LEGACY FOLDERS WHICH CONTRIBUTE NO MEANING TO SITE ONTOLOGY. UNCRUFTY: www.myeasyurlwillmakeyouwonder.com/resume CRUFTY: www.myeasyurlwillmakeyouwonder.com/resume.html CRUFTY: http://nymag.com/scienceofus/2015/07/how-to-recover-from-an-all-nighter.html?om_rid=AAENcg&om_mid=_BTtFa0B869PyJp&utm_content=buffer8fdd1&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
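To see how much of a crufty URL is tracking or session noise, the query string can be filtered with the standard library. This is a rough sketch: the function name is hypothetical, and the prefix and parameter lists are assumptions chosen to match the example above, so they would need extending for a real site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed cruft markers; adjust to the parameters seen in your own logs.
TRACKING_PREFIXES = ("utm_", "om_")
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_cruft(url):
    """Drop tracking parameters, session IDs and the fragment from `url`,
    keeping any parameters that actually select content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


crufty = (
    "http://nymag.com/scienceofus/2015/07/how-to-recover-from-an-all-nighter.html"
    "?om_rid=AAENcg&om_mid=_BTtFa0B869PyJp&utm_content=buffer8fdd1"
    "&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer"
)
clean = strip_cruft(crufty)
```

Running the same function over every URL in a log file and counting how many collapse to the same clean URL gives a quick measure of how many duplicate addresses a crawler is being asked to visit.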
  • 110. INDEX TIERING. Presented by B. Cambazoglu at European Summer School in Information Retrieval 2017 (Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced topics in information retrieval (pp. 27-50). Springer Berlin Heidelberg.)
  • 111. FIND SITES ON THE SAME SERVER
  • 112. TWO-PHASE RANKING IN A SEARCH NODE. Presented by B. Cambazoglu at European Summer School in Information Retrieval 2017 (Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced topics in information retrieval (pp. 27-50). Springer Berlin Heidelberg.)
  • 113. FUZZY LOGIC - DEGREES OF TRUTH: 0.8 Doc ID likely to be a correct URI to choose from term / query cluster
  • 114. EVERY SINGLE TIME YOU MIGRATE, CHANGE DESIGN, REDIRECT, REINVENT A SITE / URL: (1) A CLEAN START: FIRST SITE STRUCTURE; CRAWLING ‘RULES’ BUILT; EVERYTHING IS ‘200 OK’. (2) REDIRECTIONS: ANOTHER STRUCTURE; NEW CRAWLING ‘RULES’ BUILT; MORE URLs; MIXED RESPONSE CODES; ‘FUZZINESS’ IS EMERGING. (3) REDIRECTIONS: NEW CRAWLING ‘RULES’ BUILT; MORE URLs; REDIRECT CHAINS & MIXED RESPONSE CODES; NEW SEOs DON’T KNOW THE ‘HISTORY’; TARGET URLs NOW ‘VERY FUZZY’
  • 115. BUT WHEN DATA IS INCONSISTENT FUZZY LOGIC MAY FAIL: ‘DEGREES OF TRUTH’ MAY BECOME MORE BLURRED / VAGUE
  • 117. TERM-FREQUENCY INVERSE DOCUMENT FREQUENCY: Cruft can also skew term-frequency inverse document frequency AND THE QUERY CLUSTERS DOCUMENTS BELONG TO
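The skew is easy to see with the plain textbook form of inverse document frequency, log(N / df). The sketch below uses that simplified formula and a tiny made-up corpus; it is only meant to illustrate the effect of duplicate crufty URLs and does not represent how any search engine actually computes its weights.

```python
import math


def inverse_document_frequency(term, documents):
    """Textbook IDF: log(total documents / documents containing the term)."""
    containing = sum(1 for doc in documents if term in doc.lower().split())
    return math.log(len(documents) / containing) if containing else 0.0


# Hypothetical site with one page per topic.
clean_site = [
    "red running shoes",
    "blue walking boots",
    "garden furniture sale",
    "kitchen table",
]
# The same site after three crufty duplicate URLs of the shoes page appear.
crufty_site = clean_site + ["red running shoes"] * 3

idf_clean = inverse_document_frequency("shoes", clean_site)
idf_crufty = inverse_document_frequency("shoes", crufty_site)
```

In the clean corpus ‘shoes’ appears in 1 of 4 documents; with the duplicates it appears in 4 of 7, so the term looks far less distinctive even though no new content was written.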
  • 118. The Generational ‘Snail Trail’, leaving its slithery footprint • Old XML sitemaps • Redirects drop away on old site .htaccess • DNS issues • People link to old site but wrong protocol • Old sites not verified in GSC • Not all protocols redirecting
  • 119. URL NORMALIZATION: Can be problematic and ‘crufty’ too https://en.wikipedia.org/wiki/URL_normalization
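A few of the safer normalization steps (lowercasing the scheme and host, dropping the default port, adding the root slash, removing the fragment) can be sketched with the standard library. The function name is hypothetical and the sketch deliberately ignores user info, IPv6 hosts and dot-segment removal; the riskier steps that can change meaning are exactly where normalization becomes ‘crufty’.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Apply a small set of semantics-preserving normalizations.

    Path case and query parameters are left untouched because changing
    them may alter which resource is addressed.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))


normalized = normalize_url("HTTP://WWW.Example.com:80/Resume#top")
```

Comparing normalized forms across a crawl export is one way to spot several addresses that resolve to the same page.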
  • 120. REDUCTION & REPOPULATION OF INTERNAL LINK POPULARITY (IBP) BETWEEN URL SCHEDULING. SEMANTIC ‘CONTEXT’ & IBP BUCKET IS LEAKING. IT’S NOT ONLY THEIR ‘INTERNAL PAGE RANK’ BUT ALSO THE ANCHORS, INTER-CONNECTING CONCEPTUAL / TOPIC RELEVANCE IN CONTENT AND THE TEXT SURROUNDING INTERNAL LINK ANCHORS (AND PROBABLY OTHER THINGS TOO). SEMANTIC ‘CLUES’ WERE LOST ALONG THE WAY.
  • 121. SOLUTION: Wiki Page Redirects on Topics. Wikipedia Redirects (https://dbpedia.org/sparql), thesaurus.com OR A GOOD OLD FASHIONED THESAURUS
  • 122. Understand How URLs with Multiple Parameters Are Handled: The most restrictive parameter blocked overrules lesser restrictions
  • 123. THE USE OF REUSE TABLES. TABLE I: Reuse Table Example (https://www.google.com/patents/US8042112)
      URL Record No.   URL Fingerprint (FP)   Reuse Type                     If Modified Since
      1                2123242                REUSE
      2                2323232                REUSE IF NOT MODIFIED SINCE    Feb. 5, 2004
      3                3343433                DOWNLOAD
      . . .            . . .                  . . .                          . . .
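As a rough reading of the table, a crawler could look up a URL fingerprint and decide whether to reuse its stored copy, reuse it only if the page has not changed since a given date, or download it again. The sketch below is an interpretation for illustration only: the function name, the dictionary layout and the fallback to downloading unknown fingerprints are assumptions, not the method claimed in the patent.

```python
from datetime import date

# Rows taken from the example table; the layout is an assumption.
REUSE_TABLE = {
    2123242: ("REUSE", None),
    2323232: ("REUSE IF NOT MODIFIED SINCE", date(2004, 2, 5)),
    3343433: ("DOWNLOAD", None),
}


def crawl_decision(fingerprint, server_last_modified=None):
    """Return 'reuse' or 'download' for a URL fingerprint.

    `server_last_modified` is the date the server reports the page last
    changed, if known. Unknown fingerprints are downloaded (assumption).
    """
    reuse_type, since = REUSE_TABLE.get(fingerprint, ("DOWNLOAD", None))
    if reuse_type == "REUSE":
        return "reuse"
    if (
        reuse_type == "REUSE IF NOT MODIFIED SINCE"
        and server_last_modified is not None
        and server_last_modified <= since
    ):
        return "reuse"
    return "download"
```

For record 2, a page last modified on Jan. 20, 2004 would be reused, while one modified on Mar. 1, 2004 would be downloaded again.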
  • 124. REMEMBER: “Gone is Never Gone”. “Search Engines Never Forget”. Dawn Anderson @dawnieando
  • 126. Sources & References • Bar-Yossef, Z., Keidar, I. and Schonfeld, U., 2009. Do not crawl in the DUST: different URLs with similar text. ACM Transactions on the Web (TWEB), 3(1), p.3. • Broder, A.Z., Najork, M. and Wiener, J.L., 2003, May. Efficient URL caching for world wide web crawling. In Proceedings of the 12th international conference on World Wide Web (pp. 679-689). ACM. • Cambazoglu, B.B. and Baeza-Yates, R., 2011. Scalability challenges in web search engines. In Advanced topics in information retrieval (pp. 27-50). Springer Berlin Heidelberg. • Cho, J., Garcia-Molina, H. and Page, L., 1998. Efficient crawling through URL ordering. Computer Networks and ISDN Systems, 30(1), pp.161-172. • Fetterly, D., Manasse, M., Najork, M. and Wiener, J., 2003, May. A large-scale study of the evolution of web pages. In Proceedings of the 12th international conference on World Wide Web (pp. 669-678). ACM.
  • 127. Sources & References • Olston, C. and Najork, M., 2010. Web crawling. Foundations and Trends® in Information Retrieval, 4(3), pp.175-246. • Pandey, S. and Olston, C., 2008, February. Crawl ordering by search impact. In Proceedings of the 2008 International Conference on Web Search and Data Mining (pp. 3-14). ACM. • Olston, C. and Pandey, S., 2008, April. Recrawl scheduling based on information longevity. In Proceedings of the 17th international conference on World Wide Web (pp. 437-446). ACM. • Pandey, S. and Olston, C., 2005, May. User-centric web crawling. In Proceedings of the 14th international conference on World Wide Web (pp. 401-411). ACM.
  • 128. Sources & References • https://patentimages.storage.googleapis.com/US8042112B1/US08042112-20111018-D00000.png • Randall, K.H., Google Inc., 2010. Scheduler for search engine crawler. U.S. Patent 7,725,452.