SEO ‘Crawl Tank’ - ‘Death and Resurrection’
WHY YOU SHOULD CARE ABOUT TAKING CARE OF CRAWLS (INTELLIGENT USE OF CRAWL ALLOCATION (BUDGET))
THE QUEST FOR ‘CRAWL RANK’
Dawn Anderson @dawnieando
THE WEB IS ‘BIG’
Indexed Web contains at least 4.73 billion pages (13/11/2015)
[Chart: total number of websites per year, 2000 to 2014; vertical axis from 250,000,000 to 1,000,000,000]
SINCE 2013 THE WEB IS THOUGHT TO HAVE INCREASED IN SIZE BY 1/3
THE ABILITY TO ‘SELF PUBLISH’ EASILY HAS CLEARLY INFLUENCED THIS - WE ALL ‘LOVE CONTENT’
IMPORTANT TO NOTE THAT 75% OF WEBSITES ONLINE ARE DORMANT (E.G. PARKED DOMAINS)
IMAGINE HOW MANY UNIQUE URLs COMBINED THIS AMOUNTS TO? - A LOT
http://www.internetlivestats.com/total-number-of-websites/
TOO MUCH CONTENT
Capacity limits on Google’s crawling system
How have search engines responded?
• By prioritising URLs for crawling
• By assigning crawl period intervals to URLs
• By creating work ‘schedules’ for Googlebots
HERE’S WHY -> EVERYTHING HAS A FINITE CAPACITY (EVEN CRAWLING)
“While web pages can be manually selected for crawling, this becomes impracticable as the number of web pages grows. Moreover, to keep within the capacity limits of the crawler, automated selection mechanisms are needed to determine not only which web pages to crawl, but which web pages to avoid crawling. For instance, as of the end of 2003, the WWW is believed to include well in excess of 10 billion distinct documents or web pages, while a search engine may have a crawling capacity that is less than half as many documents.” - Scheduler for search engine crawler, Google Patent US 8042112 B1 (Zhu et al)
SOME GOOGLE CRAWL SCHEDULER PATENTS
Include:
• ‘Managing items in crawl schedule’ - US 8666964 B1
• ‘Scheduling a recrawl’ - US 8386459 B1
• ‘Web crawler scheduler that utilizes sitemaps from websites’ - US 8037054 B2
• ‘Document reuse in a search engine crawler’ - US 8707312 B1
• ‘Minimizing visibility of stale content in web searching including revising web crawl intervals of documents’ - US 8407204 B2
• ‘Scheduler for search engine crawler’ - US 8042112 B1
• ‘Distributed crawling of hyperlinked documents’ - US 7305610 B1
IT SEEMS PRIORITIZATION AND GOOGLEBOT CRAWL EFFICIENCY ARE IMPORTANT TO SEARCH ENGINES
“MANAGING ITEMS IN A CRAWL SCHEDULE” (GOOGLE PATENT US 8666964 B1)
3 layers / tiers:
• Real Time Crawl: crawled multiple times daily
• Daily Crawl: crawled daily or bi-daily
• Base Layer Crawl: crawled least, on a ‘round robin’ basis; split into segments on random rotation, and only the ‘active’ segment is crawled
URLs are moved in and out of layers based on past visits data (retrieved from logs)
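The three-layer idea above can be sketched in a few lines of Python. This is an illustration only: the hour thresholds, the function names (`assign_layer`, `split_into_segments`, `active_segment`) and the use of a simple mean of past change intervals are all assumptions made for the example, not details taken from the patent.

```python
import random
from statistics import mean

# Hypothetical thresholds in hours; the patent does not publish real values.
REAL_TIME_MAX_HOURS = 6
DAILY_MAX_HOURS = 48


def assign_layer(change_intervals_hours):
    """Pick a crawl layer from past-visit data (observed hours between changes)."""
    if not change_intervals_hours:
        return "base"  # no history yet: start in the least-crawled layer
    average = mean(change_intervals_hours)
    if average <= REAL_TIME_MAX_HOURS:
        return "real_time"
    if average <= DAILY_MAX_HOURS:
        return "daily"
    return "base"


def split_into_segments(urls, segment_count, seed=0):
    """Shuffle base-layer URLs and deal them into segments (random rotation)."""
    shuffled = list(urls)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::segment_count] for i in range(segment_count)]


def active_segment(segments, cycle):
    """Round robin: only one segment is 'active' in any given crawl cycle."""
    return segments[cycle % len(segments)]


# Made-up past-visit data: hours between observed changes per URL.
history = {
    "/news": [2, 3, 1],
    "/category/shoes": [20, 30, 26],
    "/about": [900, 1200],
}
layers = {url: assign_layer(intervals) for url, intervals in history.items()}
segments = split_into_segments([f"/old/{n}" for n in range(6)], segment_count=3)
```

Here a URL moves between layers simply because its recorded change intervals shift, which is the behaviour the slide describes; a real scheduler would presumably weigh importance as well.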
PAGE ‘IMPORTANCE’ AND URL SCHEDULING
THE KEY SEARCH ENGINE (THE APPLIANCE) CHARACTERS
• 10 types of Googlebot
• Indexer / Ranking Engine
• The URL Scheduler
SUPPORTING ROLES (LOG MANAGERS & PAGE RANKERS)
• History Logs
• Link Logs / Link Maps
• Anchor Logs / Anchor Maps
• Status Logs
• Page Rankers
THE ‘LOG’ MANAGERS (‘The Clerks’)
Consider these as ‘record-keepers’ (record info on the crawled URLs)
History Logs - JOBS INCLUDE:
• Retrieves previous copies of documents for comparison with newly retrieved copies for purposes of ‘change frequency’ and ‘change weight’ calculation (last modified & update rate)
Link Logs - JOBS INCLUDE:
• “identifies all the links (e.g., URLs, also called outbound links) that are found in the document associated with the record and the text that surrounds the link” (Brawer et al, Google Patent)
• INFO USED TO MAKE LINK MAPS
Other Logs - Include:
• Anchor Logs & Maps
• Status Logs
A LOT MORE INFO ON LOGS AT: Scheduler for Search Engine Crawler, US 20100241621 A1
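As a rough illustration of the history log’s ‘change frequency’ job, the sketch below counts how often a URL’s stored checksum differs between consecutive visits. The `change_frequency` function, the log layout and the sample data are hypothetical; the patents do not spell out this calculation.

```python
from datetime import datetime


def change_frequency(visits):
    """Return (changes, changes_per_day) for a list of (datetime, checksum) visits."""
    ordered = sorted(visits)
    if len(ordered) < 2:
        return 0, 0.0  # not enough history to compare copies
    changes = sum(
        1 for (_, before), (_, after) in zip(ordered, ordered[1:]) if before != after
    )
    span_days = (ordered[-1][0] - ordered[0][0]).total_seconds() / 86400
    rate = changes / span_days if span_days > 0 else 0.0
    return changes, rate


# Made-up history log entries for one URL: (visit time, content checksum).
history_log = [
    (datetime(2016, 4, 1), "a1"),
    (datetime(2016, 4, 3), "a1"),
    (datetime(2016, 4, 5), "b7"),
    (datetime(2016, 4, 9), "c2"),
]
changes, per_day = change_frequency(history_log)
```

With this sample data the URL changed twice over eight days, so a scheduler working from such a log could reasonably revisit it every few days rather than every few hours.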
SUPERVISOR - TEAM LEADER - ‘THE URL SCHEDULER’
Think of it as Google’s line manager or ‘air traffic controller’ for Googlebots in the web crawling system
JOBS
• Schedules Googlebot visits to URLs
• Decides which URLs to ‘feed’ to Googlebot
• Uses data from the history logs about past visits
• Assigns visit regularity of Googlebot to URLs
• Drops ‘hints’ to Googlebot to guide on types of content NOT to crawl and excludes some URLs from schedules
• Analyses past ‘change’ periods and predicts future ‘change’ periods for URLs (BASED ON PAST VISIT DATA) for the purposes of scheduling Googlebot visits
• Checks ‘page importance’ in scheduling visits (PRIORITIES)
• Assigns URLs to ‘layers / tiers’ for crawling schedules (REAL TIME, DAILY, BASE LAYER SEGMENT)
The URL Scheduler controls the meal planner
• Scheduler checks URLs for ‘importance’, ‘boost factor’ candidacy, ‘probability of modification’
• ‘Budgets’ are allocated
• Carefully controls the list of URLs Googlebot visits
THE 10 GOOGLEBOTS
GOOGLEBOT WEB SEARCH
MEDIA TYPES: Image, Video, News
PAID SEARCH TYPES: Adsense, Adsbot
MOBILE TYPES: Smartphone, Apps, Featurephone, Mobile Adsense
BOT TYPES HAVE VARYING DEGREES OF ‘BUSY-NESS’
Crawls images only
Quality Checks
Babybot (‘the Noob’)
GOOGLEBOT JOBS
JOBS
• ‘Ranks nothing at all’
• Takes a list of URLs to crawl from URL Scheduler
• Job varies based on ‘bot’ type (e.g. Image bot seems a bit of a ‘part timer’ (images change less frequently))
• Runs errands & makes deliveries for the URL server, indexer / ranking engine and logs
• Makes notes of outbound linked pages and additional links for future crawling (in order for them to be assigned to future crawling schedules)
• Takes notes of ‘hints’ from URL scheduler when crawling
• Tells tales of URL accessibility status, server response codes, notes relationships between links and collects content checksums (binary data equivalent of web content) for comparison with past visits by history and link logs
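To make the ‘content checksum’ idea concrete, here is a minimal sketch of fingerprinting a fetched page and comparing it with the value stored on the previous visit. SHA-256 is used purely as an assumption for illustration (the slides do not say which checksum Google uses), and `content_checksum`, `has_changed` and the dictionary-based log are hypothetical names.

```python
import hashlib


def content_checksum(body: bytes) -> str:
    """Fingerprint page content; SHA-256 is an illustrative choice only."""
    return hashlib.sha256(body).hexdigest()


def has_changed(url, body, history_log):
    """Compare with the checksum from the last visit, then store the new one."""
    new_checksum = content_checksum(body)
    previous = history_log.get(url)
    history_log[url] = new_checksum
    # A first visit has nothing to compare against, so it is not a 'change'.
    return previous is not None and previous != new_checksum


log = {}
first_visit = has_changed("/shoes", b"<h1>Shoes</h1>", log)
same_again = has_changed("/shoes", b"<h1>Shoes</h1>", log)
after_edit = has_changed("/shoes", b"<h1>Shoes - new range</h1>", log)
```

Note that a whole-page checksum flips on any byte-level difference, which is why trivial or random changes can look like ‘change’ without being meaningful.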
‘INDEXER’
Looks at all of the evidence from the various logs (and the page rankers) of the search engine to index the URLs
• Uses the combined data collected in order to index the results for a given query
• TAKES DATA FROM THE LOGS TO GENERATE INDEXES
“The indexer(s) 724 use the anchor maps 718 and other logs 716 to generate index(es) 726. The index(es) are used by the search engine to identify documents matching queries entered by users of the search engine.” (Web crawler scheduler that utilizes sitemaps from websites, US 8037054 B2, Google Patent, Brawer et al, pub 2011)
I ASKED JOHN MUELLER AT WEBMASTER HANGOUT ABOUT URL QUEUES
GOOGLE WEBMASTER HANGOUT QUESTION ON ‘URL QUEUEING’
BUT WHAT OTHER EVIDENCE DO WE HAVE TO SUPPORT OUR THEORIES?
“URLS ARE NOT ALL CRAWLED IN ORDER, BUT THAT SOME RECEIVE MULTIPLE DAILY CRAWLS, SOME DAILY, SOME WEEKLY AND SOME VERY INFREQUENTLY”
https://www.seroundtable.com/google-explains-why-the-search-console-has-reporting-delays-21688.html
LOW IMPORTANCE URLs APPEAR TO BE ‘QUEUED FOR LATER’ AND VISITED INFREQUENTLY WHEN THERE IS SPARE CAPACITY (LOWER PRIORITY) (SCHEDULES)
WHICH APPEARED TO SUPPORT…
“Priority scores are computed for each remaining document identifier based on predetermined criteria (e.g., a page importance score of the document).” (Zhu et al, 2011)
PATENT - Scheduler for search engine crawler, US 8042112 B1
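The priority-score idea in the quote above can be sketched as picking the highest-scoring URLs until crawl capacity runs out, with the rest ‘queued for later’. The scores, the `pick_urls_to_crawl` name and the fixed `capacity` are assumptions for illustration; the patent does not give the scoring criteria in this form.

```python
import heapq


def pick_urls_to_crawl(importance_by_url, capacity):
    """Return the `capacity` highest-priority URLs, highest score first."""
    return heapq.nlargest(capacity, importance_by_url, key=importance_by_url.get)


# Made-up page importance scores.
scores = {
    "/": 0.9,
    "/category": 0.6,
    "/product": 0.4,
    "/product?sort=asc": 0.05,
}
crawl_list = pick_urls_to_crawl(scores, capacity=2)
queued_for_later = [url for url in scores if url not in crawl_list]
```

In this toy run the low-importance parameter URL never makes the cut while capacity is tight, which mirrors the ‘visited infrequently when there is spare capacity’ observation.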
CRAWL BUDGET
1. CRAWL BUDGET - “AN ALLOCATION OF CRAWL VISITS TO A HOST”
2. ROUGHLY PROPORTIONATE TO PAGERANK AND HOST SPEED / CAPACITY
3. PAGES WITH A LOT OF LINKS GET CRAWLED MORE
4. THE VAST MAJORITY OF URLS ON THE WEB DON’T GET A LOT OF BUDGET ALLOCATED TO THEM (LOW TO 0 PAGERANK URLS).
Mostly taken from Eric Enge’s interview with Matt Cutts (@mattcutts) from 2010
https://www.stonetemple.com/matt-cutts-interviewed-by-eric-enge-2/
I ASKED SOME STUFF ABOUT CRAWL BUDGET ALLOCATION
DISTRIBUTED CRAWLING OF HYPERLINKED DOCUMENTS - Patent Abstract - “Hyperlinked documents to be crawled are grouped by host and the host to be crawled next is selected according to a stall time of the host. The stall time can indicate the earliest time that the host should be crawled and the stall times can be a predetermined amount of time, vary by host and be adjusted according to actual retrieval times from the host” (Dean et al (Google, 2014))
IT SEEMS - BUDGET IS ASSIGNED TO THE HOST (I.P) AND THEN SHARED BETWEEN THE SITES THERE
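The patent abstract describes grouping URLs by host and always crawling the host whose ‘stall time’ comes up first. A minimal simulation of that idea follows; the `crawl_order` function, the `base_stall` value and the retrieval times are made up for the example and are not taken from the patent.

```python
import heapq


def crawl_order(urls_by_host, base_stall=1.0):
    """Simulate crawling: always pick the host with the earliest stall time.

    urls_by_host maps host -> list of (url, simulated_retrieval_seconds).
    """
    queues = {host: list(urls) for host, urls in urls_by_host.items()}
    heap = [(0.0, host) for host in sorted(queues)]
    heapq.heapify(heap)
    schedule = []
    while heap:
        ready_at, host = heapq.heappop(heap)
        if not queues[host]:
            continue
        url, retrieval_time = queues[host].pop(0)
        schedule.append((ready_at, host, url))
        if queues[host]:
            # A slow host is stalled for longer before its next visit.
            heapq.heappush(heap, (ready_at + base_stall + retrieval_time, host))
    return schedule


hosts = {
    "fast.example": [("/a", 0.1), ("/b", 0.1)],
    "slow.example": [("/x", 2.0), ("/y", 2.0)],
}
schedule = crawl_order(hosts)
visited = [url for _, _, url in schedule]
```

Because the slow host’s stall time grows with its retrieval time, the fast host gets its second URL fetched before the slow host is revisited, which is one way host speed could feed into how much crawling a host receives.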
I ASKED SOME STUFF ABOUT LINKS AND CRAWL BUDGET (in light of 2012 ‘DISAVOW TOOL’)
TIP (IMHO - DAWN) - YOU MAY NEED TO RESTRUCTURE / FLATTEN SO ‘BUDGET’ CAN REACH IMPORTANT URLS
“Thanks John” - Waving :)
IT SEEMS THERE ARE MORE FACTORS AFFECTING ‘CRAWL BUDGET’??
WEB PROMOS Q & A WITH GOOGLE’S ANDREY LIPATTSEV
Transcript: https://searchenginewatch.com/2016/04/06/webpromos-qa-with-googles-andrey-lipattsev-transcript/
Andrey chatting with Ammon J seemed to imply that a lot more things affect crawl frequency now than just PageRank
ARE THERE OTHER FACTORS AFFECTING BUDGET AND / OR ‘CRAWL RANK’ AS WELL AS PAGERANK AND SPEED?
SO I ASKED IF I COULD ASK IF FACTORS AFFECTING CRAWL BUDGET / CRAWL FREQUENCY HAD CHANGED - I.E. ADDITIONAL FACTORS?
I ASKED @johnmu IF I COULD ASK WHETHER THE FACTORS AFFECTING CRAWL BUDGET HAD CHANGED?
JOHN SAID - “Sure…You can always ask” :) :)
“But, he didn’t tell me what they were (if any)”
GOOGLE PATENT - ‘NOT ALL ‘CHANGE’ IS CONSIDERED EQUAL’ (CRITICAL & NON-CRITICAL)
“Changes can be described as critical or non-critical and that determination may depend on the portion of the document changed, or the context of the changes, rather than the amount of text or content changed. Sometimes a change to a document may be insubstantial, e.g., the change of advertisements associated with a document. In this case, it is more appropriate to ignore those accessory materials in a document prior to making content comparisons. In other cases, e.g., as part of a product search, not every piece of information in a document is weighted equally by a potential user. For instance, the user may care more about the unit price of the product and the availability of the product. In this case, it is more appropriate to focus on the changes associated with information that is deemed critical to a potential user rather than something that is less significant, e.g., a change in a product's colour” (Minimizing Visibility of Stale Content in Web Searching Including Revising Web Crawl Intervals of Documents - Anton Carver, Google Patent - US 20130226897 A1, pub 2013)
Probability & predictability of future ‘freshness’ (newness or critical material change) (‘CHANGE RATE’ APPEARS TO BE ‘LEARNED’)
‘CHANGE RATE & CHANGE WEIGHT THRESHOLDS’
CRITICAL MATERIAL CONTENT CHANGE (IMPORTANT CHANGE) & FEATURE WEIGHTS
\[ C = \sum_{i=0}^{n-1} \mathrm{weight}_i \times \mathrm{feature}_i \]
NOT JUST ‘RANDOM’ CHANGE like Shuffle($variable) or RAND($variable)
NOT ALL ‘FEATURES’ ARE CREATED EQUAL ACCORDING TO THIS LINE IN PATENTS - “weight_i * feature_i”
EXAMPLE FEATURES - E.G. A CHANGE IN PRICE (FEATURE) MAY BE WEIGHTED HIGHER THAN A CHANGE IN COLOUR (FEATURE) - FEATURE WEIGHT PRICE > FEATURE WEIGHT COLOUR
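A tiny worked example of the weighted sum may help. The feature names and weights below are invented for illustration (the patent does not list actual weights); each feature is 1 if that part of the page changed and 0 if it did not.

```python
def change_score(weights, features):
    """C = sum of weight_i * feature_i over every weighted feature."""
    return sum(weights[name] * features.get(name, 0) for name in weights)


# Hypothetical weights: price matters most, colour least.
weights = {"price": 0.6, "availability": 0.3, "colour": 0.1}

price_change = change_score(weights, {"price": 1})
colour_change = change_score(weights, {"colour": 1})
both_changed = change_score(weights, {"price": 1, "colour": 1})
```

Under these assumed weights a price change scores six times higher than a colour change, so only the former would be likely to count as a ‘critical material change’ against any threshold.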
“DEPENDS ON HOW OFTEN THE PAGE CHANGES” IS MENTIONED A LOT IN WEBMASTER HANGOUTS
Minimizing Visibility of Stale Content in Web Searching Including Revising Web Crawl Intervals of Documents - Anton Carver, Google Patent - US 20130226897 A1, pub 2013
“BE CONSISTENT” - (@johnmu, Nov 2015)
SMX MILAN (November 2015), reported here by SERoundtable on quote from Google’s John Mueller @johnmu: https://www.seroundtable.com/google-number-one-seo-advice-be-consistent-21196.html
DA - I HAVE A FEELING CONSISTENCY IS IMPORTANT FOR ‘HISTORY LOGS’ TO ‘LEARN’ CHANGE RATES / THRESHOLDS
URL EXCLUSIONS FOR ‘TRIPPING ‘MINIMUM-CRAWL-THRESHOLD’ REVISIT ‘HINTS’ AND ‘SPAM’ URLs
‘RANDOM’ CHANGE created programmatically like Shuffle($variable) or RAND($variable) may even be seen as ‘hints’ TO GOOGLEBOT TO ‘NOT’ CRAWL
HINTS = ‘MEH CHANGES’ (E.G. PATTERNS OF ‘SAME OLD, SAME OLD STUFF’, DUPLICATES, PROGRAMMATICALLY GENERATED CONTENT)
“Hints may also be employed on pages that are automatically generated and/or contain dynamically generated elements that result in the page having a different checksum every time it is crawled” (Managing Items In A Crawl Schedule, Google Patent - US 8666964 B1)
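One way to picture a page that has ‘a different checksum every time it is crawled’ is to check whether the last few recorded checksums are all distinct. This is a guess at how such a pattern could be spotted, not the patent’s method; `always_changing` and the `min_visits` value are hypothetical.

```python
def always_changing(checksums, min_visits=5):
    """True if each of the last `min_visits` crawls produced a new checksum."""
    recent = checksums[-min_visits:]
    if len(recent) < min_visits:
        return False  # too little history to call it a pattern
    return len(set(recent)) == len(recent)


# Made-up checksum histories from successive crawls.
shuffled_listing = ["c1", "c2", "c3", "c4", "c5", "c6"]
stable_page = ["a1", "a1", "a1", "b2", "b2", "b2"]

flag_shuffled = always_changing(shuffled_listing)
flag_stable = always_changing(stable_page)
```

A URL flagged this way is changing on every visit without necessarily offering anything new, which fits the ‘meh changes’ description above.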
GOOGLE THINKS CRAWL BUDGET IS IMPORTANT FOR SEO
CIRCA JULY 2015
BUT… NO ONE HAS EVER OFFICIALLY SAID THAT THERE’S ANY KIND OF RANKING BENEFIT FROM POSITIVE CRAWL ACTIVITY
ENTER ‘CRAWL RANK’ - A BENEFIT OF CRAWL OPTIMISATION??
“The pages that aren’t crawled as often are pages with little to no PageRank. CrawlRank is the difference in this very large pool of pages. You win if you get your low PageRank pages crawled more frequently than the competition.”
“I’m still not entirely convinced this is what is happening, but I’m seeing success using this philosophy.” - A J Kohn @ajkohn
OTHERS SEEM TO BE TRACKING IT TOO - E.G. SEO CLARITY
DOES THE MYTHOLOGICAL ‘CRAWL RANK’ BENEFIT EVEN EXIST?
DOES ‘CRAWL RANK’ STILL APPLY?
I ASKED A J KOHN IF HE STILL THOUGHT IT APPLIED NOW?
“I still see evidence that getting pages crawled frequently (within 7-10 days) seems to have an impact on their ability to rank well” (AJ Kohn, 2016)
“Thanks A.J” - Waving :)
IS LONG-TAIL ‘LEAP-FROGGING’ (AND SOME CLUSTERING) WHAT ‘CRAWL RANK’ LOOKS LIKE?
SITES JUMPING OVER EACH OTHER ON ‘LONG TAILED QUERIES’ IN AN ENDLESS LAST LAP RACE?
HOW IT APPEARS TO WORK - ‘YOU DON’T ALWAYS HAVE TO FIGHT THE ‘BOSS’ URLS’
Why fight with the Hulk when you can be Yoda?
Image Credit: Flickr
EVEN STRONGER DOMAINS HAVE WEAKER URLS
THE SITES MAY ALL BE STRONGER THAN YOU BUT THERE ARE A LOT OF PAGES ON BIG SITES WITH NO STRENGTH
YOU WON’T BEAT THE STRONG URLs WITH CRAWL OPTIMISATION ALONE
You are unlikely to beat these URLs with crawl optimisation techniques alone. These URLs are not the intended target for these tactics - TOO STRONG
SAVE SOME BATTLES FOR LATER
Strong URLs
FIGHT AT A URL V URL OR TEMPLATE V TEMPLATE LEVEL WITH LOW TO 0 PAGE RANK URLS
PICK OFF THE WEAKER URLS WHEN BATTLING WITH A BIG SITE - LOW TO NO PAGE RANK URLS
• TARGETS THE LOW STRENGTH PAGES FURTHER DOWN IN THE SITES OF COMPETITORS (SUBCATEGORY PAGES E.G. IN ECOMMERCE SITES)
• THERE ARE A LOT OF PAGES (MILLIONS WITH LITTLE TO NO PAGE RANK)
• YOU’RE AIMING TO BEAT THOSE
VIRTUALLY NO STRENGTH IN 1,000s OF URLS
POWERFUL WELL KNOWN BRANDS BUT NO STRENGTH LOWER DOWN THE ARCHITECTURE
MANY LOW VOL / DEEP URLs ARE COMPLETE WEEDS ON BEHEMOTH SITES
Weak URLs
A BIG FACTOR? - ‘EMPHASIS OF ‘URL IMPORTANCE’’ (E.G. ON PARAMETERS)
FULL TRANSCRIPT - https://www.stonetemple.com/matt-cutts-interviewed-by-eric-enge-2/
THIS WAS IN THE ORIGINAL INTERVIEW WITH MATT CUTTS
ALSO LOTS OF THE PATENTS MENTION “PAGE IMPORTANCE (WHICH MAY INCLUDE PAGERANK)”
WHICH SEEMS TO SUPPORT THIS PAPER BY PAGE ET AL ON IMPORTANCE
Efficient Crawling Through URL Ordering - Page et al
THIS REFERENCES THE PROBLEM OF THE SIZE OF THE WEB AND PRIORITIZES IMPORTANT PAGES
“Thanks Bill” - Waving :)
‘POINT TO THE NEEDLE IN THE HAY’ - EMPHASISE IMPORTANCE
• Googlebot is also ‘hunting’… Hunting for relevant ‘needles’ in 1,000,000,000s of straws of ‘hay’ on the web
• It’s about making your ‘one needle’ stand out in importance in not just your own site’s haystack, but tens of thousands of competing similar straws of hay in other sites’ haystacks… (DON’T JUST MAKE YOUR HAYSTACK BIGGER)
“Hey, you Googlebot… This is the needle” via architectural internal linking without blur of duplication or too many redirects or canonicalization
WHICH OF YOUR URLs ARE IMPORTANT?
“If you don’t consistently indicate via clean internal individual URL importance emphasis, the importance of your URLs, how will Googlebot know which are the most important?”
INTERNAL LINKS COUNT (A LOT)
(RELATIVE IMPORTANCE VOTES ON URL IMPORTANCE FROM YOUR OWN SITE)
THESE ARE YOUR ‘VOTES’ TO GOOGLEBOT ON THE IMPORTANCE OF EACH URL
EMPLOY ‘CONSISTENT’ INTERNAL LINK STRATEGIES
THINK OF THESE AS ‘WALL-TIES’ HOLDING YOUR BUILDING (SITE ARCHITECTURE) TOGETHER
STOP VOTING FOR THE WRONG URLS FROM WITHIN YOUR OWN SITE.
WRONG TARGETS RANKING?… CHECK INTERNAL LINKS
From Google Support Pages
Consistent internal & external emphasis of a URL’s ‘IMPORTANCE’
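Checking your own internal ‘votes’ can be as simple as counting inbound internal links per target URL from a crawl export. The sketch below assumes you already have a list of (source, target) link pairs; `internal_vote_counts` and the sample URLs are hypothetical.

```python
from collections import Counter


def internal_vote_counts(links):
    """Count internal links ('votes') pointing at each target URL."""
    return Counter(target for source, target in links if source != target)


# Made-up (source page, linked page) pairs from a site crawl.
links = [
    ("/", "/shoes"),
    ("/boots", "/shoes"),
    ("/", "/shoes?sort=price"),
    ("/sale", "/shoes?sort=price"),
    ("/trainers", "/shoes?sort=price"),
]
votes = internal_vote_counts(links)
most_voted = votes.most_common(1)[0]
```

In this sample the parameter URL collects more internal votes than the clean category URL, which is the ‘voting for the wrong URLs’ situation the slide warns about.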
BUT IS THERE PERHAPS AN OPPOSITE OF ‘CRAWL RANK’? - ‘CRAWL TANK’??
NEGATIVE CONSEQUENCES FROM POOR CRAWL VISITS (E.G. SPIDER TRAPS (INFINITE LOOPS), INDIVIDUAL URLS VISITED LESS AND LESS FREQUENTLY BECAUSE THERE ARE TOO MANY)
IS THERE AN ADVERSE EFFECT WHEN CRAWLING GOES BAD?
WELL - I’VE SEEN ‘CRAWL TANK’ - IT AIN’T PRETTY
SITE SEO DEATH BY TOO MANY URLS AND INSUFFICIENT CRAWL BUDGET TO SUPPORT (EITHER DUMPING A NEW THIN PARAMETER INTO A SITE OR INFINITE LOOP (CODING ERROR) (SPIDER TRAP))
“BEEN THERE, DONE THAT”
IT KIND OF LOOKS A BIT LIKE THIS
“BEEN THERE, DONE THAT” DEFINITELY
‘EXPONENTIAL URL UNIMPORTANCE’?
Your URLs exponentially, CONSISTENTLY confirmed unimportant to queries with each iterative crawl visit to other similar or duplicate content checksum URLs?
MULTIPLE RANDOM URLs competing for same query confirm irrelevance of all competing in-site URLs with no dominant relevant IMPORTANT URL?
STILL…SILVER LININGS
“EVERY SEO NEEDS A ‘FLATLINER’ SITE TO RESURRECT AND MAKE BETTER…” RIGHT?
LIKES
• Going ‘where the action is’ in sites
• The ‘need for speed’
• Logical structure
• Correct ‘response’ codes
• XML sitemaps
• ‘Successful crawl visits’
• ‘Seeing everything’ on a page
• Taking ‘hints’
• Clear unique single ‘URL fingerprints’ (no duplicates)
• Predicting likelihood of ‘future change’
DISLIKES
• Slow sites
• Too many redirects
• Being bored (Meh) (‘Hints’ are built in by the search engine systems - Takes ‘hints’)
• Being lied to (e.g. on XML sitemap priorities)
• Crawl traps and dead ends
• Going round in circles (Infinite loops)
• Spam URLs
• Crawl wasting minor content change URLs
• ‘Hidden’ and blocked content
• Uncrawlable URLs
• Duplicate URLs
CHANGE IS KEY
• Not just any change
• Critical material change
• Predicting future change
• Dropping ‘hints’ to Googlebot
• Sending Googlebot where ‘the action is’
BASED ON DATA FROM THE HISTORY LOGS - CAN WE INFLUENCE VIA CRAWL OPTIMISATION TO ESCAPE THE ‘BASE LAYER HOME’ OF THE ‘UNIMPORTANT’ URLS?
HERE’S ONE I MADE EARLIER…SOME CAVEATS
• THIS IS A PERSONAL PROJECT - MY 20 IN 70:20:10 MIX
• IT’S NOT MOBILE FRIENDLY OR HTTPS (HANGS HEAD IN SHAME), AND YES, IT NEEDS A MAKEOVER… BUT… TIME…, RESOURCES, BUDGET…BLAH BLAH
• THERE IS NO ‘BIG BRAND’ MARKETING, VC BACKING, TV OR RADIO ADS (LIKE COMPETITORS) - JUST ME - ‘CHIPPING AWAY’
• 90%+ OF TRAFFIC IS NON-BRANDED GENERIC ORGANIC
URL CRAWL FREQUENCY ‘CLOCKING’
Spreadsheet provided by @johnmu during Webmaster Hangout: https://goo.gl/1pToL8
ARE THE URLS THAT YOU WANT BEING CRAWLED ‘REAL TIME’, DAILY OR INFREQUENTLY? (REGULAR LOG ANALYSIS AND INTERVENTION TO EMPHASISE IMPORTANCE)
MY THOUGHTS (DA) - You need to find out which ones are getting crawled in the ‘real time’ schedule, the ‘daily crawl’ schedule and via random selection in the ‘dross’ (or UNLIKELY TO CHANGE A LOT / UNIMPORTANT) ‘base layer’ section. If it’s not the URLs that you want to be there, then formulate a plan to improve the ‘importance’ of URLS. (NOTE: JOHN DID NOT SAY THIS)
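A rough way to ‘clock’ crawl frequency yourself is to pull Googlebot requests out of server access logs and average the gap between visits per URL. The sketch assumes the common ‘combined’ log format and made-up sample lines; matching on the user-agent string alone is a simplification, since user agents can be spoofed and a real audit should also verify the requesting IP.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumes the 'combined' access log format; adjust for your own server.
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} .*"(?P<agent>[^"]*)"$'
)


def googlebot_visits(lines):
    """Map each requested path to the times a Googlebot user agent fetched it."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match and "Googlebot" in match.group("agent"):
            when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            visits[match.group("path")].append(when)
    return visits


def average_interval_days(times):
    """Mean number of days between consecutive visits, or None if under two visits."""
    ordered = sorted(times)
    if len(ordered) < 2:
        return None
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [01/Apr/2016:10:00:00 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'66.249.66.1 - - [03/Apr/2016:10:00:00 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'66.249.66.1 - - [07/Apr/2016:10:00:00 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "{BOT}"',
    '203.0.113.9 - - [02/Apr/2016:09:30:00 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
visits = googlebot_visits(sample_log)
shoes_interval = average_interval_days(visits["/shoes"])
```

Sorting URLs by this average interval gives a first impression of which ones behave as if they sit in a ‘real time’, ‘daily’ or ‘base layer’ style schedule.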
LOSE THE ‘DEAD WOOD’ SO GOOGLEBOT DETECTS ‘IMPORTANCE’
FIX IT FOR A BETTER CRAWL
EMBRACE THE ‘410 GONE’
FLATTENING ARCHITECTURES, CONSISTENTLY AVOIDING CANNIBALISATION, INTERNAL LINK STRATEGIES, LINKING RELEVANT CONTENT TO RELEVANT CONTENT, UTILISING XML & FRONT FACING SITEMAPS AND STRONG HUB PAGES TO ‘HERD’ GOOGLEBOT AROUND THE SITE
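To see what ‘flattening’ does in practice, you can measure click depth from the home page with a breadth-first search over the internal link graph. The `click_depths` function and the small site graph are hypothetical; a real graph would come from a crawl of your own site.

```python
from collections import deque


def click_depths(link_graph, start="/"):
    """Breadth-first search: fewest clicks from `start` to each reachable URL."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Made-up site: the product is only reachable through paginated listings.
deep_site = {
    "/": ["/shoes", "/hub"],
    "/shoes": ["/shoes/page-2"],
    "/shoes/page-2": ["/shoes/red-trainers"],
    "/hub": [],
}
before = click_depths(deep_site)

# Same site after adding a hub page link straight to the product.
flatter_site = dict(deep_site, **{"/hub": ["/shoes/red-trainers"]})
after = click_depths(flatter_site)
```

A single hub link brings the product from three clicks deep to two, which is the kind of shortened crawl path the slide is recommending.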
40,000 TOWNS, CITIES & VILLAGES
40,000+ towns, cities and villages across the UK multiplied by X site categories (THAT’S A LOT OF LONG TAIL QUERY VOLUME)
FWIW - LONG TAIL CRAWL TECHNIQUES SEEM TO APPLY TO OTHER SEARCH ENGINES TOO
By shortening crawl paths and crawl frequency intervals and emphasising importance to subcategory URLs on frequently changed URLs (fresh) it appears you may gain a competitive advantage on long tail queries
IT’S ALIVE… NEEDS WORK… BUT ALIVE
CAVEAT: IT’S TOO COMPLEX TO ANSWER WITH A SIMPLE FEW EXAMPLES OF COURSE (TOO MANY FACTORS) - BUT… FOOD FOR THOUGHT
‘CRITICAL MATERIAL CHANGE FREQUENCY’ (FRESHNESS) AND DETECTED URL IMPORTANCE EMPHASIS VIA EXTERNAL OR INTERNAL SIGNALS (INC PAGERANK) SEEM KEY
IS IT ‘CRAWL RANK’ OR ‘EMPHASISING URL IMPORTANCE’ BETTER THAN COMPETITORS EMPHASISE IMPORTANCE OF LOW TO NO PAGERANK PAGES WHERE FEW OTHER FACTORS SEPARATE?
CRAWL BUDGET & ‘CRAWL RANK’ - OTHER FACTORS??
1. IT APPEARS TO BE APPORTIONED BY THE URL SCHEDULER (BUDGET)
2. PAGES WITH A LOT OF (HEALTHY??) LINKS GET CRAWLED MORE (EXTERNAL AND INTERNAL?) (BUDGET AND RANK?)
3. THERE ARE URL EXCLUSIONS - (‘HINT TRIPPERS’, OBJECTIONABLE CONTENT AND ‘SPAM URLS’??) (BUDGET)
4. ‘CRITICAL MATERIAL CHANGE’ (FRESHNESS) AND THE PROBABILITY AND PREDICTABILITY OF CHANGE CORRELATE (BUDGET)
5. ‘CONSISTENT’ EMPHASIS OF URL IMPORTANCE (BUT I THINK THAT THIS WAS ALWAYS THERE) MAY BE ‘CRAWL RANK’ (BUDGET AND RANK??)
‘CRAWL RANK’ - IS IT CORRELATION OR CAUSATION? (DO IMPORTANT PAGES GET CRAWLED MORE, OR IS IT BECAUSE THEY ARE CRAWLED MORE THEY ARE IMPORTANT?)
CAN WEB PAGES CRAWLED INFREQUENTLY STILL RANK?
YES
THEY CAN STILL BE ‘IMPORTANT’
IT’S THE ONES YOU’RE INDICATING ARE UNIMPORTANT THAT YOU WANT TO KEEP AN EYE ON - #JUSTSAYING ;)
“BE SMART ABOUT YOUR TAGS AND SITE ARCHITECTURE, STAY FRESH AND RELEVANT” (@maileohye, 2016)
SLIDE FROM APRIL 2016’S SEJSUMMIT ON SEO INSTRUCTIONS 2016 FROM GOOGLE’S @maileohye
EITHER WAY - ARE ALL THE CHECKS AND BALANCES INDICATING YOU ARE STILL ON TRACK?
BECAUSE - BRINGING A ROCKET BACK ON COURSE IS ‘CHALLENGING’
REGULAR TESTS AND EARLY DIAGNOSIS ARE CRUCIAL - STOP, CHECK AND KEEP CHECKING
‘TANK’ OR ‘RANK’? - YOU DECIDE
TWITTER - @dawnieando
GOOGLE+ - +DawnAnderson888
LINKEDIN - msdawnanderson
THANKS FOR LISTENING FOLKS :)
Dawn Anderson @dawnieando
ENJOY BRIGHTON SEO
REFERENCES
• http://www.internetlivestats.com/total-number-of-websites/
• Scheduler for search engine crawler, Google Patent US 8042112 B1 (Zhu et al) - https://www.google.com/patents/US8707313
• Managing items in crawl schedule, Google Patent (Alpert) - http://www.google.ch/patents/US8666964
• Document reuse in a search engine crawler, Google Patent (Zhu et al) - https://www.google.com/patents/US8707312
• Web crawler scheduler that utilizes sitemaps (Brawer et al) - http://www.google.com/patents/US8037054
• Distributed crawling of hyperlinked documents (Dean et al) - http://www.google.co.uk/patents/US7305610
• Minimizing visibility of stale content (Carver) - http://www.google.ch/patents/US20130226897
• Efficient Crawling Through URL Ordering (Page et al) - http://oak.cs.ucla.edu/~cho/papers/cho-order.pdf
• Crawl Optimisation (Blind Five Year Old - A J Kohn - @ajkohn) - http://www.blindfiveyearold.com/crawl-optimization
• Scheduling a recrawl (Auerbach) - http://www.google.co.uk/patents/US8386459
• Scheduler for search engine crawler (Zhu et al) - http://www.google.co.uk/patents/US8042112
• Google Explains Why The Search Console Reporting Is Not Real Time (SERoundtable) - https://www.seroundtable.com/google-explains-why-the-search-console-has-reporting-delays-21688.html
• Crawl Data Aggregation Propagation (Mueller) - https://goo.gl/1pToL8
• Matt Cutts Interviewed By Eric Enge - https://www.stonetemple.com/matt-cutts-interviewed-by-eric-enge-2/
• Web Promo Q and A with Google’s Andrey Lipattsev - https://searchenginewatch.com/2016/04/06/webpromos-qa-with-googles-andrey-lipattsev-transcript/
• Google Number 1 SEO Advice - Be Consistent - https://www.seroundtable.com/google-number-one-seo-advice-be-consistent-21196.html

Digital Olympus Technical SEO Findings Whilst Taming An SEO BeastDigital Olympus Technical SEO Findings Whilst Taming An SEO Beast
Digital Olympus Technical SEO Findings Whilst Taming An SEO Beast
Dawn Anderson MSc DigM
 
BrightonSEO - The Search Universe - Links, Log Files, GSC and everything in b...
BrightonSEO - The Search Universe - Links, Log Files, GSC and everything in b...BrightonSEO - The Search Universe - Links, Log Files, GSC and everything in b...
BrightonSEO - The Search Universe - Links, Log Files, GSC and everything in b...
Jon Myers
 
An SEO's Guide to Website Migrations | Faye Watt | BrightonSEO's Advanced Tec...
An SEO's Guide to Website Migrations | Faye Watt | BrightonSEO's Advanced Tec...An SEO's Guide to Website Migrations | Faye Watt | BrightonSEO's Advanced Tec...
An SEO's Guide to Website Migrations | Faye Watt | BrightonSEO's Advanced Tec...
Faye Watt
 
BrightonSEO 2017 - SEO quick wins from a technical check
BrightonSEO 2017  - SEO quick wins from a technical checkBrightonSEO 2017  - SEO quick wins from a technical check
BrightonSEO 2017 - SEO quick wins from a technical check
Chloe Bodard
 
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Dawn Anderson MSc DigM
 
the SEO cyborg - Moz 2018 (full edition)
the SEO cyborg - Moz 2018 (full edition)the SEO cyborg - Moz 2018 (full edition)
the SEO cyborg - Moz 2018 (full edition)
Alexis Sanders
 
Competitive On Site Optimization
Competitive On Site OptimizationCompetitive On Site Optimization
Competitive On Site Optimization
Sean Si
 
The Technical SEO Renaissance
The Technical SEO RenaissanceThe Technical SEO Renaissance
The Technical SEO Renaissance
Michael King
 
What a search engine can teach you about product sitemaps - BrightonSEO April...
What a search engine can teach you about product sitemaps - BrightonSEO April...What a search engine can teach you about product sitemaps - BrightonSEO April...
What a search engine can teach you about product sitemaps - BrightonSEO April...
Pricesearcher
 
Mark Osborne - Brighton SEO April 2019 The Seedy Underbelly of Keyword Resear...
Mark Osborne - Brighton SEO April 2019 The Seedy Underbelly of Keyword Resear...Mark Osborne - Brighton SEO April 2019 The Seedy Underbelly of Keyword Resear...
Mark Osborne - Brighton SEO April 2019 The Seedy Underbelly of Keyword Resear...
Mark Osborne
 
HOW TO INCREASE YOUR TRAFFIC 5X WITH THIS ONE SEO METHOD
HOW TO INCREASE YOUR TRAFFIC 5X WITH THIS ONE SEO METHODHOW TO INCREASE YOUR TRAFFIC 5X WITH THIS ONE SEO METHOD
HOW TO INCREASE YOUR TRAFFIC 5X WITH THIS ONE SEO METHOD
Christoph C. Cemper
 

What's hot (20)

SEO Cannibalisation of Your Own SEO Success
SEO Cannibalisation of Your Own SEO SuccessSEO Cannibalisation of Your Own SEO Success
SEO Cannibalisation of Your Own SEO Success
 
Pubcon florida 2018 logs dont lie dawn anderson
Pubcon florida 2018 logs dont lie dawn andersonPubcon florida 2018 logs dont lie dawn anderson
Pubcon florida 2018 logs dont lie dawn anderson
 
Conflicting Website Signals & Confused Search Engines - Rachel Costello, Tech...
Conflicting Website Signals & Confused Search Engines - Rachel Costello, Tech...Conflicting Website Signals & Confused Search Engines - Rachel Costello, Tech...
Conflicting Website Signals & Confused Search Engines - Rachel Costello, Tech...
 
How to Survive & Thrive after Mobile First Indexing - Rachel Costello, Techni...
How to Survive & Thrive after Mobile First Indexing - Rachel Costello, Techni...How to Survive & Thrive after Mobile First Indexing - Rachel Costello, Techni...
How to Survive & Thrive after Mobile First Indexing - Rachel Costello, Techni...
 
Creating Commerce Reviews and Considering The Case For User Generated Reviews
Creating Commerce Reviews and Considering The Case For User Generated ReviewsCreating Commerce Reviews and Considering The Case For User Generated Reviews
Creating Commerce Reviews and Considering The Case For User Generated Reviews
 
Technical SEO Beyond the Audit - Brighton SEO April 2017 - Philip Gamble
Technical SEO Beyond the Audit - Brighton SEO April 2017 - Philip GambleTechnical SEO Beyond the Audit - Brighton SEO April 2017 - Philip Gamble
Technical SEO Beyond the Audit - Brighton SEO April 2017 - Philip Gamble
 
SEO Benchmark
SEO BenchmarkSEO Benchmark
SEO Benchmark
 
Crawl Budget - Some Insights & Ideas @ seokomm 2015
Crawl Budget - Some Insights & Ideas @ seokomm 2015Crawl Budget - Some Insights & Ideas @ seokomm 2015
Crawl Budget - Some Insights & Ideas @ seokomm 2015
 
SearchLeeds 2018 - Steve Chambers - Stickyeyes - How not to F**K up a Migration
SearchLeeds 2018 - Steve Chambers - Stickyeyes - How not to F**K up a Migration SearchLeeds 2018 - Steve Chambers - Stickyeyes - How not to F**K up a Migration
SearchLeeds 2018 - Steve Chambers - Stickyeyes - How not to F**K up a Migration
 
Digital Olympus Technical SEO Findings Whilst Taming An SEO Beast
Digital Olympus Technical SEO Findings Whilst Taming An SEO BeastDigital Olympus Technical SEO Findings Whilst Taming An SEO Beast
Digital Olympus Technical SEO Findings Whilst Taming An SEO Beast
 
BrightonSEO - The Search Universe - Links, Log Files, GSC and everything in b...
BrightonSEO - The Search Universe - Links, Log Files, GSC and everything in b...BrightonSEO - The Search Universe - Links, Log Files, GSC and everything in b...
BrightonSEO - The Search Universe - Links, Log Files, GSC and everything in b...
 
An SEO's Guide to Website Migrations | Faye Watt | BrightonSEO's Advanced Tec...
An SEO's Guide to Website Migrations | Faye Watt | BrightonSEO's Advanced Tec...An SEO's Guide to Website Migrations | Faye Watt | BrightonSEO's Advanced Tec...
An SEO's Guide to Website Migrations | Faye Watt | BrightonSEO's Advanced Tec...
 
BrightonSEO 2017 - SEO quick wins from a technical check
BrightonSEO 2017  - SEO quick wins from a technical checkBrightonSEO 2017  - SEO quick wins from a technical check
BrightonSEO 2017 - SEO quick wins from a technical check
 
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of ...
 
the SEO cyborg - Moz 2018 (full edition)
the SEO cyborg - Moz 2018 (full edition)the SEO cyborg - Moz 2018 (full edition)
the SEO cyborg - Moz 2018 (full edition)
 
Competitive On Site Optimization
Competitive On Site OptimizationCompetitive On Site Optimization
Competitive On Site Optimization
 
The Technical SEO Renaissance
The Technical SEO RenaissanceThe Technical SEO Renaissance
The Technical SEO Renaissance
 
What a search engine can teach you about product sitemaps - BrightonSEO April...
What a search engine can teach you about product sitemaps - BrightonSEO April...What a search engine can teach you about product sitemaps - BrightonSEO April...
What a search engine can teach you about product sitemaps - BrightonSEO April...
 
Mark Osborne - Brighton SEO April 2019 The Seedy Underbelly of Keyword Resear...
Mark Osborne - Brighton SEO April 2019 The Seedy Underbelly of Keyword Resear...Mark Osborne - Brighton SEO April 2019 The Seedy Underbelly of Keyword Resear...
Mark Osborne - Brighton SEO April 2019 The Seedy Underbelly of Keyword Resear...
 
HOW TO INCREASE YOUR TRAFFIC 5X WITH THIS ONE SEO METHOD
HOW TO INCREASE YOUR TRAFFIC 5X WITH THIS ONE SEO METHODHOW TO INCREASE YOUR TRAFFIC 5X WITH THIS ONE SEO METHOD
HOW TO INCREASE YOUR TRAFFIC 5X WITH THIS ONE SEO METHOD
 

Viewers also liked

Brighton SEO April 2016 - Engagement Rate Optimisation
Brighton SEO April 2016 - Engagement Rate OptimisationBrighton SEO April 2016 - Engagement Rate Optimisation
Brighton SEO April 2016 - Engagement Rate Optimisation
Branded3
 
Etiquette and Branding in Your Community
Etiquette and Branding in Your CommunityEtiquette and Branding in Your Community
Etiquette and Branding in Your Community
Erica McGillivray
 
Use free Machine Learning APIs #brightonseo
Use free Machine Learning APIs #brightonseoUse free Machine Learning APIs #brightonseo
Use free Machine Learning APIs #brightonseo
Jan-Willem Bobbink - Freelance SEO Consultant
 
Harnessing the Power of Audience
Harnessing the Power of AudienceHarnessing the Power of Audience
Harnessing the Power of Audience
Koozai
 
Brighton SEO - Site Speed for Content Marketers
Brighton SEO - Site Speed for Content MarketersBrighton SEO - Site Speed for Content Marketers
Brighton SEO - Site Speed for Content Marketers
Tom Bennet
 
How Generation Z is Driving Change in Search UX: Brighton SEO 2016
How Generation Z is Driving Change in Search UX: Brighton SEO 2016How Generation Z is Driving Change in Search UX: Brighton SEO 2016
How Generation Z is Driving Change in Search UX: Brighton SEO 2016
Erudite
 
Deep diving into featured snippets: How to earn more and rise to the top.
Deep diving into featured snippets: How to earn more and rise to the top.Deep diving into featured snippets: How to earn more and rise to the top.
Deep diving into featured snippets: How to earn more and rise to the top.
Rob Bucci
 
Marketing to Local Customers: Moving Beyond Local SEO to Win the Race
Marketing to Local Customers: Moving Beyond Local SEO to Win the RaceMarketing to Local Customers: Moving Beyond Local SEO to Win the Race
Marketing to Local Customers: Moving Beyond Local SEO to Win the Race
Greg Gifford
 
Why SEO needs to get Emotional #BrightonSEO
Why SEO needs to get Emotional #BrightonSEO Why SEO needs to get Emotional #BrightonSEO
Why SEO needs to get Emotional #BrightonSEO
Lisa Myers
 
Brighton SEO - What It's Like Having GA Premium
Brighton SEO - What It's Like Having GA PremiumBrighton SEO - What It's Like Having GA Premium
Brighton SEO - What It's Like Having GA Premium
Arianne Donoghue
 
BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016
BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016
BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016
Mark Thomas
 
What Happens After You Get A New Lead?
What Happens After You Get A New Lead?What Happens After You Get A New Lead?
What Happens After You Get A New Lead?
Drift
 
Surviving Google: SEO in 2020
Surviving Google: SEO in 2020Surviving Google: SEO in 2020
Surviving Google: SEO in 2020
Peter "Dr. Pete" Meyers
 
How to Use Social Media to Influence the World
How to Use Social Media to Influence the WorldHow to Use Social Media to Influence the World
How to Use Social Media to Influence the World
Sean Si
 
Seo Logs y Big Data, Lino Uruñuela en Seonthebeach 2016
Seo Logs y Big Data, Lino Uruñuela en Seonthebeach 2016Seo Logs y Big Data, Lino Uruñuela en Seonthebeach 2016
Seo Logs y Big Data, Lino Uruñuela en Seonthebeach 2016
Lino Uruñuela
 
OSCON 2012: Design and Debug HTML5 Apps for Devices with RIB and Web Simulator
OSCON 2012: Design and Debug HTML5 Apps for Devices with RIB and Web SimulatorOSCON 2012: Design and Debug HTML5 Apps for Devices with RIB and Web Simulator
OSCON 2012: Design and Debug HTML5 Apps for Devices with RIB and Web Simulator
Gail Frederick
 
Powerful Quotes from BrightonSEO 2016 Speakers
Powerful Quotes from BrightonSEO 2016 SpeakersPowerful Quotes from BrightonSEO 2016 Speakers
Powerful Quotes from BrightonSEO 2016 Speakers
Gabriella Fonseca Ribeiro
 
BrightonSEO - Biddable Media Session
BrightonSEO - Biddable Media SessionBrightonSEO - Biddable Media Session
BrightonSEO - Biddable Media Session
Saija Mahon
 
Semantic web & structured data - #BrightonSEO
Semantic web & structured data  - #BrightonSEOSemantic web & structured data  - #BrightonSEO
Semantic web & structured data - #BrightonSEO
Jan-Willem Bobbink - Freelance SEO Consultant
 
In Pursuit of UGC: The Power of Genuine Reviews
In Pursuit of UGC: The Power of Genuine ReviewsIn Pursuit of UGC: The Power of Genuine Reviews
In Pursuit of UGC: The Power of Genuine Reviews
Feefo
 

Viewers also liked (20)

Brighton SEO April 2016 - Engagement Rate Optimisation
Brighton SEO April 2016 - Engagement Rate OptimisationBrighton SEO April 2016 - Engagement Rate Optimisation
Brighton SEO April 2016 - Engagement Rate Optimisation
 
Etiquette and Branding in Your Community
Etiquette and Branding in Your CommunityEtiquette and Branding in Your Community
Etiquette and Branding in Your Community
 
Use free Machine Learning APIs #brightonseo
Use free Machine Learning APIs #brightonseoUse free Machine Learning APIs #brightonseo
Use free Machine Learning APIs #brightonseo
 
Harnessing the Power of Audience
Harnessing the Power of AudienceHarnessing the Power of Audience
Harnessing the Power of Audience
 
Brighton SEO - Site Speed for Content Marketers
Brighton SEO - Site Speed for Content MarketersBrighton SEO - Site Speed for Content Marketers
Brighton SEO - Site Speed for Content Marketers
 
How Generation Z is Driving Change in Search UX: Brighton SEO 2016
How Generation Z is Driving Change in Search UX: Brighton SEO 2016How Generation Z is Driving Change in Search UX: Brighton SEO 2016
How Generation Z is Driving Change in Search UX: Brighton SEO 2016
 
Deep diving into featured snippets: How to earn more and rise to the top.
Deep diving into featured snippets: How to earn more and rise to the top.Deep diving into featured snippets: How to earn more and rise to the top.
Deep diving into featured snippets: How to earn more and rise to the top.
 
Marketing to Local Customers: Moving Beyond Local SEO to Win the Race
Marketing to Local Customers: Moving Beyond Local SEO to Win the RaceMarketing to Local Customers: Moving Beyond Local SEO to Win the Race
Marketing to Local Customers: Moving Beyond Local SEO to Win the Race
 
Why SEO needs to get Emotional #BrightonSEO
Why SEO needs to get Emotional #BrightonSEO Why SEO needs to get Emotional #BrightonSEO
Why SEO needs to get Emotional #BrightonSEO
 
Brighton SEO - What It's Like Having GA Premium
Brighton SEO - What It's Like Having GA PremiumBrighton SEO - What It's Like Having GA Premium
Brighton SEO - What It's Like Having GA Premium
 
BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016
BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016
BrightonSEO 5 Critical Questions Your Log Files Can Answer September 2016
 
What Happens After You Get A New Lead?
What Happens After You Get A New Lead?What Happens After You Get A New Lead?
What Happens After You Get A New Lead?
 
Surviving Google: SEO in 2020
Surviving Google: SEO in 2020Surviving Google: SEO in 2020
Surviving Google: SEO in 2020
 
How to Use Social Media to Influence the World
How to Use Social Media to Influence the WorldHow to Use Social Media to Influence the World
How to Use Social Media to Influence the World
 
Seo Logs y Big Data, Lino Uruñuela en Seonthebeach 2016
Seo Logs y Big Data, Lino Uruñuela en Seonthebeach 2016Seo Logs y Big Data, Lino Uruñuela en Seonthebeach 2016
Seo Logs y Big Data, Lino Uruñuela en Seonthebeach 2016
 
OSCON 2012: Design and Debug HTML5 Apps for Devices with RIB and Web Simulator
OSCON 2012: Design and Debug HTML5 Apps for Devices with RIB and Web SimulatorOSCON 2012: Design and Debug HTML5 Apps for Devices with RIB and Web Simulator
OSCON 2012: Design and Debug HTML5 Apps for Devices with RIB and Web Simulator
 
Powerful Quotes from BrightonSEO 2016 Speakers
Powerful Quotes from BrightonSEO 2016 SpeakersPowerful Quotes from BrightonSEO 2016 Speakers
Powerful Quotes from BrightonSEO 2016 Speakers
 
BrightonSEO - Biddable Media Session
BrightonSEO - Biddable Media SessionBrightonSEO - Biddable Media Session
BrightonSEO - Biddable Media Session
 
Semantic web & structured data - #BrightonSEO
Semantic web & structured data  - #BrightonSEOSemantic web & structured data  - #BrightonSEO
Semantic web & structured data - #BrightonSEO
 
In Pursuit of UGC: The Power of Genuine Reviews
In Pursuit of UGC: The Power of Genuine ReviewsIn Pursuit of UGC: The Power of Genuine Reviews
In Pursuit of UGC: The Power of Genuine Reviews
 

Similar to SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016

How to Optimize Your Website for Crawl Efficiency
How to Optimize Your Website for Crawl EfficiencyHow to Optimize Your Website for Crawl Efficiency
How to Optimize Your Website for Crawl Efficiency
Semrush
 
How Google WOrks?
How Google WOrks?How Google WOrks?
How Google WOrks?
07Deeps
 
IRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine OptimizationIRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine Optimization
IRJET Journal
 
How Google Works
How Google WorksHow Google Works
How Google Works
Ganesh Solanke
 
Web Crawler For Mining Web Data
Web Crawler For Mining Web DataWeb Crawler For Mining Web Data
Web Crawler For Mining Web Data
IRJET Journal
 
Web Crawler
Web CrawlerWeb Crawler
Web Crawler
iamthevictory
 
How google works and functions: A complete Approach
How google works and functions: A complete ApproachHow google works and functions: A complete Approach
How google works and functions: A complete Approach
Prakhar Gethe
 
Google Search Engine
Google Search Engine Google Search Engine
Google Search Engine
Aniket_1415
 
Dipika arora ppts
Dipika arora pptsDipika arora ppts
Dipika arora ppts
preetianeja
 
Keeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AU
Keeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AUKeeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AU
Keeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AU
Jason Mun
 
Seo Manual
Seo ManualSeo Manual
Seo Manual
imgaurav16
 
Modern SEO Players Guide
Modern SEO Players GuideModern SEO Players Guide
Modern SEO Players Guide
Michael King
 
Seo tutorial
Seo tutorialSeo tutorial
Seo tutorial
Avkash Patel
 
Web crawler
Web crawlerWeb crawler
Web crawler
anusha kurapati
 
JavaScript SEO Ungagged 2019 Patrick Stox
JavaScript SEO Ungagged 2019 Patrick StoxJavaScript SEO Ungagged 2019 Patrick Stox
JavaScript SEO Ungagged 2019 Patrick Stox
patrickstox
 
Week10
Week10Week10
Week10
kenji
 
Week10
Week10Week10
Week10
KA04YU04
 
C. Concept Mapping (Week # 3 - 7)
C. Concept Mapping (Week # 3 - 7) C. Concept Mapping (Week # 3 - 7)
C. Concept Mapping (Week # 3 - 7)
s1160202
 
Week10
Week10Week10
Week10
s1160210
 
Notes for
Notes forNotes for
Notes for
9pallen
 

Similar to SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016 (20)

How to Optimize Your Website for Crawl Efficiency
How to Optimize Your Website for Crawl EfficiencyHow to Optimize Your Website for Crawl Efficiency
How to Optimize Your Website for Crawl Efficiency
 
How Google WOrks?
How Google WOrks?How Google WOrks?
How Google WOrks?
 
IRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine OptimizationIRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine Optimization
 
How Google Works
How Google WorksHow Google Works
How Google Works
 
Web Crawler For Mining Web Data
Web Crawler For Mining Web DataWeb Crawler For Mining Web Data
Web Crawler For Mining Web Data
 
Web Crawler
Web CrawlerWeb Crawler
Web Crawler
 
How google works and functions: A complete Approach
How google works and functions: A complete ApproachHow google works and functions: A complete Approach
How google works and functions: A complete Approach
 
Google Search Engine
Google Search Engine Google Search Engine
Google Search Engine
 
Dipika arora ppts
Dipika arora pptsDipika arora ppts
Dipika arora ppts
 
Keeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AU
Keeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AUKeeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AU
Keeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AU
 
Seo Manual
Seo ManualSeo Manual
Seo Manual
 
Modern SEO Players Guide
Modern SEO Players GuideModern SEO Players Guide
Modern SEO Players Guide
 
Seo tutorial
Seo tutorialSeo tutorial
Seo tutorial
 
Web crawler
Web crawlerWeb crawler
Web crawler
 
JavaScript SEO Ungagged 2019 Patrick Stox
JavaScript SEO Ungagged 2019 Patrick StoxJavaScript SEO Ungagged 2019 Patrick Stox
JavaScript SEO Ungagged 2019 Patrick Stox
 
Week10
Week10Week10
Week10
 
Week10
Week10Week10
Week10
 
C. Concept Mapping (Week # 3 - 7)
C. Concept Mapping (Week # 3 - 7) C. Concept Mapping (Week # 3 - 7)
C. Concept Mapping (Week # 3 - 7)
 
Week10
Week10Week10
Week10
 
Notes for
Notes forNotes for
Notes for
 

More from Dawn Anderson MSc DigM

Human vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdfHuman vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdf
Dawn Anderson MSc DigM
 
Life of An SEO - Surfing The Waves of Googles Many Algorithmic Updates
Life of An SEO - Surfing The Waves of Googles Many Algorithmic UpdatesLife of An SEO - Surfing The Waves of Googles Many Algorithmic Updates
Life of An SEO - Surfing The Waves of Googles Many Algorithmic Updates
Dawn Anderson MSc DigM
 
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...
Dawn Anderson MSc DigM
 
Passage indexing is likely more important than you think
Passage indexing is likely more important than you thinkPassage indexing is likely more important than you think
Passage indexing is likely more important than you think
Dawn Anderson MSc DigM
 
Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...
Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...
Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...
Dawn Anderson MSc DigM
 
Google BERT - SMX London 2020 Virtual Conference
Google BERT - SMX London 2020 Virtual ConferenceGoogle BERT - SMX London 2020 Virtual Conference
Google BERT - SMX London 2020 Virtual Conference
Dawn Anderson MSc DigM
 
Google BERT - What SEOs and Marketers Need to Know
Google BERT - What SEOs and Marketers Need to KnowGoogle BERT - What SEOs and Marketers Need to Know
Google BERT - What SEOs and Marketers Need to Know
Dawn Anderson MSc DigM
 
Disambiguating Equiprobability in SEO Dawn Anderson Friends of Search 2020
Disambiguating Equiprobability in SEO Dawn Anderson Friends of Search 2020Disambiguating Equiprobability in SEO Dawn Anderson Friends of Search 2020
Disambiguating Equiprobability in SEO Dawn Anderson Friends of Search 2020
Dawn Anderson MSc DigM
 
2019 Tech SEO Boost Dawn Anderson Contextual Recommender Search
2019 Tech SEO Boost Dawn Anderson Contextual Recommender Search2019 Tech SEO Boost Dawn Anderson Contextual Recommender Search
2019 Tech SEO Boost Dawn Anderson Contextual Recommender Search
Dawn Anderson MSc DigM
 
Connecting The Worlds of Information Retrieval & SEO - Search solutions 2019 ...
Connecting The Worlds of Information Retrieval & SEO - Search solutions 2019 ...Connecting The Worlds of Information Retrieval & SEO - Search solutions 2019 ...
Connecting The Worlds of Information Retrieval & SEO - Search solutions 2019 ...
Dawn Anderson MSc DigM
 
Planning an SEO Strategy for a New Website - SMXL Milan 2019
Planning an SEO Strategy for a New Website - SMXL Milan 2019Planning an SEO Strategy for a New Website - SMXL Milan 2019
Planning an SEO Strategy for a New Website - SMXL Milan 2019
Dawn Anderson MSc DigM
 
Google BERT and Family and the Natural Language Understanding Leaderboard Race
Google BERT and Family and the Natural Language Understanding Leaderboard RaceGoogle BERT and Family and the Natural Language Understanding Leaderboard Race
Google BERT and Family and the Natural Language Understanding Leaderboard Race
Dawn Anderson MSc DigM
 
The User is the Query - The Rise of Predictive Proactive Search
The User is the Query - The Rise of Predictive Proactive SearchThe User is the Query - The Rise of Predictive Proactive Search
The User is the Query - The Rise of Predictive Proactive Search
Dawn Anderson MSc DigM
 
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Dawn Anderson MSc DigM
 
Using topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic searchUsing topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic search
Dawn Anderson MSc DigM
 
SEO in a Mobile First World
SEO in a Mobile First WorldSEO in a Mobile First World
SEO in a Mobile First World
Dawn Anderson MSc DigM
 
Modern Ecommerce SEO
Modern Ecommerce SEOModern Ecommerce SEO
Modern Ecommerce SEO
Dawn Anderson MSc DigM
 
Voice Search and Conversation Action Assistive Systems - Challenges & Opportu...
Voice Search and Conversation Action Assistive Systems - Challenges & Opportu...Voice Search and Conversation Action Assistive Systems - Challenges & Opportu...
Voice Search and Conversation Action Assistive Systems - Challenges & Opportu...
Dawn Anderson MSc DigM
 
The Iceberg Approach - Power from what lies beneath in SEO for a mobile-first...
The Iceberg Approach - Power from what lies beneath in SEO for a mobile-first...The Iceberg Approach - Power from what lies beneath in SEO for a mobile-first...
The Iceberg Approach - Power from what lies beneath in SEO for a mobile-first...
Dawn Anderson MSc DigM
 
SEO and The Mobile-First Paradigm Shift
SEO and The Mobile-First Paradigm ShiftSEO and The Mobile-First Paradigm Shift
SEO and The Mobile-First Paradigm Shift
Dawn Anderson MSc DigM
 

More from Dawn Anderson MSc DigM (20)

Human vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdfHuman vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdf
 
Life of An SEO - Surfing The Waves of Googles Many Algorithmic Updates
Life of An SEO - Surfing The Waves of Googles Many Algorithmic UpdatesLife of An SEO - Surfing The Waves of Googles Many Algorithmic Updates
Life of An SEO - Surfing The Waves of Googles Many Algorithmic Updates
 
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...
 
Passage indexing is likely more important than you think
Passage indexing is likely more important than you thinkPassage indexing is likely more important than you think
Passage indexing is likely more important than you think
 
Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...
Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...
Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...
 
SEO Crawl Rank And Crawl Tank - Brighton SEO April 2016

  • 1. SEO ‘Crawl Tank’ - ‘Death and Resurrection’. WHY YOU SHOULD CARE ABOUT TAKING CARE OF CRAWLS (INTELLIGENT USE OF CRAWL ALLOCATION (BUDGET)). THE QUEST FOR ‘CRAWL RANK’. Dawn Anderson @dawnieando
  • 2. THE WEB IS ‘BIG’. Indexed Web contains at least 4.73 billion pages (13/11/2015). [Chart: total number of websites per year, 2000 to 2014, on an axis running from 250,000,000 to 1,000,000,000.] SINCE 2013 THE WEB IS THOUGHT TO HAVE INCREASED IN SIZE BY 1/3.
  • 3. THE ABILITY TO ‘SELF PUBLISH’ EASILY HAS CLEARLY INFLUENCED THIS - WE ALL ‘LOVE CONTENT’. IMPORTANT TO NOTE THAT 75% OF WEBSITES ONLINE ARE DORMANT (E.G. PARKED DOMAINS). IMAGINE HOW MANY UNIQUE URLs COMBINED THIS AMOUNTS TO? - A LOT. http://www.internetlivestats.com/total-number-of-websites/
  • 4. TOO MUCH CONTENT. Capacity limits on Google’s crawling system. How have search engines responded? By prioritising URLs for crawling; by assigning crawl period intervals to URLs; by creating work ‘schedules’ for Googlebots.
  • 5. HERE’S WHY -> EVERYTHING HAS A FINITE CAPACITY (EVEN CRAWLING). “While web pages can be manually selected for crawling, this becomes impracticable as the number of web pages grows. Moreover, to keep within the capacity limits of the crawler, automated selection mechanisms are needed to determine not only which web pages to crawl, but which web pages to avoid crawling. For instance, as of the end of 2003, the WWW is believed to include well in excess of 10 billion distinct documents or web pages, while a search engine may have a crawling capacity that is less than half as many documents.” - Scheduler for search engine crawler, Google Patent US 8042112 B1 (Zhu et al)
  • 6. SOME GOOGLE CRAWL SCHEDULER PATENTS include: ‘Managing items in crawl schedule’ - US 8666964 B1; ‘Scheduling a recrawl’ - US 8386459 B1; ‘Web crawler scheduler that utilizes sitemaps from websites’ - US 8037054 B2; ‘Document reuse in a search engine crawler’ - US 8707312 B1; ‘Minimizing visibility of stale content in web searching including revising web crawl intervals of documents’ - US 8407204 B2; ‘Scheduler for search engine crawler’ - US 8042112 B1; ‘Distributed crawling of hyperlinked documents’ - US 7305610 B1. IT SEEMS PRIORITIZATION AND GOOGLEBOT CRAWL EFFICIENCY ARE IMPORTANT TO SEARCH ENGINES.
  • 7. “MANAGING ITEMS IN A CRAWL SCHEDULE” (GOOGLE PATENT US 8666964 B1). PAGE ‘IMPORTANCE’ AND URL SCHEDULING. 3 layers / tiers: Real Time Crawl (crawled multiple times daily); Daily Crawl (crawled daily or bi-daily); Base Layer Crawl (crawled least, on a ‘round robin’ basis; split into segments on random rotation; only the ‘active’ segment is crawled). URLs are moved in and out of layers based on past visits data (retrieved from logs).
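The three tiers on this slide can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patent's implementation: the change-rate thresholds, the function names and the example URLs are all hypothetical, and the base layer rotates its segments in plain round-robin order rather than at random.

```python
def assign_tier(changes_per_day):
    """Pick a tier from an observed change rate (thresholds are invented)."""
    if changes_per_day >= 2.0:
        return "real_time"
    if changes_per_day >= 0.5:
        return "daily"
    return "base"


def split_into_segments(urls, n_segments):
    """Split the base layer into n segments; only one is active per cycle."""
    return [urls[i::n_segments] for i in range(n_segments)]


def urls_for_cycle(tiers, segments, cycle_index):
    """URLs handed to the crawler in one cycle: both upper tiers plus
    whichever base-layer segment is active in this cycle."""
    active = segments[cycle_index % len(segments)] if segments else []
    return tiers["real_time"] + tiers["daily"] + active


# Hypothetical past-visit data: URL -> observed changes per day.
history = {"/news": 6.0, "/category": 1.0, "/about": 0.01,
           "/terms": 0.0, "/old-post": 0.02}

tiers = {"real_time": [], "daily": [], "base": []}
for url, rate in history.items():
    tiers[assign_tier(rate)].append(url)

segments = split_into_segments(tiers["base"], 2)
for cycle in range(2):
    print(cycle, urls_for_cycle(tiers, segments, cycle))
```

In this toy version a base-layer URL is only offered once every `len(segments)` cycles, which is the practical meaning of being stuck in the base layer.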
  • 8. THE KEY SEARCH ENGINE (THE APPLIANCE) CHARACTERS: 10 types of Googlebot; The URL Scheduler; Indexer / Ranking Engine. SUPPORTING ROLES (LOG MANAGERS & PAGE RANKERS): History Logs; Link Logs / Link Maps; Anchor Logs / Anchor Maps; Status Logs; Page Rankers.
  • 9. THE ‘LOG’ MANAGERS (‘The Clerks’). Consider these as ‘record-keepers’ (record info on the crawled URLs). History Logs, JOBS INCLUDE: Retrieves previous copies of documents for comparison with newly retrieved copies for purposes of ‘change frequency’ and ‘change weight’ calculation (last modified & update rate). Link Logs, JOBS INCLUDE: “identifies all the links (e.g., URLs, also called outbound links) that are found in the document associated with the record and the text that surrounds the link” (Brawer et al, Google Patent); INFO USED TO MAKE LINK MAPS. Other Logs include: • Anchor Logs & Maps • Status Logs. A LOT MORE INFO ON LOGS AT: Scheduler for Search Engine Crawler, US 20100241621 A1
  • 10. SUPERVISOR - TEAM LEADER - ‘THE URL SCHEDULER’. Think of it as Google’s line manager or ‘air traffic controller’ for Googlebots in the web crawling system. The URL Scheduler controls the meal planner and carefully controls the list of URLs Googlebot visits. Scheduler checks URLs for ‘importance’, ‘boost factor’ candidacy, ‘probability of modification’. ‘Budgets’ are allocated. JOBS: Schedules Googlebot visits to URLs; Decides which URLs to ‘feed’ to Googlebot; Uses data from the history logs about past visits; Assigns visit regularity of Googlebot to URLs; Drops ‘hints’ to Googlebot to guide on types of content NOT to crawl and excludes some URLs from schedules; Analyses past ‘change’ periods and predicts future ‘change’ periods for URLs (BASED ON PAST VISIT DATA) for the purposes of scheduling Googlebot visits; Checks ‘page importance’ in scheduling visits (PRIORITIES); Assigns URLs to ‘layers / tiers’ for crawling schedules (REAL TIME, DAILY, BASE LAYER SEGMENT).
  • 11. THE 10 GOOGLEBOTS. GOOGLEBOT WEB SEARCH. MEDIA TYPES: Image (crawls images only), Video, News. PAID SEARCH TYPES: Adsense, Adsbot (quality checks). MOBILE TYPES: Smartphone, Apps, Featurephone, Mobile Adsense. Babybot (‘the Noob’). BOT TYPES HAVE VARYING DEGREES OF ‘BUSY-NESS’.
  • 12. GOOGLEBOT JOBS: • ‘Ranks nothing at all’ • Takes a list of URLs to crawl from URL Scheduler • Job varies based on ‘bot’ type (e.g. Image bot seems a bit of a ‘part timer’ (images change less frequently)) • Runs errands & makes deliveries for the URL server, indexer / ranking engine and logs • Makes notes of outbound linked pages and additional links for future crawling (in order for them to be assigned to future crawling schedules) • Takes notes of ‘hints’ from URL scheduler when crawling • Tells tales of URL accessibility status, server response codes, notes relationships between links and collects content checksums (binary data equivalent of web content) for comparison with past visits by history and link logs
  • 13. ‘INDEXER’. Looks at all of the evidence from the various logs (and the page rankers) of the search engine to index the URLs. • Uses the combined data collected in order to index the results for a given query • TAKES DATA FROM THE LOGS TO GENERATE INDEXES. “The indexer(s) 724 use the anchor maps 718 and other logs 716 to generate index(es) 726. The index(es) are used by the search engine to identify documents matching queries entered by users of the search engine.” (Web crawler scheduler that utilizes sitemaps from websites, US 8037054 B2, Google Patent, Brawer et al, pub 2011)
  • 14. I ASKED JOHN MUELLER AT WEBMASTER HANGOUT ABOUT URL QUEUES (GOOGLE WEBMASTER HANGOUT QUESTION ON ‘URL QUEUEING’). “URLS ARE NOT ALL CRAWLED IN ORDER, BUT THAT SOME RECEIVE MULTIPLE DAILY CRAWLS, SOME DAILY, SOME WEEKLY AND SOME VERY INFREQUENTLY” https://www.seroundtable.com/google-explains-why-the-search-console-has-reporting-delays-21688.html LOW IMPORTANCE URLs APPEAR TO BE ‘QUEUED FOR LATER’ AND VISITED INFREQUENTLY WHEN THERE IS SPARE CAPACITY (LOWER PRIORITY) (SCHEDULES). BUT WHAT OTHER EVIDENCE DO WE HAVE TO SUPPORT OUR THEORIES?
  • 15. WHICH APPEARED TO SUPPORT… “Priority scores are computed for each remaining document identifier based on predetermined criteria (e.g., a page importance score of the document).” (Zhu et al, 2011) PATENT - Scheduler for search engine crawler, US 8042112 B1
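As a rough picture of what scheduling by priority score could look like when crawl capacity is limited, here is a minimal sketch. The importance scores, the URLs and the function name `schedule_by_priority` are hypothetical, and the patent's real criteria are not known from the slide; the point is only that low-scoring URLs fall outside the capacity and wait for a later cycle.

```python
import heapq


def schedule_by_priority(importance, capacity):
    """Return at most `capacity` URLs, highest importance score first.
    Everything else is effectively 'queued for later'."""
    return heapq.nlargest(capacity, importance, key=importance.get)


# Hypothetical page importance scores.
scores = {"/": 0.9, "/category/boilers": 0.5, "/tag/misc?page=47": 0.01}

print(schedule_by_priority(scores, capacity=2))
```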
  • 16. CRAWL BUDGET. 1. CRAWL BUDGET - “AN ALLOCATION OF CRAWL VISITS TO A HOST” 2. ROUGHLY PROPORTIONATE TO PAGERANK AND HOST SPEED / CAPACITY 3. PAGES WITH A LOT OF LINKS GET CRAWLED MORE 4. THE VAST MAJORITY OF URLS ON THE WEB DON’T GET A LOT OF BUDGET ALLOCATED TO THEM (LOW TO 0 PAGERANK URLS). Mostly taken from Eric Enge’s interview with Matt Cutts (@mattcutts) from 2010: https://www.stonetemple.com/matt-cutts-interviewed-by-eric-enge-2/
  • 17. I ASKED SOME STUFF ABOUT CRAWL BUDGET ALLOCATION. DISTRIBUTED CRAWLING OF HYPERLINKED DOCUMENTS - Patent Abstract - “Hyperlinked documents to be crawled are grouped by host and the host to be crawled next is selected according to a stall time of the host. The stall time can indicate the earliest time that the host should be crawled and the stall times can be a predetermined amount of time, vary by host and be adjusted according to actual retrieval times from the host” (Dean et al (Google, 2014)). IT SEEMS - BUDGET IS ASSIGNED TO THE HOST (I.P) AND THEN SHARED BETWEEN THE SITES THERE.
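The stall-time idea in the abstract above can be sketched as a queue of hosts ordered by the earliest time each may be crawled again. This is a toy model, not the patented method: the class name `HostQueue`, the default stall of one second and the rule "stall for at least twice the last retrieval time" are all assumptions made for illustration.

```python
import heapq


class HostQueue:
    """Group URLs by host and pick the host whose stall time expires first."""

    def __init__(self, default_stall=1.0):
        self.default_stall = default_stall
        self._heap = []   # entries are (earliest_crawl_time, host)
        self._urls = {}   # host -> list of URLs still to fetch

    def add(self, host, url, now=0.0):
        if host not in self._urls:
            self._urls[host] = []
            heapq.heappush(self._heap, (now, host))
        self._urls[host].append(url)

    def next_url(self, now):
        """Return (host, url) if some host may be crawled at `now`, else None."""
        if not self._heap or self._heap[0][0] > now:
            return None
        _, host = heapq.heappop(self._heap)
        return host, self._urls[host].pop(0)

    def report_fetch(self, host, finished_at, retrieval_seconds):
        """Re-queue the host; a slow response earns a longer stall (assumed rule)."""
        if self._urls.get(host):
            stall = max(self.default_stall, 2 * retrieval_seconds)
            heapq.heappush(self._heap, (finished_at + stall, host))
        else:
            self._urls.pop(host, None)
```

Under these assumptions a host that takes three seconds to respond is left alone for six, while a fast host is eligible again after one second, so the same allocation of time yields far more visits to the fast host.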
  • 18. I ASKED SOME STUFF ABOUT LINKS AND CRAWL BUDGET (in light of 2012 ‘DISAVOW TOOL’). TIP (IMHO - DAWN) - YOU MAY NEED TO RESTRUCTURE / FLATTEN SO ‘BUDGET’ CAN REACH IMPORTANT URLS. “Thanks John” - Waving :)
  • 19. IT SEEMS THERE ARE MORE FACTORS AFFECTING ‘CRAWL BUDGET’?? WEB PROMOS Q & A WITH GOOGLE’S ANDREY LIPATTSEV. Andrey chatting with Ammon J seemed to imply that a lot more things affect crawl frequency now than just PageRank. Transcript: https://searchenginewatch.com/2016/04/06/webpromos-qa-with-googles-andrey-lipattsev-transcript/
  • 20. ARE THERE OTHER FACTORS AFFECTING BUDGET AND / OR ‘CRAWL RANK’ AS WELL AS PAGERANK AND SPEED? I ASKED @johnmu IF I COULD ASK WHETHER THE FACTORS AFFECTING CRAWL BUDGET HAD CHANGED. JOHN SAID - “Sure…You can always ask” :) :) - “But, he didn’t tell me what they were (if any)”. SO I ASKED IF I COULD ASK IF FACTORS AFFECTING CRAWL BUDGET / CRAWL FREQUENCY HAD CHANGED - I.E. ADDITIONAL FACTORS?
  • 21. GOOGLE PATENT - ‘NOT ALL ‘CHANGE’ IS CONSIDERED EQUAL’ (CRITICAL & NON-CRITICAL). “Changes can be described as critical or non-critical and that determination may depend on the portion of the document changed, or the context of the changes, rather than the amount of text or content changed. Sometimes a change to a document may be insubstantial, e.g., the change of advertisements associated with a document. In this case, it is more appropriate to ignore those accessory materials in a document prior to making content comparisons. In other cases, e.g., as part of a product search, not every piece of information in a document is weighted equally by a potential user. For instance, the user may care more about the unit price of the product and the availability of the product. In this case, it is more appropriate to focus on the changes associated with information that is deemed critical to a potential user rather than something that is less significant, e.g., a change in a product's colour” (Minimizing Visibility of Stale Content in Web Searching Including Revising Web Crawl Intervals of Documents - Anton Carver, Google Patent - US 20130226897 A1, pub 2013). Probability & predictability of future ‘freshness’ (newness or critical material change) (‘CHANGE RATE’ APPEARS TO BE ‘LEARNED’). ‘CHANGE RATE & CHANGE WEIGHT THRESHOLDS’
  • 22. CRITICAL MATERIAL CONTENT CHANGE (IMPORTANT CHANGE) & FEATURE WEIGHTS. \( C = \sum_{i=0}^{n-1} \mathrm{weight}_i \cdot \mathrm{feature}_i \). NOT JUST ‘RANDOM’ CHANGE like Shuffle($variable) or RAND($variable). NOT ALL ‘FEATURES’ ARE CREATED EQUAL ACCORDING TO THIS LINE IN PATENTS - “weight i * feature”. EXAMPLE FEATURES - E.G. A CHANGE IN PRICE (FEATURE) MAY BE WEIGHTED HIGHER THAN A CHANGE IN COLOUR (FEATURE) - FEATURE WEIGHT PRICE > FEATURE WEIGHT COLOUR. “DEPENDS ON HOW OFTEN THE PAGE CHANGES” IS MENTIONED A LOT IN WEBMASTER HANGOUTS. Minimizing Visibility of Stale Content in Web Searching Including Revising Web Crawl Intervals of Documents - Anton Carver, Google Patent - US 20130226897 A1, pub 2013
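A worked version of the weighted sum on this slide may help. The sketch below treats each feature as 1 if that field changed between two crawls and 0 otherwise, which is a simplifying assumption; the field names, the weights and the 0.5 threshold are invented for illustration and do not come from the patent.

```python
def change_score(old, new, weights):
    """C = sum(weight_i * feature_i), with feature_i = 1 if field i changed."""
    return sum(w * (old.get(name) != new.get(name))
               for name, w in weights.items())


# Hypothetical weights: price and availability matter, colour barely, ads not at all.
WEIGHTS = {"price": 0.6, "availability": 0.3, "colour": 0.05, "ads": 0.0}
CRITICAL_THRESHOLD = 0.5  # assumed cut-off for a 'critical' change

before = {"price": 10, "availability": "in stock", "colour": "red", "ads": "A"}
price_change = dict(before, price=12)
colour_and_ads_change = dict(before, colour="blue", ads="B")

for label, after in [("price", price_change),
                     ("colour+ads", colour_and_ads_change)]:
    score = change_score(before, after, WEIGHTS)
    print(label, score, score >= CRITICAL_THRESHOLD)
```

With these made-up weights a single price change clears the threshold, while a colour change plus rotated adverts does not, which mirrors the slide's "feature weight price > feature weight colour".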
  • 23. “BE CONSISTENT” - (@johnmu, Nov 2015). SMX MILAN (November 2015), reported here by SERoundtable on quote from Google’s John Mueller @johnmu: https://www.seroundtable.com/google-number-one-seo-advice-be-consistent-21196.html DA - I HAVE A FEELING CONSISTENCY IS IMPORTANT FOR ‘HISTORY LOGS’ TO ‘LEARN’ CHANGE RATES / THRESHOLDS
  • 24. URL EXCLUSIONS FOR TRIPPING ‘MINIMUM-CRAWL-THRESHOLD’ REVISIT ‘HINTS’ AND ‘SPAM’ URLs. ‘RANDOM’ CHANGE created programmatically like Shuffle($variable) or RAND($variable) may even be seen as ‘hints’ TO GOOGLEBOT TO ‘NOT’ CRAWL. HINTS = ‘MEH CHANGES’ (E.G. PATTERNS OF ‘SAME OLD, SAME OLD STUFF’, DUPLICATES, PROGRAMMATICALLY GENERATED CONTENT). “Hints may also be employed on pages that are automatically generated and/or contain dynamically generated elements that result in the page having a different checksum every time it is crawled” (Managing Items In A Crawl Schedule, Google Patent - US 8666964 B1)
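The checksum point in the patent quote is easy to demonstrate. In this sketch a page with one randomly generated element gets a different checksum on every visit even though its real content is unchanged. The SHA-256 hash, the `<!--dynamic-->` markers and the function names are assumptions chosen for the example; the slide does not say how a search engine actually computes or filters checksums.

```python
import hashlib
import re


def checksum(html):
    """Checksum of the full page body."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


# Hypothetical markers wrapped around a programmatically generated block.
DYNAMIC = re.compile(r"<!--dynamic-->.*?<!--/dynamic-->", re.DOTALL)


def stable_checksum(html):
    """Checksum after dropping the dynamically generated block."""
    return checksum(DYNAMIC.sub("", html))


visit_1 = "<h1>Widgets</h1><!--dynamic-->Random pick: 7<!--/dynamic-->"
visit_2 = "<h1>Widgets</h1><!--dynamic-->Random pick: 3<!--/dynamic-->"

print(checksum(visit_1) == checksum(visit_2))                # False
print(stable_checksum(visit_1) == stable_checksum(visit_2))  # True
```

Seen from the outside, the first comparison looks like constant change and the second shows that nothing of substance moved, which is the 'meh change' pattern the slide describes.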
  • 25. GOOGLE THINKS CRAWL BUDGET IS IMPORTANT FOR SEO (CIRCA JULY 2015). BUT… NO ONE HAS EVER OFFICIALLY SAID THAT THERE’S ANY KIND OF RANKING BENEFIT FROM POSITIVE CRAWL ACTIVITY
  • 26. ENTER ‘CRAWL RANK’ - A BENEFIT OF CRAWL OPTIMISATION?? “The pages that aren’t crawled as often are pages with little to no PageRank. CrawlRank is the difference in this very large pool of pages. You win if you get your low PageRank pages crawled more frequently than the competition.” “I’m still not entirely convinced this is what is happening, but I’m seeing success using this philosophy.” - A J Kohn @ajkohn. OTHERS SEEM TO BE TRACKING IT TOO - E.G. SEO CLARITY. DOES THE MYTHOLOGICAL ‘CRAWL RANK’ BENEFIT EVEN EXIST?
  • 27. DOES ‘CRAWL RANK’ STILL APPLY? I ASKED A J KOHN IF HE STILL THOUGHT IT APPLIED NOW. “I still see evidence that getting pages crawled frequently (within 7-10 days) seems to have an impact on their ability to rank well” (AJ Kohn, 2016). “Thanks A.J” - Waving :)
  • 28. IS LONG-TAIL ‘LEAP-FROGGING’ (AND SOME CLUSTERING) WHAT ‘CRAWL RANK’ LOOKS LIKE? SITES JUMPING OVER EACH OTHER ON ‘LONG TAILED QUERIES’ IN AN ENDLESS LAST LAP RACE?
  • 29. HOW IT APPEARS TO WORK - ‘YOU DON’T ALWAYS HAVE TO FIGHT THE ‘BOSS’ URLS’. Why fight with the Hulk when you can be Yoda? Image Credit: Flickr
  • 30. EVEN STRONGER DOMAINS HAVE WEAKER URLS. THE SITES MAY ALL BE STRONGER THAN YOU BUT THERE ARE A LOT OF PAGES ON BIG SITES WITH NO STRENGTH. YOU WON’T BEAT THE STRONG URLs WITH CRAWL OPTIMISATION ALONE. Strong URLs: You are unlikely to beat these URLs with crawl optimisation techniques alone. These URLs are not the intended target for these tactics - TOO STRONG. SAVE SOME BATTLES FOR LATER.
  • 31. FIGHT AT A URL V URL OR TEMPLATE V TEMPLATE LEVEL WITH LOW TO 0 PAGE RANK URLS. PICK OFF THE WEAKER URLS WHEN BATTLING WITH A BIG SITE - LOW TO NO PAGE RANK URLS. • TARGETS THE LOW STRENGTH PAGES FURTHER DOWN IN THE SITES OF COMPETITORS (SUBCATEGORY PAGES E.G. IN ECOMMERCE SITES) • THERE ARE A LOT OF PAGES (MILLIONS WITH LITTLE TO NO PAGE RANK) • YOU’RE AIMING TO BEAT THOSE. Weak URLs: VIRTUALLY NO STRENGTH IN 1,000s OF URLS. POWERFUL WELL KNOWN BRANDS BUT NO STRENGTH LOWER DOWN THE ARCHITECTURE. MANY LOW VOL / DEEP URLs ARE COMPLETE WEEDS ON BEHEMOTH SITES.
  • 32. A BIG FACTOR? - ‘EMPHASIS OF URL IMPORTANCE’ (E.G. ON PARAMETERS). THIS WAS IN THE ORIGINAL INTERVIEW WITH MATT CUTTS. FULL TRANSCRIPT - https://www.stonetemple.com/matt-cutts-interviewed-by-eric-enge-2/ ALSO LOTS OF THE PATENTS MENTION “PAGE IMPORTANCE (WHICH MAY INCLUDE PAGERANK)”
  • 33. WHICH SEEMS TO SUPPORT THIS PAPER BY PAGE ET AL ON IMPORTANCE: Efficient Crawling Through URL Ordering (Page et al). THIS REFERENCES THE PROBLEM OF THE SIZE OF THE WEB AND PRIORITIZES IMPORTANT PAGES. “Thanks Bill” - Waving :)
  • 34. ‘POINT TO THE NEEDLE IN THE HAY’ - EMPHASISE IMPORTANCE. • Googlebot is also ‘hunting’… Hunting for relevant ‘needles’ in 1,000,000,000s of straws of ‘hay’ on the web • It’s about making your ‘one needle’ stand out in importance in not just your own site’s haystack, but tens of thousands of competing similar straws of hay in other sites’ haystacks… (DON’T JUST MAKE YOUR HAYSTACK BIGGER). “Hey, you Googlebot… This is the needle” via architectural internal linking without blur of duplication or too many redirects or canonicalization
  • 35. WHICH OF YOUR URLs ARE IMPORTANT? “If you don’t consistently indicate, via clean internal individual URL importance emphasis, the importance of your URLs, how will Googlebot know which are the most important?”
  • 36. INTERNAL LINKS COUNT (A LOT) (RELATIVE IMPORTANCE VOTES ON URL IMPORTANCE FROM YOUR OWN SITE). THESE ARE YOUR ‘VOTES’ TO GOOGLEBOT ON THE IMPORTANCE OF EACH URL. EMPLOY ‘CONSISTENT’ INTERNAL LINK STRATEGIES. THINK OF THESE AS ‘WALL-TIES’ HOLDING YOUR BUILDING (SITE ARCHITECTURE) TOGETHER. STOP VOTING FOR THE WRONG URLS FROM WITHIN YOUR OWN SITE. WRONG TARGETS RANKING?… CHECK INTERNAL LINKS. From Google Support Pages: Consistent internal & external emphasis of a URL’s ‘IMPORTANCE’
  • 37. BUT IS THERE PERHAPS AN OPPOSITE OF ‘CRAWL RANK’? - ‘CRAWL TANK’?? IS THERE AN ADVERSE EFFECT WHEN CRAWLING GOES BAD? NEGATIVE CONSEQUENCES FROM POOR CRAWL VISITS (E.G. SPIDER TRAPS (INFINITE LOOPS), INDIVIDUAL URLS VISITED LESS AND LESS FREQUENTLY BECAUSE THERE ARE TOO MANY)
  • 38. WELL - I’VE SEEN ‘CRAWL TANK’ - IT AIN’T PRETTY. SITE SEO DEATH BY TOO MANY URLS AND INSUFFICIENT CRAWL BUDGET TO SUPPORT (EITHER DUMPING A NEW THIN PARAMETER INTO A SITE OR INFINITE LOOP (CODING ERROR) (SPIDER TRAP)). “BEEN THERE, DONE THAT”
  • 39. IT KIND OF LOOKS A BIT LIKE THIS. “BEEN THERE, DONE THAT” DEFINITELY
  • 40. ‘EXPONENTIAL URL UNIMPORTANCE’? Your URLs exponentially, CONSISTENTLY confirmed unimportant to queries with each iterative crawl visit to other similar or duplicate content checksum URLs? MULTIPLE RANDOM URLs competing for same query confirm irrelevance of all competing in-site URLs with no dominant relevant IMPORTANT URL?
  • 41. STILL…SILVER LININGS. “EVERY SEO NEEDS A ‘FLATLINER’ SITE TO RESURRECT AND MAKE BETTER…” RIGHT?
  • 42. LIKES: Going ‘where the action is’ in sites; The ‘need for speed’; Logical structure; Correct ‘response’ codes; XML sitemaps; ‘Successful’ crawl visits; ‘Seeing everything’ on a page; Taking ‘hints’; Clear unique single ‘URL fingerprints’ (no duplicates); Predicting likelihood of ‘future change’. DISLIKES: Slow sites; Too many redirects; Being bored (Meh) (‘Hints’ are built in by the search engine systems - Takes ‘hints’); Being lied to (e.g. On XML sitemap priorities); Crawl traps and dead ends; Going round in circles (Infinite loops); Spam URLs; Crawl wasting minor content change URLs; ‘Hidden’ and blocked content; Uncrawlable URLs; Duplicate URLs. CHANGE IS KEY: Not just any change; Critical material change; Predicting future change; Dropping ‘hints’ to Googlebot; Sending Googlebot where ‘the action is’. BASED ON DATA FROM THE HISTORY LOGS - CAN WE INFLUENCE VIA CRAWL OPTIMISATION TO ESCAPE THE ‘BASE LAYER HOME’ OF THE ‘UNIMPORTANT’ URLS?
  • 43. HERE’S ONE I MADE EARLIER…SOME CAVEATS. THIS IS A PERSONAL PROJECT - MY 20 IN 70:20:10 MIX. IT’S NOT MOBILE FRIENDLY OR HTTPS (HANGS HEAD IN SHAME), AND YES, IT NEEDS A MAKEOVER… BUT… TIME…, RESOURCES, BUDGET…BLAH BLAH. THERE IS NO ‘BIG BRAND’ MARKETING, VC BACKING, TV OR RADIO ADS (LIKE COMPETITORS) - JUST ME - ‘CHIPPING AWAY’. 90%+ OF TRAFFIC IS NON-BRANDED GENERIC ORGANIC
  • 44. URL CRAWL FREQUENCY ‘CLOCKING’: Spreadsheet provided by @johnmu during a Webmaster Hangout (https://goo.gl/1pToL8). ARE THE URLS THAT YOU WANT BEING CRAWLED IN ‘REAL TIME’, DAILY OR INFREQUENTLY? (REGULAR LOG ANALYSIS AND INTERVENTION TO EMPHASISE IMPORTANCE.) MY THOUGHTS (DA) - You need to find out which URLs are being crawled on the ‘real time’ schedule, which on the ‘daily crawl’ schedule and which via random selection in the ‘dross’ (UNLIKELY TO CHANGE A LOT / UNIMPORTANT) ‘base layer’ section. If they are not the URLs that you want to be there, then formulate a plan to improve the ‘importance’ of those URLs. (NOTE: JOHN DID NOT SAY THIS.)
  • 45. LOSE THE ‘DEAD WOOD’ SO GOOGLEBOT DETECTS ‘IMPORTANCE’. FIX IT FOR A BETTER CRAWL. EMBRACE THE ‘410 GONE’. FLATTENING ARCHITECTURES, CONSISTENTLY AVOIDING CANNIBALISATION, INTERNAL LINK STRATEGIES, LINKING RELEVANT CONTENT TO RELEVANT CONTENT, UTILISING XML & FRONT FACING SITEMAPS AND STRONG HUB PAGES TO ‘HERD’ GOOGLEBOT AROUND THE SITE.
  • 46. 40,000 TOWNS, CITIES & VILLAGES: 40,000+ towns, cities and villages across the UK multiplied by X site categories (THAT’S A LOT OF LONG TAIL QUERY VOLUME)
  • 47. FWIW – LONG TAIL CRAWL TECHNIQUES SEEM TO APPLY TO OTHER SEARCH ENGINES TOO. By shortening crawl paths and crawl frequency intervals, and emphasising importance to subcategory URLs on frequently changed (fresh) URLs, it appears you may gain a competitive advantage on long tail queries.
  • 48. IT’S ALIVE… NEEDS WORK… BUT ALIVE. CAVEAT: IT’S TOO COMPLEX TO ANSWER WITH A SIMPLE FEW EXAMPLES OF COURSE (TOO MANY FACTORS) – BUT… FOOD FOR THOUGHT. ‘CRITICAL MATERIAL CHANGE FREQUENCY’ (FRESHNESS) AND DETECTED URL IMPORTANCE EMPHASIS VIA EXTERNAL OR INTERNAL SIGNALS (INC PAGERANK) SEEM KEY. IS IT ‘CRAWL RANK’, OR ‘EMPHASISING URL IMPORTANCE’ BETTER THAN COMPETITORS EMPHASISE THE IMPORTANCE OF LOW TO NO PAGERANK PAGES, WHERE FEW OTHER FACTORS SEPARATE THEM?
  • 49. CRAWL BUDGET & ‘CRAWL RANK’ – OTHER FACTORS?? 1. IT APPEARS TO BE APPORTIONED BY THE URL SCHEDULER (BUDGET). 2. PAGES WITH A LOT OF (HEALTHY??) LINKS GET CRAWLED MORE (EXTERNAL AND INTERNAL?) (BUDGET AND RANK?). 3. THERE ARE URL EXCLUSIONS – (‘HINT TRIPPERS’, OBJECTIONABLE CONTENT AND ‘SPAM URLS’??) (BUDGET). 4. ‘CRITICAL MATERIAL CHANGE’ (FRESHNESS) AND THE PROBABILITY AND PREDICTABILITY OF CHANGE CORRELATE (BUDGET). 5. ‘CONSISTENT’ EMPHASIS OF URL IMPORTANCE (BUT I THINK THAT THIS WAS ALWAYS THERE) MAY BE ‘CRAWL RANK’ (BUDGET AND RANK??). ‘CRAWL RANK’ - IS IT CORRELATION OR CAUSATION? (DO IMPORTANT PAGES GET CRAWLED MORE, OR IS IT BECAUSE THEY ARE CRAWLED MORE THAT THEY ARE IMPORTANT?)
  • 50. CAN WEB PAGES CRAWLED INFREQUENTLY STILL RANK? YES. THEY CAN STILL BE ‘IMPORTANT’. IT’S THE ONES YOU’RE INDICATING ARE UNIMPORTANT THAT YOU WANT TO KEEP AN EYE ON - #JUSTSAYING ;)
  • 51. “BE SMART ABOUT YOUR TAGS AND SITE ARCHITECTURE, STAY FRESH AND RELEVANT” (@maileohye, 2016). SLIDE FROM APRIL 2016’S SEJSUMMIT ON SEO INSTRUCTIONS 2016 FROM GOOGLE’S @maileohye
  • 52. EITHER WAY - ARE ALL THE CHECKS AND BALANCES INDICATING YOU ARE STILL ON TRACK? BECAUSE BRINGING A ROCKET BACK ON COURSE IS ‘CHALLENGING’. REGULAR TESTS AND EARLY DIAGNOSIS ARE CRUCIAL – STOP, CHECK AND KEEP CHECKING. ‘TANK’ OR ‘RANK’? – YOU DECIDE
  • 53. TWITTER - @dawnieando; GOOGLE+ - +DawnAnderson888; LINKEDIN - msdawnanderson. THANKS FOR LISTENING FOLKS :) Dawn Anderson @dawnieando. ENJOY BRIGHTON SEO
  • 54. REFERENCES: Total number of websites - http://www.internetlivestats.com/total-number-of-websites/; Scheduler for search engine crawler, Google Patent US 8042112 B1 (Zhu et al) - https://www.google.com/patents/US8707313; Managing items in crawl schedule, Google Patent (Alpert) - http://www.google.ch/patents/US8666964; Document reuse in a search engine crawler, Google Patent (Zhu et al) - https://www.google.com/patents/US8707312; Web crawler scheduler that utilizes sitemaps (Brawer et al) - http://www.google.com/patents/US8037054; Distributed crawling of hyperlinked documents (Dean et al) - http://www.google.co.uk/patents/US7305610; Minimizing visibility of stale content (Carver) - http://www.google.ch/patents/US20130226897
  • 55. REFERENCES: Efficient Crawling Through URL Ordering (Page et al) - http://oak.cs.ucla.edu/~cho/papers/cho-order.pdf; Crawl Optimisation (Blind Five Year Old – A J Kohn - @ajkohn) - http://www.blindfiveyearold.com/crawl-optimization; Scheduling a recrawl (Auerbach) - http://www.google.co.uk/patents/US8386459; Scheduler for search engine crawler (Zhu et al) - http://www.google.co.uk/patents/US8042112; Google Explains Why The Search Console Reporting Is Not Real Time (SERoundtable) - https://www.seroundtable.com/google-explains-why-the-search-console-has-reporting-delays-21688.html; Crawl Data Aggregation Propagation (Mueller) - https://goo.gl/1pToL8; Matt Cutts Interviewed By Eric Enge - https://www.stonetemple.com/matt-cutts-interviewed-by-eric-enge-2/; Web Promo Q and A with Google’s Andrey Lipattsev - https://searchenginewatch.com/2016/04/06/webpromos-qa-with-googles-andrey-lipattsev-transcript/; Google Number 1 SEO Advice – Be Consistent - https://www.seroundtable.com/google-number-one-seo-advice-be-consistent-21196.html
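
The URL crawl frequency ‘clocking’ slide suggests regular log analysis to see which URLs sit in the ‘real time’, ‘daily’ and ‘base layer’ crawl schedules. A minimal sketch of that idea follows. It assumes server logs in Combined Log Format and identifies Googlebot by user agent string only. The function names and the crawls-per-day thresholds are hypothetical and chosen purely for illustration; they do not come from the talk or from Google.

```python
import re
from collections import defaultdict
from datetime import datetime

# Combined Log Format (assumed):
# ip ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical thresholds (crawls per day), for illustration only.
REAL_TIME_PER_DAY = 5
DAILY_PER_DAY = 1


def googlebot_hits(lines):
    """Collect request timestamps per URL path for Googlebot user agents.

    User agents can be spoofed, so this is only a first-pass filter.
    Lines that do not match the assumed log format are skipped.
    """
    hits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            hits[match.group("path")].append(stamp)
    return hits


def classify(timestamps, window_days):
    """Bucket one URL by its average crawls per day over the log window."""
    per_day = len(timestamps) / window_days
    if per_day >= REAL_TIME_PER_DAY:
        return "real time"
    if per_day >= DAILY_PER_DAY:
        return "daily"
    return "base layer"


def crawl_buckets(lines, window_days):
    """Map each crawled URL path to its assumed crawl schedule bucket."""
    return {
        path: classify(stamps, window_days)
        for path, stamps in googlebot_hits(lines).items()
    }


if __name__ == "__main__":
    # "access.log" is a placeholder path; window_days is the span the log covers.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for path, bucket in sorted(crawl_buckets(handle, window_days=30).items()):
            print(bucket, path)
```

Comparing the resulting buckets against the URLs you actually want crawled often is the point of the exercise: any priority URL that lands in the ‘base layer’ bucket is a candidate for the importance-emphasis work described in the slides.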