#pubcon
Presented by: Dawn Anderson
@dawnieando
‘Myths, Facts And Theories On Crawl
Budget And The Importance Of ‘URL
Importance Optimization’’
#pubcon
Dawn Anderson
• Move It Marketing
• University Lecturer – Digital Marketing
• From Manchester, UK (rains a lot)
• International SEO Consultant – 10+ yrs in SEO
• Pomeranian pooch lover - Bert
• Fascinated by crawling (practice & academia)
• Doesn’t fare well in YouTube screen grabs ;P
• Party trick: Remembering UK postcode areas
(US Zip code equivalent)
• Search Awards Judge
• Twitter chatterer @dawnieando
#pubcon
Defining Crawl Budget
‘Host Load’ = What can you handle?
+
‘URL Scheduling’ = What is important to crawl & how often?
#pubcon
Myths About Crawl Budget
#pubcon
Myth – It’s All About Just My Site, Right?
• NO – HOST LOAD is apportioned at an IP
level and shared amongst the sites
there (Host load)
#pubcon
Host Load - When Will This Matter?
• It’s more about server capacity than SEO TBH
• Your site is massive (e.g. similar in size to ‘Amazon’)
• Your site is massive and you’re on shared hosting
• You’re using a CDN and your site is massive
• You have lots of large subdomains sharing space
• Crawlable test or staging sites
• You have ‘infinite loops’ and ‘spider traps’
• You keep throwing server errors during
crawling
‘Average’ sites don’t
normally hit the payload
(‘host load’)
#pubcon
Myth - Google Search Console
Crawl Stats Is Where It’s At Right?
#pubcon
GSC Crawl Stats Is Not Really
Just ‘Web Pages’
• Includes ALL CSS, JS, Zip,
XML, PDF, AMP, HTML
files crawled
• Pages are NOT just single
webpages
https://support.google.com/webmasters/answer/35253
Not just ‘web pages’
#pubcon
Visits By ALL The 10 Types Of Googlebots Are
Recorded Together In GSC
Web, Image, News, Video, Feature Phone, Smartphone, Mobile Adsense, Adsense, Adsbot, App Crawler
ALL The Googlebot Family
#pubcon
It Also Includes All 200 And 30X
Responses
• That massive crawl you thought
you just got on new or existing
pages (200 OKs) could also
be many, many 30X redirections
• Especially when using * wildcard
redirections on large sites
• NO 400, 500, robotted or
unreachables are recorded here
https://support.google.com/webmasters/answer/35253
#pubcon
GSC Doesn’t Even Show You WHAT URLs
Have Been Crawled & When
It will likely be just a few URLs being crawled very often, some very rarely and most others
somewhere in between – YOU NEED TO KNOW
#pubcon
REALITY – Server Logs & Log Analysis Is
Where It’s At
AUTOMATE SERVER LOG
RETRIEVAL VIA CRON JOB
grep Googlebot access_log > googlebot_access.txt
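A minimal Python sketch of taking that grep one step further: counting Googlebot requests per URL and status code, which is the detail GSC Crawl Stats does not give you. It assumes the common Apache/Nginx "combined" log format; the regex, the function name googlebot_hits and the sample lines are illustrative assumptions, not from the slides. User-agent strings can be spoofed, so a real analysis would likely also verify the requesting IP.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# IP - - [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count requests per (path, status) where the user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), match.group("status"))] += 1
    return hits

# Hypothetical sample lines, for illustration only.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
sample = [
    f'66.249.66.1 - - [10/Oct/2016:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'66.249.66.1 - - [10/Oct/2016:14:10:02 +0000] "GET / HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'66.249.66.1 - - [10/Oct/2016:14:11:09 +0000] "GET /old-page HTTP/1.1" 301 0 "-" "{BOT}"',
    f'203.0.113.7 - - [10/Oct/2016:14:12:45 +0000] "GET / HTTP/1.1" 200 5120 "-" "{BROWSER}"',
    f'66.249.66.1 - - [10/Oct/2016:14:13:30 +0000] "GET /missing HTTP/1.1" 404 312 "-" "{BOT}"',
]

for (path, status), count in googlebot_hits(sample).most_common():
    print(path, status, count)
```

Unlike GSC Crawl Stats, the log also surfaces the 404 and shows which URLs are hit repeatedly.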
#pubcon
Use Tools Or Just Export, Convert Data
& Use Mr Mu’s Spreadsheet
Spreadsheet - https://goo.gl/1pToL8
#pubcon
For The Avoidance Of Doubt –
I Asked To Be Sure
#pubcon
Why Does This Matter?
On A Large Site You Need To Be Able To
See Through ‘Spider Eyes’
You need to see what
Googlebot
‘REALLY’ thinks of
your site
#pubcon
Myth – It’s The No Of ‘Pages’ Crawled In
GSC Crawl Stats Divided By Days
For all of the reasons
in the previous 7+
slides
#pubcon
Myth – Googlebot Crawls Through Your
Website From One End To The Other
Then Starts Again
• This is where it gets complicated
• Web crawl efficiency is key
• There is an order to things
• Minimizing visibility of existing stale content is
key too – the rest of the web is changing
• Fresh results are vital to searchers
#pubcon
“What I Think You Are Talking About Is
Scheduling” (Illyes, Google)
Remember that time when Mr Mu
kicked Andrey under the table?
(joking JJ)
#pubcon
Why Web Crawling Efficiency?
“WE ARE ALL
PUBLISHERS”
THE NUMBER OF WEBSITES
DOUBLED BETWEEN 2011
AND 2012
AND GREW AGAIN BY 1/3 IN 2014
The Content
‘Explosion’
#pubcon
“We don't index every one of those trillion pages -- many of them are similar to each other” (J Alpert, Google)
“There’s a needle in here somewhere”
“It’s an important needle too”
If only we could identify it
“So how many unique pages does the web really contain? We don't know; we don't have time to look at them all!” (J Alpert, Google)
#pubcon
The Duplicate Content ‘Penalty’ Myth
• ‘Real’ duplicates (matching
content checksum) filtered and
not indexed
“Each content filter sends the
retrieved web pages to Dupserver
to determine if they are duplicates
of other web pages”
http://www.google.ch/patents/US20120317089
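A rough Python sketch of what 'matching content checksum' filtering could look like. The patent does not say how the Dupserver computes its checksum; SHA-256 over whitespace-collapsed, lowercased text is purely an assumption here, and the function names and example URLs are hypothetical.

```python
import hashlib
import re

def content_checksum(text):
    """Checksum of the text after collapsing whitespace and lowercasing
    (an assumed normalization, not the patent's)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def group_duplicates(pages):
    """Given {url: text}, return lists of URLs whose content checksums match."""
    groups = {}
    for url, text in pages.items():
        groups.setdefault(content_checksum(text), []).append(url)
    return [urls for urls in groups.values() if len(urls) > 1]

# Hypothetical pages: the first two differ only in whitespace.
pages = {
    "http://example.com/shoes": "Red shoes, size 9.",
    "http://example.com/shoes?ref=nav": "Red  shoes,  size 9.\n",
    "http://example.com/hats": "Blue hats.",
}
duplicate_groups = group_duplicates(pages)
print(duplicate_groups)
```

Only one URL from each group would need to be indexed; the others are filtered rather than 'penalized'.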
#pubcon
Duplication & ’The Battle To Be The Single
URL / Content Fingerprint’
URL / CONTENT
FINGERPRINT
REDIRECT
YOU HAVE THE POWER TO
CHOOSE ‘THE ONE’
CANONICALIZATION,
HREFLANG, CONSISTENT
SIGNALS INTERNALLY
#pubcon
NON-PREFERRED VERSION
‘IMPOSTER INDEXATION’ & ‘TOO SIMILAR’ CONTENT
The wrong version
of your URL is
selected and
indexed
Users may pick the wrong version of
the duplicate content and link to that
one. Then signals are dissipated
#pubcon
De-duping, URL Sorting & Scheduling
Original Image - https://patentimages.storage.googleapis.com/US8666964B1/US08666964-20140304-D00004.png
https://www.google.com/patents/US8666964
Lots and lots of
patents on crawling
efficiency
#pubcon
Important Pages Are Crawled More Frequently
These pages are important and need to be up to
date. They cannot be returned as stale data
#pubcon
Depth Of Crawl Is Greater In Higher
Quality Sections Of Sites
• Important grandparents and parents
beget ‘important’ child and
grandchild URLs
• Higher quality site sections
(descendants) get crawled more
#pubcon
Low Quality Sites Get Crawled Less
Frequently
https://support.google.com/webmasters/answer/35253
They are low importance
#pubcon
Myth – It’s Based Just On PageRank
”There’s a ‘shit-ton’ of other
stuff going on which plays an
important role” (Illyes,
Google)
#pubcon
PageRank Has Become Just One Of Very
Many Things
“WHATEVER YOU ARE THINKING… WHETHER IT BE ABOUT CRAWLING OR RANKING… IT (PAGERANK) HAS BECOME JUST ONE OF VERY MANY THINGS” (Andrey Lipattsev, Google, 2016)
#pubcon
It’s Mostly Driven By ‘Importance’
“SCHEDULING IS MOSTLY DRIVEN BY IMPORTANCE” (Illyes, Google)
IMPORTANCE MAY INCLUDE PAGERANK (Patents) … BUT IT IS ONLY A PART OF IT
RANKING IS ALSO DRIVEN BY IMPORTANCE (IN PART)
#pubcon
Page (URL) Importance Is Mahoossively
Important (May Include PageRank)
PAGE IMPORTANCE - The importance of a
page independent of a query
• Location in Site (e.g. home page more important than
parameter 3 level output)
• PageRank
• Page type / file type
• Internal PageRank
• Internal Backlinks
• In-site Anchor Text Consistency
• Relevance (content, anchors and elements) to a topic
(Similarity Importance)
• Directives from in-page robot and robots.txt management
• Parent quality brushes off on child page quality
• Inclusion in XML sitemaps and the index
IMPORTANT PARENTS LIKELY SEEN TO HAVE
IMPORTANT CHILD PAGES
Several Google Patents
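To make 'Internal PageRank' concrete, here is a small Python sketch that runs the classic PageRank power iteration over a site's internal link graph only. It is a toy illustration, not Google's importance calculation: the damping factor of 0.85, the iteration count and the four-page site are all assumptions.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over an internal link graph {page: [linked pages]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing internal links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank across all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical site: every page links back to the home page.
site = {
    "/": ["/category", "/about"],
    "/category": ["/", "/category/product"],
    "/category/product": ["/", "/category"],
    "/about": ["/"],
}
ranks = internal_pagerank(site)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the home page comes out on top and the category outranks its product page, which matches the 'important parents, important children' ordering, though real importance blends many more signals.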
#pubcon
But…Importance Signs From Whom?
3 Types Of ‘Importance Signal Sender’?
SEARCHERS: Looking for results, creating queries, triggering impressions, demanding freshness
WEBMASTERS: Hreflang, Canonicalization, Internal links, Sitemap and index inclusion, Information Architecture, Anchors, Building content at a URL on a topic
LINKERATI: Passing PageRank
AND WHY IS ‘IMPORTANCE’ SO
IMPORTANT?
#pubcon
Concept Of Search Engine
Embarrassment
A concept mostly originally
attributed to Joel Wolf
#pubcon
Search Engine Embarrassment
Credit: Joel Wolf et al.
GOODNESS & BADNESS IN SEARCH ENGINE EMBARRASSMENT
Concept of using probability
estimates to revisit web
pages ‘just in time’ and
based around limiting
‘likelihood of stale pages
being exposed’ to searchers
#pubcon
Search Engine Embarrassment
Probability(Seen_Stale_Data) = Function(User_View_Rate, Document_Update_Rate, Web_Crawl_Interval)
#pubcon
Search Engine Embarrassment
User_View_Rate – Likelihood of the document being seen
+
Document_Update_Rate – How often it has material changes
+
Web_Crawl_Interval – How often is it currently crawled
COMBINED TO CALCULATE
Probability(Seen_Stale_Data) = Risk of Search Engine Embarrassment?
‘JUST IN TIME SMART CRAWLING’
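The slides leave the function itself unspecified. One way to make the intuition concrete, as a sketch only, is to assume page changes arrive as a Poisson process and the page is recrawled at a fixed interval; the chance that a random view hits a stale copy then has a closed form. The Poisson assumption, the units (per day) and the function names below are mine, not taken from the patent or from Wolf's paper.

```python
import math

def stale_probability(update_rate, crawl_interval):
    """Average chance that the indexed copy is stale at a random moment between crawls.

    Assumes changes arrive as a Poisson process with `update_rate` changes per day
    and that the page is recrawled every `crawl_interval` days. At time t after a
    crawl the copy is stale with probability 1 - exp(-rate * t); this averages
    that over one crawl interval.
    """
    if update_rate <= 0 or crawl_interval <= 0:
        return 0.0
    x = update_rate * crawl_interval
    return 1.0 - (1.0 - math.exp(-x)) / x

def embarrassment_risk(user_view_rate, update_rate, crawl_interval):
    """Expected stale views per day: views per day times the chance each is stale."""
    return user_view_rate * stale_probability(update_rate, crawl_interval)

# A page that changes about once a day and is seen 1,000 times a day:
weekly = embarrassment_risk(1000, 1.0, 7.0)  # crawled once a week
daily = embarrassment_risk(1000, 1.0, 1.0)   # crawled once a day
print(round(weekly), round(daily))
```

Under these assumptions, a heavily viewed, frequently changing page gains far more from a shorter crawl interval than a rarely viewed one, which is the 'just in time' idea.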
#pubcon
THEORY - Search Engine Embarrassment
Joel Wolf’s ‘Optimal Crawl
Strategies’ (Search Engine
Embarrassment) Paper is Cited
in this Google Patent
#pubcon
Triggering More ’Real Searcher Impressions’
A SMALL TEST
THE PAGES
BECAME
ARGUABLY
MORE
IMPORTANT
CRAWLING
IMPROVED
RANKING IMPROVED
TRAFFIC IMPROVED
#pubcon
Myth – Don’t We Just Have To Make Random
Changes To Get Crawled More?
NOT ALL CHANGE IS
CREATED EQUAL
#pubcon
WHAT Changed? Was it important?
https://www.seroundtable.com/google-crawl-frequency-ranking-21153.html
HINTS &
$C = \sum_{i=0}^{n-1} weight_i \cdot feature_i$
CRITICAL MATERIAL
CHANGE
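The weighted sum on this slide can be sketched in Python as below. The feature names, the weights and the 0.5 threshold are hypothetical values chosen for illustration; the slides do not say which features or weights are actually used.

```python
def change_score(weights, features):
    """Weighted sum C = sum(weight_i * feature_i) over n features."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical features: 1.0 means that part of the page changed, 0.0 means it did not.
FEATURE_NAMES = ["main_content", "title", "footer_date", "ad_block"]
WEIGHTS = [0.6, 0.3, 0.05, 0.05]  # assumed weights, for illustration only
THRESHOLD = 0.5                   # assumed cut-off for a 'critical material change'

def is_material_change(features, weights=WEIGHTS, threshold=THRESHOLD):
    """True when the weighted change score reaches the threshold."""
    return change_score(weights, features) >= threshold

print(is_material_change([1.0, 0.0, 0.0, 0.0]))  # main content rewritten
print(is_material_change([0.0, 0.0, 1.0, 1.0]))  # only footer date and ads shuffled
```

With weights like these, rotating ads or a changing footer date never adds up to a material change, which is why random changes do not earn more crawling.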
#pubcon
Randomization & Lying About ‘Change’
To Googlebot Won’t Help
• NOT ALL CHANGE IS IMPORTANT ENOUGH TO BE RECRAWLED
• DO NOT TRY TO MANIPULATE ‘CHANGE’
• You can’t get more crawl just by changing your pages alone &
you may actually be doing your site harm
• WHY – Because… ‘hints’ & ’thresholds’ designed to pick up on
this
• If every URL changes on every request, the header response will always be ‘modified since’
(current date)
• Randomization and shuffling could be preventing Googlebot from
crawling the important pages
• Last-modified is taken into consideration, IF it is correct
• Priority == ignored so don’t make it up
• Change frequency == ignored so don’t make it up
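A minimal Python sketch of an honest Last-Modified response: answer a conditional request with 304 unless the page has materially changed since the date the crawler sent. The function name conditional_response and the idea of storing a last_material_change timestamp per URL are assumptions for illustration; only the standard If-Modified-Since and Last-Modified headers come from HTTP itself.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_response(if_modified_since, last_material_change):
    """Return (status, headers) for a GET with an optional If-Modified-Since header.

    `last_material_change` must be a timezone-aware UTC datetime.
    """
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_change = last_material_change.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_change, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable header: fall back to a full response.
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_change <= since:
                return 304, headers  # Nothing material changed: no body needed.
    return 200, headers

changed = datetime(2016, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_response("Wed, 05 Oct 2016 10:00:00 GMT", changed))
print(conditional_response("Tue, 20 Sep 2016 10:00:00 GMT", changed))
```

If the stored timestamp were bumped on every request, the first call would return 200 every time, and the header would stop being a useful hint.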
’IMPORTANCE’
BEATS ‘CHANGE’
#pubcon
‘Crawl Rank’ – Causation or Correlation?
• If you get your URLs crawled more frequently, do
they automatically rank higher?
• “A lot of people confuse crawling with ranking”
(John Mu)
• Crawl Rank - It seems this is more correlation
than causation
• You got your URLs crawled more by making
them more important (e.g. via internal linking
strategies, canonicalization, hreflang, merging
and improving thin content, updating with
fresh and rich content on a topic, etc.)… and
subsequently ranked higher
“Often times, it is kind of a
relationship that, when we think
something is important we tend to
crawl it more frequently and that
might be more visible in search”
John Mueller, Google
#pubcon
The Four Main Types Of Cannibalisation – Slideshare @jonearnshaw
http://www.slideshare.net/jonathanearnshaw/seo-46813620
Consistently Avoiding Importance Cannibalisation
You must be consistently
clear in emphasising the
‘importance’ of the right
version of your ‘special
ones’ (your key most
important URLs).
#pubcon
Consistently avoiding ‘Mixed Signals’ & Skewed
URL Importance
GOOGLE CAN GET
CONFUSED AS TO WHICH
PAGE IT SHOULD RANK
FROM YOUR SITE FOR
KEY TERMS – BE CLEAR
ON TARGETS
#pubcon
Consistency - Avoiding ‘importance
dissipation’ from generational cruft
Consider keeping the
same URL for annual
events and optimise
the content for
current year
“Choose a URL
structure that can
stand the test of
time” (John Mu,
Google)
#pubcon
Cool URIs (And URLs) Don’t Change
• The iterative drip, drip, drip of Importance
• Nurture & mature (grow) importance
• Consistent importance signals ongoing
• Think URL as well as URI
“…many, many things can change and your URIs
can and should stay the same” (Sir Tim Berners-Lee)
COOL URIs DON’T CHANGE
https://www.w3.org/Provider/Style/URI
“allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years” (Sir Tim Berners-Lee)
IMPORTANCE VIA
CONSISTENCY
#pubcon
“all over the Web, webmasters are
making decisions which will make
it really difficult for themselves in
the future” (Sir Tim Berners-Lee)
Don’t Let That Be You
#pubcon
THANK YOU
TWITTER - @dawnieando
GOOGLE+ - +DawnAnderson888
LINKEDIN – msdawnanderson
www.move-it-marketing.co.uk
#pubcon
Importance Via Internal Links
Most Important Page 1
Most Important Page 2
Most Important Page 3
IS THIS
YOUR BLOG??
HOPE NOT
https://support.google.com/webmasters/answer/138752?hl=en
#pubcon
Descending Importance Clues Via Internal
Links (Breadcrumbs)
SINGLE
TEXT OUTPUT ONLY
BREADCRUMB
FEWER
FEWER
MOST
Image credit: https://www.smashingmagazine.com/2009/03/breadcrumbs-in-web-design-examples-and-best-practices/
Home
Category
Sub
Product
#pubcon
YES? … YOU’RE IN
NO? … YOU’RE OUT
(sitemaps and index)
Importance By Inclusion (& Unimportance Via Exclusion)
#pubcon
Importance Via Consistently Indicating ‘Correct Version’
of Duplicates
• Canonicalisation
• Choose one https / http / nonwww / www version and 301 redirect the others
• Eliminate ‘too similar’ URLs
• Consistency of internal link targets (right site version, right target for
keywords / topics / topic intent / user intent)
• Right version inclusion in XML sitemaps
• Re-optimization / unpicking of 30X redirect chains internally and externally
• Review of internal links in GSC for ‘skew’
• Review of existing content to improve on topic for ‘importance’
• Save / nurture the URL (think for the long term in URL planning)
• Breadcrumbs
• Minimize boilerplate content
• Minimize regurgitated content in various parts of your site
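As an illustration of 'unpicking' 30X redirect chains, here is a small Python sketch that follows a redirect map to its final destination and then rewrites every source to point there in one hop. The redirect map, the example.com URLs and the max_hops limit are hypothetical; in practice the map would come from your server config or a crawl.

```python
def resolve_redirects(redirects, url, max_hops=10):
    """Follow a {source: target} redirect map; return (final_url, hops)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(url)
    return url, hops

def flatten_redirects(redirects):
    """Point every source straight at its final destination (one hop each)."""
    return {source: resolve_redirects(redirects, source)[0] for source in redirects}

# Hypothetical chain: http non-www -> http www -> https www.
redirects = {
    "http://example.com/": "http://www.example.com/",
    "http://www.example.com/": "https://www.example.com/",
    "https://example.com/": "https://www.example.com/",
}
print(resolve_redirects(redirects, "http://example.com/"))
flat = flatten_redirects(redirects)
print(flat)
```

After flattening, every non-preferred version 301s directly to the one chosen version, and internal links can be updated to the same target.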
#pubcon
SOURCES
• Scheduler For Search Engine Crawler - http://www.google.ch/patents/US20120317089
• We Knew The Web Was Big - https://googleblog.blogspot.co.uk/2008/07/we-knew-web-was-big.html
• https://www.youtube.com/watch?v=GVKcMU7YNOQ
• http://webpromo.expert/google-qa-duplicate-content/
• http://webpromo.expert/google-qa-crawlingrendering/
• https://twitter.com/dergal/status/777782401497980928
• Cool URIs Don’t Change - https://www.w3.org/Provider/Style/URI
• https://searchenginewatch.com/2016/04/06/webpromos-qa-with-googles-andrey-lipattsev-transcript/
• https://www.youtube.com/watch?v=Wcnz1kCoiks
• https://www.youtube.com/watch?v=MryA3F0ySew
• ‘Optimal Crawling Strategies For Web Search Engines’ - http://dl.acm.org/citation.cfm?id=511465

More Related Content

What's hot

What's hot (20)

How to Use Search Intent to Dominate Google Discover
How to Use Search Intent to Dominate Google DiscoverHow to Use Search Intent to Dominate Google Discover
How to Use Search Intent to Dominate Google Discover
 
Brighton SEO Autumn 2021: Core Web Vitals: Loopholes, Flaws, and Endless Delays
Brighton SEO Autumn 2021: Core Web Vitals: Loopholes, Flaws, and Endless DelaysBrighton SEO Autumn 2021: Core Web Vitals: Loopholes, Flaws, and Endless Delays
Brighton SEO Autumn 2021: Core Web Vitals: Loopholes, Flaws, and Endless Delays
 
BrightonSEO April 2022 - Kara Thurkettle - Search in the Metaverse.pdf
BrightonSEO April 2022 - Kara Thurkettle - Search in the Metaverse.pdfBrightonSEO April 2022 - Kara Thurkettle - Search in the Metaverse.pdf
BrightonSEO April 2022 - Kara Thurkettle - Search in the Metaverse.pdf
 
Behemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesBehemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge Websites
 
[BrightonSEO 2019] Restructuring Websites to Improve Indexability
[BrightonSEO 2019] Restructuring Websites to Improve Indexability[BrightonSEO 2019] Restructuring Websites to Improve Indexability
[BrightonSEO 2019] Restructuring Websites to Improve Indexability
 
How to leverage indexation tracking to monitor issues and improve performance
How to leverage indexation tracking to monitor issues and improve performanceHow to leverage indexation tracking to monitor issues and improve performance
How to leverage indexation tracking to monitor issues and improve performance
 
BrightonSEO April 2023 Similar AI: Automation recipes for SEO success
BrightonSEO April 2023 Similar AI: Automation recipes for SEO successBrightonSEO April 2023 Similar AI: Automation recipes for SEO success
BrightonSEO April 2023 Similar AI: Automation recipes for SEO success
 
A Simple method to Create Content using NLP
A Simple method to Create Content using NLP A Simple method to Create Content using NLP
A Simple method to Create Content using NLP
 
BrightonSEO - Apr 2022 - No excuses for doing UX
BrightonSEO - Apr 2022 - No excuses for doing UXBrightonSEO - Apr 2022 - No excuses for doing UX
BrightonSEO - Apr 2022 - No excuses for doing UX
 
What Makes your SEO Fail (and how to fix it) #BrightonSEO
What Makes your SEO Fail (and how to fix it) #BrightonSEO What Makes your SEO Fail (and how to fix it) #BrightonSEO
What Makes your SEO Fail (and how to fix it) #BrightonSEO
 
Core Web Vitals Audit - Sophie Gibson - PDF - BrightonSEO.pdf
Core Web Vitals Audit - Sophie Gibson - PDF - BrightonSEO.pdfCore Web Vitals Audit - Sophie Gibson - PDF - BrightonSEO.pdf
Core Web Vitals Audit - Sophie Gibson - PDF - BrightonSEO.pdf
 
BrightonSEO 2019 - Crawl Budget is dead, please welcome Rendering Budget
BrightonSEO 2019 - Crawl Budget is dead, please welcome Rendering BudgetBrightonSEO 2019 - Crawl Budget is dead, please welcome Rendering Budget
BrightonSEO 2019 - Crawl Budget is dead, please welcome Rendering Budget
 
Competitive SEO Analysis: How to Identify Opportunities to Win #TheInbounder
Competitive SEO Analysis: How to Identify Opportunities to Win #TheInbounderCompetitive SEO Analysis: How to Identify Opportunities to Win #TheInbounder
Competitive SEO Analysis: How to Identify Opportunities to Win #TheInbounder
 
Goodbye SEO fck ups! Learn to set an SEO Quality Assurance Framework
Goodbye SEO fck ups! Learn to set an SEO Quality Assurance FrameworkGoodbye SEO fck ups! Learn to set an SEO Quality Assurance Framework
Goodbye SEO fck ups! Learn to set an SEO Quality Assurance Framework
 
10 most common mistakes when working from home
10 most common mistakes when working from home10 most common mistakes when working from home
10 most common mistakes when working from home
 
Entity seo
Entity seoEntity seo
Entity seo
 
How to Combat SERP Volatility - Adriana Stein - BrightonSEO Slides 2023pdf
How to Combat SERP Volatility - Adriana Stein - BrightonSEO Slides 2023pdfHow to Combat SERP Volatility - Adriana Stein - BrightonSEO Slides 2023pdf
How to Combat SERP Volatility - Adriana Stein - BrightonSEO Slides 2023pdf
 
How to Implement Machine Learning in Your Internal Linking Audit - Lazarina S...
How to Implement Machine Learning in Your Internal Linking Audit - Lazarina S...How to Implement Machine Learning in Your Internal Linking Audit - Lazarina S...
How to Implement Machine Learning in Your Internal Linking Audit - Lazarina S...
 
HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO Apri...
HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO  Apri...HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO  Apri...
HELP! I've Been Hit By An Algorithm Update - Jess Maloney - BrightonSEO Apri...
 
PubCon, Lazarina Stoy. - Machine Learning in Search: Google's ML APIs vs Open...
PubCon, Lazarina Stoy. - Machine Learning in Search: Google's ML APIs vs Open...PubCon, Lazarina Stoy. - Machine Learning in Search: Google's ML APIs vs Open...
PubCon, Lazarina Stoy. - Machine Learning in Search: Google's ML APIs vs Open...
 

Viewers also liked

How to use fumbaro wall paper site powered by Plone
How to use fumbaro wall paper site powered by PloneHow to use fumbaro wall paper site powered by Plone
How to use fumbaro wall paper site powered by Plone
Takanori Suzuki
 
Erik Proposal Final
Erik Proposal FinalErik Proposal Final
Erik Proposal Final
Erik Messier
 

Viewers also liked (20)

How to use fumbaro wall paper site powered by Plone
How to use fumbaro wall paper site powered by PloneHow to use fumbaro wall paper site powered by Plone
How to use fumbaro wall paper site powered by Plone
 
SEO Make Micro-Moments and Wordpress Work For User Journey Mapping With Conte...
SEO Make Micro-Moments and Wordpress Work For User Journey Mapping With Conte...SEO Make Micro-Moments and Wordpress Work For User Journey Mapping With Conte...
SEO Make Micro-Moments and Wordpress Work For User Journey Mapping With Conte...
 
AMP Accelerated Mobile Pages - To AMPFinity And Beyond
AMP Accelerated Mobile Pages - To AMPFinity And BeyondAMP Accelerated Mobile Pages - To AMPFinity And Beyond
AMP Accelerated Mobile Pages - To AMPFinity And Beyond
 
SEO - Stop Eating Your Words - Avoid Cannibalisation Of Your Sites
SEO - Stop Eating Your Words - Avoid Cannibalisation Of Your SitesSEO - Stop Eating Your Words - Avoid Cannibalisation Of Your Sites
SEO - Stop Eating Your Words - Avoid Cannibalisation Of Your Sites
 
Crawl Budget Optimization - SMX München 2016
Crawl Budget Optimization - SMX München 2016Crawl Budget Optimization - SMX München 2016
Crawl Budget Optimization - SMX München 2016
 
Negotiating crawl budget with googlebots
Negotiating crawl budget with googlebotsNegotiating crawl budget with googlebots
Negotiating crawl budget with googlebots
 
Digitized Student Development, Social Media, and Identity
Digitized Student Development, Social Media, and IdentityDigitized Student Development, Social Media, and Identity
Digitized Student Development, Social Media, and Identity
 
SEOs as Whole Brain T Shaped Marketers
SEOs as Whole Brain T Shaped MarketersSEOs as Whole Brain T Shaped Marketers
SEOs as Whole Brain T Shaped Marketers
 
Xmersion 5 - TEDx Kalamazoo Talk (2015)
Xmersion 5 - TEDx Kalamazoo Talk (2015)Xmersion 5 - TEDx Kalamazoo Talk (2015)
Xmersion 5 - TEDx Kalamazoo Talk (2015)
 
URL Design with Lasso
URL Design with LassoURL Design with Lasso
URL Design with Lasso
 
El sindrome del Impostor - ¿Incapacidad o Pobre Imagen de sí mismo/a?
El sindrome del Impostor - ¿Incapacidad o Pobre Imagen de sí mismo/a?El sindrome del Impostor - ¿Incapacidad o Pobre Imagen de sí mismo/a?
El sindrome del Impostor - ¿Incapacidad o Pobre Imagen de sí mismo/a?
 
11 Of The Oddest Pets You Might Want To Look After
11 Of The Oddest Pets You Might Want To Look After11 Of The Oddest Pets You Might Want To Look After
11 Of The Oddest Pets You Might Want To Look After
 
Erik Proposal Final
Erik Proposal FinalErik Proposal Final
Erik Proposal Final
 
2014 Land Markets Survey
2014 Land Markets Survey2014 Land Markets Survey
2014 Land Markets Survey
 
Vision 2030: Gauteng Provincial Fire & Rescue Services - RG Hendricks
Vision 2030: Gauteng Provincial Fire & Rescue Services - RG HendricksVision 2030: Gauteng Provincial Fire & Rescue Services - RG Hendricks
Vision 2030: Gauteng Provincial Fire & Rescue Services - RG Hendricks
 
Tablas de contenidos
Tablas de contenidosTablas de contenidos
Tablas de contenidos
 
2015 Land Markets Survey | REALTORS Land Institute & NAR
2015 Land Markets Survey | REALTORS Land Institute & NAR2015 Land Markets Survey | REALTORS Land Institute & NAR
2015 Land Markets Survey | REALTORS Land Institute & NAR
 
reshma resume
reshma resumereshma resume
reshma resume
 
Lodgement Order dated 28.01.2017 of Registrar Supreme Court of India
Lodgement Order dated 28.01.2017 of  Registrar Supreme Court of IndiaLodgement Order dated 28.01.2017 of  Registrar Supreme Court of India
Lodgement Order dated 28.01.2017 of Registrar Supreme Court of India
 
Current challenges in web crawling
Current challenges in web crawlingCurrent challenges in web crawling
Current challenges in web crawling
 

Similar to Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of URL Importance Optimization Pubcon Vegas 2016

Analyzing search engine results pages(SERPs) All over the worlds
Analyzing search engine results pages(SERPs) All over the worldsAnalyzing search engine results pages(SERPs) All over the worlds
Analyzing search engine results pages(SERPs) All over the worlds
Anil Sah
 

Similar to Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of URL Importance Optimization Pubcon Vegas 2016 (20)

Pubcon florida 2018 logs dont lie dawn anderson
Pubcon florida 2018 logs dont lie dawn andersonPubcon florida 2018 logs dont lie dawn anderson
Pubcon florida 2018 logs dont lie dawn anderson
 
Intro to Google, SEO, and You in 2017
Intro to Google, SEO, and You in 2017Intro to Google, SEO, and You in 2017
Intro to Google, SEO, and You in 2017
 
A Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
A Technical Look at Content - PUBCON SFIMA 2017 - Patrick StoxA Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
A Technical Look at Content - PUBCON SFIMA 2017 - Patrick Stox
 
Pubcon Vegas 2017 You're Going To Screw Up International SEO - Patrick Stox
Pubcon Vegas 2017 You're Going To Screw Up International SEO - Patrick StoxPubcon Vegas 2017 You're Going To Screw Up International SEO - Patrick Stox
Pubcon Vegas 2017 You're Going To Screw Up International SEO - Patrick Stox
 
Sunday Business Post SEO Masterclass - John RIng
Sunday Business Post SEO Masterclass �- John RIngSunday Business Post SEO Masterclass �- John RIng
Sunday Business Post SEO Masterclass - John RIng
 
Arts Marketing Association North-East Network Meeting: The Evolution of Searc...
Arts Marketing Association North-East Network Meeting: The Evolution of Searc...Arts Marketing Association North-East Network Meeting: The Evolution of Searc...
Arts Marketing Association North-East Network Meeting: The Evolution of Searc...
 
Seo Made Easy
Seo Made EasySeo Made Easy
Seo Made Easy
 
Demand Quest SEO Training - Session 2
Demand Quest SEO Training - Session 2Demand Quest SEO Training - Session 2
Demand Quest SEO Training - Session 2
 
Crawl Budget - Some Insights & Ideas @ seokomm 2015
Crawl Budget - Some Insights & Ideas @ seokomm 2015Crawl Budget - Some Insights & Ideas @ seokomm 2015
Crawl Budget - Some Insights & Ideas @ seokomm 2015
 
Demand Quest SEO training session 2
Demand Quest SEO training session 2Demand Quest SEO training session 2
Demand Quest SEO training session 2
 
Sales Funnel & Content Marketing Audits
Sales Funnel & Content Marketing Audits Sales Funnel & Content Marketing Audits
Sales Funnel & Content Marketing Audits
 
page ranking algorithm
page ranking algorithmpage ranking algorithm
page ranking algorithm
 
Keeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AU
Keeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AUKeeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AU
Keeping Things Lean & Mean: Crawl Optimisation - Search Marketing Summit AU
 
SEO Checklists
SEO ChecklistsSEO Checklists
SEO Checklists
 
SEO 2015
SEO 2015SEO 2015
SEO 2015
 
SEO for Ecommerce: A Comprehensive Guide
SEO for Ecommerce: A Comprehensive GuideSEO for Ecommerce: A Comprehensive Guide
SEO for Ecommerce: A Comprehensive Guide
 
Analyzing search engine results pages(SERPs) All over the worlds
Analyzing search engine results pages(SERPs) All over the worldsAnalyzing search engine results pages(SERPs) All over the worlds
Analyzing search engine results pages(SERPs) All over the worlds
 
How to do a SEO Site Audit
How to do a SEO Site AuditHow to do a SEO Site Audit
How to do a SEO Site Audit
 
From Pandalized to Panda Loved
From Pandalized to Panda LovedFrom Pandalized to Panda Loved
From Pandalized to Panda Loved
 
SEO Predictions for 2013 & Beyond
SEO Predictions for 2013 & Beyond SEO Predictions for 2013 & Beyond
SEO Predictions for 2013 & Beyond
 

More from Dawn Anderson MSc DigM

More from Dawn Anderson MSc DigM (20)

Human vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdfHuman vs AI Quality Raters for Search Engines.pdf
Human vs AI Quality Raters for Search Engines.pdf
 
Life of An SEO - Surfing The Waves of Googles Many Algorithmic Updates
Life of An SEO - Surfing The Waves of Googles Many Algorithmic UpdatesLife of An SEO - Surfing The Waves of Googles Many Algorithmic Updates
Life of An SEO - Surfing The Waves of Googles Many Algorithmic Updates
 
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...
Natural Semantic SEO - Surfacing Walnuts in Densely Represented, Every Increa...
 
Passage indexing is likely more important than you think
Passage indexing is likely more important than you thinkPassage indexing is likely more important than you think
Passage indexing is likely more important than you think
 
Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...
Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...
Zipfs Law & Zipfian Distribution in SEO - Pubcon Virtual Fall 2020 - Dawn And...
 
Google BERT - SMX London 2020 Virtual Conference
Google BERT - SMX London 2020 Virtual ConferenceGoogle BERT - SMX London 2020 Virtual Conference
Google BERT - SMX London 2020 Virtual Conference
 
Google BERT - What SEOs and Marketers Need to Know
Google BERT - What SEOs and Marketers Need to KnowGoogle BERT - What SEOs and Marketers Need to Know
Google BERT - What SEOs and Marketers Need to Know
 
2019 Tech SEO Boost Dawn Anderson Contextual Recommender Search
2019 Tech SEO Boost Dawn Anderson Contextual Recommender Search2019 Tech SEO Boost Dawn Anderson Contextual Recommender Search
2019 Tech SEO Boost Dawn Anderson Contextual Recommender Search
 
Connecting The Worlds of Information Retrieval & SEO - Search solutions 2019 ...
Connecting The Worlds of Information Retrieval & SEO - Search solutions 2019 ...Connecting The Worlds of Information Retrieval & SEO - Search solutions 2019 ...
Connecting The Worlds of Information Retrieval & SEO - Search solutions 2019 ...
 
Planning an SEO Strategy for a New Website - SMXL Milan 2019
Planning an SEO Strategy for a New Website - SMXL Milan 2019Planning an SEO Strategy for a New Website - SMXL Milan 2019
Planning an SEO Strategy for a New Website - SMXL Milan 2019
 
Google BERT and Family and the Natural Language Understanding Leaderboard Race
Google BERT and Family and the Natural Language Understanding Leaderboard RaceGoogle BERT and Family and the Natural Language Understanding Leaderboard Race
Google BERT and Family and the Natural Language Understanding Leaderboard Race
 
The User is the Query - The Rise of Predictive Proactive Search
The User is the Query - The Rise of Predictive Proactive SearchThe User is the Query - The Rise of Predictive Proactive Search
The User is the Query - The Rise of Predictive Proactive Search
 
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
 
Using topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic searchUsing topic modelling frameworks for NLP and semantic search
Using topic modelling frameworks for NLP and semantic search
 
SEO in a Mobile First World
SEO in a Mobile First WorldSEO in a Mobile First World
SEO in a Mobile First World
 
Modern Ecommerce SEO
Modern Ecommerce SEOModern Ecommerce SEO
Modern Ecommerce SEO
 
Voice Search and Conversation Action Assistive Systems - Challenges & Opportu...
Voice Search and Conversation Action Assistive Systems - Challenges & Opportu...Voice Search and Conversation Action Assistive Systems - Challenges & Opportu...
Voice Search and Conversation Action Assistive Systems - Challenges & Opportu...
 
The Iceberg Approach - Power from what lies beneath in SEO for a mobile-first...
The Iceberg Approach - Power from what lies beneath in SEO for a mobile-first...The Iceberg Approach - Power from what lies beneath in SEO for a mobile-first...
The Iceberg Approach - Power from what lies beneath in SEO for a mobile-first...
 
SEO and The Mobile-First Paradigm Shift
SEO and The Mobile-First Paradigm ShiftSEO and The Mobile-First Paradigm Shift
SEO and The Mobile-First Paradigm Shift
 
Voice Search Challenges For Search and Information Retrieval and SEO
Voice Search Challenges For Search and Information Retrieval and SEOVoice Search Challenges For Search and Information Retrieval and SEO
Voice Search Challenges For Search and Information Retrieval and SEO
 

Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of URL Importance Optimization Pubcon Vegas 2016

  • 1. #pubcon Presented by: Dawn Anderson @dawnieando ‘Myths, Facts And Theories On Crawl Budget And The Importance Of ‘URL Importance Optimization’’
  • 2. #pubcon Dawn Anderson • Move It Marketing • University Lecturer – Digital Marketing • From Manchester, UK (rains a lot) • International SEO Consultant – 10+ yrs in SEO • Pomeranian pooch lover - Bert • Fascinated by crawling (practice & academia) • Doesn’t fare well in YouTube screen grabs ;P • Party trick: Remembering UK postcode areas (US Zip code equivalent) • Search Awards Judge • Twitter chatterer @dawnieando
  • 3. #pubcon Defining Crawl Budget ‘Host Load’ = What can you handle? + ‘URL Scheduling’ = What is important to crawl & how often?
  • 5. #pubcon Myth – It’s All About Just My Site, Right? • NO – HOST LOAD is apportioned at an IP level and shared amongst the sites there (Host load)
  • 6. #pubcon Host Load - When Will This Matter? • It’s more about server capacity than SEO TBH • Your site is massive (similar in size e.g. to ’Amazon’) • Your site is massive and you’re on a shared hosting • You’re using a CDN and your site is massive • You have lots of large subdomains sharing space • Crawlable test or staging sites • You have ‘infinite loops’ and ‘spider traps’ • You keep throwing server errors during crawling ‘Average’ sites don’t normally hit the payload (‘host load’)
  • 7. #pubcon Myth - Google Search Console Crawl Stats Is Where It’s At Right?
  • 8. #pubcon GSC Crawl Stats Is Not Really Just ‘Web Pages’ • Includes ALL CSS, JS, Zip, XML, PDF, AMP, HTML files crawled • Pages are NOT just single webpages https://support.google.com/webmasters/answer/35253 Not just ‘web pages’
  • 9. #pubcon Visits By ALL The 10 Types Of Googlebots Are Recorded Together In GSC: Web, Image, News, Video, Feature Phone, Smartphone, Mobile Adsense, Adsense, Adsbot, App Crawler. ALL The Googlebot Family
  • 10. #pubcon It Also Includes All 200 And 30X Responses • That massive crawl you thought you just got on new pages or existing pages (200 OKs) could also be many, many 30X redirections • Especially when using * wildcard redirections on large sites • NO 400, 500, robotted or unreachables are recorded here https://support.google.com/webmasters/answer/35253
  • 11. #pubcon GSC Doesn’t Even Show You WHAT URLs Have Been Crawled & When It will likely be just a few URLs being crawled very often, some very rarely and most others somewhere in between – YOU NEED TO KNOW
  • 12. #pubcon REALITY – Server Logs & Log Analysis Is Where It’s At AUTOMATE SERVER LOG RETRIEVAL VIA CRON JOB grep Googlebot access_log >googlebot_access.txt
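A rough Python equivalent of the grep one-liner on slide 12, as a minimal sketch. The function names and file paths are hypothetical, and matching on the literal string "Googlebot" is an assumption carried over from the grep command. A user-agent string can be spoofed, so this only narrows the log down; it does not prove the requests came from Google.

```python
from typing import Iterable, Iterator


def googlebot_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that mention Googlebot (case-sensitive, like plain grep)."""
    for line in lines:
        if "Googlebot" in line:
            yield line


def filter_log(src_path: str, dst_path: str) -> int:
    """Copy Googlebot lines from src_path to dst_path and return how many were written."""
    written = 0
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in googlebot_lines(src):
            dst.write(line)
            written += 1
    return written
```

Calling filter_log("access_log", "googlebot_access.txt") from a script run by cron would mirror the automation the slide suggests.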
  • 13. #pubcon Use Tools Or Just Export, Convert Data & Use Mr Mu’s Spreadsheet Spreadsheet - https://goo.gl/1pToL8
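If a spreadsheet is not to hand, the same per-URL view can be sketched in Python. This assumes the server writes the common "combined" access log format; the regular expression, the function name crawl_counts, and the choice to group by path and status code are all illustrative assumptions, not something taken from the slides.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed "combined" log format:
# ip ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawl_counts(lines: Iterable[str]) -> Counter:
    """Count Googlebot requests per (path, status code) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or "Googlebot" not in match.group("agent"):
            continue  # skip unparseable lines and other user agents
        counts[(match.group("path"), match.group("status"))] += 1
    return counts
```

Sorting the result with counts.most_common() shows which URLs are crawled very often and which very rarely, and keeping the status code in the key separates 200 responses from 30X redirections.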
  • 14. #pubcon For The Avoidance Of Doubt – I Asked To Be Sure
  • 15. #pubcon Why Does This Matter? On A Large Site You Need To Be Able To See Through ‘Spider Eyes’ You need to see what Googlebot ‘REALLY’ thinks of your site
  • 16. #pubcon Myth – It’s The No Of ‘Pages’ Crawled In GSC Crawl Stats Divided By Days For all of the reasons in the previous 7+ slides
  • 17. #pubcon Myth – Googlebot Crawls Through Your Website From One End To The Other Then Starts Again • This is where it gets complicated • Web crawl efficiency is key • There is an order to things • Minimizing visibility of existing stale content is key too – the rest of the web is changing • Fresh results are vital to searchers
  • 18. #pubcon “What I Think You Are Talking About Is Scheduling” (Illyes, Google) Remember that time when Mr Mu kicked Andrey under the table? (joking JJ)
  • 19. #pubcon Why Web Crawling Efficiency? “WE ARE ALL PUBLISHERS” THE NUMBER OF WEBSITES DOUBLED IN SIZE BETWEEN 2011 AND 2012 AND AGAIN BY 1/3 IN 2014 The Content ‘Explosion’
  • 20. #pubcon “We don't index every one of those trillion pages -- many of them are similar to each other” (J Alpert, Google) “There’s a needle in here somewhere” “It’s an important needle too” If only we could identify it “So how many unique pages does the web really contain? We don't know; we don't have time to look at them all!” (J Alpert, Google)
  • 21. #pubcon The Duplicate Content ‘Penalty’ Myth • ‘Real’ duplicates (matching content checksum) filtered and not indexed “Each content filter sends the retrieved web pages to Dupserver to determine if they are duplicates of other web pages” http://www.google.ch/patents/US20120317089
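To make the idea of a "matching content checksum" concrete, here is a minimal sketch. The patent's actual Dupserver logic is not described on the slide, so the whitespace normalisation, the use of SHA-256, and the function names are assumptions chosen for illustration only.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_checksum(body: str) -> str:
    """Checksum of page text after collapsing whitespace (assumed normalisation)."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Return groups of URLs whose content checksums match exactly."""
    by_checksum = defaultdict(list)
    for url, body in pages.items():
        by_checksum[content_checksum(body)].append(url)
    # Only groups with more than one URL are 'real' duplicates.
    return sorted(sorted(urls) for urls in by_checksum.values() if len(urls) > 1)
```

Exact checksums only catch byte-identical text after normalisation; 'too similar' content would need a near-duplicate technique, which this sketch does not attempt.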
  • 22. #pubcon Duplication & ’The Battle To Be The Single URL / Content Fingerprint’ URL / CONTENT FINGERPRINT REDIRECT YOU HAVE THE POWER TO CHOOSE ‘THE ONE’ CANONICALIZATION, HREFLANG, CONSISTENT SIGNALS INTERNALLY
  • 23. #pubcon NON- PREFERRED VERSION ‘IMPOSTER INDEXATION’ & ‘TOO SIMILAR’ CONTENT The wrong version of your URL is selected and indexed Users may pick the wrong version of the duplicate content and link to that one. Then signals are dissipated
  • 24. #pubcon De-duping, URL Sorting & Scheduling Original Image - https://patentimages.storage.googleapis.com/US8666964B1/US08666964-20140304-D00004.png https://www.google.com/patents/US8666964 Lots and lots of patents on crawling efficiency
  • 25. #pubcon Important Pages Are Crawled More Frequently These pages are important and need to be up to date. They cannot be returned as stale data
  • 26. #pubcon Depth Of Crawl Is Greater In Higher Quality Sections Of Sites • Important grandparents and parents beget ‘important’ child and grandchild URLs • Higher quality site sections (descendants) get crawled more
  • 27. #pubcon Low Quality Sites Get Crawled Less Frequently https://support.google.com/webmasters/answer/35253 They are low importance
  • 28. #pubcon Myth – It’s Based Just On PageRank ”There’s a ‘shit-ton’ of other stuff going on which plays an important role” (Illyes, Google)
  • 29. #pubcon PageRank Has Become Just One Of Very Many Things “WHATEVER YOU ARE THINKING… WHETHER IT BE ABOUT CRAWLING OR RANKING… IT (PAGERANK) HAS BECOME JUST ONE OF VERY MANY THINGS” (Andrey Lipattsev, Google, 2016)
  • 30. #pubcon It’s Mostly Driven By ‘Importance’ “SCHEDULING IS MOSTLY DRIVEN BY IMPORTANCE” (Illyes, Google) IMPORTANCE MAY INCLUDE PAGERANK (Patents) … BUT IT IS ONLY A PART OF IT RANKING IS ALSO DRIVEN BY IMPORTANCE (IN PART)
  • 31. #pubcon Page (URL) Importance Is Mahoossively Important (May Include PageRank)
  • 32. PAGE IMPORTANCE - The importance of a page independent of a query • Location in Site (e.g. home page more important than parameter 3 level output) • PageRank • Page type / file type • Internal PageRank • Internal Backlinks • In-site Anchor Text Consistency • Relevance (content, anchors and elements) to a topic (Similarity Importance) • Directives from in-page robot and robots.txt management • Parent quality brushes off on child page quality • Inclusion in XML sitemaps and the index IMPORTANT PARENTS LIKELY SEEN TO HAVE IMPORTANT CHILD PAGES Several Google Patents
  • 33. #pubcon But…Importance Signs From Whom? 3 Types Of ‘Importance Signal Sender’? SEARCHERS: Looking for results, creating queries, triggering impressions, demanding freshness. WEBMASTERS: Hreflang, Canonicalization, Internal links, Sitemap and index inclusion, Information Architecture, Anchors, Building content at a URL on a topic. LINKERATI: Passing PageRank. AND WHY IS ‘IMPORTANCE’ SO IMPORTANT?
  • 34. #pubcon Concept Of Search Engine Embarrassment A concept mostly originally attributed to Joel Wolf
  • 35. #pubcon Search Engine Embarrassment Credit: Joel Wolf Et Al GOODNESS & BADNESS IN SEARCH ENGINE EMBARRASSMENT Concept of using probability estimates to revisit web pages ‘just in time’ and based around limiting ‘likelihood of stale pages being exposed’ to searchers
  • 37. #pubcon Search Engine Embarrassment User_View_Rate – Likelihood of the document being seen + Document_Update_rate – How often it has material changes + Web_Crawl_Interval – How often is it currently crawled COMBINED TO CALCULATE Probability(Seen_Stale_Data) = Risk of Search Engine Embarrassment? ‘JUST IN TIME SMART CRAWLING’
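The slide names the three inputs but not how they are combined, so the following is only a sketch under stated assumptions, not the formula from Joel Wolf's paper or from any Google patent. It assumes page changes arrive as a Poisson process at document_update_rate changes per day, that the page is recrawled every web_crawl_interval days, and that a searcher views the page at a uniformly random moment between two crawls. The function names are hypothetical.

```python
import math


def stale_probability(document_update_rate: float, web_crawl_interval: float) -> float:
    """Chance that a random view between two crawls sees an outdated copy.

    Averages 1 - exp(-rate * t) over t in [0, interval] (assumed Poisson changes).
    """
    if document_update_rate < 0 or web_crawl_interval < 0:
        raise ValueError("rates and intervals must be non-negative")
    x = document_update_rate * web_crawl_interval
    if x == 0:
        return 0.0  # never changes, or is recrawled continuously
    return 1.0 - (-math.expm1(-x)) / x


def embarrassment_risk(user_view_rate: float,
                       document_update_rate: float,
                       web_crawl_interval: float) -> float:
    """Expected stale views per day: views per day times the stale probability."""
    return user_view_rate * stale_probability(document_update_rate, web_crawl_interval)
```

Under these assumptions a page that is viewed often and changes often carries the most risk for a given crawl interval, which is consistent with the slide's point that such pages get revisited 'just in time'.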
  • 38. #pubcon THEORY - Search Engine Embarrassment Joel Wolf’s ‘Optimal Crawl Strategies’ (Search Engine Embarrassment) Paper is Cited in this Google Patent
  • 39. #pubcon Triggering More ’Real Searcher Impressions’ A SMALL TEST THE PAGES BECAME ARGUABLY MORE IMPORTANT CRAWLING IMPROVED RANKING IMPROVED TRAFFIC IMPROVED
  • 40. #pubcon Myth – Don’t We Just Have To Make Random Changes To Get Crawled More? NOT ALL CHANGE IS CREATED EQUAL
  • 41. #pubcon WHAT Changed? Was it important? https://www.seroundtable.com/google-crawl-frequency-ranking-21153.html HINTS & $C = \sum_{i=0}^{n-1} weight_i \cdot feature_i$ CRITICAL MATERIAL CHANGE
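The weighted sum on slide 41 can be sketched as a few lines of Python. The slide does not say which features or weights are used, nor what threshold marks a change as 'critical', so the example values and the threshold parameter below are hypothetical.

```python
from typing import Sequence


def change_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Weighted sum C = sum(weight_i * feature_i) over all n features."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def is_critical_change(weights: Sequence[float],
                       features: Sequence[float],
                       threshold: float) -> bool:
    """True when the weighted change score reaches the (assumed) threshold."""
    return change_score(weights, features) >= threshold
```

For instance, with hypothetical weights of 0.5 for a main-content change, 0.3 for a title change and 0.2 for a boilerplate change, a page where only the boilerplate changed scores 0.2, which would fall below a threshold of 0.4.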
  • 42. #pubcon Randomization & Lying About ‘Change’ To Googlebot Won’t Help • NOT ALL CHANGE IS IMPORTANT ENOUGH TO BE RECRAWLED • DO NOT TRY TO MANIPULATE ‘CHANGE’ • You can’t get more crawl just by changing your pages alone & you may actually be doing your site harm • WHY – Because… ‘hints’ & ‘thresholds’ are designed to pick up on this • If every URL changes, the header response will always be ‘modified since’ (current date) • Randomization and shuffling could be preventing Googlebot from crawling the important pages • Last-modified is taken into consideration, IF it is correct • Priority == ignored so don’t make it up • Change frequency == ignored so don’t make it up ‘IMPORTANCE’ BEATS ‘CHANGE’
  • 43. #pubcon ‘Crawl Rank’ – Causation or Correlation? • By getting your URLs crawled more frequently, do they automatically rank higher? • “A lot of people confuse crawling with ranking” (John Mu) • Crawl Rank - It seems this is more correlation than causation • You got your URLs crawled more by making them more important (e.g. via internal linking strategies, canonicalization, hreflang, merging and improving thin content, updating with fresh and rich content on a topic, etc.)… and they subsequently ranked higher “Often times, it is kind of a relationship that, when we think something is important we tend to crawl it more frequently and that might be more visible in search” John Mueller, Google
  • 44. #pubcon The Four Main Types Of Cannibalisation – Slideshare @jonearnshaw http://www.slideshare.net/jonathanearnshaw/seo-46813620 Consistently Avoiding Importance Cannibalisation You must be consistently clear in emphasising the ‘importance’ of the right version of your ‘special ones’ (your key most important URLs).
  • 45. #pubcon Consistently avoiding ‘Mixed Signals’ & Skewed URL Importance GOOGLE CAN GET CONFUSED AS TO WHICH PAGE IT SHOULD RANK FROM YOUR SITE FOR KEY TERMS – BE CLEAR ON TARGETS
  • 46. #pubcon Consistency - Avoiding ‘importance dissipation’ from generational cruft Consider keeping the same URL for annual events and optimise the content for current year “Choose a URL structure that can stand the test of time” (John Mu, Google)
  • 47. #pubcon Cool URIs (And URLs) Don’t Change • The iterative drip, drip, drip of Importance • Nurture & mature (grow) importance • Consistent importance signals ongoing • Think URL as well as URI “…many, many things can change and your URIs can and should stay the same” (Sir Tim Berners-Lee) COOL URIs DON’T CHANGE https://www.w3.org/Provider/Style/URI “allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years” (Sir Tim Berners-Lee) IMPORTANCE VIA CONSISTENCY
  • 48. #pubcon “all over the Web, webmasters are making decisions which will make it really difficult for themselves in the future” (Sir Tim Berners-Lee) Don’t Let That Be You
  • 49. #pubcon THANK YOU TWITTER - @dawnieando GOOGLE+ - +DawnAnderson888 LINKEDIN – msdawnanderson www.move-it-marketing.co.uk
  • 50. #pubcon Importance Via Internal Links Most Important Page 1 Most Important Page 2 Most Important Page 3 IS THIS YOUR BLOG?? HOPE NOT https://support.google.com/webmasters/answer/138752?hl=en
  • 51. #pubcon Descending Importance Clues Via Internal Links (Breadcrumbs) SINGLE TEXT OUTPUT ONLY BREADCRUMB FEWER FEWER MOST Image credit: https://www.smashingmagazine.com/2009/03/breadcrumbs-in-web-design-examples-and-best-practices/ Home Category Sub Product
  • 52. #pubcon YES? … YOU’RE IN NO? … YOU’RE OUT (sitemaps and index) Importance By Inclusion (& Unimportance via Exclusion)
  • 53. #pubcon Importance Via Consistently Indicating ‘Correct Version’ of Duplicates • Canonicalisation • Choose one https / http / nonwww / www version and 301 redirect the others • Eliminate ‘too similar’ URLs • Consistency of internal link targets (right site version, right target for keywords / topics / topic intent / user intent) • Right version inclusion in XML sitemaps • Re-optimization / unpicking of 30X redirect chains internally and externally • Review of internal links in GSC for ‘skew’ • Review of existing content to improve on topic for ‘importance’ • Save / nurture the URL (think for the long term in URL planning) • Breadcrumbs • Minimize boiler plate content • Minimize regurgitated content in various parts of your site
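One of the items above, unpicking 30X redirect chains, lends itself to a short sketch. It works on a plain mapping of source URL to redirect target (for example, exported from a crawl), makes no network requests, and all function names and the max_hops limit are hypothetical.

```python
from typing import Dict, List


def resolve_chain(start: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in seen:
            raise ValueError("redirect loop detected at " + target)
        if len(chain) > max_hops:
            raise ValueError("redirect chain longer than max_hops")
        chain.append(target)
        seen.add(target)
    return chain


def flatten_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Map every source URL straight to its final destination (one hop each)."""
    return {source: resolve_chain(source, redirects)[-1] for source in redirects}
```

The flattened mapping shows where each old URL should 301 to directly, and the same final URLs are the ones internal links and XML sitemap entries would point at.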
  • 54. #pubcon SOURCES • Scheduler For Search Engine Crawler - http://www.google.ch/patents/US20120317089 • We Knew The Web Was Big - https://googleblog.blogspot.co.uk/2008/07/we-knew-web-was-big.html • https://www.youtube.com/watch?v=GVKcMU7YNOQ • http://webpromo.expert/google-qa-duplicate-content/
  • 55. #pubcon SOURCES • http://webpromo.expert/google-qa-crawlingrendering/ • https://twitter.com/dergal/status/777782401497980928 • Cool URIs Don’t Change - https://www.w3.org/Provider/Style/URI • https://searchenginewatch.com/2016/04/06/webpromos-qa-with-googles-andrey-lipattsev-transcript/ • https://www.youtube.com/watch?v=Wcnz1kCoiks • https://www.youtube.com/watch?v=MryA3F0ySew • ‘Optimal Crawling Strategies For Web Search Engines’ - http://dl.acm.org/citation.cfm?id=511465