Crawl Budget - Some Insights & Ideas @ seokomm 2015

Crawl Budget
Some Insights + Ideas
Jan Hendrik Merlin Jacob
Founder + CTO
Twitter: @jhmjacob
Email: jhm@onpage.org
LinkedIn: linkedin.com/in/jhmjacob

Agenda
» Philosophy
» Parameters to influence Crawl Budget
» Best practice & next steps

Crawl Budget
Definition
The resources (aka money) Google invests in  
your website by sending its crawlers

Philosophy
What would you do, 
if you were Google?

Philosophy
Primary Target:
Make money!
Secondary Target:
The best search results

Philosophy
With their crawlers, Google invests money
to find the “best” webpages
in order to provide the best search results.

Philosophy
Problem 1:
The size of the web is infinite
Problem 2:
Even Google's resources are limited

Philosophy
Size of the Google index:
Something between 5 billion
and 1 trillion documents*
(means: around 5-1000 pages per domain)
* = As a matter of fact, there is no real data on this.
Probably even Google doesn’t know.

Conclusion
Search engines like Google have to 
constantly decide if they continue 
spending resources on the  
current website or rather go to another.

What is Bing saying about this?
“By providing clear, deep, easy to find
content on your website, we are more likely  
to index and show your content in search results.”
More: https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a

“clear”
» Distinct Canonical settings

» Valid redirects (not via Meta-Refresh!)

» Exactly one main headline (H1) per page

» Title, description, alt, links to relevant (!) content

» Standard HTML links (“No Rich Media like JS or Flash”)

» Clean and readable HTML site-navigation

» Clean and normalized URL structure

» “Clear keyword focus”

“deep”
» No “Thin content”

» “Do not copy from other websites”

» Be as relevant as possible for one topic (“Holistic”)

» Keep your pages updated (“freshness”)

“easy to find”
» Clean and up-to-date Sitemap.xml (lastmod!)
» “keep valuable content close to the home page”  
(aka short click-path aka “page level”)

» “use targeted keywords wherever possible” 
(regarding internal linking)

» Well structured navigation 
(found in URL + Breadcrumbs)

Between the lines:
» Sitemap.xml is used to identify new articles and get  
them indexed asap.

» If the system recognizes regular updates on a page, 
it will be crawled more frequently.

» Relevancy of a page is calculated based on internal  
(& external) links as well as the “click distance from 
the homepage” (aka “page-level”).

» Pagespeed matters: Otherwise Bounce-Rate can
have negative effects on crawl budget (+ rankings)
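The “click distance from the homepage” mentioned above can be measured with a breadth-first search over the internal link graph. A minimal sketch, assuming the link graph is already available as a dictionary; the URLs and the function name `click_depths` are made up for illustration:

```python
from collections import deque


def click_depths(links, start):
    """Return the minimum number of clicks from `start` to every reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # in a BFS the first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to
site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/a"],
    "/blog": ["/blog/post-1", "/products/a"],
    "/blog/post-1": ["/archive/old-page"],
}
depths = click_depths(site, "/")
for page, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, page)
```

Pages with a high depth value are candidates for better internal linking.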

What is Yandex saying about this?
More: https://yandex.com/support/webmaster/yandex-indexing/webmaster-advice.xml
Summary of the Webmaster Guidelines:
» Do not use cloaking

» Do not use auto-generated / gibberish text

» No thin content

» No hidden text

» Popups + Popunders = Bad Quality Indicator
» Do not do “User Behaviour Emulation”

What is Google saying about this?
“The best way to think about it is that the number
of pages that we crawl is roughly proportional to
your PageRank. So if you have a lot of incoming
links on your root page, we’ll definitely crawl that.
Then your root page may link to other pages, and
those will get PageRank and we’ll crawl those as
well. As you get deeper and deeper in your site,
however, PageRank tends to decline.”
More: https://www.stonetemple.com/matt-cutts-interviewed-by-eric-enge-2/

Reminder
» Internal Links are responsible 
for passing Pagerank through your 
pages  
(Some believe Pagerank is only 
generated out of external Backlinks)
» Pagerank “0 to 10” is just a simplified
display for humans. In reality this score 
is way more precise.
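To illustrate how internal links pass PageRank through a site, here is a simplified sketch of the classic PageRank iteration on a tiny, made-up link graph. The damping factor of 0.85, the iteration count and the page names are assumptions for illustration; how Google computes its scores today is not public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly across its outgoing links
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outgoing links: spread its rank over all pages
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: the deep page is only reachable via the category page
site = {
    "/": ["/category", "/about"],
    "/category": ["/", "/category/deep-page"],
    "/about": ["/"],
    "/category/deep-page": ["/"],
}
ranks = pagerank(site)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{score:.3f} {page}")
```

In this toy graph the score drops with every level below the root page, which is the effect described in the quote.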

“Another way to think about it is that the low
PageRank pages on your site are competing
against a much larger pool of pages with the
same or higher PageRank. There are a large
number of pages on the web that have very little or
close to zero PageRank. The pages that get linked
to a lot tend to get discovered and crawled quite
quickly. The lower PageRank pages are likely to
be crawled not quite as often.”

“If we can only take two pages from a site at any
given time, and we are only crawling over a certain
period of time, that can then set some sort of
upper bound on how many pages we are able to
fetch from that host.”

“Imagine we crawl three pages from a site, and
then we discover that the two other pages were
duplicates of the third page. We’ll drop two out of
the three pages and keep only one, and that’s why
it looks like it has less good content. So we might
tend to not crawl quite as much from that site. 
… 
If there are a large number of pages that we
consider low value, then we might not crawl quite
as many pages from that site, but that is
independent of rel=canonical.”

“If you link to three pages that are duplicates, a
search engine might be able to realize that those
three pages are duplicates and transfer the
incoming link juice to those merged pages.”

“There are some things that we will run a
HEAD for. For example, our image crawl may
use HEAD requests because images might
be much, much larger in content than web
pages…In terms of crawling the web and text
content and HTML, we’ll typically just use a
GET and not run a HEAD query first”

» “There is also not a hard limit on our crawl.”
» Pages with higher Pagerank will get crawled more often

» Free crawling resources will be spent on low-PR pages,
but chances the bot will leave the page are higher 
(how are they chosen?!)

» You compete against all other pages. Give the bots 
reasons to stay.

» Limitation is not based on “Amount of URLs”, rather in  
form of “Machine-Hours” (time-based limits) 
(Loadtime matters!)

» Bad page-quality + bad content metrics can scare away bots 
(Exit-Condition like “Amount of Unique Content / Time”)

» Google tries to avoid waste of bandwidth
(HEAD Requests for images + if-modified-since)

Google Search Console

Searchability
(aka Findability)
Definitions

ility!
Crawlability + Indexability + Rankability
=
Searchability
(aka Findability)

Crawlability
Is your webpage (URL)
accessible for crawlers?

Indexability
Should the crawled, extracted
and interpreted content be
added to a search index?

Rankability
Should a particular page
be displayed in the
search results for a
particular keyword
(search phrase)?

Crawlability, indexability and rankability
have a direct or indirect influence
on the Crawl Budget

Technical SEO Buzzword Bingo
“Crawlability” “Indexability” “Rankability”
» robots.txt
» robots Directive (Response Header / Meta Tag)
» rel=prev
» Status Code (Response Header)
» Canonical
» hreflang Directives (Response Header / Meta Tag / Sitemap)
» Load time (DNS+Server)
» Redirects
» Device Directives
» Fragment aka Ajax Crawling (Meta Tag)
» Unique Content (Content)
» Content Quality (Content)
» URL-Structure (URL)
» Encoding (Content)
» Rendertime (Server+Content)
» Vary (Response Header)
» File Size (Content)
» Location Directives (Content)
» if-modified-since Support (Response Header)
» Rendering (CSS+JS)

Analyzed by OnPage.org
“Crawlability” “Indexability” “Rankability”
» robots.txt
» robots Directive
» rel=prev
» Status Code (Response Header)
» Canonical
» hreflang Directives (Response Header / Meta Tag / Sitemap)
» Load time (DNS+Server)
» Redirects (Response Header)
» Device Directives
» Fragment aka Ajax Crawling (Meta Tag)
» Unique Content (Content)
» Content Quality (Content)
» URL-Structure (URL)
» Encoding (Content)
» Rendertime (Server+Content)
» Vary (Response Header)
» File Size (Content)
» Location Directives (Content)
» if-modified-since Support (Response Header)
» Rendering (CSS+JS)
We offer the most comprehensive analysis on website quality assurance!

robots.txt
This is obvious!
» Learn how to set up your robots.txt file

» Block irrelevant URLs, so the bots don’t waste  
their time on those pages

» Basics: https://en.onpage.org/wiki/Robots.txt
Always remember: If a page is blocked via robots.txt, 
the bots can’t see additional settings like  
Canonicals or “noindex” directives.
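To check which URLs a set of robots.txt rules blocks, Python's standard library ships a simple parser. A minimal sketch with made-up rules and URLs; note that `urllib.robotparser` only does plain prefix matching, so it may behave differently from Googlebot for wildcard rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules that keep bots away from internal search and cart pages
rules = """\
User-agent: *
Disallow: /search
Disallow: /cart/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

search_ok = parser.can_fetch("Googlebot", "https://www.example.com/search?q=shoes")
product_ok = parser.can_fetch("Googlebot", "https://www.example.com/products/shoes")
print("search page crawlable:", search_ok)
print("product page crawlable:", product_ok)
```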

Status Code
Even though a page might look fine,
under the hood it can still be
broken as hell.

Status Code
200 OK (valid page)
301 Permanent redirect (after redesigns)
302 Temporary redirect
303 See Other (alternative version)
304 Not Modified (page did not change since last visit)
403 Access forbidden
404 Page does not exist

Loadtime
Page A: Nice - only 82 milliseconds
until Googlebot got the source code of the page
Page B: Not so nice - on average 1.76 seconds
until the source code has been transferred

Loadtime
[Bar chart, pages per second: Page A 12.2, Page B 0.59]

Loadtime
             Page A                Page B
Per Second   12.2 Pages            0.59 Pages
Per Minute   731.71 Pages          35.29 Pages
Per Hour     43,902.44 Pages       2,117.65 Pages
Per Day      1,053,658.54 Pages    50,823.53 Pages
ouch!
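The table follows from simple arithmetic: if a crawler fetches one page after another, the number of pages per period is the period length divided by the load time. A small sketch that reproduces the figures; the Page B column matches a load time of 1.7 seconds, so that value is assumed here:

```python
PERIODS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def pages_per_period(load_time_seconds):
    """Pages a single sequential crawler connection can fetch per period."""
    return {
        name: round(seconds / load_time_seconds, 2)
        for name, seconds in PERIODS.items()
    }


page_a = pages_per_period(0.082)  # 82 milliseconds
page_b = pages_per_period(1.7)    # assumption: the table's Page B figures match 1.7 s

for name in PERIODS:
    print(f"per {name}: Page A {page_a[name]:,.2f} / Page B {page_b[name]:,.2f}")
```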

Fragment aka Ajax Crawling
More: https://angularjs.org/
excursion

Why angularjs?
Tries to achieve a better User Experience,  
by transferring only small segments 
instead of complete pages.
Provides testing functionalities.
excursion

Easy way to identify
an AngularJS site (“ng-app”)

<!DOCTYPE html>

<html lang="en" id="ng-app" data-ng-app="MainApp">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, …">
<meta name="keywords" content="mercedes films, mercedes clip, …">
<link href="assets/images/favicon.ico" type="image/x-icon" rel="shortcut icon">
<title>Mercedes-Benz Video Channel</title>
<meta name="keywords" content="{{keywords}}"/>
</head>
…

AngularJS placeholder:
<meta name="keywords" content="{{keywords}}"/>

More: https://developers.facebook.com/tools/debug/

More: https://cards-dev.twitter.com/validator

Why?
1) There are also other JS testing frameworks 
Jasmine / PhantomJS
2) WallabyJS  
Nice Plugin for realtime JS Unit Tests
3) IMO: AngularJS is rather suited  
for web-apps 
Not so well for content based sites which  
rely on their ﬁt in the web eco-system

Ajax Crawling Scheme
1) Within <head> Tag 
<meta name="fragment" content="!"/>
2) Hashbang URLs (“#!”) 
https://www.seokomm.at/#!agenda
+ Snapshot URL with “real” HTML

What happens here?
1) GET http://video.mercedes-benz.co.uk/#!/
Complete source code (9 KB)
2) GET http://video.mercedes-benz.co.uk/?_escaped_fragment_=/
Complete source code (9 KB) without AngularJS placeholders
Two requests were required to gather the valid HTML code!
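The mapping from a hashbang URL to its snapshot URL is mechanical. A minimal sketch, assuming the escaping rules of the (now deprecated) Ajax crawling scheme: a small set of special characters in the fragment is percent-encoded, everything else is kept. The function names are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def _escape_fragment(fragment):
    """Percent-encode the bytes the scheme treats as special (assumed set)."""
    out = []
    for byte in fragment.encode("utf-8"):
        # Control characters and space, "#", "%", "&", "+", and 0x7F upwards
        if byte <= 0x20 or byte in (0x23, 0x25, 0x26, 0x2B) or byte >= 0x7F:
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


def snapshot_url(url):
    """Turn a "#!" URL into the "?_escaped_fragment_=" URL a crawler requests."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hashbang URL, nothing to rewrite
    param = "_escaped_fragment_=" + _escape_fragment(parts.fragment[1:])
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(snapshot_url("http://video.mercedes-benz.co.uk/#!/"))
print(snapshot_url("https://www.seokomm.at/#!agenda"))
```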

Support of Ajax Crawling
             “Ajax Crawling Scheme”    Native Ajax Crawling
Google       Yes, but “deprecated”     Yes
Bing         Yes                       Nope
OnPage.org   Yes                       Nope
Facebook     Nope                      Nope
Twitter      Nope                      Nope
Pinterest    Nope                      Nope

URL Structure
1) Speaking URLs (aka Hackable URLs)
https://www.ccc.de/events/2015/congress
2) Sort GET parameters (predefined order)
https://de.onpage.org/?currency=de&lang=de
3) Relevant content on the top tier (subfolder),
should correlate with Pagerank flow
4) Session IDs in URLs are a No-Go!
If there is no other way: Remove them via GSC
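Sorting GET parameters into a predefined order makes sure that every parameter combination maps to exactly one URL. A minimal sketch using alphabetical order as the predefined order (an assumption; any fixed order works):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url):
    """Return the URL with its query parameters sorted alphabetically."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


print(normalize_url("https://de.onpage.org/?lang=de&currency=de"))
```

Internal links should then always use the normalized form, so crawlers never see the same page under two parameter orders.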

Vary Response Header
1) Does the page provide compression? (Must!!!)
Vary: Accept-Encoding
2) Do Cookies (notably) change the content?
Vary: Cookie
3) Is the page multi-lingual? (Within same URL!)
Vary: Accept-Language

“if-modified-since” Workflow

11/01/2015:
GoogleBot calls en.onpage.org
Server response:
Complete source code (10.3 KB)
+ Response Header “Last-Modified”

11/05/2015:
GoogleBot calls en.onpage.org again
and includes an additional
Request Header “If-Modified-Since”
Server response:
Empty body (0 KB)
+ Status Code “304 Not Modified”

» Dramatically reduces downloaded file size
for unchanged content
» Enables bots + users to download more relevant 
content within the same timespan
» Requires good Infrastructure / CMS  
like Page-Caching - more on that later!
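The server-side decision in this workflow can be sketched in a few lines. This is a framework-independent illustration, not the OnPage.org implementation; the function name `respond` and the example timestamps are assumptions:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def respond(last_modified: datetime, if_modified_since: Optional[str], body: bytes):
    """Return (status, headers, body) for a conditional GET."""
    # HTTP dates have second resolution, so drop microseconds before comparing
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            since = None  # unparsable header: fall back to a full response
        if since is not None and last_modified <= since:
            return 304, headers, b""  # unchanged: send headers only
    return 200, headers, body


page = b"<html>...</html>"
modified = datetime(2015, 11, 1, 10, 0, 0, tzinfo=timezone.utc)

# First visit: no conditional header, full response
status, headers, _ = respond(modified, None, page)
# Second visit: the crawler echoes Last-Modified as If-Modified-Since
status_again, _, body_again = respond(modified, headers["Last-Modified"], page)
print(status, status_again, len(body_again))
```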

robots Directive
1) Via Meta Tag
<meta name="robots" content="noindex,follow"/>
2) Via Response Header
X-Robots-Tag: noindex,follow
Remember: A lot of “noindex” pages have a negative effect
on the crawl budget … because resources are wasted
to find out that the URL has no real content.

robots Directive: “unavailable_after”
More: https://googleblog.blogspot.de/2007/07/robots-exclusion-protocol-now-with-even.html
<meta name="robots" content="unavailable_after: 20-Nov-2015 15:35:00 CET">
X-Robots-Tag: unavailable_after: 20 Nov 2015 15:35:00 CET

Canonical
<link rel="canonical" href="https://de.onpage.org/"/>
Link: <https://de.onpage.org/>; rel="canonical"
The Response Header Version can also be used for 
PDF files and images (yummy).
Remember: A lot of “canonicalized” pages (canonical pointing to
another URL) have a negative effect on the crawl budget …
because resources are wasted to find out that the URL  
has no real content.

Redirects
1) Via Response Header
Status Code: 301
Location: https://de.onpage.org/
2) In the <head> section
<meta http-equiv="refresh" content="5; url=http://example.com/">
Redirect chains should be avoided.
Best practice is to avoid internal redirects altogether.
Rather update old links and point them to the new URL.
Search engines do not like redirects via meta tags or JavaScript.
These should only be used with caution to navigate users.
The semantically correct way is the response header (“301 vs 302”).

Unique & Relevant Content
1) No thin content
2) No duplicate content
3) No auto-translated pages
In terms of indexability

Crawler: Behind the Scenes
Bloom filter
De-Duplication
Index

The Challenge: Big Data Scale
» Was a given URL already crawled?
(if so: Does a reload make sense?)
Solution: Bloom filter + Key-Value Store
» Is the content of a crawled URL  
valuable enough to be added in the index?  
Solution: Content-Fingerprinting + Hamming Distance
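A Bloom filter answers “was this URL already crawled?” with very little memory: it never misses a URL that was added, but it can report a false positive with a small probability. A minimal sketch; the bit-array size, the number of hash functions and the use of SHA-256 are arbitrary choices for illustration, not what any search engine uses:

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("https://en.onpage.org/")
print("https://en.onpage.org/" in seen)
print("https://en.onpage.org/wiki/Robots.txt" in seen)
```

Details such as the last crawl date would live in the key-value store; the filter only avoids most lookups for URLs that were never seen.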

“Most algorithms for near-duplicate detection
run in batch- mode over the entire collection
of documents. For web crawling, an online
algorithm is necessary because the decision
to ignore the hyper-links in a recently-
crawled page has to be made quickly”
More: http://www2007.cpsc.ucalgary.ca/papers/paper215.pdf
Crawler: Behind the Scenes
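Content fingerprinting with a Hamming-distance check can be sketched as follows. This is a simplified simhash-style fingerprint over whitespace-separated tokens; the tokenization, the use of MD5 as token hash and the distance threshold of 3 bits are assumptions for illustration:

```python
import hashlib


def simhash(text, bits=64):
    """Fingerprint where similar texts tend to differ in only a few bits."""
    weights = [0] * bits
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # Each fingerprint bit is the majority vote of the token hashes
    return sum(1 << i for i, weight in enumerate(weights) if weight > 0)


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, max_distance=3):
    return hamming_distance(simhash(text_a), simhash(text_b)) <= max_distance


original = "crawl budget is the amount of resources a search engine spends on a site"
print(is_near_duplicate(original, original))
print(hamming_distance(simhash(original), simhash("a completely different page about cooking")))
```

Because each fingerprint is a single integer, the comparison is cheap enough to run while the crawl is in progress.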

Encoding
Content-Type: text/html; charset=UTF-8
<meta charset="UTF-8" />
Charset should always be defined.
Try to work with UTF-8 - saves a lot of headaches in the long run.

Encoding
This is what an
encoding f*ckup
looks like

File Size
1) Within “Google Search Appliance”: Max. 20 MB
But that's the Enterprise version of Google
2) In the wild the limit is probably way lower
(somewhere between 500 KB and 1 MB)
The bigger the file, the longer it takes to download.
Rule of thumb: The smaller, the better!

Rendering
1) JavaScript and CSS files have to be accessible
for GoogleBot 
OnPage.org provides good reports on that
2) If Google has issues rendering the page, indexation 
is at risk
3) Also make sure that the rendering does not take too long 
(Pagespeed Test).
4) Does the rendering on mobile devices look fine?
(Viewport Tag)

rel=prev
1) In the <head> section
<link rel="prev" href="http://abc.com/article?page=1" />
<link rel="next" href="http://abc.com/article?page=3" />
2) In the response header
Link: <http://abc.com/article?page=1>; rel="prev"
Link: <http://abc.com/article?page=3>; rel="next"
More: http://googlewebmastercentral.blogspot.co.at/2011/09/pagination-with-relnext-and-relprev.html

rel=prev
» Semantic Markup  
for Paginations 
Groups multiple pages 
into one ranking
» Intended for multi-page 
articles (newspapers). 
But Google now also 
shows product-listings 
as use case.
More: http://googlewebmastercentral.blogspot.co.at/2011/09/pagination-with-relnext-and-relprev.html

rel=prev alternative: “view-all page”
More: http://googlewebmastercentral.blogspot.co.at/2011/09/view-all-in-search-results.html

Already sleepy?! 
;)

hreflang Directives
More: https://moz.com/blog/using-the-correct-hreflang-tag-a-new-generator-tool
[Diagram: three versions of Article XYZ (“de” = German, “es” = Spanish, and English as “x-default”) reference each other via hreflang=“de”, hreflang=“es” and hreflang=“x-default”]
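Since every language version has to list all versions, the markup is a good candidate for generation. A minimal sketch that builds the `<link>` tags for the three versions from the diagram; the URLs and the function name `hreflang_links` are made up:

```python
from html import escape


def hreflang_links(versions):
    """Build <link rel="alternate"> tags for all language versions of one article."""
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in versions.items()
    ]


# Hypothetical URLs for the three versions of "Article XYZ"
versions = {
    "de": "https://www.example.com/de/article-xyz",
    "es": "https://www.example.com/es/article-xyz",
    "x-default": "https://www.example.com/article-xyz",
}
tags = hreflang_links(versions)
print("\n".join(tags))
```

The same complete set of tags goes into the `<head>` of each of the three versions, so the references are reciprocal.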

Device Directives
1) Viewport Tag
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
2) Media Queries
<link rel="stylesheet" media="only screen and (max-width: 800px)" href="/mobile.min.css" />
3) Dedicated URL for mobile devices
<link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.example.com/page-1" >

Content Quality
1) The basics  
Title, Description etc.
2) Zero tolerance for broken pages
3) Avoid internal redirects 
Update links instead
4) Lightweight Sourcecode 
Get rid of unnecessary inline JS + CSS, remove Whitespaces, Line
Breaks, Tabs, etc.

Location Directives
1) schema.org Markup (“LocalBusiness”)
Seems to be used by Google for “Local Search”
2) Address / Telephone
So your website also matches query modifications
3) Dublin Core Markup 
Not really relevant for SEO, but does not hurt (semantic!)
More: https://plus.google.com/+JohnMueller/posts/1EwfjTuCzPQ
More: http://schema.org/LocalBusiness

Static CMS
More: https://www.staticgen.com/

Static CMS
More: https://www.getkirby.com/

Static CMS
WordPress is kind of
the Internet Explorer
in the CMS space

Static File System in the Wild

if-modified-since: OnPage.org
1. First download of the page: The system generates the final source code

2. An optimized version of the source code gets saved on disk (“Page-Caching”).
The cache filename is generated based on relevant cookie values
(in our case: language + currency of visitor)

3. The same URL (+ same cookie settings) gets called again.
Search engines will send the “Last-Modified” value (from the
previous response) in the “If-Modified-Since” request header.

4. The response for the second call is just taken from the cache file
Means: Ultra-fast Time to First Byte, because the server doesn’t need to “think”
We dropped irrelevant characters (newlines, tabs, spaces) when we saved the cache file.
-> We have seen clients who reduced their file sizes by 30% (!) with that simple step
-> This results in better load times

5. Part of the returned response was the “Last-Modified” setting.
It was calculated based on the cache file timestamp.

» Super fast Time to First Byte
When the file is cached
» Sends optimized source code to reduce
bandwidth usage
for both parties: Our servers + Google Crawlers
» If the file was loaded before, only send what’s 
really required  
=> “304 Not modified” aka  
“Everything is cool, you have the latest version in
your index”
» Bonus: This workflow enables us to set 
the lastmod value in sitemap.xml
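Two steps of this workflow are easy to sketch: deriving the cache filename from the URL plus the relevant cookie values, and dropping irrelevant whitespace before saving. The hashing scheme and the very naive minifier below are assumptions for illustration, not the actual OnPage.org code; a real minifier must leave `<pre>`, `<textarea>` and inline scripts alone.

```python
import hashlib
import re


def cache_filename(url, language, currency):
    """One cache file per URL and per combination of relevant cookie values."""
    key = f"{url}|lang={language}|currency={currency}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest() + ".html"


def minify(html):
    """Naively remove whitespace between tags (newlines, tabs, spaces)."""
    return re.sub(r">\s+<", "><", html.strip())


source = "<ul>\n\t<li>Crawl</li>\n\t<li>Budget</li>\n</ul>\n"
optimized = minify(source)
print(cache_filename("https://en.onpage.org/", "en", "USD"))
print(optimized, f"({len(source)} -> {len(optimized)} bytes)")
```

The modification time of the saved file can then serve as the `Last-Modified` value, as described in step 5.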

Other design principles of  
our homegrown static CMS

Static CMS: Design Principles
1) File-Position: Folders in URL are the same
as on the filesystem
Authors are conditioned to build a clean structure + file-hierarchy

2) Separation of Code, Design and Content
Every member of the team sees his part
For Designers:
affiliate.tpl

Every member of the team sees his part
For copywriters:
affiliate.de.json

Makes MS Word etc. redundant.

If a new translation needs to be added, the translator gets the
English version, renames the file, translates the contents and uploads
the file.
 
Bam! It’s online. 
 
Text updates, Design changes and new images are versioned by
git.

3) Multilinguality by nature
If a new translation is uploaded, the system starts a couple of
cool things

Editor Friendliness + File-Management
If a user navigates to the wrong language version of a page, he will 
see a friendly reminder that there is a localized version for him

Links to the translated versions of the current page are
automatically added to the footer

And hreflang markup is automatically added to the <head> section
of the document

4) Fast + Secure 
No Database which slows down server responses! Git keeps track of
changes and provides rollback functionalities! No other
dependencies / services which might cause security holes

5) Transparent and logical structure
Images reside where they belong: In the same folder as the article
itself - like its template, translations, additional script logic. 
 
Cleaning up made easy: If an article needs to be deleted, just
remove the folder -> All files are gone, no more deserted files in
“images” folders or localization databases, etc.

Outlook
What we want to build next

Outlook
» Multi-Language Images 
The same URL for all localized versions of an image
https://en.onpage.org/beispiel/teaser.jpg https://en.onpage.org/beispiel/teaser.jpg

Outlook
Be careful: This is untested freestyle code - just to give you an idea :)
The .htaccess file detects that an image file is requested

Outlook
The browser exposes the preferred languages of the user

Outlook
» Multi-Language  
Images 
A script takes the  
request, checks if a localized 
version exists and returns 
the value (or the default image). 
 
Result is cached in  
browser cache.
Be careful:  
This is untested freestyle  
code - just to give  
you an idea :)
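In the same spirit, here is a sketch of the lookup such a script could perform: read the preferred languages from the `Accept-Language` request header and return a localized image if one exists. The naming convention `teaser.de.jpg` and both function names are assumptions, and the set of available files stands in for a real filesystem check.

```python
def preferred_languages(accept_language):
    """Parse an Accept-Language header into language codes, best first."""
    weighted = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Reduce "de-AT" to its primary language "de"
        weighted.append((quality, tag.strip().lower().split("-")[0]))
    weighted.sort(key=lambda item: -item[0])  # stable: header order breaks ties
    return [lang for quality, lang in weighted if quality > 0]


def pick_image(path, accept_language, available):
    """Return the localized variant of `path` if one exists, else the default."""
    stem, _, extension = path.rpartition(".")
    for lang in preferred_languages(accept_language):
        candidate = f"{stem}.{lang}.{extension}"
        if candidate in available:
            return candidate
    return path


files = {"/beispiel/teaser.jpg", "/beispiel/teaser.de.jpg"}
print(pick_image("/beispiel/teaser.jpg", "de-AT,de;q=0.9,en;q=0.8", files))
print(pick_image("/beispiel/teaser.jpg", "fr-FR,fr;q=0.9", files))
```

Because the response for one URL then depends on the request language, sending `Vary: Accept-Language` fits the Vary recommendation from earlier in the deck.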

Outlook
» Last-Modified Logging 
To find out how popular a page is among search engines
» If we set the Last-Modified response header,
search engines will include its value in their next
request for the page
(as If-Modified-Since)
Knowing this, we can calculate the timespan between this visit and
the last one.

Outlook
» Low timespan
= URL seems to be relevant for the search engine
= Good chances to rank
» High timespan
= URL seems to be rather irrelevant for the SE
= Lower chances to rank
= Alerting based on the importance of the page
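The timespan calculation could look like the sketch below. It assumes that the `Last-Modified` value sent with the previous response reflects the time of that visit; if it is taken from a cache file timestamp instead, the result measures the age of the cached copy. The function names and the 14-day alert threshold are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def revisit_interval(if_modified_since, now):
    """Timespan between the value echoed by the crawler and the current visit."""
    previous = parsedate_to_datetime(if_modified_since)
    if previous.tzinfo is None:
        previous = previous.replace(tzinfo=timezone.utc)
    return now - previous


def needs_alert(interval, max_interval=timedelta(days=14)):
    """Flag important pages that the crawler revisits too rarely."""
    return interval > max_interval


visit = datetime(2015, 11, 5, 12, 0, 0, tzinfo=timezone.utc)
interval = revisit_interval("Sun, 01 Nov 2015 12:00:00 GMT", visit)
print(interval, needs_alert(interval))
```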

Last words
“It’s not that Google will penalize
you, it’s the opportunity cost for
dirty architecture based on a finite
crawl budget.”
More: http://www.blindfiveyearold.com/crawl-optimization

Thanks!
OnPage.org GmbH
Twitter: http://twitter.com/onpage_org
Facebook: http://fb.me/onpage.org
Web: https://en.onpage.org
Jan Hendrik Merlin Jacob
Founder + CTO
Twitter: https://twitter.com/jhmjacob
Email: jhm@onpage.org
LinkedIn: http://linkedin.com/in/jhmjacob
http://onpa.ge/V141p
