2. Robots.txt is a text file that webmasters create to tell web robots (typically search
engine crawlers) which pages on their website may be crawled. It is placed in the root
folder of the website.
Basic Samples:
Blocking all web crawlers from all content
User-agent: *
Disallow: /
Allowing all web crawlers access to all content
User-agent: *
Disallow:
There are many more directives for keeping search engine bots from crawling a
particular section of a website. Read more here: https://www.robotstxt.org/
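The two basic samples above can be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name may_crawl and the example.com URL are assumptions made only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two basic samples from the text above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def may_crawl(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page.html"  # hypothetical URL
    print("block-all sample allows crawl:", may_crawl(BLOCK_ALL, url))
    print("allow-all sample allows crawl:", may_crawl(ALLOW_ALL, url))
```

This only shows how one parser reads the rules; each search engine uses its own parser and may interpret edge cases differently.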
3. The robots meta tag lets you take a granular, page-specific approach to controlling
how an individual page is indexed and served to users in Google Search
results.
It is placed in the <head> section of a page.
Example:
<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex" /> (…)
</head>
<body>(…)</body>
</html>
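To see which directives a page declares, the robots meta tag can be read out of the HTML. The sketch below uses Python's standard html.parser module; the class name RobotsMetaParser and the function robots_directives are hypothetical names chosen for this example, and the sample page is a filled-in variant of the one above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html):
    """Return the list of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = """<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex" />
</head>
<body></body>
</html>"""
    print(robots_directives(page))
```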
4. X-Robots-Tag is an HTTP response header sent by the web server. It controls how a
resource is indexed and, unlike the meta tag, also works for non-HTML file types.
Imagine you run a website that also has some .doc files, but you don’t want search
engines to index that file type for a particular reason. On Apache servers (with
mod_headers enabled), you would add the following lines to the configuration or a
.htaccess file:
<FilesMatch "\.doc$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>
Or, if you want to do this for both .doc and .pdf files:
<FilesMatch "\.(doc|pdf)$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>
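FilesMatch takes a regular expression, and in a regular expression an unescaped dot matches any character, so the dot in front of the extension is best written as \. to match a literal period. The sketch below illustrates the difference with Python's re module, on the assumption that it treats these simple patterns the same way Apache's regex engine does; the file names are made up for the example.

```python
import re

# Escaped dot: matches only names that end in a literal ".doc" or ".pdf".
ESCAPED = re.compile(r"\.(doc|pdf)$")
# Unescaped dot: "." matches any single character before "doc" or "pdf".
UNESCAPED = re.compile(r".(doc|pdf)$")


def matches(pattern, filename):
    """Return True if the pattern is found in the file name."""
    return pattern.search(filename) is not None


if __name__ == "__main__":
    for name in ["report.pdf", "letter.doc", "notes.docx", "mypdf"]:
        print(name, matches(ESCAPED, name), matches(UNESCAPED, name))
```

A name such as mypdf has no extension at all, yet the unescaped pattern still matches it, which is why the escaped form is the safer choice.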
5. Several types of directives tell search engine bots which pages and other content
they are allowed to crawl and index. The most commonly referenced are the
robots.txt file and the meta robots tag.
There are two different kinds of directives:
o Crawler Directives
o Indexer Directives
I’ll briefly explain the difference below.
6. Robots.txt – uses the User-agent, Allow, Disallow and Sitemap directives to specify
which parts of a site each search engine bot is allowed to crawl and which parts it is
not.
Allow
Disallow
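The four directives can be seen working together in a small file. This is a minimal sketch using Python's standard urllib.robotparser module; the paths, the sitemap URL and the function name crawl_report are assumptions for illustration. Note that this parser applies the first matching rule in file order, so the Allow line is placed before the broader Disallow line; search engines may resolve conflicts differently (for example, by the most specific path).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one page inside /private/ stays crawlable.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""


def crawl_report(robots_txt, urls, user_agent="*"):
    """Return ({url: allowed}, sitemap list) for the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = {url: parser.can_fetch(user_agent, url) for url in urls}
    # site_maps() is available from Python 3.8; it returns None if no
    # Sitemap line is present.
    return allowed, parser.site_maps()


if __name__ == "__main__":
    urls = [
        "https://example.com/index.html",
        "https://example.com/private/secret.html",
        "https://example.com/private/public-page.html",
    ]
    allowed, sitemaps = crawl_report(ROBOTS_TXT, urls)
    for url, ok in allowed.items():
        print(url, "->", "crawl" if ok else "skip")
    print("sitemaps:", sitemaps)
```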
7. Meta Robots tag – allows you to prevent search engines from showing particular
pages on a site in search results.
Nofollow – allows you to specify links that should not pass on authority or PageRank
X-Robots-tag – allows you to control how specified file types are indexed
8. The X-Robots-Tag differs from the robots.txt file and meta robots tag, though, in that
it is part of the HTTP response header, so it can control indexing of a page as a
whole as well as of specific elements on a page.
For example, if you wanted to block a specific image or video, you could use
the HTTP response header method.
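Because the directive travels in the HTTP response rather than in the page, it can be inspected for any resource, including images and videos. The following sketch pulls the directives out of a list of (name, value) header pairs, the shape returned by http.client.HTTPResponse.getheaders(); the function name x_robots_directives and the sample headers are assumptions, and the optional user-agent prefix form (such as "googlebot: noindex") is not handled here.

```python
def x_robots_directives(headers):
    """Return the X-Robots-Tag directives from (name, value) header pairs."""
    directives = []
    for name, value in headers:
        # Header names are case-insensitive, and the header may repeat.
        if name.lower() != "x-robots-tag":
            continue
        directives.extend(
            part.strip().lower() for part in value.split(",") if part.strip()
        )
    return directives


if __name__ == "__main__":
    # Hypothetical response headers for an image file.
    sample_headers = [
        ("Content-Type", "image/png"),
        ("X-Robots-Tag", "noindex, noarchive, nosnippet"),
    ]
    print(x_robots_directives(sample_headers))
```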