2. WEB ROBOTS
• Web robots (also known as web wanderers,
crawlers, or spiders) are programs that traverse
web pages automatically.
• Search engines such as Google and Bing use
them to index web content.
• Spammers use them to scan for email addresses.
• Such programs have many other uses too.
3. WHAT IS ROBOTS.TXT
• robots.txt is a plain text file that you upload to the
root (public_html) folder of your website, for example
through cPanel.
• When web spiders (ants, bots, indexers) visit your site
to index it, they first look for that text file and process it.
• More precisely, robots.txt tells the spider which
pages it may crawl and index and which it may not.
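Because the file always sits in the root folder, a crawler can work out where to look from any page address. A minimal Python sketch of that lookup, where the function name `robots_url` and the example address are assumptions made for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt,
    # and any query string or fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/blog/post.html?page=2"))
# prints: http://www.example.com/robots.txt
```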
4. THE SIMPLEST VERSION OF ROBOTS.TXT
User-agent: *
Disallow:
• The first line, "User-agent: *", indicates that the
following lines apply to all agents/bots.
• A blank value after "Disallow:" means that nothing is restricted.
• In other words, this robots.txt file does nothing: it allows all
types of robots to see everything on the site.
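One way to check this behaviour is with Python's standard `urllib.robotparser` module. The sketch below feeds it the two lines above; the bot name `AnyBot` and the page address are made up for the example.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network access is needed.
parser.parse(rules.splitlines())

# An empty Disallow value restricts nothing, so any page is allowed.
allowed = parser.can_fetch("AnyBot", "http://www.example.com/private/page.html")
print(allowed)  # True
```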
5. SOME OTHER COMMON EXAMPLES OF
ROBOTS.TXT
• To exclude all robots from the entire server:
User-agent: *
Disallow: /
• To allow all robots full access:
User-agent: *
Disallow:
(or just create an empty "/robots.txt" file, or don't use
one at all)
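The difference between these cases can be compared side by side. This is a rough sketch using Python's standard `urllib.robotparser`; the labels, the bot name `AnyBot`, and the page address are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

rule_sets = {
    "block everything": "User-agent: *\nDisallow: /\n",
    "allow everything": "User-agent: *\nDisallow:\n",
    "empty file": "",
}

results = {}
for name, rules in rule_sets.items():
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # Ask the same question of each rule set.
    results[name] = parser.can_fetch("AnyBot", "http://www.example.com/index.html")

for name, allowed in results.items():
    print(f"{name}: {allowed}")
# block everything: False
# allow everything: True
# empty file: True
```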
6. SOME OTHER COMMON EXAMPLES OF
ROBOTS.TXT
• To exclude all robots from part of the server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
• To exclude a single robot:
User-agent: BadBot
Disallow: /
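Both rule sets above can be tried out with Python's standard `urllib.robotparser`. In this sketch the helper `is_allowed`, the bot names `AnyBot` and `GoodBot`, and the page addresses are hypothetical; only the rules themselves come from the slide.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


partial = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

single_bot = """\
User-agent: BadBot
Disallow: /
"""

site = "http://www.example.com"

# Each Disallow value is a path prefix: anything under it is off limits.
cgi_ok = is_allowed(partial, "AnyBot", site + "/cgi-bin/search.cgi")
home_ok = is_allowed(partial, "AnyBot", site + "/index.html")

# Only the named robot is affected by the second file.
badbot_ok = is_allowed(single_bot, "BadBot", site + "/index.html")
goodbot_ok = is_allowed(single_bot, "GoodBot", site + "/index.html")

print(cgi_ok, home_ok, badbot_ok, goodbot_ok)  # False True False True
```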
7. SOME OTHER COMMON EXAMPLES OF
ROBOTS.TXT
• To allow a single robot and exclude all others:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
• You can disallow single pages:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
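A short check of both ideas with Python's standard `urllib.robotparser`. Note one assumption: to actually keep other robots out, an "allow a single robot" file normally also needs a catch-all group that blocks everyone else, so the sketch includes one. The bot names `OtherBot` and `AnyBot` and the page addresses are invented for the example.

```python
from urllib.robotparser import RobotFileParser

only_googlebot = RobotFileParser()
only_googlebot.parse([
    "User-agent: Googlebot",
    "Disallow:",
    "",
    "User-agent: *",  # assumed catch-all group that blocks every other robot
    "Disallow: /",
])

single_pages = RobotFileParser()
single_pages.parse([
    "User-agent: *",
    "Disallow: /~joe/junk.html",
    "Disallow: /~joe/foo.html",
    "Disallow: /~joe/bar.html",
])

page = "http://www.example.com/~joe/junk.html"

googlebot_ok = only_googlebot.can_fetch("Googlebot", page)
otherbot_ok = only_googlebot.can_fetch("OtherBot", page)

# Only the listed pages are blocked; the rest of /~joe/ stays open.
junk_ok = single_pages.can_fetch("AnyBot", page)
index_ok = single_pages.can_fetch("AnyBot", "http://www.example.com/~joe/index.html")

print(googlebot_ok, otherbot_ok, junk_ok, index_ok)  # True False False True
```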
8. SOME OTHER COMMON EXAMPLES OF
ROBOTS.TXT
• You can specify the Sitemap location in your
robots.txt file:
User-agent: *
Disallow: /
Sitemap: http://www.example.com/sitemap.xml
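The Sitemap line is read separately from the User-agent groups, so a crawler can pick it up whatever the Disallow rules say. A small sketch of reading it back with Python's standard `urllib.robotparser`, assuming Python 3.8 or newer (where `site_maps()` is available):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns the listed sitemap addresses, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['http://www.example.com/sitemap.xml']
```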
11. WHAT IS A SITEMAP
• A sitemap tells search engines which pages are
available for crawling.
• An XML sitemap is a document that helps Google
and other major search engines understand your
website better while crawling it.
• A sitemap is an XML file that lists the URLs of a site
along with additional metadata about each URL:
   - when it was last updated
   - how often it usually changes
   - how important it is, relative to other URLs on the site
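In the sitemaps.org format, these three pieces of metadata are stored in the `lastmod`, `changefreq`, and `priority` elements next to each URL (`loc`). The following sketch builds a one-page sitemap with Python's standard library; the function name `build_sitemap`, the page address, and the metadata values are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML text for a list of page dictionaries."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]                # page address
        ET.SubElement(url, "lastmod").text = page["lastmod"]        # when it was last updated
        ET.SubElement(url, "changefreq").text = page["changefreq"]  # how often it usually changes
        ET.SubElement(url, "priority").text = page["priority"]      # importance from 0.0 to 1.0
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap([
    {
        "loc": "http://www.example.com/",
        "lastmod": "2024-01-15",
        "changefreq": "monthly",
        "priority": "0.8",
    },
])
print(xml_text)
```

A real sitemap file would usually also start with an XML declaration and list every page of the site that should be crawled.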
12. WHY DO YOU NEED AN XML SITEMAP
• XML sitemaps are important for search engines.
• They make the search engines' job easier.
• Even if you rank in the #1 position today, you still
need to work at maintaining that position.