Robots.txt is a plain text file that webmasters place in the root directory of a site (e.g. https://example.com/robots.txt) to tell search engine crawlers which pages they may request. The simplest robots.txt file allows all crawlers to access all pages, while more elaborate files restrict specific directories or pages from being crawled. The file uses directives such as User-agent and Disallow to specify which crawlers a rule group applies to and which URL paths they should not crawl. Note that the file is advisory rather than an access control: well-behaved crawlers honor it, but nothing enforces it. Also, blocking crawling is not the same as blocking indexing; a disallowed URL can still appear in search results if other pages link to it.
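A minimal sketch of such a file follows. The directory names (/admin/, /tmp/, and so on) are hypothetical placeholders standing in for whatever paths a site wants to keep crawlers out of:

```
# robots.txt — must live at the site root, e.g. https://example.com/robots.txt
# Lines starting with # are comments.

# This group applies to all crawlers (* is a wildcard user agent).
User-agent: *
Disallow: /admin/        # do not crawl anything under /admin/
Disallow: /tmp/          # do not crawl anything under /tmp/
Allow: /admin/public/    # exception: this subdirectory may be crawled

# This group applies only to Google's crawler.
User-agent: Googlebot
Disallow: /no-google/
```

Each group starts with one or more User-agent lines naming the crawlers it applies to, followed by the path rules for those crawlers. An empty Disallow value (`Disallow:` with nothing after it) permits everything, which is how the "allow all" file mentioned above is typically written.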