Robots.txt is a plain text file that webmasters place in the root directory of a site (e.g. https://example.com/robots.txt) to tell search engine crawlers which pages they may request. The simplest robots.txt file allows all crawlers to access all pages, while more elaborate files restrict specific directories or pages from being crawled. The file uses directives such as User-agent and Disallow to specify which crawlers a rule group applies to and which URL paths they should not crawl. Note that the file is advisory rather than an access control: well-behaved crawlers honor it, but nothing enforces it. Also, blocking crawling is not the same as blocking indexing; a disallowed URL can still appear in search results if other pages link to it.
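A minimal sketch of such a file follows. The directory names (/admin/, /tmp/, and so on) are hypothetical placeholders standing in for whatever paths a site wants to keep crawlers out of:

```
# robots.txt — must live at the site root, e.g. https://example.com/robots.txt
# Lines starting with # are comments.

# This group applies to all crawlers (* is a wildcard user agent).
User-agent: *
Disallow: /admin/        # do not crawl anything under /admin/
Disallow: /tmp/          # do not crawl anything under /tmp/
Allow: /admin/public/    # exception: this subdirectory may be crawled

# This group applies only to Google's crawler.
User-agent: Googlebot
Disallow: /no-google/
```

Each group starts with one or more User-agent lines naming the crawlers it applies to, followed by the path rules for those crawlers. An empty Disallow value (`Disallow:` with nothing after it) permits everything, which is how the "allow all" file mentioned above is typically written.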