Apache Nutch is an open-source web crawler built on Hadoop. It crawls websites, indexes the downloaded content with Lucene, and supports querying the index through Solr. A crawl proceeds in stages: seeding the crawl with start URLs, filtering URLs, fetching pages, indexing their content, and merging the results. Nutch can run either in a single process or in distributed mode on a Hadoop cluster. It provides tools to inject URLs and to read crawl segments from HDFS, which together make it possible to step through the crawl lifecycle.
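
The stages named above (seeding, filtering, fetching, indexing, merging) can be sketched as a toy, in-memory loop. This is only an illustration of the lifecycle, not Nutch's API: the names `FAKE_WEB`, `passes_filter`, and `crawl` are hypothetical, the "web" is a small dictionary rather than real HTTP, and the index is a plain inverted index standing in for Lucene/Solr.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the web: URL -> (page text, outlinks).
FAKE_WEB = {
    "http://example.com/": (
        "nutch crawler home",
        ["http://example.com/a", "http://other.org/x"],
    ),
    "http://example.com/a": (
        "hadoop crawler page",
        ["http://example.com/", "http://example.com/missing"],
    ),
}


def passes_filter(url, allowed_host):
    """Filtering: keep only URLs on the allowed host (an assumed rule)."""
    return urlparse(url).netloc == allowed_host


def crawl(seeds, allowed_host, rounds):
    """Run up to `rounds` generate/fetch/update cycles over FAKE_WEB."""
    # Seeding: record each accepted seed URL as not yet fetched.
    crawldb = {u: "unfetched" for u in seeds if passes_filter(u, allowed_host)}
    index = {}  # term -> set of URLs containing it

    for _ in range(rounds):
        # Generate: select the URLs that still need fetching.
        fetch_list = sorted(u for u, s in crawldb.items() if s == "unfetched")
        if not fetch_list:
            break

        # Fetch: one round's downloads form a "segment".
        segment = {}
        for url in fetch_list:
            page = FAKE_WEB.get(url)
            if page is None:
                crawldb[url] = "gone"
                continue
            crawldb[url] = "fetched"
            segment[url] = page

        for url, (text, outlinks) in segment.items():
            # Merge: fold newly discovered, filtered links into the crawl DB.
            for link in outlinks:
                if passes_filter(link, allowed_host):
                    crawldb.setdefault(link, "unfetched")
            # Index: add each distinct term to the inverted index.
            for term in set(text.lower().split()):
                index.setdefault(term, set()).add(url)

    return crawldb, index


if __name__ == "__main__":
    db, idx = crawl(["http://example.com/"], "example.com", rounds=3)
    for url, status in sorted(db.items()):
        print(status, url)
    print(sorted(idx["crawler"]))
```

Each pass of the loop handles only the URLs discovered in earlier passes, which is why the number of rounds bounds the crawl depth. In this sketch the off-site link is dropped by the filter before it ever reaches the crawl database.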