This document discusses how to build a small distributed search engine using open-source software. It describes the main subsystems of a search engine, including a page database, crawler, parser, indexer, and link graph database. It then introduces Apache Hadoop and Apache Lucene as open-source tools for building each subsystem in a distributed manner. Hadoop provides HDFS for distributed storage and MapReduce for distributed processing, while Lucene handles full-text indexing and search. Finally, the document outlines how Lucene indexes and searches document contents, and how its components can be integrated with HDFS to build a distributed search index and query system.
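
To make the indexing idea concrete, here is a minimal sketch in plain Python of an inverted index built in a map/reduce style and then queried. It does not use the Hadoop or Lucene APIs. The function names (`map_phase`, `reduce_phase`, `build_index`, `search`), the sample pages, and the simple lowercase tokenizer are all assumptions made for illustration, and a real Lucene index also stores positions, term frequencies, and scoring data that are omitted here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (a deliberately naive analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term, as a mapper would."""
    for term in set(tokenize(text)):
        yield term, doc_id


def reduce_phase(pairs):
    """Group document ids by term into sorted posting lists, as a reducer would."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(pages):
    """Run the map and reduce steps in-process over a dict of doc_id -> text."""
    pairs = (
        pair
        for doc_id, text in pages.items()
        for pair in map_phase(doc_id, text)
    )
    return reduce_phase(pairs)


def search(index, query):
    """AND query: return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample pages standing in for the page database.
pages = {
    "page1": "Hadoop provides distributed storage",
    "page2": "Lucene provides full-text search",
    "page3": "Distributed search with Hadoop and Lucene",
}
index = build_index(pages)
```

Calling `search(index, "distributed hadoop")` intersects the posting lists for both terms. In a distributed setting, the map and reduce steps would presumably run as separate tasks over pages stored in HDFS, rather than in a single process as in this sketch.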