This document summarizes a thesis about building a scalable web crawler. It discusses architectural decisions around using Ruby and Amazon EC2 infrastructure. It evaluates different data storage and search options like MongoDB, CouchDB, and Elasticsearch. It describes the implemented crawler system's components and data schema. It addresses challenges like HTTP redirects, subdomain crawling, and balancing parallelism.