This document summarizes Rackspace's use of Hadoop to process and query logs from multiple datacenters. Key points:

- Rackspace needed to query logs from mail/app servers to answer support and analytics questions; previous solutions built on single databases could not scale across datacenters.
- Hadoop allowed ingesting raw logs, building Lucene indexes for querying, and storing data across multiple datacenters. Real-time queries used Solr; batch queries used MapReduce.
- The implementation collected logs into Hadoop, used SolrOutputFormat to generate indexes, and queried via distributed Solr and MapReduce, providing scalable storage, analysis, and querying across datacenters.
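To make the batch-query path concrete, here is a minimal sketch of the map and reduce phases of a MapReduce-style log query, run in plain Python rather than on a Hadoop cluster. The log format, field names, and the example question ("delivery failures per mail server") are illustrative assumptions, not Rackspace's actual schema or job code.

```python
from collections import defaultdict

def map_phase(log_lines):
    """Map step: emit (server, 1) for each line recording a failure.

    Assumes a hypothetical whitespace-delimited format:
    <server> <status> <recipient>
    """
    for line in log_lines:
        server, status = line.split()[:2]
        if status == "FAIL":
            yield server, 1

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each server key."""
    counts = defaultdict(int)
    for server, n in pairs:
        counts[server] += n
    return dict(counts)

# Toy input standing in for raw mail-server logs ingested into HDFS.
logs = [
    "mail1 FAIL user@example.com",
    "mail2 OK   user@example.com",
    "mail1 FAIL other@example.com",
]

failures = reduce_phase(map_phase(logs))
print(failures)  # {'mail1': 2}
```

On a real cluster, Hadoop would shard the map phase across the datacenters' log files and group the emitted keys before the reduce phase; the per-key aggregation logic is the same.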