This document provides an overview of Apache Hadoop, a framework for distributed storage and processing of very large datasets across clusters of commodity hardware. It discusses how Hadoop addresses the challenges of large-scale computation by distributing data across nodes and moving computation to the data rather than data to the computation. The core components are HDFS for distributed file storage and MapReduce for distributed processing; HBase, a related project in the Hadoop ecosystem, builds on HDFS to provide distributed, column-oriented database access. Together these allow clusters of inexpensive commodity machines to scale out to datasets far larger than any single server could handle.
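
To make the MapReduce model concrete, below is a minimal sketch of the classic word-count job written against Hadoop's org.apache.hadoop.mapreduce API. It is an illustration rather than code from this document: the mapper emits a (word, 1) pair for every token it sees, and the reducer sums the counts for each word; input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs on the nodes holding the input blocks and emits (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: receives all counts for a given word and sums them.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));     // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Submitting the compiled jar (for example with `hadoop jar wordcount.jar WordCount <input> <output>`, where the paths are placeholders) schedules the map tasks on the nodes that already hold the input blocks in HDFS, which is the "move computation to the data" principle described above.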