This document provides an overview of Apache Hadoop, an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. It describes Hadoop's core components: HDFS for distributed file storage and MapReduce for distributed processing. Key topics include the HDFS architecture, data flow, and fault tolerance, as well as the MapReduce programming model and architecture. Examples of Hadoop usage and a potential project plan for load-balancing enhancements are also briefly discussed. As a concrete taste of the MapReduce programming model covered later, a word-count sketch follows this overview.
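
The sketch below is the canonical word-count job written against Hadoop's org.apache.hadoop.mapreduce API: the mapper emits (word, 1) pairs, and the reducer sums the counts per word. It is a minimal illustration rather than anything prescribed by this document; the class names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, and input/output HDFS paths are assumed to arrive as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for every token in an input line, emit (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum all counts shuffled to the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Combiner runs the reducer locally on each mapper's output,
    // shrinking the data moved across the network during the shuffle.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Both map and reduce tasks run in parallel across the cluster on blocks stored in HDFS, which is what lets the same small program scale from a single file to terabytes of input.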