This document provides an overview of Apache Hadoop, including its architecture, components, and ecosystem. Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. Its core components are the Hadoop Distributed File System (HDFS) for storage, MapReduce for processing, and YARN (Yet Another Resource Negotiator) for resource management. Related projects in the Hadoop ecosystem include HBase, Hive, Pig, Flume, Sqoop, Oozie, ZooKeeper, and Mahout.
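To make the MapReduce processing model mentioned above more concrete, the following is a minimal in-memory sketch of its three phases (map, shuffle, reduce) applied to word counting. It does not use any Hadoop API and does not run on a cluster; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, imitating the step between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its input from HDFS.
    sample = ["hadoop stores data in hdfs", "yarn schedules hadoop jobs"]
    print(word_count(sample))
```

In an actual Hadoop job, the map and reduce steps would run as parallel tasks on different nodes, with the framework handling the grouping step and YARN allocating the resources; this sketch only shows the data flow.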