This document provides an overview of Hadoop and its core components: HDFS, MapReduce, and YARN. It describes how HDFS stores data reliably by replicating blocks across nodes, and how MapReduce processes large datasets in a distributed fashion by mapping input records to key-value pairs, shuffling them by key, and reducing the grouped values into results. It also explains that YARN was introduced to improve scalability by separating resource management and job scheduling from the MapReduce engine. Finally, the document works through examples of running simple MapReduce jobs from the command line against a movie ratings dataset to demonstrate Hadoop in practice.
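As a concrete illustration of the map/shuffle/reduce flow described above, here is a minimal sketch of a MapReduce job that counts the number of ratings per movie. It assumes the standard Hadoop Java API (org.apache.hadoop.mapreduce) and a MovieLens-style input of tab-separated userID, movieID, rating, and timestamp fields; the class name RatingCount, the input layout, and the paths in the usage line below are illustrative assumptions, not details taken from the document itself.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RatingCount {

  // Mapper: each input line is assumed to be
  // "userID<TAB>movieID<TAB>rating<TAB>timestamp" (MovieLens-style;
  // adjust the field index for other layouts).
  // Emits (movieID, 1) for every rating seen.
  public static class RatingMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text movieId = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split("\t");
      if (fields.length >= 2) {
        movieId.set(fields[1]);
        context.write(movieId, ONE);
      }
    }
  }

  // Reducer: the shuffle phase groups all counts for one movieID
  // together, so summing the values yields the total ratings per movie.
  public static class RatingReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "rating count");
    job.setJarByClass(RatingCount.class);
    job.setMapperClass(RatingMapper.class);
    job.setCombinerClass(RatingReducer.class); // safe here: summing is associative
    job.setReducerClass(RatingReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this would typically be packaged into a jar and submitted from the command line with the standard `hadoop jar` launcher, along the lines of `hadoop jar rating-count.jar RatingCount /input/ratings /output/rating-counts`, where the jar name and HDFS paths are hypothetical.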