This document provides an overview of Hadoop and compares it with Spark. It covers Hadoop's key components: HDFS for storage, MapReduce for processing, and YARN for resource management. HDFS stores large datasets across a cluster by splitting files into blocks and replicating them across nodes for fault tolerance. MapReduce processes those datasets in parallel: a map phase transforms input records into key-value pairs, and a reduce phase aggregates the values for each key. YARN, introduced in Hadoop 2.x, separates resource management and job scheduling from the processing engine itself, allowing frameworks other than MapReduce to share the cluster. The document also summarizes Spark, which supports both batch and stream processing and is often faster than MapReduce for many workloads because it keeps intermediate data in memory rather than writing it to disk between stages. A comparison of Hadoop and Spark highlights this difference in processing models: disk-based, stage-by-stage MapReduce versus Spark's in-memory execution.
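To make the map and reduce model concrete, here is a minimal word-count sketch in Python following Hadoop Streaming conventions, where the mapper reads raw text from stdin and emits tab-separated key-value pairs, and the reducer receives those pairs sorted by key. The file names (`mapper.py`, `reducer.py`) and the word-count task are illustrative assumptions, not taken from the document.

```python
#!/usr/bin/env python3
# mapper.py -- map phase: emit one (word, 1) pair per word,
# tab-separated, as Hadoop Streaming expects on stdout.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- reduce phase: Hadoop Streaming delivers mapper output
# sorted by key, so all counts for a given word arrive contiguously.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rsplit("\t", 1)
    if word != current_word:
        # Key changed: flush the finished word's total before resetting.
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)
# Flush the final word.
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Because the framework handles splitting the input, shuffling pairs to reducers, and sorting by key, the user code only expresses the per-record map logic and the per-key reduce logic.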
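For contrast, a sketch of the same word count in PySpark. The application name and the input path `hdfs:///data/input.txt` are illustrative assumptions. The pipeline is expressed as chained transformations on an in-memory dataset, with no intermediate writes to disk between the map and reduce steps, which is the efficiency difference the comparison highlights.

```python
# A minimal PySpark word count; the app name and input path are
# placeholders, assuming a Spark installation with HDFS access.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()

counts = (
    spark.sparkContext.textFile("hdfs:///data/input.txt")
    .flatMap(lambda line: line.split())   # map: one record per word
    .map(lambda word: (word, 1))          # pair each word with a count of 1
    .reduceByKey(lambda a, b: a + b)      # reduce: sum the counts per word
)

# Materialize a small sample of the results.
for word, count in counts.take(10):
    print(word, count)

spark.stop()
```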