Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity computers. It stores data in HDFS (the Hadoop Distributed File System), which partitions files into blocks and replicates each block across multiple nodes for fault tolerance. A master node (the NameNode) tracks which nodes hold which blocks, while worker nodes both store the blocks and execute the map and reduce tasks of a MapReduce job. Hadoop scales horizontally and tolerates node failures, but because MapReduce writes intermediate results to disk it is slower for iterative workloads than Spark, which keeps working data in memory. Hadoop is also a natural fit for the batch layer of a Lambda architecture, where a separate speed layer handles low-latency updates so the two can scale independently.
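To make the map and reduce steps concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API: the mapper emits a (word, 1) pair for every token in its input split, and the reducer sums those counts per word. The input and output paths are assumed to be HDFS directories passed on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on worker nodes, close to the data blocks; emits (word, 1) per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: receives all counts for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that each phase reads from and writes to HDFS, which is the disk I/O that makes chained or iterative jobs slower than the equivalent in-memory pipeline in Spark.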