Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity machines. It stores data reliably by replicating blocks across multiple nodes and runs computations in parallel on the nodes that hold the data. Its key components are the Hadoop Distributed File System (HDFS), which manages data storage across the cluster; a job tracker, which schedules and coordinates jobs; and the MapReduce programming model, which breaks a job into map tasks that process data partitions in parallel and reduce tasks that aggregate their results.
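The map-reduce model described above can be sketched in plain Python. This is only an illustration of the programming model on a single machine, not Hadoop's actual API (real Hadoop jobs are typically written in Java against the `org.apache.hadoop.mapreduce` classes); the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative choices, and the classic word-count example stands in for an arbitrary job.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for each word in one input split."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all counts emitted for one word."""
    return key, sum(values)

# Two "splits" standing in for blocks stored on different HDFS nodes.
splits = ["the quick brown fox", "the lazy dog the end"]

# In Hadoop, each map task would run on the node holding its split.
mapped = [pair for split in splits for pair in map_phase(split)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())

print(counts["the"])  # "the" appears 3 times across both splits
```

Because each map call sees only its own split and each reduce call sees only one key's values, the framework is free to run many of them concurrently on different nodes, which is what makes the model scale.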