Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity computers. It combines the MapReduce programming model with the Hadoop Distributed File System (HDFS) for storage. MapReduce enables massively parallel processing by splitting a job into smaller map and reduce tasks that run concurrently on many machines. HDFS splits very large files into fixed-size blocks, distributes those blocks across machines, and replicates each block so the data survives individual node failures.
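To make the map/shuffle/reduce flow concrete, here is a minimal single-process sketch of the classic word-count job. This is an illustration of the programming model only, not Hadoop's actual Java API: the `map_phase`, `shuffle_phase`, and `reduce_phase` functions are hypothetical names standing in for the mapper, the framework's shuffle step, and the reducer, and each input string stands in for one HDFS block handed to a mapper.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Mapper: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reducer: combine all values for one key (here, sum the counts).
    return key, sum(values)

def word_count(documents):
    # Each "document" stands in for one input split; in a real cluster
    # the map calls would run in parallel on different machines.
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["big data", "big clusters"]))
# {'big': 2, 'data': 1, 'clusters': 1}
```

Because the map calls are independent and the reduce calls operate on disjoint keys, both phases parallelize naturally, which is exactly the property Hadoop exploits at cluster scale.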