Hadoop is an open-source framework for the distributed processing of large datasets across clusters of computers. It has two major components: the Hadoop Distributed File System (HDFS), which stores data redundantly across the machines in a cluster, and the MapReduce programming model, which processes that data in parallel. Hadoop can scale from a single server to thousands of machines, with HDFS providing fault-tolerant storage and MapReduce providing distributed computation.
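To make the MapReduce model concrete, below is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API (`org.apache.hadoop.mapreduce`); the class names, input path, and output path are illustrative choices, not anything prescribed by the framework.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs once per HDFS input split and emits (word, 1)
  // for every word in its portion of the input.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: receives all counts for a given word (grouped by the
  // framework's shuffle phase) and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Combiner: optional local pre-aggregation to cut shuffle traffic.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] and args[1] are HDFS paths supplied at submission time.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

When this job runs, the framework launches one mapper per HDFS block of the input, shuffles the emitted (word, 1) pairs so that all counts for a given word arrive at the same reducer, and the reducers aggregate them in parallel, which is exactly the divide-and-aggregate pattern described above.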