Hadoop is an open-source software platform for distributed storage and processing of large datasets across clusters of computers. It was designed to scale up from single servers to thousands of machines, with very high fault tolerance. The document outlines the history of Hadoop, why it was created, its core components HDFS for storage and MapReduce for processing, and provides an example word count problem. It also includes information on installing Hadoop and additional resources.