Hadoop is an open source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It provides storage for large datasets in the Hadoop Distributed File System (HDFS) and allows parallel processing of the data using the MapReduce programming model. Hadoop has evolved from Google's work and is developed by Yahoo and Apache to provide a low-cost solution for very large data volumes and processing needs across both structured and unstructured data sources.