Hadoop is an open source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It provides reliable, scalable storage and processing using the Hadoop Distributed File System (HDFS) and MapReduce programming model. Hadoop has been shown to sort terabytes of data in minutes using thousands of nodes. It is widely used to analyze large, diverse datasets such as web logs, social media, and sensor data.