Hadoop is an open source software framework for distributed storage and processing of large datasets across clusters of commodity servers. It has several related projects including Pig, Hive, Mahout, Avro, ZooKeeper, and Chukwa. Large companies like Yahoo, Facebook, and Amazon use Hadoop to process petabytes of data daily on clusters of thousands of servers.