Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity servers. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. Hadoop can reliably process and analyze extremely large datasets in a distributed computing environment and has become the de facto standard for distributed data processing.