Big data refers to large and complex datasets that are difficult to process using traditional methods. Key challenges include capturing, storing, searching, sharing, and analyzing large datasets in domains like meteorology, physics simulations, biology, and the internet. Hadoop is an open-source software framework for distributed storage and processing of big data across clusters of computers. It allows for the distributed processing of large data sets in a reliable, fault-tolerant and scalable manner.