Big data refers to collections of structured, semi-structured, and unstructured data so large that traditional data processing applications cannot handle them adequately. This data comes from a wide variety of sources, including sensors, social media, and websites. Hadoop is an open-source software framework that allows distributed processing of large data sets across clusters of computers using simple programming models. It is commonly used by large companies for applications such as web search, data mining, and machine learning.
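
To make the idea of a "simple programming model" concrete, here is a minimal sketch of the MapReduce pattern, the model Hadoop is best known for, applied to a word count. It runs in a single Python process and does not call any Hadoop API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, not part of Hadoop itself. On a real cluster the framework would run the map and reduce steps on many machines and perform the grouping step between them.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key (done by the framework on a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read files from distributed storage.
    sample = ["big data big clusters", "data sets"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on the values for its own key, both steps can in principle be split across machines without changing the result. That independence is what lets a framework like Hadoop distribute the work.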