Big data refers to data sets so large or fast-growing that traditional data processing systems cannot handle them; it is commonly characterized by high volume, velocity, and variety. Hadoop is an open-source software framework for the distributed storage and processing of big data across clusters of computers. A key component of Hadoop is MapReduce, a programming model that enables parallel processing of large datasets. MapReduce lets programmers break a problem into independent pieces that can be processed simultaneously across distributed systems: a map phase transforms each input record into key-value pairs, the framework groups the pairs by key, and a reduce phase combines the values for each key into a final result.
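As a minimal sketch of the MapReduce model, the classic word-count example can be simulated in a single Python process. The function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative, not part of any Hadoop API; in a real Hadoop job the framework distributes these phases across the cluster.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(mapped_pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all counts for one word into a total.
    return (key, sum(values))

documents = ["big data needs big tools", "data tools scale"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2, 'scale': 1}
```

Because each document is mapped independently and each key is reduced independently, both phases parallelize naturally, which is what makes the model suited to large clusters.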