This document discusses big data and Hadoop. It defines big data as large volumes of structured and unstructured data that organizations analyze for insights, often for predictive purposes. Hadoop is described as an open-source software framework for the distributed storage and processing of large datasets across clusters of commodity servers. Its key components are HDFS, the distributed file system that stores data across the cluster; MapReduce, the programming model for parallel batch processing; and YARN, the resource manager that lets multiple processing engines, such as Spark, share a Hadoop cluster. The document also briefly outlines other big data tools commonly used alongside Hadoop, including Flume for streaming data ingestion, Sqoop for transferring data to and from relational databases, and Spark for fast in-memory processing.
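The MapReduce model mentioned above can be sketched in plain Python. This is a toy illustration of the three phases the framework runs (map, shuffle, reduce), not the actual Hadoop Java API; all function names here are illustrative:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does automatically between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

# Classic word-count example on two toy "records".
docs = ["big data needs big tools", "hadoop processes big data"]
result = reduce_phase(shuffle_phase(map_phase(docs)))
print(result["big"])   # 3
print(result["data"])  # 2
```

In real Hadoop, the map and reduce functions run in parallel on many nodes, reading input splits from HDFS, while the shuffle moves intermediate pairs across the network; the programming model, however, is exactly this pipeline.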