This document discusses big data concepts and distributed computing frameworks. It explains that big data refers to datasets so large and complex that they are difficult to process with traditional tools, and gives examples of enterprises generating terabytes to petabytes of data daily from varied sources. Common big data applications such as recommendation engines and fraud detection are listed. Apache Hadoop and Apache Spark are introduced as open-source frameworks for distributed processing of large datasets across clusters of computers. Key characteristics of each are outlined; Spark is noted as building on the Hadoop MapReduce model but extending it to support more types of computation in a faster, more efficient manner.
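To make the MapReduce model concrete, here is a minimal single-machine sketch of the classic word-count example in plain Python. This is an illustrative stand-in, not the actual Hadoop or Spark API: on a real cluster the map and reduce phases would be sharded across many machines, with a shuffle step routing each key to one reducer.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group the pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    lines = ["big data big clusters", "data everywhere"]
    print(reduce_phase(map_phase(lines)))
```

In Spark the same computation is expressed with chained transformations on a distributed dataset (roughly `flatMap`, `map`, and `reduceByKey` in the RDD API), which is part of why Spark can support a wider range of computations than the two-phase MapReduce pattern above.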