This document discusses big data and data mining. It defines big data as large volumes of structured and unstructured data that are difficult to process using traditional techniques due to their size. It outlines the 4 Vs of big data: volume, velocity, variety, and veracity. The proposed system would use distributed parallel computing with Hadoop to identify relationships in huge amounts of data from different sources and dimensions. It discusses challenges of big data like data location, volume, privacy, and gaining insights. Solutions involve parallel programming, distributed storage, and access restrictions.