Rishi Arora discusses big data and Hadoop. An estimated 2,500 exabytes of new information were generated in 2012, driven primarily by the internet, up from roughly 1.2 zettabytes for the digital universe as a whole in 2010. Hadoop is a distributed, fault-tolerant, and scalable platform for big data. It consists of HDFS for storage and MapReduce for processing: HDFS splits large files into blocks replicated across multiple nodes and provides high-throughput access to application data, while MapReduce distributes the processing of large datasets across clusters of commodity machines.
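The map/shuffle/reduce pattern that MapReduce applies across a cluster can be sketched in miniature on a single machine. The following is a toy word-count illustration of the pattern only, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample documents are invented for this sketch.

```python
from collections import defaultdict

# Toy illustration of the MapReduce word-count pattern (not the Hadoop API):
# map emits (word, 1) pairs, the framework groups pairs by key (the shuffle),
# and reduce sums the counts for each word.

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group emitted pairs by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return (key, sum(values))

# Hypothetical sample input standing in for files stored on HDFS.
documents = ["big data needs hadoop", "hadoop processes big data"]
pairs = [p for doc in documents for p in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # prints {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'processes': 1}
```

In a real Hadoop cluster the map and reduce calls run in parallel on the nodes that hold the data blocks, and the shuffle moves intermediate pairs over the network; the sequential structure above is only meant to show how the three phases fit together.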