This document discusses large-scale data mining and analytics with MapReduce and Apache Hadoop. It describes several projects in the Apache Hadoop ecosystem, including HDFS, MapReduce, HBase, and Mahout, and covers the use of Mahout for clustering, classification, and recommendation. It reviews the literature on parallel K-means clustering with MapReduce and on using clouds for scalable big data analytics. Finally, it outlines a plan to study parallel K-means clustering and to implement a solution that can handle large datasets.
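
To make the parallel K-means idea mentioned above more concrete, here is a minimal single-process sketch in Python of how one K-means iteration is commonly split into a map step and a reduce step. It is an illustration only, not the document's implementation and not Hadoop or Mahout code: the function names (`map_phase`, `reduce_phase`, `kmeans`), the tolerance, and the sample points are all assumptions made for the example. In a real MapReduce job the map step would run on many data splits in parallel and the framework would group the emitted pairs by key before the reduce step.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, (point, 1)) for each point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, (p, 1)


def reduce_phase(pairs, centroids):
    """Reduce step: average the points emitted under each centroid index."""
    sums = {}
    counts = defaultdict(int)
    for key, (p, n) in pairs:
        if key in sums:
            sums[key] = [a + b for a, b in zip(sums[key], p)]
        else:
            sums[key] = list(p)
        counts[key] += n
    # Assumption: a centroid that received no points keeps its old position.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Repeat map + reduce until no centroid moves more than tol (hypothetical driver)."""
    for _ in range(max_iters):
        new_centroids = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    # Made-up sample data: two well-separated groups of 2-D points.
    sample = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(sample, [(0, 0), (10, 10)]))
```

The map step only needs the current centroids and its own slice of the data, which is why this formulation can scale to large datasets: only per-cluster sums and counts have to be combined in the reduce step.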