This document summarizes a lecture on clustering and provides a sample MapReduce implementation of K-Means clustering. It introduces clustering, compares hierarchical and partitional algorithms, and then focuses on K-Means. It also describes Canopy clustering, a preliminary step that divides a large dataset into canopies so that the K-Means computation can be parallelized. Finally, it outlines how to run K-Means on large datasets with MapReduce: selecting canopy centers, assigning points to canopies, and performing the iterative K-Means algorithm in parallel.
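The three steps named in this summary can be illustrated with a minimal single-process sketch in Python. It is not the lecture's implementation: the function names, the thresholds `t1` and `t2`, the rule of taking the first remaining point as the next canopy center, and the use of Euclidean distance for both canopies and K-Means are all assumptions made for illustration. The map, shuffle, and reduce phases are simulated with plain loops rather than run on a real MapReduce framework.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def select_canopy_centers(points, t2):
    """Pick canopy centers: take a point, then drop every point within t2 of it.

    Assumption: the next center is simply the first remaining point.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        centers.append(center)
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def assign_to_canopies(point, centers, t1):
    """Return the indices of all canopies whose center lies within t1 (t1 > t2)."""
    return [i for i, c in enumerate(centers) if dist(point, c) <= t1]


def kmeans_map(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points emitted under the same centroid index."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    return tuple(sum(p[d] for p, _ in values) / count for d in range(dims))


def kmeans_iteration(points, centroids):
    """One map -> shuffle -> reduce round, simulated in a single process."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)  # shuffle: group values by key
    # A centroid that attracted no points keeps its old position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Repeat iterations until no centroid moves more than tol."""
    for _ in range(max_iters):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = select_canopy_centers(points, t2=3)
final_centroids = kmeans(points, centers)
```

Seeding K-Means with the canopy centers is only a convenience for this example. In a distributed setting each call to `kmeans_map` would run on a separate mapper over its share of the points, and each call to `kmeans_reduce` would run on the reducer that receives one centroid index.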