The document summarizes the Terasort algorithm used in Hadoop. It describes:
1) Terasort uses MapReduce to sample and sort very large datasets like 100TB in under 3 hours by leveraging thousands of nodes.
2) The algorithm first generates sample data using Teragen. It then uses the samples to partition the full data among reducers with Terasort.
3) Each reducer locally sorts its partition so the entire dataset is sorted when concatenated in order of the reducers.