This document discusses using Hadoop to process large volumes of spam data. It describes several types of spam, including email spam, social media spam, and web spam, and explains why Hadoop is well suited to spam processing: it can parallelize tasks and handle large datasets. It presents sample system architectures and spam detection heuristics based on IP addresses, link patterns, and content. Signals such as Jaccard similarity and arrival times can also help evaluate spam. Overall, the document advocates using Hadoop to gain insight from massive spam datasets with simple solutions that capture the majority of spam.
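
As an illustration of the Jaccard similarity mentioned above, here is a minimal Python sketch. The summary does not say how the measure is applied, so this assumes one common use: comparing the sets of word shingles from two messages to spot near-duplicates. The function names `shingles` and `jaccard` and the shingle size `k=3` are hypothetical choices for illustration, and the code is plain Python rather than a Hadoop job.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased message.

    Assumption: messages are compared as sets of overlapping word k-grams.
    """
    words = text.lower().split()
    if len(words) < k:
        # Short messages yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets.

    Returns 0.0 when both sets are empty to avoid division by zero.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    m1 = "buy cheap watches now at our store"
    m2 = "buy cheap watches now at my store"
    print(jaccard(shingles(m1), shingles(m2)))
```

Two messages that differ by only a word or two share most of their shingles and score well above unrelated messages. Any cutoff for treating a pair as duplicate spam would need tuning on real data.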