Pig is a platform for analyzing large data sets that runs on Hadoop. It provides operators for loading, filtering, and aggregating data stored in the Hadoop Distributed File System (HDFS). Analysis programs are written in a high-level language called Pig Latin, which Pig then optimizes and executes in parallel on a Hadoop cluster.
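
To make the load, filter, and aggregate steps concrete, here is a minimal single-machine sketch in Python of the kind of data flow a Pig Latin script describes. It is not Pig itself and does not touch Hadoop. The sample records, the field names (`user`, `bytes`), the threshold, and the function names are all hypothetical, chosen only for illustration. The comments show Pig Latin statements that would express roughly the same steps.

```python
from collections import defaultdict

# Hypothetical sample data standing in for a file stored in HDFS.
# Rough Pig Latin equivalent (illustrative path and schema):
#   records = LOAD 'input.tsv' AS (user:chararray, bytes:long);
RECORDS = [
    ("alice", 120),
    ("bob", 80),
    ("alice", 300),
    ("carol", 150),
    ("bob", 200),
]


def filter_records(records, min_bytes):
    """Keep only records whose byte count exceeds min_bytes.

    Rough Pig Latin equivalent:
      big = FILTER records BY bytes > 100;
    """
    return [(user, nbytes) for user, nbytes in records if nbytes > min_bytes]


def group_by_user(records):
    """Collect byte counts per user.

    Rough Pig Latin equivalent:
      grouped = GROUP big BY user;
    """
    groups = defaultdict(list)
    for user, nbytes in records:
        groups[user].append(nbytes)
    return dict(groups)


def sum_per_group(groups):
    """Aggregate each group into a single total.

    Rough Pig Latin equivalent:
      totals = FOREACH grouped GENERATE group, SUM(big.bytes);
    """
    return {user: sum(values) for user, values in groups.items()}


def run_pipeline(records, min_bytes=100):
    """Chain the three steps: filter, group, aggregate."""
    return sum_per_group(group_by_user(filter_records(records, min_bytes)))


if __name__ == "__main__":
    for user, total in sorted(run_pipeline(RECORDS).items()):
        print(user, total)
```

The sketch runs sequentially in one process. In Pig, the same sequence of statements is what gets optimized and executed in parallel across the cluster, so the script author describes the data flow rather than the parallel execution.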