This document provides an overview of Apache Pig, including its data model, relational commands like JOIN and GROUP, and how it is implemented on Hadoop. Pig is a platform for analyzing large datasets that uses a declarative language called Pig Latin. It features a rich nested data model and relational commands that are compiled into MapReduce jobs for parallel processing. The implementation utilizes lazy execution, compiling logical plans on-the-fly into physical MapReduce plans only when needed. While Pig provides an easy parallel programming model, it can incur overhead from compiling Pig Latin and user-defined functions.