This document summarizes the Apache Spark batch APIs and shows how to run and configure Spark jobs on AWS EMR. It introduces the RDD, SQL, DataFrame, and Dataset APIs and compares them, then gives real-world examples of enriching and shredding data with Spark. It discusses type-safe APIs that address shortcomings of the default Spark APIs. Finally, it outlines the configuration needed to run optimized Spark jobs on EMR, including memory, parallelism, and allocation settings.