This document discusses accelerating Spark workloads on Amazon S3 using Alluxio. It describes the challenges of running Spark interactively on S3 due to its eventual consistency and expensive metadata operations. Alluxio provides a data caching layer that offers strong consistency, faster performance, and API compatibility with HDFS and S3. It also allows data outside of S3 to be analyzed. The document demonstrates how to bootstrap Alluxio on an AWS EMR cluster to accelerate Spark workloads running on S3.