This document summarizes a presentation on scaling Apache Spark. It discusses techniques for reusing RDDs: caching, choosing a persistence level, and checkpointing. It also covers best practices for working with key-value data, in particular avoiding the problems caused by groupByKey, as well as the use of Spark SQL and accumulators. Finally, it previews work on bringing code generation to Spark ML to improve performance.
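To make the groupByKey concern concrete, here is a minimal sketch in plain Python (standard library only, not Spark's API) of why a per-key aggregation built on groupByKey tends to move more data than one built on reduceByKey. In Spark, reduceByKey combines values within each partition before the shuffle, while groupByKey ships every record and materializes each key's full value list. The function names, the sample data, and the toy partitioner below are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: each inner list stands for one map-side partition
# of (word, 1) pairs, as in a word count.
WORD_PARTITIONS = [
    [("spark", 1), ("rdd", 1), ("spark", 1), ("spark", 1)],
    [("rdd", 1), ("spark", 1), ("rdd", 1), ("spark", 1)],
]


def partition_of(key, num_partitions):
    """Pick a reduce-side partition deterministically (toy stand-in for a hash partitioner)."""
    return sum(ord(ch) for ch in key) % num_partitions


def shuffle(records_per_partition, num_partitions):
    """Route records to reduce-side buckets; return the buckets and how many records moved."""
    buckets = [[] for _ in range(num_partitions)]
    moved = 0
    for records in records_per_partition:
        for key, value in records:
            buckets[partition_of(key, num_partitions)].append((key, value))
            moved += 1
    return buckets, moved


def group_by_key_sum(partitions, num_partitions):
    """groupByKey-style sum: shuffle every record, build full value lists, then sum them."""
    buckets, moved = shuffle(partitions, num_partitions)
    totals = {}
    for bucket in buckets:
        groups = defaultdict(list)
        for key, value in bucket:
            groups[key].append(value)  # the whole group is held in memory
        for key, values in groups.items():
            totals[key] = sum(values)
    return totals, moved


def reduce_by_key_sum(partitions, num_partitions):
    """reduceByKey-style sum: combine within each partition first, then shuffle partial sums."""
    combined = []
    for records in partitions:
        partial = defaultdict(int)
        for key, value in records:
            partial[key] += value  # map-side combine: one record per key per partition
        combined.append(list(partial.items()))
    buckets, moved = shuffle(combined, num_partitions)
    totals = defaultdict(int)
    for bucket in buckets:
        for key, value in bucket:
            totals[key] += value
    return dict(totals), moved


if __name__ == "__main__":
    print(group_by_key_sum(WORD_PARTITIONS, 2))
    print(reduce_by_key_sum(WORD_PARTITIONS, 2))
```

Both functions return the same totals, but on this sample the groupByKey-style version moves all 8 records through the simulated shuffle, while the reduceByKey-style version moves only 4 partial sums. The gap widens as keys repeat more often within a partition.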