This document provides an introduction to Apache Spark, including:
- An overview of what Spark is and the types of problems it can solve
- A brief look at the Spark API through the word count example
- Details on Spark's core abstractions of RDDs and how transformations and actions work
- Potential pitfalls of using groupByKey and how reduceByKey is preferable
- Resources for learning more about Spark including books and video tutorials