Apache Spark is a fast, general-purpose engine for large-scale data processing: a framework for massively parallel computing that exploits the falling cost of memory. Spark achieves high performance through its in-memory computing engine, offers APIs in Scala, Java, Python, and R, and includes modules for streaming, machine learning, graph processing, and SQL. Its core abstraction is the Resilient Distributed Dataset (RDD): an immutable, fault-tolerant collection of elements that can be operated on in parallel.