Spark is a framework for clustered in-memory data processing. It was developed at UC Berkeley and is now an Apache top-level project. Spark caches data in cluster-wide memory to speed up computations over large datasets, which is especially effective for iterative workloads that reuse the same data. The core abstraction in Spark is the resilient distributed dataset (RDD), a fault-tolerant, partitioned collection of objects distributed across a cluster. Spark also provides APIs for batch processing, streaming, SQL, machine learning, and graph processing.
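To make the RDD abstraction concrete, the sketch below is a toy, single-process stand-in for the RDD programming model, not Spark itself: real RDDs are partitioned across machines, but the key pattern is the same, in that transformations such as map and filter are lazy and only record lineage, while actions such as collect and reduce trigger actual evaluation. The class and method names here mirror the RDD API for illustration only.

```python
from functools import reduce as _reduce

class ToyRDD:
    """Toy, in-memory stand-in for Spark's RDD model (illustration only)."""

    def __init__(self, data, ops=None):
        self._data = list(data)
        self._ops = ops or []  # deferred transformations (the "lineage")

    # Transformations are lazy: they record the operation, nothing runs yet.
    def map(self, fn):
        return ToyRDD(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        return ToyRDD(self._data, self._ops + [("filter", pred)])

    # Actions force evaluation of the recorded lineage.
    def collect(self):
        out = self._data
        for kind, fn in self._ops:
            if kind == "map":
                out = [fn(x) for x in out]
            else:  # "filter"
                out = [x for x in out if fn(x)]
        return out

    def reduce(self, fn):
        return _reduce(fn, self.collect())

# Build a lineage of lazy transformations, then run actions.
rdd = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())                     # [0, 4, 16, 36, 64]
print(rdd.reduce(lambda a, b: a + b))    # 120
```

In actual PySpark the equivalent pipeline would start from a SparkContext, e.g. `sc.parallelize(range(10)).map(...).filter(...).collect()`, with the work distributed over the cluster's executors and lost partitions recomputed from lineage on failure.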