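Apache Spark is a fast, general-purpose engine for large-scale data processing. It improves efficiency by keeping intermediate data in memory and by planning each job as a directed acyclic graph (DAG) of operations, which lets it optimize and schedule the work as a whole. It offers APIs in Scala, Java, Python, and R. Spark's core abstraction is the Resilient Distributed Dataset (RDD): an immutable, partitioned collection of records that achieves fault tolerance by recording the lineage of transformations used to build it, so a lost partition can be recomputed rather than restored from a replica. Spark also includes Spark SQL for querying structured data and MLlib for machine learning.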
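
The RDD properties described above (immutability, partitioning, and recovery through recomputation) can be illustrated with a small toy model in plain Python. This is only a sketch of the idea and not Spark's API or implementation: the class name ToyRDD and the method compute_partition are hypothetical, everything runs in a single process, and the round-robin partitioning is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Partition = Tuple[int, ...]


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ToyRDD:
    """Hypothetical toy stand-in for an RDD (not part of Spark)."""

    num_partitions: int
    # A source dataset stores its partitions directly.
    source: Optional[Tuple[Partition, ...]] = None
    # A derived dataset stores only its lineage: a parent and a function.
    parent: Optional["ToyRDD"] = None
    transform: Optional[Callable[[Partition], Partition]] = None

    @staticmethod
    def parallelize(data: Sequence[int], num_partitions: int) -> "ToyRDD":
        # Assumption: split the input round-robin into fixed partitions.
        parts = tuple(tuple(data[i::num_partitions]) for i in range(num_partitions))
        return ToyRDD(num_partitions, source=parts)

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Returns a new dataset; nothing is computed yet (lazy).
        return ToyRDD(self.num_partitions, parent=self,
                      transform=lambda part: tuple(fn(x) for x in part))

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(self.num_partitions, parent=self,
                      transform=lambda part: tuple(x for x in part if pred(x)))

    def compute_partition(self, index: int) -> Partition:
        # Walk the lineage back to the source and replay the transformations.
        # Because any single partition can be rebuilt this way, a "lost"
        # partition is recovered by recomputation, without replication.
        if self.parent is None:
            return self.source[index]
        return self.transform(self.parent.compute_partition(index))

    def collect(self) -> list:
        # Action: triggers computation of every partition.
        return [x for i in range(self.num_partitions)
                for x in self.compute_partition(i)]


numbers = ToyRDD.parallelize(list(range(10)), num_partitions=3)
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

result = squares_of_evens.collect()
print(sorted(result))

# Simulate recovering one partition by replaying its lineage.
recovered = squares_of_evens.compute_partition(1)
print(recovered)
```

In this sketch each call to map or filter only adds a node to the chain of parents, which plays the role of the computation graph; work happens when collect or compute_partition is called. Real Spark additionally distributes partitions across executors, caches data in memory, and handles transformations that move data between partitions, none of which is modeled here.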