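Spark is an open-source cluster-computing framework for fast computation on in-memory datasets. Its core abstraction is the Resilient Distributed Dataset (RDD), a partitioned collection that can be recomputed from its lineage if a partition is lost. RDDs support two kinds of operations: transformations, which lazily define a new RDD from an existing one, and actions, which trigger computation and return a result. Spark handles batch, interactive, and near-real-time streaming workloads across problem domains such as SQL (Spark SQL), stream processing (Spark Streaming), and machine learning (MLlib).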
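
The difference between transformations and actions, and the idea of recreating lost data from lineage, can be illustrated with a small single-process sketch. This is a toy model in plain Python, not Spark's API: the class ToyRDD, its lose() method, and the compute_count field are hypothetical names invented for this illustration. It also keeps every computed result in memory, whereas Spark does so only when a dataset is explicitly persisted.

```python
class ToyRDD:
    """Toy, single-process stand-in for an RDD (hypothetical; not Spark's API)."""

    def __init__(self, source=None, parent=None, op=None):
        self._source = source    # base data, set only on the root dataset
        self._parent = parent    # lineage: the dataset this one derives from
        self._op = op            # function applied to the parent's records
        self._cache = None       # materialized records; may be "lost"
        self.compute_count = 0   # how many times this dataset was computed

    # Transformations: lazy, they only record lineage and compute nothing.
    def map(self, f):
        return ToyRDD(parent=self, op=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        return ToyRDD(parent=self, op=lambda rows: [r for r in rows if pred(r)])

    def _materialize(self):
        # Compute from lineage only if the records are not in memory.
        if self._cache is None:
            self.compute_count += 1
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = self._op(self._parent._materialize())
        return self._cache

    def lose(self):
        """Simulate losing the in-memory records (assumption: whole dataset)."""
        self._cache = None

    # Actions: trigger computation and return a result to the caller.
    def collect(self):
        return list(self._materialize())

    def count(self):
        return len(self._materialize())


numbers = ToyRDD(source=range(1, 11))
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

# Nothing has run yet: only the lineage range -> map -> filter is recorded.
assert even_squares.compute_count == 0

first = even_squares.collect()   # action: computes the whole chain
even_squares.lose()              # drop the in-memory result
second = even_squares.collect()  # recomputed from lineage, same records
print(first, second == first, even_squares.compute_count)
```

In this sketch the second collect() rebuilds the lost records by replaying the recorded filter over its parent, which is the same principle that lets Spark recover a lost RDD partition without replicating the data itself.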