Apache Spark is an open-source cluster computing system designed for fast data analytics. Its in-memory data processing engine exposes expressive APIs in Scala, Java, Python, and R, letting data workers efficiently run workloads such as machine learning algorithms that need repeated, fast access to the same dataset. The Spark project reports that programs can run up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk, thanks to its advanced DAG (directed acyclic graph) execution engine, which supports acyclic data flow and in-memory computing.
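To illustrate why in-memory caching matters for iterative algorithms, here is a minimal, self-contained sketch. It does not use Spark. `ToyDataset`, `run_iterations`, and `load_from_disk` are hypothetical names for a toy model in which transformations are recorded lazily as a lineage chain (a simple DAG) and only evaluated when `collect()` is called. In real PySpark the analogous calls would be `rdd.map(...)`, `rdd.cache()`, and an action such as `rdd.collect()`. The toy only mimics the idea that, without caching, every iteration recomputes the lineage back to the source.

```python
from typing import Callable, List, Optional, Tuple


class ToyDataset:
    """Hypothetical toy model of a lazily evaluated, cacheable dataset (not Spark's API)."""

    def __init__(self, compute: Callable[[], List[float]]) -> None:
        self._compute = compute              # how to (re)build this node from its parent
        self._persist = False                # set by cache(); materialized on first collect()
        self._cached: Optional[List[float]] = None

    def map(self, f: Callable[[float], float]) -> "ToyDataset":
        # Lazily add a new node to the lineage graph; nothing is computed yet.
        parent = self
        return ToyDataset(lambda: [f(x) for x in parent.collect()])

    def cache(self) -> "ToyDataset":
        # Mark this node for in-memory persistence; like Spark, cache() is itself lazy.
        self._persist = True
        return self

    def collect(self) -> List[float]:
        # Acts as an "action": return the cached copy if present, else recompute the lineage.
        if self._cached is not None:
            return self._cached
        result = self._compute()
        if self._persist:
            self._cached = result
        return result


def run_iterations(use_cache: bool, iterations: int = 10) -> Tuple[float, int]:
    """Estimate the data mean by gradient descent; return (estimate, number of source loads)."""
    loads = 0

    def load_from_disk() -> List[float]:
        # Stands in for an expensive read from stable storage (hypothetical).
        nonlocal loads
        loads += 1
        return [1.0, 2.0, 3.0, 4.0]

    points = ToyDataset(load_from_disk).map(lambda x: x * 2.0)
    if use_cache:
        points.cache()

    w = 0.0
    for _ in range(iterations):
        data = points.collect()                       # each iteration re-reads the dataset
        grad = sum(w - x for x in data) / len(data)   # gradient of (mean squared error) / 2
        w -= 0.5 * grad
    return w, loads
```

With the default 10 iterations, `run_iterations(False)` rebuilds the lineage, and therefore reloads the source, on every pass (10 loads), while `run_iterations(True)` loads it once and serves later iterations from memory. Both produce the same estimate; only the amount of repeated work differs.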