Apache Spark is a distributed computing engine that originated in 2009 at UC Berkeley, becoming an Apache project in 2013 with significant community contributions. It provides functionalities like RDDs, transformations, and actions for data processing, alongside newer abstractions such as DataFrames and Datasets. The training covers Spark's architecture, lifecycle, various operations, and job scheduling techniques.