Spark is a fast and general engine for large-scale data processing. It provides an interface called resilient distributed datasets (RDDs) that allow data to be distributed in memory across clusters and manipulated using parallel operations. Shark is a system built on Spark that allows running SQL queries over large datasets using Spark's speed and generality. The document discusses Spark and Shark's performance advantages over Hadoop for iterative and interactive applications.