Apache Spark is an open-source distributed general-purpose cluster computing framework. The following gives an interface for programming the complete cluster with the help of absolute information parallelism as well as fault tolerance. The Apache Spark has its architectural groundwork in RDD or Resilient Distributed Dataset.